GitHub Data Dictionary
Contains explanations and examples for all data fields available in the GitHub Users dataset. All personal/company information mentioned in this data dictionary is fictional and is solely intended for illustrative purposes.
Data points in the example snippets are rearranged for better grouping. To see where a specific data point stands, check the full data sample below:
Data point | Description | Data type |
meta | Contains information about the record | |
source | The record source | string |
object | The data object/entity | string |
created_at_date | The date when we first scraped the record | array of numbers |
created_at_timestamp | The date when we first scraped the record (Unix time) | number |
updated_at_date | The date when we last scraped the record | array of numbers |
version_id | Dataset version ID | string |
updated_at_timestamp | The date when we last scraped the record (Unix time) | number |
See a snippet of the dataset for reference:
A null value means that the information was not available on GitHub.
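A minimal sketch of handling the Unix-time fields and null values described above. It assumes the timestamps are expressed in seconds (the dictionary does not state the unit, so a hypothetical `unit` switch is included), and the `meta` values shown are invented for illustration.

```python
from datetime import datetime, timezone

def to_utc(timestamp, unit="s"):
    """Convert a Unix timestamp to an aware UTC datetime; pass None through."""
    if timestamp is None:  # null: the value was not available
        return None
    # Assumption: seconds by default; use unit="ms" if the data is in milliseconds.
    seconds = timestamp / 1000 if unit == "ms" else timestamp
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# Hypothetical values, not taken from the dataset.
meta = {"created_at_timestamp": 1609459200, "updated_at_timestamp": None}

created = to_utc(meta["created_at_timestamp"])
updated = to_utc(meta["updated_at_timestamp"])
print(created, updated)
```

Passing null through unchanged keeps "not available" distinct from a real date, which matters when comparing the first and last scrape times.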
Data point | Description | Data type |
doc | Start of the dataset: contains the first set of information points about the user | object |
source_id | Unique identifier of the record on GitHub | string |
id | Unique identifier of GitHub record in our database | string |
site_admin | Marks if the user is the site admin | boolean |
type | Marks the entity type (repository owner) | string |
See snippets of the dataset for reference:
Data point | Description | Data type |
events_url | GitHub REST API URL for the user's events | string |
node_id | ID assigned to objects by GitHub REST API | string |
See snippets of the dataset for reference:
Data point | Description | Data type |
image | Developer's avatar/logo | string |
bio | Developer's bio Note: contains control characters | string |
url | Developer's GitHub profile | string |
location | Developer's location | string |
See snippets of the dataset for reference:
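Because `bio` may contain control characters, it can help to normalize it before display or indexing. The following is a sketch only: the function name and the sample string are hypothetical, and replacing control characters with spaces is one possible policy, not something the dataset prescribes.

```python
import unicodedata

def strip_control_chars(text):
    """Replace control characters (Unicode category Cc) with spaces, then collapse whitespace."""
    if text is None:  # null: field not available on GitHub
        return None
    replaced = "".join(
        " " if unicodedata.category(ch) == "Cc" else ch for ch in text
    )
    return " ".join(replaced.split())

# Hypothetical bio value, for illustration only.
bio = "Builds\r\ntools\tfor\x00 fun"
clean_bio = strip_control_chars(bio)
print(clean_bio)
```

The same helper could be applied to the repository and organization `description` fields, which carry the same note.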
Data point | Description | Data type |
username | Developer's username | string |
name | Developer's name Note: not necessarily the same as the username | string |
See a snippet of the dataset for reference:
Data point | Description | Data type |
contact_info | Contains the developer's publicly accessible contact information | object |
blog | Developer's blog | string |
Developer's Twitter handle | string |
See a snippet of the dataset for reference:
Data point | Description | Data type |
company | Company the user has listed on their profile | string |
hireable | Marks if the developer is hireable Note: Users select the option in their settings. Information can be retrieved by using the GitHub REST API. | - |
See snippets of the dataset for reference:
Data point | Description | Data type |
follower_count | Developer's follower count | number |
following_count | The number of people the developer follows | number |
See a snippet of the dataset for reference:
Data point | Description | Data type |
public_gist_count | The number of gists by the developer | number |
public_repo_count | The number of repositories owned by the developer | number |
See a snippet of the dataset for reference:
Data point | Description | Data type |
repo | Contains information on the developer's repositories | array of objects |
disabled | Marks if the repository was disabled when we last scraped it | boolean |
archived | Shows if the repository is archived (read-only) | boolean |
created_at | Time and date when the repository was created | string |
default_branch | Title of the repository default branch | string |
description | Repository description Note: may contain control characters | string |
fork | Marks if the repository in a record is a copy of another repository | boolean |
fork_count | The number of repository copies | number |
forked_from | The original repository the copy has been made from | string |
has_downloads | Marks if the repository has downloads enabled | boolean |
has_issues | Marks if the repository has the issues section enabled | boolean |
has_pages | Marks if the repository has the pages section enabled | boolean |
has_projects | Marks if the repository has the projects section enabled | boolean |
has_wiki | Shows if the repository has a wiki included | boolean |
website | Project website | string |
url | Repository GitHub page | string |
source_id | Unique identifier of the record on GitHub | number |
See a snippet of the dataset for reference:
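As an illustration of how the boolean flags in the `repo` array might be combined, the sketch below keeps only repositories that are not forks, not archived, and not disabled. The function name and the sample records are hypothetical, and a null flag is treated as false, which is an assumption.

```python
def active_original_repos(repos):
    """Keep repositories that are not forks, not archived, and not disabled."""
    return [
        r for r in repos
        # Assumption: a missing or null flag counts as False.
        if not r.get("fork") and not r.get("archived") and not r.get("disabled")
    ]

# Hypothetical records, for illustration only.
repos = [
    {"url": "https://github.com/example-user/alpha",
     "fork": False, "archived": False, "disabled": False},
    {"url": "https://github.com/example-user/beta",
     "fork": True, "archived": False, "disabled": False},
    {"url": "https://github.com/example-user/gamma",
     "fork": False, "archived": True, "disabled": False},
]

kept = active_original_repos(repos)
print([r["url"] for r in kept])
```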
Data point | Description | Data type |
open_issues_count | The number of open issues in the repository | number |
pushed_at | Time and date of the most recent push to the repository | string |
size | Repository size in MB | number |
stargazer_count | The number of people who have starred the repository | number |
updated_at | Time and date the repository was last updated | string |
watcher_count | The number of people who are following the repository updates | number |
topics | Topics covered in the repository | array of strings |
See a snippet of the dataset for reference:
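The counters and `topics` above lend themselves to simple ranking and tallying. This is a rough sketch with hypothetical helper names and invented records. It assumes null counts should be treated as zero and null `topics` as an empty list.

```python
from collections import Counter

def top_by_stars(repos, n=3):
    """Return the n repositories with the highest stargazer_count."""
    return sorted(
        repos, key=lambda r: r.get("stargazer_count") or 0, reverse=True
    )[:n]

def topic_counts(repos):
    """Count how often each topic appears across the repositories."""
    counts = Counter()
    for r in repos:
        counts.update(r.get("topics") or [])
    return counts

# Hypothetical records, for illustration only.
repos = [
    {"repo_name": "alpha", "stargazer_count": 12, "topics": ["cli", "python"]},
    {"repo_name": "beta", "stargazer_count": None, "topics": None},
    {"repo_name": "gamma", "stargazer_count": 40, "topics": ["python"]},
]

top_two = [r["repo_name"] for r in top_by_stars(repos, n=2)]
topics = topic_counts(repos)
print(top_two, topics)
```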
Data point | Description | Data type |
language | The main programming language in the repository | string |
languages_distribution | Languages and their distribution in the repository by percentage | object |
See a snippet of the dataset for reference:
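A sketch of reading `languages_distribution`, assuming the object maps each language name to its percentage share (the exact shape is not shown here, so treat this as an assumption). The helper names, the tolerance, and the sample values are hypothetical.

```python
def dominant_language(distribution):
    """Return the language with the largest share, or None if there is no data."""
    if not distribution:
        return None
    return max(distribution, key=distribution.get)

def sums_to_hundred(distribution, tolerance=0.5):
    """Check that the shares add up to roughly 100 percent."""
    return abs(sum(distribution.values()) - 100.0) <= tolerance

# Hypothetical distribution, for illustration only.
distribution = {"Python": 72.4, "Shell": 20.1, "Dockerfile": 7.5}

main_language = dominant_language(distribution)
print(main_language, sums_to_hundred(distribution))
```

Comparing the result of `dominant_language` with the `language` field is one way to sanity-check a record, although the two are not guaranteed to agree.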
Data point | Description | Data type |
repo_name | Repository title | string |
repo_owner | Repository owner's username | string |
name | Name of the data entity in the record (repository) | string |
node_id | ID assigned to objects by GitHub REST API | string |
See a snippet of the dataset for reference:
Data point | Description | Data type |
license | Contains the information on the open-source licenses the repository uses | object |
key | Part of the GitHub URL identifying the license | string |
name | License name | string |
spdx_id | SPDX license ID | string |
url | URL redirecting to GitHub info on licensing | string |
node_id | ID assigned to objects by GitHub REST API | string |
See a snippet of the dataset for reference:
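To illustrate the `license` object, the sketch below tallies repositories by `spdx_id`. It assumes `license` is null when a repository has no detected license. The `"UNKNOWN"` label, the function name, and the sample records are hypothetical.

```python
from collections import Counter

def license_counts(repos):
    """Count repositories per SPDX license ID, grouping missing licenses together."""
    counts = Counter()
    for r in repos:
        lic = r.get("license") or {}  # assumption: null when no license is listed
        counts[lic.get("spdx_id") or "UNKNOWN"] += 1
    return counts

# Hypothetical records, for illustration only.
repos = [
    {"repo_name": "alpha", "license": {"key": "mit", "spdx_id": "MIT"}},
    {"repo_name": "beta", "license": {"key": "mit", "spdx_id": "MIT"}},
    {"repo_name": "gamma", "license": None},
    {"repo_name": "delta", "license": {"key": "apache-2.0", "spdx_id": "Apache-2.0"}},
]

by_license = license_counts(repos)
print(by_license)
```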
Data point | Description | Data type |
owner | Contains information on the repository developer | object |
image | Developer's logo/avatar | string |
url | Developer's profile | string |
source_id | Unique identifier of the record on GitHub | number |
username | Developer's username | string |
node_id | ID assigned to objects by GitHub REST API | string |
site_admin | Marks if the user is the site admin | boolean |
type | Marks the entity type (repository owner) | string |
See a snippet of the dataset for reference:
Data point | Description | Data type |
starred | Contains information on the repositories the developer starred | array of objects |
disabled | Marks if the repository was disabled when we last scraped it | boolean |
archived | Shows if the repository is archived (read-only) | boolean |
created_at | Time and date when the repository was created | string |
default_branch | Title of the repository default branch | string |
description | Repository description Note: may contain control characters | string |
fork | Marks if the repository in a record is a copy of another repository | boolean |
fork_count | The number of repository copies | number |
forked_from | The original repository the copy has been made from | string |
has_downloads | Marks if the repository has downloads enabled | boolean |
has_issues | Marks if the repository has the issues section enabled | boolean |
has_pages | Marks if the repository has the pages section enabled | boolean |
has_projects | Marks if the repository has the projects section enabled | boolean |
has_wiki | Shows if the repository has a wiki included | boolean |
website | Project website | string |
url | Repository GitHub page | string |
source_id | Unique identifier of the record on GitHub | number |
See a snippet of the dataset for reference:
Data point | Description | Data type |
open_issues_count | The number of open issues in the repository | number |
pushed_at | Time and date of the most recent push to the repository | string |
size | Repository size in MB | number |
stargazer_count | The number of people who have starred the repository | number |
updated_at | Time and date the repository was last updated | string |
watcher_count | The number of people who are following the repository updates | number |
topics | Topics covered in the repository | array of strings |
See a snippet of the dataset for reference:
Data point | Description | Data type |
language | The main programming language in the repository | string |
languages_distribution | Languages and their distribution in the repository by percentage | object |
See a snippet of the dataset for reference:
Data point | Description | Data type |
repo_name | Repository title | string |
repo_owner | Repository owner's username | string |
name | Name of the data entity in the record (repository) | string |
node_id | ID assigned to objects by GitHub REST API | string |
See a snippet of the dataset for reference:
Data point | Description | Data type |
license | Contains the information on the open-source licenses the repository uses | object |
key | Part of the GitHub URL identifying the license | string |
name | License name | string |
spdx_id | SPDX license ID | string |
url | URL redirecting to GitHub info on licensing | string |
node_id | ID assigned to objects by GitHub REST API | string |
See a snippet of the dataset for reference:
Data point | Description | Data type |
owner | Contains information on the developer of the starred repository | object |
image | Developer's logo/avatar | string |
url | Developer's profile | string |
source_id | Unique identifier of the record on GitHub | number |
username | Developer's username | string |
node_id | ID assigned to objects by GitHub REST API | string |
site_admin | Marks if the user is the site admin | boolean |
type | Shows the entity type (repository owner) | string |
See a snippet of the dataset for reference:
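Since every entry in `starred` carries a nested `owner` object, one possible use is to see whose repositories a developer stars most often. This is a sketch under that reading of the structure. The function name and the usernames are hypothetical.

```python
from collections import Counter

def starred_owner_counts(starred):
    """Count starred repositories per owner username, skipping missing owners."""
    counts = Counter()
    for repo in starred:
        owner = repo.get("owner") or {}
        username = owner.get("username")
        if username is not None:
            counts[username] += 1
    return counts

# Hypothetical records, for illustration only.
starred = [
    {"repo_name": "parser", "owner": {"username": "example-org", "type": "Organization"}},
    {"repo_name": "linter", "owner": {"username": "example-org", "type": "Organization"}},
    {"repo_name": "notes", "owner": {"username": "jane-doe", "type": "User"}},
    {"repo_name": "orphan", "owner": None},
]

owner_counts = starred_owner_counts(starred)
print(owner_counts.most_common(1))
```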
Data point | Description | Data type |
subscription | Contains information on the repositories the developer subscribes to | array of objects |
disabled | Marks if the repository was disabled when we last scraped it | boolean |
archived | Shows if the repository is archived (read-only) | boolean |
created_at | Time and date when the repository was created | string |
default_branch | Title of the repository default branch | string |
description | Repository description Note: may contain control characters | string |
fork | Marks if the repository in a record is a copy of another repository | boolean |
fork_count | The number of repository copies | number |
forked_from | The original repository the copy has been made from | string |
has_downloads | Marks if the repository has downloads enabled | boolean |
has_issues | Marks if the repository has the issues section enabled | boolean |
has_pages | Marks if the repository has the pages section enabled | boolean |
has_projects | Marks if the repository has the projects section enabled | boolean |
has_wiki | Shows if the repository has a wiki included | boolean |
website | Project website | string |
url | Repository GitHub page | string |
source_id | Unique identifier of the record on GitHub | number |
See a snippet of the dataset for reference:
Data point | Description | Data type |
open_issues_count | The number of open issues in the repository | number |
pushed_at | Time and date of the most recent push to the repository | string |
size | Repository size in MB | number |
stargazer_count | The number of people who have starred the repository | number |
updated_at | Time and date the repository was last updated | string |
watcher_count | The number of people who are following the repository updates | number |
topics | Topics covered in the repository | array of strings |
See a snippet of the dataset for reference:
Data point | Description | Data type |
language | The main programming language in the repository | string |
languages_distribution | Contains languages and their distribution in the repository by percentage | object |
See a snippet of the dataset for reference:
Data point | Description | Data type |
repo_name | Repository title | string |
repo_owner | Repository owner's username | string |
name | Name of the data entity in the record (repository) | string |
node_id | ID assigned to objects by GitHub REST API | string |
See a snippet of the dataset for reference:
Data point | Description | Data type |
license | Contains the information on the open-source licenses the repository uses | object |
key | Part of the GitHub URL identifying the license | string |
name | License name | string |
spdx_id | SPDX license ID | string |
url | URL redirecting to GitHub info on licensing | string |
node_id | ID assigned to objects by GitHub REST API | string |
See a snippet of the dataset for reference:
Data point | Description | Data type |
owner | Contains information on the developer of the subscribed repository | object |
image | Developer's logo/avatar | string |
url | Developer's profile | string |
source_id | Unique identifier of the record on GitHub | number |
username | Developer's username | string |
node_id | ID assigned to objects by GitHub REST API | string |
site_admin | Marks if the user is the site admin | boolean |
type | Marks the entity type (repository owner) | string |
See a snippet of the dataset for reference:
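The `starred` and `subscription` arrays share the same repository fields, so they can be compared directly. The sketch below assumes `source_id` identifies the same repository in both arrays, which the dictionary implies but does not state. The helper name and the IDs are hypothetical.

```python
def repo_ids(repos):
    """Collect the GitHub source_id of each repository, ignoring nulls."""
    return {r["source_id"] for r in repos if r.get("source_id") is not None}

# Hypothetical records, for illustration only.
starred = [{"source_id": 101, "repo_name": "parser"},
           {"source_id": 102, "repo_name": "linter"}]
subscription = [{"source_id": 102, "repo_name": "linter"},
                {"source_id": 103, "repo_name": "notes"}]

starred_and_subscribed = repo_ids(starred) & repo_ids(subscription)
subscribed_only = repo_ids(subscription) - repo_ids(starred)
print(starred_and_subscribed, subscribed_only)
```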
Data point | Description | Data type |
organization | Contains information on the organizations the developer is connected to | array of objects |
description | Organization description Note: may contain control characters | string |
source_id | Unique identifier of the record on GitHub | string |
username | Organization name | string |
node_id | ID assigned to objects by GitHub REST API | string |
url | Information on the organization returned by the GitHub REST API | string |
See a snippet of the dataset for reference:
Data point | Description | Data type |
followed_by | Contains information on people who are following the developer | array of objects |
username | Follower's username | string |
source_id | Unique identifier of the record on GitHub | number |
url | Follower's GitHub profile | string |
See a snippet of the dataset for reference:
Data point | Description | Data type |
is_following | Contains information on the people the developer follows | array of objects |
username | Followee's username | string |
source_id | Unique identifier of the record on GitHub | number |
url | Followee's GitHub profile | string |
See a snippet of the dataset for reference:
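Taken together, `followed_by` and `is_following` make it possible to find mutual follows. The following is a minimal sketch, assuming `source_id` is present on every entry and refers to the same user in both arrays. The function name and the sample users are hypothetical.

```python
def mutual_follows(followed_by, is_following):
    """Return the followees who also follow the developer back, matched by source_id."""
    follower_ids = {person["source_id"] for person in followed_by}
    return [p for p in is_following if p["source_id"] in follower_ids]

# Hypothetical records, for illustration only.
followed_by = [
    {"username": "jane-doe", "source_id": 1, "url": "https://github.com/jane-doe"},
    {"username": "sam-lee", "source_id": 2, "url": "https://github.com/sam-lee"},
]
is_following = [
    {"username": "sam-lee", "source_id": 2, "url": "https://github.com/sam-lee"},
    {"username": "alex-kim", "source_id": 3, "url": "https://github.com/alex-kim"},
]

mutual = mutual_follows(followed_by, is_following)
print([p["username"] for p in mutual])
```

Matching on `source_id` rather than `username` is a design choice here: usernames can be changed on GitHub, while the numeric ID is described as the unique identifier.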