GitHub Data Dictionary
Contains explanations and examples for all the data fields available in the GitHub dataset.
Data points in the example snippets are rearranged for better grouping. To see where a specific data point stands, check the full data sample below:
Data point | Description | Data type | Example values |
meta | Contains information about the record | | |
source | The record source | string | github |
object | The data object/entity | string | user |
created_at_date | The date when we first scraped the record | array of numbers | 2021, 9, 13 |
created_at_timestamp | The date we first scraped the record (Unix time) | number | 1631498987.348509 |
updated_at_date | The date when we last scraped the record | array of numbers | 2022, 12, 2 |
version_id | Dataset version ID | string | e0f2c272 |
updated_at_timestamp | The date when we last scraped the record (Unix time) | number | 1669994154.952415 |
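The two date encodings in `meta` can be converted to standard date objects. The following is a minimal sketch, assuming the `meta` object has been loaded into a Python dict and that the Unix timestamps are seconds in UTC. The `parse_meta_dates` helper is hypothetical and not part of the dataset.

```python
from datetime import date, datetime, timezone

def parse_meta_dates(meta):
    """Convert the array-style date and the Unix timestamp of a meta object."""
    # created_at_date is an array of numbers: [year, month, day]
    first_seen_date = date(*meta["created_at_date"])
    # created_at_timestamp is assumed to be seconds since the epoch, in UTC
    first_seen_at = datetime.fromtimestamp(meta["created_at_timestamp"], tz=timezone.utc)
    return first_seen_date, first_seen_at

# Example values taken from the table above
meta = {"created_at_date": [2021, 9, 13], "created_at_timestamp": 1631498987.348509}
first_seen_date, first_seen_at = parse_meta_dates(meta)
print(first_seen_date.isoformat())
print(first_seen_at.isoformat())
```

The same approach should apply to `updated_at_date` and `updated_at_timestamp`.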
See a snippet of the dataset for reference:
Null value means that the information was not available on GitHub.
Data point | Description | Data type | Example values |
doc | Start of the record: contains the first set of data points about the developer | object | |
source_id | Unique identifier of the record on GitHub | string | 5b24276186ed43b1aaad5624bac02cd9 |
id | Unique identifier of GitHub record in our database | string | github_people_7081362 |
site_admin | Marks if the user is the site admin | boolean | false |
type | Marks the entity type (repository owner) | string | User |
See snippets of the dataset for reference:
Data point | Description | Data type | Example values |
events_url | GitHub REST API URL template for the user's events | string | https://api.github.com/users/ngav/events{/privacy} |
node_id | ID assigned to objects by GitHub REST API | string | MDQ6VXNlcjEwMDI5MDY5 |
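The `events_url` example ends in `{/privacy}`, an optional path segment written in URI-template style. The sketch below shows one way to get a plain URL by dropping such placeholders. It assumes every optional part is wrapped in curly braces, and `strip_url_template` is a hypothetical helper name.

```python
import re

def strip_url_template(url):
    """Remove {...} placeholders from a templated API URL."""
    return re.sub(r"\{[^{}]*\}", "", url)

# Example value taken from the table above
print(strip_url_template("https://api.github.com/users/ngav/events{/privacy}"))
```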
See snippets of the dataset for reference:
Data point | Description | Data type | Example values |
image | Developer's avatar/logo | string | https://avatars.githubusercontent.com/u/. . . |
bio | Developer's bio (note: contains control characters) | string | I'm just a random dude. Don't mind me.\r\n\r\nDeveloper at imec PreDiCT |
url | Developer's GitHub profile | string | https://github.com/john-doe |
location | Developer's location | string | indonesia |
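Because `bio` can contain control characters such as `\r\n`, it may help to normalize the text before displaying or indexing it. This is a minimal sketch of one possible cleanup, which replaces control characters with spaces and collapses repeated whitespace. The `clean_text` helper is hypothetical, and flattening line breaks is an assumption about what the consumer wants.

```python
import unicodedata

def clean_text(value):
    """Replace control characters with spaces and collapse whitespace."""
    if value is None:
        # null means the information was not available on GitHub
        return None
    # Unicode category "Cc" covers control characters, including \r, \n and \t
    spaced = "".join(" " if unicodedata.category(ch) == "Cc" else ch for ch in value)
    return " ".join(spaced.split())

# Example value taken from the table above
bio = "I'm just a random dude. Don't mind me.\r\n\r\nDeveloper at imec PreDiCT"
print(clean_text(bio))
```

The same helper could be applied to the repository and organization `description` fields, which carry the same note.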
See snippets of the dataset for reference:
Data point | Description | Data type | Example values |
username | Developer's username | string | john-doe |
name | Developer's name (note: not necessarily the same as the username) | string | john-doe |
See a snippet of the dataset for reference:
Data point | Description | Data type | Example values |
contact_info | Contains the developer's publicly accessible contact information | object | |
blog | Developer's blog | string | https://john-doe.be |
Developer's Twitter handle | string | john-doe |
See a snippet of the dataset for reference:
Data point | Description | Data type | Example values |
company | Company the user has listed on their profile | string | SJTU |
hireable | Marks if the developer is hireable (note: users select the option in their settings; the information can be retrieved through the GitHub REST API) | boolean | null / true |
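Since `hireable` can be `null`, a plain truthiness check would merge "information not available" with "false". The sketch below keeps the three states apart. The `hireable_status` helper and its labels are hypothetical.

```python
def hireable_status(value):
    """Map the hireable field to one of three labels."""
    # Compare with "is" so that None is never mistaken for False
    if value is True:
        return "hireable"
    if value is False:
        return "not hireable"
    return "unknown"

for value in (True, False, None):
    print(value, hireable_status(value))
```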
See snippets of the dataset for reference:
Data point | Description | Data type | Example values |
follower_count | Developer's follower count | number | 14 |
following_count | The number of people the developer follows | number | 28 |
See a snippet of the dataset for reference:
Data point | Description | Data type | Example values |
public_gist_count | The number of gists by the developer | number | 0 |
public_repo_count | The number of repositories owned by the developer | number | 2 |
See a snippet of the dataset for reference:
Data point | Description | Data type | Example values |
repo | Contains information on the developer's repositories | array of objects | |
disabled | Marks if the repository was disabled when we last scraped it | boolean | false |
archived | Shows if the repository is archived (read-only) | boolean | false |
created_at | Time and date when the repository was created | string | 2022-10-14T10:06:59Z |
default_branch | Title of the repository default branch | string | main |
description | Repository description (note: may contain control characters) | string | null |
fork | Marks if the repository in a record is a copy of another repository | boolean | false |
fork_count | The number of repository copies | number | 0 |
forked_from | The original repository the copy has been made from | string | null |
has_downloads | Marks if the repository has downloads enabled | boolean | true |
has_issues | Marks if the repository has the issues section enabled | boolean | true |
has_pages | Marks if the repository has the pages section enabled | boolean | false |
has_projects | Marks if the repository has the projects section enabled | boolean | true |
has_wiki | Shows if the repository has a wiki included | boolean | true |
website | Project website | string | null |
url | Repository GitHub page | string | https://github.com/john-doe/software |
source_id | Unique identifier of the record on GitHub | number | 551391535 |
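The boolean flags in `repo` can be combined to select the repositories a developer actively maintains. This is a rough sketch that assumes the record is a Python dict with `repo` as a top-level key. That nesting and the `active_original_repos` helper are assumptions for illustration.

```python
def active_original_repos(record):
    """Return repositories that are not forks, archived, or disabled."""
    repos = record.get("repo") or []
    return [
        r for r in repos
        if not r.get("fork") and not r.get("archived") and not r.get("disabled")
    ]

# Hypothetical record with two repositories
record = {
    "repo": [
        {"url": "https://github.com/john-doe/software", "fork": False, "archived": False, "disabled": False},
        {"url": "https://github.com/john-doe/copy", "fork": True, "archived": False, "disabled": False},
    ]
}
print([r["url"] for r in active_original_repos(record)])
```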
See a snippet of the dataset for reference:
Data point | Description | Data type | Example values |
open_issues_count | The number of open issues in the repository | number | 47 |
pushed_at | Time and date of the most recent push to the repository | string | 2022-11-01T18:21:42Z |
size | Repository size in KB | number | 15938 |
stargazer_count | The number of people who have starred the repository | number | 7249 |
updated_at | Time and date the repository was last updated | string | 2021-05-18T03:32:01Z |
watcher_count | The number of people who are following the repository updates | number | 7249 |
topics | Topics covered in the repository | array of strings | v2-ui |
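The `created_at`, `pushed_at`, and `updated_at` values are strings in the `YYYY-MM-DDTHH:MM:SSZ` form shown in the examples. A minimal sketch for turning them into timezone-aware datetimes follows. It assumes the trailing `Z` always denotes UTC, and `parse_github_time` is a hypothetical helper.

```python
from datetime import datetime, timezone

def parse_github_time(value):
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' string into an aware UTC datetime."""
    if value is None:
        return None
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

# Example values taken from the repo tables above
created = parse_github_time("2022-10-14T10:06:59Z")
pushed = parse_github_time("2022-11-01T18:21:42Z")
print((pushed - created).days)
```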
See a snippet of the dataset for reference:
Data point | Description | Data type | Example values |
language | The main programming language in the repository | string | JavaScript |
languages_distribution | Languages and their distribution in the repository by percentage | object | JavaScript: 58.2 Vue: 37.9 |
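`languages_distribution` maps each language to its percentage share, so the largest share can be read directly from the object. The sketch below assumes the object has been loaded as a Python dict with numeric values. `dominant_language` is a hypothetical helper.

```python
def dominant_language(distribution):
    """Return the language with the largest percentage, or None if unknown."""
    if not distribution:
        return None
    return max(distribution, key=distribution.get)

# Example values taken from the table above
print(dominant_language({"JavaScript": 58.2, "Vue": 37.9}))
```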
See a snippet of the dataset for reference:
Data point | Description | Data type | Example values |
repo_name | Repository title | string | software |
repo_owner | Repository owner's username | string | dev |
name | Name of the data entity in the record (repository) | string | software |
node_id | ID assigned to objects by GitHub REST API | string | R_kgDOIN2RLw |
See a snippet of the dataset for reference:
Data point | Description | Data type | Example values |
license | Contains the information on the open-source licenses the repository uses | object | |
key | Part of the GitHub URL identifying the license | string | mit |
name | License name | string | MIT License |
spdx_id | SPDX license ID | string | MIT |
url | URL redirecting to GitHub info on licensing | string | https://api.github.com/licenses/mit |
node_id | ID assigned to objects by GitHub REST API | string | MDc6TGljZW5zZTEz |
See a snippet of the dataset for reference:
Data point | Description | Data type | Example values |
owner | Contains information on the repository developer | object | |
image | Developer's logo/avatar | string | https://avatars.githubusercontent.com/u/. . . |
url | Developer's profile | string | https://github.com/john-doe |
source_id | Unique identifier of the record on GitHub | number | 47310637 |
username | Developer's username | string | dev |
node_id | ID assigned to objects by GitHub REST API | string | MDQ6VXNlcjQ3MzEwNjM3 |
site_admin | Marks if the user is the site admin | boolean | false |
type | Marks the entity type (repository owner) | string | User |
See a snippet of the dataset for reference:
Data point | Description | Data type | Example values |
starred | Contains information on the repositories the developer starred | array of objects | |
disabled | Marks if the repository was disabled when we last scraped it | boolean | false |
archived | Shows if the repository is archived (read-only) | boolean | false |
created_at | Time and date when the repository was created | string | 2022-10-14T10:06:59Z |
default_branch | Title of the repository default branch | string | master |
description | Repository description (note: may contain control characters) | string | null |
fork | Marks if the repository in a record is a copy of another repository | boolean | false |
fork_count | The number of repository copies | number | 362 |
forked_from | The original repository the copy has been made from | string | null |
has_downloads | Marks if the repository has downloads enabled | boolean | true |
has_issues | Marks if the repository has the issues section enabled | boolean | true |
has_pages | Marks if the repository has the pages section enabled | boolean | false |
has_projects | Marks if the repository has the projects section enabled | boolean | true |
has_wiki | Shows if the repository has a wiki included | boolean | true |
website | Project website | string | null |
url | Repository GitHub page | string | https://github.com/john-doe/software |
source_id | Unique identifier of the record on GitHub | number | 551391535 |
See a snippet of the dataset for reference:
Data point | Description | Data type | Example values |
open_issues_count | The number of open issues in the repository | number | 47 |
pushed_at | Time and date of the most recent push to the repository | string | 2022-02-09T22:20:12Z |
size | Repository size in KB | number | 292 |
stargazer_count | The number of people who have starred the repository | number | 321 |
updated_at | Time and date the repository was last updated | string | 2021-01-19T18:26:34Z |
watcher_count | The number of people who are following the repository updates | number | 3217 |
topics | Topics covered in the repository | array of strings | android |
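The `topics` arrays of starred repositories can be aggregated to estimate what a developer is interested in. This is a small sketch, assuming `starred` has been loaded as a list of dicts. The `topic_counts` helper and the sample data are hypothetical.

```python
from collections import Counter

def topic_counts(starred):
    """Count how often each topic appears across starred repositories."""
    counts = Counter()
    for repo in starred or []:
        # topics may be missing or null when not available on GitHub
        counts.update(repo.get("topics") or [])
    return counts

# Hypothetical starred list
starred = [
    {"topics": ["android", "kotlin"]},
    {"topics": ["android"]},
    {"topics": None},
]
print(topic_counts(starred).most_common(1))
```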
See a snippet of the dataset for reference:
Data point | Description | Data type | Example values |
language | The main programming language in the repository | string | Python |
languages_distribution | Languages and their distribution in the repository by percentage | object | Python: 95.3 |
See a snippet of the dataset for reference:
Data point | Description | Data type | Example values |
repo_name | Repository title | string | dev-software |
repo_owner | Repository owner's username | string | dev |
name | Name of the data entity in the record (repository) | string | dev-software |
node_id | ID assigned to objects by GitHub REST API | string | MDEwOlJlcG9zaXRvcnkzNDIzNDM4NTE= |
See a snippet of the dataset for reference:
Data point | Description | Data type | Example values |
license | Contains the information on the open-source licenses the repository uses | object | |
key | Part of the GitHub URL identifying the license | string | gpl-3.0 |
name | License name | string | GNU General Public License v3.0 |
spdx_id | SPDX license ID | string | GPL-3.0 |
url | URL redirecting to GitHub info on licensing | string | https://api.github.com/licenses/gpl-3.0 |
node_id | ID assigned to objects by GitHub REST API | string | MDc6TGljZW5zZTk= |
See a snippet of the dataset for reference:
Data point | Description | Data type | Example values |
owner | Contains information on the developer of the starred repository | object | |
image | Developer's logo/avatar | string | https://avatars.githubusercontent.com/u/. . . |
url | Developer's profile | string | https://github.com/john-noakes |
source_id | Unique identifier of the record on GitHub | number | 8597527 |
username | Developer's username | string | john-noakes |
node_id | ID assigned to objects by GitHub REST API | string | MDEyOk9yZ2FuaXphdGlvbjg1OTc1Mjc |
site_admin | Marks if the user is the site admin | boolean | false |
type | Shows the entity type (repository owner) | string | Organization |
See a snippet of the dataset for reference:
Data point | Description | Data type | Example values |
subscription | Contains information on the repositories the developer subscribes to | array of objects | |
disabled | Marks if the repository was disabled when we last scraped it | boolean | false |
archived | Shows if the repository is archived (read-only) | boolean | false |
created_at | Time and date when the repository was created | string | 2021-02-25T18:40:15Z |
default_branch | Title of the repository default branch | string | master |
description | Repository description (note: may contain control characters) | string | null |
fork | Marks if the repository in a record is a copy of another repository | boolean | false |
fork_count | The number of repository copies | number | 0 |
forked_from | The original repository the copy has been made from | string | null |
has_downloads | Marks if the repository has downloads enabled | boolean | true |
has_issues | Marks if the repository has the issues section enabled | boolean | true |
has_pages | Marks if the repository has the pages section enabled | boolean | true |
has_projects | Marks if the repository has the projects section enabled | boolean | true |
has_wiki | Shows if the repository has a wiki included | boolean | true |
website | Project website | string | null |
url | Repository GitHub page | string | https://github.com/john-stiles/software |
source_id | Unique identifier of the record on GitHub | number | 342343851 |
See a snippet of the dataset for reference:
Data point | Description | Data type | Example values |
open_issues_count | The number of open issues in the repository | number | 0 |
pushed_at | Time and date of the most recent push to the repository | string | 2021-03-17T14:55:44Z |
size | Repository size in KB | number | 388554 |
stargazer_count | The number of people who have starred the repository | number | 0 |
updated_at | Time and date the repository was last updated | string | 2021-03-17T14:55:02Z |
watcher_count | The number of people who are following the repository updates | number | 0 |
topics | Topics covered in the repository | array of strings | public-api |
See a snippet of the dataset for reference:
Data point | Description | Data type | Example values |
language | The main programming language in the repository | string | JavaScript |
languages_distribution | Contains languages and their distribution in the repository by percentage | object | JavaScript: 99.9 HTML: 0.1 CSS: 0.0 |
See a snippet of the dataset for reference:
Data point | Description | Data type | Example values |
repo_name | Repository title | string | python.dev.repo |
repo_owner | Repository owner's username | string | Python dev |
name | Name of the data entity in the record (repository) | string | python.dev.repo |
node_id | ID assigned to objects by GitHub REST API | string | MDEwOlJlcG9zaXRvcnkzNDg3NDg2MzA= |
See a snippet of the dataset for reference:
Data point | Description | Data type | Example values |
license | Contains the information on the open-source licenses the repository uses | object | |
key | Part of the GitHub URL identifying the license | string | ms-pl |
name | License name | string | Microsoft Public License |
spdx_id | SPDX license ID | string | MS-PL |
url | URL redirecting to GitHub info on licensing | string | https://api.github.com/licenses/ms-pl |
node_id | ID assigned to objects by GitHub REST API | string | MDc6TGljZW5zZTE5 |
See a snippet of the dataset for reference:
Data point | Description | Data type | Example values |
owner | Contains information on the developer of the subscribed repository | object | |
image | Developer's logo/avatar | string | https://avatars.githubusercontent.com/u/. . . |
url | Developer's profile | string | https://github.com/java-developer |
source_id | Unique identifier of the record on GitHub | number | 79530557 |
username | Developer's username | string | java-developer |
node_id | ID assigned to objects by GitHub REST API | string | MDQ6VXNlcjc5NTMwNTU3 |
site_admin | Marks if the user is the site admin | boolean | false |
type | Marks the entity type (repository owner) | string | User |
See a snippet of the dataset for reference:
Data point | Description | Data type | Example values |
organization | Contains information on the organizations the developer is connected to | array of objects | |
description | Organization description (note: may contain control characters) | string | null |
source_id | Unique identifier of the record on GitHub | string | 70442962 |
username | Organization name | string | dev-org |
node_id | ID assigned to objects by GitHub REST API | string | MDEyOk9yZ2FuaXphdGlvbjcwNDQyOTYy |
url | GitHub REST API URL that returns information on the organization | string | https://api.github.com/orgs/dev-org |
See a snippet of the dataset for reference:
Data point | Description | Data type | Example values |
followed_by | Contains information on people who are following the developer | array of objects | |
username | Follower's username | string | follower-dev |
source_id | Unique identifier of the record on GitHub | number | 6514464 |
url | Follower's GitHub profile | string | https://github.com/follower-dev |
See a snippet of the dataset for reference:
Data point | Description | Data type | Example values |
is_following | Contains information on the people the developer follows | array of objects | |
username | Followee's username | string | following-dev |
source_id | Unique identifier of the record on GitHub | number | 163421 |
url | Followee's GitHub profile | string | https://github.com/following-dev |
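`followed_by` and `is_following` share the `source_id` field, so the two arrays can be intersected to find mutual connections. This is a minimal sketch, assuming the record is a Python dict with both arrays as top-level keys. That nesting and the `mutual_follow_ids` helper are assumptions.

```python
def mutual_follow_ids(record):
    """Return source_ids of users who follow the developer and are followed back."""
    followers = {p["source_id"] for p in record.get("followed_by") or []}
    following = {p["source_id"] for p in record.get("is_following") or []}
    return followers & following

# Hypothetical record; the IDs reuse the example values from the tables above
record = {
    "followed_by": [{"username": "follower-dev", "source_id": 6514464},
                    {"username": "following-dev", "source_id": 163421}],
    "is_following": [{"username": "following-dev", "source_id": 163421}],
}
print(mutual_follow_ids(record))
```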
See a snippet of the dataset for reference: