GitHub Data Dictionary
Contains explanations and examples of all data fields available in the GitHub Users dataset.
All personal/company information mentioned within this context is entirely fictional and is solely intended for illustrative purposes.
Data points in the example snippets are rearranged for better grouping. To see where a specific data point stands, check the full data sample below:
Data point | Description | Data type |
meta | Contains information about the record | Object |
created_at_date | The date when we first scraped the record | Array of numbers (integers) |
created_at_timestamp | The date when we first scraped the record (Unix time) | Float |
updated_at_date | The date when we last scraped the record | Array of numbers (integers) |
updated_at_timestamp | The date when we last scraped the record (Unix time) | Float |
version_id | Dataset version ID | String |
source | The record source | String |
object | The data object/entity | String |
is_deleted | Marks if the user profile has been deleted (is no longer available on GitHub) | Boolean |
See a snippet of the dataset for reference:
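As a usage illustration (not a dataset sample), the sketch below converts the two Unix timestamps in `meta` into timezone-aware datetimes. The function name `scrape_window` and the sample values are hypothetical, and the sketch assumes the timestamps are expressed in seconds.

```python
from datetime import datetime, timezone


def scrape_window(meta: dict) -> tuple:
    """Return (first_scraped, last_scraped) as timezone-aware UTC datetimes."""
    first = datetime.fromtimestamp(meta["created_at_timestamp"], tz=timezone.utc)
    last = datetime.fromtimestamp(meta["updated_at_timestamp"], tz=timezone.utc)
    return first, last


# Hypothetical sample values, not taken from the dataset.
sample_meta = {
    "created_at_timestamp": 1609459200.0,
    "updated_at_timestamp": 1612137600.0,
    "is_deleted": False,
}

first, last = scrape_window(sample_meta)
print(first.isoformat(), last.isoformat(), (last - first).days)
```

Working with UTC-aware datetimes avoids depending on the local time zone of the machine that processes the records.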
Data point | Description | Data type |
doc | Dataset starting point | Object |
source_id | User profile identifier on GitHub | String |
id | Record identifier in our database | String |
site_admin | Indicates if the user is the site admin | Boolean |
type | Repository owner entity type | String |
See snippets of the dataset for reference:
Data point | Description | Data type |
events_url | GitHub REST API URL that lists the user's events | String |
node_id | Identification key assigned by GitHub REST API | String |
See snippets of the dataset for reference:
Data point | Description | Data type |
image | Developer's avatar/logo | String |
bio | Developer's bio. Note: may contain control characters | String |
url | Developer's GitHub profile URL | String |
location | Developer's location | String |
username | Developer's username | String |
name | Developer's name. Note: not necessarily the same as the username | String |
See a snippet of the dataset for reference:
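As a usage illustration (not a dataset sample), free-text fields such as `bio` may carry control characters, so it can help to normalize them before display or indexing. The helper below is a minimal sketch; the name `strip_control_chars` and the sample string are hypothetical, and replacing control characters with spaces is only one possible policy.

```python
import unicodedata
from typing import Optional


def strip_control_chars(text: Optional[str]) -> str:
    """Replace control characters (Unicode category Cc) with spaces, then collapse whitespace."""
    if not text:
        return ""
    cleaned = "".join(" " if unicodedata.category(ch) == "Cc" else ch for ch in text)
    return " ".join(cleaned.split())


# Hypothetical bio value, not taken from the dataset.
print(strip_control_chars("Backend dev\r\nRust \x00& Go\t fan"))
# prints: Backend dev Rust & Go fan
```

The same helper could be applied to the organization and repository `description` fields, which carry the same note.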
Data point | Description | Data type |
contact_info | Publicly accessible contact information | Object |
blog | Developer's blog | String |
Developer's Twitter handle | String |
See a snippet of the dataset for reference:
Data point | Description | Data type |
company | Company affiliated with the developer | String |
hireable | Marks if the developer is hireable | Boolean/null |
See snippets of the dataset for reference:
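As a usage illustration (not a dataset sample), `hireable` is typed Boolean/null, so a missing value should not be read as `False`. The sketch below keeps the three states apart; the function names and labels are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def hireable_label(hireable: Optional[bool]) -> str:
    """Map the tri-state value to a label; null stays distinct from False."""
    if hireable is None:
        return "unspecified"
    return "hireable" if hireable else "not hireable"


def hireable_breakdown(values: Iterable[Optional[bool]]) -> Counter:
    """Count how many developers fall into each of the three states."""
    return Counter(hireable_label(value) for value in values)


# Hypothetical values, not taken from the dataset.
print(hireable_breakdown([True, None, False, None]))
```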
Data point | Description | Data type |
organization | Organizations the developer is connected to | Array of objects |
description | Organization description. Note: may contain control characters | String |
source_id | Organization identification key on GitHub | String |
username | Organization name | String |
node_id | Organization identifier assigned by GitHub REST API | String |
url | Information on the organization returned by the GitHub REST API | String |
See a snippet of the dataset for reference:
Data point | Description | Data type |
follower_count | Developer's follower count | Integer |
following_count | Number of people the developer follows | Integer |
See a snippet of the dataset for reference:
Data point | Description | Data type |
followed_by | Users who are following the (record) developer | Array of objects |
username | Follower's username | String |
source_id | User identifier on GitHub | Integer |
url | Follower's GitHub profile URL | String |
See a snippet of the dataset for reference:
Data point | Description | Data type |
is_following | Users the developer follows | Array of objects |
username | Followee's username | String |
source_id | User identifier on GitHub | Integer |
url | Followee's GitHub profile URL | String |
See a snippet of the dataset for reference:
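As a usage illustration (not a dataset sample), the `followed_by` and `is_following` arrays share the same three fields, so they can be compared directly. The sketch below finds mutual connections by matching on `source_id`; the function name and all sample values are hypothetical.

```python
def mutual_connections(followed_by, is_following):
    """Return usernames that appear in both lists, matched on `source_id`."""
    follower_ids = {user["source_id"] for user in followed_by}
    return sorted(
        user["username"] for user in is_following if user["source_id"] in follower_ids
    )


# Hypothetical sample values, not taken from the dataset.
followed_by = [
    {"username": "alice-example", "source_id": 101, "url": "https://github.com/alice-example"},
    {"username": "bob-example", "source_id": 102, "url": "https://github.com/bob-example"},
]
is_following = [
    {"username": "bob-example", "source_id": 102, "url": "https://github.com/bob-example"},
    {"username": "carol-example", "source_id": 103, "url": "https://github.com/carol-example"},
]

print(mutual_connections(followed_by, is_following))
# prints: ['bob-example']
```

Matching on `source_id` rather than `username` is a design choice here: usernames can change, while the identifier is expected to stay stable.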
Data point | Description | Data type |
public_gist_count | Number of the developer's public gists | Integer |
public_repo_count | Number of the developer's public repositories | Integer |
See a snippet of the dataset for reference:
Data point | Description | Data type |
repo | Developer's public repositories | Array of objects |
disabled | Indicates if the repository was disabled at the time of the last scrape | Boolean |
archived | Shows if the repository is archived (read-only) | Boolean |
created_at | Timestamp when the repository was created in ISO 8601 format | String (date) |
default_branch | Default branch title | String |
description | Repository description. Note: may contain control characters | String |
fork | Marks if the repository in a record is a copy of another repository | Boolean |
fork_count | Number of repository copies | Integer |
forked_from | Original repository the copy has been made from | String |
has_downloads | Indicates if other users have downloaded the repository | Boolean |
has_issues | Marks if the repository has the issues section enabled | Boolean |
has_pages | Marks if the repository has the pages section enabled | Boolean |
has_projects | Marks if the repository has the projects section enabled | Boolean |
has_wiki | Marks if the repository has the wiki section enabled | Boolean |
website | Project website | String |
url | Repository URL | String |
source_id | Repository identifier on GitHub | Integer |
See a snippet of the dataset for reference:
Data point | Description | Data type |
open_issues_count | Number of open issues in the repository | Integer |
pushed_at | Timestamp of the most recent push to the repository in ISO 8601 format | String (date) |
size | Repository size | Integer |
stargazer_count | Number of people who have starred the repository | Integer |
updated_at | Timestamp when the repository was updated in ISO 8601 format | String (date) |
watcher_count | Number of people who are following the repository updates | Integer |
topics | Repository topics | Array of strings |
See a snippet of the dataset for reference:
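As a usage illustration (not a dataset sample), the ISO 8601 strings in `created_at`, `pushed_at`, and `updated_at` can be parsed with the standard library. The sketch assumes the values look like `2020-01-01T00:00:00Z`; that exact layout, the function names, and the sample values are assumptions.

```python
from datetime import datetime


def parse_iso8601(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, normalizing a trailing "Z" to an explicit UTC offset."""
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11 onward.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def days_from_creation_to_push(repo: dict) -> int:
    """Whole days between repository creation and the `pushed_at` timestamp."""
    created = parse_iso8601(repo["created_at"])
    pushed = parse_iso8601(repo["pushed_at"])
    return (pushed - created).days


# Hypothetical sample values, not taken from the dataset.
sample_repo = {
    "created_at": "2020-01-01T00:00:00Z",
    "pushed_at": "2020-03-01T12:00:00Z",
}
print(days_from_creation_to_push(sample_repo))
# prints: 60
```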
Data point | Description | Data type |
language | Main programming language in the repository | String |
languages_distribution | Languages and their distribution by percentage in the repository | Object |
See a snippet of the dataset for reference:
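As a usage illustration (not a dataset sample), the `languages_distribution` object can be aggregated across repositories. The sketch assumes the object maps a language name to its percentage share, for example `{"Python": 80.0, "Shell": 20.0}`; that shape, the function name, and the sample values are assumptions.

```python
from collections import defaultdict


def average_language_share(repos):
    """Average each language's percentage share over all given repositories."""
    if not repos:
        return {}
    totals = defaultdict(float)
    for repo in repos:
        # A repository without a distribution contributes 0 to every language.
        for language, share in (repo.get("languages_distribution") or {}).items():
            totals[language] += share
    return {language: total / len(repos) for language, total in totals.items()}


# Hypothetical sample values, not taken from the dataset.
sample_repos = [
    {"language": "Python", "languages_distribution": {"Python": 80.0, "Shell": 20.0}},
    {"language": "Go", "languages_distribution": {"Python": 40.0, "Go": 60.0}},
]
print(average_language_share(sample_repos))
```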
Data point | Description | Data type |
repo_name | Repository title | String |
repo_owner | Repository owner | String |
name | Repository name | String |
node_id | Repository identifier assigned by GitHub REST API | String |
See a snippet of the dataset for reference:
Data point | Description | Data type |
license | Open-source licenses the repository uses | Object |
key | GitHub URL identifying the license | String |
name | License name | String |
spdx_id | SPDX license ID | String |
url | URL redirecting to GitHub info on licensing | String |
node_id | License identifier assigned by GitHub REST API | String |
See a snippet of the dataset for reference:
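As a usage illustration (not a dataset sample), repositories can be grouped by the `spdx_id` of their `license` object. The sketch assumes `license` may be missing or null for some repositories; the function name, the `"unknown"` fallback label, and the sample values are hypothetical.

```python
from collections import Counter


def license_counts(repos):
    """Count repositories per SPDX license ID, using "unknown" when no ID is present."""
    return Counter(
        (repo.get("license") or {}).get("spdx_id") or "unknown" for repo in repos
    )


# Hypothetical sample values, not taken from the dataset.
licensed_repos = [
    {"name": "demo-a", "license": {"key": "mit", "name": "MIT License", "spdx_id": "MIT"}},
    {"name": "demo-b", "license": None},
    {"name": "demo-c", "license": {"key": "mit", "name": "MIT License", "spdx_id": "MIT"}},
]
print(license_counts(licensed_repos))
```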
Data point | Description | Data type |
owner | Repository owner | Object |
image | Developer's logo/avatar | String |
url | Developer's profile | String |
source_id | Developer's identifier on GitHub | Integer |
username | Developer's username | String |
node_id | Developer's identifier assigned by GitHub REST API | String |
site_admin | Marks if the user is the site admin | Boolean |
type | User's profile type | String |
See a snippet of the dataset for reference:
Data point | Description | Data type |
starred | Repositories the developer starred | Array of objects |
disabled | Indicates if the repository was disabled at the time of the last scrape | Boolean |
archived | Shows if the repository is archived (read-only) | Boolean |
created_at | Timestamp when the repository was created in ISO 8601 format | String (date) |
default_branch | Default branch title | String |
description | Repository description. Note: may contain control characters | String |
fork | Marks if the repository in a record is a copy of another repository | Boolean |
fork_count | Number of repository copies | Integer |
forked_from | Original repository the copy has been made from | String |
has_downloads | Indicates if other users have downloaded the repository | Boolean |
has_issues | Marks if the repository has the issues section enabled | Boolean |
has_pages | Marks if the repository has the pages section enabled | Boolean |
has_projects | Marks if the repository has the projects section enabled | Boolean |
has_wiki | Marks if the repository has the wiki section enabled | Boolean |
website | Project website | String |
url | Repository URL | String |
source_id | Repository identifier on GitHub | Integer |
See a snippet of the dataset for reference:
Data point | Description | Data type |
open_issues_count | Number of open issues in the repository | Integer |
pushed_at | Timestamp of the most recent push to the repository in ISO 8601 format | String (date) |
size | Repository size | Integer |
stargazer_count | Number of people who have starred the repository | Integer |
updated_at | Timestamp when the repository was updated in ISO 8601 format | String (date) |
watcher_count | Number of people who are following the repository updates | Integer |
topics | Repository topics | Array of strings |
See a snippet of the dataset for reference:
Data point | Description | Data type |
language | Main programming language in the repository | String |
languages_distribution | Languages and their distribution in the repository by percentage | Object |
See a snippet of the dataset for reference:
Data point | Description | Data type |
repo_name | Repository title | String |
repo_owner | Repository owner | String |
name | Repository name | String |
node_id | Repository identifier assigned by GitHub REST API | String |
See a snippet of the dataset for reference:
Data point | Description | Data type |
license | Open-source licenses the repository uses | Object |
key | GitHub URL identifying the license | String |
name | License name | String |
spdx_id | SPDX license ID | String |
url | URL redirecting to GitHub info on licensing | String |
node_id | License identifier assigned by GitHub REST API | String |
See a snippet of the dataset for reference:
Data point | Description | Data type |
owner | Repository owner | Object |
image | Developer's logo/avatar | String |
url | Developer's profile | String |
source_id | Developer's identifier on GitHub | Integer |
username | Developer's username | String |
node_id | Developer's identifier assigned by GitHub REST API | String |
site_admin | Marks if the user is the site admin | Boolean |
type | User's profile type | String |
See a snippet of the dataset for reference:
Data point | Description | Data type |
subscription | Repositories the developer is subscribed to | Array of objects |
disabled | Indicates if the repository was disabled at the time of the last scrape | Boolean |
archived | Shows if the repository is archived (read-only) | Boolean |
created_at | Timestamp when the repository was created in ISO 8601 format | String (date) |
default_branch | Default branch title | String |
description | Repository description. Note: may contain control characters | String |
fork | Marks if the repository in a record is a copy of another repository | Boolean |
fork_count | Number of repository copies | Integer |
forked_from | Original repository the copy has been made from | String |
has_downloads | Indicates if other users have downloaded the repository | Boolean |
has_issues | Marks if the repository has the issues section enabled | Boolean |
has_pages | Marks if the repository has the pages section enabled | Boolean |
has_projects | Marks if the repository has the projects section enabled | Boolean |
has_wiki | Marks if the repository has the wiki section enabled | Boolean |
website | Project website | String |
url | Repository URL | String |
source_id | Repository identifier on GitHub | Integer |
See a snippet of the dataset for reference:
Data point | Description | Data type |
open_issues_count | Number of open issues in the repository | Integer |
pushed_at | Timestamp of the most recent push to the repository in ISO 8601 format | String (date) |
size | Repository size | Integer |
stargazer_count | Number of people who have starred the repository | Integer |
updated_at | Timestamp when the repository was updated in ISO 8601 format | String (date) |
watcher_count | Number of people who are following the repository updates | Integer |
topics | Repository topics | Array of strings |
See a snippet of the dataset for reference:
Data point | Description | Data type |
language | Main programming language in the repository | String |
languages_distribution | Languages and their distribution in the repository by percentage | Object |
See a snippet of the dataset for reference:
Data point | Description | Data type |
repo_name | Repository title | String |
repo_owner | Repository owner | String |
name | Repository name | String |
node_id | Repository identifier assigned by GitHub REST API | String |
See a snippet of the dataset for reference:
Data point | Description | Data type |
license | Open-source licenses the repository uses | Object |
key | GitHub URL identifying the license | String |
name | License name | String |
spdx_id | SPDX license ID | String |
url | URL redirecting to GitHub info on licensing | String |
node_id | License identifier assigned by GitHub REST API | String |
See a snippet of the dataset for reference:
Data point | Description | Data type |
owner | Repository owner | Object |
image | Developer's logo/avatar | String |
url | Developer's profile | String |
source_id | Developer's identifier on GitHub | Integer |
username | Developer's username | String |
node_id | Developer's identifier assigned by GitHub REST API | String |
site_admin | Marks if the user is the site admin | Boolean |
type | User's profile type | String |
See a snippet of the dataset for reference:
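As a usage illustration (not a dataset sample), the `repo`, `starred`, and `subscription` arrays list the same repository fields, so one helper can serve all three. The sketch below ranks each array by `stargazer_count`. It assumes the three arrays are keys of the same parent object, called `doc` here; that nesting, the function names, and the sample values are assumptions.

```python
def top_by_stars(repositories, limit=3):
    """Return (name, stargazer_count) pairs, most-starred first."""
    ranked = sorted(
        repositories, key=lambda r: r.get("stargazer_count") or 0, reverse=True
    )
    return [(r.get("name"), r.get("stargazer_count") or 0) for r in ranked[:limit]]


def summarize_repositories(doc, limit=3):
    """Apply the same ranking to the three repository arrays of one record."""
    return {
        key: top_by_stars(doc.get(key) or [], limit)
        for key in ("repo", "starred", "subscription")
    }


# Hypothetical sample values, not taken from the dataset.
sample_doc = {
    "repo": [
        {"name": "demo-a", "stargazer_count": 5},
        {"name": "demo-b", "stargazer_count": 12},
    ],
    "starred": [{"name": "demo-c", "stargazer_count": 300}],
    "subscription": [],
}
print(summarize_repositories(sample_doc))
```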