Data Dictionary
Contains explanations and examples of all data fields available in the GitHub Users dataset.
All personal/company information mentioned within this context is entirely fictional and is solely intended for illustrative purposes.
The data points in the example snippets have been rearranged for better grouping. To see where a specific data point stands, check the full data sample here.
Data point | Description | Data type |
---|---|---|
meta | Contains information about the record | Object |
created_at_date | The date when we first scraped the record | Array of numbers (integers) |
created_at_timestamp | The date we first scraped the record (Unix time) | Float |
updated_at_date | The date when we last scraped the record | Array of numbers (integers) |
updated_at_timestamp | The date when we last scraped the record (Unix time) | Float |
version_id | Dataset version ID | String |
source | The record source | String |
object | The data object/entity | String |
is_deleted | Marks if the user profile is no longer available on GitHub | Boolean |
See a snippet of the dataset for reference:
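As a rough illustration of how the Unix-time fields in `meta` might be consumed, the sketch below converts them to timezone-aware datetimes. The sample values are hypothetical, and it is an assumption that the timestamps are expressed in seconds and should be read as UTC.

```python
from datetime import datetime, timezone

def meta_times(meta):
    """Convert the Unix-time floats in a `meta` object to aware UTC datetimes."""
    return {
        key: datetime.fromtimestamp(meta[key], tz=timezone.utc)
        for key in ("created_at_timestamp", "updated_at_timestamp")
        if meta.get(key) is not None
    }

# Hypothetical values, for illustration only (not taken from the dataset).
sample_meta = {
    "created_at_timestamp": 1672531200.0,
    "updated_at_timestamp": 1675209600.0,
    "is_deleted": False,
}

times = meta_times(sample_meta)
print(times["created_at_timestamp"].isoformat())  # 2023-01-01T00:00:00+00:00
```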
Data point | Description | Data type |
---|---|---|
doc | Dataset starting point | Object |
source_id | User profile identifier on GitHub | String |
id | Record identifier in our database | String |
site_admin | Indicates if the user is the site admin | Boolean |
type | Record type | String |
See snippets of the dataset for reference:
Data point | Description | Data type |
---|---|---|
events_url | GitHub REST API response | String |
node_id | Identification key assigned by GitHub REST API | String |
See snippets of the dataset for reference:
Data point | Description | Data type |
---|---|---|
image | Developer's avatar/logo | String |
bio | Developer's bio. Note: may contain control characters | String |
url | Developer's GitHub profile URL | String |
location | Developer's location | String |
username | Developer's username | String |
name | Developer's name. Note: not necessarily the same as the username | String |
See a snippet of the dataset for reference:
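Because `bio` can carry control characters, it may be worth normalizing the field before display or indexing. The following is a minimal sketch, assuming that replacing control characters with spaces is acceptable for your use case; the helper name `strip_control_chars` and the sample bio are hypothetical.

```python
import unicodedata

def strip_control_chars(text):
    """Replace control characters (Unicode category Cc) with spaces, then collapse whitespace."""
    if text is None:
        return None
    cleaned = "".join(
        " " if unicodedata.category(ch) == "Cc" else ch
        for ch in text
    )
    return " ".join(cleaned.split())

# Hypothetical bio value, for illustration only.
sample_bio = "Backend developer\r\n\tOpen-source fan\x00"
print(strip_control_chars(sample_bio))  # Backend developer Open-source fan
```

The same helper could be applied to the other `description` fields that are flagged as possibly containing control characters.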
Data point | Description | Data type |
---|---|---|
contact_info | Publicly accessible contact information | Object |
blog | Developer's blog | String |
Developer's Twitter handle | String |
See a snippet of the dataset for reference:
Data point | Description | Data type |
---|---|---|
company | Company affiliated with the developer | String |
hireable | Marks if the developer is open for hire | Boolean/null |
See snippets of the dataset for reference:
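Since `hireable` is typed as Boolean/null, a plain truthiness check would merge `false` and `null` into one case. The sketch below keeps the three states apart; reading `null` as "not specified" is an assumption, and the label strings are hypothetical.

```python
def hireable_label(value):
    """Map the tri-state `hireable` field to a readable label."""
    if value is True:
        return "open to hire"
    if value is False:
        return "not open to hire"
    # None (JSON null): treated here as "not specified", which is an assumption.
    return "not specified"

for sample in (True, False, None):
    print(sample, "->", hireable_label(sample))
```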
Data point | Description | Data type |
---|---|---|
organization | Organizations the developer is connected to | Array of objects |
description | Organization description. Note: may contain control characters | String |
source_id | Organization identification key on GitHub | String |
username | Organization name | String |
node_id | Organization identifier assigned by GitHub REST API | String |
url | Information on the organization returned by the GitHub REST API | String |
See a snippet of the dataset for reference:
Data point | Description | Data type |
---|---|---|
follower_count | Developer's follower count | Integer |
following_count | Number of people the developer follows | Integer |
See a snippet of the dataset for reference:
Data point | Description | Data type |
---|---|---|
followed_by | Developer's followers | Array of objects |
username | Follower's username | String |
source_id | User identifier on GitHub | Integer |
url | Follower's GitHub profile URL | String |
See a snippet of the dataset for reference:
Data point | Description | Data type |
---|---|---|
is_following | Followed users | Array of objects |
username | Followee's username | String |
source_id | User identifier on GitHub | Integer |
url | Followee's GitHub profile | String |
See a snippet of the dataset for reference:
Data point | Description | Data type |
---|---|---|
public_gist_count | Number of public gists by the developer | Integer |
public_repo_count | Number of the developer's public repositories | Integer |
See a snippet of the dataset for reference:
Data point | Description | Data type |
---|---|---|
repo | Developer's public repositories | Array of objects |
disabled | Indicates if the repository was disabled at the time of the last scrape | Boolean |
archived | Indicates if the repository is archived (read-only) | Boolean |
created_at | Timestamp when the repository was created, in ISO 8601 format | String (date) |
default_branch | Default branch title | String |
description | Repository description. Note: may contain control characters | String |
fork | Marks if the repository in a record is a copy of another repository | Boolean |
fork_count | Number of repository copies | Integer |
forked_from | Link to the original repository | String |
has_downloads | Indicates if downloads are enabled for the repository | Boolean |
has_issues | Marks if the repository has the issues section enabled | Boolean |
has_pages | Marks if the repository has the pages section enabled | Boolean |
has_projects | Marks if the repository has the projects section enabled | Boolean |
has_wiki | Marks if the repository has the wiki section enabled | Boolean |
website | Project website | String |
url | Repository URL | String |
source_id | Repository identifier on GitHub | Integer |
See a snippet of the dataset for reference:
Data point | Description | Data type |
---|---|---|
open_issues_count | Number of open issues in the repository | Integer |
pushed_at | Timestamp of the most recent push to the repository, in ISO 8601 format | String (date) |
size | Repository size | Integer |
stargazer_count | Number of people who have starred the repository | Integer |
updated_at | Timestamp when the repository was last updated, in ISO 8601 format | String (date) |
watcher_count | Number of people who are following the repository updates | Integer |
topics | Repository topics | Array of strings |
See a snippet of the dataset for reference:
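The `created_at`, `pushed_at`, and `updated_at` fields are ISO 8601 strings, so they need parsing before any date arithmetic. The sketch below assumes the values end in a `Z` (UTC) suffix, as GitHub timestamps usually do; the sample values and the helper name `parse_iso8601` are hypothetical.

```python
from datetime import datetime

def parse_iso8601(value):
    """Parse an ISO 8601 string into an aware datetime, or return None for empty input."""
    if not value:
        return None
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)

# Hypothetical repository fields, for illustration only.
sample_repo = {
    "created_at": "2020-05-01T10:00:00Z",
    "pushed_at": "2023-03-15T08:30:00Z",
    "updated_at": "2023-03-16T09:00:00Z",
}

created = parse_iso8601(sample_repo["created_at"])
pushed = parse_iso8601(sample_repo["pushed_at"])
print("Days between creation and last push:", (pushed - created).days)
```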
Data point | Description | Data type |
---|---|---|
language | Main programming language in the repository | String |
languages_distribution | The distribution of languages in the repository (percentage) | Object |
See a snippet of the dataset for reference:
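To show how `languages_distribution` might be read, the sketch below picks the language with the largest share. It assumes the object maps language names to numeric percentages; the exact shape is not confirmed here, and the sample values are hypothetical.

```python
def dominant_language(distribution):
    """Return the language with the highest percentage, or None if the object is empty."""
    if not distribution:
        return None
    return max(distribution.items(), key=lambda item: item[1])[0]

# Hypothetical distribution, for illustration only.
sample_distribution = {"Python": 72.5, "Shell": 20.0, "Dockerfile": 7.5}
print(dominant_language(sample_distribution))  # Python
```

In many records the result would presumably match the `language` field, but that is not guaranteed by anything stated in this dictionary.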
Data point | Description | Data type |
---|---|---|
repo_name | Repository title | String |
repo_owner | Repository owner | String |
name | Repository name | String |
node_id | Repository identifier assigned by GitHub REST API | String |
See a snippet of the dataset for reference:
Data point | Description | Data type |
---|---|---|
license | Open-source licenses the repository uses | Object |
key | GitHub URL identifying license | String |
name | License name | String |
spdx_id | SPDX license ID | String |
url | URL redirecting to GitHub info on licensing | String |
node_id | License identifier assigned by GitHub REST API | String |
See a snippet of the dataset for reference:
Data point | Description | Data type |
---|---|---|
owner | Repository owner | Object |
image | Developer's logo/avatar | String |
url | Developer's profile | String |
source_id | Developer's identifier on GitHub | Integer |
username | Developer's username | String |
node_id | Developer's identifier assigned by GitHub REST API | String |
site_admin | Marks if the user is the site admin | Boolean |
type | User's profile type | String |
See a snippet of the dataset for reference:
Data point | Description | Data type |
---|---|---|
starred | Repositories the developer starred | Array of objects |
disabled | Indicates if the repository was disabled at the time of the last scrape | Boolean |
archived | Indicates if the repository is archived (read-only) | Boolean |
created_at | Timestamp when the repository was created, in ISO 8601 format | String (date) |
default_branch | Default branch title | String |
description | Repository description. Note: may contain control characters | String |
fork | Marks if the repository in a record is a copy of another repository | Boolean |
fork_count | Number of repository copies | Integer |
forked_from | Link to the original repository | String |
has_downloads | Indicates if downloads are enabled for the repository | Boolean |
has_issues | Marks if the repository has the issues section enabled | Boolean |
has_pages | Marks if the repository has the pages section enabled | Boolean |
has_projects | Marks if the repository has the projects section enabled | Boolean |
has_wiki | Marks if the repository has the wiki section enabled | Boolean |
website | Project website | String |
url | Repository URL | String |
source_id | Repository identifier on GitHub | Integer |
See a snippet of the dataset for reference:
Data point | Description | Data type |
---|---|---|
open_issues_count | Number of open issues in the repository | Integer |
pushed_at | Timestamp of the most recent push to the repository, in ISO 8601 format | String (date) |
size | Repository size | Integer |
stargazer_count | Number of people who have starred the repository | Integer |
updated_at | Timestamp when the repository was last updated, in ISO 8601 format | String (date) |
watcher_count | Number of people who are following the repository updates | Integer |
topics | Repository topics | Array of strings |
See a snippet of the dataset for reference:
Data point | Description | Data type |
---|---|---|
language | Main programming language in the repository | String |
languages_distribution | Languages and their distribution in the repository by percentage | Object |
See a snippet of the dataset for reference:
Data point | Description | Data type |
---|---|---|
repo_name | Repository title | String |
repo_owner | Repository owner | String |
name | Repository name | String |
node_id | Repository identifier assigned by GitHub REST API | String |
See a snippet of the dataset for reference:
Data point | Description | Data type |
---|---|---|
license | Open-source licenses the repository uses | Object |
key | GitHub URL identifying license | String |
name | License name | String |
spdx_id | SPDX license ID | String |
url | URL redirecting to GitHub info on licensing | String |
node_id | License identifier assigned by GitHub REST API | String |
See a snippet of the dataset for reference:
Data point | Description | Data type |
---|---|---|
owner | Repository owner | Object |
image | Developer's logo/avatar | String |
url | Developer's profile | String |
source_id | Developer's identifier on GitHub | Integer |
username | Developer's username | String |
node_id | Developer's identifier assigned by GitHub REST API | String |
site_admin | Marks if the user is the site admin | Boolean |
type | User's profile type | String |
See a snippet of the dataset for reference:
Data point | Description | Data type |
---|---|---|
subscription | Repositories the developer is subscribed to | Array of objects |
disabled | Indicates if the repository was disabled at the time of the last scrape | Boolean |
archived | Indicates if the repository is archived (read-only) | Boolean |
created_at | Timestamp when the repository was created, in ISO 8601 format | String (date) |
default_branch | Default branch title | String |
description | Repository description. Note: may contain control characters | String |
fork | Marks if the repository in a record is a copy of another repository | Boolean |
fork_count | Number of repository copies | Integer |
forked_from | Link to the original repository | String |
has_downloads | Indicates if downloads are enabled for the repository | Boolean |
has_issues | Marks if the repository has the issues section enabled | Boolean |
has_pages | Marks if the repository has the pages section enabled | Boolean |
has_projects | Marks if the repository has the projects section enabled | Boolean |
has_wiki | Marks if the repository has the wiki section enabled | Boolean |
website | Project website | String |
url | Repository URL | String |
source_id | Repository identifier on GitHub | Integer |
See a snippet of the dataset for reference:
Data point | Description | Data type |
---|---|---|
open_issues_count | Number of open issues in the repository | Integer |
pushed_at | Timestamp of the most recent push to the repository, in ISO 8601 format | String (date) |
size | Repository size | Integer |
stargazer_count | Number of people who have starred the repository | Integer |
updated_at | Timestamp when the repository was last updated, in ISO 8601 format | String (date) |
watcher_count | Number of people who are following the repository updates | Integer |
topics | Repository topics | Array of strings |
See a snippet of the dataset for reference:
Data point | Description | Data type |
---|---|---|
language | Main programming language in the repository | String |
languages_distribution | Languages and their distribution in the repository by percentage | Object |
See a snippet of the dataset for reference:
Data point | Description | Data type |
---|---|---|
repo_name | Repository title | String |
repo_owner | Repository owner | String |
name | Repository name | String |
node_id | Repository identifier assigned by GitHub REST API | String |
See a snippet of the dataset for reference:
Data point | Description | Data type |
---|---|---|
license | Open-source licenses the repository uses | Object |
key | GitHub URL identifying license | String |
name | License name | String |
spdx_id | SPDX license ID | String |
url | URL redirecting to GitHub info on licensing | String |
node_id | License identifier assigned by GitHub REST API | String |
See a snippet of the dataset for reference:
Data point | Description | Data type |
---|---|---|
owner | Repository owner | Object |
image | Developer's logo/avatar | String |
url | Developer's profile | String |
source_id | Developer's identifier on GitHub | Integer |
username | Developer's username | String |
node_id | Developer's identifier assigned by GitHub REST API | String |
site_admin | Marks if the user is the site admin | Boolean |
type | User's profile type | String |
See a snippet of the dataset for reference:
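The `repo`, `starred`, and `subscription` arrays list the same repository fields, so one helper can walk all three. The sketch below is a hypothetical illustration: it assumes the three arrays sit directly on the record object and may be missing or null, and the sample record is invented.

```python
REPO_LISTS = ("repo", "starred", "subscription")

def iter_repositories(record):
    """Yield (list_name, repository) pairs from the three repository arrays."""
    for list_name in REPO_LISTS:
        # Treat a missing or null array as empty (an assumption about the data).
        for repository in record.get(list_name) or []:
            yield list_name, repository

# Hypothetical record, for illustration only.
sample_record = {
    "repo": [{"repo_name": "example-tool", "stargazer_count": 12, "archived": False}],
    "starred": [{"repo_name": "example-lib", "stargazer_count": 340, "archived": True}],
    "subscription": None,
}

active = [
    (list_name, repository["repo_name"])
    for list_name, repository in iter_repositories(sample_record)
    if not repository.get("archived")
]
print(active)  # [('repo', 'example-tool')]
```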