Data Dictionary

Overview

Contains explanations and examples of all data fields available in the GitHub Users dataset.

All personal and company information in the examples is entirely fictional and serves illustrative purposes only.

The data points in the example snippets have been regrouped for readability. To see where a specific data point sits in a record, check the full data sample.



Metadata

Record metadata

| Data point | Description | Data type |
| --- | --- | --- |
| meta | Contains information about the record | Object |
| created_at_date | The date when we first scraped the record | Array of numbers (integers) |
| created_at_timestamp | The date when we first scraped the record (Unix time) | Float |
| updated_at_date | The date when we last scraped the record | Array of numbers (integers) |
| updated_at_timestamp | The date when we last scraped the record (Unix time) | Float |
| version_id | Dataset version ID | String |
| source | The record source | String |
| object | The data object/entity | String |
| is_deleted | Marks whether the user has been deleted (is no longer available on GitHub) | Boolean |

See a snippet of the dataset for reference:

Metadata
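As an illustration of how the Unix-time fields above might be consumed, here is a minimal Python sketch. It assumes the timestamps are expressed in seconds and that the fields sit together inside the `meta` object; the function name `summarize_meta` and the sample values are hypothetical and not taken from the dataset.

```python
from datetime import datetime, timezone


def summarize_meta(meta: dict) -> dict:
    """Convert the Unix-time floats of a meta object into UTC datetimes."""
    # Assumption: timestamps are seconds since the Unix epoch.
    created = datetime.fromtimestamp(meta["created_at_timestamp"], tz=timezone.utc)
    updated = datetime.fromtimestamp(meta["updated_at_timestamp"], tz=timezone.utc)
    return {
        "first_scraped": created,
        "last_scraped": updated,
        "days_tracked": (updated - created).days,
        "is_deleted": bool(meta.get("is_deleted", False)),
    }


# Hypothetical values for illustration only.
sample_meta = {
    "created_at_timestamp": 1700000000.0,
    "updated_at_timestamp": 1700086400.0,
    "is_deleted": False,
}
summary = summarize_meta(sample_meta)
print(summary["first_scraped"].isoformat(), summary["days_tracked"])
```

Working with timezone-aware datetimes avoids ambiguity when records scraped on different days are compared.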


User profile metadata

| Data point | Description | Data type |
| --- | --- | --- |
| doc | Dataset starting point | Object |
| source_id | User profile identifier on GitHub | String |
| id | Record identifier in our database | String |
| site_admin | Indicates whether the user is a site admin | Boolean |
| type | Record type | String |

See snippets of the dataset for reference:

User profile metadata


| Data point | Description | Data type |
| --- | --- | --- |
| events_url | GitHub REST API URL for the user's events | String |
| node_id | Identification key assigned by GitHub REST API | String |

See snippets of the dataset for reference:

User profile metadata


Developer's profile details

| Data point | Description | Data type |
| --- | --- | --- |
| image | Developer's avatar/logo | String |
| bio | Developer's bio. Note: contains control characters | String |
| url | Developer's GitHub profile URL | String |
| location | Developer's location | String |
| username | Developer's username | String |
| name | Developer's name. Note: not necessarily the same as the username | String |

See a snippet of the dataset for reference:

Developer profile details
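Because `bio` (and, further down, the organization and repository `description` fields) may contain control characters, it can help to sanitize these strings before display or indexing. The following is a minimal sketch, assuming that newlines and tabs should be preserved; the helper name `strip_control_chars` is hypothetical.

```python
import unicodedata


def strip_control_chars(text, keep="\n\t"):
    """Remove Unicode control characters (category Cc), except those in `keep`."""
    if text is None:
        return ""
    return "".join(
        ch for ch in text
        if ch in keep or unicodedata.category(ch) != "Cc"
    )


# Hypothetical bio containing a NUL byte and an escape character.
raw_bio = "Backend dev\x00 at Example\x1b Corp\n"
clean_bio = strip_control_chars(raw_bio)
print(repr(clean_bio))
```

Filtering by Unicode category rather than by a fixed list of bytes also catches the less common C1 control characters.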


Public contact information

| Data point | Description | Data type |
| --- | --- | --- |
| contact_info | Publicly accessible contact information | Object |
| blog | Developer's blog | String |
| twitter | Developer's Twitter handle | String |

See a snippet of the dataset for reference:

Contact information


Developer affiliation

| Data point | Description | Data type |
| --- | --- | --- |
| company | Company affiliated with the developer | String |
| hireable | Marks if the developer is open for hire | Boolean/null |

See snippets of the dataset for reference:

Developer affiliation
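Since `hireable` can be a Boolean or null, treating it as a plain truthy/falsy value would merge "not open for hire" with "not stated". The sketch below keeps the three states apart; interpreting null as "not stated" is an assumption, and the function name `hireable_label` is hypothetical.

```python
def hireable_label(value):
    """Map the three possible hireable values to a readable label."""
    if value is True:
        return "open for hire"
    if value is False:
        return "not open for hire"
    # Assumption: null (None in Python) means the developer did not state it.
    return "not stated"


labels = [hireable_label(v) for v in (True, False, None)]
print(labels)
```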


Organizations

| Data point | Description | Data type |
| --- | --- | --- |
| organization | Organizations the developer is connected to | Array of objects |
| description | Organization description. Note: may contain control characters | String |
| source_id | Organization identification key on GitHub | String |
| username | Organization name | String |
| node_id | Organization identifier assigned by GitHub REST API | String |
| url | GitHub REST API URL that returns information on the organization | String |

See a snippet of the dataset for reference:

Organization


Following

| Data point | Description | Data type |
| --- | --- | --- |
| follower_count | Developer's follower count | Integer |
| following_count | Number of people the developer follows | Integer |

See a snippet of the dataset for reference:

Following


| Data point | Description | Data type |
| --- | --- | --- |
| followed_by | Developer's followers | Array of objects |
| username | Follower's username | String |
| source_id | User identifier on GitHub | Integer |
| url | Follower's GitHub profile URL | String |

See a snippet of the dataset for reference:

Followers


| Data point | Description | Data type |
| --- | --- | --- |
| is_following | Users the developer follows | Array of objects |
| username | Followee's username | String |
| source_id | User identifier on GitHub | Integer |
| url | Followee's GitHub profile URL | String |

See a snippet of the dataset for reference:

Following
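The `followed_by` and `is_following` arrays share the same object shape, so they can be joined on `source_id`. As a hedged example, the sketch below finds mutual follows; it assumes both arrays sit side by side in one parent object and may be missing or null. The function name `mutual_follows` and the sample users are hypothetical.

```python
def mutual_follows(doc: dict) -> list:
    """Return usernames that appear in both followed_by and is_following."""
    followers = {u["source_id"]: u["username"] for u in doc.get("followed_by") or []}
    following_ids = {u["source_id"] for u in doc.get("is_following") or []}
    return sorted(name for sid, name in followers.items() if sid in following_ids)


# Hypothetical, fictional users for illustration only.
sample_doc = {
    "followed_by": [
        {"username": "alice-example", "source_id": 1},
        {"username": "bob-example", "source_id": 2},
    ],
    "is_following": [
        {"username": "bob-example", "source_id": 2},
        {"username": "carol-example", "source_id": 3},
    ],
}
mutuals = mutual_follows(sample_doc)
print(mutuals)
```

Matching on `source_id` rather than `username` is a design choice: a numeric identifier is less likely to change than a display handle.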


Gists and repos

| Data point | Description | Data type |
| --- | --- | --- |
| public_gist_count | Number of public gists by the developer | Integer |
| public_repo_count | Number of the developer's public repositories | Integer |

See a snippet of the dataset for reference:

Gists and repos


Developer's repositories

| Data point | Description | Data type |
| --- | --- | --- |
| repo | Developer's public repositories | Array of objects |
| disabled | Indicates if the repository was disabled at the time of the last scrape | Boolean |
| archived | Indicates if the repository is archived (read-only) | Boolean |
| created_at | Timestamp when the repository was created, in ISO 8601 format | String (date) |
| default_branch | Default branch name | String |
| description | Repository description. Note: may contain control characters | String |
| fork | Marks if the repository in a record is a copy (fork) of another repository | Boolean |
| fork_count | Number of repository copies (forks) | Integer |
| forked_from | Link to the original repository | String |
| has_downloads | Marks if the repository has downloads enabled | Boolean |
| has_issues | Marks if the repository has the issues section enabled | Boolean |
| has_pages | Marks if the repository has the pages section enabled | Boolean |
| has_projects | Marks if the repository has the projects section enabled | Boolean |
| has_wiki | Marks if the repository has the wiki section enabled | Boolean |
| website | Project website | String |
| url | Repository URL | String |
| source_id | Repository identifier on GitHub | Integer |

See a snippet of the dataset for reference:

Developer's repositories
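To show how the Boolean flags and the ISO 8601 `created_at` string can be combined, here is a minimal sketch that keeps only original, active repositories and orders them by creation date. It assumes the timestamps may end in `Z` (UTC); the helper names `parse_iso` and `original_active_repos`, as well as the sample repositories, are hypothetical.

```python
from datetime import datetime


def parse_iso(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is rewritten for older Pythons."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def original_active_repos(repos: list) -> list:
    """Drop forks, archived and disabled repositories; sort oldest first."""
    kept = [
        r for r in repos
        if not r.get("fork") and not r.get("archived") and not r.get("disabled")
    ]
    return sorted(kept, key=lambda r: parse_iso(r["created_at"]))


# Hypothetical repositories for illustration only.
sample_repos = [
    {"name": "b", "fork": False, "archived": False, "disabled": False,
     "created_at": "2021-03-01T10:00:00Z"},
    {"name": "a", "fork": True, "archived": False, "disabled": False,
     "created_at": "2019-06-10T12:00:00Z"},
    {"name": "c", "fork": False, "archived": True, "disabled": False,
     "created_at": "2018-01-01T00:00:00Z"},
    {"name": "d", "fork": False, "archived": False, "disabled": False,
     "created_at": "2020-01-15T08:30:00Z"},
]
names = [r["name"] for r in original_active_repos(sample_repos)]
print(names)
```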


| Data point | Description | Data type |
| --- | --- | --- |
| open_issues_count | Number of open issues in the repository | Integer |
| pushed_at | Timestamp of the most recent push to the repository, in ISO 8601 format | String (date) |
| size | Repository size | Integer |
| stargazer_count | Number of people who have starred the repository | Integer |
| updated_at | Timestamp when the repository was updated, in ISO 8601 format | String (date) |
| watcher_count | Number of people who are following the repository updates | Integer |
| topics | Repository topics | Array of strings |

See a snippet of the dataset for reference:

Developer's repositories


Languages in the repo

| Data point | Description | Data type |
| --- | --- | --- |
| language | Main programming language in the repository | String |
| languages_distribution | The distribution of languages in the repository (percentage) | Object |

See a snippet of the dataset for reference:

Programming languages
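The `languages_distribution` object can be used to cross-check the `language` field. The sketch below assumes the object maps each language name to a numeric percentage, which is an assumption about its shape; the function name `dominant_language` and the sample values are hypothetical.

```python
def dominant_language(repo: dict):
    """Return the language with the largest share, falling back to `language`."""
    # Assumption: languages_distribution looks like {"Python": 72.5, "Shell": 27.5}.
    distribution = repo.get("languages_distribution") or {}
    if not distribution:
        return repo.get("language")
    return max(distribution, key=distribution.get)


# Hypothetical repository objects for illustration only.
with_distribution = {"language": "Python",
                     "languages_distribution": {"Python": 72.5, "Shell": 27.5}}
without_distribution = {"language": "Go", "languages_distribution": None}
print(dominant_language(with_distribution), dominant_language(without_distribution))
```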


Repository details

| Data point | Description | Data type |
| --- | --- | --- |
| repo_name | Repository title | String |
| repo_owner | Repository owner | String |
| name | Repository name | String |
| node_id | Repository identifier assigned by GitHub REST API | String |

See a snippet of the dataset for reference:

Repository owner


Used licenses

| Data point | Description | Data type |
| --- | --- | --- |
| license | Open-source license the repository uses | Object |
| key | GitHub URL key identifying the license | String |
| name | License name | String |
| spdx_id | SPDX license ID | String |
| url | URL of GitHub's information on the license | String |
| node_id | License identifier assigned by GitHub REST API | String |

See a snippet of the dataset for reference:

License
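One common use of the `license` object is tallying licenses across a developer's repositories by `spdx_id`. The following minimal sketch assumes `license` is nested inside each repository object and may be missing or null for unlicensed repositories; the function name `license_counts` and the `"unknown"` bucket are hypothetical choices.

```python
from collections import Counter


def license_counts(repos: list) -> Counter:
    """Count repositories per SPDX license ID; missing licenses go to 'unknown'."""
    counts = Counter()
    for repo in repos:
        license_obj = repo.get("license") or {}
        counts[license_obj.get("spdx_id") or "unknown"] += 1
    return counts


# Hypothetical repositories for illustration only.
licensed_repos = [
    {"license": {"spdx_id": "MIT"}},
    {"license": None},
    {"license": {"spdx_id": "MIT"}},
    {},
]
tally = license_counts(licensed_repos)
print(dict(tally))
```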


Repository owner

| Data point | Description | Data type |
| --- | --- | --- |
| owner | Repository owner | Object |
| image | Developer's logo/avatar | String |
| url | Developer's profile | String |
| source_id | Developer's identifier on GitHub | Integer |
| username | Developer's username | String |
| node_id | Developer's identifier assigned by GitHub REST API | String |
| site_admin | Marks if the user is a site admin | Boolean |
| type | User's profile type | String |

See a snippet of the dataset for reference:

Repository owner


Starred repositories

| Data point | Description | Data type |
| --- | --- | --- |
| starred | Repositories the developer starred | Array of objects |
| disabled | Indicates if the repository was disabled at the time of the last scrape | Boolean |
| archived | Indicates if the repository is archived (read-only) | Boolean |
| created_at | Timestamp when the repository was created, in ISO 8601 format | String (date) |
| default_branch | Default branch name | String |
| description | Repository description. Note: may contain control characters | String |
| fork | Marks if the repository in a record is a copy (fork) of another repository | Boolean |
| fork_count | Number of repository copies (forks) | Integer |
| forked_from | Link to the original repository | String |
| has_downloads | Marks if the repository has downloads enabled | Boolean |
| has_issues | Marks if the repository has the issues section enabled | Boolean |
| has_pages | Marks if the repository has the pages section enabled | Boolean |
| has_projects | Marks if the repository has the projects section enabled | Boolean |
| has_wiki | Marks if the repository has the wiki section enabled | Boolean |
| website | Project website | String |
| url | Repository URL | String |
| source_id | Repository identifier on GitHub | Integer |

See a snippet of the dataset for reference:

Starred repositories


| Data point | Description | Data type |
| --- | --- | --- |
| open_issues_count | Number of open issues in the repository | Integer |
| pushed_at | Timestamp of the most recent push to the repository, in ISO 8601 format | String (date) |
| size | Repository size | Integer |
| stargazer_count | Number of people who have starred the repository | Integer |
| updated_at | Timestamp when the repository was updated, in ISO 8601 format | String (date) |
| watcher_count | Number of people who are following the repository updates | Integer |
| topics | Repository topics | Array of strings |

See a snippet of the dataset for reference:

Starred repositories
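The `topics` arrays of starred repositories can hint at a developer's interests. As a rough sketch, the code below counts topics across the `starred` array; it assumes `topics` may be missing or null for some repositories, and the function name `top_topics` and the sample data are hypothetical.

```python
from collections import Counter


def top_topics(starred: list, n: int = 3) -> list:
    """Return the n most frequent topics across starred repositories."""
    counts = Counter(
        topic
        for repo in starred
        for topic in (repo.get("topics") or [])
    )
    return counts.most_common(n)


# Hypothetical starred repositories for illustration only.
sample_starred = [
    {"topics": ["python", "cli"]},
    {"topics": ["python"]},
    {"topics": None},
    {"topics": ["cli", "python", "docs"]},
]
ranking = top_topics(sample_starred, n=2)
print(ranking)
```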


Languages in the repo

| Data point | Description | Data type |
| --- | --- | --- |
| language | Main programming language in the repository | String |
| languages_distribution | Languages and their distribution in the repository by percentage | Object |

See a snippet of the dataset for reference:

Programming languages


Repository details

| Data point | Description | Data type |
| --- | --- | --- |
| repo_name | Repository title | String |
| repo_owner | Repository owner | String |
| name | Repository name | String |
| node_id | Repository identifier assigned by GitHub REST API | String |

See a snippet of the dataset for reference:

Repository owner


Used licenses

| Data point | Description | Data type |
| --- | --- | --- |
| license | Open-source license the repository uses | Object |
| key | GitHub URL key identifying the license | String |
| name | License name | String |
| spdx_id | SPDX license ID | String |
| url | URL of GitHub's information on the license | String |
| node_id | License identifier assigned by GitHub REST API | String |

See a snippet of the dataset for reference:

License


Repository owner

| Data point | Description | Data type |
| --- | --- | --- |
| owner | Repository owner | Object |
| image | Developer's logo/avatar | String |
| url | Developer's profile | String |
| source_id | Developer's identifier on GitHub | Integer |
| username | Developer's username | String |
| node_id | Developer's identifier assigned by GitHub REST API | String |
| site_admin | Marks if the user is a site admin | Boolean |
| type | User's profile type | String |

See a snippet of the dataset for reference:

Developer information


Developer's subscriptions

| Data point | Description | Data type |
| --- | --- | --- |
| subscription | Repositories the developer is subscribed to | Array of objects |
| disabled | Indicates if the repository was disabled at the time of the last scrape | Boolean |
| archived | Indicates if the repository is archived (read-only) | Boolean |
| created_at | Timestamp when the repository was created, in ISO 8601 format | String (date) |
| default_branch | Default branch name | String |
| description | Repository description. Note: may contain control characters | String |
| fork | Marks if the repository in a record is a copy (fork) of another repository | Boolean |
| fork_count | Number of repository copies (forks) | Integer |
| forked_from | Link to the original repository | String |
| has_downloads | Marks if the repository has downloads enabled | Boolean |
| has_issues | Marks if the repository has the issues section enabled | Boolean |
| has_pages | Marks if the repository has the pages section enabled | Boolean |
| has_projects | Marks if the repository has the projects section enabled | Boolean |
| has_wiki | Marks if the repository has the wiki section enabled | Boolean |
| website | Project website | String |
| url | Repository URL | String |
| source_id | Repository identifier on GitHub | Integer |

See a snippet of the dataset for reference:

Developer's subscriptions


| Data point | Description | Data type |
| --- | --- | --- |
| open_issues_count | Number of open issues in the repository | Integer |
| pushed_at | Timestamp of the most recent push to the repository, in ISO 8601 format | String (date) |
| size | Repository size | Integer |
| stargazer_count | Number of people who have starred the repository | Integer |
| updated_at | Timestamp when the repository was updated, in ISO 8601 format | String (date) |
| watcher_count | Number of people who are following the repository updates | Integer |
| topics | Repository topics | Array of strings |

See a snippet of the dataset for reference:

Developer's repositories


Languages in the repo

| Data point | Description | Data type |
| --- | --- | --- |
| language | Main programming language in the repository | String |
| languages_distribution | Languages and their distribution in the repository by percentage | Object |

See a snippet of the dataset for reference:

language


Repository details

| Data point | Description | Data type |
| --- | --- | --- |
| repo_name | Repository title | String |
| repo_owner | Repository owner | String |
| name | Repository name | String |
| node_id | Repository identifier assigned by GitHub REST API | String |

See a snippet of the dataset for reference:

Repository owner


Used licenses

| Data point | Description | Data type |
| --- | --- | --- |
| license | Open-source license the repository uses | Object |
| key | GitHub URL key identifying the license | String |
| name | License name | String |
| spdx_id | SPDX license ID | String |
| url | URL of GitHub's information on the license | String |
| node_id | License identifier assigned by GitHub REST API | String |

See a snippet of the dataset for reference:

license


Repository owner

| Data point | Description | Data type |
| --- | --- | --- |
| owner | Repository owner | Object |
| image | Developer's logo/avatar | String |
| url | Developer's profile | String |
| source_id | Developer's identifier on GitHub | Integer |
| username | Developer's username | String |
| node_id | Developer's identifier assigned by GitHub REST API | String |
| site_admin | Marks if the user is a site admin | Boolean |
| type | User's profile type | String |

See a snippet of the dataset for reference:

Owner
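Since each subscribed repository carries an `owner` object, subscriptions can be grouped by who owns them. The sketch below is one possible approach, assuming `owner` is nested inside each item of the `subscription` array and may be null; the function name `subscriptions_by_owner`, the `"unknown"` bucket, and the sample data are hypothetical.

```python
from collections import defaultdict


def subscriptions_by_owner(subscriptions: list) -> dict:
    """Group subscribed repository names by the owner's username."""
    grouped = defaultdict(list)
    for repo in subscriptions:
        owner = (repo.get("owner") or {}).get("username") or "unknown"
        grouped[owner].append(repo.get("name"))
    return dict(grouped)


# Hypothetical subscriptions with fictional owners, for illustration only.
sample_subscriptions = [
    {"name": "tool", "owner": {"username": "alice-example"}},
    {"name": "lib", "owner": {"username": "bob-example"}},
    {"name": "site", "owner": {"username": "alice-example"}},
    {"name": "orphan", "owner": None},
]
by_owner = subscriptions_by_owner(sample_subscriptions)
print(by_owner)
```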




Updated 15 Jul 2024