# Data Dictionary: Docker Hub Repositories

This dictionary explains every data field available in the **Docker Hub Repositories** dataset and gives an example of each.

{% hint style="info" %}
All personal/company information mentioned within this context is entirely fictional and is solely intended for illustrative purposes.
{% endhint %}

{% tabs %}
{% tab title="Data fields per category" %}

1. [Metadata](#metadata)
2. [Repository publisher](#repository-publisher)
3. [Repository details](#repository-details)
{% endtab %}
{% endtabs %}

{% hint style="info" %}
The data fields in the example snippets have been rearranged for better grouping. To see where a specific data field stands, check the full data sample [here](https://docs.coresignal.com/additional-sources/docker-hub/docker-hub-repositories/data-sample).
{% endhint %}

***

## Metadata

### Record metadata

| Data field             | Description                                          | Data type                   |
| ---------------------- | ---------------------------------------------------- | --------------------------- |
| `_meta`                | Contains information about the record                | Object                      |
| `created_at_date`      | The date when we first scraped the record            | Array of numbers (integers) |
| `created_at_timestamp` | The date we first scraped the record (Unix time)     | Float                       |
| `updated_at_date`      | The date when we last scraped the record             | Array of numbers (integers) |
| `updated_at_timestamp` | The date when we last scraped the record (Unix time) | Float                       |
| `version_id`           | Dataset version ID                                   | String                      |
| `source`               | The record source                                    | String                      |
| `object`               | The data object/entity                               | String                      |
| `is_deleted`           | Marks if the record was deleted from Docker Hub      | Boolean                     |

**See a snippet of the dataset for reference:**

{% code title="Metadata" %}

```json
"_meta": {
	"created_at_date": [
		2023,
		5,
		26
	],
	"created_at_timestamp": 1685105728.013557,
	"updated_at_date": [
		2024,
		5,
		1
	],
	"updated_at_timestamp": 1714554186.669536,
	"version_id": "a1efb819",
	"source": "dockerhub",
	"object": "repository",
	"is_deleted": false
},
```

{% endcode %}
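
The date arrays and Unix timestamps in the metadata object can be converted into native date types. The following is a minimal Python sketch, not an official client: it assumes the arrays are ordered `[year, month, day]` and that the timestamps are seconds since the epoch in UTC, which is consistent with the sample values but not stated in the table. The function name `parse_meta_dates` is hypothetical.

```python
from datetime import date, datetime, timezone


def parse_meta_dates(meta: dict) -> dict:
    """Convert array-style dates and Unix timestamps into Python objects.

    Assumption: date arrays are [year, month, day] and timestamps are
    seconds since the Unix epoch, interpreted as UTC.
    """
    return {
        "created_date": date(*meta["created_at_date"]),
        "updated_date": date(*meta["updated_at_date"]),
        "created_at": datetime.fromtimestamp(
            meta["created_at_timestamp"], tz=timezone.utc
        ),
        "updated_at": datetime.fromtimestamp(
            meta["updated_at_timestamp"], tz=timezone.utc
        ),
    }


# Values copied from the metadata snippet above.
meta = {
    "created_at_date": [2023, 5, 26],
    "created_at_timestamp": 1685105728.013557,
    "updated_at_date": [2024, 5, 1],
    "updated_at_timestamp": 1714554186.669536,
}

parsed = parse_meta_dates(meta)
```

For the sample record, the calendar date derived from each timestamp matches the corresponding date array.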

### Repository metadata

| Data field     | Description                                                         | Data type |
| -------------- | ------------------------------------------------------------------- | --------- |
| `doc`          | Dataset starting point                                              | Object    |
| `source_id`    | Repository identifier on Docker Hub                                 | String    |
| `id`           | Record identifier in our database                                   | String    |
| `last_updated` | Timestamp when the repository was last updated in `ISO 8601` format | String    |

**See a snippet of the dataset for reference:**

{% code title="Repository metadata" %}

```json
"doc": {
	"id": "dockerhub_repository_example_repository_dev",
	"source_id": "example_repository_dev",
	"last_updated": "2022-06-09T08:32:05.75699Z",
```

{% endcode %}
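
The `last_updated` value in the sample carries a five-digit fractional second and a trailing `Z`, which some ISO 8601 parsers reject. As a hedged sketch, the hypothetical helper below parses it with `datetime.strptime`, assuming the value is always in UTC and may or may not include fractional seconds.

```python
from datetime import datetime, timezone


def parse_last_updated(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as '2022-06-09T08:32:05.75699Z'.

    Assumption: the value always ends in 'Z' (UTC). The fractional part
    is optional; %f accepts one to six digits.
    """
    for fmt in ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ"):
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"Unrecognized timestamp: {value!r}")


# Value copied from the repository metadata snippet above.
last_updated = parse_last_updated("2022-06-09T08:32:05.75699Z")
```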

## Repository publisher

| Data field  | Description                   | Data type |
| ----------- | ----------------------------- | --------- |
| `publisher` | Publisher's name              | String    |
| `name`      | Repository title              | String    |
| `hub_user`  | User tied to the repository   | String    |
| `namespace` | Associated developer          | String    |

**See a snippet of the dataset for reference:**

{% code title="Publisher" %}

```json
"publisher": "dev",
"name": "example-repository",
"hub_user": "dev",
"namespace": "dev",
```

{% endcode %}

## Repository details

| Data field         | Description                                                                                 | Data type       |
| ------------------ | ------------------------------------------------------------------------------------------- | --------------- |
| `url`              | Repository URL                                                                              | String          |
| `repository_type`  | Repository type                                                                             | String          |
| `is_automated`     | Indicates if the repository automatically updates the Docker image version                  | Boolean         |
| `status`           | Repository status                                                                           | Integer/boolean |
| `description`      | Repository description                                                                      | String          |
| `full_description` | <p>Full repository description</p><p><strong>Note:</strong> contains control characters</p> | String          |

**See a snippet of the dataset for reference:**

{% code title="Repository details" %}

```json
"url": "https://hub.docker.com/r/example_repository/dev",
"repository_type": "image",
"is_automated": false,
"status": 1,
"description": "Example repository description",
"full_description": "Full example repository description",
```

{% endcode %}
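
Because `full_description` may contain control characters, it can help to clean the value before displaying or indexing it. The sketch below is one possible approach, not a prescribed one: it removes every character in the Unicode `Cc` category except newlines and tabs. The helper name `strip_control_chars` and the choice of which characters to keep are assumptions.

```python
import unicodedata


def strip_control_chars(text: str, keep: str = "\n\t") -> str:
    """Drop Unicode control characters (category 'Cc') except those in `keep`."""
    return "".join(
        ch for ch in text if ch in keep or unicodedata.category(ch) != "Cc"
    )


# Hypothetical raw value with a NUL byte, an escape character, and a carriage return.
raw = "Full\x00 example\x1b\nrepository\r description"
cleaned = strip_control_chars(raw)
```

Keeping `\n` and `\t` preserves basic layout in long descriptions; pass `keep=""` to remove them as well.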

### Statistics

| Data field           | Description                                 | Data type |
| -------------------- | ------------------------------------------- | --------- |
| `star_count`         | Number of stars the repository has received | Integer   |
| `pull_count`         | Number of repository downloads              | Integer   |
| `collaborator_count` | Number of collaborators for the repository  | Integer   |

**See a snippet of the dataset for reference:**

{% code title="Statistics" %}

```json
"star_count": 0,
"pull_count": 2,
"collaborator_count": 0,
```

{% endcode %}

### Permissions

| Data field    | Description                                                      | Data type |
| ------------- | ---------------------------------------------------------------- | --------- |
| `permissions` | Permissions in the repository                                    | Object    |
| `read`        | Indicates if the `read` permission is enabled in the repository  | Boolean   |
| `write`       | Indicates if the `write` permission is enabled in the repository | Boolean   |
| `admin`       | Indicates if the `admin` permission is enabled in the repository | Boolean   |

**See a snippet of the dataset for reference:**

{% code title="Permissions" %}

```json
"permissions": {
	"read": true,
	"write": false,
	"admin": false
},
```
```

{% endcode %}
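
To summarize the three flags in the `permissions` object, they can be reduced to a list of enabled permission names. This is a small illustrative sketch; the helper `enabled_permissions` is hypothetical and treats a missing key as disabled.

```python
def enabled_permissions(permissions: dict) -> list:
    """Return the names of permissions whose flag is true, in a fixed order.

    Assumption: a missing or non-boolean value counts as disabled.
    """
    return [
        name
        for name in ("read", "write", "admin")
        if permissions.get(name) is True
    ]


# Values copied from the permissions snippet above.
permissions = {"read": True, "write": False, "admin": False}
enabled = enabled_permissions(permissions)
```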
