# Delivery Formats

We offer several ways to access our data so that your team can get the information it needs in the right format and at the right time.

## Delivery options

We generally offer two delivery options: **flat-file datasets** and **API** access. Choose the option that best fits your project's scope and size.

| Delivery option                                                                         | Sources                      | Description                                                                     |
| --------------------------------------------------------------------------------------- | ---------------------------- | ------------------------------------------------------------------------------- |
| Flat files: download the dataset using a web link                                       | All sources                  | We provide the link and login credentials so you can retrieve the data          |
| Flat files: data files uploaded to your **cloud storage** (S3, Azure, Google Cloud, etc.) | All sources                | You provide your storage credentials, and we send the data to you               |
| APIs: get data using available APIs                                                     | Company, Employee, Jobs data | Access data by sending API requests                                             |

## Get a flat-file dataset

We offer **nine** flat-file datasets for businesses. Depending on the dataset, files are available in **JSON, JSONL, CSV, or Parquet** format:

| Dataset               | Delivery format     |
| --------------------- | ------------------- |
| Base Company          | JSONL; JSON         |
| Base Employee         | JSONL; Parquet; CSV |
| Employee Posts        | JSONL; Parquet      |
| Base Jobs             | JSONL; Parquet; CSV |
| Clean Company         | JSONL; Parquet; CSV |
| Clean Employee        | JSONL; Parquet; CSV |
| Multi-source Company  | JSONL; Parquet      |
| Multi-source Employee | JSONL; Parquet      |
| Multi-source Jobs     | JSONL; Parquet      |

{% hint style="info" %}
We are constantly improving our delivery capabilities. If you do not find your preferred method or format, contact us.
{% endhint %}
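
JSONL files store one JSON record per line, so they can be read as a stream without loading the whole file into memory. The following is a minimal sketch of that idea using only the Python standard library. The helper name `iter_jsonl` and the `id` and `name` fields in the sample are hypothetical and are not taken from the actual dataset schemas.

```python
import io
import json
from typing import Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSONL stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


# Illustrative in-memory sample; the field names are assumptions.
sample = io.StringIO('{"id": 1, "name": "Acme"}\n{"id": 2, "name": "Globex"}\n')
records = list(iter_jsonl(sample))
```

For a delivered file, the same function can be given an open file handle, for example `with open(path, encoding="utf-8") as f:` followed by a loop over `iter_jsonl(f)`.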

## Access the data via API

Our API returns freshly collected data in JSON format, which you can analyze using Python, Ruby, PHP, or any other scripting language you prefer.

## Recommended tools

{% hint style="info" %}
We can only offer general recommendations because the right tools depend on your tech stack and preferences.
{% endhint %}

You can ingest large datasets efficiently by combining tools and technologies built for big data workloads.

| Tool category                             | Tool example                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| ----------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Database systems                          | <p><a href="https://www.mongodb.com/docs/manual/">MongoDB</a></p><p><a href="https://docs.couchbase.com/home/index.html">Couchbase</a></p><p><a href="https://www.postgresql.org/docs/">PostgreSQL</a></p><p><a href="https://cassandra.apache.org/_/index.html">Apache Cassandra</a></p><p><a href="https://docs.aws.amazon.com/redshift/?icmpid=docs_homepage_analytics">Amazon Redshift</a></p><p><a href="https://docs.aws.amazon.com/s3/?icmpid=docs_homepage_featuredsvcs">Amazon S3</a> + <a href="https://docs.aws.amazon.com/athena/?icmpid=docs_homepage_analytics">Athena</a></p><p><a href="https://www.elastic.co/elasticsearch">Elasticsearch</a></p> |
| Data processing frameworks                | <p><a href="https://spark.apache.org/docs/latest/">Apache Spark</a></p><p><a href="https://hadoop.apache.org/docs/current/">Apache Hadoop</a></p>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
| Data ingestion tools                      | <p><a href="https://nifi.apache.org/documentation/">Apache NiFi</a></p><p><a href="https://cloud.google.com/bigquery/#how-it-works">Google BigQuery</a></p> |
| Data ETL (Extract, Transform, Load) tools | <p><a href="https://docs.aws.amazon.com/prescriptive-guidance/latest/serverless-etl-aws-glue/aws-glue-etl.html">AWS Glue</a></p><p><a href="https://www.talend.com/knowledge-center/">Talend</a></p>                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
| Data transformation                       | <p><a href="https://docs.getdbt.com/">dbt</a></p><p><a href="https://pandas.pydata.org/docs/">Pandas</a></p>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
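
Whichever tools you choose, a common ingestion pattern is to load records in fixed-size batches rather than one at a time or all at once. The sketch below illustrates that pattern with Python's built-in `sqlite3` module, used here only as a stand-in for the database systems listed above. The table name `records`, the `id` key, and the helper names are assumptions made for illustration.

```python
import json
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List


def batched(items: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group an iterable into lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def load_records(conn: sqlite3.Connection, records: Iterable[dict],
                 batch_size: int = 1000) -> int:
    """Insert records batch by batch; return the number of records processed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (id INTEGER PRIMARY KEY, payload TEXT)"
    )
    total = 0
    for batch in batched(records, batch_size):
        # Assumes every record has a unique integer "id" field (hypothetical).
        rows = [(rec["id"], json.dumps(rec)) for rec in batch]
        with conn:  # one transaction per batch, committed on success
            conn.executemany(
                "INSERT OR REPLACE INTO records (id, payload) VALUES (?, ?)", rows
            )
        total += len(rows)
    return total
```

Because `load_records` accepts any iterable, it can consume a generator that streams records from a file, so memory use stays bounded by the batch size.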
