Data Dictionary: Clean Company API

Data dictionary for data retrieved using Clean Company API endpoints.

This data dictionary lists all available data points, explains their values, and provides data samples from the Clean Company API.

The samples are strictly for illustrative purposes: they show what the data looks like and how it is formatted.

Metadata

| Data point | Processing | Description | Data type |
| --- | --- | --- | --- |
| last_updated | Cleaned | Record update date | String (date) |
| professional_network_source_id | Raw | Record identification key assigned by Professional Network | String |
| created_at | Cleaned | Time and date when we created the company record | String (date) |

Metadata
"created_at": "2019-04-07",
"last_updated": "2023-12-06",
"professional_network_source_id": "60191",
Cleaning actions
Data point
Cleaning action

last_updated

Value is converted to the yyyy-mm-dd format.

created_at

Value is converted to the yyyy-mm-dd format.
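
Because last_updated and created_at are delivered as yyyy-mm-dd strings, they can be parsed with the Python standard library alone. The following is a minimal sketch, not an official client: the `record` dictionary simply mirrors the sample above, and `parse_date` is a hypothetical helper name.

```python
from datetime import date

# Metadata fields copied from the illustrative sample above.
record = {
    "created_at": "2019-04-07",
    "last_updated": "2023-12-06",
    "professional_network_source_id": "60191",
}

def parse_date(value):
    """Parse a yyyy-mm-dd string into a date; pass missing values through as None."""
    return date.fromisoformat(value) if value else None

created = parse_date(record["created_at"])
updated = parse_date(record["last_updated"])

# Number of days between record creation and the most recent update.
days_between = (updated - created).days
```

Parsing into `date` objects rather than comparing strings makes arithmetic such as `days_between` straightforward.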


Identifiers

| Data point | Processing | Description | Data type |
| --- | --- | --- | --- |
| id | Raw | Company ID in our database | Number (integer) |
| name | Cleaned | Company name | String |
| logo | Cleaned | Base64-encoded JPEG image of the company's logo | String |
| ticker | Cleaned | Company's stock ticker | String |
| exchange | Cleaned | Company's stock exchange | String |

Identifiers
"id": 8039488,
"name": "Example Company",
"logo": "/9j/4AAQSkZJRgABAQAAAQABAD/2wBDAAMCAgMCAgMDAwMEAwMEBQgFBQQEBQoHBwYIDAoMDAsKCwsNDhIQDQ4RDgsLEBYQERMUFRUVDA8XGBYUGBIUFRT/2wBDAQMEBAUEBQkFBQkUDQsNFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBT/wAARCAAjACMDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD9U6K+K7P9rfx1cfFG7smj00aNHrBsltBbHd5Qm8v/AFm7O7HOfXtivc9N/aHsLLRr2612yuI3t/EFzoSGxj83zGjDOr7cgj5ByBnkHHHTtqYOtTtdXv2Pr8dwrmeAUHOKlzJP3Xd6/wBdLnsNFeX+EP2i/CXjjxNa6DpbXr6jcPNGqSQBQpiUNICd3YMp4zncMU/R/wBoHw3rj3S21vqWLc7Sz26gMd8CYHz5+9cR9cd/SsXQqp2cWeNPKcfTk4zoyTST26NtL72n9zPTaK5Tw78QYPFGlR6jYaTqr2zySxAyW6o26ORo3BBbPDIworNwknZnDPD1acnCas1o0fL3wn8TfCbwHqHie68ez6bY+JLfxJdzWzXsEjzIgkyhAUHo24jj39K0vgvN4g+JPg3xHq/hKO1Nu3jG/njkvoQfPtHgAOzeOHbeVz0GSD3r6ivvB2g6ndPc3ei6ddXD/emmtI3dvqSuTWjY2FtptrHbWlvFa28YwkUKBEX6AcCu+eLjK7Sd3bd3XyPtcZxHQr06kqdOTqT5b80lKMVG+kY8qdn5s+cbP4VePrO+iu49H0W2kjdQRarbqSm5/M2NsBVjG6oD/s88YJZqHwQ8U/2VZ/YNI0z7X5UInivEtzGMQxpIoABxkocEE9EPbj6YorL63O97I8NZ5ilLmSX3P/M838FaV4x0fw1a2dzBbQzxtJvX7QgyTIx3YWPABznHbPPNFekUVzOd3eyPJniHUm5uKu9f61CiiiszkCiiigAooooA/9k=",
"ticker": "DHCC",
"exchange": "NYSE",
Cleaning and enriching actions
Data point
Cleaning/enriching action

name

Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None.

logo

Image is resized to 50x50px.

ticker/exchange

Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None.
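
Since logo is a Base64-encoded JPEG, a consumer can decode it to raw bytes and run a quick sanity check before saving or displaying it. The sketch below is one possible approach under stated assumptions: `decode_logo` is a hypothetical helper, and `sample_prefix` holds only the first 16 characters of the sample value above, which is enough to show the JPEG header.

```python
import base64

# A JPEG starts with the start-of-image marker FF D8, followed by FF of the next marker.
JPEG_MAGIC = b"\xff\xd8\xff"

def decode_logo(logo_b64):
    """Decode the Base64 logo string; return None if it is missing or not a JPEG."""
    if not logo_b64:
        return None
    raw = base64.b64decode(logo_b64)
    return raw if raw.startswith(JPEG_MAGIC) else None

# First 16 characters of the sample logo value (truncated for brevity).
sample_prefix = "/9j/4AAQSkZJRgAB"
header = decode_logo(sample_prefix)
```

With a full logo value, the returned bytes can be written to a `.jpg` file as they are.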


Firmographics

| Data point | Processing | Description | Data type |
| --- | --- | --- | --- |
| industry | Cleaned | Industry the company operates in | String |
| type | Cleaned | Company type | String |
| founded | Cleaned | Company founding year | String |
| size_range | Cleaned | Company size range | String |
| size_employees_count | Enriched | The number of employees working in the company | Number (integer) |
| followers | Cleaned | The number of company followers | Number (integer) |
| description | Cleaned | Company description | String |
| specialities | Raw | Company specialties | Array of strings |
| metadata_title | Enriched | Company title parsed from additional sources | String |
| metadata_description | Enriched | Company description parsed from additional sources | String |
| enriched_summary | Enriched | LLM enriched company summary | String |
| enriched_category | Enriched | Company category assigned with LLM | String |
| enriched_keywords | Enriched | LLM enriched company keywords | Array of strings |
| enriched_b2b | Enriched | Marks whether the company offers B2B products/services, as determined with the help of an LLM: 1 – B2B company, 0 – not a B2B company | Number (double) |

Firmographics
"type": "Privately Held",
"founded": "2011",
"followers": 1234,
"size_range": "11-50 employees",
"size_employees_count": 5,
"industry": "Unique industry",
"description": "Digital Example Company offers very important services.",
"specialities": [
  "Example_1"
],
"enriched_summary": "Digital Example Company offers services.",
"enriched_keywords": [
  "keyword_1",
  "keyword_2"
],
"enriched_b2b": 0.0,
"enriched_category": "Example_2",
"metadata_title": "A great company for you",
"metadata_description": null,
Cleaning and enriching actions
Data point
Cleaning/enriching action

industry

Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None.

type

Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None.

founded

  • Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None;

  • Values are replaced with None if the year is not between 500 and the current year.

followers

  • Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value 0;

  • Every value is converted to an integer.

size_range

Inconsistent and overlapping values are fixed:

  • "1 employee" – "Myself Only";

  • "2-10 employees" – "1-10 employees";

  • "501-1,000 employees" – "501-1000 employees";


  • "1,001-5,000 employees" – "1001-5000 employees".

size_employees_count

When size_employees_count is 0, we check if we have any scraped profiles of employees working at this company. If yes, then we count how many employees are associated with it and change the value to that number. This can occur in cases when the public profile does not show some of the employees.

description

  • Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None;

  • Value is replaced with None if the description is shorter than 3 characters;

  • Text styling tags are removed;


  • Multiple spaces are replaced with single ones.
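
The cleaning rules above for founded and description can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual pipeline: the function names are hypothetical, placeholder matching is assumed to be exact and case-sensitive, the year bounds are assumed to be inclusive, "text styling tags" are assumed to look like HTML tags, and the order of the description steps is a guess.

```python
import re
from datetime import date

# Placeholder strings listed in the cleaning actions above.
PLACEHOLDERS = {"None", "Unknown", "NaN", "nan", "na", "null", "Null", "NULL", "-", "--"}

TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")  # assumption: styling tags are HTML-like
SPACES_RE = re.compile(r" {2,}")           # runs of two or more spaces

def clean_founded(value):
    """Return the founding year as a string, or None if it is a placeholder or out of range."""
    if isinstance(value, str) and value.strip() in PLACEHOLDERS:
        return None
    try:
        year = int(value)
    except (TypeError, ValueError):
        return None
    # Assumption: both bounds (500 and the current year) are inclusive.
    return str(year) if 500 <= year <= date.today().year else None

def clean_description(value):
    """Strip styling tags, collapse repeated spaces, and drop values under 3 characters."""
    if not isinstance(value, str) or value in PLACEHOLDERS:
        return None
    text = TAG_RE.sub("", value)
    text = SPACES_RE.sub(" ", text)
    return text if len(text) >= 3 else None
```

For example, `clean_founded("2011")` keeps the year, while `clean_founded("499")` and `clean_description("ab")` both yield None.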


Product and services overview

| Data point | Processing | Description | Data type |
| --- | --- | --- | --- |
| pricing_available | Enriched | Marks if the company service pricing is available online | Boolean |
| free_trial_available | Enriched | Marks if the company offers a free trial of their services | Boolean |
| demo_available | Enriched | Marks if the company offers a demo | Boolean |
| is_downloadable | Enriched | Marks if the company offers a downloadable file/service | Boolean |
| mobile_apps_exist | Enriched | Marks if the company has mobile apps for their service | Boolean |
| online_reviews_exist | Enriched | Marks if the company has any online reviews | Boolean |
| api_docs_exist | Enriched | Marks if the company has API docs published | Boolean |

Product and services overview
    "pricing_available": false,
    "free_trial_available": false,
    "demo_available": false,
    "is_downloadable": false,
    "mobile_apps_exist": false,
    "online_reviews_exist": false,
    "api_docs_exist": false,
Enriching actions
Data point
Enriching action

pricing_available, free_trial_available, demo_available, is_downloadable, mobile_apps_exist, online_reviews_exist, api_docs_exist

Information is taken from the official company website.


Contact information

| Data point | Processing | Description | Data type |
| --- | --- | --- | --- |
| phone_numbers | Enriched | Publicly available company phone numbers | Array of strings |
| emails | Enriched | Publicly available company email addresses | Array of strings |

Contact information
"phone_numbers": [
        "0000 111 222"
    ],
    "emails": [
        "[email protected]"
    ],
Enriching actions
Data point
Enriching action

phone_numbers, emails

Information is taken from the official company website.


Social media and websites

| Data point | Processing | Description | Data type |
| --- | --- | --- | --- |
| websites_main_original | Raw | Company website URL | String |
| websites_main | Cleaned | Cleaned and resolved company website URL | String |
| websites_resolved | Enriched | Resolved company website URL | String |
| websites_facebook | Enriched | Company Facebook URL | String |
| websites_twitter | Enriched | Company Twitter URL | String |
| websites_professional_network | Raw | Company professional network URL | String |
| websites_professional_network_canonical | Raw | Canonical professional network profile URL | String |
| social_discord_urls | Enriched | Company Discord profile/channel | Array of strings |
| social_facebook_urls | Enriched | Company Facebook page | Array of strings |
| social_instagram_urls | Enriched | Company Instagram page | Array of strings |
| social_professional_network_urls | Enriched | Company professional network profile | Array of strings |
| social_pinterest_urls | Enriched | Company Pinterest page | Array of strings |
| social_tiktok_urls | Enriched | Company TikTok profile | Array of strings |
| social_twitter_urls | Enriched | Company Twitter profile | Array of strings |
| social_x_urls | Enriched | Company X profile | Array of strings |
| social_youtube_urls | Enriched | Company YouTube channel/profile | Array of strings |
| social_github_urls | Enriched | Company GitHub page/profile | Array of strings |
| social_reddit_urls | Enriched | Company Reddit profile | Array of strings |

Social media and websites
 "websites_main_original": "http://www.example-company.com.",
 "websites_main": "https://example-company.com.",
 "websites_facebook": "https://www.facebook.com/example-company",
 "websites_twitter": "https://www.twitter.com/example-company",
 "websites_professional_network": "https://www.professional-network.com/company/example-company",
 "websites_professional_network_canonical": "https://www.professional-network.com/company/example-company",
Cleaning and enriching actions
Data point
Cleaning/enriching action

websites_main

  • Every websites_main_original URL is resolved;

  • Each URL we collect is parsed, its parameters are removed, and the result is added to the websites_main column;

  • Each <domain>.<tld>/<path> can belong to only one company;

  • Expired domains are removed.

websites_twitter

If <domain> (from websites_main) == twitter, we move the URL value to websites_twitter.

websites_facebook

If <domain> (from websites_main) == facebook, we move the URL value to websites_facebook.

websites_professional_network

If <domain> (from websites_main) == professional_network, we move the URL value to websites_professional_network.

social_discord_urls, social_facebook_urls, social_instagram_urls, social_professional_network_urls, social_pinterest_urls, social_tiktok_urls, social_twitter_urls, social_x_urls, social_youtube_urls, social_github_urls, social_reddit_urls

URLs are taken from the official company website.
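
The parameter stripping and domain-based routing described above could look roughly like the following sketch. It is not the actual implementation: all function names are hypothetical, URL resolution (following redirects) is left out because it needs network access, and the domain extraction is deliberately naive, so multi-part TLDs such as `.co.uk` would need extra handling.

```python
from urllib.parse import urlsplit, urlunsplit

# Maps a <domain> label to the column that receives the URL.
SOCIAL_COLUMNS = {
    "twitter": "websites_twitter",
    "facebook": "websites_facebook",
    # Placeholder host used in the samples of this data dictionary.
    "professional-network": "websites_professional_network",
}

def strip_parameters(url):
    """Drop the query string and fragment, keeping scheme, host, and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

def domain_label(url):
    """Return the <domain> part of <domain>.<tld>, e.g. 'twitter' for www.twitter.com."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    return labels[-2] if len(labels) >= 2 else host

def route_website(url):
    """Return (column, cleaned_url) for a collected website URL."""
    cleaned = strip_parameters(url)
    column = SOCIAL_COLUMNS.get(domain_label(cleaned), "websites_main")
    return column, cleaned
```

For instance, `route_website("https://www.twitter.com/example-company?lang=en")` returns the websites_twitter column together with the URL stripped of its query string.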


Location

| Data point | Processing | Description | Data type |
| --- | --- | --- | --- |
| location_hq_country | Cleaned | Company headquarters country | String |
| location_hq_raw_address | Cleaned | Detailed company location | String |
| location_hq_regions | Enriched | Geographical region(s) the company is associated with based on the location_hq_country value | String |
| locations_full | Raw | Full company location information | Array of objects |
| location_adress | Raw | Full location of the company HQ | String |
| is_primary | Raw | Marks if the listed location is the primary | Boolean |

Location
"location_hq_raw_address": "Encinitas, CA, United States",
"location_hq_country": "United States",
"location_hq_regions": "[Northern America, Northern America, AMER]",
Cleaning actions
Data point
Cleaning action

location_hq_country

Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None.

location_hq_raw_address

  • Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None;

  • Special trailing characters are trimmed;

  • The location_hq_country value is added to the end of the string (separated by a comma).
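
As a rough illustration of the location_hq_raw_address rules above, the sketch below trims trailing characters and appends the country. Several details are assumptions rather than documented behavior: the function name, the exact set of "special" characters in `TRAILING_CHARS`, and the check that avoids appending a country that is already at the end of the string.

```python
# Placeholder strings listed in the cleaning actions above.
PLACEHOLDERS = {"None", "Unknown", "NaN", "nan", "na", "null", "Null", "NULL", "-", "--"}

# Assumption: which trailing characters count as "special" is not documented.
TRAILING_CHARS = " ,;.-/"

def clean_raw_address(raw_address, country):
    """Trim trailing special characters and append the HQ country after a comma."""
    if not isinstance(raw_address, str) or raw_address in PLACEHOLDERS:
        return None
    trimmed = raw_address.rstrip(TRAILING_CHARS)
    if not country:
        return trimmed or None
    if not trimmed:
        return country
    # Assumption: skip the append when the address already ends with the country.
    return trimmed if trimmed.endswith(country) else f"{trimmed}, {country}"
```

Applied to `"Encinitas, CA, "` with the country `"United States"`, this produces the value shown in the sample above.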


Funding information

| Data point | Processing | Description | Data type |
| --- | --- | --- | --- |
| funding_rounds |  | Information on company funding (rounds) | Array of objects |
| last_round_investors_count | Cleaned | The number of investors that participated in the last funding round | Number (integer) |
| total_rounds_count | Cleaned | Total number of funding rounds | Number (integer) |
| last_round_type | Cleaned | Last funding round type | String |
| last_round_date | Cleaned | Last funding round date | String |
| last_round_money_raised | Cleaned | Amount of money raised during the last funding round | Number (integer) |
| financial_website_url | Raw | Last funding round financial website URL | String |

Funding information
 "funding_rounds": [
        {
            "last_round_investors_count": 10,
            "total_rounds_count": 5,
            "last_round_type": "Series A",
            "last_round_date": "2020-12-09",
            "last_round_money_raised": 1230000,
            "financial_website_url": "https://www.financial_website.com/funding_round/example"
        }
    ]
}
Cleaning actions
Data point
Cleaning action

funding_rounds

  • Duplicate data points are filtered out;

  • Empty/irrelevant data points are removed.

last_round_investors_count

  • Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value 0;

  • Every value is converted to an integer.

total_rounds_count

  • Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value 0;

  • Every value is converted to an integer.

last_round_type

Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value 0.

last_round_date

Value is converted to the yyyy-mm-dd format.

last_round_money_raised

  • Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value 0;

  • Every value is converted to an integer (integer value is parsed from the text value).
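
The numeric rules above (placeholders become 0, integers are parsed from text) might be approximated as follows. The raw text format is not documented, so this sketch assumes amounts written with optional thousands separators, such as "US$ 1,230,000"; abbreviated amounts like "1.2M" are not handled. The function name is hypothetical, and returning 0 for text without digits is an assumption.

```python
import re

# Placeholder strings listed in the cleaning actions above.
PLACEHOLDERS = {"None", "Unknown", "NaN", "nan", "na", "null", "Null", "NULL", "-", "--"}

# First run of digits, optionally grouped with thousands commas.
AMOUNT_RE = re.compile(r"\d[\d,]*")

def parse_int_field(value):
    """Return an integer parsed from text; placeholders and missing values become 0."""
    if value is None:
        return 0
    if isinstance(value, int):
        return value
    text = str(value).strip()
    if text in PLACEHOLDERS:
        return 0
    match = AMOUNT_RE.search(text)
    # Assumption: text without any digits is treated like a placeholder.
    return int(match.group().replace(",", "")) if match else 0
```

The same helper fits last_round_investors_count and total_rounds_count, which follow the same rules.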


Technologies

| Data point | Processing | Description | Data type |
| --- | --- | --- | --- |
| technologies | - | Data type changed from array of strings to array of structs | Array of structs |
| technology | Enriched | Technology name | String |
| first_verified_at | Cleaned | Date this technology was first assigned to the company | String (date) |
| last_verified_at | Cleaned | Date this technology was last assigned to the company | String (date) |

Technologies
"technologies_used": [
    {
      "technology": "React",
      "first_verified_at": "2022-03-15",
      "last_verified_at": "2024-10-15"
    }
  ]
Enriching and cleaning actions
Data point
Enriching action

company_technologies

Enriched by our ML model from multiple sources.

first_verified_at, last_verified_at

Value is converted to the yyyy-mm-dd format.
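
Because each technology entry is a struct with yyyy-mm-dd verification dates, filtering for recently verified technologies takes only a short list comprehension. A minimal sketch: the `entries` list mirrors the sample above, and `verified_since` is a hypothetical helper, not part of the API.

```python
from datetime import date

# Technology structs shaped like the sample above.
entries = [
    {
        "technology": "React",
        "first_verified_at": "2022-03-15",
        "last_verified_at": "2024-10-15",
    },
]

def verified_since(technologies, cutoff):
    """Return names of technologies last verified on or after the cutoff date."""
    return [
        item["technology"]
        for item in technologies
        if date.fromisoformat(item["last_verified_at"]) >= cutoff
    ]

recent = verified_since(entries, date(2024, 1, 1))
```

Passing a later cutoff, for example `date(2025, 1, 1)`, returns an empty list for this sample.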


Supporting fields

| Data point | Processing | Description | Data type |
| --- | --- | --- | --- |
| expired_domain | Enriched | Marks if the websites_main_original URL redirects to a domain dealer | Number (integer) |
| unique_domain | Enriched | Marks if only this company has the right to have this unique domain, e.g., websites_main: https://ibm.com | Number (integer) |
| unique_website | Enriched | Marks if only this company has a unique website but not necessarily a unique domain, e.g., websites_main: https://ibm.com/generation | Number (integer) |

Supporting fields
    "expired_domain": 0,
    "unique_domain": 0,
    "unique_website": 0,

Company updates

| Data point | Processing | Description | Data type |
| --- | --- | --- | --- |
| updates |  | Company posts and related details | Array of objects |
| urn | Raw | String-based identifier | String |
| followers | Raw | Number of followers | String |
| date | Raw | Post publish date (e.g., 1 month ago) | String |
| description | Raw | Published text. Note: may contain control characters | String |
| reactions_count | Raw | Number of reactions on the post | Integer |
| comments_count | Raw | Number of comments on the post | Integer |
| reshared_post_author | Raw | Reshared post author | String |
| reshared_post_author_url | Raw | Author's profile URL | String |
| reshared_post_author_headline | Raw | Author's headline | String |
| reshared_post_description | Raw | Reshared post text | String |
| reshared_post_followers | Raw | The number of followers of the reshared post author | Integer |
| reshared_post_date | Raw | Date the reshared post was published (e.g., 1 month ago) | String |

Company updates
"updates": [
      {
        "urn": "urn:pn:activity:0000000000000000000",
        "followers": 1371,
        "date": "1mo",
        "description": "Example description",
        "reactions_count": 22,
        "comments_count": 2,
        "reshared_post_author": "John Doe",
        "reshared_post_author_url": "https://www.professional_network.com/in/john-doe",
        "reshared_post_author_headline": "Co-Founder at Example Company, TEDx & Keynote Speaker",
        "reshared_post_description": "Example description",
        "reshared_post_followers": 45,
        "reshared_post_date": "1mo"
      }
  ]
