Data Dictionary: Clean Company API
Data dictionary for data retrieved using Clean Company API endpoints.
This data dictionary shows all available data fields, explains their values, and provides data samples from the Clean Company API data.
The sample data is for illustration only: it shows what the values look like and how they are formatted.
Metadata
last_updated
Cleaned
Record update date
String (date)
professional_network_source_id
Raw
Record identification key assigned by Professional Network
String
created_at
Cleaned
Time and date when we created the company record
String (date)
Cleaning actions
last_updated
Value is converted to the yyyy-mm-dd format.
created_at
Value is converted to the yyyy-mm-dd format.
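The date conversion above can be sketched as follows. The raw input format is not documented here, so this sketch assumes an ISO 8601 timestamp string; to_date_string is a hypothetical helper name, not part of the API.

```python
from datetime import datetime
from typing import Optional


def to_date_string(raw: Optional[str]) -> Optional[str]:
    """Return the date part of a timestamp as yyyy-mm-dd, or None if unparseable."""
    if not raw:
        return None
    try:
        # Assumption: the raw value is an ISO 8601 timestamp,
        # for example "2023-04-17T09:30:00" or "2023-04-17".
        parsed = datetime.fromisoformat(raw.strip())
    except ValueError:
        return None
    return parsed.strftime("%Y-%m-%d")
```

For example, to_date_string("2023-04-17T09:30:00") yields "2023-04-17".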
Identifiers
id
Raw
Company ID in our database
Number (integer)
name
Cleaned
Company name
String
logo
Cleaned
BASE64 encoded JPEG image of the company's logo
String
ticker
Cleaned
Company's stock ticker
String
exchange
Cleaned
Company's stock exchange
String
Cleaning and enriching actions
name
Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None.
company_logo
Image is resized to 50x50px.
ticker/exchange
Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None.
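The placeholder rule used for name, ticker, and exchange can be sketched as below. This is a minimal illustration, not the actual pipeline: null_if_placeholder is a hypothetical name, and exact, case-sensitive matching with surrounding whitespace ignored is an assumption.

```python
from typing import Optional

# Placeholder strings listed in the cleaning actions above.
PLACEHOLDERS = frozenset(
    ["None", "Unknown", "NaN", "nan", "na", "null", "Null", "NULL", "-", "--"]
)


def null_if_placeholder(value: Optional[str]) -> Optional[str]:
    """Return None when the value is one of the placeholder strings."""
    if value is None:
        return None
    # Assumption: matching is exact and case-sensitive, because the list
    # spells out each casing separately; surrounding whitespace is ignored.
    return None if value.strip() in PLACEHOLDERS else value
```

The same check applies to every field whose cleaning action lists these values; fields that default to 0 instead of None differ only in the replacement value.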
Firmographics
industry
Cleaned
Industry the company operates in
String
type
Cleaned
Company type
String
founded
Cleaned
Company founding year
String
size_range
Cleaned
Company size range
String
size_employees_count
Enriched
The number of employees working in the company
Number (integer)
size_employees_count_inferred
Enriched
Estimated number of employees, calculated based on inferred employee data
Number (integer)
followers
Cleaned
The number of company followers
Number (integer)
description
Cleaned
Company description
String
specialities
Raw
Company specialties
Array of strings
metadata_title
Enriched
Company title parsed from additional sources
String
metadata_description
Enriched
Company description parsed from additional sources
String
enriched_summary
Enriched
LLM-enriched company summary
String
enriched_category
Enriched
Company category assigned with LLM
String
enriched_keywords
Enriched
LLM-enriched company keywords
Array of strings
enriched_b2b
Enriched
Marks whether the company offers B2B products/services, as determined with the help of an LLM
1 – B2B company
0 – not B2B company
Number (double)
Cleaning and enriching actions
industry
Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None.
type
Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None.
founded
Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None;
Values are replaced with None if the year is not between 500 and the current year.
followers
Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value 0;
Every value is converted to an integer.
size_range
Inconsistencies between overlapping values are fixed:
"1 employee" – "Myself Only";
"2-10 employees" – "1-10 employees";
"501-1,000 employees" – "501-1000 employees";
"1,001-5,000 employees" – "1001-5000 employees".
size_employees_count
When size_employees_count is 0, we check if we have any scraped profiles of employees working at this company. If yes, then we count how many employees are associated with it and change the value to that number. This can occur in cases when the public profile does not show some of the employees.
description
Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None;
Value is replaced with None if the description is shorter than 3 characters;
Text styling tags are removed;
Multiple spaces are replaced with single ones.
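The founded, followers, and description rules above can be sketched as follows. The function names are hypothetical, and several details are assumptions marked in the comments: the year bounds are treated as inclusive, non-numeric years count as missing, and the description length check runs after tags and extra spaces are removed.

```python
import re
from datetime import date
from typing import Optional

PLACEHOLDERS = frozenset(
    ["None", "Unknown", "NaN", "nan", "na", "null", "Null", "NULL", "-", "--"]
)


def clean_founded(value: Optional[str], today: Optional[date] = None) -> Optional[str]:
    """Keep the founding year only if it lies between 500 and the current year."""
    if value is None or value.strip() in PLACEHOLDERS:
        return None
    current_year = (today or date.today()).year
    try:
        year = int(value.strip())
    except ValueError:
        return None  # Assumption: non-numeric years are treated as missing.
    # Assumption: both bounds are inclusive.
    return str(year) if 500 <= year <= current_year else None


def clean_followers(value: Optional[str]) -> int:
    """Map placeholders to 0 and convert everything else to an integer."""
    if value is None or value.strip() in PLACEHOLDERS:
        return 0
    # Assumption: raw text may contain thousands separators such as "1,234".
    return int(value.strip().replace(",", ""))


def clean_description(value: Optional[str]) -> Optional[str]:
    """Apply the description rules: placeholders, tags, spaces, minimum length."""
    if value is None or value.strip() in PLACEHOLDERS:
        return None
    text = re.sub(r"<[^>]+>", "", value)        # remove styling tags such as <b>
    text = re.sub(r" {2,}", " ", text).strip()  # collapse runs of spaces
    # Assumption: the 3-character check is applied to the cleaned text.
    return text if len(text) >= 3 else None
```

The today parameter exists only to make the year check reproducible in tests; by default the current date is used.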
Product and services overview
pricing_available
Enriched
Marks if the company service pricing is available online
Boolean
free_trial_available
Enriched
Marks if the company offers a free trial of their services
Boolean
demo_available
Enriched
Marks if the company offers a demo
Boolean
is_downloadable
Enriched
Marks if the company offers a downloadable file/service
Boolean
mobile_apps_exist
Enriched
Marks if the company has mobile apps for their service
Boolean
online_reviews_exist
Enriched
Marks if the company has any online reviews
Boolean
api_docs_exist
Enriched
Marks if the company has API docs published
Boolean
Enriching actions
pricing_available,
free_trial_available,
demo_available,
is_downloadable,
mobile_apps_exist,
online_reviews_exist,
api_docs_exist
Information is taken from the official company website.
Contact information
phone_numbers
Enriched
Publicly available company phone number
Array of strings
emails
Enriched
Publicly available company email address
Array of strings
Enriching actions
phone_numbers,
emails
Information is taken from the official company website.
Social media and websites
websites_main_original
Raw
Company website URL
String
websites_main
Cleaned
Cleaned and resolved company website URL
String
websites_resolved
Enriched
Resolved company website URL
String
websites_facebook
Enriched
Company Facebook URL
String
websites_twitter
Enriched
Company Twitter URL
String
websites_professional_network
Raw
Professional network URL where the company was first discovered. It can be outdated if the company has changed its profile
String
websites_professional_network_canonical
Raw
The current official Professional network URL for the company, reflecting the most recent updates
String
social_discord_urls
Enriched
Company Discord profile/channel
Array of strings
social_facebook_urls
Enriched
Company Facebook page
Array of strings
social_instagram_urls
Enriched
Company Instagram page
Array of strings
social_professional_network_urls
Enriched
Company professional network profile
Array of strings
social_pinterest_urls
Enriched
Company Pinterest page
Array of strings
social_tiktok_urls
Enriched
Company TikTok profile
Array of strings
social_twitter_urls
Enriched
Company Twitter profile
Array of strings
social_x_urls
Enriched
Company X profile
Array of strings
social_youtube_urls
Enriched
Company YouTube channel/profile
Array of strings
social_github_urls
Enriched
Company GitHub page/profile
Array of strings
social_reddit_urls
Enriched
Company Reddit profile
Array of strings
Cleaning and enriching actions
websites_main
Every websites_main_original URL is resolved;
Each URL we collect is parsed, its parameters are removed, and the result is added to the websites_main column;
Only one company can have a unique <domain>.<tld>/<path>;
Expired domains are removed.
websites_twitter
If <domain> (from websites_main) == twitter, we move the URL value to websites_twitter.
websites_facebook
If <domain> (from websites_main) == facebook, we move the URL value to websites_facebook.
websites_professional_network
If <domain> (from websites_main) == professional_network, we move the URL value to websites_professional_network.
social_discord_urls,
social_facebook_urls,
social_instagram_urls,
social_professional_network_urls,
social_pinterest_urls,
social_tiktok_urls,
social_twitter_urls,
social_x_urls,
social_youtube_urls,
social_github_urls,
social_reddit_urls
URLs are taken from the official company website.
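The parameter removal and the domain-based routing described for websites_main, websites_twitter, and websites_facebook can be sketched as below. This is an illustration under stated assumptions: the helper names are hypothetical, "parameters" is read as the query string and fragment, and the <domain> label is taken naively from the host name. URL resolution (following redirects) and expired-domain checks are not covered.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit, urlunsplit

# Assumption: only these two hosts are rerouted in this sketch.
SOCIAL_FIELDS = {"twitter": "websites_twitter", "facebook": "websites_facebook"}


def strip_parameters(url: str) -> str:
    """Drop the query string and fragment, keeping scheme, host, and path."""
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def domain_label(url: str) -> Optional[str]:
    """Return the <domain> label of <domain>.<tld>, e.g. "twitter" for twitter.com."""
    host = urlsplit(url).hostname
    if not host:
        return None
    labels = host.split(".")
    # Simplification: ignores multi-part suffixes such as "co.uk".
    return labels[-2] if len(labels) >= 2 else None


def route_website(url: str) -> Dict[str, Optional[str]]:
    """Place a cleaned URL in websites_main or in the matching social column."""
    record: Dict[str, Optional[str]] = {
        "websites_main": None,
        "websites_twitter": None,
        "websites_facebook": None,
    }
    cleaned = strip_parameters(url)
    field = SOCIAL_FIELDS.get(domain_label(cleaned) or "", "websites_main")
    record[field] = cleaned
    return record
```

A production version would likely need a public-suffix-aware domain parser; the naive label split is used here only to keep the sketch within the standard library.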
Location
location_hq_country
Cleaned
Headquarters country
String
location_hq_raw_address
Cleaned
Detailed company location
String
location_hq_regions
Enriched
Geographical region(s) the company is associated with based on the company_location_hq_country value.
String
locations_full
Raw
Full company location information
Array of objects
location_address
Raw
Company location address
String
is_primary
Raw
Marks if the listed location is the primary
Boolean
city
Enriched
Location city
String
state
Enriched
Location state
String
country_code
Enriched
Country code
String
country
Enriched
Country
String
country_iso_2
Enriched
ISO 2-letter code of the location country
String
country_iso_3
Enriched
ISO 3-letter code of the location country
String
regions
Enriched
Regions list
Struct
region
Enriched
Region
String
Cleaning actions
location_hq_country
Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None.
location_hq_raw_address
Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None;
Special trailing characters are trimmed;
The location_hq_country value is added to the end of the string (separated by a comma).
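The location_hq_raw_address steps can be sketched as follows. The set of "special trailing characters" is not specified, so this sketch assumes trailing punctuation and whitespace; clean_raw_address is a hypothetical name.

```python
from typing import Optional

PLACEHOLDERS = frozenset(
    ["None", "Unknown", "NaN", "nan", "na", "null", "Null", "NULL", "-", "--"]
)


def clean_raw_address(address: Optional[str], country: Optional[str]) -> Optional[str]:
    """Null out placeholders, trim trailing characters, then append the country."""
    if address is None or address.strip() in PLACEHOLDERS:
        return None
    # Assumption: "special trailing characters" means punctuation and whitespace.
    trimmed = address.rstrip(" \t\n,;.-/")
    if not trimmed:
        return None
    # Assumption: nothing is appended when location_hq_country is missing.
    return f"{trimmed}, {country}" if country else trimmed
```

For example, clean_raw_address("1 Main St, Springfield, ", "United States") yields "1 Main St, Springfield, United States".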
Funding information
funding_rounds
Information on company funding (rounds)
Array of objects
last_round_investors_count
Cleaned
The number of investors that participated in the last funding round
Number (integer)
total_rounds_count
Cleaned
Total number of funding rounds
Number (integer)
last_round_type
Cleaned
Last funding round type
String
last_round_date
Cleaned
Last funding round date
String
last_round_money_raised
Cleaned
Amount of money raised during the last funding round
Number (integer)
financial_website_url
Raw
Last funding round financial website URL
String
Cleaning actions
funding_rounds
Duplicate data fields filtered out;
Removed empty/irrelevant data fields.
last_round_investors_count
Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value 0;
Every value is converted to an integer.
total_rounds_count
Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value 0;
Every value is converted to an integer.
last_round_type
Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value 0.
last_round_date
Value is converted to the yyyy-mm-dd format.
last_round_money_raised
Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value 0;
Every value is converted to an integer (integer value is parsed from the text value).
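Parsing an integer from the last_round_money_raised text can be sketched as below. The raw text format is not documented here, so this sketch assumes the amount is written out in full with optional thousands separators (for example "US$ 12,500,000"); abbreviated amounts such as "1.5M" are not handled. parse_money_raised is a hypothetical name.

```python
import re
from typing import Optional

PLACEHOLDERS = frozenset(
    ["None", "Unknown", "NaN", "nan", "na", "null", "Null", "NULL", "-", "--"]
)


def parse_money_raised(value: Optional[str]) -> int:
    """Return the amount as an integer, or 0 for placeholders and missing values."""
    if value is None or value.strip() in PLACEHOLDERS:
        return 0
    # Assumption: the first run of digits (with optional commas) is the amount.
    match = re.search(r"\d[\d,]*", value)
    if match is None:
        return 0
    return int(match.group(0).replace(",", ""))
```

The same placeholder-to-0 step applies to last_round_investors_count and total_rounds_count, which then only need a plain integer conversion.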
Technologies
technologies
-
Data type changed from array of strings to array of structs
Array of structs
technology
Enriched
Technology name
String
first_verified_at
Cleaned
Date this technology was first assigned to the company
String (date)
last_verified_at
Cleaned
Date this technology was last assigned to the company
String (date)
Enriching and cleaning actions
company_technologies
Enriched by our ML model from multiple sources.
first_verified_at,
last_verified_at
Value is converted to the yyyy-mm-dd format.
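Because technologies is an array of structs with yyyy-mm-dd dates, a consumer can filter entries by verification date. The sketch below uses a hypothetical sample shaped like the fields above; the technology names and dates are made up for illustration, and verified_since is not part of the API.

```python
from datetime import date
from typing import Dict, List

# Hypothetical sample shaped like the technologies field; values are made up.
SAMPLE_TECHNOLOGIES: List[Dict[str, str]] = [
    {
        "technology": "Example CDN",
        "first_verified_at": "2022-03-01",
        "last_verified_at": "2024-01-15",
    },
    {
        "technology": "Example Analytics",
        "first_verified_at": "2021-06-10",
        "last_verified_at": "2022-11-30",
    },
]


def verified_since(entries: List[Dict[str, str]], cutoff: date) -> List[str]:
    """Return technology names last verified on or after the cutoff date."""
    return [
        entry["technology"]
        for entry in entries
        if date.fromisoformat(entry["last_verified_at"]) >= cutoff
    ]
```

Comparing parsed dates rather than raw strings keeps the filter correct even if a value ever arrives with different zero-padding.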
Supporting fields
expired_domain
Enriched
Marks if the company_websites_main_original URL redirects to a domain dealer
Number (integer)
unique_domain
Enriched
Marks if only this company has the right to have this unique domain, e.g., company_websites_main: https://ibm.com
Number (integer)
unique_website
Enriched
Marks if only this company has a unique website but not necessarily a unique domain, e.g., company_websites_main: https://ibm.com/generation
Number (integer)
Company updates
updates
Company posts and related details
Array of objects
urn
Raw
String-based identifier
String
followers
Raw
Number of followers
String
date
Raw
Post publish date (e.g., 1 month ago)
String
description
Raw
Published text
Note: may contain control characters
String
reactions_count
Raw
Number of reactions on the post
Integer
comments_count
Raw
Number of comments on the post
Integer
reshared_post_author
Raw
Reshared post author
String
reshared_post_author_url
Raw
Author's profile URL
String
reshared_post_author_headline
Raw
Author's headline
String
reshared_post_description
Raw
Reshared post text
String
reshared_post_followers
Raw
The number of followers of the reshared post author
Integer
reshared_post_date
Raw
Date the reshared post was published (e.g., 1 month ago)
String
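Since the updates description field is raw and may contain control characters, a consumer may want to strip them before display or storage. A minimal sketch, assuming newlines and tabs should be kept; strip_control_chars is a hypothetical helper.

```python
import unicodedata


def strip_control_chars(text: str) -> str:
    """Remove Unicode control characters (category Cc), keeping newlines and tabs."""
    return "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
```

For example, strip_control_chars("Hello\x00 wor\x1fld\n") yields "Hello world\n".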