Dictionary: Clean Company Data
This is a simplified version of our documentation. Request access to the full documentation if you want to:
- Access data samples
- Learn more about the cleaning and enrichment actions
- Explore the complete list of data sources we offer
Clean Company Data provides high-quality, structured business data ready for immediate use. Our data is meticulously cleaned and enriched, allowing organizations to streamline their workflows and confidently make data-driven decisions. By leveraging Clean Company Data, businesses can reduce engineering overhead, gain access to additional insights, and work with optimized data formats for improved efficiency.
With multiple retrieval options—including flat file downloads in JSONL, Parquet, and CSV formats, as well as API access—our solution adapts to your needs, ensuring seamless integration into your existing data infrastructure.
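As a rough illustration of working with the JSONL flat-file option, the sketch below reads one JSON object per line. The helper name `iter_records` and the sample values are made up for this example; an in-memory stream stands in for a downloaded file, and real records carry many more fields.

```python
import io
import json
from typing import Iterator


def iter_records(lines) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSONL stream."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines between records
            yield json.loads(line)


# Made-up sample standing in for a downloaded JSONL file.
sample = io.StringIO(
    '{"company_id": 1, "company_name": "Example Co"}\n'
    "\n"
    '{"company_id": 2, "company_name": "Sample Ltd"}\n'
)

records = list(iter_records(sample))
names = [record["company_name"] for record in records]
print(names)
```

The same function should work on a real file handle, for example `open(path, encoding="utf-8")`, because it only iterates over lines.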
Clean Company Data is derived from our Base Company Data.
The data points are grouped into collections to make the data easier to navigate. The sample data is intended strictly for illustrative purposes, to help you understand its appearance and format.
Data point | Processing | Description | Data type |
---|---|---|---|
company_last_updated | Cleaned | Record update date | String (date) |
company_created_at | Cleaned | Record creation date | String (date) |
professional_network_source_id | Raw | Record identification key assigned by professional network | String |
Data point | Processing | Description | Data type |
---|---|---|---|
company_id | Raw | Company ID in our database | Number (integer) |
company_hash | Raw | Company profile URL processed by the MD5 algorithm | String |
company_canonical_shorthand_name_hash | Raw | Canonical shorthand name processed by the MD5 algorithm | String |
company_name | Cleaned | Company name | String |
company_logo | Cleaned | BASE64 encoded JPEG image of the company's logo | String |
company_ticker | Cleaned | Company's stock ticker | String |
company_exchange | Cleaned | Company's stock exchange | String |
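The two hash fields above are described as values processed by the MD5 algorithm. A minimal sketch of such a hash follows, assuming the input is UTF-8 encoded and the result is stored as a lowercase hex digest; the exact URL normalization applied before hashing is not documented here, so the output may not match `company_hash` for a real profile. The helper name `md5_hex` and the URL are hypothetical.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of `text` as a 32-character hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical profile URL, used only to show the shape of the result.
profile_url = "https://example.com/company/example-co"
digest = md5_hex(profile_url)
print(len(digest))
```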
Data point | Processing | Description | Data type |
---|---|---|---|
company_industry | Cleaned | Company's industry | String |
company_type | Cleaned | Company type | String |
company_founded | Cleaned | Company's founding year | String |
company_size_range | Cleaned | Company size range | String |
company_size_employees_count | Enriched | The number of employees working in the company | Number (integer) |
company_followers | Cleaned | The number of company followers | Number (integer) |
company_description | Cleaned | Company description | String |
company_specialities | Raw | Company specialties | String |
metadata_title | Enriched | Company title parsed from additional sources | String |
metadata_description | Enriched | Company description parsed from additional sources | String |
company_enriched_summary | Enriched | LLM enriched company summary | String |
company_enriched_category | Enriched | Company category assigned with LLM | String |
company_enriched_keywords | Enriched | LLM enriched company keywords | Array of strings |
company_enriched_b2b | Enriched | Marks whether the company offers B2B products/services, determined with the help of an LLM. 1: B2B company; 0: not a B2B company | Boolean |
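The `company_enriched_b2b` field is typed as Boolean while its description lists the values 1 and 0. The sketch below normalizes such a flag, assuming it may arrive as a boolean, a 0/1 number, or a string (for instance in a CSV export); the actual serialization per format is not stated here, and the helper name `to_bool` is hypothetical.

```python
from typing import Optional


def to_bool(value) -> Optional[bool]:
    """Normalize a flag that may arrive as bool, 0/1, or a string."""
    if value is None:
        return None  # missing value stays missing
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)) and value in (0, 1):
        return bool(value)
    if isinstance(value, str):
        text = value.strip().lower()
        if text in ("1", "true"):
            return True
        if text in ("0", "false"):
            return False
    raise ValueError(f"unexpected flag value: {value!r}")


# Made-up record fragment for illustration.
record = {"company_enriched_b2b": 1}
is_b2b = to_bool(record.get("company_enriched_b2b"))
print(is_b2b)
```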
Data point | Processing | Description | Data type |
---|---|---|---|
pricing_available | Enriched | Marks if the company service pricing is available online | Boolean |
free_trial_available | Enriched | Marks if the company offers a free trial of their services | Boolean |
demo_available | Enriched | Marks if the company offers a demo | Boolean |
is_downloadable | Enriched | Marks if the company offers a downloadable file/service | Boolean |
mobile_apps_exist | Enriched | Marks if the company has mobile apps for their service | Boolean |
online_reviews_exist | Enriched | Marks if the company has any online reviews | Boolean |
api_docs_exist | Enriched | Marks if the company has API docs published | Boolean |
Data point | Processing | Description | Data type |
---|---|---|---|
company_phone_numbers | Enriched | Publicly available company phone number | Array of strings |
company_emails | Enriched | Publicly available company email address | Array of strings |
Data point | Processing | Description | Data type |
---|---|---|---|
company_websites_main_original | Raw | Company website | String |
company_websites_main | Enriched | Cleaned and resolved website URL | String |
company_websites_facebook | Enriched | Facebook profile URL | String |
company_websites_twitter | Enriched | Twitter profile URL | String |
company_websites_linkedin | Raw | Company LinkedIn profile URL | String |
company_websites_linkedin_canonical | Raw | Canonical LinkedIn profile URL | String |
company_social_discord_urls | Enriched | Discord channel URL | Array of strings |
company_social_facebook_urls | Enriched | Facebook profile URL | Array of strings |
company_social_instagram_urls | Enriched | Instagram profile URL | Array of strings |
company_social_linkedin_urls | Enriched | Company LinkedIn profile URL | Array of strings |
company_social_pinterest_urls | Enriched | Pinterest profile URL | Array of strings |
company_social_tiktok_urls | Enriched | TikTok profile URL | Array of strings |
company_social_twitter_urls | Enriched | Twitter profile URL | Array of strings |
company_social_x_urls | Enriched | X profile URL | Array of strings |
company_social_youtube_urls | Enriched | YouTube channel/profile URL | Array of strings |
company_social_github_urls | Enriched | GitHub page/profile URL | Array of strings |
company_social_reddit_urls | Enriched | Reddit profile URL | Array of strings |
Data point | Processing | Description | Data type |
---|---|---|---|
company_location_hq_country | Cleaned | Headquarters country | String |
company_location_hq_raw_address | Cleaned | Detailed company location | String |
company_location_hq_regions | Enriched | Geographical region(s) the company is associated with based on the company_location_hq_country value | String |
company_locations_full | Raw | Full company location information | Array of objects |
location_address | Raw | Company HQ location | String |
is_primary | Raw | Marks if the listed location is the primary | Boolean |
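A short sketch of picking the primary location from a record. It assumes `location_address` and `is_primary` are keys of each object inside `company_locations_full`, which is how the table above reads; the helper name `primary_address` and the sample values are made up.

```python
from typing import Optional


def primary_address(record: dict) -> Optional[str]:
    """Return the address of the first location flagged as primary."""
    # Treat a missing or null array as an empty list.
    for location in record.get("company_locations_full") or []:
        if location.get("is_primary"):
            return location.get("location_address")
    return None


# Made-up record fragment for illustration.
record = {
    "company_locations_full": [
        {"location_address": "1 Side Street, Springfield", "is_primary": False},
        {"location_address": "42 Main Street, Springfield", "is_primary": True},
    ]
}
address = primary_address(record)
print(address)
```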
Data point | Processing | Description | Data type |
---|---|---|---|
company_funding_rounds | | Funding round details | Array of objects |
last_round_investors_count | Cleaned | The number of investors that participated in the last funding round | Number (integer) |
total_rounds_count | Cleaned | Total number of funding rounds | Number (integer) |
last_round_type | Cleaned | Last funding round type | String |
last_round_date | Cleaned | Last funding round date | String |
last_round_money_raised | Cleaned | Total funds raised | Number (integer) |
financial_website_url | Raw | Financial website URL of the last funding round | String |
Data point | Processing | Description | Data type |
---|---|---|---|
company_technologies | Enriched | Technologies used by the company | Array of structs |
technology | Enriched | Technology name | String |
first_verified_at | Enriched | Date this technology was first assigned to the company. Date format: YYYY-MM-DD | String (date) |
last_verified_at | Enriched | Date this technology was last assigned to the company. Date format: YYYY-MM-DD | String (date) |
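Because both verification dates use the YYYY-MM-DD format, they can be parsed with the standard library. The sketch below lists technologies last verified on or after a cutoff date, assuming `technology`, `first_verified_at`, and `last_verified_at` are keys of each element in `company_technologies`; the helper name `recent_technologies` and the sample values are hypothetical.

```python
from datetime import date
from typing import List


def recent_technologies(record: dict, since: date) -> List[str]:
    """Names of technologies last verified on or after `since`."""
    names = []
    for tech in record.get("company_technologies") or []:
        last_seen = tech.get("last_verified_at")
        # date.fromisoformat parses YYYY-MM-DD strings.
        if last_seen and date.fromisoformat(last_seen) >= since:
            names.append(tech["technology"])
    return names


# Made-up record fragment for illustration.
record = {
    "company_technologies": [
        {"technology": "Tech A", "first_verified_at": "2022-01-10", "last_verified_at": "2023-03-01"},
        {"technology": "Tech B", "first_verified_at": "2023-06-05", "last_verified_at": "2024-02-20"},
    ]
}
recent = recent_technologies(record, since=date(2024, 1, 1))
print(recent)
```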
Data point | Processing | Description | Data type |
---|---|---|---|
expired_domain | Enriched | Indicates that the company_websites_main_original URL redirects to a domain dealer | Boolean |
unique_subdomain | Enriched | Indicates that the subdomain belongs only to the company in this record | Boolean |
unique_domain | Enriched | Indicates that the domain belongs only to this company, e.g., company_websites_main: https://ibm.com | Boolean |
unique_website | Enriched | Indicates that the website belongs only to this company, although the domain is not necessarily unique, e.g., company_websites_main: https://ibm.com/generation | Boolean |
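The two example URLs above share a domain and differ only in their path, which is the distinction between a unique domain and a unique website. The sketch below separates the two parts of a URL with the standard library; it only illustrates the distinction and is not the logic used to compute these flags. The helper name `split_website` is hypothetical.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_website(url: str) -> Tuple[str, str]:
    """Split a URL into (domain, path), dropping any trailing slash."""
    parsed = urlparse(url)
    domain = parsed.netloc.lower()
    path = parsed.path.rstrip("/")
    return domain, path


domain_only = split_website("https://ibm.com")
with_path = split_website("https://ibm.com/generation")
print(domain_only, with_path)
```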
Data point | Processing | Description | Data type |
---|---|---|---|
company_updates | | Company posts and related details | Array of objects |
urn | Raw | String-based identifier | String |
followers | Raw | Number of followers | String |
date | Raw | Post publish date (e.g., 1 month ago) | String |
description | Raw | Published text. Note: may contain control characters | String |
reactions_count | Raw | Number of reactions on the post | Number (integer) |
comments_count | Raw | Number of comments on the post | Number (integer) |
reshared_post_author | Raw | Reshared post author | String |
reshared_post_author_url | Raw | Author's profile URL | String |
reshared_post_author_headline | Raw | Author's headline | String |
reshared_post_description | Raw | Reshared post text | String |
reshared_post_followers | Raw | The number of followers of the reshared post author | Number (integer) |
reshared_post_date | Raw | Date the reshared post was published (e.g., 1 month ago) | String |
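Since the post `description` may contain control characters, it can help to strip them before display or indexing. A minimal sketch follows, assuming newlines and tabs should be kept and every other character in the Unicode "Cc" (control) category removed; the helper name `strip_control_chars` and the sample text are made up.

```python
import unicodedata


def strip_control_chars(text: str) -> str:
    """Remove control characters, keeping newlines and tabs."""
    return "".join(
        ch
        for ch in text
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )


# Made-up post text containing a NUL and a BEL character.
raw_description = "Hello\x00 world\x07\n"
clean_description = strip_control_chars(raw_description)
print(repr(clean_description))
```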