Dictionary: Clean Company Data

Request access to our full documentation

This is a simplified version of our documentation. Request access to the full documentation if you want to:

  • Access data samples
  • Learn more about the cleaning and enrichment actions
  • Explore the complete list of data sources we offer

Clean Company Data provides high-quality, structured business data ready for immediate use. Our data is meticulously cleaned and enriched, allowing organizations to streamline their workflows and confidently make data-driven decisions. By leveraging Clean Company Data, businesses can reduce engineering overhead, gain access to additional insights, and work with optimized data formats for improved efficiency.

Multiple retrieval options are available: flat file downloads in JSONL, Parquet, and CSV formats, as well as API access. This lets you integrate the data into your existing data infrastructure in the way that suits it best.
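As a rough illustration of consuming a JSONL flat file, the sketch below parses one JSON object per line using only the Python standard library. The function name `parse_jsonl` and the two sample records are hypothetical and serve only to show the shape of the format; real files contain the fields listed in this dictionary.

```python
import json
from typing import Iterator


def parse_jsonl(text: str) -> Iterator[dict]:
    """Yield one record per non-empty line of JSONL text."""
    for line in text.split("\n"):
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline at end of file
            yield json.loads(line)


# Hypothetical two-record sample, not real data.
sample = (
    '{"company_id": 1, "company_name": "Example A"}\n'
    '{"company_id": 2, "company_name": "Example B"}\n'
)
records = list(parse_jsonl(sample))
```

For large downloads, the same loop can iterate over an open file object line by line instead of a string, so the whole file never has to fit in memory.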

Clean Company Data is derived from our Base Company Data.

The data points are grouped into collections to make them easier to browse. The samples are strictly illustrative: they show what the data looks like and how it is formatted.

Metadata

| Data point | Processing | Description | Data type |
| --- | --- | --- | --- |
| company_last_updated | Cleaned | Record update date | String (date) |
| company_created_at | Cleaned | Record creation date | String (date) |
| professional_network_source_id | Raw | Record identification key assigned by professional network | String |




Identifiers

| Data point | Processing | Description | Data type |
| --- | --- | --- | --- |
| company_id | Raw | Company ID in our database | Number (integer) |
| company_hash | Raw | Company profile URL processed by the MD5 algorithm | String |
| company_canonical_shorthand_name_hash | Raw | Canonical shorthand name processed by the MD5 algorithm | String |
| company_name | Cleaned | Company name | String |
| company_logo | Cleaned | Base64-encoded JPEG image of the company's logo | String |
| company_ticker | Cleaned | Company's stock ticker | String |
| company_exchange | Cleaned | Company's stock exchange | String |
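The hash and logo fields above can be handled with the Python standard library. The following is a minimal sketch, not the provider's implementation: it assumes the hashes are hexadecimal MD5 digests of UTF-8 text, and the exact URL normalization applied before hashing is not documented here, so a locally computed hash may not match `company_hash`. All function names are hypothetical.

```python
import base64
import hashlib


def md5_hex(value: str) -> str:
    """Return the hexadecimal MD5 digest of a string (UTF-8 is an assumption)."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def decode_logo(logo_b64: str) -> bytes:
    """Decode a Base64 string, such as company_logo, into raw image bytes."""
    return base64.b64decode(logo_b64)


def looks_like_jpeg(data: bytes) -> bool:
    """JPEG data starts with the three bytes FF D8 FF."""
    return data[:3] == b"\xff\xd8\xff"
```

Checking the leading bytes before writing the decoded logo to disk is a cheap guard against empty or truncated values.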




Firmographics

| Data point | Processing | Description | Data type |
| --- | --- | --- | --- |
| company_industry | Cleaned | Company's industry | String |
| company_type | Cleaned | Company type | String |
| company_founded | Cleaned | Company's founding year | String |
| company_size_range | Cleaned | Company size range | String |
| company_size_employees_count | Enriched | The number of employees working in the company | Number (integer) |
| company_followers | Cleaned | The number of company followers | Number (integer) |
| company_description | Cleaned | Company description | String |
| company_specialities | Raw | Company specialties | String |
| metadata_title | Enriched | Company title parsed from additional sources | String |
| metadata_description | Enriched | Company description parsed from additional sources | String |
| company_enriched_summary | Enriched | LLM enriched company summary | String |
| company_enriched_category | Enriched | Company category assigned with LLM | String |
| company_enriched_keywords | Enriched | LLM enriched company keywords | Array of strings |
| company_enriched_b2b | Enriched | Marks if the company offers B2B products/services, enriched with the help of LLM. 1 - B2B company; 0 - not B2B company | Boolean |
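Two firmographic fields usually need light normalization before analysis: `company_founded` is a year stored as a string, and `company_enriched_b2b` is described with the values 1 and 0. The sketch below shows one way to convert them; the helper names are hypothetical, and treating missing values as `None` is an assumption rather than documented behavior.

```python
from typing import Optional, Union


def parse_b2b(value: Union[int, bool, str, None]) -> Optional[bool]:
    """Map 1/0 (or True/False) to a bool; return None when the flag is missing."""
    if value is None:
        return None
    return bool(int(value))


def parse_founded_year(value: Optional[str]) -> Optional[int]:
    """Convert a year string such as "1998" to an int, or None if it is not numeric."""
    if value is None:
        return None
    value = value.strip()
    return int(value) if value.isdecimal() else None
```

Keeping "unknown" distinct from "not B2B" avoids silently counting records without the enrichment as non-B2B companies.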




Product and services overview

| Data point | Processing | Description | Data type |
| --- | --- | --- | --- |
| pricing_available | Enriched | Marks if the company service pricing is available online | Boolean |
| free_trial_available | Enriched | Marks if the company offers a free trial of their services | Boolean |
| demo_available | Enriched | Marks if the company offers a demo | Boolean |
| is_downloadable | Enriched | Marks if the company offers a downloadable file/service | Boolean |
| mobile_apps_exist | Enriched | Marks if the company has mobile apps for their service | Boolean |
| online_reviews_exist | Enriched | Marks if the company has any online reviews | Boolean |
| api_docs_exist | Enriched | Marks if the company has API docs published | Boolean |
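Because every field in this collection is a Boolean flag, a record can be summarized as the list of flags that are set. The sketch below is one possible approach; the function name `active_flags` and the sample record are hypothetical, and missing flags are treated as not set, which is an assumption.

```python
from typing import List

# Field names taken from the product and services table above.
PRODUCT_FLAGS = (
    "pricing_available",
    "free_trial_available",
    "demo_available",
    "is_downloadable",
    "mobile_apps_exist",
    "online_reviews_exist",
    "api_docs_exist",
)


def active_flags(record: dict) -> List[str]:
    """Return the names of the product flags that are truthy in a record."""
    return [name for name in PRODUCT_FLAGS if bool(record.get(name))]


# Hypothetical record, not real data.
example = {"free_trial_available": True, "demo_available": False, "api_docs_exist": True}
flags = active_flags(example)
```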




Contact information

| Data point | Processing | Description | Data type |
| --- | --- | --- | --- |
| company_phone_numbers | Enriched | Publicly available company phone number | Array of strings |
| company_emails | Enriched | Publicly available company email address | Array of strings |




Social media and websites

| Data point | Processing | Description | Data type |
| --- | --- | --- | --- |
| company_websites_main_original | Raw | Company website | String |
| company_websites_main | Enriched | Cleaned and resolved website URL | String |
| company_websites_facebook | Enriched | Facebook profile URL | String |
| company_websites_twitter | Enriched | Twitter profile URL | String |
| company_websites_linkedin | Raw | Company LinkedIn profile URL | String |
| company_websites_linkedin_canonical | Raw | Canonical LinkedIn profile URL | String |
| company_social_discord_urls | Enriched | Discord channel URL | Array of strings |
| company_social_facebook_urls | Enriched | Facebook profile URL | Array of strings |
| company_social_instagram_urls | Enriched | Instagram profile URL | Array of strings |
| company_social_linkedin_urls | Enriched | Company LinkedIn profile URL | Array of strings |
| company_social_pinterest_urls | Enriched | Pinterest profile URL | Array of strings |
| company_social_tiktok_urls | Enriched | TikTok profile URL | Array of strings |
| company_social_twitter_urls | Enriched | Twitter profile URL | Array of strings |
| company_social_x_urls | Enriched | X profile URL | Array of strings |
| company_social_youtube_urls | Enriched | YouTube channel/profile URL | Array of strings |
| company_social_github_urls | Enriched | GitHub page/profile URL | Array of strings |
| company_social_reddit_urls | Enriched | Reddit profile URL | Array of strings |
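The `company_social_*_urls` fields share a naming pattern, so they can be gathered into a single mapping keyed by platform. The following sketch assumes a record is a plain dictionary; the function name `social_links` and the sample record are hypothetical.

```python
import re
from typing import Dict, List

# Matches keys such as company_social_github_urls and captures "github".
SOCIAL_KEY = re.compile(r"^company_social_(?P<platform>[a-z]+)_urls$")


def social_links(record: dict) -> Dict[str, List[str]]:
    """Collect non-empty social URL arrays from a record, keyed by platform name."""
    links: Dict[str, List[str]] = {}
    for key, value in record.items():
        match = SOCIAL_KEY.match(key)
        if match and value:  # skip unrelated keys and empty or missing arrays
            links[match.group("platform")] = list(value)
    return links


# Hypothetical record, not real data.
example = {
    "company_name": "Example A",
    "company_social_github_urls": ["https://github.com/example"],
    "company_social_x_urls": [],
}
links = social_links(example)
```

Matching on the key pattern rather than a fixed list means a newly added platform field is picked up without a code change.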




Location

| Data point | Processing | Description | Data type |
| --- | --- | --- | --- |
| company_location_hq_country | Cleaned | Headquarters country | String |
| company_location_hq_raw_address | Cleaned | Detailed company location | String |
| company_location_hq_regions | Enriched | Geographical region(s) the company is associated with based on the company_location_hq_country value | String |
| company_locations_full | Raw | Full company location information | Array of objects |
| location_address | Raw | Company HQ location | String |
| is_primary | Raw | Marks if the listed location is the primary | Boolean |
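A common task with `company_locations_full` is finding the primary location. The sketch below assumes that `location_address` and `is_primary` are keys of each object in that array, which the table ordering suggests but does not state outright. The function name `primary_address` is hypothetical.

```python
from typing import Optional


def primary_address(record: dict) -> Optional[str]:
    """Return the address of the first location marked primary, or None."""
    # Assumption: each array element is an object with location_address and is_primary.
    for location in record.get("company_locations_full") or []:
        if location.get("is_primary"):
            return location.get("location_address")
    return None


# Hypothetical record, not real data.
example = {
    "company_locations_full": [
        {"location_address": "1 Side Street", "is_primary": False},
        {"location_address": "2 Main Street", "is_primary": True},
    ]
}
address = primary_address(example)
```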




Funding information

| Data point | Processing | Description | Data type |
| --- | --- | --- | --- |
| company_funding_rounds |  | Funding round details | Array of objects |
| last_round_investors_count | Cleaned | The number of investors that participated in the last funding round | Number (integer) |
| total_rounds_count | Cleaned | Total number of funding rounds | Number (integer) |
| last_round_type | Cleaned | Last funding round type | String |
| last_round_date | Cleaned | Last funding round date | String |
| last_round_money_raised | Cleaned | Total funds raised | Number (integer) |
| financial_website_url | Raw | Financial website URL of the last funding round | String |




Technologies

| Data point | Processing | Description | Data type |
| --- | --- | --- | --- |
| company_technologies | Enriched | Technologies used by the company | Array of structs |
| technology | Enriched | Technology name | String |
| first_verified_at | Enriched | Date this technology was first assigned to the company. Date format: YYYY-MM-DD | String (date) |
| last_verified_at | Enriched | Date this technology was last assigned to the company. Date format: YYYY-MM-DD | String (date) |
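Since both verification dates use the YYYY-MM-DD format, they can be parsed with `datetime.date.fromisoformat`. The sketch below computes, for each technology, the number of days between its first and last verification. It assumes `technology`, `first_verified_at`, and `last_verified_at` are keys of each struct in `company_technologies`; the function name and the sample record are hypothetical.

```python
from datetime import date
from typing import Dict


def technology_spans(record: dict) -> Dict[str, int]:
    """Map each technology name to the days between first and last verification."""
    spans: Dict[str, int] = {}
    for item in record.get("company_technologies") or []:
        first = date.fromisoformat(item["first_verified_at"])  # expects YYYY-MM-DD
        last = date.fromisoformat(item["last_verified_at"])
        spans[item["technology"]] = (last - first).days
    return spans


# Hypothetical record, not real data.
example = {
    "company_technologies": [
        {
            "technology": "ExampleDB",
            "first_verified_at": "2023-01-01",
            "last_verified_at": "2023-01-31",
        }
    ]
}
spans = technology_spans(example)
```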




Supporting fields

| Data point | Processing | Description | Data type |
| --- | --- | --- | --- |
| expired_domain | Enriched | Indicates that the company_websites_main_original URL redirects to a domain dealer | Boolean |
| unique_subdomain | Enriched | Indicates that only the record company owns the subdomain | Boolean |
| unique_domain | Enriched | Indicates that only this company has the right to have this unique domain, e.g., company_websites_main: https://ibm.com | Boolean |
| unique_website | Enriched | Indicates that only this company has a unique website but not necessarily a unique domain, e.g., company_websites_main: https://ibm.com/generation | Boolean |
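The supporting fields can be combined into a simple filter for records whose website is likely to be usable for matching. The rule below is only one possible policy and an assumption of this sketch, not a documented recommendation: it rejects expired domains and otherwise requires either a unique domain or a unique website. The function name is hypothetical.

```python
def has_reliable_website(record: dict) -> bool:
    """Return True if the website is not expired and is unique to this company."""
    if record.get("expired_domain"):
        return False  # the original URL redirects to a domain dealer
    # Assumption: either uniqueness flag is sufficient for this use case.
    return bool(record.get("unique_domain") or record.get("unique_website"))
```

A stricter policy could require `unique_domain` alone, at the cost of dropping companies that live on a path of a shared domain.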




Company updates

| Data point | Processing | Description | Data type |
| --- | --- | --- | --- |
| company_updates |  | Company posts and related details | Array of objects |
| urn | Raw | String-based identifier | String |
| followers | Raw | Number of followers | String |
| date | Raw | Post publish date (e.g., 1 month ago) | String |
| description | Raw | Published text. Note: may contain control characters | String |
| reactions_count | Raw | Number of reactions on the post | Integer |
| comments_count | Raw | Number of comments on the post | Integer |
| reshared_post_author | Raw | Reshared post author | String |
| reshared_post_author_url | Raw | Author's profile URL | String |
| reshared_post_author_headline | Raw | Author's headline | String |
| reshared_post_description | Raw | Reshared post text | String |
| reshared_post_followers | Raw | The number of followers of the reshared post author | Integer |
| reshared_post_date | Raw | Date the reshared post was published (e.g., 1 month ago) | String |
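Two fields in this collection need care when processing: `description` may contain control characters, and `date` is a relative string such as "1 month ago". The sketch below strips control characters and splits a relative date into a count and a unit. The helper names are hypothetical, and the regular expression only covers the "N unit(s) ago" shape shown in the example; other formats return `None`.

```python
import re
import unicodedata
from typing import Optional, Tuple


def strip_control_chars(text: str) -> str:
    """Remove Unicode control characters (category Cc) except newline and tab."""
    return "".join(
        ch for ch in text if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )


# Assumption: relative dates look like "1 month ago" or "3 weeks ago".
RELATIVE_DATE = re.compile(r"^(\d+)\s+(\w+?)s?\s+ago$")


def parse_relative_date(value: str) -> Optional[Tuple[int, str]]:
    """Split "3 weeks ago" into (3, "week"); return None for other formats."""
    match = RELATIVE_DATE.match(value.strip().lower())
    if not match:
        return None
    return int(match.group(1)), match.group(2)
```

Note that a relative date can only be turned into an approximate calendar date, and only relative to when the record was collected, not to when it is read.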
