Dictionary: Clean Company Data


Clean Company Data provides high-quality, structured business data ready for immediate use. Our data is meticulously cleaned and enriched, allowing organizations to streamline their workflows and confidently make data-driven decisions. By leveraging Clean Company Data, businesses can reduce engineering overhead, gain access to additional insights, and work with optimized data formats for improved efficiency.

Multiple retrieval options are available, including flat file downloads in JSONL, Parquet, and CSV formats as well as API access, so our solution adapts to your needs and integrates into your existing data infrastructure.

Clean Company Data is derived from our Base Company Data.

The data points are grouped into collections for easier reading. The sample values are for illustration only and show the appearance and format of the data.
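The JSONL flat-file download stores one JSON record per line, so it can be read with the Python standard library alone. The minimal sketch below assumes a local file path and that each line is a JSON object keyed by the data point names in this dictionary; the function name `read_company_records` is hypothetical.

```python
import json
from typing import Iterator


def read_company_records(path: str) -> Iterator[dict]:
    """Yield one company record per non-empty line of a JSONL file.

    Assumes each line is a JSON object keyed by the data point names
    documented in this dictionary (e.g. "company_id", "company_name").
    """
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:  # tolerate blank lines, such as a trailing newline
                continue
            yield json.loads(line)
```

For example, `for record in read_company_records("companies.jsonl"): print(record["company_name"])`, where `companies.jsonl` is an illustrative file name. Reading line by line keeps memory use flat for large downloads.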

Metadata

Data point | Processing | Description | Data type
company_last_updated | Cleaned | Record update date | String (date)
company_created_at | Cleaned | Record creation date | String (date)
professional_network_source_id | Raw | Record identification key assigned by the professional network | String

Metadata
"company_created_at": "2023-12-06",
"company_last_updated": "2024-12-06",
"professional_network_source_id": "60191",
Cleaning actions
Data point
Cleaning action

company_last_updated

Value is converted to the yyyy-mm-dd format.

company_created_at

Value is converted to the yyyy-mm-dd format.
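The yyyy-mm-dd conversion can be sketched as follows. The raw input formats are not documented, so `INPUT_FORMATS` is a hypothetical list of candidates, and returning None for a value that matches none of them is an assumption rather than documented behavior.

```python
from datetime import datetime
from typing import Optional

# Hypothetical candidate formats: the raw input formats are not documented.
INPUT_FORMATS = (
    "%Y-%m-%d",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%dT%H:%M:%S.%f",
    "%Y-%m-%d %H:%M:%S",
    "%Y/%m/%d",
)


def to_iso_date(value: Optional[str]) -> Optional[str]:
    """Return `value` as a yyyy-mm-dd string, or None if no format matches."""
    if value is None:
        return None
    text = value.strip()
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next candidate format
    return None  # assumption: unparseable dates become None
```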


Identifiers

Data point | Processing | Description | Data type
company_id | Raw | Company ID in our database | Number (integer)
company_hash | Raw | MD5 hash of the company profile URL | String
company_canonical_shorthand_name_hash | Raw | MD5 hash of the canonical shorthand name | String
company_name | Cleaned | Company name | String
company_logo | Cleaned | Base64-encoded JPEG image of the company's logo | String
company_ticker | Cleaned | Company's stock ticker | String
company_exchange | Cleaned | Company's stock exchange | String
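Both hash fields hold 32-character hexadecimal MD5 digests, and company_logo holds Base64 text. The sketch below uses Python's `hashlib` and `base64` modules to compute a digest and to decode a logo back into JPEG bytes. The exact string that is hashed (for example, whether the profile URL is normalized first) is not documented here, so a locally computed digest matches company_hash only if the input is identical. The helper names are hypothetical.

```python
import base64
import hashlib

JPEG_MAGIC = b"\xff\xd8\xff"  # every JPEG file starts with these bytes


def md5_hex(text: str) -> str:
    """Return the 32-character hexadecimal MD5 digest of a UTF-8 string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def decode_logo(encoded: str) -> bytes:
    """Decode a Base64 company_logo value and check that it is a JPEG."""
    data = base64.b64decode(encoded, validate=True)
    if not data.startswith(JPEG_MAGIC):
        raise ValueError("decoded company_logo is not a JPEG image")
    return data
```

Writing the returned bytes to a file opened with `open("logo.jpg", "wb")` produces a viewable image; `logo.jpg` is an illustrative name.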

Identifiers
    "company_id": 7811468,
    "company_hash": "8ef8d364df382df483f47fe3e56dc4cd",
    "company_canonical_shorthand_name_hash": "8631ca96b6f656040bf3326deeb38df6",
    "company_name": "Example Company",
    "company_logo": "/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAMCAgMCAgMDAwMEAwMEBQgFBQQEBQoHBwYIDAoMDAsKCwsNDhIQDQ4RDgsLEBYQERMUFRUVDA8XGBYUGBIUFRT/2wBDAQMEBAUEBQkFBQkUDQsNFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBT/wAARCAAjACMDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD9U6K+K7P9rfx1cfFG7smj00aNHrBsltBbHd5Qm8v/AFm7O7HOfXtivc9N/aHsLLRr2612yuI3t/EFzoSGxj83zGjDOr7cgj5ByBnkHHHTtqYOtTtdXv2Pr8dwrmeAUHOKlzJP3Xd6/wBdLnsNFeX+EP2i/CXjjxNa6DpbXr6jcPNGqSQBQpiUNICd3YMp4zncMU/R/wBoHw3rj3S21vqWLc7Sz26gMd8CYHz5+9cR9cd/SsXQqp2cWeNPKcfTk4zoyTST26NtL72n9zPTaK5Tw78QYPFGlR6jYaTqr2zySxAyW6o26ORo3BBbPDIworNwknZnDPD1acnCas1o0fL3wn8TfCbwHqHie68ez6bY+JLfxJdzWzXsEjzIgkyhAUHo24jj39K0vgvN4g+JPg3xHq/hKO1Nu3jG/njkvoQfPtHgAOzeOHbeVz0GSD3r6ivvB2g6ndPc3ei6ddXD/emmtI3dvqSuTWjY2FtptrHbWlvFa28YwkUKBEX6AcCu+eLjK7Sd3bd3XyPtcZxHQr06kqdOTqT5b80lKMVG+kY8qdn5s+cbP4VePrO+iu49H0W2kjdQRarbqSm5/M2NsBVjG6oD/s88YJZqHwQ8U/2VZ/YNI0z7X5UInivEtzGMQxpIoABxkocEE9EPbj6YorL63O97I8NZ5ilLmSX3P/M838FaV4x0fw1a2dzBbQzxtJvX7QgyTIx3YWPABznHbPPNFekUVzOd3eyPJniHUm5uKu9f61CiiiszkCiiigAooooA/9k=",
    "company_ticker": "EXMP",
    "company_exchange": "NYSE",
Cleaning and enriching actions
Data point
Cleaning/enriching action

company_name

Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with None.

company_logo

Image is resized to 50x50px.

company_ticker

Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with None.
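The placeholder replacement that recurs throughout this dictionary can be expressed as a small helper. This sketch compares values exactly against the listed strings, so a variant such as "NA" or " null " passes through unchanged; whether the production pipeline also trims or case-folds values is not stated. The `default` parameter covers fields where placeholders become 0 instead of None. The names are illustrative.

```python
from typing import Any

# Placeholder strings listed in the cleaning actions of this dictionary.
NULL_TOKENS = frozenset(
    {"None", "Unknown", "NaN", "nan", "na", "null", "Null", "NULL", "-", "--"}
)


def replace_placeholder(value: Any, default: Any = None) -> Any:
    """Return `default` if `value` is a listed placeholder string, else `value`.

    Matching is exact and case-sensitive, mirroring the listed values.
    Non-string values are returned unchanged.
    """
    if isinstance(value, str) and value in NULL_TOKENS:
        return default
    return value
```

For example, `replace_placeholder(record.get("company_ticker"))` yields None for a ticker stored as "NULL".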


Firmographics

Data point | Processing | Description | Data type
company_industry | Cleaned | Company's industry | String
company_type | Cleaned | Company type | String
company_founded | Cleaned | Company's founding year | String
company_size_range | Cleaned | Company size range | String
company_size_employees_count | Enriched | The number of employees working in the company | Number (integer)
company_size_employees_count | Enriched | Estimated number of employees, calculated from inferred employee data | Number (integer)
company_followers | Cleaned | The number of company followers | Number (integer)
company_description | Cleaned | Company description | String
company_specialities | Raw | Company specialties | String
metadata_title | Enriched | Company title parsed from additional sources | String
metadata_description | Enriched | Company description parsed from additional sources | String
company_enriched_summary | Enriched | Company summary generated with an LLM | String
company_enriched_category | Enriched | Company category assigned by an LLM | String
company_enriched_keywords | Enriched | Company keywords generated with an LLM | Array of strings
company_enriched_b2b | Enriched | Indicates whether the company offers B2B products or services, determined with the help of an LLM (1 = B2B company, 0 = not a B2B company) | Integer

Firmographics
    "company_type": "Partnership",
    "company_founded": "2010",
    "company_followers": 0,
    "company_size_range": "1-10 employees",
    "company_size_employees_count": 2,
    "company_size_employees_count": 2,
    "company_industry": "Advertising Services",
    "company_description": "We help SMEs grow their businesses through effective online marketing strategies. ",
    "company_specialities": "Email Marketing, Web Sites, Search Engine Optimisation, Inbound Marketing, Social media Marketing",
    "company_enriched_summary": "Company1 is a premier web design and digital marketing agency based in London, UK. Specializing in custom, responsive websites, they provide professional design services, training, easy content management, and ongoing support.",
    "company_enriched_keywords": [
        "website design",
        "digital marketing",
        "professional",
        "custom responsive websites",
        "training"
    ],
    "company_enriched_b2b": 1.0,
    "company_enriched_category": "Web Design",
    "metadata_title": "Marketing, London,Cost Effective Web Design",
    "metadata_description": null
Cleaning and enriching actions
Data point
Cleaning/enriching action

company_industry

Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with None.

company_type

Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with None.

company_founded

  • Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with None;

  • Values are replaced with None if the year is not between 500 and the current year.

company_followers

  • Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with 0;

  • Every value is converted to an integer.

company_size_range

Inconsistent labels for overlapping size ranges are unified:

  • "1 employee" – "Myself Only";

  • "2-10 employees" – "1-10 employees";

  • "501-1,000 employees" – "501-1000 employees";

  • "1,001-5,000 employees" – "1001-5000 employees".

company_size_employees_count

When company_size_employees_count is 0, we check whether we have scraped profiles of employees working at this company. If so, we count the employees associated with the company and use that number as the value. This covers cases where the public profile does not show all of the employees.


company_description

  • Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with None;

  • Value is replaced with None if the description is shorter than 3 characters;

  • Text styling tags are removed;

  • Multiple spaces are replaced with single ones.
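Several of the firmographic rules above are sketched below as plain Python functions. Assumptions not stated in the dictionary: the left label of each size-range pair is replaced by the right one (consistent with the "1-10 employees" sample value); any HTML-like tag counts as a styling tag; the 3-character check runs after tags and extra spaces are removed; non-numeric founding years are treated as missing; and raw follower counts may contain thousands separators. All function names are hypothetical.

```python
import re
from datetime import date
from typing import Any, Optional

NULL_TOKENS = frozenset(
    {"None", "Unknown", "NaN", "nan", "na", "null", "Null", "NULL", "-", "--"}
)

# Assumption: the left label of each documented pair is replaced by the right one.
SIZE_RANGE_FIXES = {
    "1 employee": "Myself Only",
    "2-10 employees": "1-10 employees",
    "501-1,000 employees": "501-1000 employees",
    "1,001-5,000 employees": "1001-5000 employees",
}

TAG_RE = re.compile(r"<[^>]+>")  # assumption: any HTML-like tag is a styling tag
MULTI_SPACE_RE = re.compile(r" {2,}")  # runs of two or more spaces


def is_placeholder(value: Any) -> bool:
    return value is None or (isinstance(value, str) and value in NULL_TOKENS)


def clean_founded(value: Any) -> Optional[str]:
    """Keep the founding year only if it lies between 500 and the current year."""
    if is_placeholder(value):
        return None
    try:
        year = int(value) if isinstance(value, (int, float)) else int(str(value).strip())
    except (ValueError, OverflowError):
        return None  # assumption: non-numeric years are treated as missing
    if not 500 <= year <= date.today().year:
        return None
    return str(year)  # the data type is documented as String


def clean_followers(value: Any) -> int:
    """Map placeholders to 0 and convert everything else to an integer."""
    if is_placeholder(value):
        return 0
    if isinstance(value, (int, float)):
        return int(value)
    # Assumption: raw text may contain thousands separators, e.g. "1,371".
    return int(str(value).replace(",", "").strip())


def clean_size_range(value: Optional[str]) -> Optional[str]:
    """Unify overlapping size-range labels; other labels pass through."""
    if value is None:
        return None
    return SIZE_RANGE_FIXES.get(value, value)


def fill_employee_count(reported: int, scraped_profiles: int) -> int:
    """Fall back to the number of scraped employee profiles when the count is 0."""
    if reported == 0 and scraped_profiles > 0:
        return scraped_profiles
    return reported


def clean_description(value: Optional[str]) -> Optional[str]:
    """Strip tags, collapse repeated spaces, and drop very short descriptions."""
    if is_placeholder(value):
        return None
    text = TAG_RE.sub("", value)
    text = MULTI_SPACE_RE.sub(" ", text)
    if len(text) < 3:
        return None
    return text
```

Keeping each rule in its own function makes it straightforward to apply the rules column by column.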


Product and services overview

Data point | Processing | Description | Data type
pricing_available | Enriched | Indicates whether the company's service pricing is available online | Boolean
free_trial_available | Enriched | Indicates whether the company offers a free trial of its services | Boolean
demo_available | Enriched | Indicates whether the company offers a demo | Boolean
is_downloadable | Enriched | Indicates whether the company offers a downloadable file or service | Boolean
mobile_apps_exist | Enriched | Indicates whether the company has mobile apps for its service | Boolean
online_reviews_exist | Enriched | Indicates whether the company has any online reviews | Boolean
api_docs_exist | Enriched | Indicates whether the company has published API docs | Boolean

Product and services overview
    "pricing_available": true,
    "free_trial_available": false,
    "demo_available": false,
    "is_downloadable": false,
    "mobile_apps_exist": false,
    "online_reviews_exist": false,
    "api_docs_exist": false,
Enriching actions
Data point
Enriching action

pricing_available, free_trial_available, demo_available, is_downloadable, mobile_apps_exist, online_reviews_exist, api_docs_exist

Information taken from the official company website.


Contact information

Data point | Processing | Description | Data type
company_phone_numbers | Enriched | Publicly available company phone numbers | Array of strings
company_emails | Enriched | Publicly available company email addresses | Array of strings

Contact information
    "company_phone_numbers": [
        "0000 000 000"
    ],
    "company_emails": [
        "info@example-company.com"
    ],
Enriching actions
Data point
Enriching action

company_phone_numbers, company_emails

Information taken from the official company website.


Social media and websites

Data point | Processing | Description | Data type
company_websites_main_original | Raw | Company website | String
company_websites_main | Enriched | Cleaned and resolved website URL | String
company_websites_facebook | Enriched | Facebook profile URL | String
company_websites_twitter | Enriched | Twitter profile URL | String
company_websites_professional_network | Raw | Company professional network profile URL | String
company_websites_professional_network_canonical | Raw | Canonical professional network profile URL | String
company_social_discord_urls | Enriched | Discord channel URLs | Array of strings
company_social_facebook_urls | Enriched | Facebook profile URLs | Array of strings
company_social_instagram_urls | Enriched | Instagram profile URLs | Array of strings
company_social_professional_network_urls | Enriched | Company professional network profile URLs | Array of strings
company_social_pinterest_urls | Enriched | Pinterest profile URLs | Array of strings
company_social_tiktok_urls | Enriched | TikTok profile URLs | Array of strings
company_social_twitter_urls | Enriched | Twitter profile URLs | Array of strings
company_social_x_urls | Enriched | X profile URLs | Array of strings
company_social_youtube_urls | Enriched | YouTube channel/profile URLs | Array of strings
company_social_github_urls | Enriched | GitHub page/profile URLs | Array of strings
company_social_reddit_urls | Enriched | Reddit profile URLs | Array of strings

Social media and websites
 "company_websites_main_original": "http://www.example-company.com.",
 "company_websites_main": "https://example-company.com.",
 "company_websites_facebook": "https://www.facebook.com/example-company",
 "company_websites_twitter": "https://www.twitter.com/example-company",
 "company_websites_professional_network": "https://www.professional_network.com/company/example-company-international-limited",
 "company_websites_professional_network_canonical": "https://www.professional_network.com/company/example-company-international-limited",
Cleaning and enriching actions
Data point
Cleaning/enriching action

company_websites_main

  • Every company_websites_main_original URL is resolved;

  • Each collected URL is parsed, its parameters are removed, and the result is stored in the company_websites_main column. Values follow the format <protocol>://<domain>.<tld>/<path>;

  • Each <domain>.<tld>/<path> can belong to only one company. If multiple companies have the same URL, it is assigned to the company with the highest number of employees;

  • Expired domains are removed;

  • Additional enrichment actions are completed.

company_websites_twitter

If <domain> (from company_websites_main) == twitter, we move the URL value to company_websites_twitter.

company_websites_facebook

If <domain> (from company_websites_main) == facebook, we move the URL value to company_websites_facebook.

company_websites_professional_network

If <domain> (from company_websites_main) == professional_network, we move the URL value to company_websites_professional_network.

company_social_discord_urls, company_social_facebook_urls, company_social_instagram_urls, company_social_professional_network_urls, company_social_pinterest_urls, company_social_tiktok_urls, company_social_twitter_urls, company_social_x_urls, company_social_youtube_urls, company_social_github_urls, company_social_reddit_urls

URLs taken from the official company website.
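The URL normalization, ownership, and domain-routing rules above can be sketched with `urllib.parse` from the Python standard library. The sketch does not perform the network resolution step or the expired-domain check. Assumptions: a leading "www." is dropped (as in the company_websites_main sample), query strings and fragments are the removed parameters, a missing scheme defaults to http, ties in employee counts keep the first company seen, and <domain> is the second-to-last hostname label, which is a simplification for multi-part TLDs such as .co.uk. Function names are hypothetical.

```python
from typing import Any, Dict, List, Optional
from urllib.parse import urlsplit, urlunsplit

# Second-level domain -> column that receives the URL (from the rules above).
SOCIAL_COLUMNS = {
    "twitter": "company_websites_twitter",
    "facebook": "company_websites_facebook",
    "professional_network": "company_websites_professional_network",
}


def normalize_url(url: str) -> str:
    """Return <protocol>://<domain>.<tld>/<path> without parameters or fragment."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url  # assumption: bare domains default to http
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is lowercased and excludes the port
    if host.startswith("www."):
        host = host[4:]
    return urlunsplit((parts.scheme, host, parts.path.rstrip("/"), "", ""))


def url_key(url: str) -> str:
    """Return the <domain>.<tld>/<path> part used to check uniqueness."""
    parts = urlsplit(normalize_url(url))
    return parts.netloc + parts.path


def assign_url_owners(companies: List[Dict[str, Any]]) -> Dict[str, int]:
    """Map each URL key to the company_id with the most employees."""
    owners: Dict[str, tuple] = {}
    for company in companies:
        url = company.get("company_websites_main")
        if not url:
            continue
        key = url_key(url)
        count = company.get("company_size_employees_count") or 0
        current = owners.get(key)
        if current is None or count > current[1]:  # ties keep the first company
            owners[key] = (company["company_id"], count)
    return {key: owner[0] for key, owner in owners.items()}


def route_social_url(record: Dict[str, Any]) -> Dict[str, Any]:
    """Move company_websites_main into a social column when its domain matches."""
    url: Optional[str] = record.get("company_websites_main")
    if not url:
        return record
    labels = (urlsplit(url).hostname or "").split(".")
    domain = labels[-2] if len(labels) >= 2 else ""
    column = SOCIAL_COLUMNS.get(domain)
    if column is None:
        return record
    moved = dict(record)  # work on a copy so the input stays unchanged
    moved[column] = url
    moved["company_websites_main"] = None  # assumption: "move" clears the source
    return moved
```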


Location

Data point | Processing | Description | Data type
company_location_hq_country | Cleaned | Headquarters country | String
company_location_hq_raw_address | Cleaned | Detailed company location | String
company_location_hq_regions | Enriched | Geographical region(s) the company is associated with, based on the company_location_hq_country value | String
company_locations_full | Raw | Full company location information | Array of objects
  location_address | Raw | Address of the listed location | String
  is_primary | Raw | Indicates whether the listed location is the primary one | Boolean

Locations
"company_location_hq_raw_address": "Los Angeles, CA, United States",
"company_location_hq_country": "United States",
"company_location_hq_regions": "[Northern America, Northern America, AMER]",
Cleaning actions
Data point
Cleaning action

company_location_hq_country

Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with None.

company_location_hq_raw_address

  • Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with None;

  • Trailing special characters are trimmed;

  • The company_location_hq_country value is appended to the end of the string, separated by a comma.
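The address rules can be sketched as below. The dictionary does not say which trailing characters count as "special", so `TRAILING_CHARS` is an assumption, as is skipping the country suffix when the address already ends with the country name, which avoids a doubled suffix such as "United States, United States". The function name is hypothetical.

```python
from typing import Optional

NULL_TOKENS = frozenset(
    {"None", "Unknown", "NaN", "nan", "na", "null", "Null", "NULL", "-", "--"}
)
# Assumption: the set of "special" trailing characters is not documented.
TRAILING_CHARS = " ,;.-|/"


def clean_hq_address(raw: Optional[str], country: Optional[str]) -> Optional[str]:
    """Trim trailing special characters and append the HQ country after a comma."""
    if raw is None or raw in NULL_TOKENS:
        return None
    address = raw.rstrip(TRAILING_CHARS)
    if country is None or country in NULL_TOKENS:
        return address or None
    if address.endswith(country):  # assumption: avoid appending the country twice
        return address
    return f"{address}, {country}" if address else country
```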


Funding information

Data point | Processing | Description | Data type
company_funding_rounds |  | Funding round details | Array of objects
  last_round_investors_count | Cleaned | The number of investors that participated in the last funding round | Number (integer)
  total_rounds_count | Cleaned | Total number of funding rounds | Number (integer)
  last_round_type | Cleaned | Last funding round type | String
  last_round_date | Cleaned | Last funding round date | String (date)
  last_round_money_raised | Cleaned | Total funds raised | Number (integer)
  financial_website_url | Raw | Financial website URL of the last funding round | String

Funding information
 "company_funding_rounds": [
        {
            "last_round_investors_count": 5,
            "total_rounds_count": 3,
            "last_round_type": "Series A",
            "last_round_date": "2020-11-10",
            "last_round_money_raised": 15600000,
            "financial_website_url": "https://www.financial_website.com/funding_round/example-company-series-a--f1687fe3"
        }
    ]
}
Cleaning actions
Data point
Cleaning action

company_funding_rounds

  • Duplicate data points are filtered out;

  • Empty or irrelevant data points are removed.

last_round_investors_count

  • Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with 0;

  • Every value is converted to an integer.

total_rounds_count

  • Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with 0;

  • Every value is converted to an integer.

last_round_type

Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with 0.

last_round_date

Value is converted to the yyyy-mm-dd format.

last_round_money_raised

  • Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with 0;

  • Every value is converted to an integer parsed from the text value.
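The funding-round rules can be sketched as follows. Assumptions: two rounds are duplicates when all their fields are equal; a round is empty when every field is None, an empty string, or a listed placeholder; "irrelevant" rounds are not modeled; and money amounts in text form look like "$15,600,000" or "15.6M" (K, M, and B suffixes). Text without any number becomes 0. The function names are hypothetical.

```python
import json
import re
from decimal import Decimal
from typing import Any, Dict, List

NULL_TOKENS = frozenset(
    {"None", "Unknown", "NaN", "nan", "na", "null", "Null", "NULL", "-", "--"}
)
# A number with an optional decimal part and an optional K/M/B suffix.
MONEY_RE = re.compile(r"(\d+(?:\.\d+)?)\s*([KMB])?", re.IGNORECASE)
MULTIPLIERS = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def is_placeholder(value: Any) -> bool:
    return value is None or value == "" or (isinstance(value, str) and value in NULL_TOKENS)


def clean_funding_rounds(rounds: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop empty rounds and exact duplicates, keeping the original order."""
    seen = set()
    cleaned = []
    for item in rounds:
        if all(is_placeholder(v) for v in item.values()):
            continue  # every field is empty or a placeholder
        key = json.dumps(item, sort_keys=True)  # canonical form for comparison
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(item)
    return cleaned


def parse_money(value: Any) -> int:
    """Convert a raw money value to an integer; placeholders become 0."""
    if is_placeholder(value):
        return 0
    if isinstance(value, (int, float)):
        return int(value)
    match = MONEY_RE.search(str(value).replace(",", ""))
    if match is None:
        return 0  # assumption: text without a number counts as no data
    amount = Decimal(match.group(1))  # Decimal avoids float rounding, e.g. 15.6M
    suffix = (match.group(2) or "").upper()
    return int(amount * MULTIPLIERS.get(suffix, 1))
```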


Technologies

Data point | Processing | Description | Data type
company_technologies | Enriched | Technologies used by the company | Array of objects
  technology | Enriched | Technology name | String
  first_verified_at | Enriched | Date this technology was first assigned to the company (format: YYYY-MM-DD) | String (date)
  last_verified_at | Enriched | Date this technology was last assigned to the company (format: YYYY-MM-DD) | String (date)

Technologies
"company_technologies": [
    {
      "technology": "React",
      "first_verified_at": "2022-03-15",
      "last_verified_at": "2025-02-15"
    }
  ]
Enriching actions
Data point
Enriching action

company_technologies

Enriched by our ML model from multiple sources.


Supporting fields

Data point | Processing | Description | Data type
expired_domain | Enriched | Indicates that the company_websites_main_original URL redirects to a domain dealer | Integer
unique_subdomain | Enriched | Indicates that the subdomain belongs only to this company's record | Integer
unique_domain | Enriched | Indicates that no other company has this domain, e.g., company_websites_main: https://ibm.com | Integer
unique_website | Enriched | Indicates that no other company has this website, although the domain may be shared, e.g., company_websites_main: https://ibm.com/generation | Integer

Supporting fields
    "expired_domain": 0,
    "unique_domain": 1,
    "unique_subdomain": 1,
    "unique_website": 0,

Company updates

Data point | Processing | Description | Data type
company_updates |  | Company posts and related details | Array of objects
  urn | Raw | String-based identifier | String
  followers | Raw | Number of followers | String
  date | Raw | Post publish date (e.g., 1 month ago) | String
  description | Raw | Published text. Note: may contain control characters | String
  reactions_count | Raw | Number of reactions on the post | Integer
  comments_count | Raw | Number of comments on the post | Integer
  reshared_post_author | Raw | Reshared post author | String
  reshared_post_author_url | Raw | Reshared post author's profile URL | String
  reshared_post_author_headline | Raw | Reshared post author's headline | String
  reshared_post_description | Raw | Reshared post text | String
  reshared_post_followers | Raw | The number of followers of the reshared post author | Integer
  reshared_post_date | Raw | Date the reshared post was published (e.g., 1 month ago) | String

Company updates
"company_updates_collection": [
      {
        "urn": "urn:pn:activity:6991335602751201281",
        "followers": 1371,
        "date": "1mo",
        "description": "Example description",
        "reactions_count": 22,
        "comments_count": 2,
        "reshared_post_author": "John Doe",
        "reshared_post_author_url": "https://www.professional_network.com/in/john-doe",
        "reshared_post_author_headline": "Co-Founder at Example Company, TEDx & Keynote Speaker",
        "reshared_post_description": "Example description",
        "reshared_post_followers": 45,
        "reshared_post_date": "1mo"
      }
  ]
