June 2025

Craft Company data

🔧 Improvement: data type change

CoinMarketcaps.marketcap: Changed from Integer to Number to better support precision and accommodate larger values.

Multi-source Company data and Multi-source Company API

📦 New features: fields added

  • company_logo_url (String): Logo URL from professional networks.

  • active_job_postings (Array of structs): Contains structured job posting data.

    • job_posting_id (Long): Professional network job ID.

    • job_posting_title (String): Title posted by the recruiter.

🔧 Improvements

Data type standardization:

  • is_b2b: Changed from Float to Integer.

  • company_updates.date: Converted to standardized Date format to reflect post publication dates.

Currency code normalization:

Replaced currency symbols with ISO currency codes for consistency across the following financial fields:

  • ipo_share_price

  • acquired_by_summary.currency

  • last_funding_round_amount_raised_currency

  • source_4_annual_revenue_range.annual_revenue_range_currency

  • source_6_annual_revenue_range.annual_revenue_range_currency

  • source_5_annual_revenue.annual_revenue_currency

  • source_1_annual_revenue.annual_revenue_currency

  • acquisition_list_source_5.currency

  • acquisition_list_source_1.currency

  • acquisition_list_source_2.currency

  • revenue_quarterly.currency

  • stock_information.currency

  • funding_rounds.amount_raised_currency

  • income_statements.currency
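If your pipeline still holds rows ingested before this change, a quick shape check can separate old symbol values from the new codes. The sketch below only assumes that ISO currency codes are three uppercase letters. It does not check the value against the official code list, and the helper name is illustrative.

```python
import re

# Shape of an ISO 4217 alphabetic code: exactly three uppercase letters.
# This checks the shape only, not membership in the official code list.
ISO_CODE_SHAPE = re.compile(r"[A-Z]{3}")


def looks_like_iso_code(value):
    """Return True when value is a string shaped like an ISO currency code."""
    return isinstance(value, str) and ISO_CODE_SHAPE.fullmatch(value) is not None


# A legacy symbol fails the check, a normalized code passes.
print(looks_like_iso_code("$"))
print(looks_like_iso_code("USD"))
```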

🗑 Deprecation: field removed

active_job_postings_titles (Array of strings): Deprecated in favor of the structured active_job_postings collection.
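Consumers that relied on the flat title list can rebuild it from the structured collection. This is a minimal sketch that assumes each company record is a dict and that every struct carries the job_posting_title field listed above. The sample values are made up for illustration.

```python
from typing import Any


def job_titles(company: dict[str, Any]) -> list[str]:
    """Rebuild the old flat list of titles from active_job_postings."""
    postings = company.get("active_job_postings") or []
    return [p["job_posting_title"] for p in postings if p.get("job_posting_title")]


# Illustrative record, not real data.
record = {
    "active_job_postings": [
        {"job_posting_id": 1001, "job_posting_title": "Data Engineer"},
        {"job_posting_id": 1002, "job_posting_title": "Account Executive"},
    ]
}
print(job_titles(record))
```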

Base Employee data

📝 Summary aggregation updates

We are improving the freshness and consistency of employee summaries across all dataset variants:

  • summary field now returns full values—no more truncation or nulls

  • Displays the last successfully scraped result

  • All HTML tags have been removed for formatting consistency across base, clean, and multi-source data

Please report any unexpected or corrupted results to the team for review.


🔁 Legacy dataset migration guidance

For users accustomed to incremental updates, we recommend a one-time full ingest of the updated dataset to ensure:

  • Better data quality and completeness

  • Elimination of legacy duplicate records

  • ID consistency across systems

🔄 Mapping legacy IDs to new dataset IDs

A new member_ids_mapping_dataset is provided to help match legacy IDs to updated ones:

  • legacy_dataset_id → original IDs

  • updated_dataset_id → new IDs

Note: Some legacy_dataset_ids may not map due to:

  • Historic encoding bugs (dating back to 2019)

  • Deletion or review status

This mapping ensures continuity when migrating primary keys between versions.
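One way to apply the mapping during migration is sketched below. It assumes the mapping dataset has been loaded as a list of dicts with the two columns named above. The function names are illustrative, and unmapped IDs are collected separately so they can be reviewed rather than silently dropped.

```python
def build_id_map(mapping_rows):
    """Index mapping rows by legacy ID, skipping rows without an updated ID."""
    return {
        row["legacy_dataset_id"]: row["updated_dataset_id"]
        for row in mapping_rows
        if row.get("updated_dataset_id") is not None
    }


def migrate_ids(legacy_ids, id_map):
    """Split legacy IDs into those with a new ID and those without one."""
    mapped, unmapped = {}, []
    for legacy_id in legacy_ids:
        if legacy_id in id_map:
            mapped[legacy_id] = id_map[legacy_id]
        else:
            unmapped.append(legacy_id)
    return mapped, unmapped
```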


🧩 Field-by-field changelog

Refer to Raw -_ Employee dataset changelog.xlsx for details on what fields were added, removed, or modified in this version.


📎 Deduplication handling and identification

Improved logic for recognizing and preserving duplicates:

  • is_parent = 1 → Original record

  • is_parent = 0 → Duplicate record (retained for completeness)

  • Duplicate entries preserve identical content

  • Unique identifier: id

  • Support fields:

    • shorthand_names: All known shorthand names used by the employee

    • historical_ids: Previously assigned record IDs
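For pipelines that want only original records, the is_parent flag can be used as a filter. A minimal sketch, assuming records are dicts and is_parent is delivered as the integers 1 and 0 as described above:

```python
def split_parents(records):
    """Separate original records (is_parent = 1) from retained duplicates (is_parent = 0)."""
    parents = [r for r in records if r.get("is_parent") == 1]
    duplicates = [r for r in records if r.get("is_parent") == 0]
    return parents, duplicates
```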

Clean Employee data

🧼 Summary improvements

We’ve made targeted improvements to the member_description field in the Clean Employee dataset to address issues with data staleness and formatting:

  • The field now returns complete values instead of truncated or null data (when available)

  • Displays the last successfully scraped result for improved data reliability

  • All HTML tags have been removed to standardize formatting across base, clean, and multi-source datasets

Multi-source Employee data

🔄 Summary enhancements

As part of our continued commitment to data freshness and quality, we’ve updated the summary field across the Multi-source Employee dataset:

  • Now returns full field values (no truncation or nulls)

  • Reflects the most recent successful data scrape

  • HTML tags have been removed to align with formatting across all Employee data variants (base, clean, multi-source)

If you encounter unexpected or corrupted entries, please contact the team for support.

Base Employee API

🛠 Bulk Collect result update

Bulk Collect results will now include previously excluded data categories and fields. These changes ensure more complete and consistent record delivery.

  • Effective delivery: Starting June 4, 2025

  • Impact: Bulk Collect exceptions are removed; all available data points are now included in responses

✅ Restored category data points

| Category | Data point |
| --- | --- |
| Publications | authors |
| Patents | inventors |
| Projects | team_members |

This update improves parity between standard API and Bulk Collect outputs.

Bulk Collect changes

🆕 New Bulk Collect endpoints

Impacted APIs:

  • Base Employee API

  • Clean Employee API

New endpoints:

| API | Endpoint |
| --- | --- |
| Base Employee API | /v2/data_requests/employee_base/shorthand_names |
| Base Employee API | /v2/data_requests/employee_base/urls |
| Clean Employee API | /v2/data_requests/employee_clean/shorthand_names |
| Clean Employee API | /v2/data_requests/employee_clean/urls |

Input requirements for shorthand_names:

  • No empty strings

  • No capital letters

  • No leading/trailing spaces

  • No special characters (most are rejected)

  • Length: 3–100 characters

  • Max 10,000 names per request
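Checking inputs on the client side before submitting a request can catch most rejections early. The sketch below encodes the rules above. The set of permitted special characters is not listed here, so the pattern assumes only lowercase letters, digits, and hyphens are accepted. Adjust it to match what the API actually allows.

```python
import re

MAX_NAMES_PER_REQUEST = 10_000

# Assumption: lowercase letters, digits, and hyphens only, 3 to 100 characters.
NAME_PATTERN = re.compile(r"[a-z0-9-]{3,100}")


def invalid_shorthand_names(names):
    """Return the names that break the input requirements."""
    return [
        n for n in names
        if not isinstance(n, str) or NAME_PATTERN.fullmatch(n) is None
    ]


def validate_request(names):
    """Raise ValueError if the batch is too large or contains invalid names."""
    if len(names) > MAX_NAMES_PER_REQUEST:
        raise ValueError(f"too many names: {len(names)} > {MAX_NAMES_PER_REQUEST}")
    bad = invalid_shorthand_names(names)
    if bad:
        raise ValueError(f"invalid shorthand names: {bad[:5]}")
```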


📬 New response header

New header: Location

It provides a URL where the results of the bulk collection can be retrieved once processing is finished. The new header aims to improve user experience by providing immediate, direct access to the results endpoint.

  • Example:

    Location: /v2/data_requests/e000b0ec-0f00-0b00-0a0a-0b00fa0000d0/files
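Because the header value in the example is a path rather than a full URL, a client needs to resolve it against the API host. A minimal sketch using the standard library follows. The base URL is a placeholder, and the headers are assumed to be available as a plain dict.

```python
from urllib.parse import urljoin


def results_url(base_url, headers):
    """Resolve the Location header of a Bulk Collect response to an absolute URL."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    location = lowered.get("location")
    if location is None:
        raise KeyError("response has no Location header")
    return urljoin(base_url, location)


# "https://api.example.com" is a placeholder host, not the real one.
print(results_url(
    "https://api.example.com",
    {"Location": "/v2/data_requests/e000b0ec-0f00-0b00-0a0a-0b00fa0000d0/files"},
))
```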

⚠️ Breaking changes in Bulk Collect APIs

Impacted APIs:

  • Clean Employee API

  • Base Jobs API

Clean Employee API changes:

| Action | Description | Old Value | New Value |
| --- | --- | --- | --- |
| Changed | Empty responses | "" | null |
| Renamed | Experience field | experience.experience_description | experience.description |
| Changed | Timestamp format | "2025-05-12" | "2025-05-12T00:00:00.000Z" |
| Changed | ID data type | "561091029" (string) | 561091029 (integer) |
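If stored records in the old format need to be compared with new responses, a small adapter can bring them in line. The sketch below applies the four changes to a record held as a dict. Which fields hold IDs and dates is not specified here, so id and last_updated are hypothetical defaults, and experience is assumed to be a list of objects.

```python
from datetime import date


def to_new_timestamp(value):
    """Convert "YYYY-MM-DD" to the new "YYYY-MM-DDT00:00:00.000Z" form."""
    # fromisoformat validates the date before the time part is appended.
    return date.fromisoformat(value).isoformat() + "T00:00:00.000Z"


def upgrade_experience(item):
    """Rename experience_description to description and map "" to None."""
    item = dict(item)  # copy so the caller's data is not mutated
    if "experience_description" in item:
        item["description"] = item.pop("experience_description")
    return {k: (None if v == "" else v) for k, v in item.items()}


def upgrade_record(record, id_fields=("id",), date_fields=("last_updated",)):
    """Return a copy of an old-format record in the new format."""
    upgraded = {}
    for key, value in record.items():
        if value == "":
            value = None
        elif key in id_fields and isinstance(value, str):
            value = int(value)
        elif key in date_fields and isinstance(value, str):
            value = to_new_timestamp(value)
        elif key == "experience" and isinstance(value, list):
            value = [upgrade_experience(item) for item in value]
        upgraded[key] = value
    return upgraded
```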

Base Jobs API changes

🔸 Added field:

job_company_website: Website of the job company (string)

🔸 Changed fields:

| Action | Field | Old Value | New Value |
| --- | --- | --- | --- |
| Renamed | job_industries_collection | job_industries_collection | job_industry_collection |
| Changed | job_industry_collection structure | Flat list of industries | Nested object with job_industry_list |
| Changed | job_functions_collection structure | Flat array | New structured format |

Example of the new job_industry_collection structure:

"job_industry_collection": [
	{
		"job_industry_list": {
			"industry": "Financial Services"
		}
	}
]
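Code that previously read a flat list of industries needs to unwrap the nested objects. A minimal sketch, assuming each job record is a dict shaped like the example above:

```python
def industries(job):
    """Collect industry names from the nested job_industry_collection."""
    names = []
    for entry in job.get("job_industry_collection") or []:
        industry = (entry.get("job_industry_list") or {}).get("industry")
        if industry:
            names.append(industry)
    return names


job = {
    "job_industry_collection": [
        {"job_industry_list": {"industry": "Financial Services"}}
    ]
}
print(industries(job))
```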

🔸 Removed fields:

  • last_updated_ux

  • redirected_url_hash

  • job_status_log_collection


📦 Dump naming prefix update

Change:

| Action | Description | Old Prefix | New Prefix |
| --- | --- | --- | --- |
| Changed | File naming convention | part-xxxxx.json.gz | json/part-xxxxx.json.gz |
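Download scripts that match file names by pattern may need to accept both layouts during the transition. The sketch below is one way to do that. The exact format of the xxxxx part is not specified here, so the pattern accepts any characters other than a slash in that position.

```python
import re

# Matches both "part-xxxxx.json.gz" and "json/part-xxxxx.json.gz".
PART_FILE = re.compile(r"(?:json/)?part-[^/]+\.json\.gz")


def is_part_file(key):
    """Return True for dump part files under either the old or the new prefix."""
    return PART_FILE.fullmatch(key) is not None
```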


Company APIs sorting enhancements

🆕 What's new

New capability: Users can now sort Company API results by various numerical fields, beyond the traditional last_updated, id, or _score.

Benefit: More control and flexibility to surface the most relevant profiles, based on key business or engagement metrics.

🔧 Sorting behavior

  • Default order: Descending

  • Tie-breaker 1: last_updated

  • Tie-breaker 2: id
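To reason about the resulting order, the behavior can be emulated locally on a handful of records. This sketch is not the API's implementation. It assumes the tie-breakers are also applied in descending order and that no sort value is null, neither of which is stated above.

```python
def sort_like_api(records, field):
    """Sort descending by field, then by last_updated, then by id."""
    # Assumption: both tie-breakers follow the same descending order.
    return sorted(
        records,
        key=lambda r: (r[field], r["last_updated"], r["id"]),
        reverse=True,
    )


# Illustrative rows, not real data.
rows = [
    {"id": 1, "last_updated": "2025-06-01", "employees_count": 50},
    {"id": 2, "last_updated": "2025-06-02", "employees_count": 50},
    {"id": 3, "last_updated": "2025-05-01", "employees_count": 200},
]
print([r["id"] for r in sort_like_api(rows, "employees_count")])
```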

🧾 New sorting fields

Base Company API:

  • employees_count

  • source_id

Clean Company API:

  • size_employees_count

  • followers

Multi-source Company API:

  • employees_count

  • followers_count_professional_network

  • followers_count_twitter

  • followers_count_owler

  • num_technologies_used

  • ipo_share_price

  • last_funding_round_amount_raised

  • last_funding_round_num_investors

  • num_acquisitions_source_1

  • num_acquisitions_source_2

  • num_acquisitions_source_5

  • product_reviews_count

  • num_news_articles

  • total_website_visits_monthly

  • rank_global

  • rank_country

  • rank_category

  • company_employee_reviews_count

  • active_job_postings_count

  • product_reviews_aggregate_score

  • visits_change_monthly

  • bounce_rate

  • pages_per_visit

  • average_visit_duration_seconds

  • company_employee_reviews_aggregate_score

  • revenue_quarterly.value

  • revenue_annual.source_1_annual_revenue.annual_revenue

  • revenue_annual.source_5_annual_revenue.annual_revenue

Base Company API

⚠️ Breaking change: Elasticsearch field rename

To improve field clarity and alignment with naming conventions, we are renaming the shorthand_name field used in Elasticsearch operations. This affects Elasticsearch search and Bulk Collect requests.

| Action | Old Field Name | New Field Name |
| --- | --- | --- |
| Changed | shorthand_name | company_shorthand_name |
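Stored Elasticsearch queries can be updated with a recursive key rename. The sketch below assumes the query is held as nested dicts and lists and that the field appears as a dict key, as in a term or match clause. It does not touch field names that appear as string values (for example inside an exists clause) or with a suffix, so those need separate handling.

```python
def rename_field(node, old="shorthand_name", new="company_shorthand_name"):
    """Return a copy of a query with every dict key equal to old renamed to new."""
    if isinstance(node, dict):
        return {
            (new if key == old else key): rename_field(value, old, new)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rename_field(value, old, new) for value in node]
    return node


# Illustrative query, not taken from the API documentation.
query = {"query": {"bool": {"must": [{"term": {"shorthand_name": "acme"}}]}}}
print(rename_field(query))
```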
