May 2025

Clean Employee data and Clean Employee API

📦 New features

New root-level field added:

  • public_profile_id (String): Publicly provided employee URN added to each employee record.

New collections introduced:

  • patents (Array of Structs)

  • publications (Array of Structs)

  • organizations (Array of Structs)

🔧 Improvements

Experience data logic updates:

  • Hidden experiences that no longer aggregate are now correctly marked with deleted=1.

  • Logic corrected to update date_to fields when newer experience records appear.

  • Removed inferred company HQ locations from experience records where original location was empty, addressing mismatches.
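As a rough illustration of how a client might use the deleted flag, the sketch below filters an employee record down to experiences that are not marked deleted. It assumes the experience collection is a list of structs under the key experience and that deleted is an integer; the helper name active_experiences and the title values in the sample record are hypothetical.

```python
from typing import Any


def active_experiences(employee: dict[str, Any]) -> list[dict[str, Any]]:
    """Return experience entries that are not flagged with deleted=1.

    Assumes `experience` is a list of dicts and `deleted` is 0 or 1;
    entries without a `deleted` key are treated as active.
    """
    return [
        entry
        for entry in employee.get("experience") or []
        if entry.get("deleted", 0) != 1
    ]


# Hypothetical record, for illustration only.
sample_employee = {
    "experience": [
        {"title": "Engineer", "deleted": 0},
        {"title": "Intern", "deleted": 1},
    ]
}

visible = active_experiences(sample_employee)
```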

Data cleaning enhancements:

HTML tags have been stripped from description fields across multiple entities for cleaner presentation and parsing:

  • experience.description

  • education.description

  • summary

  • awards.description

  • patents.description

  • publications.description

  • projects.description

  • organizations

Multi-source Company data and Multi-source Company API

📦 New feature: fields added

  • employees_count_inferred (Integer): Estimated employee count based on inferred data.

  • employees_count_inferred_by_month (Array of structs): Historical inferred employee counts over a rolling three-year window.

  • employees_count_inferred_by_month.employees_count_inferred (Integer): Estimated employee count for the given month, based on inferred data.

  • employees_count_inferred_by_month.date (String): Date identifier for the monthly entry.
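A minimal sketch of reading the new fields from a company record follows. It assumes each element of employees_count_inferred_by_month carries a date string and an employees_count_inferred integer; the helper name, the date format, and the sample values are illustrative assumptions, not real data.

```python
from typing import Any


def inferred_counts_by_date(company: dict[str, Any]) -> dict[str, int]:
    """Map each monthly entry's date to its inferred employee count.

    Returns an empty dict when the collection is missing or null.
    """
    series = company.get("employees_count_inferred_by_month") or []
    return {row["date"]: row["employees_count_inferred"] for row in series}


# Hypothetical record; the "YYYY-MM" date format is an assumption.
sample_company = {
    "employees_count_inferred": 120,
    "employees_count_inferred_by_month": [
        {"date": "2025-04", "employees_count_inferred": 118},
        {"date": "2025-05", "employees_count_inferred": 120},
    ],
}

counts = inferred_counts_by_date(sample_company)
```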

🔧 Improvements

Key executive data quality:

Removed ~3.3M low-quality or stale profiles from the following collections:

  • key_executives

  • key_executive_arrivals

  • key_executive_departures

Rolling window standardization:

Implemented a consistent three-year rolling window for all *_by_month breakdowns:

  • active_job_postings_count_by_month

  • employees_count_by_month

  • employees_count_by_country_by_month

  • employees_count_breakdown_by_department_by_month

  • employees_count_breakdown_by_region_by_month

  • employees_count_breakdown_by_seniority_by_month

  • professional_network_followers_count_by_month

  • product_reviews_score_by_month
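To make the rolling window concrete, here is one way a client could apply the same three-year cutoff to locally cached *_by_month data. This is a sketch under assumptions: it treats the window as the 36 months ending at a reference month, and it assumes each entry has a date string beginning with "YYYY-MM". The actual boundary rules and date format used by the dataset are not specified here.

```python
from datetime import date
from typing import Any


def within_window(
    entries: list[dict[str, Any]], as_of: date, months: int = 36
) -> list[dict[str, Any]]:
    """Keep entries whose month falls in the `months` months ending at `as_of`.

    Assumes each entry's `date` starts with "YYYY-MM" (an assumption).
    """
    latest = as_of.year * 12 + as_of.month
    kept = []
    for entry in entries:
        year, month = (int(part) for part in entry["date"].split("-")[:2])
        age_in_months = latest - (year * 12 + month)
        if 0 <= age_in_months < months:
            kept.append(entry)
    return kept


# Illustrative entries only.
history = [
    {"date": "2022-05"},  # 36 months before May 2025: outside the window
    {"date": "2022-06"},  # 35 months before: inside
    {"date": "2025-05"},  # reference month: inside
]
recent = within_window(history, as_of=date(2025, 5, 1))
```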

Elasticsearch schema updates (no impact on data dictionary):

Changed data types from Nested to Flattened for improved indexing and performance:

  • base_salary

  • total_salary

🐞 Bug fixes

Deduplication:

funding_rounds: Fixed duplicate entries in arrays.

Field normalization:

Resolved issues with lowercased values in several breakdown fields, improving mapping accuracy:

  • employees_count_breakdown_by_department

  • employees_count_breakdown_by_department_by_month

  • employees_count_breakdown_by_seniority

  • employees_count_breakdown_by_seniority_by_month

  • employees_count_breakdown_by_region

  • employees_count_breakdown_by_region_by_month

  • employees_count_by_country

  • employees_count_by_country_by_month

Search results

New feature: new API query parameter

items_per_page={int}: Lets clients specify the number of results returned per page in search endpoints, giving better control over paginated responses (the maximum remains 1,000). Available starting May 21, 2025.
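The parameter could be attached to a search request roughly as sketched below. The base URL is a placeholder and the helper name is hypothetical; only the items_per_page parameter name and the 1,000 maximum come from this changelog. The lower bound of 1 is an assumption.

```python
from urllib.parse import urlencode

MAX_ITEMS_PER_PAGE = 1000  # maximum stated in this changelog


def build_search_url(base_url: str, items_per_page: int) -> str:
    """Append items_per_page to a search URL, rejecting out-of-range values.

    The lower bound of 1 is an assumption, not a documented limit.
    """
    if not 1 <= items_per_page <= MAX_ITEMS_PER_PAGE:
        raise ValueError(
            f"items_per_page must be between 1 and {MAX_ITEMS_PER_PAGE}"
        )
    return f"{base_url}?{urlencode({'items_per_page': items_per_page})}"


# Placeholder endpoint, not a real API URL.
url = build_search_url("https://api.example.com/search", 250)
```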
