May 2025

Clean Employee data and Clean Employee API

📦 New features

New root-level field added:

  • public_profile_id (String): Publicly provided employee URN added to each employee record.

New collections introduced:

  • patents (Array of Structs)

  • publications (Array of Structs)

  • organizations (Array of Structs)
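As a rough illustration, the new root-level field and the three new collections could be read from an employee record as sketched below. Only the names `public_profile_id`, `patents`, `publications`, and `organizations` come from these notes. The record shape, the helper name `summarize_new_fields`, and the treatment of missing collections as empty are assumptions.

```python
from typing import Any

# Collection names taken from the release notes above.
NEW_COLLECTIONS = ("patents", "publications", "organizations")


def summarize_new_fields(record: dict[str, Any]) -> dict[str, Any]:
    """Return the new root-level ID and the size of each new collection.

    Missing or null collections are treated as empty, on the assumption
    that older snapshots may not contain them.
    """
    summary: dict[str, Any] = {"public_profile_id": record.get("public_profile_id")}
    for name in NEW_COLLECTIONS:
        items = record.get(name) or []
        summary[f"{name}_count"] = len(items)
    return summary


# Hypothetical record; the struct contents are placeholders.
example = {
    "public_profile_id": "example-urn",
    "patents": [{}, {}],
    "publications": None,
}
print(summarize_new_fields(example))
```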

🔧 Improvements

Experience data logic updates:

  • Hidden experiences that no longer aggregate are now correctly marked with deleted=1.

  • Corrected the logic that updates date_to fields when newer experience records appear.

  • Removed inferred company HQ locations from experience records where the original location was empty, which resolves location mismatches.
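Since hidden experiences are now flagged with `deleted=1`, a consumer may want to skip them when reading a record. The sketch below assumes the collection is named `experience` (as in `experience.description`) and that the flag may arrive as either the integer 1 or the string "1". The `title` key and the helper name are hypothetical.

```python
from typing import Any


def active_experiences(record: dict[str, Any]) -> list[dict[str, Any]]:
    """Return experience entries that are not flagged as deleted.

    The flag is compared as a string so that both 1 and "1" count as
    deleted; the exact type in the payload is an assumption.
    """
    experiences = record.get("experience") or []
    return [e for e in experiences if str(e.get("deleted", 0)) != "1"]


# Hypothetical record with one hidden experience.
record = {
    "experience": [
        {"title": "Engineer", "deleted": 0},
        {"title": "Intern", "deleted": 1},
    ]
}
print([e["title"] for e in active_experiences(record)])
```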

Data cleaning enhancements:

HTML tags have been stripped from description fields across multiple entities for cleaner presentation and parsing:

  • experience.description

  • education.description

  • summary

  • awards.description

  • patents.description

  • publications.description

  • projects.description

  • organizations
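For data collected before this change, a similar cleanup can be applied on the client side. The following is a minimal sketch using Python's standard `html.parser`. It is not the provider's implementation, and the whitespace collapsing is an added assumption.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and discards tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_html(text: str) -> str:
    """Remove HTML tags and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()  # flush any buffered trailing text
    return " ".join("".join(parser.parts).split())


print(strip_html("<p>Led a <b>team</b> of 5 &amp; shipped v2.</p>"))
```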

Multi-source Company data and Multi-source Company API

📦 New feature: fields added

  • employees_count_inferred (Integer): Estimated employee count based on inferred data.

  • employees_count_inferred_by_month (Array of structs): Historical inferred employee counts over a rolling three-year window.

  • employees_count_inferred_by_month.employees_count_inferred (Integer): Estimated employee count for the given month, based on inferred data.

  • employees_count_inferred_by_month.date (String): Date identifier for the monthly entry.
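One way to consume the new fields is to take the most recent monthly entry and fall back to the root-level value when no history is present. This sketch assumes that `date` is a member of each monthly struct and that its string format sorts chronologically (for example an ISO-like `YYYY-MM`). Neither the format nor the sample values are confirmed by these notes.

```python
from typing import Any, Optional


def latest_inferred_count(company: dict[str, Any]) -> Optional[int]:
    """Return the most recent monthly inferred employee count.

    Falls back to the root-level `employees_count_inferred` when the
    monthly array is missing or empty. Entries are compared by their
    `date` string, which assumes a format that sorts chronologically.
    """
    history = company.get("employees_count_inferred_by_month") or []
    if not history:
        return company.get("employees_count_inferred")
    newest = max(history, key=lambda entry: entry["date"])
    return newest["employees_count_inferred"]


# Hypothetical company record with made-up values.
company = {
    "employees_count_inferred": 120,
    "employees_count_inferred_by_month": [
        {"date": "2025-03", "employees_count_inferred": 115},
        {"date": "2025-04", "employees_count_inferred": 118},
    ],
}
print(latest_inferred_count(company))
```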

🔧 Improvements

Key executive data quality:

Removed ~3.3M low-quality or stale profiles from the following collections:

  • key_executives

  • key_executive_arrivals

  • key_executive_departures

Rolling window standardization:

Implemented a consistent three-year rolling window for all *_by_month breakdowns:

  • active_job_postings_count_by_month

  • employees_count_by_month

  • employees_count_by_country_by_month

  • employees_count_breakdown_by_department_by_month

  • employees_count_breakdown_by_region_by_month

  • employees_count_breakdown_by_seniority_by_month

  • professional_network_followers_count_by_month

  • product_reviews_score_by_month

Elasticsearch schema updates (no impact on data dictionary):

Changed data types from Nested to Flattened for improved indexing and performance:

  • base_salary

  • total_salary
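In Elasticsearch generally, a field mapped as Nested is queried through a `nested` query, while a Flattened field is queried directly on its dotted key, with leaf values indexed as keywords. The sketch below contrasts the two query shapes. The subfield name `currency` is hypothetical, since these notes do not describe the structure of `base_salary`, and whether existing queries need adjusting is an assumption to verify against the API.

```python
# Hypothetical subfield; the real structure of `base_salary` is not
# described in these notes.
SUBFIELD = "base_salary.currency"


def nested_style_query(value: str) -> dict:
    """Query shape typically used while `base_salary` is a Nested field."""
    return {
        "nested": {
            "path": "base_salary",
            "query": {"term": {SUBFIELD: value}},
        }
    }


def flattened_style_query(value: str) -> dict:
    """Query shape for a Flattened field: a plain term query on the dotted key."""
    return {"term": {SUBFIELD: value}}


print(flattened_style_query("USD"))
```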

🐞 Bug fixes

Deduplication:

  • funding_rounds: Fixed duplicate entries in arrays.
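For snapshots taken before this fix, exact duplicates can be removed on the client side. This is a minimal sketch of order-preserving deduplication, not the provider's method. The keys `name` and `amount` in the sample are hypothetical.

```python
import json
from typing import Any


def dedupe_structs(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop exact duplicate dicts while preserving first-seen order."""
    seen: set[str] = set()
    unique: list[dict[str, Any]] = []
    for item in items:
        # Serialize with sorted keys so key order does not affect equality.
        key = json.dumps(item, sort_keys=True, default=str)
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


# Hypothetical funding rounds with made-up values.
rounds = [
    {"name": "Series A", "amount": 10},
    {"amount": 10, "name": "Series A"},
    {"name": "Series B", "amount": 25},
]
print(len(dedupe_structs(rounds)))
```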

Field normalization:

Resolved issues with lowercased values in several breakdown fields, improving mapping accuracy:

  • employees_count_breakdown_by_department

  • employees_count_breakdown_by_department_by_month

  • employees_count_breakdown_by_seniority

  • employees_count_breakdown_by_seniority_by_month

  • employees_count_breakdown_by_region

  • employees_count_breakdown_by_region_by_month

  • employees_count_by_country

  • employees_count_by_country_by_month

Search results

📦 New feature: new API query parameter

  • items_per_page={int}: Lets clients specify the number of results returned per page in search endpoints (the maximum remains 1,000), giving better control over paginated responses. Available starting May 21, 2025.

Sample
curl -X 'POST' \
  'https://api.coresignal.com/cdapi/v2/employee_base/search/es_dsl?items_per_page=10' \
  -H 'accept: application/json' \
  -H 'apikey: {API Key}' \
  -H 'Content-Type: application/json' \
  -d '{
    "query": {
        "bool": {
            "should": [
                {
                    "query_string": {
                        "query": "John Doe",
                        "default_field": "full_name",
                        "default_operator": "and"
                    }
                }
            ]
        }
    }
}'
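The same request can be prepared in Python with the standard library. The sketch below mirrors the endpoint, headers, and body of the curl sample and checks `items_per_page` against the stated maximum of 1,000 before building the request. The lower bound of 1, the helper name `build_search_request`, and the placeholder API key are assumptions.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.coresignal.com/cdapi/v2/employee_base/search/es_dsl"
MAX_ITEMS_PER_PAGE = 1000  # maximum stated in the release notes


def build_search_request(
    api_key: str, query: dict, items_per_page: int
) -> urllib.request.Request:
    """Build (but do not send) a POST search request with items_per_page set."""
    if not 1 <= items_per_page <= MAX_ITEMS_PER_PAGE:
        raise ValueError(f"items_per_page must be between 1 and {MAX_ITEMS_PER_PAGE}")
    params = urllib.parse.urlencode({"items_per_page": items_per_page})
    return urllib.request.Request(
        f"{SEARCH_URL}?{params}",
        data=json.dumps(query).encode("utf-8"),
        headers={
            "accept": "application/json",
            "apikey": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


query = {
    "query": {
        "bool": {
            "should": [
                {
                    "query_string": {
                        "query": "John Doe",
                        "default_field": "full_name",
                        "default_operator": "and",
                    }
                }
            ]
        }
    }
}
request = build_search_request("YOUR_API_KEY", query, items_per_page=10)
print(request.full_url)
```

Sending is left to the caller, for example with `urllib.request.urlopen(request)`.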
