# May 2025

## Clean Employee data and Clean Employee API

### 📦 New features

#### **New root-level field added**:

* `public_profile_id` (String): Publicly provided employee URN added to each employee record.

#### **New collections introduced**:

* `patents` (Array of Structs)
* `publications` (Array of Structs)
* `organizations` (Array of Structs)

### 🔧 Improvements

#### **Experience data logic updates**:

* Hidden experiences that no longer aggregate are now correctly marked with `deleted=1`.
* Logic corrected to update `date_to` fields when newer experience records appear.
* Removed inferred company HQ locations from experience records where the original location was empty, which addresses location mismatches.
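
As a rough illustration of how a consumer might act on the `deleted=1` flag, the sketch below filters flagged entries out of a record. It assumes the flag sits on each entry of the `experience` collection; the sample record, its values, and the helper name `active_experiences` are hypothetical.

```python
# Hypothetical employee record. Only the names `experience`, `deleted`, and
# `date_to` come from the release notes; all values are invented.
record = {
    "experience": [
        {"title": "Engineer", "date_to": "2023-04", "deleted": 0},
        {"title": "Intern", "date_to": "2019-08", "deleted": 1},
    ]
}


def active_experiences(employee: dict) -> list:
    """Return experience entries that are not flagged with deleted=1."""
    return [e for e in employee.get("experience", []) if e.get("deleted") != 1]


visible = active_experiences(record)
print([e["title"] for e in visible])
```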

#### **Data cleaning enhancements**:

HTML tags have been stripped from description fields across multiple entities for cleaner presentation and parsing:

* `experience.description`
* `education.description`
* `summary`
* `awards.description`
* `patents.description`
* `publications.description`
* `projects.description`
* `organizations`
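
For older exports that still contain markup, a comparable cleanup can be approximated on the client side. This is a minimal sketch using Python's standard `html.parser`, not the cleaning logic actually applied to the dataset; the helper name `strip_html` and the sample string are assumptions.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text: str) -> str:
    """Drop tags, decode entities, and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()
    return " ".join("".join(parser.parts).split())


print(strip_html("<p>Led a <b>team</b> of 5 engineers &amp; designers.</p>"))
```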

## Multi-source Company data and Multi-source Company API

### 📦 New feature: **fields added**

* `employees_count_inferred` (Integer): Estimated employee count based on inferred data.
* `employees_count_inferred_by_month` (Array of Structs): Historical inferred employee counts over a rolling three-year window.
* `employees_count_inferred_by_month.employees_count_inferred` (Integer): Estimated employee count based on inferred data.
* `employees_count_inferred_by_month.date` (String): Date identifier of the monthly entry.
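
The new fields could be read along these lines. The sketch assumes that `date` belongs to each monthly entry and that it is a string that sorts chronologically (for example `YYYY-MM`); the sample company record and the helper name `inferred_series` are hypothetical.

```python
# Hypothetical company record using the field names listed above.
# The values and the date format are assumptions for illustration only.
company = {
    "employees_count_inferred": 120,
    "employees_count_inferred_by_month": [
        {"employees_count_inferred": 110, "date": "2025-03"},
        {"employees_count_inferred": 120, "date": "2025-04"},
        {"employees_count_inferred": 115, "date": "2025-02"},
    ],
}


def inferred_series(record: dict) -> list:
    """Return (date, inferred count) pairs sorted by date, oldest first."""
    rows = record.get("employees_count_inferred_by_month") or []
    return sorted((r["date"], r["employees_count_inferred"]) for r in rows)


series = inferred_series(company)
latest_date, latest_count = series[-1]
print(latest_date, latest_count)
```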

### 🔧 Improvements

#### **Key executive data quality**:

Removed \~3.3M low-quality or stale profiles from the following collections:

* `key_executives`
* `key_executive_arrivals`
* `key_executive_departures`

#### **Rolling window standardization**:

Implemented a consistent three-year rolling window for all `*_by_month` breakdowns:

* `active_job_postings_count_by_month`
* `employees_count_by_month`
* `employees_count_by_country_by_month`
* `employees_count_breakdown_by_department_by_month`
* `employees_count_breakdown_by_region_by_month`
* `employees_count_breakdown_by_seniority_by_month`
* `professional_network_followers_count_by_month`
* `product_reviews_score_by_month`
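
If stored snapshots need to be aligned with the same window, a client-side trim might look like the sketch below. The exact window boundary is not specified here, so the code assumes the window starts in the same calendar month three years before a reference date and that each entry carries a `YYYY-MM` style `date` string; both helper names are hypothetical.

```python
from datetime import date


def window_start(reference: date, years: int = 3) -> str:
    """First 'YYYY-MM' month kept, assuming a same-month boundary `years` back."""
    return f"{reference.year - years:04d}-{reference.month:02d}"


def within_window(rows: list, reference: date, years: int = 3) -> list:
    """Keep entries whose `date` is on or after the assumed window start."""
    start = window_start(reference, years)
    return [r for r in rows if r["date"] >= start]


# Invented sample entries for illustration.
rows = [{"date": "2022-04"}, {"date": "2022-05"}, {"date": "2025-05"}]
kept = within_window(rows, date(2025, 5, 21))
print([r["date"] for r in kept])
```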

#### **Elasticsearch schema updates** *(no impact on data dictionary)*:

Changed data types from **Nested** to **Flattened** for improved indexing and performance:

* `base_salary`
* `total_salary`

### 🐞 Bug fixes

#### **Deduplication**:

`funding_rounds`: Fixed duplicate entries in arrays.

#### **Field normalization**:

Resolved issues with lowercased values in several breakdown fields, improving mapping accuracy:

* `employees_count_breakdown_by_department`
* `employees_count_breakdown_by_department_by_month`
* `employees_count_breakdown_by_seniority`
* `employees_count_breakdown_by_seniority_by_month`
* `employees_count_breakdown_by_region`
* `employees_count_breakdown_by_region_by_month`
* `employees_count_by_country`
* `employees_count_by_country_by_month`

## Search results

### New feature: **new API query parameter**

`items_per_page={int}`: Sets the number of results returned per page by search endpoints (the maximum remains 1,000), giving clients direct control over paginated responses. Available from May 21, 2025.

{% code title="Sample" %}

```bash
curl -X 'POST' \
  'https://api.coresignal.com/cdapi/v2/employee_base/search/es_dsl?items_per_page=10' \
  -H 'accept: application/json' \
  -H 'apikey: {API Key}' \
  -H 'Content-Type: application/json' \
  -d '{
    "query": {
        "bool": {
            "should": [
                {
                    "query_string": {
                        "query": "John Doe",
                        "default_field": "full_name",
                        "default_operator": "and"
                    }
                }
            ]
        }
    }
}'
```

{% endcode %}
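
The same request can be prepared from Python. This is a minimal sketch using only the standard library: it builds the request object without sending it. The helper names `build_search_url` and `build_request` are hypothetical, and the lower bound of 1 for `items_per_page` is an assumption (only the maximum of 1,000 is stated above).

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://api.coresignal.com/cdapi/v2/employee_base/search/es_dsl"
MAX_ITEMS_PER_PAGE = 1000  # maximum stated in the release note


def build_search_url(items_per_page: int) -> str:
    """Append items_per_page to the search URL after a basic range check."""
    # The lower bound of 1 is an assumption, not a documented limit.
    if not 1 <= items_per_page <= MAX_ITEMS_PER_PAGE:
        raise ValueError("items_per_page must be between 1 and 1000")
    return f"{BASE_URL}?{urlencode({'items_per_page': items_per_page})}"


def build_request(api_key: str, query: dict, items_per_page: int) -> urllib.request.Request:
    """Prepare the POST request with the same headers as the curl sample."""
    return urllib.request.Request(
        build_search_url(items_per_page),
        data=json.dumps(query).encode("utf-8"),
        headers={
            "accept": "application/json",
            "apikey": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


query = {
    "query": {
        "bool": {
            "should": [
                {
                    "query_string": {
                        "query": "John Doe",
                        "default_field": "full_name",
                        "default_operator": "and",
                    }
                }
            ]
        }
    }
}
request = build_request("{API Key}", query, items_per_page=10)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it; replace the `{API Key}` placeholder with a real key first.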
