# June 2025

## Craft Company data

### 🔧 Improvement: **data type change**

`CoinMarketcaps.marketcap`: Changed from **Integer** to **Number** to better support precision and accommodate larger values.
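
Consumers that previously cast this value to an integer may want to read it as a number instead. A minimal sketch, assuming records arrive as JSON text with a top-level `marketcap` key (the record shape and the `parse_marketcap` helper are illustrative, not part of the dataset):

```python
import json
from decimal import Decimal


def parse_marketcap(raw_json: str) -> Decimal:
    """Read `marketcap` without forcing it into an int (hypothetical helper)."""
    # Parsing both ints and floats as Decimal keeps the digits exactly as sent,
    # so large or fractional values are not rounded on the way in.
    record = json.loads(raw_json, parse_float=Decimal, parse_int=Decimal)
    return record["marketcap"]


print(parse_marketcap('{"marketcap": 1234567890123.45}'))
```

`Decimal` is one option; a plain `float` may be enough if exact precision is not required downstream.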

## Multi-source Company data and Multi-source Company API

### 📦 New features: **fields added**

* `company_logo_url` (String): Logo URL from professional networks.
* `active_job_postings` (Array of structs): Contains structured job posting data.
  * `job_posting_id` (Long): Professional network job ID.
  * `job_posting_title` (String): Title posted by the recruiter.

### 🔧 Improvements

#### **Data type standardization**:

* `is_b2b`: Changed from **Float** to **Integer**.
* `company_updates.date`: Converted to standardized Date format to reflect post publication dates.

#### **Currency code normalization**:

Replaced currency symbols with ISO currency codes for consistency across the following financial fields:

* `ipo_share_price`
* `acquired_by_summary.currency`
* `last_funding_round_amount_raised_currency`
* `source_4_annual_revenue_range.annual_revenue_range_currency`
* `source_6_annual_revenue_range.annual_revenue_range_currency`
* `source_5_annual_revenue.annual_revenue_currency`
* `source_1_annual_revenue.annual_revenue_currency`
* `acquisition_list_source_5.currency`
* `acquisition_list_source_1.currency`
* `acquisition_list_source_2.currency`
* `revenue_quarterly.currency`
* `stock_information.currency`
* `funding_rounds.amount_raised_currency`
* `income_statements.currency`
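
If you keep older snapshots that still hold currency symbols, you may need to bring them in line with the new values before comparing records. A rough sketch, assuming a small hand-written symbol table (the `SYMBOL_TO_ISO` mapping and `to_iso_code` helper are illustrative and not the provider's actual conversion logic):

```python
# Illustrative symbol -> ISO 4217 mapping; extend it for the symbols you hold.
# Note that "$" is ambiguous (USD, CAD, AUD, ...); USD here is an assumption.
SYMBOL_TO_ISO = {"$": "USD", "€": "EUR", "£": "GBP", "¥": "JPY"}


def to_iso_code(value: str) -> str:
    """Return an ISO-style code for a legacy symbol, leaving unknowns untouched."""
    value = value.strip()
    # Values that already look like a three-letter code are only upper-cased.
    if len(value) == 3 and value.isalpha():
        return value.upper()
    return SYMBOL_TO_ISO.get(value, value)


print(to_iso_code("€"))
```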

### 🗑 Deprecation: **field removed**

`active_job_postings_titles` (Array of strings): Deprecated in favor of the structured `active_job_postings` collection.
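
Code that relied on the removed field can rebuild the same list of titles from the structured collection. A minimal sketch, assuming each struct in `active_job_postings` carries the `job_posting_id` and `job_posting_title` fields listed above (the `job_titles` helper and the sample record are hypothetical):

```python
def job_titles(company: dict) -> list:
    """Rebuild a flat list of titles from `active_job_postings`."""
    postings = company.get("active_job_postings") or []
    # Skip postings that have no title rather than emitting None entries.
    return [p["job_posting_title"] for p in postings if p.get("job_posting_title")]


sample_company = {
    "active_job_postings": [
        {"job_posting_id": 1, "job_posting_title": "Data Engineer"},
        {"job_posting_id": 2, "job_posting_title": None},
    ]
}
print(job_titles(sample_company))
```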

## Base Employee data

### 📝 Summary aggregation updates

We are improving the freshness and consistency of employee summaries across all dataset variants:

* The `summary` field now returns full values, with no more truncation or nulls
* Displays the **last successfully scraped result**
* All HTML tags have been **removed** for formatting consistency across base, clean, and multi-source data

Please report any unexpected or corrupted results to the team for review.

***

### 🔁 Legacy dataset migration guidance

#### **✅ Full dataset ingest recommended**

For users accustomed to incremental updates, we recommend a **one-time full ingest** of the updated dataset to ensure:

* Better data quality and completeness
* Elimination of legacy duplicate records
* ID consistency across systems

#### **🔄 Mapping legacy IDs to new dataset IDs**

A new `member_ids_mapping_dataset` is provided to help match legacy IDs to updated ones:

* `legacy_dataset_id` → original IDs
* `updated_dataset_id` → new IDs

**Note**:\
Some `legacy_dataset_id` values may not map to an `updated_dataset_id` due to:

* Historic encoding bugs (dating back to 2019)
* Deletion or review status

This mapping ensures continuity when migrating primary keys between versions.
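
The lookup can be sketched as follows, assuming the mapping dataset has been loaded as a list of rows with the two columns named above (the `build_id_map` and `migrate_ids` helpers and the sample IDs are hypothetical). Unmapped IDs are collected separately, since the note above says some legacy IDs have no counterpart:

```python
def build_id_map(mapping_rows):
    """Index the mapping rows by legacy ID."""
    return {row["legacy_dataset_id"]: row["updated_dataset_id"] for row in mapping_rows}


def migrate_ids(legacy_ids, id_map):
    """Return (legacy -> updated) pairs plus the legacy IDs with no mapping."""
    migrated, unmapped = {}, []
    for legacy_id in legacy_ids:
        if legacy_id in id_map:
            migrated[legacy_id] = id_map[legacy_id]
        else:
            unmapped.append(legacy_id)
    return migrated, unmapped


rows = [
    {"legacy_dataset_id": 101, "updated_dataset_id": 9001},
    {"legacy_dataset_id": 102, "updated_dataset_id": 9002},
]
migrated, unmapped = migrate_ids([101, 102, 103], build_id_map(rows))
print(migrated, unmapped)
```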

***

### 🧩 Field-by-field changelog

Refer to `Raw -_ Employee dataset changelog.xlsx` for details on what fields were **added, removed, or modified** in this version.

{% file src="<https://3176110779-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FNWug9YwYHHZA07UREe8w%2Fuploads%2F1xsBwCUAO0Dbz1TtZCli%2FRaw%20-_%20Employee%20dataset%20changelog.xlsx?alt=media&token=f75a9ce8-111b-42f0-80f7-d25be6176736>" %}

***

### 📎 Deduplication handling and identification

Improved logic for recognizing and preserving duplicates:

* `is_parent = 1` → Original record
* `is_parent = 0` → Duplicate record (retained for completeness)
* Duplicate entries preserve identical content
* Unique identifier: `id`
* Support fields:
  * `shorthand_names`: All known shorthand names used by the employee
  * `historical_ids`: Previously assigned record IDs
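
For pipelines that only want one row per person, the `is_parent` flag can be used as a filter. A minimal sketch, assuming records are loaded as dictionaries (the `split_by_parent` helper and the sample rows are hypothetical):

```python
def split_by_parent(records):
    """Separate original records (is_parent = 1) from retained duplicates (is_parent = 0)."""
    originals = [r for r in records if r.get("is_parent") == 1]
    duplicates = [r for r in records if r.get("is_parent") == 0]
    return originals, duplicates


sample_records = [
    {"id": 1, "is_parent": 1},
    {"id": 2, "is_parent": 0},
    {"id": 3, "is_parent": 1},
]
originals, duplicates = split_by_parent(sample_records)
print([r["id"] for r in originals], [r["id"] for r in duplicates])
```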

## Clean Employee data

### 🧼 Summary improvements

We’ve made targeted improvements to the `member_description` field in the Clean Employee dataset to address issues with data staleness and formatting:

* The field now returns **complete values** instead of truncated or null data (when available)
* Displays the **last successfully scraped result** for improved data reliability
* All **HTML tags have been removed** to standardize formatting across base, clean, and multi-source datasets

## **Multi-source Employee data**

### 🔄 Summary enhancements

As part of our continued commitment to data freshness and quality, we’ve updated the `summary` field across the Multi-source Employee dataset:

* Now returns **full field values** (no truncation or nulls)
* Reflects the **most recent successful data scrape**
* **HTML tags have been removed** to align with formatting across all Employee data variants (base, clean, multi-source)

If you encounter unexpected or corrupted entries, please contact the team for support.

## **Base Employee API**

### 🛠 Bulk Collect result update

Bulk Collect results will now include previously excluded data categories and fields. These changes ensure more complete and consistent record delivery.

* **Effective delivery**: Starting June 4, 2025
* **Impact**: The previous Bulk Collect exceptions no longer apply; all available data fields are now included in responses

#### ✅ Restored categories and data fields

| Category     | Data field     |
| ------------ | -------------- |
| Publications | `authors`      |
| Patents      | `inventors`    |
| Projects     | `team_members` |

This update improves parity between standard API and Bulk Collect outputs.

## **Bulk Collect changes**

### 🆕 New Bulk Collect endpoints

**Impacted APIs:**

* Base Employee API
* Clean Employee API

**New endpoints:**

| API                | Endpoint                                                                                                                        |
| ------------------ | ------------------------------------------------------------------------------------------------------------------------------- |
| Base Employee API  | <p><code>/v2/data\_requests/employee\_base/shorthand\_names</code><br><code>/v2/data\_requests/employee\_base/urls</code></p>   |
| Clean Employee API | <p><code>/v2/data\_requests/employee\_clean/shorthand\_names</code><br><code>/v2/data\_requests/employee\_clean/urls</code></p> |

**Input requirements for `shorthand_names`:**

* No empty strings
* No capital letters
* No leading/trailing spaces
* No special characters (most are rejected)
* Length: 3–100 characters
* Max 10,000 names per request
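
These rules can be checked client-side before a request is sent. A rough sketch: the set of permitted special characters is not listed above, so the allow-list below (lowercase ASCII letters, digits, and hyphens) is an assumption, and `invalid_shorthand_names` is a hypothetical helper rather than part of the API:

```python
import re

MAX_NAMES_PER_REQUEST = 10_000
# Assumption: which special characters are accepted is not documented here,
# so this allow-list is a conservative guess.
ALLOWED = re.compile(r"[a-z0-9-]+")


def invalid_shorthand_names(names):
    """Return {name: reason} for every name that breaks one of the input rules."""
    if len(names) > MAX_NAMES_PER_REQUEST:
        raise ValueError("more than 10,000 names in one request")
    problems = {}
    for name in names:
        if name == "":
            problems[name] = "empty string"
        elif name != name.strip():
            problems[name] = "leading/trailing spaces"
        elif name != name.lower():
            problems[name] = "capital letters"
        elif not 3 <= len(name) <= 100:
            problems[name] = "length must be 3-100 characters"
        elif not ALLOWED.fullmatch(name):
            problems[name] = "special characters"
    return problems


print(invalid_shorthand_names(["john-doe", "John", " ab ", "ab", "a_b!"]))
```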

***

### 📬 New response header

**New header**: `Location`

It provides a URL where the results of the bulk collection can be retrieved once processing is finished. The new header aims to improve user experience by providing immediate, direct access to the results endpoint.

* **Example**:

  ```
  Location: /v2/data_requests/e000b0ec-0f00-0b00-0a0a-0b00fa0000d0/files
  ```
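
Because the example value is a relative path, a client would typically resolve it against the URL it sent the request to. A minimal sketch using only the standard library; the `api.example.com` host and the `results_url` helper are placeholders, not the real API base URL:

```python
from typing import Optional
from urllib.parse import urljoin


def results_url(request_url: str, headers: dict) -> Optional[str]:
    """Resolve the `Location` header against the URL the request was sent to."""
    # Plain dict lookup is case-sensitive; most HTTP clients expose a
    # case-insensitive headers object, which works the same way here.
    location = headers.get("Location")
    if location is None:
        return None
    return urljoin(request_url, location)


print(results_url(
    "https://api.example.com/v2/data_requests/employee_base/urls",
    {"Location": "/v2/data_requests/e000b0ec-0f00-0b00-0a0a-0b00fa0000d0/files"},
))
```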

***

### ⚠️ Breaking changes in Bulk Collect APIs

**Impacted APIs:**

* Clean Employee API
* Base Jobs API

#### **Clean Employee API changes:**

| Action  | Description      | Old Value                           | New Value                    |
| ------- | ---------------- | ----------------------------------- | ---------------------------- |
| Changed | Empty responses  | `""`                                | `null`                       |
| Renamed | Experience field | `experience.experience_description` | `experience.description`     |
| Changed | Timestamp format | `"2025-05-12"`                      | `"2025-05-12T00:00:00.000Z"` |
| Changed | ID data type     | `"561091029"` (string)              | `561091029` (integer)        |
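
To compare records stored in the old format with new responses, the four changes can be applied to the old data. A sketch under assumptions: the helper names and the `title` key in the sample are hypothetical, and which fields carry timestamps or IDs depends on your schema:

```python
from datetime import datetime, timezone


def empty_to_null(value):
    """Old format used "" for missing values; the new format uses null."""
    return None if value == "" else value


def to_new_timestamp(date_str: str) -> str:
    """Convert "2025-05-12" to "2025-05-12T00:00:00.000Z"."""
    day = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return day.strftime("%Y-%m-%dT%H:%M:%S.000Z")


def upgrade_experience(experience: dict) -> dict:
    """Rename `experience_description` to `description` and null out empty strings."""
    upgraded = {
        key: empty_to_null(value)
        for key, value in experience.items()
        if key != "experience_description"
    }
    if "experience_description" in experience:
        upgraded["description"] = empty_to_null(experience["experience_description"])
    return upgraded


print(to_new_timestamp("2025-05-12"))
print(upgrade_experience({"title": "Engineer", "experience_description": ""}))
print(int("561091029"))  # IDs move from string to integer
```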

#### **Base Jobs API changes:**

**🔸 Added field**:

`job_company_website`: Website of the job company (string)

**🔸 Changed fields**:

| Action  | Field                                | Old Value                   | New Value                              |
| ------- | ------------------------------------ | --------------------------- | -------------------------------------- |
| Renamed | `job_industries_collection`          | `job_industries_collection` | `job_industry_collection`              |
| Changed | `job_industry_collection` structure  | Flat list of industries     | Nested object with `job_industry_list` |
| Changed | `job_functions_collection` structure | Flat array                  | New structured format                  |

See the structure changes below:

{% tabs %}
{% tab title="job\_industry\_collection" %}

```
"job_industry_collection": [
	{
		"job_industry_list": {
			"industry": "Financial Services"
		}
	}
]
```

{% endtab %}

{% tab title="job\_functions\_collection" %}

```
"job_functions_collection": ["Health Care Provider"]
```

{% endtab %}
{% endtabs %}
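
Code that used to read industries from a flat list needs one extra level of lookup with the nested shape. A minimal sketch based on the `job_industry_collection` sample above (the `industries` helper is hypothetical):

```python
def industries(job: dict) -> list:
    """Flatten the nested `job_industry_collection` into a list of industry names."""
    names = []
    for entry in job.get("job_industry_collection") or []:
        # Each entry wraps the value in a `job_industry_list` object.
        industry = (entry.get("job_industry_list") or {}).get("industry")
        if industry:
            names.append(industry)
    return names


sample_job = {
    "job_industry_collection": [
        {"job_industry_list": {"industry": "Financial Services"}}
    ]
}
print(industries(sample_job))
```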

**🔸 Removed fields**:

* `last_updated_ux`
* `redirected_url_hash`
* `job_status_log_collection`

***

### 📦 Dump naming prefix update

**Change:**

| Action  | Description            | Old Prefix           | New Prefix                |
| ------- | ---------------------- | -------------------- | ------------------------- |
| Changed | File naming convention | `part-xxxxx.json.gz` | `json/part-xxxxx.json.gz` |
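
Scripts that locate dump files by name may need to account for the extra `json/` segment. A sketch, assuming the new segment sits directly in front of the file name inside whatever directory the dump was delivered to (that placement and the `new_dump_key` helper are assumptions):

```python
import posixpath


def new_dump_key(old_key: str) -> str:
    """Insert the `json/` segment in front of the part file name."""
    directory, filename = posixpath.split(old_key)
    # Leave keys alone if they already follow the new layout.
    if posixpath.basename(directory) == "json":
        return old_key
    return posixpath.join(directory, "json", filename)


print(new_dump_key("part-00000.json.gz"))
```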

***

## Company APIs sorting enhancements

### 🆕 What's new

**New capability**: Users can now sort Company API results by various **numerical fields**, beyond the traditional `last_updated`, `id`, or `_score`.

**Benefit**: More control and flexibility to surface the most relevant profiles, based on key business or engagement metrics.

### 🔧 Sorting behavior

* Default order: **Descending**
* Tie-breaker 1: `last_updated`
* Tie-breaker 2: `id`
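
The resulting order can be pictured with a local sort. This sketch only emulates the behavior on sample data and does not show the request syntax; it assumes the tie-breakers are also applied in descending order, which the list above does not state:

```python
def sort_companies(companies, field):
    """Order by `field`, then `last_updated`, then `id`, all descending (assumed)."""
    return sorted(
        companies,
        key=lambda c: (c[field], c["last_updated"], c["id"]),
        reverse=True,
    )


sample_companies = [
    {"id": 1, "employees_count": 50, "last_updated": "2025-06-01"},
    {"id": 2, "employees_count": 200, "last_updated": "2025-05-20"},
    {"id": 3, "employees_count": 50, "last_updated": "2025-06-03"},
]
print([c["id"] for c in sort_companies(sample_companies, "employees_count")])
```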

### 🧾 New sorting fields

**Base Company API:**

* `employees_count`
* `source_id`

**Clean Company API:**

* `size_employees_count`
* `followers`

**Multi-source Company API:**

{% columns %}
{% column %}

* `employees_count`
* `followers_count_professional_network`
* `followers_count_twitter`
* `followers_count_owler`
* `num_technologies_used`
* `ipo_share_price`
* `last_funding_round_amount_raised`
* `last_funding_round_num_investors`
* `num_acquisitions_source_1`
* `num_acquisitions_source_2`
* `num_acquisitions_source_5`
* `product_reviews_count`
* `num_news_articles`
* `total_website_visits_monthly`
* `rank_global`
  {% endcolumn %}

{% column %}

* `rank_country`
* `rank_category`
* `company_employee_reviews_count`
* `active_job_postings_count`
* `product_reviews_aggregate_score`
* `visits_change_monthly`
* `bounce_rate`
* `pages_per_visit`
* `average_visit_duration_seconds`
* `company_employee_reviews_aggregate_score`
* `revenue_quarterly.value`
* `revenue_annual.source_1_annual_revenue.annual_revenue`
* `revenue_annual.source_5_annual_revenue.annual_revenue`
  {% endcolumn %}
  {% endcolumns %}

## **Base Company API**

### ⚠️ Breaking change: **Elasticsearch field rename**

To improve field clarity and alignment with naming conventions, we are renaming the `shorthand_name` field used in Elasticsearch operations. This affects **Elasticsearch search and Bulk Collect requests**.

| Action  | Old Field Name   | New Field Name           |
| ------- | ---------------- | ------------------------ |
| Changed | `shorthand_name` | `company_shorthand_name` |
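
Stored queries can be updated by renaming the key wherever it appears. A minimal sketch, assuming queries are kept as nested dictionaries in Elasticsearch query DSL form; the `rename_field` helper and the sample `term` query are illustrative, and the exact request body your integration sends may differ:

```python
def rename_field(node, old="shorthand_name", new="company_shorthand_name"):
    """Recursively rename dictionary keys equal to `old`; values are left as-is."""
    if isinstance(node, dict):
        return {
            (new if key == old else key): rename_field(value, old, new)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rename_field(item, old, new) for item in node]
    return node


old_query = {"query": {"term": {"shorthand_name": "example-company"}}}
print(rename_field(old_query))
```

Only keys are rewritten, so clauses that name the field as a value (for example `{"exists": {"field": "shorthand_name"}}`) would need separate handling.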
