June 2025

Craft Company data

🔧 Improvement: data type change

CoinMarketcaps.marketcap: Changed from Integer to Number to better support precision and accommodate larger values.

Multi-source Company data and Multi-source Company API

📦 New features: fields added

company_logo_url (String): Logo URL from professional networks.
active_job_postings (Array of structs): Contains structured job posting data.
- job_posting_id (Long): Professional network job ID.
- job_posting_title (String): Title posted by the recruiter.

🔧 Improvements

Data type standardization:

is_b2b: Changed from Float to Integer.
company_updates.date: Converted to standardized Date format to reflect post publication dates.

Currency code normalization:

Replaced currency symbols with ISO currency codes for consistency across the following financial fields:

ipo_share_price
acquired_by_summary.currency
last_funding_round_amount_raised_currency
source_4_annual_revenue_range.annual_revenue_range_currency
source_6_annual_revenue_range.annual_revenue_range_currency
source_5_annual_revenue.annual_revenue_currency
source_1_annual_revenue.annual_revenue_currency
acquisition_list_source_5.currency
acquisition_list_source_1.currency
acquisition_list_source_2.currency
revenue_quarterly.currency
stock_information.currency
funding_rounds.amount_raised_currency
income_statements.currency

🗑 Deprecation: field removed

active_job_postings_titles (Array of strings): Deprecated in favor of the structured active_job_postings collection.
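
For consumers that still expect a flat list of titles, the removed field can be rebuilt from the structured collection. The sketch below is illustrative only: it assumes each company record is a dict parsed from JSON, and the helper name job_titles is hypothetical.

```python
from typing import Any


def job_titles(company: dict[str, Any]) -> list[str]:
    """Rebuild a flat title list from the structured active_job_postings field."""
    postings = company.get("active_job_postings") or []
    # Skip postings without a title rather than emitting empty entries.
    return [p["job_posting_title"] for p in postings if p.get("job_posting_title")]


# Sample values are made up for illustration.
record = {
    "active_job_postings": [
        {"job_posting_id": 101, "job_posting_title": "Data Engineer"},
        {"job_posting_id": 102, "job_posting_title": "Account Executive"},
    ]
}
print(job_titles(record))
```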

Base Employee data

📝 Summary aggregation updates

We are improving the freshness and consistency of employee summaries across all dataset variants:

The summary field now returns full values, with no truncation or nulls
It displays the last successfully scraped result
All HTML tags have been removed for formatting consistency across base, clean, and multi-source data

Please report any unexpected or corrupted results to the team for review.

🔁 Legacy dataset migration guidance

✅ Full dataset ingest recommended

For users accustomed to incremental updates, we recommend a one-time full ingest of the updated dataset to ensure:

Better data quality and completeness
Elimination of legacy duplicate records
ID consistency across systems

🔄 Mapping legacy IDs to new dataset IDs

A new member_ids_mapping_dataset is provided to help match legacy IDs to updated ones:

legacy_dataset_id → original IDs
updated_dataset_id → new IDs

Note: Some legacy_dataset_ids may not map due to:

Historic encoding bugs (dating back to 2019)
Deletion or review status

This mapping ensures continuity when migrating primary keys between versions.
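
A minimal sketch of how the mapping might be applied during migration. It assumes the mapping dataset has been loaded as a list of dicts keyed by the two column names above; the function name remap_ids and the sample IDs are hypothetical.

```python
def remap_ids(legacy_ids, mapping_rows):
    """Split legacy IDs into those with an updated ID and those without one."""
    lookup = {row["legacy_dataset_id"]: row["updated_dataset_id"] for row in mapping_rows}
    mapped = {i: lookup[i] for i in legacy_ids if i in lookup}
    # Some legacy IDs are expected to have no match (see the note above).
    unmapped = [i for i in legacy_ids if i not in lookup]
    return mapped, unmapped


mapping_rows = [
    {"legacy_dataset_id": 1001, "updated_dataset_id": 9001},
    {"legacy_dataset_id": 1002, "updated_dataset_id": 9002},
]
mapped, unmapped = remap_ids([1001, 1002, 1003], mapping_rows)
print(mapped, unmapped)
```

Keeping the unmapped IDs in a separate list makes it easier to review them instead of silently dropping rows.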

🧩 Field-by-field changelog

Refer to Raw -_ Employee dataset changelog.xlsx for details on what fields were added, removed, or modified in this version.


📎 Deduplication handling and identification

Improved logic for recognizing and preserving duplicates:

is_parent = 1 → Original record
is_parent = 0 → Duplicate record (retained for completeness)
Duplicate entries preserve identical content
Unique identifier: id
Support fields:
- shorthand_names: All known shorthand names used by the employee
- historical_ids: Previously assigned record IDs
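
As an illustration, the is_parent flag can be used to keep only original records when duplicates are not needed. This sketch assumes records are dicts parsed from the dataset; split_duplicates and the sample rows are hypothetical.

```python
def split_duplicates(records):
    """Separate original records (is_parent = 1) from retained duplicates (is_parent = 0)."""
    originals = [r for r in records if r["is_parent"] == 1]
    duplicates = [r for r in records if r["is_parent"] == 0]
    return originals, duplicates


rows = [
    {"id": 10, "is_parent": 1, "historical_ids": [7]},
    {"id": 11, "is_parent": 0, "historical_ids": []},
    {"id": 12, "is_parent": 1, "historical_ids": []},
]
originals, duplicates = split_duplicates(rows)
print(len(originals), len(duplicates))
```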

Clean Employee data

🧼 Summary improvements

We’ve made targeted improvements to the member_description field in the Clean Employee dataset to address issues with data staleness and formatting:

The field now returns complete values instead of truncated or null data (when available)
Displays the last successfully scraped result for improved data reliability
All HTML tags have been removed to standardize formatting across base, clean, and multi-source datasets

Multi-source Employee data

🔄 Summary enhancements

As part of our continued commitment to data freshness and quality, we’ve updated the summary field across the Multi-source Employee dataset:

Now returns full field values (no truncation or nulls)
Reflects the most recent successful data scrape
HTML tags have been removed to align with formatting across all Employee data variants (base, clean, multi-source)

If you encounter unexpected or corrupted entries, please contact the team for support.

Base Employee API

🛠 Bulk Collect result update

Bulk Collect results will now include previously excluded data categories and fields. These changes ensure more complete and consistent record delivery.

Effective delivery: Starting June 4, 2025
Impact: Previous Bulk Collect exclusions no longer apply; all available data points will now be included in responses

✅ Restored categories data points

Bulk Collect changes

🆕 New Bulk Collect endpoints

Impacted APIs:

Base Employee API
Clean Employee API

New endpoints:

| API | Endpoint |
| --- | --- |
| Base Employee API | /v2/data_requests/employee_base/shorthand_names |
| Base Employee API | /v2/data_requests/employee_base/urls |
| Clean Employee API | /v2/data_requests/employee_clean/shorthand_names |
| Clean Employee API | /v2/data_requests/employee_clean/urls |

Input requirements for shorthand_names:

No empty strings
No capital letters
No leading/trailing spaces
No special characters (most are rejected)
Length: 3–100 characters
Max 10,000 names per request
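
The rules above could be checked client-side before a request is sent. This is a rough sketch, not the API's own validation: the exact set of permitted special characters is not listed here, so the pattern below assumes only lowercase letters, digits, and hyphens are accepted. The function name is hypothetical.

```python
import re

# Assumption: lowercase letters, digits, and hyphens only; 3 to 100 characters.
_ALLOWED = re.compile(r"[a-z0-9-]{3,100}")
MAX_NAMES = 10_000


def invalid_shorthand_names(names):
    """Return the names that would break the documented input rules."""
    if len(names) > MAX_NAMES:
        raise ValueError(f"at most {MAX_NAMES} names per request, got {len(names)}")
    # fullmatch rejects empty strings, capitals, spaces, and out-of-range lengths.
    return [n for n in names if not _ALLOWED.fullmatch(n)]


print(invalid_shorthand_names(["acme-corp", "Acme", " acme", "", "ab", "acme!"]))
```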

📬 New response header

New header: Location

It provides a URL where the results of the bulk collection can be retrieved once processing is finished. The new header aims to improve user experience by providing immediate, direct access to the results endpoint.

Example:

Location: /v2/data_requests/e000b0ec-0f00-0b00-0a0a-0b00fa0000d0/files
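
A small sketch of turning the header into a full results URL. The host https://api.example.com is a placeholder, not the real API host, and results_url is a hypothetical helper; response headers are assumed to be available as a plain mapping.

```python
from urllib.parse import urljoin

API_BASE = "https://api.example.com"  # placeholder host, replace with the real one


def results_url(headers, base=API_BASE):
    """Resolve the Location header of a Bulk Collect response against the API host."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "location":
            return urljoin(base, value)
    return None


headers = {"Location": "/v2/data_requests/e000b0ec-0f00-0b00-0a0a-0b00fa0000d0/files"}
print(results_url(headers))
```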

⚠️ Breaking changes in Bulk Collect APIs

Impacted APIs:

Clean Employee API
Base Jobs API

Clean Employee API changes:

| Action | Description | Old Value | New Value |
| --- | --- | --- | --- |
| Changed | Empty responses | "" | null |
| Renamed | Experience field | experience.experience_description | experience.description |
| Changed | Timestamp format | "2025-05-12" | "2025-05-12T00:00:00.000Z" |
| Changed | ID data type | "561091029" (string) | 561091029 (integer) |
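
Pipelines that still hold records in the old shape may want to bring them in line with the new format. The sketch below is one possible approach under stated assumptions: experience is treated as a list of objects, and the field names id, last_updated, and summary in the sample are illustrative rather than a complete schema. The helper names are hypothetical.

```python
import re

_DATE_ONLY = re.compile(r"\d{4}-\d{2}-\d{2}")


def upgrade_value(value):
    """Apply the value-level changes: "" becomes None, dates gain a time part."""
    if value == "":
        return None
    if isinstance(value, str) and _DATE_ONLY.fullmatch(value):
        return f"{value}T00:00:00.000Z"
    return value


def upgrade_record(old, id_fields=("id",)):
    """Return a new dict in the new format; the input record is left untouched."""
    new = {k: upgrade_value(v) for k, v in old.items()}
    for field in id_fields:
        if isinstance(new.get(field), str) and new[field].isdigit():
            new[field] = int(new[field])
    if isinstance(old.get("experience"), list):
        items = []
        for exp in old["experience"]:
            item = {k: upgrade_value(v) for k, v in exp.items()}
            if "experience_description" in item:
                item["description"] = item.pop("experience_description")
            items.append(item)
        new["experience"] = items
    return new


old = {
    "id": "561091029",
    "last_updated": "2025-05-12",
    "summary": "",
    "experience": [{"experience_description": "Led a team"}],
}
print(upgrade_record(old))
```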

Base Jobs API changes

🔸 Added field:

job_company_website: Website of the job company (string)

🔸 Changed fields:

| Action | Field | Old Value | New Value |
| --- | --- | --- | --- |
| Renamed | job_industries_collection | job_industries_collection | job_industry_collection |
| Changed | job_industry_collection structure | Flat list of industries | Nested object with job_industry_list |
| Changed | job_functions_collection structure | Flat array | New structured format |

Example of the new job_industry_collection structure:

"job_industry_collection": [
	{
		"job_industry_list": {
			"industry": "Financial Services"
		}
	}
]
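
Reading industry names out of the nested structure shown above might look like the following. It is a sketch that assumes job records are dicts parsed from JSON; the helper name industries is hypothetical.

```python
def industries(job):
    """Collect industry names from the nested job_industry_collection structure."""
    names = []
    for entry in job.get("job_industry_collection") or []:
        name = (entry.get("job_industry_list") or {}).get("industry")
        if name:
            names.append(name)
    return names


job = {
    "job_industry_collection": [
        {"job_industry_list": {"industry": "Financial Services"}}
    ]
}
print(industries(job))
```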

🔸 Removed fields:

last_updated_ux
redirected_url_hash
job_status_log_collection

📦 Dump naming prefix update

Change:

| Action | Description | Old Prefix | New Prefix |
| --- | --- | --- | --- |
| Changed | File naming convention | part-xxxxx.json.gz | json/part-xxxxx.json.gz |
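
Scripts that locate dump files by name may need to account for the new json/ prefix. A rough sketch, assuming file keys are POSIX-style paths; both helper names are hypothetical.

```python
from pathlib import PurePosixPath


def is_dump_part(key):
    """True for part files under either the old or the new naming convention."""
    name = PurePosixPath(key).name
    return name.startswith("part-") and name.endswith(".json.gz")


def uses_new_prefix(key):
    """True only when the part file sits directly inside a json/ directory."""
    return is_dump_part(key) and PurePosixPath(key).parent.name == "json"


for key in ["part-00000.json.gz", "json/part-00000.json.gz"]:
    print(key, is_dump_part(key), uses_new_prefix(key))
```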

Company APIs sorting enhancements

🆕 What's new

New capability: Users can now sort Company API results by various numerical fields, beyond the traditional last_updated, id, or _score.

Benefit: More control and flexibility to surface the most relevant profiles, based on key business or engagement metrics.

🔧 Sorting behavior

Default order: Descending
Tie-breaker 1: last_updated
Tie-breaker 2: id
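
The ordering described above can be reproduced locally to check expectations against API results. This is only an illustration of the ordering, not the API's implementation; it assumes the tie-breakers are also applied in descending order and that last_updated is an ISO-formatted string. The function name and sample data are hypothetical.

```python
def sort_like_api(companies, field):
    """Order by field, then last_updated, then id, all descending (assumed)."""
    return sorted(
        companies,
        key=lambda c: (c[field], c["last_updated"], c["id"]),
        reverse=True,
    )


companies = [
    {"id": 1, "employees_count": 50, "last_updated": "2025-06-01"},
    {"id": 2, "employees_count": 200, "last_updated": "2025-05-20"},
    {"id": 3, "employees_count": 50, "last_updated": "2025-06-03"},
    {"id": 4, "employees_count": 50, "last_updated": "2025-06-03"},
]
print([c["id"] for c in sort_like_api(companies, "employees_count")])
```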

🧾 New sorting fields

Base Company API:

employees_count
source_id

Clean Company API:

size_employees_count
followers

Multi-source Company API:

employees_count
followers_count_professional_network
followers_count_twitter
followers_count_owler
num_technologies_used
ipo_share_price
last_funding_round_amount_raised
last_funding_round_num_investors
num_acquisitions_source_1
num_acquisitions_source_2
num_acquisitions_source_5
product_reviews_count
num_news_articles
total_website_visits_monthly
rank_global
rank_country
rank_category
company_employee_reviews_count
active_job_postings_count
product_reviews_aggregate_score
visits_change_monthly
bounce_rate
pages_per_visit
average_visit_duration_seconds
company_employee_reviews_aggregate_score
revenue_quarterly.value
revenue_annual.source_1_annual_revenue.annual_revenue
revenue_annual.source_5_annual_revenue.annual_revenue

Base Company API

⚠️ Breaking change: Elasticsearch field rename

To improve field clarity and alignment with naming conventions, we are renaming the shorthand_name field used in Elasticsearch operations. This affects Elasticsearch search and Bulk Collect requests.

| Action | Old Field Name | New Field Name |
| --- | --- | --- |
| Changed | shorthand_name | company_shorthand_name |
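
Stored queries can be updated by renaming the key wherever it appears. The sketch below walks a query body recursively; the term query used as sample input is only an illustrative shape, and rename_field is a hypothetical helper.

```python
OLD, NEW = "shorthand_name", "company_shorthand_name"


def rename_field(node):
    """Return a copy of a query body with the old field key replaced everywhere."""
    if isinstance(node, dict):
        return {(NEW if k == OLD else k): rename_field(v) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_field(v) for v in node]
    return node


query = {"query": {"bool": {"must": [{"term": {"shorthand_name": "acme"}}]}}}
print(rename_field(query))
```

Only dictionary keys are rewritten; field names passed as string values, if any, would need separate handling.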
