June 2025
Craft Company data
🔧 Improvement: data type change
CoinMarketcaps.marketcap: Changed from Integer to Number to better support precision and accommodate larger values.
Multi-source Company data and Multi-source Company API
📦 New features: fields added
company_logo_url (String): Logo URL from professional networks.
active_job_postings (Array of structs): Contains structured job posting data.
job_posting_id (Long): Professional network job ID.
job_posting_title (String): Title posted by the recruiter.
🔧 Improvements
Data type standardization:
is_b2b: Changed from Float to Integer.
company_updates.date: Converted to standardized Date format to reflect post publication dates.
Currency code normalization:
Replaced currency symbols with ISO currency codes for consistency across financial fields:
ipo_share_price
acquired_by_summary.currency
last_funding_round_amount_raised_currency
source_4_annual_revenue_range.annual_revenue_range_currency
source_6_annual_revenue_range.annual_revenue_range_currency
source_5_annual_revenue.annual_revenue_currency
source_1_annual_revenue.annual_revenue_currency
acquisition_list_source_5.currency
acquisition_list_source_1.currency
acquisition_list_source_2.currency
revenue_quarterly.currency
stock_information.currency
funding_rounds.amount_raised_currency
income_statements.currency
🗑 Deprecation: field removed
active_job_postings_titles (Array of strings): Deprecated in favor of the structured active_job_postings collection.
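Consumers that still depend on the flat title list can derive it from the structured collection. The following is a minimal sketch, assuming each company record is a dict and each element of active_job_postings carries the job_posting_title field listed above; the helper name job_titles and the sample values are hypothetical.

```python
def job_titles(company: dict) -> list:
    """Rebuild a flat list of titles from the structured active_job_postings."""
    postings = company.get("active_job_postings") or []
    # Skip postings without a title rather than emitting None entries.
    return [p["job_posting_title"] for p in postings if p.get("job_posting_title")]


# Illustrative record; the values are made up.
company = {
    "active_job_postings": [
        {"job_posting_id": 1, "job_posting_title": "Data Engineer"},
        {"job_posting_id": 2, "job_posting_title": None},
    ]
}
print(job_titles(company))
```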
Base Employee data
📝 Summary aggregation updates
We are improving the freshness and consistency of employee summaries across all dataset variants:
summary field now returns full values, with no more truncation or nulls
Displays the last successfully scraped result
All HTML tags have been removed for formatting consistency across base, clean, and multi-source data
Please report any unexpected or corrupted results to the team for review.
🔁 Legacy dataset migration guidance
✅ Full dataset ingest recommended
For users accustomed to incremental updates, we recommend a one-time full ingest of the updated dataset to ensure:
Better data quality and completeness
Elimination of legacy duplicate records
ID consistency across systems
🔄 Mapping legacy IDs to new dataset IDs
A new member_ids_mapping_dataset is provided to help match legacy IDs to updated ones:
legacy_dataset_id → original IDs
updated_dataset_id → new IDs
Note:
Some legacy_dataset_ids may not map due to:
Historic encoding bugs (dating back to 2019)
Deletion or review status
This mapping ensures continuity when migrating primary keys between versions.
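The ID migration can be sketched as a simple lookup. This example assumes the mapping dataset has been loaded as a list of dicts with the legacy_dataset_id and updated_dataset_id columns named above; the function name remap_ids and the sample IDs are hypothetical. Legacy IDs without a mapping are collected separately so they can be reviewed rather than silently dropped.

```python
def remap_ids(legacy_ids, mapping_rows):
    """Translate legacy IDs to updated IDs and collect the ones with no mapping."""
    lookup = {row["legacy_dataset_id"]: row["updated_dataset_id"] for row in mapping_rows}
    mapped, unmapped = {}, []
    for legacy_id in legacy_ids:
        if legacy_id in lookup:
            mapped[legacy_id] = lookup[legacy_id]
        else:
            # Some legacy IDs may not map (encoding bugs, deletion, review status).
            unmapped.append(legacy_id)
    return mapped, unmapped


# Illustrative mapping rows; the IDs are made up.
rows = [
    {"legacy_dataset_id": 101, "updated_dataset_id": 9001},
    {"legacy_dataset_id": 102, "updated_dataset_id": 9002},
]
mapped, unmapped = remap_ids([101, 102, 103], rows)
print(mapped, unmapped)
```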
🧩 Field-by-field changelog
Refer to Raw -_ Employee dataset changelog.xlsx for details on what fields were added, removed, or modified in this version.
📎 Deduplication handling and identification
Improved logic for recognizing and preserving duplicates:
is_parent = 1 → Original record
is_parent = 0 → Duplicate record (retained for completeness)
Duplicate entries preserve identical content
Unique identifier:
id
Support fields:
shorthand_names: All known shorthand names used by the employee
historical_ids: Previously assigned record IDs
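If only original records are needed downstream, the is_parent flag can be used as a filter. The following is a minimal sketch, assuming records are dicts with the is_parent and id fields described above; the function name split_parents and the sample records are hypothetical.

```python
def split_parents(records):
    """Separate original records (is_parent = 1) from retained duplicates (is_parent = 0)."""
    parents = [r for r in records if r.get("is_parent") == 1]
    duplicates = [r for r in records if r.get("is_parent") == 0]
    return parents, duplicates


# Illustrative records; the values are made up.
records = [
    {"id": 1, "is_parent": 1},
    {"id": 2, "is_parent": 0},
    {"id": 3, "is_parent": 1},
]
parents, duplicates = split_parents(records)
print([r["id"] for r in parents], [r["id"] for r in duplicates])
```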
Clean Employee data
🧼 Summary improvements
We’ve made targeted improvements to the member_description field in the Clean Employee dataset to address issues with data staleness and formatting:
The field now returns complete values instead of truncated or null data (when available)
Displays the last successfully scraped result for improved data reliability
All HTML tags have been removed to standardize formatting across base, clean, and multi-source datasets
Multi-source Employee data
🔄 Summary enhancements
As part of our continued commitment to data freshness and quality, we’ve updated the summary field across the Multi-source Employee dataset:
Now returns full field values (no truncation or nulls)
Reflects the most recent successful data scrape
HTML tags have been removed to align with formatting across all Employee data variants (base, clean, multi-source)
If you encounter unexpected or corrupted entries, please contact the team for support.
Base Employee API
🛠 Bulk Collect result update
Bulk Collect results will now include previously excluded data categories and fields. These changes ensure more complete and consistent record delivery.
Effective delivery: Starting June 4, 2025
Impact: Bulk Collect exceptions have been removed; all available data points will now be included in responses
✅ Restored categories and data points
Publications: authors
Patents: inventors
Projects: team_members
This update improves parity between standard API and Bulk Collect outputs.
Bulk Collect changes
🆕 New Bulk Collect endpoints
Impacted APIs:
Base Employee API
Clean Employee API
New endpoints:
Base Employee API
/v2/data_requests/employee_base/shorthand_names
/v2/data_requests/employee_base/urls
Clean Employee API
/v2/data_requests/employee_clean/shorthand_names
/v2/data_requests/employee_clean/urls
Input requirements for shorthand_names:
No empty strings
No capital letters
No leading/trailing spaces
Most special characters are not allowed
Length: 3–100 characters
Max 10,000 names per request
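These requirements can be checked on the client before a request is sent. The sketch below is not the API's own validation. Because the changelog only says that most special characters are rejected, the allowed character set used here (lowercase letters, digits, hyphens) is an assumption, and the function names are hypothetical.

```python
import re

MAX_NAMES_PER_REQUEST = 10_000
# ASSUMPTION: the exact set of permitted characters is not documented here;
# this sketch conservatively allows lowercase letters, digits, and hyphens.
ALLOWED = re.compile(r"[a-z0-9-]+")


def invalid_reason(name: str):
    """Return why a shorthand name is invalid, or None if it passes every check."""
    if name == "":
        return "empty string"
    if name != name.strip():
        return "leading or trailing spaces"
    if name != name.lower():
        return "capital letters"
    if not 3 <= len(name) <= 100:
        return "length outside 3-100"
    if not ALLOWED.fullmatch(name):
        return "special characters"
    return None


def validate_batch(names):
    """Map each invalid name to its reason; raise if the batch is too large."""
    if len(names) > MAX_NAMES_PER_REQUEST:
        raise ValueError("more than 10,000 names in one request")
    problems = {}
    for name in names:
        reason = invalid_reason(name)
        if reason:
            problems[name] = reason
    return problems
```

Calling validate_batch before submitting a request gives a per-name reason for every rejected entry, which is easier to act on than a single failed request.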
📬 New response header
New header: Location
It provides a URL where the results of the bulk collection can be retrieved once processing is finished. The new header aims to improve user experience by providing immediate, direct access to the results endpoint.
Example:
Location: /v2/data_requests/e000b0ec-0f00-0b00-0a0a-0b00fa0000d0/files
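Since the header value in the example is a path rather than a full URL, a client needs to resolve it against the API host before fetching the results. The following is a minimal sketch using the standard library; the base URL https://api.example.com is a placeholder, and the function name results_url is hypothetical.

```python
from urllib.parse import urljoin


def results_url(base_url: str, headers: dict) -> str:
    """Resolve the Location header of a Bulk Collect response to an absolute URL."""
    # A plain dict lookup is case-sensitive; HTTP client libraries usually
    # expose headers through a case-insensitive mapping instead.
    location = headers.get("Location")
    if not location:
        raise KeyError("response has no Location header")
    return urljoin(base_url, location)


# Placeholder host; substitute the real API base URL.
headers = {"Location": "/v2/data_requests/e000b0ec-0f00-0b00-0a0a-0b00fa0000d0/files"}
print(results_url("https://api.example.com", headers))
```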
⚠️ Breaking changes in Bulk Collect APIs
Impacted APIs:
Clean Employee API
Base Jobs API
Clean Employee API changes:
Empty responses (changed): "" → null
Experience field (renamed): experience.experience_description → experience.description
Timestamp format (changed): "2025-05-12" → "2025-05-12T00:00:00.000Z"
ID data type (changed): "561091029" (string) → 561091029 (integer)
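Pipelines that hold records in the previous format can bring them in line with the new one before merging. The sketch below illustrates the four changes listed above and rests on several assumptions: records are dicts, the ID lives under a key named id, experience is a list of dicts, and only top-level empty strings are converted. The function names are hypothetical.

```python
from datetime import date


def new_timestamp(value: str) -> str:
    """Convert "YYYY-MM-DD" to the "YYYY-MM-DDT00:00:00.000Z" form."""
    parsed = date.fromisoformat(value)  # raises ValueError on malformed input
    return f"{parsed.isoformat()}T00:00:00.000Z"


def upgrade_record(record: dict) -> dict:
    """Return a copy of an old-format record reshaped to the new conventions."""
    # Empty responses: "" becomes null (None). Top-level values only.
    upgraded = {k: (None if v == "" else v) for k, v in record.items()}

    # ID data type: string becomes integer. The key name "id" is an assumption.
    if isinstance(upgraded.get("id"), str):
        upgraded["id"] = int(upgraded["id"])

    # Renamed field: experience_description becomes description.
    if isinstance(upgraded.get("experience"), list):
        renamed = []
        for item in upgraded["experience"]:
            item = dict(item)  # copy so the input record is not mutated
            if "experience_description" in item:
                item["description"] = item.pop("experience_description")
            renamed.append(item)
        upgraded["experience"] = renamed
    return upgraded
```

Timestamp fields are not converted automatically because the changelog does not list which fields carry them; new_timestamp can be applied to whichever date fields a pipeline uses.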
Base Jobs API changes
🔸 Added field:
job_company_website: Website of the job company (string)
🔸 Changed fields:
job_industries_collection (renamed): job_industries_collection → job_industry_collection
job_industry_collection structure (changed): flat list of industries → nested object with job_industry_list
job_functions_collection structure (changed): flat array → new structured format
See the structure changes below:
"job_industry_collection": [
  {
    "job_industry_list": {
      "industry": "Financial Services"
    }
  }
]
"job_functions_collection": ["Health Care Provider"]
🔸 Removed fields:
last_updated_ux
redirected_url_hash
job_status_log_collection
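For the job_industry_collection change, a reader that accepts both shapes can ease the transition. The sketch below follows the nested example shown above. The legacy shape is assumed to be a flat list of strings under job_industries_collection, which the changelog does not spell out, and the function name job_industries is hypothetical.

```python
def job_industries(job: dict) -> list:
    """Return industry names from either the new nested or the legacy flat shape."""
    nested = job.get("job_industry_collection")
    if nested is not None:
        # New shape: [{"job_industry_list": {"industry": "..."}}]
        return [entry["job_industry_list"]["industry"] for entry in nested]
    # ASSUMPTION: the legacy flat list held plain industry strings.
    return list(job.get("job_industries_collection") or [])


new_job = {"job_industry_collection": [{"job_industry_list": {"industry": "Financial Services"}}]}
old_job = {"job_industries_collection": ["Financial Services"]}
print(job_industries(new_job), job_industries(old_job))
```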
📦 Dump naming prefix update
Change:
File naming convention (changed): part-xxxxx.json.gz → json/part-xxxxx.json.gz
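Scripts that select dump files by name may need to accept both conventions during the transition. The following is a minimal sketch; the xxxxx in the file name is a placeholder, so the pattern accepts any non-slash characters in its place, which is an assumption about the real file names. The function name is_part_file is hypothetical.

```python
import re

# Matches "part-<anything>.json.gz" with or without the new "json/" prefix.
PART_FILE = re.compile(r"(json/)?part-[^/]+\.json\.gz")


def is_part_file(key: str) -> bool:
    """True for dump part files named under the old or the new convention."""
    return PART_FILE.fullmatch(key) is not None


print(is_part_file("part-00001.json.gz"), is_part_file("json/part-00001.json.gz"))
```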
Company APIs sorting enhancements
🆕 What's new
New capability: Users can now sort Company API results by various numerical fields, beyond the traditional last_updated, id, or _score.
Benefit: More control and flexibility to surface the most relevant profiles, based on key business or engagement metrics.
🔧 Sorting behavior
Default order: Descending
Tie-breaker 1: last_updated
Tie-breaker 2: id
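The resulting order can be illustrated locally. This sketch only mimics the documented rules on in-memory records and is not how the API implements sorting. It assumes the tie-breakers are applied in descending order as well, which the changelog does not state; the function name sort_like_api and the sample records are hypothetical.

```python
def sort_like_api(records, field):
    """Order records by field, then last_updated, then id, all descending."""
    # ASSUMPTION: tie-breakers follow the same descending direction as the main field.
    return sorted(
        records,
        key=lambda r: (r[field], r["last_updated"], r["id"]),
        reverse=True,
    )


# Illustrative records; ISO date strings compare correctly as plain strings.
records = [
    {"id": 1, "employees_count": 50, "last_updated": "2025-06-01"},
    {"id": 2, "employees_count": 50, "last_updated": "2025-06-03"},
    {"id": 3, "employees_count": 200, "last_updated": "2025-05-20"},
    {"id": 4, "employees_count": 50, "last_updated": "2025-06-03"},
]
ordered = sort_like_api(records, "employees_count")
print([r["id"] for r in ordered])
```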
🧾 New sorting fields
Base Company API:
employees_count
source_id
Clean Company API:
size_employees_count
followers
Multi-source Company API:
employees_count
followers_count_professional_network
followers_count_twitter
followers_count_owler
num_technologies_used
ipo_share_price
last_funding_round_amount_raised
last_funding_round_num_investors
num_acquisitions_source_1
num_acquisitions_source_2
num_acquisitions_source_5
product_reviews_count
num_news_articles
total_website_visits_monthly
rank_global
rank_country
rank_category
company_employee_reviews_count
active_job_postings_count
product_reviews_aggregate_score
visits_change_monthly
bounce_rate
pages_per_visit
average_visit_duration_seconds
company_employee_reviews_aggregate_score
revenue_quarterly.value
revenue_annual.source_1_annual_revenue.annual_revenue
revenue_annual.source_5_annual_revenue.annual_revenue
Base Company API
⚠️ Breaking change: Elasticsearch field rename
To improve field clarity and alignment with naming conventions, we are renaming the shorthand_name field used in Elasticsearch operations. This affects Elasticsearch search and Bulk Collect requests.
Field name (changed): shorthand_name → company_shorthand_name
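Stored queries that use the old field name as a key can be updated with a recursive rename. The sketch below works on any nested dict and list structure; the sample query is illustrative only and the function name rename_field is hypothetical. It renames dictionary keys only, so places where the field name appears as a value (for example in a sort clause) would need separate handling.

```python
def rename_field(node, old="shorthand_name", new="company_shorthand_name"):
    """Return a copy of a nested query with every key named `old` renamed to `new`."""
    if isinstance(node, dict):
        return {(new if k == old else k): rename_field(v, old, new) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_field(item, old, new) for item in node]
    return node  # scalars are returned unchanged


# Illustrative query body; the value "acme" is made up.
query = {"query": {"bool": {"must": [{"term": {"shorthand_name": "acme"}}]}}}
print(rename_field(query))
```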