June 2025
Craft Company data
🔧 Improvement: data type change
CoinMarketcaps.marketcap: Changed from Integer to Number to better support precision and accommodate larger values.
Multi-source Company data and Multi-source Company API
📦 New features: fields added
company_logo_url (String): Logo URL from professional networks.
active_job_postings (Array of structs): Contains structured job posting data.
job_posting_id (Long): Professional network job ID.
job_posting_title (String): Title posted by the recruiter.
🔧 Improvements
Data type standardization:
is_b2b: Changed from Float to Integer.
company_updates.date: Converted to a standardized Date format that reflects post publication dates.
Currency code normalization:
Replaced currency symbols with ISO 4217 currency codes for consistency across the following financial fields:
ipo_share_price
acquired_by_summary.currency
last_funding_round_amount_raised_currency
source_4_annual_revenue_range.annual_revenue_range_currency
source_6_annual_revenue_range.annual_revenue_range_currency
source_5_annual_revenue.annual_revenue_currency
source_1_annual_revenue.annual_revenue_currency
acquisition_list_source_5.currency
acquisition_list_source_1.currency
acquisition_list_source_2.currency
revenue_quarterly.currency
stock_information.currency
funding_rounds.amount_raised_currency
income_statements.currency
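If you want to confirm that ingested records follow the new convention, a check like the one below may help. It is a minimal sketch, not part of the API: `values_at` and `non_iso_currency_values` are hypothetical helpers that walk the dotted field paths listed above (descending into arrays such as funding_rounds) and flag any value that is not a three-letter uppercase code. It checks only the shape of the code, not whether it is an assigned ISO 4217 code.

```python
import re
from typing import Any, Dict, Iterator, List, Tuple

# ISO 4217 alphabetic codes are three uppercase Latin letters, e.g. "USD", "EUR".
ISO_CODE = re.compile(r"[A-Z]{3}")

def values_at(obj: Any, path: str) -> Iterator[Any]:
    """Yield every value at a dotted path, descending into lists along the way."""
    head, _, rest = path.partition(".")
    if isinstance(obj, list):
        for item in obj:
            yield from values_at(item, path)
    elif isinstance(obj, dict) and head in obj:
        if rest:
            yield from values_at(obj[head], rest)
        else:
            yield obj[head]

def non_iso_currency_values(record: Dict[str, Any], paths: List[str]) -> List[Tuple[str, Any]]:
    """Return (path, value) pairs whose non-null value is not shaped like an ISO code."""
    bad = []
    for path in paths:
        for value in values_at(record, path):
            if value is not None and not (isinstance(value, str) and ISO_CODE.fullmatch(value)):
                bad.append((path, value))
    return bad
```

For example, a leftover "$" inside funding_rounds would be reported as ("funding_rounds.amount_raised_currency", "$").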
🗑 Deprecation: field removed
active_job_postings_titles (Array of strings): Deprecated in favor of the structured active_job_postings collection.
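Consumers that still rely on a flat list of titles can usually rebuild it from the structured collection. The sketch below assumes that job_posting_title is a key inside each element of active_job_postings (the changelog lists it alongside that field but does not show the nesting explicitly); `active_job_posting_titles` is a hypothetical helper name.

```python
from typing import Any, Dict, List

def active_job_posting_titles(record: Dict[str, Any]) -> List[str]:
    """Rebuild the old active_job_postings_titles list from active_job_postings.

    Assumption: each posting struct carries its title under "job_posting_title".
    Postings with a missing or empty title are skipped.
    """
    postings = record.get("active_job_postings") or []
    return [p["job_posting_title"] for p in postings if p.get("job_posting_title")]
```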
Base Employee data
📝 Summary aggregation updates
We are improving the freshness and consistency of employee summaries across all dataset variants:
The summary field now returns full values, with no truncation or nulls.
It displays the last successfully scraped result.
All HTML tags have been removed for formatting consistency across base, clean, and multi-source data.
Please report any unexpected or corrupted results to the team for review.
🔁 Legacy dataset migration guidance
✅ Full dataset ingest recommended
For users accustomed to incremental updates, we recommend a one-time full ingest of the updated dataset to ensure:
Better data quality and completeness
Elimination of legacy duplicate records
ID consistency across systems
🔄 Mapping legacy IDs to new dataset IDs
A new member_ids_mapping_dataset is provided to help match legacy IDs to updated ones:
legacy_dataset_id → original IDs
updated_dataset_id → new IDs
Note: Some legacy_dataset_id values may not map due to:
Historic encoding bugs (dating back to 2019)
Deletion or review status
Where a mapping exists, it preserves continuity when migrating primary keys between versions.
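A migration script might use the mapping dataset roughly as follows. This is a sketch under stated assumptions: rows are read as dictionaries keyed by legacy_dataset_id and updated_dataset_id, each legacy ID maps to at most one updated ID (if it appears twice, the last row wins), and `build_id_map` / `migrate_ids` are hypothetical helpers. Unmapped IDs are collected rather than dropped so they can be reviewed against the reasons listed above.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

def build_id_map(rows: Iterable[Dict[str, Hashable]]) -> Dict[Hashable, Hashable]:
    """Index mapping rows as legacy_dataset_id -> updated_dataset_id."""
    return {row["legacy_dataset_id"]: row["updated_dataset_id"] for row in rows}

def migrate_ids(
    legacy_ids: Iterable[Hashable], id_map: Dict[Hashable, Hashable]
) -> Tuple[Dict[Hashable, Hashable], List[Hashable]]:
    """Translate legacy IDs; return (mapped, unmapped) so gaps can be reviewed."""
    mapped: Dict[Hashable, Hashable] = {}
    unmapped: List[Hashable] = []
    for legacy_id in legacy_ids:
        if legacy_id in id_map:
            mapped[legacy_id] = id_map[legacy_id]
        else:
            # e.g. historic encoding bugs or deleted/under-review records
            unmapped.append(legacy_id)
    return mapped, unmapped
```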
🧩 Field-by-field changelog
Refer to Raw -_ Employee dataset changelog.xlsx for details on what fields were added, removed, or modified in this version.
📎 Deduplication handling and identification
Improved logic for recognizing and preserving duplicates:
is_parent = 1 → Original record
is_parent = 0 → Duplicate record (retained for completeness)
Duplicate entries preserve identical content.
Unique identifier: id
Support fields:
shorthand_names: All known shorthand names used by the employee
historical_ids: Previously assigned record IDs
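Downstream jobs that want one row per employee can filter on is_parent, and historical_ids can be indexed to resolve older record IDs. The helpers below (`parent_records`, `historical_id_index`) are hypothetical and assume records are parsed into dictionaries with the field names listed above.

```python
from typing import Any, Dict, Iterable, List

def parent_records(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only original records (is_parent == 1), dropping retained duplicates."""
    return [r for r in records if r.get("is_parent") == 1]

def historical_id_index(records: Iterable[Dict[str, Any]]) -> Dict[Any, Any]:
    """Map each previously assigned ID to the record's current unique id."""
    index: Dict[Any, Any] = {}
    for r in records:
        for old_id in r.get("historical_ids") or []:
            index[old_id] = r["id"]
    return index
```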
Clean Employee data
🧼 Summary improvements
We’ve made targeted improvements to the member_description field in the Clean Employee dataset to address issues with data staleness and formatting:
The field now returns complete values instead of truncated or null data (when available)
Displays the last successfully scraped result for improved data reliability
All HTML tags have been removed to standardize formatting across base, clean, and multi-source datasets
Multi-source Employee data
🔄 Summary enhancements
As part of our continued commitment to data freshness and quality, we’ve updated the summary field across the Multi-source Employee dataset:
Now returns full field values (no truncation or nulls)
Reflects the most recent successful data scrape
HTML tags have been removed to align with formatting across all Employee data variants (base, clean, multi-source)
If you encounter unexpected or corrupted entries, please contact the team for support.
Base Employee API
🛠 Bulk Collect result update
Bulk Collect results will now include previously excluded data categories and fields. These changes ensure more complete and consistent record delivery.
Effective delivery: Starting June 4, 2025
Impact: Bulk Collect exceptions have been lifted; all available data points are now included in responses.
✅ Restored categories and data points
Publications: authors
Patents: inventors
Projects: team_members
This update improves parity between standard API and Bulk Collect outputs.
Bulk Collect changes
🆕 New Bulk Collect endpoints
Impacted APIs:
Base Employee API
Clean Employee API
New endpoints:
Base Employee API
/v2/data_requests/employee_base/shorthand_names
/v2/data_requests/employee_base/urls
Clean Employee API
/v2/data_requests/employee_clean/shorthand_names
/v2/data_requests/employee_clean/urls
Input requirements for shorthand_names:
No empty strings
No capital letters
No leading/trailing spaces
Most special characters are not allowed
Length: 3–100 characters
Max 10,000 names per request
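Validating names client-side before submitting a Bulk Collect request can save a round trip. The sketch below encodes the rules above and splits input into batches of at most 10,000. The exact set of permitted special characters is not documented here, so the `ALLOWED` pattern (lowercase letters, digits, hyphens, underscores) is an assumption to adjust; `validate_shorthand_name` and `batches` are hypothetical helpers, and the request body format for the endpoints is not shown.

```python
import re
from typing import Iterator, List

MAX_NAMES_PER_REQUEST = 10_000
# Assumption: permitted characters are not fully documented; this allows
# lowercase ASCII letters, digits, hyphens, and underscores only.
ALLOWED = re.compile(r"[a-z0-9_-]+")

def validate_shorthand_name(name: str) -> List[str]:
    """Return the list of rule violations (empty if the name passes)."""
    if name == "":
        return ["empty string"]
    problems = []
    if name != name.strip():
        problems.append("leading/trailing spaces")
    if any(c.isupper() for c in name):
        problems.append("capital letters")
    if not 3 <= len(name) <= 100:
        problems.append("length outside 3-100")
    # Checked on the stripped, lowercased name so spaces/capitals are not double-reported.
    if not ALLOWED.fullmatch(name.strip().lower()):
        problems.append("disallowed characters")
    return problems

def batches(names: List[str], size: int = MAX_NAMES_PER_REQUEST) -> Iterator[List[str]]:
    """Split names into request-sized chunks of at most `size` entries."""
    for start in range(0, len(names), size):
        yield names[start:start + size]
```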
📬 New response header
New header: Location
It provides a URL where the results of the bulk collection can be retrieved once processing is finished. The new header aims to improve user experience by providing immediate, direct access to the results endpoint.
Example:
Location: /v2/data_requests/e000b0ec-0f00-0b00-0a0a-0b00fa0000d0/files
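Because the example Location value is a relative path, a client typically resolves it against the API's base URL before polling for files. The sketch below uses the standard library's `urllib.parse`; the base URL is a placeholder, and treating the path segment after data_requests as the request ID is an assumption based on the example above.

```python
from urllib.parse import urljoin, urlparse

def results_url(base_url: str, location: str) -> str:
    """Resolve a (possibly relative) Location header against the API base URL."""
    return urljoin(base_url, location)

def data_request_id(location: str) -> str:
    """Extract the ID from a path like /v2/data_requests/<id>/files.

    Assumption: the ID is the segment immediately after "data_requests".
    """
    parts = urlparse(location).path.strip("/").split("/")
    return parts[parts.index("data_requests") + 1]
```

With the placeholder base https://api.example.com, the example header resolves to https://api.example.com/v2/data_requests/e000b0ec-0f00-0b00-0a0a-0b00fa0000d0/files.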
⚠️ Breaking changes in Bulk Collect APIs
Impacted APIs:
Clean Employee API
Base Jobs API
Clean Employee API changes:
Empty responses (changed): "" → null
Experience field (renamed): experience.experience_description → experience.description
Timestamp format (changed): "2025-05-12" → "2025-05-12T00:00:00.000Z"
ID data type (changed): "561091029" (string) → 561091029 (integer)
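During a transition, a tolerant reader that accepts both the old and new Clean Employee formats can reduce breakage. The sketch below is illustrative: it assumes "empty responses" refers to empty field values, treats date-only values as midnight UTC, and applies the experience rename to a single experience entry. `parse_timestamp`, `normalize_experience`, and `normalize_id` are hypothetical helpers.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional, Union

def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse both the old date-only format and the new ISO 8601 UTC format."""
    if value is None or value == "":
        return None
    for fmt in ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%d"):
        try:
            # Assumption: date-only values are interpreted as midnight UTC.
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"Unrecognized timestamp: {value!r}")

def normalize_experience(item: Dict[str, Any]) -> Dict[str, Any]:
    """Rename experience_description to description and map "" to None."""
    out = dict(item)
    if "experience_description" in out and "description" not in out:
        out["description"] = out.pop("experience_description")
    # Old responses used "" for empty values; the new format uses null (None).
    return {k: (None if v == "" else v) for k, v in out.items()}

def normalize_id(value: Union[str, int]) -> int:
    """IDs changed from strings ("561091029") to integers (561091029)."""
    return int(value)
```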
Base Jobs API changes
🔸 Added field:
job_company_website: Website of the job company (string)
🔸 Changed fields:
Industries field (renamed): job_industries_collection → job_industry_collection
job_industry_collection structure (changed): Flat list of industries → Nested object with job_industry_list
job_functions_collection structure (changed): Flat array → New structured format
The new job_industry_collection structure looks like this:
"job_industry_collection": [
{
"job_industry_list": {
"industry": "Financial Services"
}
}
]
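Code that previously consumed a flat list of industries can flatten the nested structure shown above. This sketch assumes each element holds a single job_industry_list object with an industry key, as in the example; `industry_names` is a hypothetical helper. The new job_functions_collection format is not shown in this changelog, so it is not handled here.

```python
from typing import Any, Dict, List

def industry_names(job: Dict[str, Any]) -> List[str]:
    """Flatten job_industry_collection into a plain list of industry names."""
    names = []
    for entry in job.get("job_industry_collection") or []:
        industry = (entry.get("job_industry_list") or {}).get("industry")
        if industry:
            names.append(industry)
    return names
```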
🔸 Removed fields:
last_updated_ux
redirected_url_hash
job_status_log_collection
📦 Dump naming prefix update
File naming convention (changed): part-xxxxx.json.gz → json/part-xxxxx.json.gz
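Ingestion jobs that list dump files by name may need to accept the new json/ prefix. The sketch below selects part files in either layout from a list of object keys; the key list and `part_files` helper are hypothetical, and the part-file pattern only assumes the name shape shown above.

```python
import re
from pathlib import PurePosixPath
from typing import Iterable, List

PART_NAME = re.compile(r"part-[^/]+\.json\.gz")

def part_files(keys: Iterable[str]) -> List[str]:
    """Return dump part keys in the new (json/part-*.json.gz) or old (part-*.json.gz) layout."""
    selected = []
    for key in keys:
        path = PurePosixPath(key)
        # parent.name is "json" for the new layout and "" for a bare legacy file name.
        if PART_NAME.fullmatch(path.name) and path.parent.name in ("json", ""):
            selected.append(key)
    return selected
```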
Company APIs sorting enhancements
🆕 What's new
New capability: Users can now sort Company API results by various numerical fields, beyond the traditional last_updated, id, or _score.
Benefit: More control and flexibility to surface the most relevant profiles, based on key business or engagement metrics.
🔧 Sorting behavior
Default order: Descending
Tie-breaker 1: last_updated
Tie-breaker 2: id
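To reproduce the API's ordering locally (for example, when merging pages or checking results), the documented behavior can be emulated with a composite sort key. This is a sketch under assumptions: the tie-breakers are taken to be descending as well, every record is assumed to have the sort field, last_updated, and id, and last_updated values are ISO-formatted strings that compare correctly as text. `sort_like_api` is a hypothetical helper.

```python
from typing import Any, Dict, List

def sort_like_api(records: List[Dict[str, Any]], field: str) -> List[Dict[str, Any]]:
    """Order by `field` descending, then last_updated, then id (tie-breaker direction assumed)."""
    return sorted(records, key=lambda r: (r[field], r["last_updated"], r["id"]), reverse=True)
```

For records with employees_count values 50, 80, 50, 50, the record with 80 comes first, and the three ties are ordered by the most recent last_updated and then by the higher id.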
🧾 New sorting fields
Base Company API:
employees_count
source_id
Clean Company API:
size_employees_count
followers
Multi-source Company API:
employees_count
followers_count_professional_network
followers_count_twitter
followers_count_owler
num_technologies_used
ipo_share_price
last_funding_round_amount_raised
last_funding_round_num_investors
num_acquisitions_source_1
num_acquisitions_source_2
num_acquisitions_source_5
product_reviews_count
num_news_articles
total_website_visits_monthly
rank_global
rank_country
rank_category
company_employee_reviews_count
active_job_postings_count
product_reviews_aggregate_score
visits_change_monthly
bounce_rate
pages_per_visit
average_visit_duration_seconds
company_employee_reviews_aggregate_score
revenue_quarterly.value
revenue_annual.source_1_annual_revenue.annual_revenue
revenue_annual.source_5_annual_revenue.annual_revenue
Base Company API
⚠️ Breaking change: Elasticsearch field rename
To improve field clarity and alignment with naming conventions, we are renaming the shorthand_name field used in Elasticsearch operations. This affects Elasticsearch search and Bulk Collect requests.
Field name (changed): shorthand_name → company_shorthand_name
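Existing Elasticsearch query bodies can be updated mechanically. The sketch below walks a query recursively and renames shorthand_name wherever it appears as a field key (e.g. in term queries or sort clauses) or as a field-name string under field, fields, or _source. It is a best-effort helper (`rename_field` is hypothetical), not an exhaustive parser of the Elasticsearch query DSL, so review the rewritten queries before use.

```python
from typing import Any

OLD, NEW = "shorthand_name", "company_shorthand_name"
# Keys whose string values are field names rather than search terms.
FIELD_NAME_KEYS = ("field", "fields", "_source")

def _renamed(name: str) -> str:
    """Rename OLD, including subfield ("OLD.keyword") and boost ("OLD^2") forms."""
    if name == OLD or name.startswith(OLD + ".") or name.startswith(OLD + "^"):
        return NEW + name[len(OLD):]
    return name

def rename_field(node: Any, parent_key: str = "") -> Any:
    """Return a copy of an Elasticsearch query body with the field renamed."""
    if isinstance(node, dict):
        return {_renamed(k): rename_field(v, k) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_field(v, parent_key) for v in node]
    if isinstance(node, str) and parent_key in FIELD_NAME_KEYS:
        return _renamed(node)
    # Search values (e.g. the "acme" in a term query) are left untouched.
    return node
```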