Dictionary: Clean Employee Data

Overview

Clean Employee Data provides high-quality, structured workforce data that is ready for immediate use. Our data is meticulously cleaned and enriched, enabling businesses to streamline operations, enhance decision-making, and optimize workforce analysis.

By leveraging Clean Employee Data, organizations can reduce engineering overhead, gain access to additional insights, and work with optimized data formats for improved efficiency. The data is available in JSONL, Parquet, and CSV formats, ensuring faster downloads and seamless integration.

With flexible retrieval options—including flat file downloads and API access—businesses in sales tech, HR intelligence, and investment sectors can efficiently access the workforce insights they need.
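As a rough illustration of working with the JSONL format, the sketch below parses one record per line using only Python's standard library. The `iter_records` helper and the two sample lines are assumptions made for this example; they are not part of the product.

```python
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Illustrative lines standing in for a downloaded flat file.
sample_lines = [
    '{"member_id": 4290, "member_is_deleted": 0}',
    '',
    '{"member_id": 4291, "member_is_deleted": 1}',
]

records = list(iter_records(sample_lines))
# Keep only profiles that were publicly available.
public_ids = [r["member_id"] for r in records if r["member_is_deleted"] == 0]
```

In practice, an open file handle can be passed to `iter_records` instead of a list, since iterating a text file yields its lines.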

Clean Employee Data is derived from our Base Employee Data.

Data points are grouped into collections to make the data easier to navigate.

All personal/company information mentioned within this context is entirely fictional and is solely intended for illustrative purposes.

Metadata

Data point
Processing
Description
Data type

member_last_updated

Cleaned

Date the record was last updated

String

member_is_deleted

Raw

Indicates whether the profile was accessible: 1 – deleted or private; 0 – publicly available

Integer

Metadata
"member_last_updated": "2023-07-29",
"member_is_deleted": 0
Cleaning actions
Data point
Cleaning action

member_last_updated

Value is converted to the yyyy-mm-dd format.
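Because member_last_updated uses the yyyy-mm-dd format, it can be parsed with the standard library. The following is a minimal sketch; the `parse_metadata` helper and its output keys are hypothetical names chosen for illustration.

```python
from datetime import date


def parse_metadata(record: dict) -> dict:
    """Convert the metadata fields into native Python types."""
    return {
        # yyyy-mm-dd strings are valid ISO 8601 dates.
        "last_updated": date.fromisoformat(record["member_last_updated"]),
        # 0 means publicly available, 1 means deleted or private.
        "is_public": record["member_is_deleted"] == 0,
    }


meta = parse_metadata({"member_last_updated": "2023-07-29", "member_is_deleted": 0})
```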


Identifiers

Data point
Processing
Description
Data type

member_id

Raw

Identification key in our database

Integer

member_websites_professional_network

Raw

Professional network profile URL

String

member_picture_url

Raw

Profile picture URL

String

member_full_name

Cleaned

Full name

String

member_name_first

Raw

First name

String

member_name_middle

Enriched

Middle name

String

member_name_last

Enriched

Last name

String

member_shorthand_names

Raw

A list of all historical employee shorthand names

Array of strings

member_follower_count

Raw

Number of profile followers

Integer

Identifiers
"member_id": 4290,
"member_full_name": "John Leonardo Doe",
"member_name_first": "John",
"member_name_middle": "Leonardo",
"member_name_last": "Doe",
"member_websites_professional_network": "https://www.professional_network.com/in/john-leonardo-doe",
"member_picture_url": "https://static.lnk.com/aero-v1/sc/h/9c8pery4andzj6ohjkjp54ma2",
"member_shorthand_names": [
        "john-leonardo-doe"
    ],
"member_follower_count": 445,
Cleaning actions
Data point
Cleaning action

member_full_name

  • Special characters/emojis are removed;

  • Any words that follow a comma or are in parentheses are removed;

  • Titles (preceding or following the name) are removed.

member_name_middle

Parsed from member_full_name.

member_name_last

Parsed from member_full_name.
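The cleaning actions above could be approximated as follows. This is only a sketch of the described rules, not the actual pipeline: the `TITLES` set, the regular expressions, and the order of the steps are all assumptions, since the real title list and character rules are not documented here.

```python
import re

# Hypothetical title list; the real one is not documented.
TITLES = {"dr", "mr", "mrs", "ms", "prof", "phd", "mba"}


def clean_full_name(raw: str) -> str:
    # Remove text in parentheses, then anything after the first comma.
    name = re.sub(r"\([^)]*\)", " ", raw)
    name = name.split(",", 1)[0]
    # Keep letters, apostrophes, hyphens, and whitespace; this drops
    # emojis, punctuation, digits, and underscores.
    name = re.sub(r"[^\w\s'\-]|[\d_]", " ", name)
    words = [w for w in name.split() if w.lower() not in TITLES]
    return " ".join(words)


def split_name(full_name: str) -> dict:
    """Derive middle and last name from a cleaned full name."""
    parts = full_name.split()
    return {
        "member_name_middle": " ".join(parts[1:-1]) or None,
        "member_name_last": parts[-1] if len(parts) > 1 else None,
    }


cleaned = clean_full_name("Dr. John Leonardo Doe, PhD (he/him)")
parsed = split_name(cleaned)
```

With the sample input, `cleaned` is "John Leonardo Doe", which matches the member_full_name value in the example record.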


Skills

Data point
Processing
Description
Data type

member_skills

Enriched

List of employees' skills

Array of strings

Skills
"member_skills": [
        "creative",
        "design",
        "electronics",
        "photography",
        "programming"
    ]
Enriching action
Data point
Enriching action

member_skills

Enriched with our ML model from different description fields.


Experience

Data point
Processing
Description
Data type

member_description

Raw

Job position description

String

company_id

Enriched

Identification key for the company associated with the employee's experience

Integer

member_job_title

Cleaned

Current job position title

String

is_decision_maker

Enriched

Indicates whether the employee is a decision-maker based on member_job_title: 1 – employee is marked as a decision-maker in the current role; 0 – employee is not marked as a decision-maker in the current role

Integer

member_job_description

Raw

Current job position description

String

member_headline

Raw

Job title found in the profile headline

String

member_generated_headline

Raw

A user-written headline that can be found in web search results, "also viewed" sections, and other publicly available spaces. It serves the same purpose as the title but is derived from a different source, potentially providing more accurate and up-to-date profile information. Use this field in place of the title, as it reflects the latest user activity.

String

total_experience_duration

Enriched

Summed up experience (displayed as years and months)

String

total_experience_duration_months

Enriched

Summed up employee experience (displayed as months)

Integer

Experience
"member_description": "Results-driven professional with extensive experience in supervisory roles, business analysis, project management, and financial analysis. Skilled in managing enterprise-wide implementations of healthcare information systems, with expertise in gathering and defining client data requirements.",
"company_id": 1111111,
"member_job_title": "Senior Consultant",
"is_decision_maker": 1,
"member_job_description": "Senior Business analyst @ Company123",
"member_headline": "Healthcare Consultant",
"member_generated_headline": "Healthcare Consultant at Company 123",
"total_experience_duration": "2 years 4 months",
"total_experience_duration_months": 28,
Cleaning and enriching actions
Data point
Cleaning/enriching action

company_id

Company ID from an active experience record from member_experience.

member_job_title

Special characters are removed.

total_experience_duration

Values converted to readable text.

total_experience_duration_months

Field aggregated from duration values.
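The relationship between the two total duration fields can be sketched as below. The helper names are hypothetical, and summing duration_months across experience entries is an assumption about how the aggregation works; in particular, this sketch counts overlapping roles twice.

```python
def total_months(experience: list) -> int:
    """Sum duration_months over experience entries, treating missing values as 0."""
    return sum(entry.get("duration_months") or 0 for entry in experience)


def format_duration(months_total: int) -> str:
    """Render a month count as readable text, e.g. 28 -> '2 years 4 months'."""
    years, months = divmod(months_total, 12)
    parts = []
    if years:
        parts.append(f"{years} year" + ("s" if years != 1 else ""))
    if months or not parts:
        parts.append(f"{months} month" + ("s" if months != 1 else ""))
    return " ".join(parts)


months = total_months([{"duration_months": 12}, {"duration_months": 16}])
readable = format_duration(months)
```

For the sample record, 28 months renders as "2 years 4 months", consistent with the example values shown above.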

Data point
Processing
Description
Data type

member_experience

-

Employee's work experience

Array of objects

company_id

Raw

Workplace (company) identifier in our database

Integer

date_from

Cleaned

Employment start date

String (date)

date_from_year

Cleaned

Employment start year

Integer

date_from_month

Cleaned

Employment start month

Integer

date_to

Cleaned

Employment end date

String (date)

date_to_year

Cleaned

Employment end year

Integer

date_to_month

Cleaned

Employment end month

Integer

company_url

Raw

Employee's workplace URL on professional network

String

company_name

Raw

Employer company

String

title

Raw

Job title

String

department

Enriched

Department the employee works in

String

management_level

Enriched

Employee's management level

String

description

Cleaned

Job description

String

order_in_profile

Raw

Record order as seen on the employee's profile

Integer

duration

Enriched

Employment duration

String

duration_months

Cleaned

Employment duration in months

Integer

location

Cleaned

Job/workplace location

String

Experience
"member_experience": [
        {
            "company_id": 1774347,
            "date_from": "2015-10-01",
            "date_from_year": 2015,
            "date_from_month": 10,
            "date_to": "2016-09-01",
            "date_to_year": 2016,
            "date_to_month": 9,
            "company_name": "Company123, Ltd.",
            "company_url": "https://www.professional_network.com/company/company123",
            "title": "Senior Analyst",
            "description": "Financial consulting for leading manufacturing organizations.",
            "order_in_profile": 5,
            "duration": "1 year",
            "duration_months": 12,
            "department": "Project Management",
            "management_level": "Senior",
            "location": "Jacksonville, Florida Area"
        }
    ],
Cleaning and enriching actions
Data point
Cleaning/enriching action

date_from

Value is converted to the yyyy-mm-dd format.

date_from_year, date_from_month

  • Year or month value extracted from the date_from value;

  • Value converted to integer.

date_to

Value is converted to the yyyy-mm-dd format.

date_to_year, date_to_month

  • Year or month value extracted from the date_to value;

  • Value converted to integer.

department

Enriched with our ML model from the title value.

management_level

Enriched with our ML model from the member_job_title value.

description

  • Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None;

  • Value is replaced with None if the description is shorter than 3 characters;

  • Text styling tags are removed;

  • Multiple spaces are replaced with single ones.

duration

Derived from date_from and date_to values.

duration_months

Duration converted to a numerical value in months.

location

Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None.
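The text cleaning rules listed for description and location recur throughout this dictionary, so a small sketch may help. It assumes that styling tags are HTML-like, that the placeholder comparison is exact and case-sensitive, and that the steps run in the order shown; none of this is confirmed by the documentation, and `clean_text` is a hypothetical helper.

```python
import re
from typing import Optional

PLACEHOLDERS = {"None", "Unknown", "NaN", "nan", "na", "null", "Null", "NULL", "-", "--"}


def clean_text(value: Optional[str], min_length: int = 3) -> Optional[str]:
    """Apply the documented text cleaning rules to a single value."""
    if value is None:
        return None
    text = re.sub(r"<[^>]+>", " ", value)      # strip styling tags (assumed HTML-like)
    text = re.sub(r"\s+", " ", text).strip()   # collapse repeated whitespace
    if text in PLACEHOLDERS or len(text) < min_length:
        return None
    return text


description = clean_text("<b>Financial</b>   consulting")
location = clean_text("NULL", min_length=0)
```

Setting `min_length=0` mirrors fields such as location, where only the placeholder rule is listed.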


Data point
Processing
Description
Data type

member_department

Enriched

Departments derived from the member_job_title

String

member_management_level

Enriched

Management levels identified from the member_job_title

String

is_working

Enriched

Represents whether the employee is currently working: 0 – the employee is currently not working; 1 – the employee is currently working

Integer

Experience
"member_department": "Project Management",
"member_management_level": "Senior",
"is_working": 1
Enriching actions
Data point
Cleaning/enriching action

member_department

Enriched with our ML model from the member_job_title value.

member_subdepartment

Enriched with our ML model from the member_job_title value.

member_management_level

Enriched with our ML model from the member_job_title value.

is_working

Based on date_to and date_from values of employee experience.
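One plausible reading of the is_working rule is that an experience entry with a start date and no end date counts as an active role. The sketch below encodes that assumption; the exact logic used in the dataset is not documented, and `derive_is_working` is a hypothetical name.

```python
def derive_is_working(experience: list) -> int:
    """Return 1 if any experience entry has started and has no end date."""
    for entry in experience:
        if entry.get("date_from") and entry.get("date_to") is None:
            return 1
    return 0


current = derive_is_working([
    {"date_from": "2020-02-01", "date_to": "2020-09-01"},
    {"date_from": "2023-06-01", "date_to": None},
])
```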


Education

Data point
Processing
Description
Data type

member_education

Employee's education

Array of objects

major

Cleaned

Field of study

String

title

Cleaned

Educational institution

String

date_to

Cleaned

Graduation date

String

date_from

Cleaned

Enrolment date

String

institution_url

Cleaned

Institution's profile URL

String

description

Cleaned

Education description

String

activities_and_societies

Cleaned

Details about activities and societies

String

Education
 "member_education": [
        {
            "major": "Associate's degree, Business Administration and Management",
            "title": "Business College",
            "date_to": "2017",
            "date_from": "2015",
            "institution_url": "https://www.professional_network.com/school/business-college",
            "description": "Attended Business College from 2015 to 2017",
            "activities_and_societies": "Activities and Societies: Phi Theta Kappa"
        }
    ],
Cleaning actions
Data point
Cleaning action

title

  • Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None;

  • Values are capitalized.

major

Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None.

date_from

Value is converted to the yyyy format.

date_to

Value is converted to the yyyy format.

institution_url

Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None.

description

  • Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None;

  • Text styling tags are removed;

  • Multiple spaces are replaced with single ones.

activities_and_societies

  • Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None;

  • Text styling tags are removed;

  • Multiple spaces are replaced with single ones.


Hidden collections

Data point
Description
Data type

is_hidden

Marks if the employee profile has a hidden education/experience collection.

0 – education/experience information was available at the time of profile scraping; 1 – education/experience information was not available at the time of profile scraping

Number (integer)

is_hidden + experience
"is_hidden": 0,
"member_experience": [
        {
            "company_id": 23124977,
            "date_from": "2020-02-01",
            "date_to": "2020-09-01"
        },
        {
            "company_id": 3140930,
            "date_from": "2023-06-01",
            "date_to": null
        }
    ]
}
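When consuming records, it can be useful to distinguish a profile whose experience was hidden from one that simply lists none. The following sketch shows one way to do that; the `experience_status` helper and its three labels are assumptions for illustration only.

```python
import json


def experience_status(record: dict) -> str:
    """Classify a record as 'hidden', 'listed', or 'empty'."""
    if record.get("is_hidden") == 1:
        return "hidden"   # collection was not available when scraped
    if record.get("member_experience"):
        return "listed"
    return "empty"        # collection was available but has no entries


record = json.loads(
    '{"is_hidden": 0, "member_experience": '
    '[{"company_id": 3140930, "date_from": "2023-06-01", "date_to": null}]}'
)
status = experience_status(record)
```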

Location

Data point
Processing
Description
Data type

member_location_raw_address

Cleaned

Raw address of the employee's location

String

member_location_country

Cleaned

Country of the employee's location

String

member_location_regions

Cleaned

Geographical regions within the employee's country

String

Location
    "member_location_raw_address": "Nashville Metropolitan Area United States",
    "member_location_country": "United States",
    "member_location_regions": "Northern America",
Cleaning actions
Data point
Cleaning action

member_location_raw_address

  • Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None;

  • Trailing special characters are trimmed;

  • Value is set to None if it is shorter than three characters;

  • The value of member_location_country is added at the end of the string.

member_location_country

Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None.
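The trimming and country-appending steps for the raw address might look like the sketch below. Treating "special characters" as ASCII punctuation and skipping the append when the address already ends with the country are both assumptions, and `build_raw_address` is a hypothetical helper; placeholder replacement is left out for brevity.

```python
import string
from typing import Optional


def build_raw_address(address: Optional[str], country: Optional[str]) -> Optional[str]:
    """Trim trailing punctuation and append the country to the address."""
    if address is None:
        return None
    trimmed = address.rstrip(string.punctuation + string.whitespace)
    if len(trimmed) < 3:
        return None
    if country and not trimmed.endswith(country):
        trimmed = f"{trimmed} {country}"
    return trimmed


address = build_raw_address("Nashville Metropolitan Area, ", "United States")
```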


Recommendations and connections

Data point
Processing
Description
Data type

member_recommendations

Cleaned

Employee recommendations

Array of objects

recommendation

Cleaned

Recommendation text

String

referee_name

Raw

Referee's name

String

referee_url

Raw

Referee's profile URL

String

member_recommendations_count

Cleaned

Number of received recommendations

Integer

member_connections_count

Raw

Number of employee's connections

Integer

Recommendations and connections
"member_recommendations": [
    {
      "recommendation": "“John was a great asset in collaborating the tasks in different departments to produce the same goal. He was great at providing advice and asking questions to avoid even a tiny error during the process. Great to work with him!”",
      "referee_name": "Marry Doe",
      "referee_url": "www.professional_network.com/in/marry-doe",
      "order_in_profile": 1
    }
  ],
  "member_recommendations_count": 1,
  "member_connections_count": 15535,
Cleaning actions
Data point
Cleaning action

member_recommendations

Deleted rows are filtered out.

recommendation

  • Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None;

  • Value is set to None if it is shorter than three characters;

  • Text styling tags are removed;

  • Multiple spaces are replaced with single ones;

  • Empty recommendations are filtered out.

member_recommendations_count

  • Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None;

  • None values are replaced with 0 and made an integer.
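The two-step rule for member_recommendations_count (placeholders become None, then None becomes 0) can be collapsed into a single function. This is a minimal sketch under the assumption that valid inputs are integers or integer-like strings; `clean_count` is a hypothetical name.

```python
PLACEHOLDERS = {"None", "Unknown", "NaN", "nan", "na", "null", "Null", "NULL", "-", "--"}


def clean_count(value) -> int:
    """Map placeholder or missing values to 0, otherwise cast to int."""
    if value is None or str(value).strip() in PLACEHOLDERS:
        return 0
    return int(value)


counts = [clean_count(v) for v in ["NaN", None, "1", 15]]
```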


Languages

Data point
Processing
Description
Data type

member_languages

Employee's language knowledge

Array of objects

language

Cleaned

Language

String

proficiency

Cleaned

Language proficiency

String

order_in_profile

Raw

Record order in the section

Integer

Languages
"member_languages": [
        {
            "language": "English",
            "proficiency": "Intermediate",
            "order_in_profile": 1
        }
    ],
Cleaning actions
Data point
Cleaning action

language

Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None.

proficiency

Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None.


Certifications

Data point
Processing
Description
Data type

member_certifications

Employee's certifications

Array of objects

title

Cleaned

Certification title

String

issuer

Cleaned

Certification issuer

String

credential_id

Cleaned

Credential ID

String

certificate_url

Cleaned

Certificate URL

String

date_from

Cleaned

Issue date

String

date_to

Cleaned

Expiration date

String

issuer_url

Cleaned

Issuer profile URL

String

order_in_profile

Raw

Section record order

Integer

date_from_year

Cleaned

Issue year

Integer

date_from_month

Cleaned

Issue month

Integer

date_to_year

Cleaned

Expiration year

Integer

date_to_month

Cleaned

Expiration month

Integer

Certifications
"member_certifications": [
        {
            "title": "Data Analysis Certification B4",
            "issuer": "Data School123",
            "credential_id": "1345",
            "certificate_url": "http://data-analysis-certification-school123.com/verify?trk=public_profile_certification-title",
            "date_from": "2021-06-01",
            "date_to": "2024-06-01",
            "issuer_url": "https://www.professional_network.com/company/data-school-123",
            "order_in_profile": 1,
            "date_from_year": 2021,
            "date_from_month": 6,
            "date_to_year": 2024,
            "date_to_month": 6
        }
    ],
Cleaning actions
Data point
Cleaning action

title

Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None.

issuer

Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None.

date_from

Value is converted to the yyyy-mm-dd format.

date_to

Value is converted to the yyyy-mm-dd format.

issuer_url

Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None.

date_from_year, date_to_year

Year value from date is converted to an integer.

date_from_month, date_to_month

Month value from date is converted to an integer.
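The year and month fields can be reproduced from the yyyy-mm-dd strings with the standard library. The sketch below is illustrative only; `date_parts` is a hypothetical helper, and returning None for missing dates is an assumption.

```python
from datetime import date
from typing import Optional


def date_parts(value: Optional[str], prefix: str) -> dict:
    """Extract integer year and month from a yyyy-mm-dd string."""
    if not value:
        return {f"{prefix}_year": None, f"{prefix}_month": None}
    parsed = date.fromisoformat(value)
    return {f"{prefix}_year": parsed.year, f"{prefix}_month": parsed.month}


issued = date_parts("2021-06-01", "date_from")
expires = date_parts(None, "date_to")
```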


Courses

Data point
Processing
Description
Data type

member_courses

Attended courses

Array of objects

organizer

Cleaned

Course organizer

String

title

Cleaned

Course title

String

order_in_profile

Raw

Record order in the section

Integer

Courses
 "member_courses": [
        {
            "organizer": "IT Academy",
            "title": "Microsoft Certified Excel Expert",
            "order_in_profile": 1
        }
    ],
Cleaning actions
Data point
Cleaning action

organizer

Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None.

title

Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None.


Awards

Data point
Processing
Description
Data type

member_awards

Held awards

Array of objects

title

Cleaned

Award

String

issuer

Cleaned

Award issuer

String

description

Cleaned

Award description

String

date

Cleaned

Issue date

String

order_in_profile

Raw

Section record order

Integer

date_year

Cleaned

Issue year

Integer

date_month

Cleaned

Issue month

Integer

date_day

Cleaned

Issue day

Integer

Awards
"member_awards": [
        {
            "title": "Certified in Inventory Management",
            "issuer": "School of Operations Management",
            "description": "Certification in Production and Inventory Management",
            "date": "2001-01-01",
            "order_in_profile": 5,
            "date_year": 2001,
            "date_month": 1,
            "date_day": 1
        }
    ],
Cleaning actions
Data point
Cleaning action

title

  • Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None;

  • Values are capitalized.

issuer

Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None.

date

Value is converted to the yyyy-mm-dd format.

date_year

Year value from date is converted to an integer.

date_month

Month value from date is converted to an integer.


Activity

Data point
Processing
Description
Data type

member_activity

Interaction with posts on professional network

Array of objects

activity_url

Raw

Post URL

String

title

Cleaned

Post title

String

action

Cleaned

Interaction type

String

order_in_profile

Raw

Section record order

Integer

Activity
"member_activity": [
        {
            "activity_url": "https://www.professional_network.com/posts/company123-incorporated_healthcare-laborproductivity-activity-7161365554581172224-XUpZ",
            "title": "Company 123 is excited to introduce our Team Spotlight featuring John Doe! @Health Systems, Ltd #Healthcare #LaborProductivity",
            "action": "Liked by",
            "order_in_profile": 1
        }
    ],
Cleaning actions
Data point
Cleaning action

title

  • Values ["None"; "Unknown"; "NaN"; "nan"; "na"; "null"; "Null"; "NULL"; "-"; "--"] are replaced with value None;

  • Text styling tags are removed;

  • Multiple spaces are replaced with single ones.

Organizations

Data point
Description
Data type

member_organizations

Memberships in organizations

Array of structs

organization

Organization title

String

position

Position in the organization

String

description

Description of the activity/experience in the organization

String

date_from

Membership start date

String

date_from_year

Membership start year

Integer

date_from_month

Membership start month

Integer

date_to

Membership end date

String

date_to_year

Membership end year

Integer

date_to_month

Membership end month

Integer

order_in_profile

The exact position of the organization in the profile

Integer

Organizations
  "member_public_profile_id": "123456789",
  "member_organizations": [
    {
      "organization": "Example Organization",
      "position": "Lead Software Engineer",
      "description": "Led a team of developers providing great services.",
      "date_from": "2019-06",
      "date_from_year": 2019,
      "date_from_month": 6,
      "date_to": "2023-09",
      "date_to_year": 2023,
      "date_to_month": 9,
      "order_in_profile": 1
    }
  ],

Patents

Data point
Description
Data type

member_patents

Authored patents

Array of structs

title

Patent title

String

status

Patent status

String

inventors

Inventors of the patent

Array of structs

full_name

Full name of the inventor

String

profile_url

Profile URL

String

order_in_profile

Order in profile

Integer

date

Patent filing date

String

date_year

Filing year

Integer

date_month

Filing month

Integer

date_day

Filing day

Integer

patent_url

Patent URL

String

description

Patent description

String

patent_or_application_number

Patent or application number

String

order_in_profile

The exact position of the patent in the profile

Integer

Patents
  "member_patents": [
    {
      "title": "Data Synchronization System",
      "status": "Granted",
      "inventors": [
        {
          "full_name": "John Doe",
          "profile_url": "https://www.professional-network.com/profile/johndoe",
          "order_in_profile": 1
        },
        {
          "full_name": "Jane Smith",
          "profile_url": "https://www.professional-network.com/profile/janesmith",
          "order_in_profile": 2
        }
      ],
      "date": "2022-01-01",
      "date_year": 2022,
      "date_month": 1,
      "date_day": 1,
      "patent_url": "https://www.patents.example.com/US1234567",
      "description": "A method for efficient synchronization of distributed systems in real-time environments.",
      "patent_or_application_number": "US1234567B2",
      "order_in_profile": 1
    }
  ]

Publications

Data point
Description
Data type

member_publications

Authored publications

Array of structs

title

Publication title

String

publisher

Publisher name

String

date

Publication release date

String

date_year

Release year

Integer

date_month

Release month

Integer

date_day

Release day

Integer

description

Publication description

String

authors

Authors of the publication

Array of structs

full_name

Full name of the author

String

profile_url

Profile URL

String

order_in_profile

Order in the profile

Integer

publication_url

Publication website URL

String

order_in_profile

The exact position of the publication in the profile

Integer

Publications
   "member_publications": [
    {
      "title": "Microservices Architecture in Cloud Environments",
      "publisher": "Journal of Software Systems",
      "date": "2024-08-01",
      "date_year": 2024,
      "date_month": 8,
      "date_day": 1,
      "description": "An in-depth analysis of architectural patterns and scalability challenges in cloud-native microservices.",
      "authors": [
        {
          "full_name": "John Doe",
          "profile_url": "https://www.professional-network.com/profile/johndoe",
          "order_in_profile": 1
        }
      ],
      "publication_url": "https://www.publications.example.com/microservices-architecture",
      "order_in_profile": 1
    }
  ]
}
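Nested arrays such as authors (or inventors in the Patents collection) often need flattening before they fit a tabular tool. The sketch below produces one row per publication and author pair; the `flatten_authors` helper and the output column names are hypothetical.

```python
def flatten_authors(publications: list) -> list:
    """Return one flat row per (publication, author) pair."""
    rows = []
    for pub in publications or []:
        for author in pub.get("authors") or []:
            rows.append({
                "publication_title": pub.get("title"),
                "author_name": author.get("full_name"),
                "author_order": author.get("order_in_profile"),
            })
    return rows


rows = flatten_authors([
    {
        "title": "Microservices Architecture in Cloud Environments",
        "authors": [{"full_name": "John Doe", "order_in_profile": 1}],
    }
])
```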
