Data Engineer

42 - 45 EUR/hour · B2B (net)

Senior · Full-time · B2B

#327670 · Added 21 days ago · 21

Source: nofluffjobs.com

Tech Stack / Keywords

ETL, NLP, Databricks, Spark, SQL, Python, Cloud platform

For our client, a global pharmaceutical company, we are recruiting for a Data Engineer role.

At least 5 years of experience as a Data Engineer.
Strong data engineering skills: Databricks, Spark, Delta Lake, SQL, ETL design and orchestration.
Familiarity with clinical trial concepts (inclusion/exclusion criteria, endpoints, demographics) and biomedical terminologies.
Practical experience with data modeling and working with end users to define requirements.
Experience with CI/CD, testing frameworks, and monitoring for data pipelines and ML models.
Experience with NLP for information extraction from scientific text (publications, registries).
Fluency in English, both written and spoken.

Nice to have:

Combine clinical data expertise with strong data engineering and technical skills to build well-documented pipelines from source to curated datasets in common data models such as CDISC SDTM.
Collaborate closely with clinical SMEs, data scientists, infrastructure, and other skilled data engineers.
Include external benchmarking data as a FounData product and help automate the extraction and harmonisation of competitor clinical trial data from public registries and publications into structured, analysis-ready formats.
Productionise and monitor pipelines and models; collaborate on CI/CD, testing, and user feedback.
Implement ETL patterns (medallion architecture), ensuring data provenance, validation, and versioning.
Take part in the continuous improvement and validation of existing pipelines.
Ensure clinical concepts are correctly represented and harmonized across data models (CDISC SDTM/ADaM, OMOP, HL7); contribute to mapping and transformation logic.
Develop NLP models for entity and relation extraction (e.g., inclusion/exclusion criteria, demographics, endpoints, study design).
Build automated pipelines to ingest registry and publication data and convert it into tabular, queryable datasets.
Co-design the benchmarking data model with end users and map extracted information to standardized terminologies.
Integrate human-in-the-loop review, confidence scoring, and vocabulary/unit normalization.

Sports card

Healthcare

Ework Group

64 active offers