Data Engineer
42 - 45 EUR/hour, B2B (net)
Senior · Full-time · B2B
#327670 · Added 21 days ago · 21
Source: nofluffjobs.com
Tech Stack / Keywords
ETL, NLP, Databricks, Spark, SQL, Python, Cloud platform
Company and position
For our client, a global pharmaceutical company, we are recruiting for a Data Engineer role.
Requirements
- At least 5 years of experience as a Data Engineer.
- Strong data engineering skills: Databricks, Spark, Delta Lake, SQL, ETL design and orchestration.
- Familiarity with clinical trial concepts (inclusion/exclusion criteria, endpoints, demographics) and biomedical terminologies.
- Practical experience with data modeling and working with end users to define requirements.
- Experience with CI/CD, testing frameworks, and monitoring for data pipelines and ML models.
- Experience with NLP for information extraction from scientific text (publications, registries).
- Fluency in English, both written and spoken.
Nice to have:
- Experience in the pharmaceutical sector or clinical research data environments.
Responsibilities
- Combine clinical data expertise with strong data engineering and technical skills to build well-documented pipelines from source to curated datasets in common data models such as CDISC SDTM.
- Collaborate closely with clinical SMEs, data scientists, infrastructure, and other skilled data engineers.
- Include external benchmarking data as a FounData product and help automate the extraction and harmonisation of competitor clinical trial data from public registries and publications into structured, analysis-ready formats.
- Productionise and monitor pipelines and models; collaborate on CI/CD, testing, and user feedback.
- Implement ETL patterns (medallion architecture), ensuring data provenance, validation, and versioning.
- Take part in the continuous improvement and validation of existing pipelines.
- Ensure clinical concepts are correctly represented and harmonized across data models (CDISC SDTM/ADaM, OMOP, HL7); contribute to mapping and transformation logic.
- Develop NLP models for entity and relation extraction (e.g., inclusion/exclusion criteria, demographics, endpoints, study design).
- Build automated pipelines to ingest registry and publication data and convert it into tabular, queryable datasets.
- Co-design the benchmarking data model with end users and map extracted information to standardized terminologies.
- Integrate human-in-the-loop review, confidence scoring, and vocabulary/units normalization.
Offer
- Sports subscription
- Private healthcare
- International environment
- Life insurance
Ework Group
64 active offers