Data Engineer
42 - 45 EUR/hour, B2B (net)
Senior · Full-time · B2B
#327669 · Added 21 days ago · 20
Source: nofluffjobs.com
Tech Stack / Keywords
Databricks, ETL, NLP, Spark, Python, Cloud, SQL
Company and position
On behalf of our client, a global pharmaceutical company, we are recruiting for a Data Engineer role.
Requirements
- A higher education degree in IT or a related field is preferred.
- At least 5 years of experience as a Data Engineer.
- Hands-on experience with Databricks, Python/SQL and Spark.
- Knowledge of a cloud environment (AWS or Azure).
- Proven experience designing and implementing ETL pipelines in Databricks / Spark and Delta Lake.
- Strong knowledge of OMOP CDM and experience mapping datasets to OMOP; familiarity with CDISC SDTM is a plus.
- Expertise in data modelling, partitioning, performance tuning, and best practices for large clinical/RWD datasets.
- Experience with vocabulary services and terminology mapping (OHDSI/Athena, UMLS, or similar).
- Experience integrating AI/NLP components into data pipelines (entity extraction, mapping suggestions) is desirable.
- Familiarity with testing frameworks for data (Great Expectations, Deequ), CI/CD, infrastructure as code, and orchestration tools (Databricks Jobs, Airflow).
- Fluency in English, both written and spoken.
Nice to have:
- Prior experience in the pharmaceutical sector or clinical research environments.
- Knowledge of data governance, privacy regulations and secure handling of patient data.
Responsibilities
- Design, build and maintain production ETL pipelines in Databricks/Delta Lake to ingest RWD (registries, claims, EHR extracts) and transform it into standard models.
- Implement harmonisation workflows to map incoming RWD to OMOP and to the internal CDISC SDTM canonical model; handle vocabulary mapping, unit normalization and provenance.
- Extend the medallion architecture (bronze/silver/gold) patterns with robust validation, lineage, partitioning and performance tuning.
- Develop configurable, input-driven transformation frameworks so clinical experts can drive mapping rules via config files and catalogs.
- Integrate AI/automation components (e.g., model-assisted mapping, NLP for free text) with human-in-the-loop review and confidence scoring.
- Establish testing, CI/CD, monitoring and alerting for ETL jobs and automations; ensure reproducibility, versioning and governance.
- Collaborate with clinical data scientists, data stewards and stakeholders to define requirements, data contracts and success metrics.
Offer
- Sport Subscription
- Private healthcare
- Life insurance
- International environment
Ework Group