Data Engineer

42-45 EUR/hour, B2B (net)
Senior · Full-time · B2B
#327669 · Added 21 days ago · 20
Source: nofluffjobs.com

Tech Stack / Keywords

Databricks, ETL, NLP, Spark, Python, Cloud, SQL

Company and position

For our client, a global pharmaceutical company, we are recruiting for a Data Engineer role.


Requirements

  • A higher-education degree in IT or a related field is preferred.
  • At least 5 years of experience as a Data Engineer.
  • Hands-on experience with Databricks, Python/SQL and Spark.
  • Knowledge of a cloud environment (AWS or Azure).
  • Proven experience designing and implementing ETL pipelines in Databricks / Spark and Delta Lake.
  • Strong knowledge of OMOP CDM and experience mapping datasets to OMOP; familiarity with CDISC SDTM is a plus.
  • Expertise in data modelling, partitioning, performance tuning, and best practices for large clinical/RWD datasets.
  • Experience with vocabulary services and terminology mapping (OHDSI/Athena, UMLS, or similar).
  • Experience integrating AI/NLP components into data pipelines (entity extraction, mapping suggestions) is desirable.
  • Familiarity with testing frameworks for data (Great Expectations, Deequ), CI/CD, infrastructure as code, and orchestration tools (Databricks Jobs, Airflow).
  • Fluency in English, both written and spoken.

Nice to have:

  • Prior experience in the pharmaceutical sector or clinical research environments.
  • Knowledge of data governance, privacy regulations and secure handling of patient data.

Responsibilities

  • Design, build and maintain production ETL pipelines in Databricks/Delta Lake to ingest RWD (registries, claims, EHR extracts) and transform it into standard models.
  • Implement harmonisation workflows to map incoming RWD to OMOP and to the internal CDISC SDTM canonical model; handle vocabulary mapping, unit normalisation and provenance.
  • Extend the medallion architecture (bronze/silver/gold) patterns with robust validation, lineage, partitioning and performance tuning.
  • Develop configurable, input-driven transformation frameworks so clinical experts can drive mapping rules via config files and catalogs.
  • Integrate AI/automation components (e.g., model-assisted mapping, NLP for free text) with human-in-the-loop review and confidence scoring.
  • Establish testing, CI/CD, monitoring and alerting for ETL jobs and automations; ensure reproducibility, versioning and governance.
  • Collaborate with clinical data scientists, data stewards and stakeholders to define requirements, data contracts and success metrics.

Offer

  • Sport Subscription
  • Private healthcare
  • Life insurance
  • International environment
Ework Group

