Data Engineer
42 EUR/hour · B2B (net)
Senior · Full-time · B2B
#327791 · Added 20 days ago
Source: emagine
Tech Stack / Keywords
Data Science · Databricks · AI · ETL · Architecture · NLP · Testing · CI/CD
Company and Position
This is a full-time consultant role focused on developing advanced data solutions in clinical research and Real World Data integration. The position offers remote work and collaboration with various stakeholders across the organization.
Requirements
- Proven experience designing and implementing ETL pipelines in Databricks/Spark and Delta Lake.
- Strong knowledge of OMOP Common Data Model and experience mapping datasets to OMOP; familiarity with CDISC SDTM is a plus.
- Expertise in data modelling, partitioning, performance tuning, and best practices for large-scale clinical and Real World Data.
- Experience with vocabulary services and terminology mapping (OHDSI/Athena, UMLS, or similar).
- Experience integrating AI/NLP components into data pipelines (entity extraction, mapping suggestions) is desirable.
- Familiarity with testing frameworks for data (Great Expectations, Deequ), CI/CD, infrastructure as code, and orchestration tools (Databricks Jobs, Airflow).
- Good communication skills and experience working with domain experts to capture requirements.
Nice to have:
- Prior experience in pharmaceutical or clinical research environments.
- Knowledge of data governance, privacy regulations, and secure handling of patient data.
- Experience with Unity Catalog, Databricks Delta Sharing, and cloud infrastructure (Azure/AWS).
Responsibilities
- Design, build and maintain production ETL pipelines in Databricks/Delta Lake to ingest Real World Data (registries, claims, EHR extracts) and transform it into standard models.
- Implement harmonisation workflows to map incoming Real World Data to OMOP and to the internal CDISC SDTM canonical model, including vocabulary mapping, unit normalization, and provenance.
- Extend the medallion architecture (bronze/silver/gold) with robust validation, lineage, partitioning, and performance tuning.
- Develop configurable, input-driven transformation frameworks enabling clinical experts to drive mapping rules via config files and catalogs.
- Integrate AI/automation components such as model-assisted mapping and NLP for free text with human-in-the-loop review and confidence scoring.
- Establish testing, CI/CD, monitoring, and alerting for ETL jobs and automations to ensure reproducibility, versioning, and governance.
- Collaborate with clinical data scientists, data stewards, and stakeholders to define requirements, data contracts, and success metrics.
Offer
- Long-term B2B contract
- Fully remote work
- 42 euro/hour + VAT
emagine