Senior AI Platform Engineer (Python, AWS, Data Pipelines)

170 PLN/hour, B2B (net)

Senior · Full-time · B2B

#378356 · Added yesterday

Source: emagine


Tech Stack / Keywords

AI, Python, AWS, LLM, PostgreSQL, SQL, REST, CI/CD

Company and position

We are building CaaS (Content as a Service) — a platform that transforms publisher content (PDF textbooks and Excel manifests) into structured, enriched, AI-ready data. The platform processes content once and exposes it through a unified service layer used by multiple downstream applications. The project is in the Media / Telco industry.

Requirements

Must-have:

Strong Python development experience (production systems)
AWS experience (S3, Glue, Aurora)
Experience with data pipelines (ETL / batch processing)
Strong SQL and PostgreSQL experience
Experience with schema design and migrations

Nice to have:

Experience with LLMs in production (OCR, content processing, enrichment)
Prompt engineering / context engineering
Experience with vector databases (Weaviate, Pinecone, Qdrant, pgvector)
Knowledge of embeddings, semantic search, and RAG
Experience with FastAPI
Experience with Airflow / MWAA
Experience building data platforms serving multiple consumers

Responsibilities

Data Engineering & Pipelines:

Build and maintain multi-stage data ingestion pipelines
Design and implement idempotent, restartable batch processing workflows
Use S3 as core storage layer for raw and processed data
Implement pipeline stages: content ingestion and book identity assignment; PDF-to-markdown conversion (AI OCR); table of contents and structure extraction; hierarchical chunking; embedding generation
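
For illustration only, here is a minimal sketch of what an idempotent, restartable stage could look like. It is not the platform's actual design. The `Store` dictionary is a hypothetical stand-in for S3 or a state table, and the hash-based `content_id` scheme, the stage names, and all function names are assumptions made for this example.

```python
import hashlib
from typing import Callable

# Hypothetical in-memory stand-in for an S3 bucket or state table: key -> stage output.
Store = dict[str, str]


def content_id(raw: bytes) -> str:
    """Derive a stable identity from the content itself (assumed scheme)."""
    return hashlib.sha256(raw).hexdigest()[:16]


def run_stage(store: Store, stage: str, book_id: str, fn: Callable[[], str]) -> str:
    """Run `fn` only if the output for this (stage, book) pair is not stored yet."""
    key = f"{stage}/{book_id}"
    if key not in store:  # a previous run already finished this stage: skip the work
        store[key] = fn()
    return store[key]


def process_book(store: Store, raw: bytes, calls: list[str]) -> str:
    """Two toy stages; `calls` records which stages actually executed."""
    book_id = content_id(raw)

    def convert() -> str:
        calls.append("convert")
        return raw.decode("utf-8").strip()

    markdown = run_stage(store, "markdown", book_id, convert)

    def chunk() -> str:
        calls.append("chunk")
        return "|".join(part for part in markdown.split("\n\n") if part)

    return run_stage(store, "chunks", book_id, chunk)
```

Because each stage output is keyed by stage name and content-derived ID, re-running the same input after a crash or a retry repeats no completed work and yields the same result.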

AI / LLM Processing:

Use LLMs and OCR models to extract structured data from PDFs
Design prompts and context strategies for consistent outputs
Generate structured metadata and enrich content for downstream use cases

Data Storage & Consistency:

Maintain PostgreSQL (Aurora) as system of record
Design and maintain SQL schemas and versioned migrations
Ensure data consistency across S3, PostgreSQL (Aurora), and vector database (Weaviate)
Implement reconciliation logic across distributed systems
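
As a rough sketch of what reconciliation across the three stores might involve, the example below treats PostgreSQL as the system of record and compares its record IDs with those found in object storage and in the vector index. The function name, the report fields, and the idea of comparing plain ID sets are assumptions for illustration; a real implementation would page through each system rather than hold all IDs in memory.

```python
from dataclasses import dataclass


@dataclass
class ReconciliationReport:
    missing_objects: set[str]  # in the database but absent from object storage
    missing_vectors: set[str]  # in the database but absent from the vector index
    orphan_objects: set[str]   # in object storage but unknown to the database
    orphan_vectors: set[str]   # in the vector index but unknown to the database


def reconcile(db_ids: set[str], object_ids: set[str], vector_ids: set[str]) -> ReconciliationReport:
    """Compare both secondary stores against the database, the system of record."""
    return ReconciliationReport(
        missing_objects=db_ids - object_ids,
        missing_vectors=db_ids - vector_ids,
        orphan_objects=object_ids - db_ids,
        orphan_vectors=vector_ids - db_ids,
    )
```

Missing entries would typically be repaired by re-running the relevant pipeline stage, while orphans are candidates for cleanup.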

Retrieval & Vector Search:

Work with Weaviate for vector search and semantic retrieval
Support RAG-based applications
Design data organization strategies by subject, country, and client

APIs & Integration:

Build REST APIs using FastAPI
Expose content as a service for multiple downstream applications
Integrate with internal and external systems

Engineering Practices:

Write strongly typed Python code (mypy)
Follow CI/CD processes with automated checks (ruff, pytest)
Work across dev / staging / production environments
Debug distributed data inconsistencies
