emagine

Senior AI Platform Engineer (Python, AWS, Data Pipelines)

170 PLN/hour (B2B)
Senior · Full-time · B2B
#378356 · Added yesterday
Source: emagine

Tech Stack / Keywords

AI, Python, AWS, LLM, PostgreSQL, SQL, REST, CI/CD

Company and Position

We are building CaaS (Content as a Service) — a platform that transforms publisher content (PDF textbooks and Excel manifests) into structured, enriched, AI-ready data. The platform processes content once and exposes it through a unified service layer used by multiple downstream applications. The project is in the Media / Telco industry.

Requirements

Must-have:

  • Strong Python development experience (production systems)
  • AWS experience (S3, Glue, Aurora)
  • Experience with data pipelines (ETL / batch processing)
  • Strong SQL and PostgreSQL experience
  • Experience with schema design and migrations

Nice to have:

  • Experience with LLMs in production (OCR, content processing, enrichment)
  • Prompt engineering / context engineering
  • Experience with vector databases (Weaviate, Pinecone, Qdrant, pgvector)
  • Knowledge of embeddings, semantic search, and RAG
  • Experience with FastAPI
  • Experience with Airflow / MWAA
  • Experience building data platforms serving multiple consumers

Responsibilities

Data Engineering & Pipelines:

  • Build and maintain multi-stage data ingestion pipelines
  • Design and implement idempotent, restartable batch processing workflows
  • Use S3 as the core storage layer for raw and processed data
  • Implement pipeline stages, including: content ingestion and book identity assignment; PDF-to-Markdown conversion (AI OCR); table-of-contents and structure extraction; hierarchical chunking; and embedding generation
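
To illustrate what "idempotent, restartable" can mean for these stages, here is a minimal sketch under stated assumptions: a plain dict stands in for S3, the book identity is assumed to be a SHA-256 hash of the PDF bytes, and the AI OCR step is an injected callable. Each stage writes exactly one output key and is skipped if that key already exists, so rerunning a crashed or repeated batch only redoes missing work. The names `book_id`, `run_stage`, `chunk_markdown`, `process_book` and the `raw/` and `processed/` key layout are hypothetical.

```python
import hashlib
import json
from typing import Callable

# Hypothetical stand-in for an S3 bucket: object key -> object bytes.
Store = dict[str, bytes]


def book_id(pdf_bytes: bytes) -> str:
    """Assumed identity scheme: the same PDF always maps to the same ID."""
    return hashlib.sha256(pdf_bytes).hexdigest()[:16]


def run_stage(store: Store, key: str, compute: Callable[[], bytes]) -> bytes:
    """Return the stage output, computing and writing it only if it is missing."""
    if key in store:
        return store[key]  # done on a previous run: skip
    result = compute()
    store[key] = result  # written once, only after compute() succeeds
    return result


def chunk_markdown(md: str) -> list[dict[str, str]]:
    """Hierarchical chunking: one chunk per section, tagged with its heading path."""
    path: list[str] = []
    body: list[str] = []
    chunks: list[dict[str, str]] = []

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            chunks.append({"path": " > ".join(path), "text": text})
        body.clear()

    for line in md.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            flush()
            level = len(stripped) - len(stripped.lstrip("#"))
            del path[level - 1:]  # drop headings at this depth or deeper
            path.append(stripped[level:].strip())
        else:
            body.append(line)
    flush()
    return chunks


def process_book(store: Store, pdf_bytes: bytes, ocr: Callable[[bytes], str]) -> str:
    bid = book_id(pdf_bytes)
    run_stage(store, f"raw/{bid}.pdf", lambda: pdf_bytes)
    md = run_stage(store, f"processed/{bid}/content.md", lambda: ocr(pdf_bytes).encode("utf-8"))
    run_stage(
        store,
        f"processed/{bid}/chunks.json",
        lambda: json.dumps(chunk_markdown(md.decode("utf-8"))).encode("utf-8"),
    )
    return bid
```

Calling `process_book` twice with the same PDF returns the same ID and invokes `ocr` only once. Embedding generation could be added as one more `run_stage` call keyed on the chunk file.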

AI / LLM Processing:

  • Use LLMs and OCR models to extract structured data from PDFs
  • Design prompts and context strategies for consistent outputs
  • Generate structured metadata and enrich content for downstream use cases
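
One common way to get consistent LLM outputs is to request JSON, validate it strictly, and retry with feedback when validation fails. The sketch below assumes this approach. The model is an injected `llm` callable, so no real LLM API is used. The metadata fields (`title`, `subject`, `grade_level`, `keywords`), the prompt text, and the function names are hypothetical, since the posting does not specify a schema.

```python
import json
from typing import Any, Callable, Optional

# Hypothetical metadata schema; the real field set is not given in the posting.
REQUIRED_FIELDS: dict[str, type] = {"title": str, "subject": str, "grade_level": int, "keywords": list}

PROMPT_TEMPLATE = (
    "Extract metadata from the textbook section below.\n"
    "Return only a JSON object with these keys: {fields}.\n\n"
    "Section:\n{text}"
)


def parse_metadata(raw: str) -> Optional[dict[str, Any]]:
    """Return the parsed object if it is valid JSON with correctly typed fields, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for field, expected in REQUIRED_FIELDS.items():
        if type(data.get(field)) is not expected:  # strict check: rejects True for an int field
            return None
    return data


def extract_metadata(llm: Callable[[str], str], section_text: str, max_attempts: int = 3) -> dict[str, Any]:
    prompt = PROMPT_TEMPLATE.format(fields=", ".join(REQUIRED_FIELDS), text=section_text)
    for _ in range(max_attempts):
        parsed = parse_metadata(llm(prompt))
        if parsed is not None:
            return parsed
        # Feed the failure back so the next attempt can correct itself.
        prompt += (
            "\n\nYour previous answer was not a valid JSON object with the required keys. "
            "Return only the JSON object."
        )
    raise ValueError(f"no valid metadata after {max_attempts} attempts")
```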

Data Storage & Consistency:

  • Maintain PostgreSQL (Aurora) as the system of record
  • Design and maintain SQL schemas and versioned migrations
  • Ensure data consistency across S3, PostgreSQL (Aurora), and vector database (Weaviate)
  • Implement reconciliation logic across distributed systems
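
With PostgreSQL as the system of record, reconciliation can be framed as comparing per-system listings of chunk IDs and content hashes. The sketch below assumes each system can produce a `chunk_id -> content hash` mapping, gathered elsewhere (for example by listing S3 keys or querying Weaviate). The report fields and repair action names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReconciliationReport:
    missing_in_vector_db: set[str]
    orphaned_in_vector_db: set[str]
    stale_in_vector_db: set[str]
    missing_in_s3: set[str]


def reconcile(
    db: dict[str, str], s3: dict[str, str], vectors: dict[str, str]
) -> ReconciliationReport:
    """Compare chunk_id -> content-hash listings, treating PostgreSQL (db) as the source of truth."""
    db_ids, vec_ids = set(db), set(vectors)
    return ReconciliationReport(
        missing_in_vector_db=db_ids - vec_ids,
        orphaned_in_vector_db=vec_ids - db_ids,
        stale_in_vector_db={cid for cid in db_ids & vec_ids if db[cid] != vectors[cid]},
        missing_in_s3=db_ids - set(s3),
    )


def repair_plan(report: ReconciliationReport) -> list[tuple[str, str]]:
    """Turn the report into a deterministic, sorted list of hypothetical repair actions."""
    to_upsert = report.missing_in_vector_db | report.stale_in_vector_db
    actions = [("upsert_vector", cid) for cid in sorted(to_upsert)]
    actions += [("delete_vector", cid) for cid in sorted(report.orphaned_in_vector_db)]
    actions += [("reprocess_from_source", cid) for cid in sorted(report.missing_in_s3)]
    return actions
```

Keeping detection (`reconcile`) separate from repair (`repair_plan`) lets a dry run report drift between environments without changing any data.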

Retrieval & Vector Search:

  • Work with Weaviate for vector search and semantic retrieval
  • Support RAG-based applications
  • Design data organization strategies by subject, country, and client
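
Organizing data by subject, country, and client usually means filtering on metadata before ranking by vector similarity. This sketch does not use the Weaviate client. It shows the same idea in plain Python, assuming each chunk carries hypothetical `subject`, `country` and `client` properties and that `client` is always required so one client's content is never returned to another.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Chunk:
    chunk_id: str
    embedding: list[float]
    subject: str
    country: str
    client: str


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    chunks: list[Chunk],
    query: Sequence[float],
    *,
    client: str,
    subject: Optional[str] = None,
    country: Optional[str] = None,
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Filter on metadata first, then rank the remaining chunks by cosine similarity."""
    candidates = [
        c for c in chunks
        if c.client == client  # client is mandatory: never search across clients
        and (subject is None or c.subject == subject)
        and (country is None or c.country == country)
    ]
    scored = [(c.chunk_id, cosine(query, c.embedding)) for c in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```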

APIs & Integration:

  • Build REST APIs using FastAPI
  • Expose content as a service for multiple downstream applications
  • Integrate with internal and external systems

Engineering Practices:

  • Write strongly typed Python code (mypy)
  • Follow CI/CD processes with automated checks (ruff, pytest)
  • Work across dev / staging / production environments
  • Debug distributed data inconsistencies