Senior AI Platform Engineer (Python, AWS, Data Pipelines)
170 PLN/hour (B2B)
Senior · Full-time · B2B
Offer #378356 · Added yesterday
Source: emagine
Tech Stack / Keywords
AI, Python, AWS, LLM, PostgreSQL, SQL, REST, CI/CD
Company and Position
We are building CaaS (Content as a Service) — a platform that transforms publisher content (PDF textbooks and Excel manifests) into structured, enriched, AI-ready data. The platform processes content once and exposes it through a unified service layer used by multiple downstream applications. The project is in the Media / Telco industry.
Requirements
Must-have:
- Strong Python development experience (production systems)
- AWS experience (S3, Glue, Aurora)
- Experience with data pipelines (ETL / batch processing)
- Strong SQL and PostgreSQL experience
- Experience with schema design and migrations
Nice to have:
- Experience with LLMs in production (OCR, content processing, enrichment)
- Prompt engineering / context engineering
- Experience with vector databases (Weaviate, Pinecone, Qdrant, pgvector)
- Knowledge of embeddings, semantic search, and RAG
- Experience with FastAPI
- Experience with Airflow / MWAA
- Experience building data platforms serving multiple consumers
Responsibilities
Data Engineering & Pipelines:
- Build and maintain multi-stage data ingestion pipelines
- Design and implement idempotent, restartable batch processing workflows
- Use S3 as the core storage layer for raw and processed data
- Implement pipeline stages, including content ingestion and book identity assignment, PDF-to-Markdown conversion (AI OCR), table-of-contents and structure extraction, hierarchical chunking, and embedding generation
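To illustrate what "idempotent, restartable" can mean for the stages listed above, here is a minimal Python sketch. It assumes a content-hash book identity (SHA-256 of the source file) and uses a plain dictionary as a hypothetical stand-in for the S3 bucket; the key layout (`raw/<book_id>/`, `processed/<book_id>/`) and the names `book_id` and `run_pipeline` are illustrative, not part of the platform. Each stage writes its complete output under a deterministic key, and a rerun skips any stage whose output already exists.

```python
import hashlib
from collections.abc import Callable, MutableMapping

# Hypothetical stand-in for an S3 bucket: object key -> object bytes.
Store = MutableMapping[str, bytes]
# A stage is a name plus a pure function from input bytes to output bytes.
Stage = tuple[str, Callable[[bytes], bytes]]


def book_id(raw_pdf: bytes) -> str:
    """Derive a deterministic book identity from the file content itself."""
    return hashlib.sha256(raw_pdf).hexdigest()[:16]


def run_pipeline(store: Store, raw_pdf: bytes, stages: list[Stage]) -> list[str]:
    """Run stages in order, skipping any stage whose output already exists.

    Returns the names of the stages that actually ran on this call.
    """
    bid = book_id(raw_pdf)
    raw_key = f"raw/{bid}/source.pdf"
    store.setdefault(raw_key, raw_pdf)  # re-ingesting the same file is a no-op
    executed: list[str] = []
    input_key = raw_key
    for name, transform in stages:
        output_key = f"processed/{bid}/{name}"
        if output_key not in store:
            # Compute the whole output first, then write it in a single put,
            # so an interrupted run never leaves a half-written stage output.
            store[output_key] = transform(store[input_key])
            executed.append(name)
        input_key = output_key  # the next stage reads this stage's output
    return executed


if __name__ == "__main__":
    bucket: dict[str, bytes] = {}
    demo_stages: list[Stage] = [
        ("markdown", lambda b: b.upper()),  # placeholder for AI OCR
        ("chunks", lambda b: b[:4]),  # placeholder for chunking
    ]
    print(run_pipeline(bucket, b"%PDF demo", demo_stages))  # ['markdown', 'chunks']
    print(run_pipeline(bucket, b"%PDF demo", demo_stages))  # []
```

Running it twice with the same file executes every stage the first time and nothing the second; deleting one stage's output and rerunning recomputes only that stage. S3 object writes are likewise all-or-nothing, so the same check-then-write pattern carries over.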
AI / LLM Processing:
- Use LLMs and OCR models to extract structured data from PDFs
- Design prompts and context strategies for consistent outputs
- Generate structured metadata and enrich content for downstream use cases
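One common way to get consistent outputs from an LLM is to request a fixed JSON shape, validate every response, and retry with the validation error added to the prompt. The sketch below assumes a hypothetical `call_model` callable (any client wrapper that takes a prompt string and returns text) and a made-up three-field metadata schema; the actual prompts and schema are not described in this posting.

```python
import json
from collections.abc import Callable

# Hypothetical metadata schema; the real fields are not specified in the posting.
REQUIRED_FIELDS: dict[str, type] = {"title": str, "subject": str, "grade_level": int}


def validate(raw: str) -> dict[str, object]:
    """Parse a model response and check it against REQUIRED_FIELDS."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name, expected in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected):
            raise ValueError(f"field {name!r} missing or not {expected.__name__}")
    return data


def extract_metadata(
    call_model: Callable[[str], str], page_text: str, max_attempts: int = 3
) -> dict[str, object]:
    """Ask the model for JSON metadata, feeding validation errors back on retry."""
    prompt = (
        "Return only a JSON object with keys "
        f"{sorted(REQUIRED_FIELDS)} describing this page:\n{page_text}"
    )
    last_error = ""
    for _ in range(max_attempts):
        # On retries, tell the model exactly what was wrong with its last answer.
        feedback = f"\nYour previous answer was invalid: {last_error}" if last_error else ""
        try:
            return validate(call_model(prompt + feedback))
        except ValueError as exc:
            last_error = str(exc)
    raise RuntimeError(f"no valid output after {max_attempts} attempts: {last_error}")
```

Keeping validation outside the prompt means a malformed answer is caught deterministically instead of propagating to downstream consumers.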
Data Storage & Consistency:
- Maintain PostgreSQL (Aurora) as the system of record
- Design and maintain SQL schemas and versioned migrations
- Ensure data consistency across S3, PostgreSQL (Aurora), and the vector database (Weaviate)
- Implement reconciliation logic across distributed systems
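Because PostgreSQL (Aurora) is the system of record, reconciliation can be framed as set comparisons against it. This sketch assumes each store can be listed by chunk ID and that a content hash is stored both in PostgreSQL and on each Weaviate object; the listing calls themselves (boto3, a PostgreSQL driver, the Weaviate client) are omitted, and `ReconciliationReport` and `reconcile` are hypothetical names.

```python
from collections.abc import Iterable, Mapping
from dataclasses import dataclass, field


@dataclass
class ReconciliationReport:
    missing_in_s3: set[str] = field(default_factory=set)
    missing_in_vectors: set[str] = field(default_factory=set)
    orphaned_in_s3: set[str] = field(default_factory=set)
    orphaned_in_vectors: set[str] = field(default_factory=set)
    stale_vectors: set[str] = field(default_factory=set)  # hash differs from the DB

    @property
    def consistent(self) -> bool:
        return not (
            self.missing_in_s3
            or self.missing_in_vectors
            or self.orphaned_in_s3
            or self.orphaned_in_vectors
            or self.stale_vectors
        )


def reconcile(
    db: Mapping[str, str],  # system of record: chunk_id -> content hash
    s3_ids: Iterable[str],  # chunk IDs that have an object in S3
    vectors: Mapping[str, str],  # vector store: chunk_id -> stored content hash
) -> ReconciliationReport:
    db_ids, s3, vec_ids = set(db), set(s3_ids), set(vectors)
    return ReconciliationReport(
        missing_in_s3=db_ids - s3,
        missing_in_vectors=db_ids - vec_ids,
        orphaned_in_s3=s3 - db_ids,
        orphaned_in_vectors=vec_ids - db_ids,
        # Present in both, but embedded from an older version of the content.
        stale_vectors={i for i in db_ids & vec_ids if db[i] != vectors[i]},
    )
```

Missing entries indicate pipeline stages to re-run; orphans and stale vectors indicate cleanup or re-embedding.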
Retrieval & Vector Search:
- Work with Weaviate for vector search and semantic retrieval
- Support RAG-based applications
- Design data organization strategies by subject, country, and client
APIs & Integration:
- Build REST APIs using FastAPI
- Expose content as a service for multiple downstream applications
- Integrate with internal and external systems
Engineering Practices:
- Write strongly typed Python code (mypy)
- Follow CI/CD processes with automated checks (ruff, pytest)
- Work across dev / staging / production environments
- Debug distributed data inconsistencies