Machine Learning / AI Engineer (RL)
130 - 175 PLN/hour, B2B (net)
Mid · Full-time · B2B
#312417 · Added about a month ago · 50
Source: nofluffjobs.com
🚫 Offer expired. This job offer is no longer active and recruitment has closed.
Tech Stack / Keywords
Python · Machine learning · Reinforcement Learning · Claude Code · Codex
Company and position
You will be cooperating with a leading provider of AI evaluation and optimization solutions, trusted by multinational companies to optimize AI agents and detect performance issues in large language models. The company’s mission is to enable safe, verifiable, and aligned AGI through rigorous, real-world agent evaluation.
Requirements
- Experience as a Data Scientist or Machine Learning/Environment Engineer.
- Solid skills in Python software engineering.
- Ability to work 2 p.m. - 10 p.m. daily.
- Practical knowledge of AI frameworks (LangChain, LangGraph, mcp-server).
- Hands-on experience working with AI, including prompt engineering and vibe coding.
Nice to have:
- Knowledge of Codex or Claude Code.
- Experience integrating AI into existing systems.
- Understanding of RL concepts: reward modeling, environment dynamics, verifiability, evaluation, and agent interaction loops.
- Familiarity with instrumentation, metrics, and data pipelines for RL evaluation.
- Ability to plan and organize your own work.
Responsibilities
- Design and implement RL environments that support large-scale agent evaluation and reinforcement learning experiments.
- Build task generation pipelines, dynamic datasets, and scripted environments with controlled complexity and stochasticity.
- Develop verifiers and reward models to automatically score trajectories and evaluate model reasoning.
- Collaborate with infrastructure and systems engineers to ensure environments are scalable, reproducible, and instrumented for detailed telemetry.
- Design APIs and orchestration frameworks for running, resetting, and evaluating agents across environments.
- Optimize environment performance, logging, and reward reproducibility across distributed setups.
Offer
- Sport subscription
- Private healthcare
- Flat structure
- Small teams
- International projects
Other information
Due to the client’s time zone, candidates should be able to work 2 p.m. - 10 p.m. daily.
Acaisoft
12 active offers