NoFluffJobs Remote work Senior

AI Systems Engineer (Agents & Evaluation)

Acaisoft

⚲ Warszawa

26 880 - 40 320 PLN (B2B)

Requirements

Python
Machine learning
Reinforcement Learning (nice to have)
Claude Code (nice to have)
Codex (nice to have)

Job description

About the project: We’re looking for AI/ML/Environment Engineers to cooperate with a leading provider of AI evaluation and optimization solutions, trusted by multinational companies to optimize AI agents and detect performance issues in large language models. The company’s mission is to enable safe, verifiable, and aligned AGI through rigorous, real-world agent evaluation. Our new Engineer will design and build end-to-end RL environments for large-scale agent evaluation and experimentation. The role combines research, systems engineering, and infrastructure to create scalable, reproducible environments with automated evaluation, reward modeling, and simulation capabilities across API, web, and multi-agent tasks. In this role, you will work on generating tasks in Reinforcement Learning environments. We create environments that produce training data for training models. Due to the client’s time zone, we work from 2 p.m. to 10 p.m. daily.

Requirements:
- Availability to work from 2 p.m. to 10 p.m.
- 4+ years of experience in data engineering, simulation systems, or ML infrastructure.
- Strong command of Python and systems-level programming.
- Practical experience working with AI, including frameworks (LangChain, LangGraph, MCP servers) and prompt engineering.
- Deep understanding of ML concepts.
- Curiosity and conviction around building environments that steer AGI.

Nice to have:
- Knowledge of RL concepts: reward modeling, environment dynamics, verifiability, evaluation, and agent interaction loops.
- Familiarity with instrumentation, metrics, and data pipelines for RL evaluation.
- Knowledge of Codex or Claude Code.
- Experience integrating AI into an existing system.

Daily tasks:
- Design and develop RL environments for large-scale agent evaluation and reinforcement learning workflows.
- Build task generation pipelines, dynamic datasets, and scripted simulations with controlled complexity and stochastic behavior.
- Implement verification systems and reward models to automatically assess agent trajectories and reasoning quality.
- Collaborate with infrastructure and systems teams to ensure environments are scalable, reproducible, and fully instrumented for telemetry and monitoring.
- Develop APIs and orchestration frameworks for executing, resetting, and evaluating agents across multiple environments.
- Work closely with research and customer-facing teams to turn open-ended requirements into measurable, testable solutions.
- Optimize environment performance, logging systems, and reward reproducibility across distributed architectures.

🔍 Job Ad Decoder

🟡

We’re looking for AI/ML/Environment Engineers to cooperate with a leading provider of AI evaluation and optimization solutions, trusted by multinational companies to optimize AI agents and detect performance issues in large language models.

The company positions itself as a leader, but the actual tasks may be more operational than strategic.

🔴

The company’s mission is to enable safe, verifiable, and aligned AGI through rigorous, real-world agent evaluation.

A very ambitious AGI-focused mission may mean working on highly experimental and potentially unstable systems.

🟡

The role combines research, systems engineering, and infrastructure to create scalable, reproducible environments with automated evaluation, reward modeling, and simulation capabilities across API, web, and multi-agent tasks.

Combining so many areas may mean you will have to handle a bit of everything instead of focusing on a single specialization.

🟡

In this role, you will work on generating tasks in Reinforcement Learning environments. We create environments for producing training data that can be used to train models.

The main task may be producing training data, which is essential but may be less exciting than building AI models directly.

🔴

Due to the client’s time zone, we work 2 p.m. - 10 p.m. daily

Working afternoon-to-evening hours can be demanding and may interfere with your private life.

2026-05-18

Apply: go to the job offer ↗