Pracuj.pl Remote work Mid

Machine Learning / AI Engineer (RL)

ACAISOFT POLAND Sp. z o.o.

⚲ Warszawa, Mokotów

160–240 PLN net (+ VAT) / hour

Requirements

  • Python
  • Reinforcement Learning

Job description

Our requirements:

  • 5+ years of overall experience in the IT industry.
  • Minimum 3 years in Machine Learning/Environment Engineering or as a Data Scientist.
  • Practical knowledge of AI frameworks (LangChain, LangGraph, mcp-server).
  • Extensive practical experience in working with AI, including prompt engineering and vibe coding.
  • Experience in working with business requirements (analysis, summarizing, responding to changes).
  • Expertise in planning your own work or that of a small team.
  • Being able to work 2 p.m. to 10 p.m.

Nice to have:

  • Knowledge of Codex or Claude Code.
  • Experience in integrating AI with a system would be an advantage.
  • Understanding of RL concepts: reward modeling, environment dynamics, verifiability, evaluation, and agent interaction loops.
  • Familiarity with instrumentation, metrics, and data pipelines for RL evaluation.

About the project:

You will be cooperating with a leading provider of AI evaluation and optimization solutions, trusted by multinational companies to optimize AI agents and detect performance issues in large language models. In this role, you’ll help develop advanced reinforcement learning (RL) environments and scalable evaluation systems that guide and shape the behavior of cutting-edge AI models. The company’s mission is to enable safe, verifiable, and aligned AGI through rigorous, real-world agent evaluation. Due to the client’s time zone, we would appreciate a candidate who can work 2 p.m. to 10 p.m.

Join us and make a real impact! If you’re ready to broaden your horizons and work with an innovative company at the forefront of AI, we’d love to hear from you. You’ll help build the environments that shape how future AI systems are trained, evaluated, and aligned, and collaborate with world-class engineers and researchers on one of the most important technical challenges of our time.

Responsibilities:

  • Design and implement RL environments that support large-scale agent evaluation and reinforcement learning experiments.
  • Build task generation pipelines, dynamic datasets, and scripted environments with controlled complexity and stochasticity.
  • Develop verifiers and reward models to automatically score trajectories and evaluate model reasoning.
  • Collaborate with infrastructure and systems engineers to ensure environments are scalable, reproducible, and instrumented for detailed telemetry.
  • Design APIs and orchestration frameworks for running, resetting, and evaluating agents across environments.
  • Optimize environment performance, logging, and reward reproducibility across distributed setups.

🔍 Job Ad Decoder

🔴
Extensive practical experience in working with AI, including prompt engineering and vibe coding.
They are looking for someone who can write effective prompts for AI models and intuitively understand how they work, which may be a subjective criterion.
🔴
Expertise in planning your own work or that of a small team.
This means you will have to manage your own tasks independently and possibly oversee the work of others, without a formal managerial title.
🔴
Being able to work 2 p.m. - 10 p.m
A requirement to work late-afternoon and evening hours, which may affect work-life balance.
🟢
Join us and make a real impact!
A standard motivational slogan that does not necessarily guarantee meaningful impact on the project.
🔴
If you’re ready to broaden yo
The unfinished sentence suggests the ad was written in a hurry or without attention to detail.