JustJoin.IT · Remote work · Senior

Staff/Senior Machine Learning Engineer

VirtusLab

⚲ Kraków, Poland (Remote)

23 520 - 31 080 PLN net (B2B)

Requirements

  • Python
  • Airflow
  • PySpark
  • GCP
  • Machine Learning
  • LLM

Job description

We foster a dynamic culture rooted in strong engineering, a sense of ownership, and transparency, empowering our team. As part of the expanding VirtusLab Group, we offer a compelling environment for those seeking to make a substantial impact in the software industry within a forward-thinking organization.
About the role
Join our team to drive business innovation with production-ready machine learning pipelines. You will play a key role in deploying and maintaining ML workflows, leveraging GCP for cloud computing and on-prem clusters for ETLs. Collaborating closely with Data Scientists, you will contribute to AI-powered projects while shaping the organization’s technical culture.

Python Expert
Airflow Advanced
PySpark Advanced
GCP Regular
ML/LLM Advanced
Project

Anomalsky

Project Scope
Our client is a NASDAQ-listed B2B data company powering Go-To-Market strategies with a 360-degree view of every customer, a view whose value depends on the quality of billions of person and company records.
Anomalsky is the ML system we built to catch what traditional observability misses: row-level semantic anomalies (e.g., in first_name, title, or company_name fields). It has three layers: an ML layer (embeddings + unsupervised clustering) flags suspicious records at scale, an LLM layer removes false positives and explains each cluster, and an optional human-in-the-loop layer lets domain experts resolve whole clusters at once. The MVP has already driven ~40k critical record corrections in production.
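To give a feel for what the ML layer does, here is a minimal, self-contained sketch of scoring one text column for row-level anomalies. It is not Anomalsky's implementation: the character-trigram "embedding" stands in for a learned embedding model, and kNN-distance scoring (one of the detectors named in the tech stack) is used here instead of the clustering step. The function names, `k=2` and the `0.9` threshold are illustrative assumptions.

```python
import math
from collections import Counter


def embed(value: str, n: int = 3) -> Counter:
    """Toy embedding: bag of character n-grams with start/end markers.

    Stand-in for a learned embedding model, used only to keep the sketch self-contained.
    """
    padded = f"^{value.lower()}$"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity; no overlap (or an empty vector) gives distance 1.0."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def knn_scores(values: list[str], k: int = 2) -> list[float]:
    """Anomaly score per record: mean distance to its k nearest other records."""
    vectors = [embed(v) for v in values]
    scores = []
    for i, vi in enumerate(vectors):
        dists = sorted(cosine_distance(vi, vj) for j, vj in enumerate(vectors) if j != i)
        k_eff = min(k, len(dists))
        scores.append(sum(dists[:k_eff]) / k_eff if k_eff else 0.0)
    return scores


def flag(values: list[str], threshold: float = 0.9, k: int = 2) -> list[str]:
    """Records scoring above the threshold would be passed on to the LLM review layer."""
    return [v for v, s in zip(values, knn_scores(values, k)) if s > threshold]


first_names = ["john", "johnny", "johan", "jon", "test test"]
print(flag(first_names))  # ['test test']
```

The value that shares no character patterns with the rest of the column scores highest. In a system like the one described, the LLM layer would then weed out false positives among the flagged records before anything reaches a human.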
What’s next: the MVP is being deployed to GCP now. Once it is operational, the mission is to scale Anomalsky across the entire organization by embedding it into Acquisition pipelines and building a real-time variant that scans data before it reaches customers.

What you'll work on
• Productionize Anomalsky on GCP and scale it to operational, organization-wide use.
• Evolve the ML / LLM / human-in-the-loop design and the feedback loop that turns expert reviews into reusable knowledge.
• Prototype the low-latency real-time variant.
• Integrate Anomalsky into existing workflows, starting with Acquisition.
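One way to picture "a feedback loop that turns expert reviews into reusable knowledge" is sketched below. This is a hypothetical design for illustration only: `ReviewMemory`, `normalize` and `triage` are invented names, and keying verdicts on a case-folded, whitespace-collapsed value is an assumption. An expert resolves a whole cluster with one decision, and later records with an already-reviewed value are resolved automatically instead of re-entering the review queue.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def normalize(value: str) -> str:
    """Case-fold and collapse whitespace so trivially different spellings share one key."""
    return " ".join(value.lower().split())


@dataclass
class ReviewMemory:
    """Hypothetical store of expert verdicts, keyed by normalized field value."""

    verdicts: dict[str, str] = field(default_factory=dict)

    def record_cluster_review(self, cluster: list[str], verdict: str) -> None:
        # A single expert decision is applied to every record in the cluster.
        for value in cluster:
            self.verdicts[normalize(value)] = verdict

    def lookup(self, value: str) -> str | None:
        return self.verdicts.get(normalize(value))


def triage(values: list[str], memory: ReviewMemory) -> tuple[dict[str, str], list[str]]:
    """Split new records into auto-resolved ones (seen before) and ones still needing review."""
    resolved: dict[str, str] = {}
    pending: list[str] = []
    for value in values:
        verdict = memory.lookup(value)
        if verdict is None:
            pending.append(value)
        else:
            resolved[value] = verdict
    return resolved, pending


memory = ReviewMemory()
memory.record_cluster_review(["N/A", "n/a", "NA"], verdict="placeholder")
resolved, pending = triage(["n/a ", "Jane", "na"], memory)
print(resolved)  # {'n/a ': 'placeholder', 'na': 'placeholder'}
print(pending)   # ['Jane']
```

Exact-match lookup keeps the sketch short. A production feedback loop would more likely match new records against reviewed clusters by embedding similarity.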

Tech Stack
Python, Airflow, BigQuery, Snowflake, Spark (Dataproc), Databricks, Iceberg, Starburst, Trino, AWS, GCP, Docker, Terraform, Jenkins, GitHub, scikit-learn, unsupervised anomaly detection (kNN, Isolation Forest, autoencoders), recursive clustering, classifiers trained on real + synthetic data, MLflow, LLM-based reasoning.

Team
ML and data engineers from VirtusLab working alongside the client's data engineers and a manager.

What we expect in general:
• 5+ years of hands-on machine learning engineering experience.
• Hands-on experience in deploying Python projects.
• Strong experience in writing high-quality Python code.
• Experience with orchestration tools such as Airflow.
• Knowledge of Spark or other distributed data processing tools.
• Experience with the Kubernetes ecosystem as a user.
• Strong experience with cloud platforms and Docker.
• Ability to work in a team and participate in the design process.
• Good command of English (B2/C1).

Seems like lots of expectations, huh? Don’t worry! You don’t have to meet all the requirements.
What matters most is your passion and willingness to develop. Apply and find out!
A few perks of being with us
• Building tech community
• Flexible hybrid work model
• Home office reimbursement
• Language lessons
• MyBenefit points
• Private healthcare
• Training Package
• Virtusity / in-house training
• And a lot more!

🔍 Job Ad Decoder

🔴
foster a dynamic culture rooted in strong engineering, a sense of ownership, and transparency, empowering our team
May indicate a culture in which proactivity and autonomy are expected, but also potentially a lack of clear processes and an excessive workload.
🔴
play a key role in deploying and maintaining ML workflows
May mean that you will be responsible for maintaining existing systems rather than necessarily building innovative solutions from scratch.
🟡
leveraging GCP for cloud computing and on-prem clusters for ETLs
Indicates the need to work both in the cloud and with on-premises infrastructure, which may require a broader range of technical skills.
🟡
collaborating closely with Data Scientists
Means close collaboration with the Data Science team, which may require good communication and an understanding of their needs, but may also mean that the ML engineer's role is more supporting than independent.
🟢
shaping the organization’s technical culture
The suggested influence on technical culture may mean both a real opportunity to introduce changes and an expectation that the candidate will actively help shape processes and standards.