T Hub - AI Expert
T-Mobile
⚲ Warszawa, Mokotów
Requirements
- RAG
- LLM
- Python
- PyTorch
- HuggingFace
- LangChain
Job description
Our requirements:
- Bachelor's/Master's/PhD in Computer Science, AI, or a related field.
- 3+ years in ML/NLP roles, with 2+ years focused on RAG systems.
- Proven experience deploying LLMs in on-prem or hybrid environments.
- Proficiency with vLLM, LiteLLM, and open-source LLMs (e.g., Llama 3.2, DeepSeek, Mistral).
- Strong Python expertise with frameworks such as PyTorch, Hugging Face Transformers, and LangChain.
- Experience with vector databases (e.g., Qdrant).
- Familiarity with Linux-based systems and Red Hat OpenShift.
- Ability to communicate complex AI concepts to non-technical stakeholders.
- Strong problem-solving skills and adaptability in fast-paced environments.
About the project:
We are seeking an AI Expert with deep expertise in designing, implementing, and optimizing Retrieval-Augmented Generation (RAG) systems in on-premises environments. The ideal candidate will have hands-on experience with vLLM, LiteLLM, and open-source LLMs such as gpt-oss or Qwen, along with a proven ability to integrate these tools into scalable, secure, high-performance enterprise workflows.
Responsibilities:
RAG System Development:
- Architect and deploy end-to-end RAG pipelines, combining retrieval mechanisms (e.g., vector databases such as Qdrant) with generative models for enterprise use cases.
- Fine-tune and optimize retrieval models to ensure high accuracy and low latency in on-prem environments.
Model Integration & Deployment:
- Implement and customize inference servers using vLLM for efficient LLM serving and LiteLLM for lightweight model orchestration.
- Integrate open-source LLMs with proprietary data sources and APIs.
On-Prem Infrastructure Management:
- Design GPU-optimized, scalable infrastructure for LLM training and inference, ensuring compliance with security and data-governance policies.
- Collaborate with DevOps teams to containerize workflows using Docker/Kubernetes and automate MLOps pipelines.
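The RAG pipeline mentioned in the responsibilities — embed documents, retrieve the closest matches, and inject them into the generator's prompt — can be sketched in a few lines. This is a minimal, self-contained illustration only: a toy character-frequency embedding stands in for a real embedding model, an in-memory list stands in for a vector database such as Qdrant, and all function names are hypothetical.

```python
import math


def embed(text):
    # Toy embedding: normalized character-frequency vector over a-z.
    # A real pipeline would call an embedding model instead.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a, b):
    # Vectors are already unit-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


def retrieve(query, documents, top_k=2):
    # Rank documents by similarity to the query embedding —
    # the same operation a vector database performs at scale.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(query, documents):
    # Retrieved passages become grounding context for the generator LLM.
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

In production the `retrieve` step would query Qdrant and `build_prompt`'s output would go to a vLLM-served model; the structure of the pipeline is the same.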
Performance Optimization:
- Apply techniques such as quantization, pruning, and dynamic batching to maximize efficiency in resource-constrained on-prem setups.
- Monitor system performance, troubleshoot bottlenecks, and ensure high availability.
Cross-Functional Collaboration:
- Partner with data engineers to curate and preprocess domain-specific datasets for retrieval and generation tasks.
- Translate business requirements into technical solutions for stakeholders in telco environments.
We offer:
- A dynamic environment where you will lead contributions across diverse projects.
- The opportunity to become an expert in cutting-edge technologies such as Conversational AI platforms and VoIP solutions.
- A collaborative team setup that supports your growth in a customer-facing technical consulting role.
- Room for individual technological exploration while shaping innovative enterprise solutions.
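Quantization, the first optimization technique listed above, trades a small amount of precision for a large memory saving. The sketch below shows the core idea of symmetric int8 weight quantization in plain Python; real deployments rely on library support (e.g., quantized model formats served by vLLM) rather than hand-rolled code, and the helper names here are illustrative.

```python
def quantize_int8(weights):
    # Symmetric int8 quantization: scale floats so the largest
    # magnitude maps to 127, then round to integers.
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    # Recover approximate floats; the error per weight is at most scale/2.
    return [v * scale for v in quantized]
```

Storing int8 instead of float32 cuts weight memory by roughly 4x, which is what makes large models fit on constrained on-prem GPU hardware.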