Bulldogjob | Remote work | Senior

Senior Product Manager, AI Agents Testing

Salary: to be agreed

Requirements

LLM
SaaS
QA
A/B testing

Job description

Role

Zendesk AI Agents are fully autonomous agents that resolve customer issues end-to-end — reasoning over knowledge bases, executing multi-step procedures, taking actions via APIs, and handing off to humans when needed. They operate across messaging, email, and voice channels, handling millions of conversations for brands like Liberty London, Unity, and Motel Rocks. As these agents grow more capable and more autonomous, the stakes of every deployment decision increase: a misconfigured procedure, a hallucinated response, or a broken escalation path can erode customer trust at scale.

Today, the admins who configure and manage these agents — CX managers, bot builders, operations leads — lack the tools to confidently test agent behavior before going live, measure quality in production, or experiment with changes safely. You'll own the end-to-end product strategy for our Testing & Observability suite — the layer that lets admins simulate conversations against their real knowledge and procedures, score agent quality across accuracy, tone, and policy adherence, run A/B experiments on agent behavior, and catch regressions before they reach end users. This is a strategic opportunity that directly determines whether enterprises can trust and scale agentic AI in their customer service operations.

Key Responsibilities

- Own product strategy and roadmap for AI agent testing — simulation, quality scoring, experimentation, regression detection, and conversation tracing

- Ship testing as an integrated experience embedded in the builder and deployment flow

- Define how simulation works end-to-end: scenario generation from real conversation patterns, automated pass/fail evaluation, and results that point admins to exactly what broke and where

- Build the experimentation layer — A/B testing of agent behavior, staged rollouts with statistical rigor, safe iteration on tone and resolution strategies

- Design a pre-publish readiness gate that gives admins a quantified view of risk before every deployment — specific issues, coverage gaps, comparison to current production behavior

- Partner with ML, QA, and platform teams on scoring methodology, simulation infrastructure, and tracing architecture

- Make all of this usable by non-technical admins — CX managers, bot builders, operations leads who need answers without writing code or filing engineering tickets

Success in the Role

- Testing becomes part of how customers build and deploy agents — not something they do separately, but part of the flow

- Customers can quantify whether their agent is ready to go live, and catch regressions before end users hit them

- Automated resolution rates improve because customers can actually diagnose and fix quality issues instead of guessing

- The testing platform becomes a shared capability used beyond AI Agents — consumed by other product teams that need to validate AI-powered experiences

The intelligent heart of customer experience

Zendesk software was built to bring a sense of calm to the chaotic world of customer service. Today we power billions of conversations with brands you know and love.

Zendesk believes in offering our people a fulfilling and inclusive experience. Our hybrid way of working enables us to purposefully come together in person at one of our many Zendesk offices around the world to connect, collaborate and learn, whilst also giving our people the flexibility to work remotely for part of the week.

🔍 Job Ad Decoder

🔴

You'll own the end-to-end product strategy for our Testing & Observability suite

This means full ownership of the product strategy, from idea to rollout, which may come with considerable pressure and a broad scope of responsibilities.

🔴

This is a strategic opportunity that directly determines whether enterprises can trust and scale agentic AI

Emphasizes how central the role is, but may also suggest that the current solutions are not good enough and that success depends on making significant changes.

🔴

Ship testing

This is very terse and could mean either delivering a finished product or having to ship features quickly, potentially at the expense of quality or completeness.

🟡

reasoning over knowledge bases, executing multi-step procedures, taking actions via APIs, and handing off to humans when needed

Describes the advanced capabilities of the AI agents, but may also suggest that testing has to cover complex scenarios and potential integration failures.

🟢

admins who configure and manage these agents — CX managers, bot builders, operations leads — lack the tools to confidently test agent behavior before going live

Points to an existing problem and a clear need for a solution, which is a positive sign for whoever takes the role, but it may also mean strong pressure to deliver working tools quickly.

2026-05-28
