Senior Product Manager, AI Agents Testing
To be agreed
Requirements
- LLM
- SaaS
- QA
- A/B testing
Job description
Role
Zendesk AI Agents are fully autonomous agents that resolve customer issues end-to-end — reasoning over knowledge bases, executing multi-step procedures, taking actions via APIs, and handing off to humans when needed. They operate across messaging, email, and voice channels, handling millions of conversations for brands like Liberty London, Unity, and Motel Rocks. As these agents grow more capable and more autonomous, the stakes of every deployment decision increase: a misconfigured procedure, a hallucinated response, or a broken escalation path can erode customer trust at scale.
Today, the admins who configure and manage these agents — CX managers, bot builders, operations leads — lack the tools to confidently test agent behavior before going live, measure quality in production, or experiment with changes safely. You'll own the end-to-end product strategy for our Testing & Observability suite — the layer that lets admins simulate conversations against their real knowledge and procedures, score agent quality across accuracy, tone, and policy adherence, run A/B experiments on agent behavior, and catch regressions before they reach end users. This is a strategic opportunity that directly determines whether enterprises can trust and scale agentic AI in their customer service operations.
Key Responsibilities
- Own product strategy and roadmap for AI agent testing — simulation, quality scoring, experimentation, regression detection, and conversation tracing
- Ship testing as an integrated experience embedded in the builder and deployment flow
- Define how simulation works end-to-end: scenario generation from real conversation patterns, automated pass/fail evaluation, and results that point admins to exactly what broke and where
- Build the experimentation layer — A/B testing of agent behavior, staged rollouts with statistical rigor, safe iteration on tone and resolution strategies
- Design a pre-publish readiness gate that gives admins a quantified view of risk before every deployment — specific issues, coverage gaps, comparison to current production behavior
- Partner with ML, QA, and platform teams on scoring methodology, simulation infrastructure, and tracing architecture
- Make all of this usable by non-technical admins — CX managers, bot builders, operations leads who need answers without writing code or filing engineering tickets
Success in the Role
- Testing becomes part of how customers build and deploy agents — not something they do separately, but part of the flow
- Customers can quantify whether their agent is ready to go live, and catch regressions before end users hit them
- Automated resolution rates improve because customers can actually diagnose and fix quality issues instead of guessing
- The testing platform becomes a shared capability used beyond AI Agents — consumed by other product teams that need to validate AI-powered experiences
The intelligent heart of customer experience
Zendesk software was built to bring a sense of calm to the chaotic world of customer service. Today we power billions of conversations with brands you know and love.
Zendesk believes in offering our people a fulfilling and inclusive experience. Our hybrid way of working enables us to purposefully come together in person, at one of our many Zendesk offices around the world, to connect, collaborate, and learn, whilst also giving our people the flexibility to work remotely for part of the week.
🔍 Job Posting Decoder
🔴
You'll own the end-to-end product strategy for our Testing & Observability suite
You will be responsible for the entire product strategy, which may mean a high degree of autonomy but also a lack of support from other teams.
🔴
This is a strategic opportunity that directly determines whether enterprises can trust and scale agentic AI in their customer service operations.
The work is of key importance to the company, which may come with heavy pressure and responsibility for the product's success.
🔴
Ship testing
This is very terse and may mean you are expected to deliver testing functionality quickly, without the scope being defined in detail.