Patronus AI Products

Central to our product evolution is dynamism. Our industry landscape has evolved rapidly over the past few years, and so have we.

Phase I (2022-2025): We evaluated models on static datasets.

Phase II (2025-): We are improving agents on long-horizon problems in real-world-like settings.

We develop the most important infrastructure to power agentic products.

01
Platform

Our core evaluation platform provides teams with a centralized solution for experiments, logging, comparisons, traces, and more.

LLM-as-a-Judge

Enables developers to score multimodal AI systems on image-to-text tasks

Glider

Powerful 3B evaluator LLM that can score any text input on user-defined criteria

Lynx

A SOTA hallucination detection LLM that is capable of advanced reasoning

02
Percival

Percival is our evaluation copilot for agentic systems, built to detect 20+ failure modes in agentic traces, suggest optimizations, and evaluate a suite of reasoning and planning errors.

Percival

Eval copilot that analyzes traces, identifies issues, and suggests optimizations

Percival Chat Assistant

Interactive AI assistant that lets you unlock the power of Percival

03
World Models for Digital Workflows

We are a team of AI researchers and engineers formerly from companies such as Meta AI, Amazon AGI, and Google. Our work has led to product contributions serving top Fortune 500 clients.

Generative Simulators

Adaptive environments that co-generate tasks, world dynamics, and reward functions

MemTrack

Benchmark to evaluate long-term memory and state tracking in multi-platform agent environments


Within our lifetimes, we will be able to push out enough computational power to simulate reality

- Tim Sweeney