Powerful AI Evaluation
Deliver AI products safely and confidently. Based on industry-leading AI research, evaluation models, and tools.
Introducing the Patronus API
The fastest way to prevent AI failures in production. Now available for everyone.
Meet the Patronus Evaluators.
State-of-the-art evaluation models at your fingertips. Designed to monitor AI-native workflows like RAG systems and agents.
or
Bring Your Own Evaluator
Use the SDK to configure custom evaluators for function calling, tool use, and more
from patronus import Client, evaluator, Row

client = Client(api_key="YOUR_API_KEY")

@evaluator
def iexact_match(row: Row) -> bool:
    return row.evaluated_model_output.lower().strip() == row.evaluated_model_gold_answer.lower().strip()

client.experiment(
    "Tutorial",
    dataset=[
        {
            "evaluated_model_input": "Translate 'Good night' to French.",
            "evaluated_model_output": "bonne nuit",
            "evaluated_model_gold_answer": "Bonne nuit",
        },
        {
            "evaluated_model_input": "Summarize: 'AI improves efficiency'.",
            "evaluated_model_output": "ai improves efficiency",
            "evaluated_model_gold_answer": "AI improves efficiency",
        },
    ],
    evaluators=[iexact_match],
    experiment_name="Case Insensitive Match",
)
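To see what the custom evaluator above checks without calling the Patronus service, the same comparison can be exercised locally. The following is a minimal sketch under stated assumptions: `LocalRow` and `pass_rate` are hypothetical stand-ins written for illustration and are not part of the Patronus SDK; only the field names and the comparison logic are taken from the example above.

```python
from dataclasses import dataclass


@dataclass
class LocalRow:
    """Hypothetical stand-in for the SDK's Row (assumption, not the real class)."""
    evaluated_model_input: str
    evaluated_model_output: str
    evaluated_model_gold_answer: str


def iexact_match(row: LocalRow) -> bool:
    # Same comparison as the SDK example: ignore case and surrounding whitespace.
    return (
        row.evaluated_model_output.lower().strip()
        == row.evaluated_model_gold_answer.lower().strip()
    )


def pass_rate(rows, evaluator) -> float:
    """Hypothetical helper: fraction of rows for which the evaluator returns True."""
    results = [evaluator(row) for row in rows]
    return sum(results) / len(results) if results else 0.0


# The two rows from the example dataset above.
dataset = [
    LocalRow("Translate 'Good night' to French.", "bonne nuit", "Bonne nuit"),
    LocalRow(
        "Summarize: 'AI improves efficiency'.",
        "ai improves efficiency",
        "AI improves efficiency",
    ),
]

print(pass_rate(dataset, iexact_match))  # prints 1.0
```

Both rows differ from their gold answers only in capitalization, so a case-insensitive match passes them; a strict `==` comparison would fail both.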
Platform Capabilities
Start with Patronus on Day 0 and never look back.
Patronus Evaluators
Access industry-leading evaluation models designed to catch RAG hallucinations, prompt injections, and more, using the Patronus API
Patronus Experiments
Measure AI product performance in offline runs using any evaluator and dataset of your choice
Patronus Logs
Continuously evaluate and monitor your AI product in production using the Patronus API
Patronus Comparisons
Compare and benchmark LLMs, RAG systems, and agents side by side
Patronus Datasets
Leverage industry-standard datasets such as FinanceBench, EnterprisePII, and SimpleSafetyTests, each designed for a specific use case
Patronus Test Suite Generation
Partner with our AI Research team to develop high-quality test datasets specific to your domain
Why Us
We take a research-first approach
The team at Patronus has been testing LLMs since before the GenAI boom
Our approach is state-of-the-art: 18% better at detecting hallucinations than other OpenAI LLM-based evaluators*
We offer production-ready LLM evaluators for general, custom, and RAG-enabled use cases
Our off-the-shelf evaluators cover your bases (e.g., toxicity, PII leakage), while our custom evaluators cover the rest (e.g., brand alignment)
We support real-time evaluation with fast API response times (as low as 100ms)
You can start using the Patronus API with a single line of code
We offer flexible hosting options with enterprise-grade security
No need to worry about managing servers with our Cloud Hosted solution
Our On-Premise offering is also available for customers with the strictest data privacy needs
You can rest assured that your proprietary data will never be shared outside our organization
We get vetted by third-party security companies yearly
We are trusted by a strong array of customers and partners
Patronus is the only company to provide an SLA guarantee of 90% alignment between our evaluators and human evaluators
Our customers include OpenAI, HP, and Pearson
Our partners include AWS, Databricks, and MongoDB
Catch AI Failures in Seconds
Use the Patronus API in any stack
export PATRONUS_API_KEY="<PROVIDE YOUR API KEY>"

curl --request POST \
  --url "https://api.patronus.ai/v1/evaluate" \
  --header "X-API-KEY: $PATRONUS_API_KEY" \
  --header "accept: application/json" \
  --header "content-type: application/json" \
  --data '
{
  "evaluators": [{ "evaluator": "lynx", "criteria": "patronus:hallucination" }],
  "evaluated_model_input": "Who are you?",
  "evaluated_model_output": "My name is Barry.",
  "evaluated_model_retrieved_context": "My name is John."
}'
import os

from patronus import Client

# Read the API key from the environment rather than hard-coding it
client = Client(
    api_key=os.environ.get("PATRONUS_API_KEY"),
)

response = client.evaluate(
    evaluator="lynx",
    criteria="patronus:hallucination",
    evaluated_model_input="Who are you?",
    evaluated_model_output="My name is Barry.",
    evaluated_model_retrieved_context="My name is John.",
)
const apiKey = process.env.PATRONUS_API_KEY;

fetch('https://api.patronus.ai/v1/evaluate', {
  method: 'POST',
  headers: {
    'X-API-KEY': apiKey,
    'accept': 'application/json',
    'content-type': 'application/json'
  },
  body: JSON.stringify({
    evaluators: [{ evaluator: "lynx", criteria: "patronus:hallucination" }],
    evaluated_model_input: "Who are you?",
    evaluated_model_output: "My name is Barry.",
    evaluated_model_retrieved_context: "My name is John."
  })
})
  .then(response => response.json())
  .then(data => console.log(data))
  .catch(error => console.error(error));
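The examples above all send the same JSON body to the same endpoint, so the request can also be assembled with nothing but the Python standard library. The sketch below is illustrative only: `build_evaluate_request` is a hypothetical helper name, while the URL, header names, and body fields are copied from the examples above. The response format is not shown in this page, so the sketch stops at building the request and leaves the network call commented out.

```python
import json
import os
import urllib.request

# Endpoint taken from the curl and fetch examples above.
EVALUATE_URL = "https://api.patronus.ai/v1/evaluate"


def build_evaluate_request(api_key, model_input, model_output, retrieved_context):
    """Hypothetical helper: assemble the POST request without sending it."""
    payload = {
        "evaluators": [{"evaluator": "lynx", "criteria": "patronus:hallucination"}],
        "evaluated_model_input": model_input,
        "evaluated_model_output": model_output,
        "evaluated_model_retrieved_context": retrieved_context,
    }
    return urllib.request.Request(
        EVALUATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-API-KEY": api_key,
            "accept": "application/json",
            "content-type": "application/json",
        },
        method="POST",
    )


request = build_evaluate_request(
    os.environ.get("PATRONUS_API_KEY", ""),
    "Who are you?",
    "My name is Barry.",
    "My name is John.",
)

# Sending is left commented out so the sketch runs without a key or network access:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.load(response))
```

Separating request construction from sending makes it easy to inspect or unit-test the payload before any evaluation traffic leaves your environment.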
Industry-Leading AI Research
Our AI research team is behind cutting-edge AI evaluation models and benchmarks, which are now used by tens of thousands of organizations and developers around the world.
What they say about us
As scientists and AI researchers, we spend significant time on model evaluation. The Patronus team is full of experts in this space, and brings a novel research-first approach to the problem. We're thrilled to see the increased investment in this area.
"Evaluating LLMs is multifaceted and complex. LLM developers and users alike will benefit from the unbiased, independent perspective Patronus provides."
"Testing LLMs is in its infancy. The best methods today rely on outdated academic benchmarks and noisy human evaluations -- equivalent to sticking your finger in water to get its temperature. Patronus is leading with an innovative approach."
"Engineers spend a ton of time manually creating tests and grading outputs. Patronus assists with all of this and identifies exactly where LLMs break in real world scenarios."
Patronus AI doesn’t just help you build trust in your generative AI products, they make sure your own users trust your products too. They always go one step further to make sure you succeed with your AI use case in production.
The Patronus team is taking a holistic and most innovative approach to finding vulnerabilities in LLM systems. Every company that wants to build LLM-based products will need to solve for it and the Patronus team is the most thoughtful group tackling this problem.
One of the standout features of Patronus is its customizability. I can bring my own evaluations or set up my own Custom Evaluator in 30 seconds, and then do everything else from there within the platform.
Patronus AI is at the forefront of multilingual AI evaluation. DefineX is excited to be using Patronus’ proprietary technology to safeguard generative AI risks in the Turkey & Middle East region and beyond.
Patronus and their straightforward API make it really easy to reliably evaluate issues with LLMs and mitigate problems like content toxicity, PII leakage, and more. We're excited to partner with Patronus to combine their evaluation capabilities with Radiant's production reliability platform to help customers build great GenAI products.
I love that Patronus supports both offline and online workflows. It’s a game changer when an engineering team has to do no extra work in making their offline evaluation setup work in real-time settings. This is because their API is really easy to use, and is framework-agnostic and platform-agnostic.
In our mission to bring the AI stack close to enterprise data and offering best in class tools to train and deploy AI solutions, we are thrilled to partner with Patronus AI. Our combined platform will help in training, finetuning, rigorously testing, and monitoring LLM systems in a scalable way.
AI won’t take your job but it will change your job description. Safety in the workplace and security in the workspace is the only way to be AI-ready. That’s only possible with Patronus.
One of the neat things about the Patronus experience is the part that comes after catching LLM mistakes - insights with natural language explanations, failure mode identification, and semantic clustering.