Product Features
From novel test suite generation to real-time LLM evaluation, the Patronus suite of features provides end-to-end solutions, so you can confidently deploy LLM applications at scale.
Evaluation Runs
Leverage our managed service to score model performance based on our proprietary taxonomy of criteria.
LLM Failure Monitoring & Observability
“Sentry for LLM Failures”: Continuously evaluate and track LLM performance for your AI product in production using the Patronus Evaluate API.
Patronus Datasets
Use our off-the-shelf, adversarial testing sets designed to break models on specific use cases.
Developed with 15 financial industry domain experts, FinanceBench is a high-quality, large-scale set of 10,000 question-and-answer pairs based on publicly available financial documents like SEC 10Ks, SEC 10Qs, SEC 8Ks, earnings reports, and earnings call transcripts.
Developed with AI researchers at Oxford University and MilaNLP Lab at Bocconi University, SimpleSafetyTests is a diagnostic test suite to identify critical safety risks in LLMs across 5 areas: suicide, child abuse, physical harm, illegal items, and scams & fraud.
Developed with MosaicML, EnterprisePII is the industry’s first LLM dataset for detecting business-sensitive information. The dataset contains 3,000 examples of annotated text excerpts from common enterprise text types such as meeting notes, commercial contracts, marketing emails, performance reviews, and more.
Test Suite Generation
Auto-generate novel adversarial testing sets at scale to uncover the edge cases where your models fail.
Benchmarking
Compare models side by side to understand how they differ in performance in real-world scenarios.
Retrieval-Augmented Generation (RAG) Testing
Verify that your LLM-based retrieval systems consistently deliver reliable information with our cutting-edge RAG and retrieval testing workflows.