Announcing our $17M Series A
At Patronus AI, we are on a mission to boost enterprise confidence in generative AI.
Today, we’re excited to announce our $17M Series A led by Notable Capital (formerly GGV Capital), with participation from Lightspeed Venture Partners and Datadog. The round also included Gokul Rajaram, Factorial Capital, and leading software and AI executives including Jonathan Frankle, Jason Warner, Tristan Handy, Michael Callahan, Barr Moses, Aparna Sinha, and Nadim Hossain.
As part of the Series A, we’re thrilled that Glenn Solomon will be joining our board. Glenn is a brilliant investor who has backed legendary companies over the years, including HashiCorp, Vercel, Square, Airbnb, Slack, Opendoor, Monte Carlo, and Zendesk. We couldn’t be more excited to be partnering with him and the whole Notable team.
How We Got Here
Since we launched out of stealth in September 2023, we’ve moved faster than the speed of light. ⚡
We published a series of industry firsts that have been used by tens of thousands of people and widely covered by CNBC, Fortune, VentureBeat, and more:
- FinanceBench, the first standardized LLM benchmark for the financial domain
- CopyrightCatcher, the first copyright detection API for LLMs
- Enterprise Scenarios Leaderboard on Hugging Face, the first LLM leaderboard for real-world use cases
- EnterprisePII, the first evaluation API and dataset for business-sensitive information
Today, our product is used by numerous Fortune 500 enterprises and leading AI companies around the world. They have run millions of requests through the Patronus AI platform and caught hundreds of thousands of hallucinations and other mistakes in AI outputs, in both offline and online settings. This is possible because we apply our AI research directly to product and customer value. We have built best-in-class evaluation models, developed proprietary methods for generating synthetic evaluation data, and implemented powerful alignment techniques that have markedly improved evaluator quality. For example, our retrieval-hallucination evaluator outperforms alternatives by a significant margin on both open source and proprietary benchmarks.
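To make the task concrete: a retrieval-hallucination evaluator checks whether a model’s answer is actually supported by the context retrieved for it. Here is a deliberately naive sketch of that check, for illustration only (our evaluators are trained models, not string matchers):

```python
def is_hallucinated(answer: str, retrieved_context: str) -> bool:
    """Naive token-overlap check, for illustration only.
    Production evaluators are trained models, not string matching."""
    content_words = {w for w in answer.lower().split() if len(w) > 3}
    context_words = set(retrieved_context.lower().split())
    unsupported = content_words - context_words
    # Flag the answer if most of its content words lack support in the context.
    return len(unsupported) > 0.5 * max(len(content_words), 1)
```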
Why We Need Automated LLM Evaluation
Generative AI continues to be adopted at a breakneck pace. We have witnessed a Cambrian explosion of generative AI applications, in industries ranging from retail to software. In the past year alone, we have seen the release of GPT-4o, Mixtral, Gemini, Llama 2, and Llama 3, as well as the expansion of multimodal capabilities to vision, audio, and beyond.
Yet enterprises deploying AI continue to be exposed to its risks, often with disastrous consequences. While headlines lament Air Canada’s “lying AI assistant” and users poke fun at Chevrolet’s chatbot selling cars for $1, the $70B drop in Alphabet’s market value after the Gemini AI scandal and Cruise’s recall of its driverless cars reveal the real costs of AI gone wrong. We are AI optimists, but we believe that models should not be deployed to production without rigorous evaluation to identify and fix issues.
Our solution is a single AI-powered platform that enables end-to-end evaluation of AI systems in a domain- and model-agnostic way. Core to our platform is our suite of Patronus evaluator models. Customers use these evaluators to assess model performance across a broad range of categories, from accuracy and hallucination to brand alignment and PII leakage. The best part: you can do all of this with just one line of code!
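To give a feel for the developer experience, here is roughly the shape such a call takes. The endpoint and field names below are hypothetical, for illustration only; see our docs for the actual API:

```python
import requests

# Hypothetical endpoint and payload shape, for illustration only.
response = requests.post(
    "https://api.example-evals.com/v1/evaluate",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "evaluator": "retrieval-hallucination",  # which evaluator to run
        "model_input": "What is the refund window?",
        "model_output": "Refunds are available for 90 days.",
        "retrieved_context": "Refunds are available within 30 days of purchase.",
    },
)
print(response.json())  # e.g. a pass/fail verdict plus a score and explanation
```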
Our Vision For Scalable Oversight
AI will soon significantly outperform humans on many real-world tasks. How can humans continue to supervise such systems? We believe that in this paradigm shift, manual evaluation of AI does not scale.
While humans may never fully control or even explain the emergent properties of large AI models, we can learn to understand their strengths and weaknesses and guide their behavior toward our desired preferences. Having both humans and AI in the loop makes this possible. Our vision is scalable oversight: humans act as overseers in a world where AI evaluates AI. We are solving this problem by building AI that assists humans with AI evaluation. We are at the forefront of making scalable human-AI collaboration possible!
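As a conceptual sketch of that division of labor (all names here are illustrative, not a real system): an automated evaluator screens every output, and only failing or low-confidence cases are escalated to a human reviewer.

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    passed: bool
    confidence: float  # evaluator's confidence in its own verdict, 0 to 1

def ai_evaluate(output: str, context: str) -> EvalResult:
    """Stand-in for an automated evaluator model (illustrative only)."""
    supported = output in context  # trivially naive check for the sketch
    return EvalResult(passed=supported, confidence=0.9 if supported else 0.6)

def oversee(batch: list[tuple[str, str]], threshold: float = 0.8) -> list[tuple[str, str]]:
    """AI evaluates everything; humans review only failing or uncertain cases."""
    for_human_review = []
    for output, context in batch:
        result = ai_evaluate(output, context)
        if not result.passed or result.confidence < threshold:
            for_human_review.append((output, context))
    return for_human_review  # ideally a small fraction of total traffic
```

The human effort then scales with the evaluator’s uncertainty rather than with total traffic.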
What’s Next
We always say that the best is yet to come. We have lots of exciting things planned for 2024 and beyond, including training state-of-the-art AI models for evaluation, developing AI-powered features for automated testing, and continuing to innovate on automated LLM evaluation.
We are hiring across a range of positions: AI Research, Product, Engineering, GTM, and Marketing. If our mission resonates with you, we’d love to chat!