Scalable Oversight Research Lab

The Scalable Oversight Research Lab explores how to evaluate and guide AI agents operating in complex, interactive environments.

Who we are

We are a team of AI researchers and engineers, formerly of companies such as Meta AI, Amazon AGI, and Google. Our work has led to product contributions serving top Fortune 500 clients.

Our team has previously worked on AI research spanning LLM evaluation, fairness, alignment, and embodied agents.

Foundational Research for Foundational AI

Tool Use

Helping agents understand what each tool can do and how tools differ, so they apply the right tool for the right action

Multi-Turn Interaction

Enforcing multi-step workflows and supporting dialogue between the user and the agent

Long-Horizon Tasks

Developing agents to take on real-world tasks that span weeks, months, or even years

Memory

Extending agent memory through context windows and other tooling

Research Focus Areas

Concrete benchmarks, environments, and methodologies built directly from our research

Agent reliability & failure modes
RL Environment Design

We value realism, diversity, and scalable difficulty in our environments. We set up scenarios with real-world rules, data, tools (actions), and rewards – iterating on complexity as we go.

Evaluation methodologies
RL Training

We experiment with new RL techniques for long-horizon capabilities across domains, focusing on tasks that would take a human weeks, months, or even years to complete.

Long-horizon & multi-step tasks
Agentic Cognition

We explore memory, knowledge, and behavior patterns unique to agents, and study how they resemble or differ from the ways humans process information.

Tool use & environment simulation
General

We create benchmark environments to track long-range agent behavior as it unfolds across our focus domains: scientific research, coding, conversational dialogue, computer use, and finance.

Latest Updates
