Research

Who we are

We are a team of AI researchers and engineers formerly from companies such as Meta AI, Amazon AGI, and Google. Our previous research spans LLM evaluation, fairness, alignment, and embodied agents.

Foundational Research for Foundational AI

Deep Research

Understanding and reasoning over large semantic datasets

Multi-Turn Interaction

Enforcing multi-step workflows and supporting dialogue between the user and the agent

Long-Horizon Tasks

Developing agents to take on real-world tasks that span weeks, months, and even years

Memory

Extending agent memory through context windows and other tooling

Research Work

Generative Simulators

Simulation Frameworks

TRAIL

Simulation Frameworks

BLUR

Simulation Frameworks

TRACE

Simulation Frameworks

Multimodal LLM-as-a-Judge

Agentic Systems

DETOUR

Agentic Systems

Glider: SOTA SLM Judge

Evaluation Models

Lynx: Hallucination Detection Model

Benchmarks & Applied Evaluation

Humanity’s Last Exam

Benchmarks & Applied Evaluation

CopyrightCatcher

Benchmarks & Applied Evaluation

Research Focus Areas

Agent reliability & failure modes

RL Environment Design

We value realism, diversity, and scalable difficulty in our environments. We set up scenarios with real-world rules, data, tools (actions), and rewards – iterating on complexity as we go.

Evaluation methodologies

RL Training

We experiment with new RL techniques for long-horizon capabilities across domains, with a focus on tasks that would take a human weeks, months, or even years to complete.

Long-horizon & multi-step tasks

Agentic Cognition

We explore memory, knowledge, and behavior patterns unique to agents, and study how they resemble or differ from the ways humans process information.