
Exa vs Bing API: A Search Performance Comparison Case Study

Overview

The rise of AI applications has made the quality of search and retrieval systems increasingly critical. We conducted a detailed evaluation comparing Exa's neural search capabilities against Bing's API, focusing on their ability to provide relevant results for real-world queries that are highly semantic. We used the Patronus AI automated evaluation suite to perform the comparison, generating aggregate metrics and handy visualizations in the process.

Search Query Relevance of Exa API vs. Bing API

Methodology

We chose a highly semantic query set and tested whether each API's results semantically matched the search queries. Our methodology is described below.

Data Collection

We first constructed a representative evaluation dataset with the following attributes:

  • 150 highly semantic queries
  • 5 results retrieved per query from each API
  • Full text, highlights, and summaries captured for each result

Because the Bing API returns only URLs, we augmented Bing's search results with page contents fetched through Exa; a sketch of this augmentation step follows the query code below. This ensures a fair comparison focused solely on the relevance of the results.

Our code to query Exa and Bing Search is shown below:

# Example implementation

from exa_py import Exa
from azure.cognitiveservices.search.websearch import WebSearchClient
from msrest.authentication import CognitiveServicesCredentials

exa_client = Exa(api_key="TODO")

# Bing client from the azure-cognitiveservices-search-websearch SDK
bing_client = WebSearchClient(
    endpoint="TODO",
    credentials=CognitiveServicesCredentials("TODO"),
)

# One of the 150 highly semantic evaluation queries
query = "best online language learning apps with proven effectiveness for native english speakers learning mandarin chinese"

# Retrieve 5 Exa results per query, capturing highlights and summaries
exa_results = exa_client.search_and_contents(
    query,
    type="neural",
    use_autoprompt=True,
    num_results=5,
    text=False,
    highlights=True,
    summary=True,
)

# Retrieve 5 Bing results per query (URLs and snippets only)
bing_results = bing_client.web.search(
    query=query,
    count=5,
    text_decorations=True,
    text_format="HTML",
)
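
For the augmentation step described above, the Bing URLs can be passed to Exa's contents endpoint so both APIs are judged on the same content fields. A minimal sketch, assuming the response shapes of the two SDKs (web_pages.value on the Bing response, get_contents on the Exa client):

# Example implementation (sketch)

# Collect the URLs from Bing's response and fetch their contents via Exa,
# requesting the same highlights and summaries we collected for Exa's results
bing_urls = [page.url for page in bing_results.web_pages.value]

bing_contents = exa_client.get_contents(
    bing_urls,
    text=False,
    highlights=True,
    summary=True,
)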

Evaluation

Results were evaluated using an independent judge evaluator on the Patronus platform, assessing both summary quality and result relevance. This evaluator allowed us to obtain reliable evaluation results at scale while maintaining high human-AI alignment. Each result was scored on a PASS/FAIL basis against the following judge definition:

"Given a search query in USER INPUT, a summary of the content from the returned search result in MODEL OUTPUT, and highlights (or snippets) from the returned search results, determine whether the MODEL OUTPUT or RETRIEVED CONTEXT provide useful and relevant information related to the USER INPUT."

We ran the following code to kick off an evaluation with the Patronus experiments framework:

# Example implementation

from patronus import Client

patronus_client = Client(api_key="TODO")

# The hosted judge evaluator, configured with our relevance criteria
query_result_relevance = patronus_client.remote_evaluator(
    evaluator_id_or_alias="judge",
    criteria="is-search-query-result-relevant",
)

# One experiment per API, under the same project for side-by-side comparison
patronus_client.experiment(
    project_name="web-search-comparison",
    data=exa_rows,
    evaluators=[query_result_relevance],
    experiment_name="exa",
)

patronus_client.experiment(
    project_name="web-search-comparison",
    data=bing_rows,
    evaluators=[query_result_relevance],
    experiment_name="bing",
)

Performance Analysis

We see that Exa outperformed Bing Search in search result relevance. The Comparisons view shows that Exa had a pass rate of 60% whereas Bing had a pass rate of 38%.
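
The pass rates shown in the Comparisons view are simply the fraction of PASS labels across all evaluated results. As a sanity check, the same numbers can be recomputed locally, assuming the per-result labels have been collected into a list per API:

# Example: recompute a pass rate from per-result PASS/FAIL labels
def pass_rate(labels):
    return sum(label == "PASS" for label in labels) / len(labels)

# e.g. pass_rate(exa_labels) -> 0.60 and pass_rate(bing_labels) -> 0.38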

Let's dig into some example queries to understand the performance differences!

Example Queries

Query: “best online language learning apps with proven effectiveness for native english speakers learning mandarin chinese”

Example Result: Exa

Exa's result recommended Ninchanese for native English speakers learning Mandarin Chinese. Patronus scored the result as PASS as it is relevant to the user query.

Patronus Experiments view of one of the relevant results suggested by Exa's API

Example Result: Bing

Bing's result provides general examples of language learning apps for 2024. Patronus scored the result as FAIL. To understand why, we can look at the Patronus evaluator's explanation: the results were general in scope and not specific to native English speakers learning Mandarin.

Key Findings

1. Semantic Understanding

  • Exa's neural search showed superior performance in understanding complex technical queries
  • Particularly strong in cases requiring deep domain understanding

2. Result Relevance

  • Higher precision in technical and specialized searches

3. Content Depth

  • Exa consistently returned more technically relevant content
  • Better at finding specific, detailed information rather than general overviews

Implications for Developers

The results demonstrate clear advantages for applications requiring:

  • Complex query understanding
  • Accuracy and relevancy of full content within a URL

Conclusion

Our evaluation reveals that Exa's neural search capabilities provide significantly more relevant results for technical and complex queries compared to traditional search APIs. This makes it particularly valuable for applications requiring deep semantic understanding and technical content retrieval.