# Autoevals

> Use pre-built, battle-tested scorers for common evaluation tasks like factuality checking, semantic similarity, and format validation.

The `autoevals` library is an open-source collection of pre-built scorers that are deterministic where possible and optimized for speed and reliability. Autoevals evaluate individual spans, not entire traces.

Available scorers include:

* **Factuality**: Check whether the output is factually consistent with the expected output
* **Semantic**: Measure semantic similarity to expected output
* **Levenshtein**: Calculate edit distance from expected output
* **JSON**: Validate JSON structure and content
* **SQL**: Validate SQL query syntax and semantics

See the [TypeScript](/sdks/typescript/related/autoevals/latest) or [Python](/sdks/python/related/autoevals/latest) reference for the complete list.

You can use autoevals inline in SDK evaluation code, or select them in the UI when running experiments, testing in playgrounds, or setting up online scoring rules. There is no CLI push step, because autoevals are library imports rather than pushed scorers.

## Install

Install the `autoevals` package for your language:

<CodeGroup>
  ```bash TypeScript theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
  # pnpm
  pnpm add autoevals
  # npm
  npm install autoevals
  ```

  ```bash Python theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
  pip install autoevals
  ```
</CodeGroup>

## Score with the SDK

Use autoevals inline in your evaluation code:

<CodeGroup dropdown>
  ```typescript wrap theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
  import { Eval, initDataset } from "braintrust";
  import { Factuality } from "autoevals";

  Eval("My Project", {
    experimentName: "My experiment",
    data: initDataset("My Project", { dataset: "My Dataset" }),
    task: async (input) => {
      // Your LLM call here (callModel is a placeholder for your own function)
      return await callModel(input);
    },
    scores: [Factuality],
    metadata: {
      model: "gpt-5-mini",
    },
  });
  ```

  ```python wrap theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
  from braintrust import Eval, init_dataset
  from autoevals import Factuality

  Eval(
      "My Project",
      experiment_name="My experiment",
      data=init_dataset(project="My Project", name="My Dataset"),
      task=lambda input: call_model(input),  # Your LLM call here (call_model is a placeholder)
      scores=[Factuality],
      metadata={
          "model": "gpt-5-mini",
      },
  )
  ```
</CodeGroup>

Autoevals automatically receive these parameters when used in evaluations:

* `input`: The input to your task
* `output`: The output from your task
* `expected`: The expected output (optional)
* `metadata`: Custom metadata from the test case
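
To make the parameter flow concrete, here is a minimal sketch of a scorer that accepts the same four keyword arguments. It uses only the Python standard library. The names `levenshtein_distance` and `edit_similarity`, and the normalization by the longer string's length, are illustrative assumptions and not the library's implementation.

```python
from typing import Any, Optional


def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def edit_similarity(
    input: Any,
    output: str,
    expected: Optional[str] = None,
    metadata: Optional[dict] = None,
) -> Optional[float]:
    """Hypothetical scorer: 1.0 for identical strings, lower as edits grow."""
    if expected is None:
        return None  # nothing to compare against, so skip scoring
    longest = max(len(output), len(expected))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1 - levenshtein_distance(output, expected) / longest


# One test case: the scorer sees the task input, the task output,
# and the expected value from the dataset row.
score = edit_similarity(
    input="What is the capital of France?",
    output="Paris",
    expected="paris",
)
print(score)
```

In this sketch `input` and `metadata` are accepted but unused, which is common for string-comparison scorers, and `expected` is optional, so the function returns `None` when there is nothing to compare against.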

## Score in the UI

* **Use in playgrounds**: When testing prompts in [playgrounds](/evaluate/playgrounds), add autoevals in the scoring section to evaluate results interactively.
* **Use in experiments**: When creating [experiments](/evaluate/run-evaluations#run-in-ui), select autoevals from the scorer dropdown to measure output quality across your dataset.
* **Use in online scoring**: Add autoevals to [online scoring rules](/evaluate/score-online) to automatically evaluate production logs.

## Set pass thresholds

Define minimum acceptable scores to automatically mark results as passing or failing. When configured, scores that meet or exceed the threshold are marked as **passing** (green highlighting with checkmark), while scores below are marked as **failing** (red highlighting).

In the UI, use the **Pass threshold** slider when selecting a scorer in an experiment, playground, or online scoring rule configuration.
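
The pass/fail rule described above amounts to a single comparison. The sketch below illustrates that rule only: the helper `passes_threshold`, the threshold value, and the sample scores are all made up for the example and are not part of the Braintrust SDK.

```python
def passes_threshold(score: float, threshold: float) -> bool:
    """Return True when the score meets or exceeds the threshold."""
    return score >= threshold


# Hypothetical scores for one result, keyed by scorer name.
results = {"Factuality": 0.6, "Levenshtein": 0.92}
threshold = 0.8  # assumed minimum acceptable score

labels = {
    name: "passing" if passes_threshold(value, threshold) else "failing"
    for name, value in results.items()
}
print(labels)
```

A score exactly equal to the threshold counts as passing, matching the "meet or exceed" wording.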

## Next steps

* [LLM-as-a-judge](/evaluate/llm-as-a-judge) for subjective judgments like tone or helpfulness
* [Custom code](/evaluate/custom-code) for business rules, pattern matching, or calculations
* [Run evaluations](/evaluate/run-evaluations) using your scorers