# Vitest

> Run Braintrust evaluations from Vitest and report test results as experiments

If you are a coding agent, prefer the Braintrust [`bt` CLI](/reference/cli/quickstart) for repeatable, scriptable work: running evals, instrumenting code, querying logs, syncing data, managing functions, and configuring coding agents. Use the MCP server for reasoning over Braintrust data in conversation, such as ad-hoc lookups and exploration from your IDE.

[Vitest](https://vitest.dev/) is a test runner for JavaScript and TypeScript. Braintrust supports two Vitest workflows:

* Use the Braintrust `wrapVitest` helper to write Vitest tests that run as Braintrust evals.
* Use the `vitest-evals` reporter to report [`vitest-evals`](https://www.npmjs.com/package/vitest-evals) test runs to Braintrust.

## Setup

Install Braintrust alongside Vitest:

<CodeGroup>
  ```bash npm theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
  npm install braintrust vitest
  ```

  ```bash pnpm theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
  pnpm add braintrust vitest
  ```
</CodeGroup>

Set your Braintrust API key as an environment variable:

```bash theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
export BRAINTRUST_API_KEY=<your-api-key>
```

### Separate evals from unit tests

Eval files are regular Vitest files and can live anywhere in your project. Because evals tend to run slower than unit tests and log results to Braintrust, a common convention is to give them a `.eval.ts` suffix or a dedicated `evals/` directory, with a separate Vitest config:

```typescript title="vitest.eval.config.ts" theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    include: ["**/*.eval.ts"],
    testTimeout: 30000,
  },
});
```

Run evals separately from unit tests:

```bash theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
# Unit tests
npx vitest run

# Evals
npx vitest run --config vitest.eval.config.ts
```
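If evals instead live in a dedicated `evals/` directory and keep the regular `.test.ts` suffix, the default Vitest run would pick them up too. A minimal sketch of a unit-test config that skips them, assuming the directory is named `evals/` (a hypothetical layout; adjust the glob to your project):

```typescript
// vitest.config.ts (hypothetical unit-test config)
import { configDefaults, defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // Keep Vitest's default excludes and also skip the evals directory.
    exclude: [...configDefaults.exclude, "evals/**"],
  },
});
```

With this layout, the `include` pattern in the eval config would need to point at the same directory, for example `evals/**/*.test.ts`.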

## Run evals with `wrapVitest`

Call `wrapVitest` once at the top of your test file, passing in the imported `vitest` module. Use the returned `test`, `describe`, and `expect` in place of the standard Vitest ones.

```typescript title="my-eval.eval.ts" theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import * as vitest from "vitest";
import { wrapVitest } from "braintrust";

const { test, expect, describe } = wrapVitest(vitest, {
  projectName: "my-project", // Replace with your project name
});

describe("My eval suite", () => {
  test(
    "basic check",
    {
      input: { prompt: "What is 1 + 1?" },
      expected: "2",
    },
    async ({ input, expected }) => {
      // myModel is a placeholder for your own model call; it is not defined here.
      const output = await myModel(input.prompt);
      expect(output).toBe(expected);
      return output;
    },
  );
});
```

Run it with the eval config:

```bash theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
npx vitest run --config vitest.eval.config.ts
```

After the suite finishes, Braintrust prints a summary to your terminal and creates an experiment with one traced span per test case.
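
The `wrapVitest` example calls `myModel`, which this page does not define. To try the wiring without a model provider, one option is a deterministic stand-in such as the sketch below. `CANNED_ANSWERS`, `cannedAnswer`, and this `myModel` body are hypothetical placeholders, not part of Braintrust or Vitest.

```typescript
// Hypothetical lookup table standing in for a real model.
const CANNED_ANSWERS = new Map<string, string>([
  ["What is 1 + 1?", "2"],
]);

// Returns the canned answer for a prompt, or a fallback string.
function cannedAnswer(prompt: string): string {
  return CANNED_ANSWERS.get(prompt) ?? "I don't know";
}

// Async wrapper with the same call shape as the `myModel` used in the example.
async function myModel(prompt: string): Promise<string> {
  return cannedAnswer(prompt);
}
```

Swapping the body of `myModel` for a real client call later leaves the test file unchanged.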

## Report `vitest-evals` runs to Braintrust

Use the Braintrust Vitest evals reporter when you already write evaluations with the [`vitest-evals`](https://vitest-evals.sentry.dev/) package and want those runs logged to Braintrust. This workflow is separate from the standard Braintrust `Eval()` framework and from the `wrapVitest` helper.

Install the reporter dependencies:

<CodeGroup>
  ```bash npm theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
  npm install braintrust vitest vitest-evals
  ```

  ```bash pnpm theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
  pnpm add braintrust vitest vitest-evals
  ```
</CodeGroup>

Configure Vitest with both the `vitest-evals` reporter and the Braintrust reporter:

```typescript title="vitest.evals.config.mts" theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { defineConfig } from "vitest/config";
import BraintrustVitestEvalsReporter from "braintrust/vitest-evals-reporter";

export default defineConfig({
  test: {
    include: ["**/*.eval.ts"],
    reporters: [
      "default",
      "vitest-evals/reporter",
      new BraintrustVitestEvalsReporter({
        projectName: "refund-agent", // Replace with your project name
        experimentName: `vitest-evals-${new Date().toISOString()}`,
      }),
    ],
    testTimeout: 30000,
  },
});
```

Write eval tests with `vitest-evals` primitives. The Braintrust reporter reads the eval metadata produced by `vitest-evals/reporter` and logs each eval case as a Braintrust span.

```typescript title="refund.eval.ts" theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { expect } from "vitest";
import { createHarness, createJudge, describeEval } from "vitest-evals";

type RefundOutput = {
  message: string;
  status: "approved" | "denied";
};

const refundHarness = createHarness<string, RefundOutput>({
  name: "refund-harness",
  run: async ({ input }) => ({
    output: {
      message: "Invoice inv_123 is refundable and the refund is approved.",
      status: "approved",
    },
    events: [
      { type: "message", role: "user", content: input },
      {
        type: "tool_call",
        id: "call_lookup",
        name: "lookupInvoice",
        arguments: { invoiceId: "inv_123" },
      },
      {
        type: "tool_result",
        toolCallId: "call_lookup",
        name: "lookupInvoice",
        content: { refundable: true },
      },
      {
        type: "message",
        role: "assistant",
        content: "Invoice inv_123 is refundable and the refund is approved.",
      },
    ],
    usage: {
      inputTokens: 11,
      outputTokens: 13,
      totalTokens: 24,
      toolCalls: 1,
    },
  }),
});

const StatusJudge = createJudge<
  string,
  RefundOutput,
  { expectedStatus: RefundOutput["status"] }
>("StatusJudge", async ({ output, expectedStatus }) => ({
  metadata: {
    expectedStatus,
    observedStatus: output.status,
  },
  score: output.status === expectedStatus ? 1 : 0,
}));

describeEval("refund agent", { harness: refundHarness }, (it) => {
  it("approves refundable invoice", async ({ run }) => {
    const result = await run("Refund invoice inv_123");

    expect(result.output.status).toBe("approved");
    await expect(result).toSatisfyJudge(StatusJudge, {
      expectedStatus: "approved",
      threshold: 1,
    });
  });
});
```

Run Vitest with the reporter config:

```bash theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
npx vitest run --config vitest.evals.config.mts
```

The reporter creates or reuses a Braintrust experiment for the run. Each eval test logs:

* The test input, output, status, file path, and full test name.
* Scores from judges and assertions, including `avg_score` and `pass` when provided by `vitest-evals`.
* Harness metadata, session messages, tool calls, artifacts, usage metrics, and errors.
* Nested model, tool, and trace spans when the harness includes normalized trace data.

### Reporter options

Pass options to `new BraintrustVitestEvalsReporter()` to control where results are logged:

| Option             | Description                                                       |
| ------------------ | ----------------------------------------------------------------- |
| `projectName`      | Braintrust project name. Required unless `projectId` is set.      |
| `projectId`        | Braintrust project ID. Required unless `projectName` is set.      |
| `experimentName`   | Experiment name. Defaults to a timestamped `vitest-evals-*` name. |
| `displaySummary`   | Whether to print the Braintrust experiment summary after the run. |
| `metadata`         | Experiment-level metadata.                                        |
| `tags`             | Experiment-level tags.                                            |
| `baseExperiment`   | Base experiment name for comparisons.                             |
| `baseExperimentId` | Base experiment ID for comparisons.                               |
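
The two project options are alternatives: at least one of `projectName` or `projectId` must be set. The sketch below models the table as a TypeScript type and checks that rule before the options are used. The `ReporterOptions` type and `assertHasProject` helper are hypothetical and written for illustration; the field types are assumptions inferred from the descriptions in the table, not the package's published typings.

```typescript
// Hypothetical shape mirroring the options table; field types are assumed.
type ReporterOptions = {
  projectName?: string;
  projectId?: string;
  experimentName?: string;
  displaySummary?: boolean;
  metadata?: Record<string, unknown>;
  tags?: string[];
  baseExperiment?: string;
  baseExperimentId?: string;
};

// Throws if neither project option is set; otherwise returns the options unchanged.
function assertHasProject(options: ReporterOptions): ReporterOptions {
  if (!options.projectName && !options.projectId) {
    throw new Error("Set either projectName or projectId");
  }
  return options;
}

const reporterOptions = assertHasProject({
  projectName: "refund-agent",
  tags: ["nightly"],
});
```

A check like this fails fast in the config file rather than partway through a run.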

## Key concepts

### `wrapVitest`

`wrapVitest` wraps Vitest's `test`, `describe`, and `expect` with Braintrust tracking.

```typescript theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import * as vitest from "vitest";
import { wrapVitest } from "braintrust";

const { test, expect, describe } = wrapVitest(vitest, {
  projectName: "my-project",
  displaySummary: true,
});
```

Each `describe` block creates one Braintrust experiment, and Braintrust appends a timestamp so that every run is unique. Experiments are grouped under a project, which defaults to the suite name when `projectName` is not set.

### Test configuration

`test` accepts an optional config object between the name and the test function:

```typescript theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
test(
  "test name",
  {
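    input: { prompt: "Hello" }, // Passed to the test function as `input`
    expected: "Hello!", // Passed to the test function and to scorers
    metadata: { category: "greeting" }, // Passed to the test function and to scorers
    tags: ["smoke"],
    scorers: [myScorer], // Run after the test (see Scorers)
    data: [{ input: "Hello", expected: "Hello!" }], // Each record expands into its own test case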
  },
  async ({ input, expected, metadata }) => {
    return myFunction(input);
  },
);
```

### Scorers

A scorer receives `{ output, expected, input, metadata }` and returns a name and score:

```typescript theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
const exactMatch = ({ output, expected }: { output: unknown; expected: unknown }) => ({
  name: "exact_match",
  score: output === expected ? 1 : 0,
});
```

Scorers run after each test, including failed tests. Errors thrown inside a scorer are caught and logged.

Scorers from the `autoevals` library can be passed in the same way:

```typescript theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { Factuality, Levenshtein } from "autoevals";

test(
  "quality",
  { input: { prompt: "What is 1 + 1?" }, expected: "2", scorers: [Factuality, Levenshtein] },
  async ({ input }) => myModel(input.prompt),
);
```
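
Exact matching is often too strict for model output that differs only in case or whitespace. As a sketch, a scorer with the same `{ output, expected }` to `{ name, score }` shape can normalize both sides before comparing. The `normalize` helper and the `normalized_match` name are illustrative choices, not part of Braintrust.

```typescript
// Trim, lowercase, and collapse runs of whitespace into single spaces.
function normalize(value: unknown): string {
  return String(value).trim().toLowerCase().replace(/\s+/g, " ");
}

// Scores 1 when output and expected are equal after normalization, else 0.
const normalizedMatch = ({
  output,
  expected,
}: {
  output: unknown;
  expected: unknown;
}) => ({
  name: "normalized_match",
  score: normalize(output) === normalize(expected) ? 1 : 0,
});
```

It can be listed in `scorers` like any other scorer, for example `scorers: [normalizedMatch]`.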

### Logging helpers

Use `logOutputs` and `logFeedback` inside a `wrapVitest` test to log additional data to the current span:

```typescript theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
logOutputs({ summary, tokens_used: 412 });
```

### Inline data and dataset support

Define data inline:

```typescript theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
test(
  "sentiment",
  {
    data: [
      { input: "great product!", expected: "positive" },
      { input: "terrible experience", expected: "negative" },
    ],
    scorers: [
      ({ output, expected }) => ({
        name: "accuracy",
        score: output === expected ? 1 : 0,
      }),
    ],
  },
  async ({ input }) => classifySentiment(input),
);
```

Or load from a managed Braintrust dataset:

```typescript theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { Factuality } from "autoevals";
import { initDataset } from "braintrust";

const data = await initDataset({
  project: "my-project",
  dataset: "my-dataset",
}).fetchedData();

test("eval", { data, scorers: [Factuality] }, async ({ input }) => {
  return myModel(input.prompt);
});
```

Both approaches expand into separate test cases and Braintrust spans automatically.
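
Conceptually, each record in `data` becomes its own test case with its own `input` and `expected`. The sketch below illustrates that expansion with a hypothetical `expandCases` helper. It is not Braintrust's implementation, and the `[index]` suffix in the case name is an assumption made for the example.

```typescript
type DataRecord<I, E> = { input: I; expected?: E };
type ExpandedCase<I, E> = DataRecord<I, E> & { name: string };

// Produce one named case per data record.
function expandCases<I, E>(
  name: string,
  data: DataRecord<I, E>[],
): ExpandedCase<I, E>[] {
  return data.map((record, index) => ({
    ...record,
    name: `${name} [${index}]`,
  }));
}

const cases = expandCases("sentiment", [
  { input: "great product!", expected: "positive" },
  { input: "terrible experience", expected: "negative" },
]);
```

Here `cases` holds two entries, one per record, which mirrors how a single `test` call with `data` yields multiple cases and spans.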

## Resources

* [Braintrust autoevals library](/evaluate/autoevals)
* [Vitest documentation](https://vitest.dev/)
* [`vitest-evals` package](https://www.npmjs.com/package/vitest-evals)
