# Node.js test runner

The [Node.js test runner](https://nodejs.org/api/test.html) is the test framework built into Node.js.
Braintrust integrates with `node:test`, so you can run evals with the built-in runner.

## Setup

Install Braintrust in your Node.js project:

<CodeGroup>
  ```bash npm theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
  npm install braintrust
  ```

  ```bash pnpm theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
  pnpm add braintrust
  ```
</CodeGroup>

Set your API key as an environment variable:

```bash theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
export BRAINTRUST_API_KEY=<your-api-key>
```

## Run your first eval

Create a suite with `initNodeTestSuite()`, then pass the callback returned by `suite.eval()` directly to `test()`.

```typescript title="translation.eval.test.ts" theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import assert from "node:assert/strict";
import { after, describe, test } from "node:test";
import { initNodeTestSuite } from "braintrust";

async function translate(text: string) {
  if (text === "hello") {
    return "hola";
  }

  return text;
}

describe("Translation evals", () => {
  const suite = initNodeTestSuite({
    projectName: "support-bot",
    after,
  });

  test(
    "translates hello",
    suite.eval(
      {
        input: { text: "hello" },
        expected: "hola",
        tags: ["smoke", "translation"],
      },
      async ({ input }) => {
        if (typeof input.text !== "string") {
          throw new Error("Expected input.text to be a string");
        }

        const result = await translate(input.text);
        assert.equal(result, "hola");
        return result;
      },
    ),
  );
});
```

Run the test:

```bash theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
node --test translation.eval.test.ts
```

Braintrust creates an experiment for the suite, records each tracked test as a span, and prints a summary when the suite flushes.

## Separate evals from unit tests

Keep eval files separate from regular unit tests with a naming convention such as `*.eval.test.ts` or a dedicated `evals/` directory.

```bash theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
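# Quote the globs so that Node.js, not the shell, expands `**`
# (glob support in `node --test` requires Node.js 21 or later).

# Unit tests
node --test "tests/unit/**/*.test.ts"

# Evals
node --test "tests/evals/**/*.eval.test.ts"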
```

This keeps slower, model-backed evals out of your unit test runs. Untracked tests continue to use the native runner with no Braintrust involvement.

## How it works

* `initNodeTestSuite()` creates one Braintrust experiment for the suite.
* `suite.eval()` returns a normal `node:test` callback, so you can mix tracked evals and regular unit tests in the same file.
* The callback return value becomes the logged `output` and is passed to scorers.
* Passing `after` from `node:test` registers an automatic flush hook at the end of the suite.

When you do not use `suite.eval()`, tests run normally and are not logged to Braintrust.
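
As a mental model for the points above, here is a toy sketch of a wrapper that returns a test callback, treats the task's return value as the output, and hands that output to scorers. This is not Braintrust's implementation: `createToySuite`, `rows`, and every type below are hypothetical names invented for illustration, and the real SDK also handles experiments, spans, and flushing.

```typescript
// Toy model only: NOT Braintrust's implementation. All names are hypothetical.
type Score = { name: string; score: number };
type Scorer = (args: {
  output: unknown;
  expected?: unknown;
  input: unknown;
}) => Score;
type Case<I> = { input: I; expected?: unknown; scorers?: Scorer[] };
type Row = {
  input: unknown;
  output: unknown;
  expected?: unknown;
  scores: Score[];
};

function createToySuite() {
  // Stands in for whatever the real suite would log.
  const rows: Row[] = [];

  return {
    rows,
    // Returns a zero-argument async function, a shape that `test()` accepts
    // as a test callback.
    eval<I>(testCase: Case<I>, task: (args: { input: I }) => Promise<unknown>) {
      return async (): Promise<void> => {
        // The task's return value is treated as the output. If the task
        // throws (for example, a failed assertion), the test fails as usual.
        const output = await task({ input: testCase.input });

        // Each scorer sees the output alongside the case's expected value.
        const scores = (testCase.scorers ?? []).map((scorer) =>
          scorer({ output, expected: testCase.expected, input: testCase.input }),
        );

        rows.push({
          input: testCase.input,
          output,
          expected: testCase.expected,
          scores,
        });
      };
    },
  };
}
```

In this sketch, a test that never calls `eval()` never touches `rows`, which mirrors how untracked tests stay out of Braintrust.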

## Add scorers

Pass scorers in the `scorers` array of the eval case. Each scorer receives `{ output, expected, input, metadata }` and returns a score object with a `name` and a numeric `score`.

```typescript theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
test(
  "translation quality",
  suite.eval(
    {
      input: { text: "good morning" },
      expected: "buenos dias",
      scorers: [
        ({ output, expected }) => ({
          name: "exact_match",
          score: output === expected ? 1 : 0,
        }),
      ],
    },
    async ({ input }) => {
      if (typeof input.text !== "string") {
        throw new Error("Expected input.text to be a string");
      }

      return await translate(input.text);
    },
  ),
);
```
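
Exact string comparison can be stricter than you want for translations, where case or accents may differ (`"Buenos días"` versus `"buenos dias"`). Below is a minimal sketch of a more forgiving scorer written as a standalone function. The `ScorerArgs` and `Score` types and the `normalizedMatch` name are assumptions made for this example, not SDK exports; the function only mirrors the argument and return shapes described above.

```typescript
// Hypothetical local types for illustration; the SDK's own types may differ.
type ScorerArgs = {
  output: unknown;
  expected?: unknown;
  input?: unknown;
  metadata?: Record<string, unknown>;
};
type Score = { name: string; score: number };

// Lowercase, strip combining diacritics, and collapse whitespace.
function normalize(value: unknown): string {
  return String(value ?? "")
    .normalize("NFD") // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toLowerCase()
    .replace(/\s+/g, " ")
    .trim();
}

// Scores 1 when output and expected match after normalization, otherwise 0.
function normalizedMatch({ output, expected }: ScorerArgs): Score {
  return {
    name: "normalized_match",
    score: normalize(output) === normalize(expected) ? 1 : 0,
  };
}
```

Assuming the suite accepts any function with this shape, `normalizedMatch` could replace the inline `exact_match` scorer in the `scorers` array.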

You can also use scorers from [`autoevals`](/evaluate/autoevals):

```typescript theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { Levenshtein } from "autoevals";

test(
  "translation similarity",
  suite.eval(
    {
      input: { text: "goodbye" },
      expected: "adios",
      scorers: [Levenshtein],
    },
    async ({ input }) => {
      if (typeof input.text !== "string") {
        throw new Error("Expected input.text to be a string");
      }

      return await translate(input.text);
    },
  ),
);
```

## Resources

* [Run evaluations](/evaluate/run-evaluations)
* [TypeScript SDK reference](/sdks/typescript/api-reference)
* [Node.js test runner documentation](https://nodejs.org/api/test.html)
