# Pytest

[Pytest](https://pytest.org/) is a Python testing framework. Braintrust integrates with pytest so you can run marked tests as experiments and inspect each test case as a traced span.

## Setup

Install Braintrust alongside pytest:

<CodeGroup>
  ```bash Python theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
  pip install braintrust pytest
  ```
</CodeGroup>

Set your API key as an environment variable:

```bash theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
export BRAINTRUST_API_KEY=<your-api-key>
```

## Run your first eval

Mark tests with `@pytest.mark.braintrust`, accept the `braintrust_span` fixture, and run pytest with `--braintrust`.

```python title="test_my_llm.py" theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import pytest

from my_app import ask_model  # hypothetical module; replace with your own model call


@pytest.mark.braintrust(
    project="support-bot",
    input={"query": "What is Braintrust?"},
    expected={"contains": "evaluation"},
    metadata={"suite": "smoke"},
    tags=["regression"],
)
def test_support_answer(braintrust_span):
    output = ask_model("What is Braintrust?")
    braintrust_span.log(output=output)

    assert "evaluation" in output.lower()
```

Run the test:

```bash theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
pytest --braintrust --braintrust-project="support-bot"
```

Braintrust creates experiments from the marked tests, logs pass or fail as a score, and prints an experiment summary at the end of the run.

## How it works

* `@pytest.mark.braintrust` opts a test into Braintrust tracking.
* `braintrust_span` gives you a standard Braintrust span for logging input, output, scores, metadata, and errors.
* `--braintrust` enables experiment tracking for the session.
* `--braintrust-project` and `project=...` on the marker control how tests are grouped into projects and experiments.

When `--braintrust` is not provided, `braintrust_span` becomes a no-op span, so the same tests still run as normal unit tests.
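
The no-op behavior follows the familiar null-object pattern. The sketch below is not Braintrust's implementation; `NoopSpan`, `RecordingSpan`, and `check_answer` are hypothetical names used only to illustrate why the same test body can run with or without tracking.

```python
class NoopSpan:
    """Hypothetical stand-in: accepts the same log() call and discards it."""

    def log(self, **fields):
        return None


class RecordingSpan:
    """Hypothetical stand-in for a real span: keeps whatever was logged."""

    def __init__(self):
        self.fields = {}

    def log(self, **fields):
        self.fields.update(fields)


def check_answer(span, output):
    # The test body is identical regardless of which span it receives.
    span.log(output=output)
    assert "evaluation" in output.lower()


answer = "Braintrust is an evaluation platform."
noop, recording = NoopSpan(), RecordingSpan()
check_answer(noop, answer)       # nothing is recorded
check_answer(recording, answer)  # the output is captured
```

Because both objects expose the same `log()` method, the assertion runs either way and only the side effect differs.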

## Parametrized tests

Pytest parameters are logged automatically as `input` unless you set `input` in the marker.

```python title="test_math.py" theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import pytest

from my_app import ask_model  # hypothetical module; replace with your own model call


@pytest.mark.braintrust
@pytest.mark.parametrize(
    "query,expected_answer",
    [
        ("2 + 2", "4"),
        ("Capital of France", "Paris"),
    ],
)
def test_qa(braintrust_span, query, expected_answer):
    output = ask_model(query)
    braintrust_span.log(output=output)

    assert expected_answer.lower() in output.lower()
```

Each parametrized case becomes its own span in Braintrust.
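
As the case list grows, it can help to keep the cases as data and derive the `parametrize` arguments from them. A minimal sketch, assuming a hypothetical `QA_CASES` list and `to_params` helper (neither is part of Braintrust or pytest):

```python
# Hypothetical dataset: one dict per test case.
QA_CASES = [
    {"query": "2 + 2", "expected_answer": "4"},
    {"query": "Capital of France", "expected_answer": "Paris"},
]

ARGNAMES = "query,expected_answer"


def to_params(cases, argnames=ARGNAMES):
    """Turn a list of dicts into the list of tuples that parametrize accepts."""
    names = [name.strip() for name in argnames.split(",")]
    return [tuple(case[name] for name in names) for case in cases]


PARAMS = to_params(QA_CASES)
print(PARAMS)
```

`PARAMS` can then be passed as `@pytest.mark.parametrize(ARGNAMES, PARAMS)` beneath `@pytest.mark.braintrust`.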

## CLI options

| Option                    | Description                                        |
| ------------------------- | -------------------------------------------------- |
| `--braintrust`            | Enable Braintrust experiment tracking              |
| `--braintrust-project`    | Override the project name for all tracked tests    |
| `--braintrust-experiment` | Override the experiment name                       |
| `--braintrust-api-key`    | Provide the Braintrust API key on the command line |
| `--braintrust-no-summary` | Suppress the terminal experiment summary           |
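
If you launch pytest from a script, the options above can be assembled programmatically. The sketch below is one way to do that: `build_pytest_args` is a hypothetical helper, `nightly` is a made-up experiment name, and the `--braintrust-experiment=<name>` form is assumed to mirror the `--braintrust-project=<name>` form shown earlier.

```python
import shlex


def build_pytest_args(project=None, experiment=None, summary=True):
    """Assemble pytest arguments from the Braintrust CLI options."""
    args = ["--braintrust"]
    if project:
        args.append(f"--braintrust-project={project}")
    if experiment:
        # Assumption: takes its value the same way as --braintrust-project.
        args.append(f"--braintrust-experiment={experiment}")
    if not summary:
        args.append("--braintrust-no-summary")
    return args


args = build_pytest_args(project="support-bot", experiment="nightly", summary=False)

# Print the equivalent shell command for copy-pasting or CI logs.
print(shlex.join(["pytest", *args]))
```

The same list can be passed to `pytest.main(args)` to start the session from Python. The API key is left out on purpose, since the `BRAINTRUST_API_KEY` environment variable keeps it out of shell history.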

## What to log

The `braintrust_span` fixture supports normal span logging methods. Typical fields to capture are:

* `input` for the prompt or test payload
* `output` for the model response
* `expected` for the target behavior
* `scores` for custom metrics beyond pass or fail
* `metadata` for model name, environment, or fixture details

```python theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
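import pytest

from my_app import ask_model  # hypothetical module; replace with your own model call


@pytest.mark.braintrust
def test_with_scores(braintrust_span):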
    output = ask_model("Summarize this ticket")
    braintrust_span.log(
        output=output,
        scores={"quality": 0.9},
        metadata={"model": "gpt-5-mini"},
    )
```
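
A hard-coded score such as `0.9` is only a placeholder. In practice the value would come from a scoring function. The sketch below shows one simple possibility, assuming scores are expected to fall between 0 and 1; `keyword_coverage` is a hypothetical helper, not part of the Braintrust SDK.

```python
def keyword_coverage(output, keywords):
    """Return the fraction of keywords found in output, between 0.0 and 1.0."""
    if not keywords:
        return 0.0
    text = output.lower()
    hits = sum(1 for keyword in keywords if keyword.lower() in text)
    return hits / len(keywords)


summary = "Customer reports a login failure after resetting their password."

# Two of the three keywords appear in the summary.
score = keyword_coverage(summary, ["login", "password", "refund"])
print(round(score, 2))
```

Inside a marked test, the result could be logged with `braintrust_span.log(output=output, scores={"keyword_coverage": score})`.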

## Resources

* [Run evaluations](/evaluate/run-evaluations)
* [Python SDK reference](/sdks/python/versions/latest)