# Set up human review

> Capture structured human judgment on production traces to build ground truth, validate automated scores, and surface edge cases your scorers miss.

export const feature_0 = "Unlimited human review scorers";

export const verb_0 = "are";

Human review is a critical part of evaluating AI applications. While Braintrust helps you automatically evaluate AI software with scorers, human feedback provides essential ground truth and quality assessment.

Braintrust integrates human feedback from end users, subject matter experts, and product teams in one place. Use human review to:

* Evaluate and compare experiments.
* Assess the efficacy of automated scoring methods.
* Curate production logs into evaluation datasets.
* Label categorical data and provide corrections.
* Track quality trends over time.

A typical workflow has three stages, each covered on its own page:

1. **Configure review scores** (this page) so reviewers have something to capture.
2. [Score traces and datasets](/annotate/human-review/score-traces) to record judgments row by row.
3. [Manage review work](/annotate/human-review/manage-review-work) to assign, filter, and track review across your team.

When the same span is scored by more than one person, see [Review with multiple reviewers](/annotate/human-review/multiple-reviewers) for how Braintrust combines their scores.

## Configure review scores

Review scores appear in all logs and experiments in a project. Use them for quality control, data labeling, or [feedback collection](/annotate/human-review/score-traces).

<Note>
  {feature_0} {verb_0} only available on [Pro and Enterprise plans](/plans-and-limits#plans).
</Note>

1. Go to **<Icon icon="settings-2" /> Settings** > [**<Icon icon="list-checks" /> Human review**](https://www.braintrust.dev/app/~/configuration/review).
2. Click **+ Human review score**.
3. Enter a name and description for your score. Descriptions support Markdown.
4. Select a score type:
   * **Categorical score**: Predefined options with assigned scores. Each option gets a unique percentage value between 0% and 100% (stored as 0 to 1). Use for classification tasks like sentiment or correctness categories. A categorical score can also write to the `expected` field instead of creating a score.
   * **Continuous score**: Numeric values between 0% and 100% with a slider input control. Use for subjective quality assessments like helpfulness or tone.
   * **Free-form input**: String values written to the `metadata` field at a specified path. Use for explanations, corrections, or structured feedback.
5. (Optional) Expand **Score visibility** to configure who sees this score during review:
   * Select members or permission groups to limit visibility to specific reviewers. If you don't select anyone, the score is visible to everyone.
   * Click **+ Condition** to show the score only when a filter condition is true, such as when another score exceeds a threshold. See [Show scores conditionally](/annotate/human-review#show-scores-conditionally) for details.
6. Click **Save**.

<Note>
  Score visibility controls which reviewers see a score in the review modal. It declutters the review experience for large teams. It is not an access control or security boundary: any reviewer with hidden scores can reveal them with the **Show all scores** toggle.
</Note>

<Tip>
  You can also create human review scores as you review traces. In the trace view, click **+ Human review score** and define the score as described above.
</Tip>
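
To make the categorical value mapping concrete, the following is a minimal sketch of how option percentages relate to stored values. It is not Braintrust code: `CategoricalOption` and `to_stored_values` are hypothetical names used only for illustration, and the only rules it encodes are the ones stated above (each option has a unique value between 0% and 100%, stored as 0 to 1).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoricalOption:
    """One option of a hypothetical categorical review score."""

    label: str
    percent: float  # value as shown in the UI, 0 to 100


def to_stored_values(options: list[CategoricalOption]) -> dict[str, float]:
    """Map each option label to its stored value in the 0 to 1 range."""
    seen_percents: set[float] = set()
    stored: dict[str, float] = {}
    for option in options:
        if not 0 <= option.percent <= 100:
            raise ValueError(f"{option.label}: percent must be between 0 and 100")
        if option.percent in seen_percents:
            raise ValueError(f"{option.label}: each option needs a unique value")
        seen_percents.add(option.percent)
        stored[option.label] = option.percent / 100
    return stored


# Illustrative sentiment score with three options.
sentiment = [
    CategoricalOption("negative", 0),
    CategoricalOption("neutral", 50),
    CategoricalOption("positive", 100),
]
print(to_stored_values(sentiment))
# → {'negative': 0.0, 'neutral': 0.5, 'positive': 1.0}
```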

## Restrict score visibility

By default, every reviewer sees every configured score. Restrict a score to specific members or permission groups so only relevant reviewers see it in the review modal, which keeps the review experience focused for large teams.

To set visibility on a new score, expand **Score visibility** while configuring it (see the steps above) and select the members or permission groups that should see it.

To change visibility on an existing score:

1. Go to **<Icon icon="settings-2" /> Settings** > [**<Icon icon="list-checks" /> Human review**](https://www.braintrust.dev/app/~/configuration/review), or open the review panel while reviewing.
2. Select the <Icon icon="pencil" /> edit icon next to the score name.
3. Expand **Score visibility** and select the members or permission groups that should see the score. To make the score visible to everyone again, deselect all members and groups.
4. Click **Save**.

If a row has configured scores but none are visible to the current reviewer, the review panel shows **No scores are available to you for this row**.

Score visibility is a display filter, not an access control rule. Any reviewer who has hidden scores can reveal them with the **Show all scores** toggle in the review panel. To enforce who can read scores, use [project permissions](/admin/access-control) instead.
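
The display rule described in this section can be sketched as a small filter. This is an illustrative model only, not how Braintrust implements it: `ScoreConfig`, `visible_scores`, and the identifier strings are hypothetical. It assumes a reviewer is represented by a set containing their own ID plus the IDs of their permission groups.

```python
from dataclasses import dataclass, field

NO_SCORES_MESSAGE = "No scores are available to you for this row"


@dataclass
class ScoreConfig:
    """Hypothetical model of a configured review score."""

    name: str
    # Members or permission groups allowed to see the score.
    # An empty set means the score is visible to everyone.
    visible_to: set[str] = field(default_factory=set)


def visible_scores(
    scores: list[ScoreConfig],
    reviewer_ids: set[str],
    show_all: bool = False,
) -> list[str]:
    """Return the names of the scores a reviewer sees in the review panel."""
    if show_all:
        # The "Show all scores" toggle reveals hidden scores, which is why
        # visibility is a display filter and not an access control rule.
        return [score.name for score in scores]
    return [
        score.name
        for score in scores
        if not score.visible_to or score.visible_to & reviewer_ids
    ]


scores = [
    ScoreConfig("Helpfulness"),
    ScoreConfig("Medical accuracy", visible_to={"group:clinicians"}),
]
engineer = {"user:sam", "group:engineering"}

print(visible_scores(scores, engineer))                 # → ['Helpfulness']
print(visible_scores(scores, engineer, show_all=True))  # → ['Helpfulness', 'Medical accuracy']

# When every configured score is hidden, the panel shows the message instead.
restricted = [ScoreConfig("Medical accuracy", visible_to={"group:clinicians"})]
print(visible_scores(restricted, engineer) or NO_SCORES_MESSAGE)
```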

## Show scores conditionally

You can configure filter conditions that control when a score appears in the review panel. A score with conditions appears only when all of its conditions evaluate to true for the span being reviewed.

This is useful for dependent workflows. For example, show a detailed quality rubric only when a triage score indicates the trace needs closer review, or surface a correction score only when the expected output matches a specific category.

To add conditions to a new score, expand **Score visibility** while configuring it and click **+ Condition**. To add or edit conditions on an existing score:

1. Go to **<Icon icon="settings-2" /> Settings** > [**<Icon icon="list-checks" /> Human review**](https://www.braintrust.dev/app/~/configuration/review), or open the review panel while reviewing.
2. Select the <Icon icon="pencil" /> edit icon next to the score name.
3. Expand **Score visibility** and click **+ Condition**.
4. Add conditions using SQL syntax. Conditions are organized into three scopes:

   * **Span**: Evaluates against the current span. Can reference other scores (`scores.ScoreName`), expected values (`expected.field`), or metadata (`metadata.path`).
   * **Trace**: Evaluates against all spans in the trace and is true when at least one span matches. Can reference `span_attributes`, `metrics`, `scores`, `error`, and `tags`.
   * **Subspan**: Evaluates against all child spans of the current span and is true when at least one child span matches. Uses the same fields as **Trace** conditions.

   Within each scope, multiple conditions are joined with AND. Conditions across scopes are also joined with AND: all configured scopes must pass for the score to appear.
5. Click **Save**.

Score names in the settings table display an indicator icon when conditions are configured. Hover over it to see the full "Show when" expression.

User or group visibility and conditional visibility are evaluated together: both must pass for a score to appear. Conditional visibility is a display rule based on the score data, not an access control boundary.
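
The way scopes and visibility combine can be sketched as follows. This is a simplified model under stated assumptions, not Braintrust's evaluator: conditions are modeled as Python predicates rather than SQL expressions, all names (`Conditions`, `score_is_shown`, the `Triage` score) are hypothetical, a scope with no conditions is assumed to pass, and a **Trace** or **Subspan** scope is assumed to pass when a single span satisfies every condition in that scope.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Span = dict[str, Any]
Condition = Callable[[Span], bool]


@dataclass
class Conditions:
    """Hypothetical container for the three condition scopes."""

    span: list[Condition] = field(default_factory=list)
    trace: list[Condition] = field(default_factory=list)
    subspan: list[Condition] = field(default_factory=list)


def matches_all(span: Span, conditions: list[Condition]) -> bool:
    """Conditions within one scope are joined with AND."""
    return all(condition(span) for condition in conditions)


def any_span_matches(spans: list[Span], conditions: list[Condition]) -> bool:
    """Trace and Subspan scopes pass when at least one span matches.

    Assumption: an unconfigured scope (no conditions) always passes.
    """
    return not conditions or any(matches_all(s, conditions) for s in spans)


def score_is_shown(
    current: Span,
    trace_spans: list[Span],
    child_spans: list[Span],
    conditions: Conditions,
    reviewer_allowed: bool,
) -> bool:
    """Member or group visibility and every configured scope must all pass."""
    return (
        reviewer_allowed
        and matches_all(current, conditions.span)
        and any_span_matches(trace_spans, conditions.trace)
        and any_span_matches(child_spans, conditions.subspan)
    )


# Show a detailed rubric only when the triage score is high on the current
# span and some span in the trace recorded an error.
rubric_conditions = Conditions(
    span=[lambda s: s.get("scores", {}).get("Triage", 0) > 0.5],
    trace=[lambda s: s.get("error") is not None],
)
current_span = {"scores": {"Triage": 0.8}}
failing_trace = [current_span, {"error": "timeout"}]
clean_trace = [current_span, {"error": None}]

print(score_is_shown(current_span, failing_trace, [], rubric_conditions, True))   # → True
print(score_is_shown(current_span, clean_trace, [], rubric_conditions, True))     # → False
print(score_is_shown(current_span, failing_trace, [], rubric_conditions, False))  # → False
```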

## Create and edit scores inline

While reviewing, you can create new scores or edit existing score configurations without navigating to settings:

* To create a new score, click **+ Human review score**.
* To edit an existing score, select the <Icon icon="pencil" /> edit icon next to the score name.

Changes apply immediately across your project.

<Note>
  Editing a score configuration affects how that score works going forward. Existing score values on traces remain unchanged.
</Note>

## Annotate in playgrounds

For a lighter-weight alternative to the full review workflow, you can [annotate outputs directly in playgrounds](/evaluate/playgrounds#annotate-outputs) and then get prompt improvement suggestions based on your annotations.

Playground annotations help with rapid iteration during prompt development, while the [**<Icon icon="list-checks" /> Review**](https://www.braintrust.dev/app/~/review) page is better for systematic evaluation of production logs and experiments.

## Capture production feedback

In addition to internal reviews, capture feedback directly from production users. Production feedback helps you understand real-world performance and build datasets from actual user interactions.

See [Capture user feedback](/instrument/user-feedback) for implementation details and [Build datasets from user feedback](/annotate/datasets/create#curate-from-user-feedback) to learn how to turn feedback into evaluation datasets. You can also use [dashboards](/observe/dashboards) to monitor user satisfaction trends and correlate automated scores with user feedback.
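
As a rough illustration of what capturing a thumbs-up or thumbs-down might look like, the sketch below wraps a logger's `log_feedback` call. It assumes the Braintrust Python SDK exposes `log_feedback` on the logger returned by `braintrust.init_logger`, accepting `id`, `scores`, `comment`, and `metadata`. Treat the linked page as the authoritative reference for the actual API. The helper `record_thumbs`, the `user_rating` score name, and the `RecordingLogger` stand-in (used so the example runs without network access) are hypothetical.

```python
from typing import Any, Optional


class RecordingLogger:
    """Local stand-in that records calls instead of sending them anywhere."""

    def __init__(self) -> None:
        self.calls: list[dict[str, Any]] = []

    def log_feedback(self, **kwargs: Any) -> None:
        self.calls.append(kwargs)


def record_thumbs(
    logger: Any,
    span_id: str,
    thumbs_up: bool,
    comment: Optional[str] = None,
    user_id: Optional[str] = None,
) -> None:
    """Attach end-user feedback to a previously logged span."""
    logger.log_feedback(
        id=span_id,  # ID of the span the user is reacting to
        scores={"user_rating": 1 if thumbs_up else 0},  # hypothetical score name
        comment=comment,
        metadata={"user_id": user_id} if user_id else None,
    )


# Dry run with the stand-in. In an application you would pass a real logger,
# for example (assumed usage):
#   import braintrust
#   logger = braintrust.init_logger(project="My project")
logger = RecordingLogger()
record_thumbs(logger, "span-123", thumbs_up=False, comment="Answer was off topic")
print(logger.calls[0]["scores"])  # → {'user_rating': 0}
```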

## Next steps

* [Score traces and datasets](/annotate/human-review/score-traces) to start recording human judgments
* [Manage review work](/annotate/human-review/manage-review-work) across your team
* [Add labels and corrections](/annotate/labels) to categorize and tag traces
* [Build datasets](/annotate/datasets/create#promote-traces-from-logs) from reviewed logs
* [Run evaluations](/evaluate/run-evaluations) with human-reviewed datasets
