Start reviewing
- Go to Review and select the type of data to review:
  - Log spans: production traces and debugging sessions.
  - Experiment spans: evaluation results and test runs.
  - Dataset rows: test cases and examples.
- Select a row and set scores. You can also add comments and tags while reviewing.
- Click Mark complete to record your review.
Not all score types appear on dataset rows. Only categorical scores configured to “write to expected” and free-form scores are available for dataset reviews, since datasets store test data (input/expected pairs) rather than subjective quality assessments.
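The availability rule above can be read as a simple filter. The following is a minimal sketch of that logic only. The `ScoreConfig` class, its field names, and the `"numeric"` type are hypothetical and chosen for illustration. They are not the product's actual schema or API.

```python
from dataclasses import dataclass


# Hypothetical model of a score configuration. Field names are
# illustrative assumptions, not the product's real schema.
@dataclass
class ScoreConfig:
    name: str
    score_type: str  # e.g. "categorical", "free-form", or (assumed) "numeric"
    write_to_expected: bool = False


def available_for_dataset_review(score: ScoreConfig) -> bool:
    """Return True if the score would appear when reviewing dataset rows."""
    # Free-form scores are always available for dataset reviews.
    if score.score_type == "free-form":
        return True
    # Categorical scores qualify only when configured to write to expected.
    return score.score_type == "categorical" and score.write_to_expected


scores = [
    ScoreConfig("expected_label", "categorical", write_to_expected=True),
    ScoreConfig("tone", "categorical"),
    ScoreConfig("reviewer_notes", "free-form"),
    ScoreConfig("quality", "numeric"),
]

dataset_scores = [s.name for s in scores if available_for_dataset_review(s)]
print(dataset_scores)  # ['expected_label', 'reviewer_notes']
```

In this sketch, `tone` and `quality` are filtered out because they record subjective quality assessments rather than values stored with the dataset row.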
Change the trace layout
While reviewing log and experiment traces, you see detailed information about the flagged span by default. To switch between hierarchy, timeline, thread, and other layouts, see Examine traces. When the raw trace is hard to read, build a custom view that renders each span as a purpose-built annotation interface. This is especially useful for large-scale review and for subject-matter experts who shouldn’t have to parse JSON to score accurately.
Next steps
- Manage review work to assign and track review across your team
- Review with multiple reviewers on the same span
- Add labels and corrections to categorize and tag traces