Tracing
Tracing records what your application does as spans you can inspect in Braintrust. The recommended way to capture AI calls is auto-instrumentation: call init_logger(), then auto_instrument(), and supported libraries are traced with no further code changes (see Install and instrument). The APIs below set up logging, trace your own code, flush pending data, and link to your traces.
init_logger()
Creates a project logger for production traces and makes it the current logger by default. Call it once on startup.
Returns: Logger.
Arguments (all optional):
project (str): project name for logs. If omitted, logs go to the global project.
project_id (str): project ID. Takes precedence over project.
api_key (str): API key. Defaults to BRAINTRUST_API_KEY.
app_url (str): Braintrust app URL. Defaults to https://www.braintrust.dev.
org_name (str): organization name, useful when credentials can access multiple orgs.
async_flush (bool): defaults to true. Set false when you need synchronous flush behavior.
set_current (bool): defaults to true. Controls whether current_logger() returns this logger.
auto_instrument()
Patches supported AI and ML libraries so their calls are traced to Braintrust automatically. This is the recommended way to capture AI calls.
Call auto_instrument() after init_logger() and before creating provider or framework clients. If your app imports provider classes directly, such as from openai import OpenAI, call auto_instrument() before those imports when possible so the SDK can patch the imported symbols.
Returns: dict[str, bool], mapping each integration name to whether it was successfully instrumented. Missing optional dependencies are skipped.
Arguments (all optional): each supported integration has a boolean flag that defaults to true. Set a flag to false to skip that integration. For the full list of integration flags, see Disabling specific integrations.
For example, disable OpenAI instrumentation while keeping the other integrations enabled:
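A minimal sketch of this setup, assuming the OpenAI flag is named `openai` (this section does not spell out flag names; check Disabling specific integrations) and using a placeholder project name. The `not_instrumented` helper is hypothetical and only summarizes the returned mapping.

```python
def not_instrumented(results: dict[str, bool]) -> list[str]:
    """Return the integration names whose value in the mapping is False."""
    return sorted(name for name, ok in results.items() if not ok)


def setup_tracing() -> list[str]:
    # Imported here so the helper above can be used without the SDK installed.
    import braintrust

    braintrust.init_logger(project="my-project")  # placeholder project name
    # Assumption: the OpenAI integration flag is called `openai`.
    results = braintrust.auto_instrument(openai=False)
    return not_instrumented(results)
```

Call `setup_tracing()` once at startup, before creating provider clients, and log the returned names if you want to know which integrations were not patched.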
traced()
Decorates a function so each call creates a span, logs thrown errors, and ends the span automatically.
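As a rough illustration, the sketch below wraps a small function with `@traced`. The project name and the `word_count` helper are placeholders, and the decorated function is defined inside `main()` only so that the plain helper stays importable without the SDK.

```python
def word_count(text: str) -> int:
    """Plain helper: count whitespace-separated words."""
    return len(text.split())


def main() -> None:
    from braintrust import flush, init_logger, traced

    init_logger(project="my-project")  # placeholder project name

    @traced  # each call to count_words creates a span and ends it on return
    def count_words(text: str) -> int:
        return word_count(text)

    count_words("trace every call to this function")
    flush()  # send pending rows before the process exits
```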
start_span()
Starts a span manually when you need more control than @traced gives you, such as when the work doesn’t fit inside a single function.
Returns: Span.
Arguments (all optional):
name (str): span name shown in Braintrust.
type (SpanTypeAttribute): span type, such as a task or LLM span.
span_attributes (Mapping[str, Any]): additional span attributes.
set_current (bool): whether to make this span the current span.
parent (str): explicit parent span or object ID.
event (Any): initial event data to log on the span.
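A sketch of manual spans for work that spans several steps, assuming spans are used as context managers and that child spans are created with the parent's own `start_span()` method. The span names, project name, and `chunk` helper are illustrative only.

```python
def chunk(items: list[str], size: int) -> list[list[str]]:
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i : i + size] for i in range(0, len(items), size)]


def main(documents: list[str]) -> None:
    from braintrust import flush, init_logger, start_span

    init_logger(project="my-project")  # placeholder project name

    # One parent span covers the whole job; each batch gets a child span.
    with start_span(name="process-documents") as job:
        for index, batch in enumerate(chunk(documents, 2)):
            with job.start_span(name=f"batch-{index}") as span:
                span.log(input=batch, output=[len(doc) for doc in batch])

    flush()  # send pending rows before the process exits
```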
current_logger() and current_span()
Return the currently active logger or span, so you can add data without holding a direct reference.
current_span() returns a no-op span object when no span is active, so it is safe to call from helper functions.
flush()
Flushes pending rows to Braintrust.
Call logger.flush(), span.flush(), or braintrust.flush() before the process exits.
permalink()
Builds a Braintrust app URL for an exported span slug, so you can link straight to a trace from your own logs or app.
Returns: str.
set_masking_function()
Installs a global masking function that runs over logged data before it leaves your process, so you can redact sensitive values before they reach Braintrust.
Pass None to disable masking.
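One possible masking function is sketched below. It assumes the function receives a logged value and returns the masked value; the key names in `SENSITIVE_KEYS` and the `[REDACTED]` marker are hypothetical choices, not SDK defaults.

```python
from typing import Any

SENSITIVE_KEYS = {"password", "api_key", "ssn"}  # hypothetical key names


def mask_sensitive(data: Any) -> Any:
    """Recursively replace values stored under sensitive keys."""
    if isinstance(data, dict):
        return {
            key: "[REDACTED]"
            if str(key).lower() in SENSITIVE_KEYS
            else mask_sensitive(value)
            for key, value in data.items()
        }
    if isinstance(data, list):
        return [mask_sensitive(item) for item in data]
    return data  # scalars pass through unchanged


def enable_masking() -> None:
    import braintrust

    braintrust.set_masking_function(mask_sensitive)
```

Because `mask_sensitive` is a pure function, you can unit-test the redaction rules without sending anything to Braintrust.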
Evaluations
An evaluation runs your task over a set of cases, scores each output, and logs the results to an experiment, which is how you measure quality and catch regressions as you change prompts or models. Eval() is the main entry point. The other APIs here run async evaluations and customize reporting.
Eval()
Runs an evaluation from your data, a task, and scorers: it runs the task over every case, scores the outputs, logs each row to an experiment, and returns a summary you can compare across runs.
Returns: EvalResultWithSummary.
Arguments:
name (str, required): project name in Braintrust. Passed as the first positional argument.
data (EvalData, required): iterator over evaluation cases. Each case should include an input and can include expected, metadata, and tags.
task (EvalTask, required): function under test. Receives one input and returns the output to score.
scores (Sequence[EvalScorer], required): scorers that evaluate the task output.
experiment_name (str): experiment name. If omitted, one is generated automatically.
trial_count (int): number of times to run each input, useful for non-deterministic applications.
metadata (Metadata): extra experiment metadata for filtering and analysis.
tags (Sequence[str]): tags to associate with the experiment.
timeout (float): per-evaluation timeout in seconds.
max_concurrency (int): maximum concurrent tasks and scorers.
project_id (str): project ID. Takes precedence over name.
base_experiment_name (str): experiment to compare against.
base_experiment_id (str): experiment ID to compare against. Takes precedence over base_experiment_name.
no_send_logs (bool): run locally without sending logs to Braintrust.
parameters (EvalParameters | RemoteEvalParameters): parameters to pass to the evaluator.
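The required arguments can be sketched as follows. The project name, the `greet` task, the cases, and the `exact_match` scorer are all placeholders; the scorer is assumed to receive `input`, `output`, and `expected` and to return a number between 0 and 1.

```python
def greet(name: str) -> str:
    """Function under test: receives one input and returns the output."""
    return f"Hi {name}"


def exact_match(input, output, expected=None) -> float:
    """Scorer: 1.0 when the output equals the expected value, else 0.0."""
    return 1.0 if output == expected else 0.0


def cases() -> list[dict]:
    return [
        {"input": "Ada", "expected": "Hi Ada"},
        {"input": "Grace", "expected": "Hello Grace"},  # deliberately fails
    ]


def run_eval():
    from braintrust import Eval

    return Eval(
        "my-project",  # project name, passed positionally
        data=cases,
        task=greet,
        scores=[exact_match],
    )
```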
EvalAsync()
Asynchronous version of Eval(). Use it when your task or scorers perform async I/O.
Returns: EvalResultWithSummary. Accepts the same arguments as Eval(), with async tasks and scorers. For data, use either a synchronous callable that returns a list or an async generator. An async function that returns a list is not supported.
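A hedged sketch of the async variant, using a synchronous callable for `data` as described above. The task, scorer, and project name are placeholders, and `asyncio.sleep(0)` stands in for real async I/O.

```python
import asyncio


async def shout(text: str) -> str:
    """Async task under test."""
    await asyncio.sleep(0)  # stand-in for a network or model call
    return text.upper()


async def exact_match(input, output, expected=None) -> float:
    """Async scorer: 1.0 on an exact match, else 0.0."""
    return 1.0 if output == expected else 0.0


def cases() -> list[dict]:
    """Synchronous callable that returns a list of cases."""
    return [{"input": "hello", "expected": "HELLO"}]


async def run_eval():
    from braintrust import EvalAsync

    return await EvalAsync(
        "my-project",  # placeholder project name
        data=cases,
        task=shout,
        scores=[exact_match],
    )
```

From synchronous code, this would be started with `asyncio.run(run_eval())`.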
Reporter()
Creates a reporter for custom evaluation reporting, such as emitting results to CI.
Arguments:
name (str, required): reporter name.
report_eval (Callable, required): called with each evaluator and its result.
report_run (Callable, required): called with all evaluator reports and returns whether the run succeeded.
Experiments
An experiment is a single evaluation run logged to a project. Use these APIs when you want to create an experiment and log rows yourself, instead of letting Eval() manage one for you.
init() / init_experiment()
Creates or opens an experiment in a project for manual experiment logging.
Returns: Experiment.
Arguments:
project (str): project name. Provide project or project_id.
project_id (str): project ID. Takes precedence over project.
experiment (str): experiment name. If omitted, one is generated automatically.
dataset (Dataset | DatasetRef): dataset associated with the experiment.
base_experiment (str): experiment name to compare against.
base_experiment_id (str): experiment ID to compare against.
metadata (Metadata): extra metadata for filtering and analysis.
tags (Sequence[str]): tags to associate with the experiment.
set_current (bool): defaults to true. Sets the current experiment for log().
update (bool): continue logging to an existing experiment if it exists.
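Manual experiment logging might look like the sketch below. It assumes each row is logged with `experiment.log()` and that `experiment.summarize()` is called at the end; the project name, experiment name, and `score_row` helper are placeholders.

```python
def score_row(output: str, expected: str) -> dict[str, float]:
    """Compute the scores for one row; values stay between 0 and 1."""
    return {"exact_match": 1.0 if output == expected else 0.0}


def log_experiment(rows: list[dict]) -> None:
    import braintrust

    experiment = braintrust.init(project="my-project", experiment="manual-run")
    for row in rows:
        experiment.log(
            input=row["input"],
            output=row["output"],
            expected=row["expected"],
            scores=score_row(row["output"], row["expected"]),
        )
    # Print the experiment summary once all rows are logged.
    print(experiment.summarize())
```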
Datasets
A dataset is a versioned collection of cases you manage in Braintrust and reuse across experiments and evals. Use init_dataset() to create a dataset or open an existing one.
init_dataset()
Creates or opens a dataset in a project.
Returns: Dataset.
Arguments:
project (str): project name. Provide project or project_id.
project_id (str): project ID. Takes precedence over project.
name (str): dataset name. If omitted, one is generated automatically.
description (str): dataset description.
version (str | int): dataset version to read. Defaults to latest.
metadata (Metadata): extra dataset metadata.
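A sketch of seeding a dataset, assuming records are added with `dataset.insert()` and sent with `dataset.flush()`. The project name, dataset name, and the `valid_cases` filter are illustrative.

```python
def valid_cases(cases: list[dict]) -> list[dict]:
    """Keep only the cases that define an input."""
    return [case for case in cases if "input" in case]


def seed_dataset(cases: list[dict]) -> int:
    import braintrust

    dataset = braintrust.init_dataset(project="my-project", name="greetings")
    kept = valid_cases(cases)
    for case in kept:
        dataset.insert(input=case["input"], expected=case.get("expected"))
    dataset.flush()  # make sure the inserted rows are sent
    return len(kept)
```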
Prompts and functions
In Braintrust, functions are units of logic you define and version in the UI, then load or invoke from your code. A prompt is a function whose job is to call a model with a templated set of messages. Other functions include scorers, tools, and code you deploy. Load and render a saved prompt with load_prompt(), or invoke a deployed function with invoke().
load_prompt()
Loads a saved prompt from a Braintrust project. Use the returned prompt’s build() to render request parameters with runtime variables.
Returns: Prompt.
Arguments:
project (str): project name. Provide project, project_id, or id.
project_id (str): project ID. Takes precedence over project.
slug (str): prompt slug.
id (str): prompt ID. Takes precedence over project, slug, and version.
version (str | int): prompt version. Defaults to latest.
environment (str): environment to load from. version takes precedence when both are provided.
defaults (Mapping[str, Any]): default variables used when rendering the prompt.
no_trace (bool): if true, built prompt metadata is not included in traces.
invoke()
Invokes a Braintrust function and returns either a plain Python object or a BraintrustStream.
Returns: a plain Python object, or a BraintrustStream when stream=True.
Arguments:
input (Any, required): input passed to the function.
function_id (str): function ID to invoke.
project_name (str): project containing the function.
project_id (str): project ID containing the function.
slug (str): function slug.
version (str): function version.
stream (bool): return a stream when supported.
init_function()
Creates a Python callable for a Braintrust function, usable as an eval task or scorer.
Arguments:
project_name (str, required): project containing the function.
slug (str, required): function slug.
version (str): function version. Defaults to latest.
Attachments
Attachments let you log files or large payloads without storing the full bytes inline in the span. When you trace AI calls, Braintrust automatically converts base64 attachments in provider messages into uploaded attachments, so you rarely need the APIs below for instrumented calls. Reach for them when you’re attaching binary content to a span yourself.
Attachment
Wraps file data so you can attach it to logged data. The uploaded value is replaced with an attachment reference in Braintrust logs.
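As an illustration, the sketch below attaches a local file to a logged row. It assumes `Attachment` takes `data`, `filename`, and `content_type` keyword arguments; the project name, the `file` key, and the `guess_content_type` helper are placeholders.

```python
import mimetypes
from pathlib import Path


def guess_content_type(filename: str) -> str:
    """Guess a MIME type from the file name, with a generic fallback."""
    content_type, _ = mimetypes.guess_type(filename)
    return content_type or "application/octet-stream"


def log_file(path: str) -> None:
    from braintrust import Attachment, init_logger

    logger = init_logger(project="my-project")  # placeholder project name
    file_path = Path(path)
    attachment = Attachment(
        data=file_path.read_bytes(),
        filename=file_path.name,
        content_type=guess_content_type(file_path.name),
    )
    # The attachment value is replaced with a reference in the stored log.
    logger.log(input={"file": attachment}, output="uploaded")
    logger.flush()
```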
ReadonlyAttachment
Reads an already-uploaded attachment.
data: lazily downloads the attachment contents as bytes.
metadata(): returns attachment metadata.
status(): returns upload or availability status.
Configuration
Configure the SDK with environment variables, or pass the equivalent options to init_logger() and login().
set_http_adapter()
Sets a custom requests HTTP adapter for Braintrust network requests. Use it for custom retry policies and timeouts.
Environment variables
BRAINTRUST_API_KEY (required): Braintrust API key.
BRAINTRUST_API_URL: Braintrust API URL. Defaults to https://api.braintrust.dev.
BRAINTRUST_APP_URL: Braintrust app URL. Defaults to https://www.braintrust.dev.
BRAINTRUST_ORG_NAME: organization name, useful when credentials can access multiple orgs.
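If you want to fail fast on a missing key, a startup check along these lines is one option. `resolve_settings` is a hypothetical helper, not part of the SDK; it only mirrors the variables and defaults listed above.

```python
import os
from typing import Mapping

DEFAULTS = {
    "BRAINTRUST_API_URL": "https://api.braintrust.dev",
    "BRAINTRUST_APP_URL": "https://www.braintrust.dev",
}


def resolve_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the effective settings, raising if the API key is missing."""
    api_key = env.get("BRAINTRUST_API_KEY")
    if not api_key:
        raise RuntimeError("BRAINTRUST_API_KEY is required")
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    settings["BRAINTRUST_API_KEY"] = api_key
    if env.get("BRAINTRUST_ORG_NAME"):
        settings["BRAINTRUST_ORG_NAME"] = env["BRAINTRUST_ORG_NAME"]
    return settings
```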