Tally

createTally

Create a Tally container and run evaluations.

createTally() is the main entry point for running evaluations. It creates a Tally container that orchestrates the evaluation pipeline.

Import

import { createTally } from 'tally';
import type { Tally, TallyRunReport } from 'tally';

createTally()

Creates a type-safe Tally container for running evaluations. Pass your data and evals directly; no intermediate evaluator objects are needed.

data: readonly TContainer[]

Array of data containers to evaluate (Conversation[] or DatasetItem[]).

evals: readonly Eval[]

Array of eval definitions (single-turn, multi-turn, or scorer evals).

context?: EvaluationContext

Optional shared evaluation context for all evals.

EvaluationContext

singleTurn?: SingleTurnRunPolicy

Single-turn run policy (which steps to evaluate).

metadata?: Record<string, unknown>

Optional metadata for this run.
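
To illustrate the data parameter, the sketch below builds a container from plain user/assistant turn pairs. It is self-contained: the Message, Step, and ConversationSketch types are local stand-ins inferred from the Example on this page, and toConversation is a hypothetical helper, not part of the library. The real Conversation type exported by 'tally' may differ or carry more fields.

```typescript
// Local stand-ins inferred from the Example on this page (assumptions);
// the real `Conversation` type exported by 'tally' may differ.
type Message = { role: 'user' | 'assistant'; content: string };

type Step = {
  stepIndex: number;
  input: Message;
  output: Message[];
  timestamp: Date;
};

type ConversationSketch = { id: string; steps: Step[] };

// Hypothetical helper: converts [userText, assistantText] pairs into one
// container, numbering the steps from 0 in the order given.
function toConversation(
  id: string,
  turns: ReadonlyArray<readonly [string, string]>,
): ConversationSketch {
  return {
    id,
    steps: turns.map(
      ([userText, assistantText], stepIndex): Step => ({
        stepIndex,
        input: { role: 'user', content: userText },
        output: [{ role: 'assistant', content: assistantText }],
        timestamp: new Date(),
      }),
    ),
  };
}

// An array shaped like the `data` argument in the Example.
const data: readonly ConversationSketch[] = [
  toConversation('conv-1', [
    ['Hello!', 'Hi there! How can I help?'],
    ['Thanks!', 'You are welcome.'],
  ]),
];
```

If the real types match this shape, an array built this way could be passed as data; check the exported Conversation type before relying on it.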

tally.run()

Executes the evaluation pipeline and returns a type-safe report.

cache?: MemoryCache<MetricScalar>

Optional cache for metric results (avoids recomputing identical inputs).

llmOptions?: GenerateObjectOptions

Optional LLM execution options (temperature, retries, etc.).

metadata?: Record<string, unknown>

Optional metadata to include in the report.
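
The cache option is described as avoiding recomputation for identical inputs. The sketch below shows that general idea with a small in-memory memoizer. SketchCache and its getOrCompute method are hypothetical names for illustration only; this is not the library's MemoryCache, whose constructor and methods are not documented on this page.

```typescript
// Hypothetical in-memory cache illustrating "same input, computed once".
// NOT the library's MemoryCache; names and behavior here are assumptions.
class SketchCache<V> {
  private readonly store = new Map<string, V>();
  hits = 0;
  misses = 0;

  // Returns the cached value for `input`, or computes and stores it.
  // The key is the JSON form of the input, so two structurally equal
  // objects with the same property order share one entry.
  getOrCompute(input: unknown, compute: () => V): V {
    const key = JSON.stringify(input);
    if (this.store.has(key)) {
      this.hits += 1;
      return this.store.get(key) as V;
    }
    this.misses += 1;
    const value = compute();
    this.store.set(key, value);
    return value;
  }
}

// Stand-in for an expensive metric computation (for example, an LLM call).
let computeCalls = 0;
function expensiveScore(text: string): number {
  computeCalls += 1;
  return text.length / 10;
}

const cache = new SketchCache<number>();
const first = cache.getOrCompute(
  { metric: 'length', text: 'Hello!' },
  () => expensiveScore('Hello!'),
);
// Identical input: served from the cache, expensiveScore is not called again.
const second = cache.getOrCompute(
  { metric: 'length', text: 'Hello!' },
  () => expensiveScore('Hello!'),
);
```

Keying on serialized input keeps the sketch short, but it is sensitive to property order; a production cache would likely normalize or hash its keys.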

Example

import {
  createTally,
  defineSingleTurnEval,
  defineMultiTurnEval,
  thresholdVerdict,
} from 'tally';
import type { Conversation } from 'tally';
import { createAnswerRelevanceMetric, createRoleAdherenceMetric } from 'tally/metrics';
import { google } from '@ai-sdk/google';

const model = google('models/gemini-2.5-flash-lite');

// Create metrics
const answerRelevance = createAnswerRelevanceMetric({ provider: model });
const roleAdherence = createRoleAdherenceMetric({
  expectedRole: 'helpful assistant',
  provider: model,
});

// Create evals with verdict policies
const relevanceEval = defineSingleTurnEval({
  name: 'Answer Relevance',
  metric: answerRelevance,
  verdict: thresholdVerdict(0.7),
});

const roleEval = defineMultiTurnEval({
  name: 'Role Adherence',
  metric: roleAdherence,
  verdict: thresholdVerdict(0.8),
});

// Create Tally and run
const conversation: Conversation = {
  id: 'conv-1',
  steps: [
    {
      stepIndex: 0,
      input: { role: 'user', content: 'Hello!' },
      output: [{ role: 'assistant', content: 'Hi there! How can I help?' }],
      timestamp: new Date(),
    },
  ],
};

const tally = createTally({
  data: [conversation],
  evals: [relevanceEval, roleEval],
});

const report = await tally.run();

// Type-safe access via view API
const view = report.view();
const step0 = view.step(0);
console.log(step0['Answer Relevance']?.outcome?.verdict);
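
The example above attaches thresholdVerdict(0.7) and thresholdVerdict(0.8) to its evals. As a rough mental model only, a threshold-style verdict policy can be sketched as a function from a score to a verdict. The sketch assumes scores are numbers, that a score at or above the threshold passes, and that verdicts are the strings 'pass' and 'fail'; none of these details are confirmed by this page, and thresholdVerdictSketch is a hypothetical local function rather than the library's thresholdVerdict.

```typescript
// Assumed verdict labels; the library's actual verdict values may differ.
type VerdictSketch = 'pass' | 'fail';

// Hypothetical stand-in for a threshold policy: returns a function that
// maps a numeric score to a verdict. Assumes "score >= threshold" passes.
function thresholdVerdictSketch(
  threshold: number,
): (score: number) => VerdictSketch {
  return (score) => (score >= threshold ? 'pass' : 'fail');
}

const relevancePolicy = thresholdVerdictSketch(0.7);

const high = relevancePolicy(0.82); // above the threshold
const low = relevancePolicy(0.55); // below the threshold
const edge = relevancePolicy(0.7); // exactly at the threshold (inclusive here)
```

Whether the boundary value passes is a design choice; this sketch treats it as inclusive.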
