Getting Started

Installation

Install the @tally-evals/tally package, then add the peer dependencies listed below if you need them.

pnpm add @tally-evals/tally

Peer Dependencies

LLM-based metrics require the AI SDK (ai) and at least one provider package, such as @ai-sdk/google or @ai-sdk/openai.

pnpm add ai @ai-sdk/google # or @ai-sdk/openai, etc.

Basic Configuration

Tally works out of the box with default settings, but you can configure shared storage and evaluation parameters in your project.

TypeScript Setup

Tally uses subpath exports (@tally-evals/tally/scorers, etc.), which require modern module resolution. Make sure your tsconfig.json sets moduleResolution to "bundler" (or "node16"/"nodenext"):

{
  "compilerOptions": {
    "moduleResolution": "bundler"
  }
}

Your First Evaluation

Let's walk through a complete, minimal example that evaluates how relevant an agent's response is to the user's question.

import {
  createTally,
  createEvaluator,
  runAllTargets,
  defineSingleTurnEval,
  thresholdVerdict,
} from '@tally-evals/tally';
import { createAnswerRelevanceMetric } from '@tally-evals/tally/metrics';
import { google } from '@ai-sdk/google';

// 1. Define your evaluation model
const model = google('models/gemini-2.0-flash');

// 2. Set up metrics and evals
const relevanceMetric = createAnswerRelevanceMetric({ provider: model });

const relevanceEval = defineSingleTurnEval({
  name: 'Answer Relevance',
  metric: relevanceMetric,
  verdict: thresholdVerdict(0.7),
});

// 3. Create the evaluator
const evaluator = createEvaluator({
  name: 'Agent Quality',
  evals: [relevanceEval],
  context: runAllTargets(),
});

// 4. Run on your data
const conversations = [
  {
    id: 'conv-1',
    steps: [
      {
        stepIndex: 0,
        input: { role: 'user', content: 'What is Tally?' },
        output: [{ role: 'assistant', content: 'Tally is an evaluation framework.' }],
        timestamp: new Date(),
      },
    ],
  },
];

const tally = createTally({ data: conversations, evaluators: [evaluator] });
// Top-level await requires an ES module; otherwise wrap this call in an async function.
const report = await tally.run();

// 5. Check results
console.log(JSON.stringify(report.evalSummaries, null, 2));
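The conversation in step 4 is written out by hand. With more than a few samples, a small builder can keep the step shape consistent. The sketch below is only an illustration: the Message, Step, and Conversation types are local assumptions that mirror the fields in the example above (they are not types exported by @tally-evals/tally), and toConversation is a hypothetical helper, not part of the library.

```typescript
// Local types mirroring the data shape used in the example above.
// Assumption: these are illustrative and NOT imported from @tally-evals/tally.
interface Message {
  role: 'user' | 'assistant';
  content: string;
}

interface Step {
  stepIndex: number;
  input: Message;
  output: Message[];
  timestamp: Date;
}

interface Conversation {
  id: string;
  steps: Step[];
}

// Hypothetical helper: turns ordered [question, answer] pairs into one
// conversation, numbering the steps from 0 in the order given.
function toConversation(
  id: string,
  turns: Array<[string, string]>,
  now: Date = new Date(),
): Conversation {
  return {
    id,
    steps: turns.map(
      ([question, answer], stepIndex): Step => ({
        stepIndex,
        input: { role: 'user', content: question },
        output: [{ role: 'assistant', content: answer }],
        timestamp: now,
      }),
    ),
  };
}

// Builds the same single-step conversation as the hand-written example.
const conversations: Conversation[] = [
  toConversation('conv-1', [['What is Tally?', 'Tally is an evaluation framework.']]),
];
```

If the shape matches what your version of Tally expects, the resulting array could be passed as the data option to createTally in place of the literal above. Check the library's own type definitions before relying on these local types.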