Getting Started
Installation and basic setup for Tally.
Installation
Install the @tally-evals/tally package along with its peer dependencies.
pnpm add @tally-evals/tally
Peer Dependencies
If you use LLM-based metrics, also install the AI SDK (ai) and at least one provider package.
pnpm add ai @ai-sdk/google # or @ai-sdk/openai, etc.
Basic Configuration
Tally works out of the box with default settings, but you can configure shared storage and evaluation parameters in your project.
TypeScript Setup
Tally uses subpath exports (@tally-evals/tally/scorers, etc.) which require modern module resolution. Ensure your tsconfig.json uses moduleResolution: "bundler" (or "node16"/"nodenext"):
{
  "compilerOptions": {
    "moduleResolution": "bundler"
  }
}
Your First Evaluation
Let's walk through a complete, minimal example of evaluating an agent's response relevance.
import {
  createTally,
  createEvaluator,
  runAllTargets,
  defineSingleTurnEval,
  thresholdVerdict
} from '@tally-evals/tally';
import { createAnswerRelevanceMetric } from '@tally-evals/tally/metrics';
import { google } from '@ai-sdk/google';

// 1. Define your evaluation model
const model = google('models/gemini-2.0-flash');

// 2. Setup metrics and evals
const relevanceMetric = createAnswerRelevanceMetric({ provider: model });
const relevanceEval = defineSingleTurnEval({
  name: 'Answer Relevance',
  metric: relevanceMetric,
  verdict: thresholdVerdict(0.7),
});

// 3. Create the evaluator
const evaluator = createEvaluator({
  name: 'Agent Quality',
  evals: [relevanceEval],
  context: runAllTargets(),
});

// 4. Run on your data
const conversations = [
  {
    id: 'conv-1',
    steps: [
      {
        stepIndex: 0,
        input: { role: 'user', content: 'What is Tally?' },
        output: [{ role: 'assistant', content: 'Tally is an evaluation framework.' }],
        timestamp: new Date(),
      },
    ],
  },
];
const tally = createTally({ data: conversations, evaluators: [evaluator] });
const report = await tally.run();

// 5. Check results
console.log(JSON.stringify(report.evalSummaries, null, 2));
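To build intuition for what a threshold verdict such as thresholdVerdict(0.7) does, here is a minimal standalone sketch in plain TypeScript. It is not Tally's implementation. The names Verdict, thresholdVerdictSketch, and passRate are hypothetical, and the sketch assumes that scores are normalized to the range 0 to 1 and that a score equal to the threshold passes. Check Tally's own documentation for the actual behavior.

```typescript
// Hypothetical illustration only: this does not use or reproduce Tally's API.
type Verdict = 'pass' | 'fail';

// Assumption: scores lie in [0, 1] and a score equal to the threshold passes.
function thresholdVerdictSketch(threshold: number): (score: number) => Verdict {
  return (score) => (score >= threshold ? 'pass' : 'fail');
}

// Fraction of scores that pass under the given verdict function.
// Returns 0 for an empty list to avoid dividing by zero.
function passRate(scores: number[], verdict: (score: number) => Verdict): number {
  if (scores.length === 0) return 0;
  const passed = scores.filter((s) => verdict(s) === 'pass').length;
  return passed / scores.length;
}

const verdict = thresholdVerdictSketch(0.7);
const sampleScores = [0.92, 0.7, 0.41, 0.85]; // made-up scores for illustration

console.log(sampleScores.map(verdict));
console.log(passRate(sampleScores, verdict));
```

With a threshold of 0.7, only the 0.41 score fails in this sample. A higher threshold makes the eval stricter, and a lower one makes it more permissive.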