Tally

Aggregators

Summarize per-target scores into dataset-level statistics.


Aggregators summarize arrays of values into dataset-level statistics. They are discriminated by kind and attached to single-turn metrics via the aggregators field.

Import

import {
  // Custom definitions
  defineNumericAggregator,
  defineBooleanAggregator,
  defineCategoricalAggregator,
  // Prebuilt factories
  createMeanAggregator,
  createPercentileAggregator,
  createThresholdAggregator,
  createTrueRateAggregator,
  createFalseRateAggregator,
  createDistributionAggregator,
  createModeAggregator,
  // Defaults
  getDefaultAggregators,
} from '@tally-evals/tally';

Aggregator Types

Aggregators are discriminated by kind:

| Kind | Operates On | Returns | Use Case |
|---|---|---|---|
| 'numeric' | number[] | number | Scores, numeric raw values |
| 'boolean' | boolean[] | number | Boolean raw values (rate) |
| 'categorical' | string[] | Record<string, number> | String/ordinal raw values |
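The kind discriminant determines which input array an aggregator accepts. The minimal sketch below uses hypothetical types: NumericAgg, BooleanAgg, CategoricalAgg, AnyAgg, Inputs and runAggregator are illustrative names, not Tally's exported definitions. Switching on kind lets the TypeScript compiler narrow each aggregator to the input and return types listed in the table.

```typescript
// Hypothetical types illustrating a kind-discriminated union.
// These are NOT Tally's actual exported types.
type NumericAgg = {
  kind: 'numeric';
  name: string;
  aggregate: (values: readonly number[]) => number;
};
type BooleanAgg = {
  kind: 'boolean';
  name: string;
  aggregate: (values: readonly boolean[]) => number;
};
type CategoricalAgg = {
  kind: 'categorical';
  name: string;
  aggregate: (values: readonly string[]) => Record<string, number>;
};
type AnyAgg = NumericAgg | BooleanAgg | CategoricalAgg;

// Raw inputs grouped by value type (hypothetical shape).
type Inputs = {
  numbers: readonly number[];
  booleans: readonly boolean[];
  strings: readonly string[];
};

// Narrowing on `kind` tells the compiler which array each aggregator accepts.
function runAggregator(agg: AnyAgg, inputs: Inputs): number | Record<string, number> {
  switch (agg.kind) {
    case 'numeric':
      return agg.aggregate(inputs.numbers);
    case 'boolean':
      return agg.aggregate(inputs.booleans);
    case 'categorical':
      return agg.aggregate(inputs.strings);
  }
}
```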

Custom Aggregator Definitions

defineNumericAggregator()

Creates a custom aggregator for numeric values. Use this when built-in aggregators don't fit your needs.

name:string

Aggregator name (appears in reports).

description?:string

Description of what this aggregator computes.

aggregate:(values: readonly number[]) => number

Aggregation function.

metadata?:Record<string, unknown>

Optional metadata.

Example:

const stdDevAggregator = defineNumericAggregator({
  name: 'StdDev',
  description: 'Population standard deviation of values',
  aggregate: (values) => {
    const mean = values.reduce((a, b) => a + b, 0) / values.length;
    const squaredDiffs = values.map((v) => (v - mean) ** 2);
    // Population variance divides by n; use (values.length - 1) for sample variance.
    // An empty array yields NaN.
    return Math.sqrt(squaredDiffs.reduce((a, b) => a + b, 0) / values.length);
  },
});

defineBooleanAggregator()

Creates a custom aggregator for boolean values. Use this for custom true/false counting logic.

name:string

Aggregator name (appears in reports).

description?:string

Description of what this aggregator computes.

aggregate:(values: readonly boolean[]) => number

Aggregation function returning a rate (0-1).

metadata?:Record<string, unknown>

Optional metadata.

Example:

const consecutiveTrueAggregator = defineBooleanAggregator({
  name: 'MaxTrueStreak',
  description: 'Longest run of consecutive true values, as a fraction of all values',
  aggregate: (values) => {
    if (values.length === 0) return 0;
    let maxStreak = 0;
    let currentStreak = 0;
    for (const v of values) {
      if (v) {
        currentStreak++;
        maxStreak = Math.max(maxStreak, currentStreak);
      } else {
        currentStreak = 0;
      }
    }
    // Normalize the streak length to a 0-1 rate, as boolean aggregators expect
    return maxStreak / values.length;
  },
});

defineCategoricalAggregator()

Creates a custom aggregator for categorical (string) values. Returns a distribution or other summary of categories.

name:string

Aggregator name (appears in reports).

description?:string

Description of what this aggregator computes.

aggregate:(values: readonly string[]) => Record<string, number>

Aggregation function returning category counts/frequencies.

metadata?:Record<string, unknown>

Optional metadata.
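As an illustration of the signature above, here is a minimal sketch of an aggregate function. topCategories and its 'other' bucket are hypothetical choices, not a Tally built-in. The commented-out call shows how the function would be passed to defineCategoricalAggregator using only the fields listed above.

```typescript
// Aggregate function for a hypothetical 'TopCategories' aggregator:
// counts each label, keeps the `n` most frequent, and folds the rest into 'other'.
// Note: a real label named 'other' would collide with the bucket in this sketch.
function topCategories(values: readonly string[], n = 3): Record<string, number> {
  const counts = new Map<string, number>();
  for (const v of values) {
    counts.set(v, (counts.get(v) ?? 0) + 1);
  }
  // Sort by descending count; ties keep first-seen order (Array.prototype.sort is stable).
  const ranked = [...counts.entries()].sort((a, b) => b[1] - a[1]);
  const result: Record<string, number> = {};
  let other = 0;
  ranked.forEach(([label, count], i) => {
    if (i < n) result[label] = count;
    else other += count;
  });
  if (other > 0) result['other'] = other;
  return result;
}

// Wiring it into Tally with the documented fields:
// const topCategoriesAggregator = defineCategoricalAggregator({
//   name: 'TopCategories',
//   description: 'Counts of the 3 most frequent labels, with the rest under "other"',
//   aggregate: (values) => topCategories(values, 3),
// });
```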


Prebuilt Aggregators

createMeanAggregator()

Computes the arithmetic mean of numeric values. The most common way to summarize scores across targets.

description?:string

Custom description.

metadata?:Record<string, unknown>

Optional metadata.

Returns: NumericAggregatorDef<'Mean'>


createPercentileAggregator()

Computes a specific percentile of numeric values. Use for p50 (median), p90, p95, p99, etc.

percentile:number

Percentile value (0-100).

description?:string

Custom description.

metadata?:Record<string, unknown>

Optional metadata.

Returns: NumericAggregatorDef<'P50'>, NumericAggregatorDef<'P95'>, etc. (name is derived from percentile value)

Example:

const p95 = createPercentileAggregator({ percentile: 95 });
// typeof p95.name is 'P95'
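This page does not state how percentiles are interpolated between data points. The sketch below assumes linear interpolation between closest ranks (a common convention and NumPy's default), so Tally's results may differ on small datasets. percentile and percentileName are hypothetical helpers; percentileName mirrors the documented 'P95' naming with a template literal type.

```typescript
// Linear-interpolation percentile (assumed method; Tally's may differ).
// p is in the range 0-100.
function percentile(values: readonly number[], p: number): number {
  if (values.length === 0) return NaN;
  if (p < 0 || p > 100) throw new RangeError('percentile must be between 0 and 100');
  const sorted = [...values].sort((a, b) => a - b);
  // Fractional position in the sorted array.
  const rank = (p / 100) * (sorted.length - 1);
  const lower = Math.floor(rank);
  const upper = Math.ceil(rank);
  const weight = rank - lower;
  return sorted[lower] + (sorted[upper] - sorted[lower]) * weight;
}

// Name derivation matching the documented pattern: 95 -> 'P95'.
function percentileName<P extends number>(p: P): `P${P}` {
  return `P${p}` as `P${P}`;
}
```

With this method, P50 of an even-length array is the average of the two middle values, which matches the usual definition of the median.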

createThresholdAggregator()

Computes the proportion of values that meet or exceed a threshold. Useful for "% above target" metrics.

threshold?:number

Threshold value (0-1). Default: 0.5.

name?:string

Custom aggregator name.

description?:string

Custom description.

metadata?:Record<string, unknown>

Optional metadata.

Returns: NumericAggregatorDef<'Threshold >= 0.5'>, etc. (name is derived from threshold value)
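The documented computation reduces to a filter and a division. This minimal sketch assumes an empty input yields 0, which the page does not specify. thresholdRate and thresholdName are hypothetical helpers; the name format copies the 'Threshold >= 0.5' pattern shown above.

```typescript
// Proportion of values that meet or exceed the threshold (0-1).
// The default mirrors the documented default threshold of 0.5.
function thresholdRate(values: readonly number[], threshold = 0.5): number {
  if (values.length === 0) return 0; // assumption: empty input yields 0
  const passing = values.filter((v) => v >= threshold).length;
  return passing / values.length;
}

// Name derivation following the documented 'Threshold >= 0.5' pattern.
function thresholdName(threshold = 0.5): string {
  return `Threshold >= ${threshold}`;
}
```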


createTrueRateAggregator()

Computes the proportion of true values in a boolean array. Equivalent to a "success rate".

description?:string

Custom description.

metadata?:Record<string, unknown>

Optional metadata.

Returns: BooleanAggregatorDef<'TrueRate'>


createFalseRateAggregator()

Computes the proportion of false values in a boolean array. Equivalent to a "failure rate".

description?:string

Custom description.

metadata?:Record<string, unknown>

Optional metadata.

Returns: BooleanAggregatorDef<'FalseRate'>
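Both rates reduce to counting and dividing. The sketch below uses hypothetical helpers trueRate and falseRate and assumes empty input yields 0. For any non-empty array, the two values sum to 1.

```typescript
// Share of true values (a "success rate").
function trueRate(values: readonly boolean[]): number {
  if (values.length === 0) return 0; // assumption: empty input yields 0
  return values.filter((v) => v).length / values.length;
}

// Share of false values (a "failure rate").
function falseRate(values: readonly boolean[]): number {
  if (values.length === 0) return 0; // assumption: empty input yields 0
  return values.filter((v) => !v).length / values.length;
}
```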


createDistributionAggregator()

Computes the frequency distribution of categorical values. Returns counts or proportions for each unique value.

description?:string

Custom description.

proportions?:boolean

Return proportions (true) or counts (false). Default: true.

metadata?:Record<string, unknown>

Optional metadata.

Returns: CategoricalAggregatorDef<'Distribution'>
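Here is a minimal sketch of the counts-versus-proportions behavior. The hypothetical distribution helper's proportions parameter mirrors the documented option and its default. Key order and empty-input handling are assumptions.

```typescript
// Frequency distribution; `proportions` defaults to true, as documented.
function distribution(values: readonly string[], proportions = true): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const v of values) {
    counts[v] = (counts[v] ?? 0) + 1;
  }
  // Raw counts requested, or nothing to divide by: return the counts as-is.
  if (!proportions || values.length === 0) return counts;
  const shares: Record<string, number> = {};
  for (const [key, count] of Object.entries(counts)) {
    shares[key] = count / values.length;
  }
  return shares;
}
```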


createModeAggregator()

Returns the most frequent value in a categorical array. Useful for finding the "typical" category.

description?:string

Custom description.

metadata?:Record<string, unknown>

Optional metadata.

Returns: CategoricalAggregatorDef<'Mode'>
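The page does not say how ties are broken, or how the winning value is packaged into the Record<string, number> that categorical aggregators return. The sketch below uses a hypothetical mode helper that shows only the selection logic. It assumes the label seen first wins a tie.

```typescript
// Most frequent label; undefined for an empty array.
function mode(values: readonly string[]): string | undefined {
  // Map preserves insertion order, so iteration visits labels in first-seen order.
  const counts = new Map<string, number>();
  for (const v of values) {
    counts.set(v, (counts.get(v) ?? 0) + 1);
  }
  let best: string | undefined;
  let bestCount = 0;
  for (const [label, count] of counts) {
    // Strictly greater: on a tie, the earlier-seen label is kept.
    if (count > bestCount) {
      best = label;
      bestCount = count;
    }
  }
  return best;
}
```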


Default Aggregators

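Tally provides default aggregators based on the metric's valueType. Every metric produces a numeric score, so the numeric defaults (Mean and percentiles) appear for all value types. Boolean and string metrics additionally get an aggregator for their raw values: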

import { getDefaultAggregators } from '@tally-evals/tally';

const numericAggs = getDefaultAggregators('number');
// ['Mean', 'P50', 'P75', 'P90']

const booleanAggs = getDefaultAggregators('boolean');
// ['Mean', 'P50', 'P75', 'P90', 'TrueRate']

const stringAggs = getDefaultAggregators('string');
// ['Mean', 'P50', 'P75', 'P90', 'Distribution']

Attaching Aggregators to Metrics

Aggregators are attached to single-turn metrics at definition time:

import {
  defineSingleTurnCode,
  defineBaseMetric,
  createPercentileAggregator,
} from '@tally-evals/tally';

const latencyMetric = defineSingleTurnCode({
  base: defineBaseMetric({ name: 'latencyMs', valueType: 'number' }),
  compute: ({ data }) => data.latencyMs,
  aggregators: [
    createPercentileAggregator({ percentile: 50 }),
    createPercentileAggregator({ percentile: 95 }),
    createPercentileAggregator({ percentile: 99 }),
  ],
});

Default aggregators are automatically added based on valueType. Custom aggregators are merged with defaults.


Reading Aggregation Results

Access aggregation results through the report's summary:

const report = await tally.run();
const summaries = report.view().summary();

const relevanceSummary = summaries?.['Answer Relevance'];
if (relevanceSummary?.aggregations?.score) {
  console.log('Mean:', relevanceSummary.aggregations.score.mean);
  console.log('P90:', relevanceSummary.aggregations.score.p90);
}
