AI Agent Cost Benchmarks
Weekly anonymized cost benchmarks from the Metrx community. See how different LLMs perform in real-world agent deployments.
| Model | Provider | Avg Cost (USD) | p95 Cost (USD) | Avg Latency (ms) | Calls |
|---|---|---|---|---|---|
Methodology
Benchmarks are computed from anonymized, aggregated telemetry across all Metrx community members. Individual call data is never exposed — only per-model aggregates with a minimum threshold of API calls are included.
Costs are measured at the API response level using actual billing amounts where available, falling back to token-count × published pricing. Latency is measured end-to-end from request initiation to final response byte. All values are aggregated over a rolling 90-day window and refreshed weekly via a Supabase materialized view.
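The billing fallback described above can be sketched as follows. This is a minimal illustration, not the actual Metrx implementation; the field names (`billed_usd`, `input_tokens`, etc.) and pricing shape are assumptions for the example.

```javascript
// Sketch of the cost fallback: prefer the provider's billed amount,
// otherwise estimate from token counts and published per-token pricing.
// All field names here are illustrative, not the real Metrx schema.
function callCost(call, pricing) {
  // Actual billing amount, when the provider API reports one.
  if (call.billed_usd != null) return call.billed_usd;
  // Fallback: token-count × published pricing.
  const p = pricing[call.model];
  return call.input_tokens * p.input_per_token +
         call.output_tokens * p.output_per_token;
}
```

For example, with input priced at $1 and output at $2 per million tokens, a call with 1,000 input and 500 output tokens estimates to $0.002.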
The public JSON API at /api/benchmarks returns the same data displayed on this page, cached for 1 hour. You are free to consume it programmatically — we only ask that you attribute the source.
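Since the endpoint is cached server-side for an hour, a client gains nothing by polling more often. One way to mirror that window in your own code is a small in-memory cache; this is a suggested pattern, not part of the Metrx API itself.

```javascript
// Client-side cache matching the API's 1-hour server cache window.
// The endpoint URL is from this page; the caching pattern is a suggestion.
const ONE_HOUR_MS = 60 * 60 * 1000;
let cached = null;

async function getBenchmarks() {
  // Reuse the cached copy if it is still within the cache window.
  if (cached && Date.now() - cached.fetchedAt < ONE_HOUR_MS) {
    return cached.data;
  }
  const res = await fetch('https://metrxbot.com/api/benchmarks');
  if (!res.ok) throw new Error(`Benchmarks request failed: ${res.status}`);
  const data = await res.json();
  cached = { data, fetchedAt: Date.now() };
  return data;
}
```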
Machine-Readable API
Agents can consume benchmarks programmatically for optimization decisions
Fetch real-time benchmarks as JSON via our public API:
GET https://metrxbot.com/api/benchmarks

Example usage in your agent code:
const response = await fetch('https://metrxbot.com/api/benchmarks');
const benchmarks = await response.json();

// Find the most cost-efficient model
const cheapest = benchmarks.summary.most_cost_efficient;
console.log(`Use ${cheapest.model} from ${cheapest.provider}`);

// Check average latency before picking a model
const fastModels = benchmarks.models.filter(m => m.avg_latency_ms < 1000);

Get Your Own Cost Data
Sign up for free to track your own agent costs, compare against these benchmarks, and get personalized optimization recommendations.