AI Agent Cost Benchmarks
Weekly anonymized cost benchmarks from the Metrx community. See how different LLMs perform in real-world agent deployments.
| Model | Provider | Avg Cost (USD) | p95 Cost (USD) | Avg Latency (ms) | Calls |
|---|---|---|---|---|---|
Methodology
Benchmarks are computed from anonymized, aggregated telemetry across all Metrx community members. Individual call data is never exposed — only per-model aggregates with a minimum threshold of API calls are included.
Costs are measured at the API response level using actual billing amounts where available, falling back to token-count × published pricing. Latency is measured end-to-end from request initiation to final response byte. All values are aggregated over a rolling 90-day window and refreshed weekly via a Supabase materialized view.
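The billing fallback described above can be sketched as follows. This is a minimal illustration, not the actual Metrx implementation; the field names (`billed_usd`, `input_tokens`, etc.) and pricing shape are assumptions for the example.

```javascript
// Sketch of the cost fallback: prefer the provider's billed amount,
// otherwise estimate from token counts and published per-token pricing.
// All field names here are illustrative, not the real Metrx schema.
function callCost(call, pricing) {
  // Actual billing amount, when the provider API reports one.
  if (call.billed_usd != null) return call.billed_usd;
  // Fallback: token-count × published pricing.
  const p = pricing[call.model];
  return call.input_tokens * p.input_per_token +
         call.output_tokens * p.output_per_token;
}
```

For example, with input priced at $1 and output at $2 per million tokens, a call with 1,000 input and 500 output tokens estimates to $0.002.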
The public JSON API at /api/benchmarks returns the same data displayed on this page, cached for 1 hour. You are free to consume it programmatically — we only ask that you attribute the source.
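Since the endpoint is cached server-side for an hour, a client gains nothing by polling more often. One way to mirror that window in your own code is a small in-memory cache; this is a suggested pattern, not part of the Metrx API itself.

```javascript
// Client-side cache matching the API's 1-hour server cache window.
// The endpoint URL is from this page; the caching pattern is a suggestion.
const ONE_HOUR_MS = 60 * 60 * 1000;
let cached = null;

async function getBenchmarks() {
  // Reuse the cached copy if it is still within the cache window.
  if (cached && Date.now() - cached.fetchedAt < ONE_HOUR_MS) {
    return cached.data;
  }
  const res = await fetch('https://metrxbot.com/api/benchmarks');
  if (!res.ok) throw new Error(`Benchmarks request failed: ${res.status}`);
  const data = await res.json();
  cached = { data, fetchedAt: Date.now() };
  return data;
}
```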
Machine-Readable API
Agents can consume benchmarks programmatically for optimization decisions
Fetch real-time benchmarks as JSON via our public API:
GET https://metrxbot.com/api/benchmarks

Example usage in your agent code:
const response = await fetch('https://metrxbot.com/api/benchmarks');
const benchmarks = await response.json();

// Find the most cost-efficient model
const cheapest = benchmarks.summary.most_cost_efficient;
console.log(`Use ${cheapest.model} from ${cheapest.provider}`);

// Check average latency before picking a model
const fastModels = benchmarks.models.filter(m => m.avg_latency_ms < 1000);

Get Your Own Cost Data
Sign up for free to track your own agent costs, compare against these benchmarks, and get personalized optimization recommendations.