Leaderboard

Top 10 AI Models

Compare the top AI models across key benchmarks.

| Rank | Model | Provider | Tags | Score (pts) | Status |
|------|-------|----------|------|-------------|--------|
| 1 | Claude 3.5 Opus | Anthropic | LLM, API, WEB | 1847 | active |
| 2 | GPT-4 Turbo | OpenAI | LLM, API, WEB, MULTIMODAL | 1823 | trending |
| 3 | Gemini 2.0 Ultra | Google | LLM, MULTIMODAL | 1798 | new |
| 4 | Claude 3.5 Sonnet | Anthropic | LLM, API | 1752 | active |
| 5 | Llama 3.1 405B | Meta | LLM | 1698 | active |
| 6 | Mistral Large 2 | Mistral AI | LLM, API | 1654 | trending |
| 7 | DeepSeek V3 | DeepSeek | LLM | 1621 | new |
| 8 | Grok-2 | xAI | LLM, WEB | 1587 | active |
| 9 | Qwen 2.5 72B | Alibaba | LLM | 1543 | active |
| 10 | Gemini 1.5 Pro | Google | LLM, MULTIMODAL, API | 1498 | active |

Benchmark Reference

MMLU

Massive Multitask Language Understanding. Tests knowledge across 57 subjects.

HumanEval

Code generation benchmark. Measures functional correctness of synthesized programs.

GPQA

Graduate-Level Google-Proof Q&A. Graduate-level science questions testing expert-level reasoning in physics, biology, and chemistry.

MATH

Competition mathematics problems. Covers algebra, geometry, calculus, and more.

GSM8K

Grade school math word problems. Tests multi-step arithmetic reasoning.

ARC-C

AI2 Reasoning Challenge (Challenge set). Science questions requiring reasoning rather than simple fact retrieval.

Scoring Methodology

Composite scores are calculated using a weighted average of benchmark results, normalized across all models. Weights reflect real-world task importance: reasoning (35%), coding (25%), knowledge (20%), math (20%).
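For concreteness, here is a minimal Python sketch of how such a composite could be computed. The page does not state which normalization is used, so min-max normalization of each category across models is an assumption, and every benchmark value in the example is illustrative rather than taken from the leaderboard.

```python
# Hypothetical sketch of the composite-score calculation described above.
# Assumption: min-max normalization per category across all models.
# All raw scores below are made-up, illustrative numbers.

WEIGHTS = {"reasoning": 0.35, "coding": 0.25, "knowledge": 0.20, "math": 0.20}

raw_scores = {
    "model_a": {"reasoning": 59.4, "coding": 92.0, "knowledge": 88.7, "math": 71.1},
    "model_b": {"reasoning": 53.6, "coding": 90.2, "knowledge": 86.4, "math": 76.6},
    "model_c": {"reasoning": 47.9, "coding": 85.1, "knowledge": 83.0, "math": 67.7},
}

def normalize(scores_by_model):
    """Min-max normalize each category across all models (an assumption;
    the page does not specify the normalization method)."""
    normalized = {model: {} for model in scores_by_model}
    for cat in WEIGHTS:
        values = [s[cat] for s in scores_by_model.values()]
        lo, hi = min(values), max(values)
        for model, scores in scores_by_model.items():
            normalized[model][cat] = (scores[cat] - lo) / (hi - lo) if hi > lo else 1.0
    return normalized

def composite(normalized):
    """Weighted average using the stated weights: reasoning 35%, coding 25%,
    knowledge 20%, math 20%."""
    return {
        model: sum(WEIGHTS[cat] * scores[cat] for cat in WEIGHTS)
        for model, scores in normalized.items()
    }

for model, score in sorted(composite(normalize(raw_scores)).items(),
                           key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {score:.3f}")
```

Note that this yields a composite in the 0 to 1 range; mapping it to the point totals shown on the leaderboard would require an additional scaling step that the page does not describe.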

Data Sources

Benchmark data is aggregated from official model papers, provider documentation, and independent evaluations. Last updated: January 2025.
