I was looking at the latest LMArena data and wanted to see how Opus 4.6 has been doing there. I did some analysis of the data, and it's impressive that Anthropic has somehow cracked the code here. Google models used to dominate this leaderboard, but