LiveBench-2024-06-24 (initial version) vs. end of 2024 (6 months, 8 days later): the SOTA global score increased by 14.51. The biggest SOTA increase was in reasoning: 27.58. Caveat: reasoning models, including o1, are still poor at spatial and compositional reasoning. LiveBench should add these two task types.