CAIS and Scale AI Unveil Results of “Humanity’s Last Exam,” a Groundbreaking New Benchmark

By 818-aic / January 23, 2025

SAN FRANCISCO, Jan. 23, 2025 /PRNewswire/ -- The Center for AI Safety (CAIS) and Scale AI today announced the results of a groundbreaking new AI benchmark designed to test the limits of AI knowledge and whether models are capable of chain-of-thought reasoning. The results showed a significant improvement over the reasoning capabilities of earlier models, but current models still answered fewer than 10 percent of the expert questions correctly.
