MLE-Bench is a new benchmark for evaluating AI agents on real-world ML engineering tasks drawn from Kaggle competitions. This post highlights key findings, including the effects of scaling compute and time resources, common debugging challenges, and how different agent frameworks perform.