Companies conduct “evaluations” of AI models by teams of staff and outside researchers. These are standardised tests, known as benchmarks, that assess models’ abilities and the performance of ...
A team of Apple researchers has released a paper scrutinising the mathematical reasoning capabilities of large language models (LLMs), suggesting that while these models can exhibit abstract ...