GPT-4o is far better than other models, but it still made illegal moves 13% of the time

A new benchmark for large language models (LLMs) shows that even the latest models aren’t the best chess players.…