Two days ago the annual Apple WWDC event started. The main announcement was Liquid Glass, “a refreshed user interface […] which features shiny, reflective, and transparent visual interface elements that give the software a more glassy look and feel.” The reception was mixed, to say the least, with users complaining that the transparency of Liquid Glass turns the Control Center into a cluttered mess.
Twitter was not amused, and the memes started dropping immediately, most of them zeroing in on readability concerns.
The trajectory of the last three WWDC events is concerning for Apple.
In 2023, they introduced Apple Vision Pro, which did not meet Apple’s sales goals, prompting production halts and public criticism. It sold around 500,000 units right out of the gate, which is not trivial for an AR/VR device, but not enough to crack mainstream adoption either.
In 2024, they tried to jump on the AI bandwagon by announcing Apple Intelligence, their own comprehensive AI system built end to end on Apple technology: on-device processing, a new Private Cloud Compute infrastructure running on Apple Silicon chips, and ChatGPT integration as the cherry on top (although the media’s initial reaction fixated on the partnership with OpenAI).
Fast forward one year, and there is no Apple Intelligence on the market, only Liquid Glass. Apple admitted that the work “needed more time to reach their high quality bar” and that they “look forward to sharing more about it in the coming year.”
The only “product” that was shipped in the AI space is this paper, which shows that:
LRMs face a complete accuracy collapse beyond certain complexities. Moreover, they exhibit a counterintuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite having an adequate token budget.
They used some old-school puzzle environments, like the Tower of Hanoi, to show that LRM accuracy collapses as puzzle complexity increases. These puzzles date from the GOFAI (Good Old-Fashioned Artificial Intelligence) era and are solved either by generic state space search algorithms like A* or by problem-specific recursive solutions¹. This result is music to the ears of the AI skeptics.
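For reference, the problem-specific solution really is that simple: the textbook recursion moves n disks in the provably optimal 2^n - 1 steps. A minimal Python sketch:

```python
def hanoi(n, source, target, auxiliary, moves):
    """Classic recursive Tower of Hanoi: move n disks from source to target."""
    if n == 0:
        return
    hanoi(n - 1, source, auxiliary, target, moves)  # park n-1 disks on the spare peg
    moves.append((source, target))                  # move the largest disk to the target
    hanoi(n - 1, auxiliary, target, source, moves)  # bring the n-1 disks back on top

moves = []
hanoi(3, "A", "C", "B", moves)
print(len(moves), moves)  # 7 moves: 2**3 - 1, the optimal count
```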
The timing of this paper is a bit suspect. One year after the announcement of Apple Intelligence, Apple appears out of its depth with AI, lagging behind cutting-edge labs like OpenAI and Anthropic, and struggling even to communicate what they can ship and when. It seems that they are missing the train, so their last hope is that the train slows down and stops. This paper makes that case: at the end of the day, everything is still based on an auto-regressive LLM with no guarantees, as Yann LeCun has been saying for a long time.
¹ A* would also hit memory limitations if the tower has too many disks, because the state space grows exponentially with the number of disks (3^n states for n disks, since each disk can sit on any of the three pegs and the stacking order is forced). However, it is always possible to limit the search depth in each round of reasoning and generate a suboptimal solution: not the shortest one, but one that still reaches the goal state. A sketch of this idea follows.
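As a toy illustration, here is a minimal Python sketch of A* on the Tower of Hanoi with an optional per-round depth cutoff: when the cutoff triggers, the search returns the most promising partial plan instead of the optimal one, and an outer loop could chain such rounds together. The state encoding, the heuristic, and the `max_depth` parameter are illustrative choices of this sketch, not anything taken from the Apple paper.

```python
import heapq
from itertools import count

def neighbors(state):
    """Yield (move, next_state) for all legal single-disk moves.
    A state is a tuple of pegs; each peg is a tuple of disks, top disk last."""
    for i, src in enumerate(state):
        for j, dst in enumerate(state):
            if src and i != j and (not dst or dst[-1] > src[-1]):
                pegs = [list(p) for p in state]
                pegs[j].append(pegs[i].pop())
                yield (i, j), tuple(tuple(p) for p in pegs)

def h(state):
    """Admissible heuristic: every disk not yet on the target (last) peg
    must move at least once."""
    return sum(len(p) for p in state[:-1])

def a_star_round(start, max_depth=None):
    """A* search from `start`. With `max_depth` set, one 'round of reasoning'
    is cut off at that depth and the best partial plan found is returned."""
    tie = count()  # tie-breaker so heapq never has to compare states
    frontier = [(h(start), 0, next(tie), start, [])]
    best_g = {start: 0}
    best_partial = (h(start), [])
    while frontier:
        f, g, _, state, path = heapq.heappop(frontier)
        if h(state) == 0:
            return path, True                # goal reached: optimal plan
        if h(state) < best_partial[0]:
            best_partial = (h(state), path)  # remember most promising node so far
        if max_depth is not None and g >= max_depth:
            continue                         # depth budget spent on this branch
        for move, nxt in neighbors(state):
            if nxt not in best_g or g + 1 < best_g[nxt]:
                best_g[nxt] = g + 1
                heapq.heappush(frontier,
                               (g + 1 + h(nxt), g + 1, next(tie), nxt, path + [move]))
    return best_partial[1], False            # budget exhausted: partial plan

n = 3
start = (tuple(range(n, 0, -1)), (), ())  # all disks on peg 0, largest at the bottom
plan, solved = a_star_round(start)
print(solved, len(plan))  # True 7 -> the optimal 2**n - 1 moves
```

With a small `max_depth`, `a_star_round` trades optimality for bounded work: it hands back a valid prefix of moves toward the goal, which is exactly the kind of good-enough, non-shortest solution the footnote describes.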