Thursday, July 31, 2025

Hanoi Turned Upside Down – O’Reilly

Prompted partly by Apple’s paper about the limits of large language models (“The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity”), I spent some time playing with Tower of Hanoi. It’s a problem I solved some 50 years ago when I was in college, and I haven’t felt the desire or the need to revisit it since. Now, of course, “We Can Haz AI,” and all that means. Naturally, I didn’t want to write the code myself. I confess, I don’t like recursive solutions. But there was Qwen3-30B, a “reasoning model” with 30 billion parameters that I can run on my laptop. I had little doubt that Qwen could generate a good Tower program, but I thought it would be fun to see what happened.

First, I asked Qwen if it was familiar with the Tower of Hanoi problem. Of course it was. After it explained the game, I asked it to write a Python program to solve it, with the number of disks taken from the command line. Fine—the result looks a lot like the program I remember writing in college (except that was way, way before Python—I think I used a dialect of PL/1). I ran it, and it worked perfectly.
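For reference, here’s a minimal sketch of the kind of recursive solution Qwen was asked for (not the program it actually produced), taking the number of disks from the command line and printing the move list:

```python
import sys

def hanoi(n, source, target, spare, moves):
    """Recursively move n disks from source to target, using spare as scratch."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)  # clear the way
    moves.append((source, target))              # move the largest remaining disk
    hanoi(n - 1, spare, target, source, moves)  # stack the rest back on top of it

if __name__ == "__main__":
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 3
    moves = []
    hanoi(n, "A", "C", "B", moves)
    for i, (src, dst) in enumerate(moves, 1):
        print(f"Move {i}: {src} -> {dst}")
    print(f"{len(moves)} moves total")  # always 2**n - 1
```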

The output was a bit awkward (just a list of moves), so I asked it to animate it on the terminal. The terminal animation wasn’t really satisfactory, so after a couple of tries, I asked it to try a graphical animation. I didn’t give it any more information than that. It generated another program, using Python’s tkinter library. And again, this worked perfectly. It generated a nice visualization—except that when I watched the animation, I noticed that it had solved the problem upside down! Large disks were on top of smaller disks, not vice versa. I want to be clear—the solution was completely correct; in addition to inverting the towers, it inverted the rule about moving disks, so that it was never putting a smaller disk on top of a larger one. If you stacked the disks in a pyramid (the “normal” way) and made the same moves, you’d get the correct result. Symmetry FTW.
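Qwen’s tkinter program isn’t reproduced here, but a minimal sketch of how such an animation might look is below, with three canvas pegs, arbitrary layout constants, and the same recursive solver driving the moves:

```python
import sys
import tkinter as tk

def hanoi(n, source, target, spare, moves):
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)
    moves.append((source, target))
    hanoi(n - 1, spare, target, source, moves)

class HanoiAnimation:
    PEG_X = {0: 120, 1: 320, 2: 520}  # hypothetical layout constants
    BASE_Y = 320
    DISK_H = 20

    def __init__(self, root, n):
        self.canvas = tk.Canvas(root, width=640, height=360, bg="white")
        self.canvas.pack()
        for x in self.PEG_X.values():  # draw the three pegs
            self.canvas.create_rectangle(x - 4, 120, x + 4, self.BASE_Y, fill="black")
        self.pegs = [[], [], []]  # disk sizes per peg, bottom first
        self.items = {}           # (peg, level) -> canvas rectangle id
        for size in range(n, 0, -1):  # largest disk at the bottom of peg 0
            self.push(0, size)
        moves = []
        hanoi(n, 0, 2, 1, moves)
        self.moves = iter(moves)
        root.after(500, self.step)

    def push(self, peg, size):
        level = len(self.pegs[peg])
        x = self.PEG_X[peg]
        y = self.BASE_Y - level * self.DISK_H
        half = 10 + size * 12  # width proportional to disk size
        item = self.canvas.create_rectangle(x - half, y - self.DISK_H, x + half, y,
                                            fill="steelblue")
        self.pegs[peg].append(size)
        self.items[(peg, level)] = item

    def pop(self, peg):
        level = len(self.pegs[peg]) - 1
        size = self.pegs[peg].pop()
        self.canvas.delete(self.items.pop((peg, level)))
        return size

    def step(self):
        try:
            src, dst = next(self.moves)
        except StopIteration:
            return
        self.push(dst, self.pop(src))   # redraw the moved disk on its new peg
        self.canvas.after(500, self.step)

if __name__ == "__main__":
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 4
    root = tk.Tk()
    root.title("Tower of Hanoi")
    app = HanoiAnimation(root, n)
    root.mainloop()
```

Seed the pegs with the smallest disk at the bottom instead, and the identical move sequence plays out as the inverted animation described above; the symmetry means the move list itself never changes.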

So I told Qwen that the solution was upside down and asked it to fix it. It thought for a long time and eventually told me that I must be looking at the visualization the wrong way. Perhaps it thought I should stand on my head? Proving, if nothing else, that LLMs can be assholes too. Just like 10x programmers. Maybe that’s an argument for AGI?

Seriously, there’s a point here. It’s really important to research the limits of artificial intelligence. It’s definitely interesting that reasoning LLMs tended to abandon problems that required too much reasoning and were most successful at problems that only required a moderate reasoning budget. Interesting, but is that surprising? Very hard problems are very hard problems for a reason: They’re very hard. And most humans behave the same way: We give up (or look up the answer) when confronted with a problem too hard for us to solve.

But we must also think about what we mean by “reasoning.” I had little doubt that Qwen could solve Tower of Hanoi. After all, solutions must be in hundreds of GitHub repos, Stack Overflow questions, and online tutorials. Do I, as a user, care the least little bit whether Qwen looks up the solution in an external source? No, I don’t, as long as the output is correct. Do I think this means that Qwen isn’t “reasoning”? Ignoring all the anthropomorphism that we’re stuck with, no. If a reasonable and reasoning human is asked to solve a difficult problem, what do we do? We try to look up a process for solving the problem. We verify that the process is correct. And we use that process in our solution. If computers are relevant, we’ll use them, rather than solving the problem with pencil and paper. Why should we expect anything different from LLMs? If someone told me that I had to solve Tower of Hanoi with 15 disks (32,767 moves), I’m sure I’d get lost somewhere between the beginning and the end, even though I know the algorithm. But I wouldn’t even think of listing the moves by hand; I’d write a program (like the one Qwen generated) and have it dump out the moves. Laziness is a virtue—that’s something Larry Wall (creator of Perl) taught us. That’s reasoning—it’s as much about looking for the easy solution as it is about doing the hard work.

A blog post I read recently reported something similar. Someone asked OpenAI’s o3 to solve a classic chess problem by Paul Morphy (probably the greatest chess player of the 19th century). The AI realized that its attempts to solve the problem were incorrect, so it looked up the answer online, used that as its answer, and gave a good explanation of why the answer was correct. That’s a perfectly reasonable way to solve the problem. The LLM experiences no pleasure, no validation, in solving a difficult chess problem; it doesn’t feel a sense of accomplishment. It’s just supplying an answer. While it’s not the kind of reasoning that AI researchers want to see, looking up the answer online and explaining why the answer is correct is a good demonstration of human-like reasoning. Maybe this isn’t “reasoning” from a researcher’s perspective, but it’s certainly problem-solving. It represents a chain of thought in which the model decides that it can’t solve the problem on its own, so it looks up the answer online. And when I’m using AI, problem-solving is what I’m after.

I want to make it clear that I’m not a convert to the cult of AGI. I don’t consider myself a skeptic either; I’m a nonbeliever, and that’s different. We can’t talk about general intelligence meaningfully if we can’t define what “intelligence” means. The hegemony of the technorati has us chasing after problem-solving metrics, as if “intelligence” could be represented by a number. It’s all Asimov until you have to run benchmarks—then it’s reduced to numbers. If we know anything about intelligence, we know it’s not represented by a vector of benchmark results testing the ability to solve hard problems.

But if AI isn’t the embodiment of some kind of undefinable intelligence, it is still the greatest engineering project of the 21st century. The ability to synthesize human language correctly is a major achievement, as is the ability to emulate human reasoning—and “emulation” is a fair description of what it’s doing. AI’s detractors ignore—bizarrely, in my opinion—its tremendous utility, as if citing examples where AI generates incorrect or grossly inappropriate output means that it’s useless. That isn’t the case—but it does require thinking carefully about AI’s limitations. Programming with AI assistance will certainly require more attention to debugging, testing, and software design—all themes that we’ve been watching carefully over the past few years, and that we’re talking about in our AI Codecon conferences. Applications like detecting fraud in welfare applications may need to be scrapped or put on hold, as the city of Amsterdam discovered, until we can build AI systems that are free from bias. Building bias-free systems is likely to be much harder than solving difficult problems in mathematics. It’s a problem that may not be solvable—we humans certainly haven’t solved it. Either worrying about or breathlessly anticipating AGI accomplishes little, aside from diverting attention away from both useful applications of AI and real harms caused by AI.

