By Mark Pesce — 21 Jun 2026

Quacks, Ergo Duck

Mark Pesce · University of Sydney · June 2026

tl;dr The failure of AI evaluations is itself the proof of the existence of AGI. AI evals are becoming intractable for the same reasons that measuring human general intelligence has been intractable for 150 years: the constructs resist decomposition, the benchmarks saturate, and the goalposts move. The reasons are the same because the thing being measured is the same. The difficulty is the evidence.

The reason AI evals are becoming intractable is the same reason human intelligence testing has been intractable for 150 years. The constructs resist decomposition into measurable components. The benchmarks saturate because the capability being tested is too general to be captured by any specific test. The goalposts move because the thing being measured keeps exceeding the frame of measurement.

AI evaluation is hard for exactly the same reasons that measuring human general intelligence is hard, because in both cases the thing being measured is general intelligence.

The difficulty is itself the evidence. We do not need a theoretical proof that artificial general intelligence has arrived. The practical failure of our evaluation instruments tells us. When testing AI becomes as hard as testing humans, and hard for the same reasons, the question has already been answered.

If it quacks in practice, it's a duck in principle.

Acknowledgements

This paper emerged from deep discussions with both John Allsopp and Alan Eyzaguirre, and was drafted by Claude Cowork from my extensive notes. I remain responsible for any errors that may have crept in.
