
Some really serious old research

The Numbers that Shape our World

I wrote a short script that searched the Alta Vista search engine for each number between 11 and 1000. (For some reason it was not possible to search for the numbers 1-10.) For each number I recorded the number of hits Alta Vista reported, and the results held a few surprises. I got the following diagram.

It was perhaps not so surprising that lower numbers are more popular than higher ones, or that multiples of five, ten, twenty-five and one hundred were vastly overrepresented. What I had not expected, however, was that at each round hundred the hit count jumped up and then fell off within that hundred, only to jump up again at the next. As Golan Levin pointed out to me, this effect can be explained by Benford's law.

What also surprised me at first was the distinct difference between the groups of numbers 11-31, 32-60 and 61-94, which is easier to see in a close-up of the numbers 11-100. I now believe this is explained by date and time strings, where days are in the range 1-31 and seconds and minutes in the range 0-59. (Credit to Mats Wicksell for this suggestion.)

Then there is the issue of individual numbers that are overrepresented. As seen in the first diagram, numbers divisible by 5 are in general overrepresented, so I filtered all of those out to make the other overrepresented numbers easier to identify. The result is the following diagram, where I have annotated some of the numbers that stand out; each spike marks an overrepresented number. Some of these I can understand. The powers of two 64, 128, 256 and 512 turn up in most computer-related contexts. The number 404 is the code for the ubiquitous "Page not found" error message, and 877 and 888 are US area codes for toll-free numbers (which also explains why 800 is the most common round hundred after 100). Some have commercial roots, like the 386 and 486 CPUs and Levi's 501 jeans.
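For reference, the standard form of Benford's law says that the leading significant digits n of many naturally occurring numbers appear with probability log10(1 + 1/n), so a first digit of 1 appears about 30% of the time and a first digit of 9 less than 5% of the time. The Python sketch below simply evaluates that formula; the helper name `benford_prob` is hypothetical, and the code illustrates the textbook law rather than modelling the Alta Vista counts or showing exactly how the law produces the jump at each hundred.

```python
import math


def benford_prob(leading: int) -> float:
    """Benford's law: probability that a number's leading significant
    digits equal `leading` (e.g. 2 for a first digit of 2, or 250 for
    first three digits 2, 5, 0). Hypothetical helper name."""
    if leading < 1:
        raise ValueError("leading digits must form a positive integer")
    return math.log10(1 + 1 / leading)


# First-digit probabilities for d = 1..9; the sum telescopes to log10(10) = 1.
first_digit = {d: benford_prob(d) for d in range(1, 10)}
for d, p in first_digit.items():
    print(f"{d}: {p:.3f}")

# The probability of the leading three digits keeps falling across a
# block such as 200..299.
print(f"200: {benford_prob(200):.5f}  299: {benford_prob(299):.5f}")
```

The same formula covers any number of leading digits, which is why it says something about a whole block like 200-299 and not just about first digits.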
Windows 95/98 probably contributes to the large number of hits in the range 95-99 (even more than for 100), but mostly I think those numbers turn up in dates as two-digit years; the 95-99 peak should hence reflect the age and growth of the Internet. (This is easier to see in the second figure above.) Some numbers just look striking, like 333 and 999, and are perhaps common because of that. (But then why not 444 and the like?) Others are totally puzzling to me. Why, for instance, are 152, 163, 301, 541, 624, 672, 703 and 972 overrepresented? If anyone has an idea, I would be happy to hear it. I had a hunch that some pop-culture numbers like 187 (the California police code for homicide), 242 (the band Front 242, and the UN resolution) and 666 (the number of the beast) would be common, but this turned out to be wrong.
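The date-and-time explanation, together with two-digit years, can be made concrete by listing which fields a given number could fill. The Python sketch below assumes only the ranges named in the text (days 1-31, minutes and seconds 0-59) plus two-digit years 0-99; the function `date_time_roles` is hypothetical and ignores every other date or time format.

```python
def date_time_roles(n: int) -> list[str]:
    """List the date/time fields that the number n could fill.

    Hypothetical helper; assumed ranges: day 1-31, minute and second 0-59,
    two-digit year 0-99.
    """
    fields = {
        "day": range(1, 32),
        "minute": range(0, 60),
        "second": range(0, 60),
        "two-digit year": range(0, 100),
    }
    # Dicts keep insertion order, so fields are listed in the order above.
    return [name for name, valid in fields.items() if n in valid]


for n in (20, 45, 75):
    print(n, date_time_roles(n))
# 20 ['day', 'minute', 'second', 'two-digit year']
# 45 ['minute', 'second', 'two-digit year']
# 75 ['two-digit year']
```

Under these assumptions, numbers in 11-31 fit four fields, those in 32-59 three, and those from 60 upwards only the year, which roughly matches the three groups seen in the close-up of 11-100.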

Close-ups of this last filtered diagram are here:

Numbers 11-100  Numbers 101-200  Numbers 201-300  Numbers 301-400  Numbers 401-500 
Numbers 501-600  Numbers 601-700  Numbers 701-800  Numbers 801-900  Numbers 901-1000 

Finally, the raw data is available here.
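With the raw data in hand, the filtering step can be reproduced along these lines. This is a minimal Python sketch under stated assumptions: the hit counts are loaded into a dict mapping each number to its count, and a number is flagged as a spike if its count exceeds twice the median of its neighbours (multiples of 5 excluded) within five places. The function name `flag_spikes`, the window and the factor are hypothetical choices, not the procedure behind the diagrams, and the toy data is synthetic.

```python
from statistics import median


def flag_spikes(hits: dict[int, int], window: int = 5, factor: float = 2.0) -> list[int]:
    """Return numbers (not divisible by 5) whose hit count exceeds `factor`
    times the median count of their neighbours within +/- `window`.

    Hypothetical helper: `window` and `factor` are illustrative choices.
    """
    # Drop multiples of 5, which are overrepresented across the board.
    kept = {n: h for n, h in hits.items() if n % 5 != 0}
    spikes = []
    for n, h in kept.items():
        # Neighbouring counts, skipping n itself, multiples of 5 and missing numbers.
        neighbours = [kept[m] for m in range(n - window, n + window + 1)
                      if m != n and m in kept]
        if neighbours and h > factor * median(neighbours):
            spikes.append(n)
    return sorted(spikes)


# Synthetic toy data (NOT the Alta Vista counts): a smooth ~1/n decay
# with two artificially boosted numbers.
toy = {n: 100_000 // n for n in range(11, 1001)}
toy[404] *= 10
toy[512] *= 10
print(flag_spikes(toy))  # [404, 512]
```

Using the median of the neighbours rather than the mean keeps one large spike from inflating the baseline of the numbers next to it.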

The experiments above were carried out in September 2000. I was recently made aware of a very similar but more ambitious website, The Secret Lives of Numbers, by Golan Levin et al., launched in 2002 and based on data collected as early as 1997.

Copyright (c) Olof Runborg, 2002.

Latest update 2 Jul. 2002 

No more links. You've reached the end of Webworld.
Don't fall over the edge.

