Sunday, July 27, 2025

The classic phone book

The classic example I use to explain data structures and running orders is getting a number from a phone book. For those who have never seen a phone book, it's just a, fairly big, collection of names in alphabetical order together with the phone number of that person.

A phone book is essentially a lookup from a name (key) to a phone number (value.) A very common operation in software. In many cases, software will just linearly iterate over the records until the key is hit to return the matching value. In Linq in C#:

records.First(r => r.Name == name).Phone

This is what I would call a sub-human approach. No human has ever looked through a phone book linearly from the start until coming across the target name. Even though the computer might still be faster using this approach, the approach itself is less efficient.

The human way to find a name in a phone book relies on the fact that the names are ordered. You'd open the phone book in some location to determine whether the target name is in front or behind that location. This reduces the relevant part of the phone book and you can continue doing this until you've found the correct location of the target name. This comes down to a binary search and this would be on par with the human approach.

Using a hash table is an even better approach for this problem and allows the time needed for the lookup to be independent from the size of the phone book. It's quite rare to have an algorithm with a running time that is not related to the size of the input.

Both indexing options (binary tree or hash) are very common in databases and are also generally available in programming languages. It's just a matter of realizing that it would be beneficial to use it. Again, even iterating linearly over a dataset the size of a typical phone book can be done with a computer quite fast.

So, back to running orders. The linear iterations option is denoted O(N), meaning that the time doubles every time the phone book doubles in size. The binary search is denoted O(log N), meaning that the time increments by the same amount every time the phone book doubles in size. The hash table option is O(1), meaning that time stays the same every time the phone book doubles in size.

Quite a difference, but I mainly find it interesting that the typical human behavior gravitates to the middle option, while it's quite common to gravitate to the slower option when implemented in software. Even though an even faster option is readily available.

Of course, for very thin phone books, the difference might be neglectable and then its easy to favor the cleaner notation of a linear search. And it takes time to create the data structure, so it's only useful when used often enough.

No comments:

Post a Comment