Let's take a look at the following pseudo-code to see why locality of reference works (see How C Programming Works to really get into it):
Output to screen « Enter a number between 1 and 100 »
Read input from user
Put value from user in variable X
Put value 100 in variable Y
Put value 1 in variable Z
Loop Y number of time
Divide Z by X
If the remainder of the division = 0
then output « Z is a multiple of X »
Add 1 to Z
Return to loop
End
This small program asks the user to enter a number between 1 and 100. It reads the value entered by the user. Then, the program divides every number between 1 and 100 by the number entered by the user. It checks if the remainder is 0 (modulo division). If so, the program outputs "Z is a multiple of X" (for example, 12 is a multiple of 6), for every number between 1 and 100. Then the program ends.
Even if you don't know much about computer programming, it is easy to understand that in the 11 lines of this program, the loop part (lines 7 to 9) are executed 100 times. All of the other lines are executed only once. Lines 7 to 9 will run significantly faster because of caching.
This program is very small and can easily fit entirely in the smallest of L1 caches, but let's say this program is huge. The result remains the same. When you program, a lot of action takes place inside loops. A word processor spends 95 percent of the time waiting for your input and displaying it on the screen. This part of the word-processor program is in the cache.
This 95%-to-5% ratio (approximately) is what we call the locality of reference, and it's why a cache works so efficiently. This is also why such a small cache can efficiently cache such a large memory system. You can see why it's not worth it to construct a computer with the fastest memory everywhere. We can deliver 95 percent of this effectiveness for a fraction of the cost.
For more information on caching and related topics, check out the links on the next page.