Cantor set

Consider a line segment of unit length. Remove its middle third. Now remove the middle thirds from the remaining two segments. Now remove the middle thirds from the remaining four segments. Now remove the middle thirds from the remaining eight segments. Now remove…well you get the idea. If you could continue this construction through infinitely many steps, what would you have left?

Figure 1: The Cantor Set.

What remains after infinitely many steps is a remarkable subset of the real numbers called the Cantor set, or “Cantor’s Dust.”

At first glance one may reasonably wonder if there is anything left. After all, the lengths of the intervals we removed all add up to 1, exactly the length of the segment we started with:

\[\begin{eqnarray*} \frac{1}{3} + \frac{2}{9} + \frac{4}{27} + \frac{8}{81} + \ldots & = & \frac{1}{3} \sum_{n=0}^\infty \frac{2^n}{3^n} \\ & & \\ & = & \frac{1}{3} \left( \frac{1}{1-\frac{2}{3}} \right) \\ & & \\ & = & \frac{1}{3} \times 3 \\ & & \\ & = & 1 \end{eqnarray*}\]

Yet, remarkably, we can show that there are just as many “points” remaining as there were before we began! This startling fact is only one of the many surprising properties exhibited by the Cantor set.

Before we begin to expose these properties, it is important to be quite precise about this construction. Let us agree that the segments we remove at each stage of the construction are open intervals. That is, in the first step we remove all of the points between \(\frac{1}{3}\) and \(\frac{2}{3}\), but leave the end points, and similarly for each successive stage. A little reflection will convince you that these endpoints we leave behind never get removed, since at each stage we are only removing parts that lie strictly between the endpoints left behind at the previous stage. Thus we see that our Cantor set cannot be empty, since it contains the points 0, 1, \(\frac{1}{3}\), \(\frac{2}{3}\), \(\frac{1}{9}\), \(\frac{2}{9}\), \(\frac{7}{9}\), \(\frac{8}{9}\), \(\frac{1}{27}\), and so on.

But in fact there is much more that remains. To see this, recall that we may choose any number base to represent real numbers. That is, there is nothing necessary or even special about our common use of base ten; we can just as easily represent our numbers using base two, or base three, or any other base:

\[ \begin{array}{rcll} \displaystyle\frac{1}{3} & = & 0.3333\ldots & \,\,\,\mbox{(base 10)} \\ & & & \\ \displaystyle\frac{1}{3} & = & 0.0222\ldots & \,\,\,\mbox{(base 3)} \\ & & & \\ \displaystyle\frac{1}{3} & = & 0.0101\ldots & \,\,\,\mbox{(base 2)} \end{array} \]

When a number is written in base two it is said to be in binary notation, and when it is written in base three it is said to be in ternary notation. Let's focus on the ternary representations of the decimals between 0 and 1. Since, in base three, \(\frac{1}{3}\) is equal to 0.1, and \(\frac{2}{3}\) is equal to 0.2, we see that in the first stage of the construction (when we removed the middle third of the unit interval) we actually removed all of the real numbers whose ternary decimal representation have a 1 in the first decimal place, except for 0.1 itself. (Also, 0.1 is the same as 0.0222… in base three, so if we choose this representation we are removing all the ternary decimals with 1 in the first decimal place.) In the same way, the second stage of the construction removes all those ternary decimals that have a 1 in the second decimal place. The third stage removes those with a 1 in the third decimal place, and so on. (Convince yourself that this is so. Begin by noticing that \(\frac{1}{9}\) is equal to 0.01 and \(\frac{2}{3}\) is equal to 0.02 in base three.)

Thus, after everything has been removed, the numbers that are left—that is, the numbers making up the Cantor set—are precisely those whose ternary decimal representations consist entirely of 0’s and 2’s. What numbers does this include, besides the ones already noted above? How many are there?

Lots. Consider 1/4. This is not one of the endpoints (those all have powers of three in the denominator), but it is not hard to show that 1/4 is in the Cantor set: in ternary notation its decimal expansion is 0.0202…. Since it consists entirely of 0’s and 2’s it was never removed during the construction of the Cantor set, so it’s still there—somewhere.

Asking how many numbers are left, as you can easily see, is to ask how many numbers can be represented in ternary notation with no 1 in any decimal place. But this must be as many as there are real numbers in the unit interval—for consider: we may represent all the real numbers between 0 and 1 in binary, and this is just every possible decimal with a 1 or a 0 in each decimal place. And there can be no more and no less of these than there are ternary decimals with a 0 or a 2 in each decimal place. They correspond in an obvious way.

The conclusion is inescapable: once we remove all those intervals, the number of points remaining is no less than the number we started with.

Let us examine that “correspondence” more closely. The idea is evident: for every number whose ternary decimal expansion consists entirely of 0’s and 2’s, match it with the corresponding number whose binary decimal expansion has 0’s in the same place, and 1’s wherever the ternary number had 2’s. Thus, 1/4 in ternary gets matched with 1/3 in binary:

\[ \begin{array}{rcll} \displaystyle\frac{1}{4} & = & 0.020202\ldots & \,\,\,\mbox{(ternary)} \\ & & & \\ & \Updownarrow & \\ & & & \\ \displaystyle\frac{1}{3} & = & 0.010101\ldots & \,\,\,\mbox{(binary)} \end{array} \]

This is evidently a function that is surjective. Moreover, it is continuous. (Elements that are “close” in the domain are mapped to elements that are “close” in the range.)

We can extend this to a function, called the Cantor function, from the entire unit interval onto itself, by simply agreeing to let its value on the missing intervals be the constant values which equal the values of the original function on the endpoints of those intervals. For example, the Cantor function will map each point in the first middle-third interval (1/3, 2/3) to 1/2, the value of the original function on the points 1/3 and 2/3. (Recall that 1/3 has ternary representation 0.0222... and 2/3 has ternary representation 0.2, which map to 0.0111... and 0.1 respectively, and these both represent the number 1/2 in binary.) Here is the graph:

Figure 2: The Devil’s Staircase

The flat parts are the images of all of the “middle thirds,” and these are all connected by the images of the Cantor set itself. This construction has been called the “Devil's Staircase” since it has infinitely many “steps.”

A few more analytical tidbits: Since each interval removed was open, and there were only countably many of them, their union is also open. Thus, the Cantor set (which is the complement of this union) is closed. That is, it contains all of its accumulation points. Moreover, every point of the Cantor set is an accumulation point, since within any neighborhood of a number whose ternary expansion consists entirely of 0’s and 2’s one may find other such numbers. Consequently, the Cantor set is a perfect set in the topologist’s sense. Finally, since any open neighborhood of any point of the Cantor set contains an open set which is disjoint from the Cantor set, we have that the Cantor set is nowhere dense. Altogether a remarkable set.

Oh, it's also a fractal…

Let us again visualize the construction:

Figure 3: The Cantor Set.

Although the Cantor set itself is to be thought of as the “final row” in this picture, the picture considered altogether is very suggestive. Notice that at each stage the picture is “doubled” into two copies which precisely resemble the whole, but which at each stage become two-thirds smaller.

Together these properties—self-similarity at every scale over a uniform reduction of scale—qualify the Cantor set as a fractal with Hausdorf dimension given by:

\[\frac{\log 2}{\log 3} = 0.630929753\ldots \]

The Cantor set is an instructively simple example of a fractal, demonstrating that our geometrical intuitions about space (even such simple spaces as the unit interval) may draw us, by way of the mathematical imagination, into revelations of deep and even startling structure.

Home