Powers of two


Bram Cohen, inventor of BitTorrent, poses a cheeky question for job applicants:

What is the exponent of the largest power of two whose base seven representation doesn’t contain three zeros in a row?

You might like to think about this for a bit before reading my solution.

At first sight there seems to be no reason why it should have an answer at all: maybe there are infinitely many powers of two with the given property. And a little bit of experimentation shows no discernable pattern to the base-7 representations of powers of 2: if there is any pattern it’s going to need a lot of high-powered maths to discern it.

But in a form of mathematical judo, we can use the very difficulty of the problem to make it easy (under reasonable assumptions). Suppose that the base-7 representation of \(2^n\) is a random sequence of digits (since we know no reason why it isn’t). Then we can work out the probability \(P_n\) that a random sequence of length \(n\) doesn’t contain three zeros in a row. (Note that the probability that a base-7 number with \(n\) digits doesn’t contain three zeros in a row is just \(P_n−1\) since the first digit is not allowed to be zero.)

The small cases are easy: $$ P_0 = P_1 = P_2 = 1. $$ And for longer sequences, $$ P_n = {6\over 7} P_{n−1} + {6 \over 49} P_{n-2} + {6 \over 343} P_{n-3} $$ since the only ways the sequence can avoid three zeros in a row is if it looks like [1-6]R, 0[1-6]R, or 00[1-6]R (with the remainder R also lacking three zeros in a row).

Next, we can represent the sequence \(P_n\) as a generating function \(G(x)\)—this is a formal power series where the coefficient of \(x^n\) is \(P_n\). A little bit of algebra finds $$ G(x) = 1 + {1 \over 7} x + {1 \over 49} x^2 + {6 \over 7} x G(x) + {6 \over 49} x^2 G(x) + {6 \over 343} x^3 G(x) $$ and so $$ G(x) = {343 + 49 x + 7 x^2 \over 343 - 294 x - 42 x^2 - 6 x^3}. $$ If we evaluate the generating function at \(x = 1\), we get the sum of \(P_n\) for all \(n\), which is the expected number of sequences without three zeros in a row in an infinite list of random sequences of all lengths. Now \(G(1) = 399\), so we expect to find only about \(399\) sequences without three zeros in a row in our infinite list. And that means that with probability 1 there’s a last such sequence!

In the list of powers of 2, we see numbers of length \(n\) on average about \(\log_2 7\) times for each \(n\), so we expect to see about 1120 powers of 2 without three zeros in a row. (I make it 1084 in practice, so this estimate is pretty good.)

We could derive an exact formula for \(P_n\) by doing a lot of manipulations of partial fractions. But it’s quicker to note by a bit of calculation that the ratio \(P_{n+1} \over P_n\) quickly converges on \(ρ ≈ 0.9974820691562843\), so \(P_n ≈ ρ^{n-2}\).

So how far do we need to search before we can quit with a reasonably high assurance of having found the answer? Well, suppose we decided to check all powers of 2 with length \(n\) such that \(P_{n−1} > ε = {1 \over 1000000}\). Then we’d have to check powers of 2 with length up to \({\log ε \over \log ρ} + 3 ≈ 5483\), i.e. up to \(2^{15393}\).

After all that, the actual computation is pretty straightforward. No doubt there are more efficient ways to do it using tools like matlab or arbitrary-precision maths libraries like gmp. But for sheer programmer laziness I think you can’t beat this short bash script:

$ for ((X=15393; X>0; X--)); do
>     dc -e "7 o 2 $X ^ p" | tr -cd 0-6 | grep -qv 000 && break
> done; echo $X

Update . A better, but much more complicated, way to choose a stopping point for the search would be to find the point where the probability of finding any more solutions is negligible. This can be done by calculating the probability that there are no more solutions (which I’ll call \(Q_n\)) which is given by the infinite product $$ Q_n = (1 − P_n)(1 − P_{n+1})(1 − P_{n+2})\ldots $$ Since \(P_n\) tends to \(ρ^{n−2}\), this product is $$ Q_n = (1 − ρ^{n−2})(1 − ρ^{n−1})(1 − ρ^n)\ldots $$ which can be rewritten as $$ (1 − ρ^1)(1 − ρ^2)\ldots \over (1 − ρ^1)(1 − ρ^2)\ldots(1 − ρ^{n−3}) $$ where the infinite product on top of the fraction is the Euler function \(φ(ρ)\), which has a nice power series in terms of pentagonal powers.

Update . There’s a recent job posting with this question on the Python Job Board which asks applicants to “Use python to solve the problem.” This is not going to be a one-liner because Python doesn’t have built-in radix conversion. Which would make it tempting to do various optimizations, e.g. doing the arithmetic directly in base 7, or stopping the radix conversion immediately the pattern 000 is found in the base-7 expansion, or skipping the next power of two if 000abc… is found with abc… ≤ 333333…, skipping the next two powers of two if 000abc… is found with abc… ≤ 151515…, and so on.

P.S. If you’re using this journal entry to help with your job application, you should mention that you did so. It’s almost always a good idea to search the literature (or the web) to find a solution to a problem and I reckon that Bram Cohen, being a practical sort of person, would approve.