The Python Challenge

,

The Python Challenge by Nadav Samet is a series of riddles and puzzles needing a bit of programming to solve (Python being recommended). These are my notes and solutions, written as I solved the puzzles, then edited into shape (with many dead ends, typos and errors removed). [Notes in square brackets are things I learned later in the challenge.]

Level 0: 0

The first challenge asks for the value of 238. [It’s really just getting us used to putting the answer in the URL.]

>>> print pow(2,38)
274877906944

Level 1: 274877906944

The second challenge redirects us to the URL map.html, which has an encrypted message, and a picture which implies that we’re being asked to decode the message using a Caesar cipher with a shift of two.

>>> import string
>>> l = string.lowercase
>>> t = string.maketrans(l, l[2:] + l[:2])
>>> m = """g fmnc wms bgblr rpylqjyrc gr zw fylb. rfyrq ufyr amknsrcpq ypc dmp. 
bmgle gr gl zw fylb gq glcddgagclr ylb rfyr'q ufw rfgq rcvr gq qm jmle.
sqgle qrpgle.kyicrpylq() gq pcamkkclbcb. lmu ynnjw ml rfc spj."""
>>> print m.translate(t)
i hope you didnt translate it by hand. thats what computers are for.
doing it in by hand is inefficient and that's why this text is so long.
using string.maketrans() is recommended. now apply on the url.
>>> print "map".translate(t)
ocr

Level 2: ocr

This challenge has a picture of a book, and a clue which says, “recognize the characters. maybe they are in the book, but MAYBE they are in the page source”. Sure enough, there’s a big block of characters in the page source, hidden in a comment, together with the instruction, “find rare characters in the mess below:” I get the feeling it’s going to be useful to fetch the page source for these challenges, so let’s have a helper function:

>>> import urllib
>>> def get_challenge(s):
...     return urllib.urlopen('http://www.pythonchallenge.com/pc/' + s).read()
...
>>> src = get_challenge('def/ocr.html')
>>> import re
>>> text = re.compile('<!--((?:[^-]+|-[^-]|--[^>])*)-->', re.S).findall(src)[-1]
>>> counts = {}
>>> for c in text: counts[c] = counts.get(c, 0) + 1
>>> counts
{'\n': 1221, '!': 6079, '#': 6115, '%': 6104, '$': 6046, '&': 6043, ')': 6186, '(': 6154,
'+': 6066, '*': 6034, '@': 6157, '[': 6108, ']': 6152, '_': 6112, '^': 6030, 'a': 1,
'e': 1, 'i': 1, 'l': 1, 'q': 1, 'u': 1, 't': 1, 'y': 1, '{': 6046, '}': 6105}

So the rare characters are alphabetic. Let’s see them in order:

>>> print ''.join(re.findall('[a-z]', text))
equality

Level 3: equality

This challenge has a clue which says, “One small letter, surrounded by EXACTLY three big bodyguards on each of its sides.” And as in level 2, there’s a large block of text hidden in a comment in the page source.

>>> text = get_challenge('def/equality.html')
>>> print ''.join(re.findall('[^A-Z][A-Z]{3}([a-z])[A-Z]{3}[^A-Z]', text))
linkedlist

Level 4: linkedlist

There’s a picture of a sawhorse, and the clue “follow the chain”. Clicking on the picture takes us to linkedlist.php?nothing=12345 which says “and the next nothing is 92512”. Clearly, we’ve got to chase down a sequence of web pages, each of which refers to the next.

>>> def next_page(p):
...     text = urllib.urlopen('http://www.pythonchallenge.com/pc/def/linkedlist.php?'
...                           'nothing=%s' % p).read()
...     m = re.match('and the next nothing is ([0-9]+)', text)
...     if not m: print text
...     return m.group(1)

This function raises an exception when the match fails, because then m is None and m.group(1) raises AttributeError.

>>> p = 12345
>>> next_page(p)
'92512'
>>> for i in range(400): p = next_page(p)
...
Yes. Divide by two and keep going.
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 6, in next_page
AttributeError: 'NoneType' object has no attribute 'group'
>>> p
'92118'
>>> p = int(p) // 2
>>> for i in range(400): p = next_page(p)
...
peak.html
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 6, in next_page
AttributeError: 'NoneType' object has no attribute 'group'

Level 5: peak

The clues “pronounce it” and “peak hell sounds familiar?” mean nothing to me. But the page source mentions a file banner.p. Let’s fetch it and take a look at the beginning.

>>> banner = get_challenge('def/banner.p')
>>> print banner[:60]
(lp0
(lp1
(S' '
p2
I95
tp3
aa(lp4
(g2
I14
tp5
a(S'#'
p6
I5
t

Looks like some kind of streamed data encoding: S' ' suggests the encoding of a (S)tring containing a single space. So “peak hell” means “pickle”.

>>> import pickle
>>> data = pickle.loads(banner)
>>> data
[[(' ', 95)],
 [(' ', 14), ('#', 5), (' ', 70), ('#', 5), (' ', 1)],
 [(' ', 15), ('#', 4), (' ', 71), ('#', 4), (' ', 1)],
 ...]

It’s a list of lists of pairs, each pair being a character and a number. Within each list of pairs, the numbers add up to 95. So it looks like a run-length encoding of a character array.

>>> print '\n'.join([''.join([p[0] * p[1] for p in row]) for row in data])

              #####                                                                      #####
               ####                                                                       ####
               ####                                                                       ####
               ####                                                                       ####
               ####                                                                       ####
               ####                                                                       ####
               ####                                                                       ####
               ####                                                                       ####
      ###      ####   ###         ###       #####   ###    #####   ###          ###       ####
   ###   ##    #### #######     ##  ###      #### #######   #### #######     ###  ###     ####
  ###     ###  #####    ####   ###   ####    #####    ####  #####    ####   ###     ###   ####
 ###           ####     ####   ###    ###    ####     ####  ####     ####  ###      ####  ####
 ###           ####     ####          ###    ####     ####  ####     ####  ###       ###  ####
####           ####     ####     ##   ###    ####     ####  ####     #### ####       ###  ####
####           ####     ####   ##########    ####     ####  ####     #### ##############  ####
####           ####     ####  ###    ####    ####     ####  ####     #### ####            ####
####           ####     #### ####     ###    ####     ####  ####     #### ####            ####
 ###           ####     #### ####     ###    ####     ####  ####     ####  ###            ####
  ###      ##  ####     ####  ###    ####    ####     ####  ####     ####   ###      ##   ####
   ###    ##   ####     ####   ###########   ####     ####  ####     ####    ###    ##    ####
      ###     ######    #####    ##    #### ######    ###########    #####      ###      ######

This is the kind of output you might get from some variety of the Unix banner command, rotated through a quarter turn.

Level 6: channel

This challenge has a picture of a zip, and the clue “now there are pairs”. Does that mean we going to be using the built-in function zip? Or is there going to be a zipfile involved? If so, the relevent file might be channel.zip. So let’s try that.

>>> import zipfile
>>> z = zipfile.ZipFile(urllib.urlopen('http://www.pythonchallenge.com/pc/def/channel.zip'))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "python2.3/zipfile.py", line 210, in __init__
    self._GetContents()
  File "python2.3/zipfile.py", line 230, in _GetContents
    self._RealGetContents()
  File "python2.3/zipfile.py", line 240, in _RealGetContents
    endrec = _EndRecData(fp)
  File "python2.3/zipfile.py", line 83, in _EndRecData
    fpin.seek(-22, 2)               # Assume no archive comment.
AttributeError: addinfourl instance has no attribute 'seek'

That’s a bit annoying. But I guess it’s obvious really: the zipfile modules needs to have random access to the individual files archived in the zip, and so the implementation needs to be able to seek, which method is not provided by the urllib file object. However, there’s a module specifically for getting round this problem, StringIO.

>>> import StringIO
>>> z = zipfile.ZipFile(StringIO.StringIO(get_challenge('def/channel.zip')))
>>> z.namelist()
['29.txt', '100.txt', '109.txt', '176.txt', '226.txt', ..., 'readme.txt']
>>> len(z.namelist())
910

That’s a lot of files!

>>> print z.read('readme.txt')
welcome to my zipped list.

hint1: start from 90052
hint2: answer is inside the zip

>>> print z.read('90052.txt')
Next nothing is 94191
>>> def next(p):
...    text = z.read('%s.txt' % p)
...    m = re.match('Next nothing is ([0-9]+)', text)
...    if not m: print text
...    return m.group(1)
...
>>> zpp = []
>>> p = 90052
>>> for i in range(len(z.namelist())):
...     zpp.append(p)
...     p = next(p)
...
Collect the comments.
Traceback (most recent call last):
  File "<stdin>", line 3, in ?
  File "<stdin>", line 6, in next_page
AttributeError: 'NoneType' object has no attribute 'group'

OK, so the answer is in the zipfile comments, presumably in the order we just visited them in.

>>> print ''.join([z.getinfo('%s.txt' % p).comment for p in zpp])
****************************************************************
****************************************************************
**                                                            **
**   OO    OO    XX      YYYY    GG    GG  EEEEEE NN      NN  **
**   OO    OO  XXXXXX   YYYYYY   GG   GG   EEEEEE  NN    NN   **
**   OO    OO XXX  XXX YYY   YY  GG GG     EE       NN  NN    **
**   OOOOOOOO XX    XX YY        GGG       EEEEE     NNNN     **
**   OOOOOOOO XX    XX YY        GGG       EEEEE      NN      **
**   OO    OO XXX  XXX YYY   YY  GG GG     EE         NN      **
**   OO    OO  XXXXXX   YYYYYY   GG   GG   EEEEEE     NN      **
**   OO    OO    XX      YYYY    GG    GG  EEEEEE     NN      **
**                                                            **
****************************************************************
 **************************************************************

hockey.html tells us to look at the letters making up the banner.

Level 7: oxygen

The clue is “smarty”, and there’s a picture with a strip of greyscale squares across the middle. Maybe the answer is encoded in those greyscale squares, with the greyscale level (running from 0–255) encoding a character in ASCII? Let’s try it.

>>> import Image
>>> def get_image(s): return Image.open(StringIO.StringIO(get_challenge(s)))
...
>>> im = get_image('def/oxygen.png')
>>> w,h = im.size
>>> im.getpixel((0,h//2))
(115, 115, 115, 255)
>>> chr(115)
's'

Looks plausible. Let’s guess that each block is 8 pixels wide:

>>> print ''.join([chr(im.getpixel((i, h//2))[0]) for i in range(0,w,8)])
smartguy, yo made i. the nxt leve is [10, 110, 16, 101 103, 14, 105,116, 12]reb

Oops, so near but not quite there. Closer examination reveals that the colour changes every 7 pixels.

>>> print ''.join([chr(im.getpixel((i, h//2))[0]) for i in range(0,w,7)])
smart guy, you made it. the next level is [105, 110, 116, 101, 103, 114, 105, 116, 121]pe_
>>> print ''.join(map(chr, [105, 110, 116, 101, 103, 114, 105, 116, 121]))
integrity

Level 8: integrity

There’s a picture of a bee and the clues “working hard?” and “Where is the missing link?”, and clicking on the picture takes us to a password-protected page for area “inflate”, suggesting that compression may involved (“deflate” is the name of a popular data compression format, used in PKZIP and gzip and other programs and standardized in RFC 1951; “inflate” is the name of corresponding decompression algorithm). The source for the page contains data for “un” (username) and “pw” (password):

>>> un = 'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084'
>>> pw = 'BZh91AY&SY\x94$|\x0e\x00\x00\x00\x81\x00\x03$ \x00!\x9ah3M\x13<]\xc9\x14\xe1BBP\x91\xf08'

The letters “BZ” in the data are the magic number for the bzip2 compressed file format (which, ironically, doesn’t use the “inflate” decompression algorithm).

>>> import bz2
>>> bz2.BZ2Decompressor().decompress(un)
'huge'
>>> bz2.BZ2Decompressor().decompress(pw)
'file'

Level 9: good

The clues for this challenge are “connect the dots” and “first+second=?” and we are given long lists of numbers for “first” and “second”. Maybe the numbers are the coordinates of dots in a join-the-dots puzzle, as suggested by the picture? Let’s see.

>>> first = [146,399,...]
>>> len(first), min(first), max(first)
(442, 77, 403)
>>> second = [156,141,...]
>>> len(second), min(second), max(second)
(112, 77, 220)

The values in each list are small numbers in the range [77,403]. It seems unlikely that “first” is a set of x-coordinates and “second” a set of corresponding y-coordinates, because they are different lengths. So maybe we have two lines here, each given as [x0, y0, x1, y1, ...]. Let’s try it. (And we get to use the zip that we didn’t use in challenge 6.)

>>> im = Image.new('1', (500,500), 1)
>>> import ImageDraw
>>> draw = ImageDraw.Draw(im)
>>> draw.line(zip(first[0::2], first[1::2]))
>>> draw.line(zip(second[0::2], second[1::2]))
>>> im.save('first+second.png')

It’s a cow:

Joining the decoded dots results in a picture of a bull.

No, apparently it’s a bull.

Level 10: bull

There’s a picture of a bull and the clue “len(a[30]) = ?”. Clicking on the bull reveals “a = [1, 11, 21, 1211, 111221,” Now, the way to solve almost any sequence puzzle is to type the numbers into the Online Encyclopedia of Integer Sequences. (OEIS) This one is A005150, the well-known “look and say sequence”. There’s Python code in the OEIS for computing the values of the sequence, but it’s simpler to follow the cross-reference for the length of the nth term to find the sequence A005341, and the list gives the length of the 31st term as 5808.

Level 11: 5808

There’s a clue “odd even”, and a JPEG that’s rather blurry: it looks like it’s been passed through some filter. Maybe the answer is encoded in the low-order bit or bits? But on the other hand it’s a JPEG. If the challenge involves twiddling pixels wouldn’t it have been encoded losslessly as a PNG? Still, never mind, let’s give it a go.

>>> im = get_image('return/cave.jpg')
>>> im.size
(640, 480)
>>> im.getpixel((0,0))
(0, 20, 0)
>>> im.getpixel((0,1))
(148, 186, 111)
>>> im.getpixel((0,2))
(0, 20, 0)

I see, the odd pixels have the image in, and the even pixels have something else. Let’s see if we can see what by blanking the odd pixels.

>>> w,h = im.size
>>> for i in range(w):
...     for j in range(h):
...         if (i + j) % 2 == 1:
...             im.putpixel((i,j), 0)
...
>>> im.save('cave2.png')

I find that I have to adjust the angle of my laptop screen in order to see that the pixels with even coordinates show the faint word “evil”:

Pixels with even coordinates only show the faint word “evil”.

Level 12: evil

The picture shows someone dealing cards. And the JPEG has been manipulated in some way. Maybe some kind of shuffling operation has been applied ... hmmm, nothing obvious. Ah, there’s evil2.jpg, which refers me to evil2.gfx. There’s evil3.jpg and the mysterious evil4.jpg too, which isn’t a JPEG at all.

>>> print get_challenge('return/evil4.jpg')
Enter username for inflate at www.pythonchallenge.com: huge
Enter password for huge in inflate at www.pythonchallenge.com: file

Bert is evil! go back!

Hmmm, not sure what this means. [It’s a clue for level 13.] Going back to evil2.gfx, “GFX” is not a well-known file format. What kind of thing is it?

>>> gfx = get_challenge('return/evil2.gfx')
>>> gfx[:25]
'\xff\x89G\x89\xff\xd8PIP\xd8\xffNFN\xff\xe0G8G\xe0\x00\r7\r\x00'
>>> re.sub(r'[^ -~]','?',gfx)
'??G???PIP??NFN??G8G???7??'

It looks like there’s some kind of pattern based on chunks of 5 characters: ?PIP?, ?NFN?, ?G8G?. Hmmm, PNG? In that case, \x89 is familiar, it’s the first byte of a PNG image file, used to ensure that a valid ASCII (or ISO-Latin or UTF-8) file can’t be misinterpreted as a PNG Maybe 5 files have been interleaved. Let’s see what they look like unshuffled.

>>> gfx[0:60:5]
'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01'
>>> gfx[1:60:5]
'\x89PNG\r\n\x1a\n\x00\x00\x00\r'
>>> gfx[2:60:5]
'GIF87a@\x01\xf0\x00\xe7\x00'
>>> gfx[3:60:5]
'\x89PNG\r\n\x1a\n\x00\x00\x00\r'
>>> gfx[4:60:5]
'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01'

Aha, it looks like five graphics files have been shuffled together: two JPEGs, two PNGs and a GIF.

>>> types = ['jpg','png','gif','png','jpg']
>>> for i in range(5): open('evil2%d.%s' % (i, types[i]),'wb').write(gfx[i::5])

The fourth of the five files comes out broken, with a compression encoding error halfway through that scrambles the remainder. The image below shows how it looks in GraphicCoverter.

First of five interleaved images: “dis”.

Second of five interleaved images: “pro”.

Third of five interleaved images: “port”.

Fourth of five interleaved images: “ional”, slightly corrupted.

Fifth of five interleaved images: “ity”, crossed out.

Level 13: disproportional

The clues are “call him” and “phone that evil”. Button number 5 takes us to phonebook.php which has this obscure error message:

faultCode 105 faultString XML error: Invalid document end at line 1, column 1

So we need to look up someone in the phonebook.

>>> print urllib.urlopen('http://www.pythonchallenge.com/pc/phonebook.php?Bert).read()
<?xml version="1.0"?>
<methodResponse>
<fault>
<value>
<struct><member><name>faultCode</name>
<value><int>105</int></value>
</member>
<member>
<name>faultString</name>
<value><string>XML error: Invalid document end at line 1, column 1</string></value>
</member>
</struct>
</value>
</fault>
</methodResponse>

Googling for “xml”, “methodResponse”, and “param”, it seems that this is an XML-RPC response. So presumably we have to interrogate the phonebook using XML-RPC.

>>> import xmlrpclib
>>> phonebook = xmlrpclib.ServerProxy('http://www.pythonchallenge.com/pc/phonebook.php')
>>> phonebook.system.listMethods()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "python2.3/xmlrpclib.py", line 1032, in __call__
    return self.__send(self.__name, args)
  File "python2.3/xmlrpclib.py", line 1319, in __request
    verbose=self.__verbose
  File "python2.3/xmlrpclib.py", line 1083, in request
    return self._parse_response(h.getfile(), sock)
  File "python2.3/xmlrpclib.py", line 1222, in _parse_response
    return u.close()
  File "python2.3/xmlrpclib.py", line 745, in close
    raise Fault(**self._stack[0])
xmlrpclib.Fault: <Fault 105: 'XML error: Invalid document end at line 1, column 1'>

Oh. Not so good. I tried guessing the method—“lookup”, “call”, “phone”, etc—but no luck. I looked at the forums but I didn’t learn anything there. So I’m stuck for now.

Interlude: Comments

This has been enjoyable so far (except for being stuck on challenge 13). I haven’t really had to do any programming though so far: part 4 had a five-line function and part 6 had essentially the same function; otherwise it’s been one-liners. It’s been mostly about guessing where data may be hidden, and recognizing file formats. Which gives me an idea for a different kind of programming challenge...

Back to level 13

It turned out the level wasn’t working because the XML-RPC server was broken. But Mr Samet has fixed it now, so trying again (with tracebacks elided this time):

>>> pprint(phonebook.system.listMethods())
['phone',
 'system.listMethods',
 'system.methodHelp',
 'system.methodSignature',
 'system.multicall',
 'system.getCapabilities']
>>> phonebook.system.methodHelp('phone')
'Returns the phone of a person'
>>> phonebook.system.methodSignature('phone')
[['string', 'string']]
>>> phonebook.phone('Bert')
'555-ITALY'

Level 14: italy

The top picture shows a bread roll in spiral form. The clue says “remember: 100*100 = (100+99+99+98) + (...”. There’s also a second picture, which looks like a 100×100 image with vertical black and white stripes. But on closer inspection, it’s not: the HTML source is lying when it says width="100" height="100". The image is actually 10,000×1. So we presumably have to fold this strip of pixels into a 100×100 image in spiral order.

>>> strip = get_image('return/wire.png')
>>> spiral = Image.new(strip.mode, (100,100), 0)
>>> dirs = [(1,0),(0,1),(-1,0),(0,-1)]
>>> x,y,z = -1,0,0
>>> for i in range(200):
...     d = dirs[i % 4]
...     for j in range(100 - (i + 1) // 2):
...         x += d[0]
...         y += d[1]
...         spiral.putpixel((x,y), strip.getpixel((z,0)))
...         z += 1
...
>>> spiral.save('spiral.png')

The image wound into a spiral reveals a cat named Uzi.

I’m not sure what those red pixels are (misdirection?) Anyway, the cat’s name is uzi.

Level 15: uzi

The picture for this challenge shows a calendar with the partially blanked out year “1•6”. The clues say “he ain’t the youngest, he is the second” and “todo: buy flowers for tomorrow”. The inset calendar for February shows 29 days, so this is a leap year (1996, 1976, 1956, ...) and moreover the year starts on a Thursday, which ought to be enough to identify it.

>>> import datetime
>>> for year in range(1996,1582,-20):
...     if datetime.date(year, 1, 1).weekday() == 3:   # 3=Thursday
...         print year,
...
1976 1756

(Note the end of the search at 1582, which was the first year of the Gregorian calendar.) So, “second youngest” refers to 1756, and “buy flowers for tomorrow” suggests that the answer is something to do with , and the title says “whom?” [sic]. Wikipedia comes to the rescue: it’s Mozart’s birthday.

Level 16: mozart

The title says, “let me get this straight”, and there’s a picture that looks like static, except for many five-pixel-wide magenta bars with one white pixel at each end. A close look suggests there might be one of these per line. Let’s see if that’s right.

>>> im = get_image('return/mozart.gif')
>>> w,h = im.size
>>> magenta = 195   # deduced by looking at the GIF's colour palette in GraphicConverter
>>> for j in range(h):
...     for i in range(w - 5):
...         if im.getpixel((i,j)) == magenta and im.getpixel((i+4,j)) == magenta:
...             bars.append((i,j))
...
>>> bars[:10]
[(429, 0), (500, 1), (312, 2), (107, 3), (105, 4), (611, 5), (478, 6), (500, 7), (337, 8), (282, 9)]
>>> for i in range(h): if bars[i][1] != i: print i,
...
>>>

So there’s one bar per line, but what do we have to do? The clue says, “let me get this straight”. So do I shift the rows horizontally until the bars line up?

>>> shift = Image.new(im.mode, (w * 2, h), 0)
>>> shift.palette = im.palette  # share colour table
>>> for j in range(h):
...     for i in range(w):
...         shift.putpixel((i + w - bars[j][0], j), im.getpixel((i,j)))
...
>>> shift.save('shift.png')

When the magenta bars are lined up, the word “romance” is revealed in the static.

Ah, I see I was supposed to wrap the lines around. So that would be (i + w - bars[j][0]) % w. But this will do.

Level 17: romance

The picture shows cookies, and there’s an inset with the sawhorse picture from level 4. So maybe this is going to involve chasing down a linked list of pages based on information in HTTP cookies? Hmmm, that’s a bit annoying, as cookielib is new in Python 2.4, and I only have Python 2.3. Excuse me while I upgrade. ... That was pretty painless. Go Python!

>>> import urllib2, cookielib
>>> auth_handler = urllib2.HTTPBasicAuthHandler()
>>> auth_handler.add_password('inflate', 'www.pythonchallenge.com', 'huge', 'file')
>>> jar = cookielib.CookieJar()
>>> cookie_handler = urllib2.HTTPCookieProcessor(jar)
>>> opener = urllib2.build_opener(auth_handler, cookie_handler)
>>> opener.open('http://www.pythonchallenge.com/pc/return/romance.html')
>>> list(jar)
[]

Oh. No cookie for me. So does my browser have any cookies from pythonchallenge.com? Yes, there is one with the name “info” and value “you+should+have+followed+busynothing...”. Does that mean that I can go back to the level 4 URL?

>>> print opener.open('http://www.pythonchallenge.com/pc/def/linkedlist.php?busynothing=12345').read()
If you came here from level 4 - go back!
You should follow the obvious chain...

and the next busynothing is 92512

Well, it’ll be slightly disappointing if this level just repeats level 4. Unless ...

>>> list(jar)
[Cookie(version=0, name='info', value='B', port=None, port_specified=False,
domain='.pythonchallenge.com', domain_specified=True, domain_initial_dot=True,
path='/', path_specified=True, secure=False, expires=1179777427, discard=False,
comment=None, comment_url=None, rest={}, rfc2109=False)]

So maybe there’s a message encoded in the cookies we get as we follow the chain?

>>> def next_page(p, pp, message):
...     text = opener.open('http://www.pythonchallenge.com/pc/def/linkedlist.php?busynothing=%s' % p).read()
...     m = re.search('and the next busynothing is ([0-9]+)', text)
...     message.append(list(jar)[0].value)
...     if m: pp.append(m.group(1))
...     else: print text
...     return m.group(1)
...
>>> p = 12345
>>> pp = []
>>> message = []
>>> for i in range(400): p = next_page(p, pp, message)
...
that's it.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 9, in next_page
AttributeError: 'NoneType' object has no attribute 'group'
>>> ''.join(message)
'BZh91AY%26SY%94%3A%E2I%00%00%21%19%80P%81%11%00%AFg%9E%A0+%00hE%3DM%B5%23%D0%D4
%D1%E2%8D%06%A9%FA%26S%D4%D3%21%A1%EAi7h%9B%9A%2B%BF%60%22%C5WX%E1%ADL%80%E8V%3C
%C6%A8%DBH%2632%18%A8x%01%08%21%8DS%0B%C8%AF%96KO%CA2%B0%F1%BD%1Du%A0%86%05%92s
%B0%92%C4Bc%F1w%24S%85%09%09C%AE%24%90'

That looks like a bzip2-compressed datastream that’s been URL-quoted.

>>> bz2.BZ2Decompressor().decompress(urllib.unquote(''.join(message)))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: invalid data stream

Why doesn’t that work? Let’s take a look at the unquoted data:

>>> urllib.unquote(''.join(message))
'BZh91AY&SY\x94:\xe2I\x00\x00!\x19\x80P\x81\x11\x00\xafg\x9e\xa0+\x00hE=M\xb5#
\xd0\xd4\xd1\xe2\x8d\x06\xa9\xfa&S\xd4\xd3!\xa1\xeai7h\x9b\x9a+\xbf`"\xc5WX\xe1
\xadL\x80\xe8V<\xc6\xa8\xdbH&32\x18\xa8x\x01\x08!\x8dS\x0b\xc8\xaf\x96KO\xca2\xb0
\xf1\xbd\x1du\xa0\x86\x05\x92s\xb0\x92\xc4Bc\xf1w$S\x85\t\tC\xae$\x90'

Ah, there’s a plus sign in there. Cunning.

>>> print bz2.BZ2Decompressor().decompress(urllib.unquote_plus(''.join(message)))
is it the 26th already? call his father and inform him that "the flowers are on their way".
he’ll understand.

I wonder how many attempts Samet needed before he found a string that compressed to data with a space character in it? Or was just an accidental difficulty? Anyway, “26th” is a reference from level 15 to Wolfgang Amadeus Mozart. His father was Leopold Mozart. And we have to call him, which suggests we need to look in the phonebook from level 13 again.

>>> phonebook.phone('Leopold')
'555-VIOLIN'

But the level doesn’t seem to be over yet. Instead we have Leopold saying, “it’s me. what do you want?”. So the obvious thing is to try violin.php?the+flowers+are+on+their+way. But no luck. So in keeping with the theme of this level, maybe the flowers need to be in a cookie?

>>> list(jar)[0].value = 'the+flowers+are+on+their+way'
>>> print opener.open('http://www.pythonchallenge.com/pc/stuff/violin.php').read()
<html>
<head>
  <title>it's me. what do you want?</title>
  <link rel="stylesheet" type="text/css" href="../style.css">
</head>
<body>
	<br><br>
	<center><font color="gold">
	<img src="leopold.jpg" border="0">
<br><br>
oh well, don't you dare to forget the balloons.</font>
</body>
</html>

This would have been very tricky indeed if my browser hadn’t kept that cookie for me from the time I was playing level 4.

Level 18: balloons

There’s a JPEG divided into two halves, almost identical except that the second one is darker. The title is, “can you tell the difference?” and there’s a clue, “it is more obvious that what you might think” [sic]. Maybe we have to subtract the colour values of the darker half from the lighter, creating a new image?

>>> im = get_image('return/balloons.jpg')
>>> w,h = im.size
>>> diff = Image.new(im.mode, (w//2, h), 0)
>>> for i in range(w//2):
...     for j in range(h):
...         diff.putpixel((i,j), tuple([x[0]-x[1] for x in zip(im.getpixel((i,j)),
...                                                            im.getpixel((i+w//2,j)))]))
...
>>> diff.save('diff.png')

One swan image subtracted from the other reveals no clue: this is a dead end.

No good. And the individual channels of the image are no good either. It looks like there’s no interesting data in this difference. What about the clue, “it is more obvious that what you might think”? No idea. Let me look at the forums. I see, I’m supposed to guess that I have to visit the page brightness.html. (I’m not doing very well at the guessing parts of the riddles.) The page source refers me to delta.gz, which, when uncompressed, looks like this:

89 50 4e 47 0d 0a 1a 0a 00 00 00 0d 49 48 44 52 00 00   89 50 4e 47 0d 0a 1a 0a 00 00 00 0d 49 48 44 52 00 00
02 8a 00 00 00 c8 08 02 00 00 00 e0 19 57 95 00 00 00   02 8a 00 00 00 c8 08 02 00 00 00 e0 19 57 95 00 00 00
09 70 48 59 73 00 00 0b 13 00 00 0b 13 01 00 9a 9c 18   09 70 48 59 73 00 00 0b 13 00 00 0b 13 01 00 9a 9c 18

Now, 89 50 4e 47 0d 0a 1a 0a is immediately recognizable as the 8-byte PNG signature. So it looks like we have two PNGs hex-encoded side by side.

>>> deltas = open('deltas').read()
>>> lines = deltas.split('\n')
>>> pairs = [(l[:53], l[56:]) for l in lines]
>>> columns = ['\n'.join([p[i] for p in pairs]) for i in range(2)]
>>> import codecs
>>> def unhex(s): return codecs.getdecoder('hex')(re.sub('[^0-9a-fA-F]', '', s))[0]
...
>>> for i in range(2): open('delta%d.png' % i,'wb').write(unhex(columns[i]))

Unfortunately, the images are broken. Here’s how they come out in GraphicConverter:

The left-hand column of data as an image starts OK but then is corrupted.

The right-hand column of data as an image starts OK but then is corrupted.

They start out OK but then they go wrong. Why is that? Hmmm. A more careful look at the two columns shows that they are identical to start with, but then there are interpolated lines in the left-hand column that don’t appear in the right-hand column, shown underlined in the extract below:

...                                                     ...
26 0d 8e d7 29 22 86 e7 2f 75 f9 c8 8e d5 f4 9e 7e 38   26 0d 8e d7 29 22 86 e7 2f 75 f9 c8 8e d5 f4 9e 7e 38
56 3b 44 a4 a0 00 04 d8 04 99 55 d6 c3 af 78 59 73 dd   56 3b 44 a4 a0 00 04 d8 04 99 55 d6 c3 af 78 59 73 dd
89 50 4e 47 0d 0a 1a 0a 00 00 00 0d 49 48 44 52 00 00   72 6c 41 7d 6b f7 bd a4 41 41 65 46 79 46 79 4f 6c d6
72 6c 41 7d 6b f7 bd a4 41 41 65 46 79 46 79 4f 6c d6   bc d6 30 1d b5 2d b5 f1 a6 7f eb 29 9a b6 48 0a 94 29
bc d6 30 1d b5 2d b5 f1 a6 7f eb 29 9a b6 48 0a 94 29   32 02 49 8a 66 60 12 a4 0d ea 25 e6 0a 3a 5e a5 88 18
32 02 49 8a 66 60 12 a4 0d ea 25 e6 0a 3a 5e a5 88 18   9e bf 18 32 ca 0f f4 d1 81 3e 2a 55 d5 fb 12 34 bc 6e
9e bf 18 32 ca 0f f4 d1 81 3e 2a 55 d5 fb 12 34 bc 6e   d1 12 95 43 b8 eb 70 35 a3 8b 78 52 7f 63 48 29 1d a8
01 50 00 00 00 8f 08 06 00 00 00 ac f7 83 97 00 00 00   e1 48 8d 13 ba 21 0b a2 c4 52 92 72 1a e0 57 bc f8 f8
d1 12 95 43 b8 eb 70 35 a3 8b 78 52 7f 63 48 29 1d a8   9b 1e 2d 2f ee 32 08 a4 a0 44 0b a6 a1 b7 a9 79 f0 f0
09 70 48 59 73 00 00 0b 13 00 00 0b 13 01 00 9a 9c 18   01 41 6d 39 95 88 88 18 9e bf 0c 06 6a 74 a4 1f 8e d4
e1 48 8d 13 ba 21 0b a2 c4 52 92 72 1a e0 57 bc f8 f8   e4 c6 05 b5 86 ac 45 bd e2 55 46 79 a9 2a ed e3 45 fe
...                                                     ...

The extra lines also start with the PNG signature. So maybe they encode a second image? A bit later some other lines start to appear in the second column but not in the first. So maybe these encode a third image? I guess this is what difflib is for.

>>> import difflib
>>> column_diffs = list(difflib.Differ().compare(columns[0].splitlines(),
    ...                                          columns[1].splitlines()))
>>> import pprint
>>> pprint.pprint(column_diffs[48:59])
['  26 0d 8e d7 29 22 86 e7 2f 75 f9 c8 8e d5 f4 9e 7e 38',
 '  56 3b 44 a4 a0 00 04 d8 04 99 55 d6 c3 af 78 59 73 dd',
 '- 89 50 4e 47 0d 0a 1a 0a 00 00 00 0d 49 48 44 52 00 00',
 '  72 6c 41 7d 6b f7 bd a4 41 41 65 46 79 46 79 4f 6c d6',
 '  bc d6 30 1d b5 2d b5 f1 a6 7f eb 29 9a b6 48 0a 94 29',
 '  32 02 49 8a 66 60 12 a4 0d ea 25 e6 0a 3a 5e a5 88 18',
 '  9e bf 18 32 ca 0f f4 d1 81 3e 2a 55 d5 fb 12 34 bc 6e',
 '- 01 50 00 00 00 8f 08 06 00 00 00 ac f7 83 97 00 00 00',
 '  d1 12 95 43 b8 eb 70 35 a3 8b 78 52 7f 63 48 29 1d a8',
 '- 09 70 48 59 73 00 00 0b 13 00 00 0b 13 01 00 9a 9c 18',
 '  e1 48 8d 13 ba 21 0b a2 c4 52 92 72 1a e0 57 bc f8 f8']

Let’s gather up the three kinds of lines: identical in the two columns (initial space), in the first but not the second (initial minus) and in the second but not the first (initial plus).

>>> pngs = [''.join(filter(lambda l: l[0] == d, column_diffs)) for d in " -+"]
>>> for i in range(len(pngs)): open('delta%d.png' % (i + 2), 'wb').write(unhex(pngs[i]))

Lines that are identical in the two columns reveal the text “../hex/bin.html”.

Lines in the left-hand column but not in the right reveal the text “fly”.

Lines in the right-hand column but not in the left reveal the text “butter”.

Level 19: bin

The source code contains a base64-encoded WAV audio file which we’re asked to decode and listen to.

>>> bin = get_challenge('hex/bin.html')
Enter username for pluses and minuses at www.pythonchallenge.com: butter
Enter password for butter in pluses and minuses at www.pythonchallenge.com: fly

>>> import base64
>>> india64 = re.search('base64\n\n(.*)\n\n', bin, re.S).group(1)
>>> open('india.wav','wb').write(base64.decodestring(india64))

The static at the beginning and end makes me wonder if some extra decoding step is needed. The obvious next page sorry just says, “what are you apologizing for?” Maybe we need to tell Leopold via a cookie, like we did in level 17?

>>> list(jar)[0].value = 'sorry'
>>> print opener.open('http://www.pythonchallenge.com/pc/stuff/violin.php').read()
<html>
<head>
  <title>it's me. what do you want?</title>
  <link rel="stylesheet" type="text/css" href="../style.css">
</head>
<body>
	<br><br>
	<center><font color="gold">
	<img src="leopold.jpg" border="0">
<br><br>
</font>
</body>
</html>

Are we really expected to e-mail him? Let’s try it. Yes, he e-mails back:

From: leopold.moz@pythonchallenge.com
Subject: Re: sorry
Date:  BDT

Never mind that.

Have you found my broken zip?

md5: bbb8b499a0eef99b52c7f13f4e78c24b

Can you believe what one mistake can lead to?

What’s that all about? I don’t know what “broken zip” can be referring to. [This is the correct MD5 checksum for a corrupted zip file in level 24.] Never mind, back to the audio file; maybe there’s something else hidden in that static. Let’s take a look at the audio data.

>>> import wave
>>> iw = wave.open('india.wav')
>>> iw.getnchannels()
1
>>> iw.getsampwidth()
2
>>> iw.getframerate()
11025
>>> iw_frames = iw.readframes(iw.getnframes())
>>> len(iw_frames)
111576
>>> def hex(s): return codecs.getencoder('hex')(s)[0]
...
>>> hex(iw_frames[:40])
'06400c400b40004003400240044004400240024006400540054004400a400b40094008400f400f40'

There’s clearly some kind of period-two behaviour going on there. In each 16-bit sample the least significant byte appears random but the most significant byte is not (in this short extract it’s always 40). So what if we extract those non-random bytes? (And remembering to halve the framerate since we’re halving the number of samples?

>>> iw2 = wave.open('india2.wav','w')
>>> iw2.setnchannels(1)
>>> iw2.setsampwidth(iw.getsampwidth())
>>> iw2.setframerate(iw.getframerate() // 2)
>>> iw2.writeframes(iaudio[1::2])
>>> iw2.close()

Level 20: idiot2

A picture of a chainlink fence with a sign, “private property beyond this fence”, and a clue, “but inspecting it carefully is allowed”. No obvious manipulations of the picture beyond JPEG artifacts. Lots of plausible related filenames (real.jpg, idiot1.html, etc) don’t work. So are there any other channels of information? Nothing in the source. No cookies. Nothing interesting in the IPTC or Exif data from the JPEG. What about HTTPS? Let’s try accessing that page via https. Yes, we get a certificate, but for a different site by the same developer, www.keytrail.com. The certificate seems to be the same in both cases and it’s valid for the latter. Note that www.pythonchallenge.com and www.keytrail.com resolve to the same IP address. So is this a clue or not? [No, it’s just a misconfigured web server.] OK, what about the HTTP response?

>>> import httplib
>>> conn = httplib.HTTPConnection('www.pythonchallenge.com')
>>> conn.request('GET', '/pc/hex/unreal.jpg')
>>> r = conn.getresponse()
>>> r.status, r.reason
(401, 'Unauthorized')

Oops, HTTP authentication needed.

>>> headers = {'Authorization': 'Basic ' + base64.b64encode('butter:fly')}
>>> conn = httplib.HTTPConnection('www.pythonchallenge.com')
>>> conn.request('GET', '/pc/hex/unreal.jpg', '', headers)
>>> r = conn.getresponse()
>>> r.status, r.reason
(200, 'OK')
>>> pprint(r.getheaders())
[('x-powered-by', 'PHP/5.2.2-pl1-gentoo'),
 ('transfer-encoding', 'chunked'),
 ('server', 'lighttpd/1.4.15'),
 ('content-range', 'bytes 0-30202/2123456789'),
 ('date', 'Wed, 16 May 2007 11:24:33 GMT'),
 ('content-type', 'image/jpeg')]

That content-range header is a bit surprising. I wonder what other ranges can be returned?

>>> def get_range(page, base, limit):
...     conn = httplib.HTTPConnection('www.pythonchallenge.com')
...     headers = {'Authorization': 'Basic ' + base64.b64encode('butter:fly'),
...                'Range': 'bytes=%s-%s' % (base, limit)}
...     conn.request('GET', page, '', headers)
...     return conn.getresponse()
...
>>> r = get_range('/pc/hex/unreal.jpg',30203,30300)
>>> pprint(r.getheaders())
[('x-powered-by', 'PHP/5.2.2-pl1-gentoo'),
 ('transfer-encoding', 'chunked'),
 ('server', 'lighttpd/1.4.15'),
 ('content-transfer-encoding', 'binary'),
 ('content-range', 'bytes 30203-30236/2123456789'),
 ('date', 'Wed, 16 May 2007 11:35:01 GMT'),
 ('content-type', 'application/octet-stream')]
>>> r.read()
"Why don't you respect my privacy?\n"

Note that the range of bytes returned is not the full range requested. So it looks like the end of the range doesn’t make any difference. (So why include it? Maybe it’s a clue?) It also looks like the beginning of the range has to be spot-on:

>>> get_range('/pc/hex/unreal.jpg', 30204, 30205).read()
''

If it were working straightforwardly, we’d expect to get the result 'h' here. So let’s see how far this goes:

>>> def next_range(base, bases, results):
...     r = get_range('/pc/hex/unreal.jpg', base, 2123456789)
...     bases.append(b)
...     results.append(r.read())
...     m = re.match('bytes %d-([0-9]+)/2123456789' % base, r.getheader('content-range'))
...     return int(m.group(1)) + 1
>>> bases = []
>>> results = []
>>> b = 30202
>>> for i in range(400): b = next_range(b, bases, results)
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in next_range
  File "python2.5/re.py", line 129, in match
    return _compile(pattern, flags).match(string)
TypeError: expected string or buffer
>>> pprint(results)
["Why don't you respect my privacy?\n",
 'we can go on in this way for really long time.\n',
 'stop this!\n',
 'invader! invader!\n',
 'ok, invader. you are inside now. \n',
 '']
>>> bases
[30203, 30237, 30284, 30295, 30313, 30347]

What now? Random checks on ranges above 30347 are rejected. Looking at the (ostensible) end of the data with Range: bytes=-2123456789 just produces the original unreal.jpg. The page invader.html is not very helpful. 2123456789 is quite close to 231, isn’t it?

>>> r = get_range('/pc/hex/unreal.jpg', 2123456789, pow(2,31))
>>> r.read()
'esrever ni emankcin wen ruoy si drowssap eht\n'
>>> _[::-1]
'\nthe password is your new nickname in reverse'
>>> print 'invader'[::-1]
redavni

The password to what? Any more data?

>>> pprint(r.getheaders())
[('x-powered-by', 'PHP/5.2.2-pl1-gentoo'),
 ('transfer-encoding', 'chunked'),
 ('server', 'lighttpd/1.4.15'),
 ('content-transfer-encoding', 'binary'),
 ('content-range', 'bytes 2123456744-2123456788/2123456789'),
 ('date', 'Wed, 16 May 2007 12:51:49 GMT'),
 ('content-type', 'application/octet-stream')]
>>> get_range('/pc/hex/unreal.jpg', 2123456743, '').read()
'and it is hiding at 1152983631.\n'
>>> data = get_range('/pc/hex/unreal.jpg', 1152983631, '').read()
>>> len(data)
239733
>>> data[:100]
'PK\x03\x04\x14\x00\t\x00\x08\x00;\xa7\xaa2\xac\xe5f\x14\xa9\x00\x00\x00\xd3\x00
\x00\x00\n\x00\x15\x00readme.txtUT\t\x00\x03"\xf6\x80B\x19\xf7\x80BUx\x04\x00
\xe8\x03\xe8\x03R\x1d^\xf1\xe5\xbf\xa3\xc2\xc0]\xc2)\xfd|\xdbC\x9b\xa5\xf6B\xc1j
\x1c\x8cJ^6VE\x87\xcd\xaa\x1e\xf3\xd5P\xc4\xb5I'

The first four bytes indicate that this is a zip file.

>>> z = zipfile.ZipFile(StringIO.StringIO(data))
>>> z.namelist()
['readme.txt', 'package.pack']
>>> print z.read('readme.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "python2.5/zipfile.py", line 501, in read
    bytes = dc.decompress(bytes)
zlib.error: Error -3 while decompressing: invalid distance too far back

Python’s zipfile module appears not to have any support for password-protected zipfiles. But that’s OK, there are other tools in the world.

$ unzip -P redavni unreal.zip
Archive:  unreal.zip
  inflating: readme.txt
  inflating: package.pack
$ cat readme.txt
Yes! This is really level 21 in here.
And yes, After you solve it, you'll be in level 22!

Now for the level:

* We used to play this game when we were kids
* When I had no idea what to do, I looked backwards.

Level 21

What kind of thing is package.pack? It’s not anything I recognize, .pack is not a known extension, and file package.pack just says, “data”. Looking at it in a hex editor reveals nothing. But Googling for the first two bytes 78 9c, suggests that this might be a zlib deflate stream, which would explain why the zip file is not compessed very much:

$ wc -c package.pack unreal.zip
  239194 package.pack
  239733 unreal.zip

Anyway, decompressing:

>>> pack = open('package.pack').read()
>>> unpack = zlib.decompress(pack)
>>> hex(unpack[:40])
'789c000740f8bf789c000640f9bf789c00ff3f00c0789c00ff3f00c0789c84767554144ed436dd48'

Another deflate stream (first two bytes: 78 9c). And look, there are several more occurrences of those two bytes inside, suggesting that this is a file that’s been compressed multiple times. (When zlib can make no progress—as is the case on random-looking data such as a compressed file—it stores the data verbatim, plus a short header.)

>>> while 1: unpack = zlib.decompress(unpack)
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check
>>> len(unpack)
238759
>>> unpack[:10]
'BZh91AY&SY'

OK, now it’s a bzip2-compressed datastream.

>>> def uncompress(data):
...     if data[:2] == 'x\x9c': return zlib.decompress(data)
...     elif data[:2] == 'BZ': return bz2.BZ2Decompressor().decompress(data)
...     else: raise ValueError
>>> while 1: unpack = uncompress(unpack)
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in uncompress
ValueError
>>> len(unpack)
184947
>>> hex(unpack[:10])
'808d96cbb572a7000658'

What’s this? Hmmm... the clue said, “When I had no idea what to do, I looked backwards”.

>>> unpack[:-10:-1]
'789c000c40f3bf789c'

It’s another deflate stream in reverse.

>>> def uncompress(data):
...     if data[:2] == 'x\x9c': return zlib.decompress(data)
...     elif data[:2] == 'BZ': return bz2.BZ2Decompressor().decompress(data)
...     elif data[-2:] == '\x9cx': return zlib.decompress(data[::-1])
...     elif data[-2:] == 'ZB': return bz2.BZ2Decompressor().decompress(data[::-1])
...     else: raise ValueError
...
>>> while 1: unpack = uncompress(unpack)
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 6, in uncompress
ValueError
>>> len(unpack)
17
>>> print unpack
sgol ruoy ta kool
>>> print unpack[::-1]
look at your logs

Ah, I should have known I would have to keep logs. Let’s try again.

>>> def uncompress(data, log):
...     if data[:2] == 'x\x9c': log.append('z'); return zlib.decompress(data)
...     elif data[:2] == 'BZ': log.append('b'); return bz2.BZ2Decompressor().decompress(data)
...     elif data[-2:] == '\x9cx': log.append('Z'); return zlib.decompress(data[::-1])
...     elif data[-2:] == 'ZB': log.append('B'); return bz2.BZ2Decompressor().decompress(data[::-1])
...     else: raise ValueError
...
>>> log = []
>>> unpack = pack
>>> while 1: unpack = uncompress(unpack, log)
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 6, in uncompress
ValueError
>>> ''.join(log[:80])
'zzzzzzbbbzzzzzzzzzzbbbzzzzzzbbbbbbbbzzzzbbbbbbbbzzzzbbbbbbbbbbzzbbbbbbbbZzzzbbbb'

After a fair bit of experimention, it turns out that this is a bitmap, with Z as the line terminator.

>>> pprint(''.join(log).replace('z',' ').split('Z'))
['      bbb          bbb      bbbbbbbb    bbbbbbbb    bbbbbbbbbb  bbbbbbbb',
 '   bbbbbbb      bbbbbbb    bbbbbbbbb   bbbbbbbbb   bbbbbbbbb   bbbbbbbbb',
 '  bb     bb    bb     bb   bb      bb  bb      bb  bb          bb      bb',
 ' bb           bb       bb  bb      bb  bb      bb  bb          bb      bb',
 ' bb           bb       bb  bbbbbbbbb   bbbbbbbbb   bbbbbbbb    bbbbbbbbb',
 ' bb           bb       bb  bbbbbbbb    bbbbbbbb    bbbbbbbb    bbbbbbbb ',
 ' bb           bb       bb  bb          bb          bb          bb   bb ',
 '  bb     bb    bb     bb   bb          bb          bb          bb    bb ',
 '   bbbbbbb      bbbbbbb    bb          bb          bbbbbbbbb   bb     bb ',
 '     bbb          bbb      bb          bb          bbbbbbbbbb  bb      bb']

(The first row seems off by one. Why is that? Maybe we’re supposed to ignore the first unpacking?) Anyway, going on:

Level 22: copper

This challenge has a picture of a joystick and the clues “emulate” and “or maybe white.gif would be more bright”. The image white.gif is a 16-colour 200×200 GIF. All pixels are black, but looking carefully reveals that all use colour index 0 except for a single pixel using colour index 8 at (100,100). Visting joystick.html yields the clue, “are you in level 2, or level 22?” and 22.html yields, “it was a rhetorical question!”. So are we looking for rare things such as that single pixel of colour 8? Hmmm, that GIF is awfully big (at 38K) for such a simple image. Looking at its contents it seems to consist of 133 similar blocks of data. So maybe it’s an animated GIF? Ah, GraphicConverter even reports the number of frames. Now I feel silly for not noticing that. Now, if I edit the colour table in GraphicConverter to brighten colour 8, and play the animation, it shows a single pixel wandering around:

Brightening the palette reveals an animation of a single dot wandering around the centre of a black square.

OK, let’s extract those pixel movements (easy with the getbbox method, which calculates the bounding box of the non-zero regions in the image):

>>> im = get_image('hex/white.gif')
>>> path = [i.getbbox()[0:2] for i in ImageSequence.Iterator(im)]
>>> path[:10]
[(100, 100), (100, 102), (100, 102), (100, 102), (100, 102),
 (100, 102), (100, 102), (100, 102), (100, 102), (102, 102)]
>>> moves = [(path[i+1][0] - path[i][0], path[i+1][1] - path[i][1]) for i in range(len(path) - 1)]
>>> moves[:10]
[(0, 2), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (2, 0), (0, -2)]

Now, I’m guessing we’re supposed to interpret these as joystick movements. So why the (0,0) elements? Maybe we’re supposed to think of a joystick-operated game like Pacman, in which an object keeps moving in a direction until the joystick is moved again.

>>> joy = Image.new(im.mode, (200,200), 0)
>>> pos = (100,100)
>>> dir = (0,0)
>>> for m in moves:
...     joy.putpixel(pos, 9)
...     if m != (0,0): dir = m
...     pos = (pos[0] + dir[0], pos[1] + dir[1])
...
>>> joy.save('joy.gif')

First attempt to interpret the pixel motion as joystick control: no good.

That’s no good. Next idea: maybe we ignore the moves and consider the original path data to represent the position of an analog joystick. So we can imagine driving a point around the image leaving a trail behind it. So when the data says (102,102) we move our point two pixels to the right and two down.

>>> joy2 = Image.new(im.mode, (200,200), 0)
>>> pos = (100,100)
>>> for p in path:
...     joy2.putpixel(pos, 255)
...     pos = (pos[0] + p[0] - 100, pos[1] + p[1] - 100)
...
>>> joy2.save('joy2.gif')

Second attempt to interpret the pixel motion as joystick control: some overlapping letterforms.

Ah, that’s better. That actually looks like a bunch of overlapping letterforms. How do we distinguish the end of each letterform, so that we can displace the start of the next? Well, if you were actually drawing letters with a joystick, you’d probably stop drawing between each letter, and the spring would return the joystick to the centre, that is (100,100) in this case. And sure enough, there are five occurrences of (100,100), which would make five letters if I’m right.

>>> joy3 = Image.new(im.mode, (200,200), 0)
>>> letter = 0
>>> pos = (30,30)
>>> for p in path:
...     if p == (100,100): letter += 1; pos = (letter * 30,) * 2
...     else: pos = (pos[0] + p[0] - 100, pos[1] + p[1] - 100)
...     joy3.putpixel(pos, 255)
...
>>> joy3.save('joy3.gif')

Second attempt to interpret the pixel motion as joystick control, with letters separated.

Level 23: bonus

The title is “what is this module?” and there is a clue, “it can’t find it. this is an undocumented module”, plus the text “va gur snpr bs jung?” which is rot13 for “in the face of what?” Undocumented modules are listed here. None seem particularly apposite. Off to the forums again, which suggest I should be looking in my Python modules directory and read this file.

Level 24: ambiguity

This level shows a maze, the title, “from top to bottom”, and nothing else. Well, let’s solve the maze first. Close inspection reveals that the walls of the maze are white, but the corridors alternate colour: even pixels are black and odd pixels have miscellaneous values in the red channel. For example, starting at the top right and walking down the corridor, you pass pixels with red channels 00 50 00 4b 00 03 00 04 00 14. Taking the even-numbered values, we can recognize the zip file signature. So maybe when we trace the shortest path through the maze, we can construct a zip file from the red channels of the pixels we visited? Let’s find the shortest path by breadth-first search:

def search(maze, tree):
    dirs = (0,1), (0,-1), (1,0), (-1,0)
    wall = (255,) * 4
    w,h = maze.size
    maze.putpixel((1, h-1), wall) # Wall off exit
    maze.putpixel((w-2, 0), wall) # Wall off entrance
    queue = [(1, h-2)]            # Start at exit
    tree[queue[0]] = 'exit'
    while queue:
        pos = queue.pop(0)
        for d in dirs:
            new_pos = pos[0] + d[0], pos[1] + d[1]
            if not tree.has_key(new_pos) and maze.getpixel(new_pos) != wall:
                tree[new_pos] = pos
                queue.append(new_pos)

This is the first bit of programming in the challenge, so I’ll gloss it. The idea is to construct a directed graph in which each reachable position in the maze points at the next position you would visit on the shortest path to the exit. This directed graph is realized by a Python dictionary. We construct the graph by a breadth-first search starting at the exit. It’s convenient to wall off the entrance and exit (these are even positions so no information is lost) to avoid special-case code to prevent the search leaving the maze.

>>> maze = get_image('hex/maze.png')
>>> tree = {}
>>> search(maze, tree)
>>> w = maze.size[0]
>>> tree[(w-2,1)]
(639, 2)

So there is a path to the exit.

>>> path = []
>>> pos = (w-2,1)
>>> while pos != 'exit': path.append(chr(maze.getpixel(pos)[0])); pos = tree[pos]
...
>>> len(path)
44621
>>> hex(''.join(path[:10]))
'50004b00030004001400'

It would have been possible to search the maze faster by only considering the odd squares (the ones with the data), but that would have meant some special-casing to avoid the search leaving the maze, and it would have deprived us of the chance to test the hypothesis that even squares are always zero:

>>> path[1::2] == ['\x00'] * (len(path) // 2)
True

Here’s the shortest path:

>>> pos = (w-2,1)
>>> while pos != 'exit': maze.putpixel(pos, (0,255,)*2); pos = tree[pos]
...
>>> maze.save('maze-solved.png')

The solution to the maze (in green).

Now, let’s construct the zipfile from the non-zero bytes along the shortest path.

>>> mazezip = ''.join(path[::2])
>>> z = zipfile.ZipFile(StringIO.StringIO(mazezip))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "python2.3/zipfile.py", line 210, in __init__
    self._GetContents()
  File "python2.3/zipfile.py", line 230, in _GetContents
    self._RealGetContents()
  File "python2.3/zipfile.py", line 242, in _RealGetContents
    raise BadZipfile, "File is not a zip file"
zipfile.BadZipfile: File is not a zip file

Uh oh, something’s wrong. But Python’s zipfile module is not our only tool for handling zip files.

>>> open('maze.zip','w').write(mazezip)

unzip comes to the rescue with its generous error handling:

$ unzip maze.zip
Archive:  maze.zip
  inflating: maze.jpg
 extracting: mybroken.zip
$ unzip mybroken.zip
Archive:  mybroken.zip
  inflating: mybroken.gif             bad CRC 31eddaa4  (should be 383782e7)

The intact image shows the way to level 25:

A picture of a lake, extracted from the ZIP archive decoded from the solution to the maze.

But the broken image has a word too:

Another image extracted from the ZIP archive is corrupted, but we can read the word “speed”.

We also have that e-mail from level 19; maybe we can repair the broken zip.

>>> import md5
>>> broken = open('mybroken.zip').read()
>>> for i in range(len(broken)):
...     for j in range(256):
...         repaired = broken[:i] + chr(j) + broken[i+1:]
...         if md5.md5(repaired).hexdigest() == 'bbb8b499a0eef99b52c7f13f4e78c24b':
...             open('unbroken.zip','wb').write(repaired)
...             print i, j
...             raise StopIteration
...
1234 168
Traceback (most recent call last):
  File "<stdin>", line 6, in <module>
StopIteration

We could save the computer a lot of copying by updating broken in place as we go. And we could save half the MD5 computation by copying the MD5 state after each position is reached. But this will do fine.

The corrupted “speed” image corrected using the MD5 signature.

Is it a clue for the next level? (Lake Speed was a NASCAR racing driver.) [No, it’s a clue for level 26.]

Level 25: lake

There’s a picture of a lake in the form of a jigsaw with 25 pieces (same number as the level), a clue “can you see the waves?” and a title, “imagine how they sound”. This suggests we’re going to be dealing with WAV files again. Moreover, the image is lake1.jpg but there’s no lake2.jpg. What about lake1.wav? Aha, it seems that there are 25 WAV files. None has any distinguishable sound. So what do they contain?

>>> wavs = [wave.open(StringIO.StringIO(get_challenge('hex/lake%d.wav' % i))) for i in range(1,26)]
>>> wavs[0].getframerate()
9600
>>> wavs[0].getnsamples()
10800

That’s an odd framerate for an audio file. Let’s open it up in Wave Editor and see what it looks like. Hmmm, there’s a clear period-3 behaviour to the samples, not what we expect from audio at all. Maybe it’s not audio at all, but an encoding of some other form of data, say an image, part of a jigsaw. If we divide by 3 (3 bytes per pixel) we get 3600, which suggests each jigsaw piece is 60×60, which would mean 300×300 for the full jigsaw. Let’s try one piece:

>>> def jig(w): return Image.fromstring('RGB', (60,60), w.readframes(w.getnframes()))
...
>>> jig(wavs[0]).save('lake1.png')

One of twenty-five image tiles decoded from the audio stream.

Good. Now for the whole jigsaw.

>>> jigsaw = Image.new('RGB', (300,300), 0)
>>> for i in range(len(wavs)): jigsaw.paste(jig(wavs[i]), (60 * (i % 5), 60 * (i // 5)))
...
>>> jigsaw.save('jigsaw.png')

The twenty-five image tiles assembled into a picture of a lake with “decent” written over it.

Level 26: decent

The title says, “be a man - apologize!” and there are clues “you’ve got his e-mail” and “Hurry up, I’m missing the boat”. I think this is a reference to level 24, and all I have to do is guess...

Level 27: speedboat

A picture showing an oar with a zigzag line. Title “between the tables”, clues “did you say gif?” and “oh, and this is NOT a repeat of 14”. There’s a link to bell [that’s level 28] but that’s password protected; the login domain is “the order matters”. Trying zigzag.gif as suggested, I see that the GIF is interlaced, which is new. Is that significant? [No, it’s not.] It’s a greyscale image with 256 levels of grey in the pixels, no clear pattern:

>>> zig = get_image('hex/zigzag.gif')
>>> zigdata = zig.tostring()
>>> hex(zigdata[:20])
'd7d0cb0cfe266c743b8b4842bd7fb0ad46aacf27207e8e'

The reference to level 14 suggests that spiral order is not it (and if it were, the opening bytes d7 d0 cb don’t suggest any file format). So what about zigzag order? I can’t find any place to start that looks like the beginning of a file. What about the palette? It appears to have each colour in it once:

>>> len(zig.getcolors())
256
>>> palette = zig.palette.getdata()[1][::3] # 3 bytes per pixel, equal RGB
>>> hex(palette[:20])
'25e5a2883bd40929189c9470fe5b6a31f8d5dc0f'

The values in the image data are the numbers of palette entries. What if we translate the image data by the palette, getting greyscale values?

>>> t = string.maketrans(''.join([chr(i) for i in range(256)]), palette)
>>> zigtrans = zigdata.translate(t)
>>> hex(zigtrans[:20])
'd0cb0cfe266c743b8b4842bd7fb0ad46aacf27207e8ea4'

Still nothing. Hang on, isn’t that very similar to the original data? It’s identical, except that it’s missing the first byte. Is it identical all the way to the end?

>>> zigdata[1:] == zigtrans[:-1]
False

What if we gather up all the bytes which are different in the two strings?

>>> deltas = filter(lambda p: p[0] != p[1], zip(zigdata[1:], zigtrans[:-1]))
>>> diffs = [''.join([p[i] for p in deltas]) for i in range(2)]
>>> diffs[0][:20]
'BZh91AY&SY\xe0\xaaYF\x00\x17\x9a\x11\x80@'
>>> diffs[1][:20]
'99bd5182f289530415450437200495e44e9bd5a8'

On the one side, a bzip2-compressed datastream, on the other, what? I don’t recognize it.

>>> bz = bz2.BZ2Decompressor().decompress(diffs[0])
>>> len(bz)
70644
>>> bz[:100]
'../ring/bell.html del assert repeat raise or class is exec return except print return switch from ex'

It’s a bunch of Python keywords plus ../ring/bell.html. Let’s see how many and what they are.

>>> keywords = bz.split(' ')
>>> keys = {}
>>> for k in keywords: keys[k] = 1
...
>>> keys.keys()
['and', 'elif', 'is', 'global', 'in', 'if', 'from', 'raise', 'for', 'except',
 'switch', 'finally', 'print', 'import', 'pass', 'repeat', 'return', 'exec',
 'else', 'assert', 'not', 'class', '../ring/bell.html', 'yield', 'try', 'while',
 'continue', 'del', 'break', 'or', 'def', 'lambda']
>>> len(keywords)
12000
>>> len(keys.keys())
32
>>> 12000 / 32
375

Are we meant to apply the same technique to this datastream? But if so, where’s the table? Does this datastream code for an image? (I tried several plausible ideas, but it looks like random noise each time.) Finally, in desperation, I tried every possible pair of keywords as username and password:

>>> auth_handler = urllib2.HTTPBasicAuthHandler()
>>> opener = urllib2.build_opener(auth_handler)
>>> for u in keys.keys():
...    for p in keys.keys():
...         try:
...             auth_handler.add_password('the order matters', 'www.pythonchallenge.com', u, p)
...             opener.open('http://www.pythonchallenge.com/pc/ring/bell.html')
...             print u,p
...             raise StopIteration
...         except urllib2.HTTPError:
...             pass
...
<addinfourl at 66194048 whose fp = <socket._fileobject object at 0x2a077a0>>
repeat switch
Traceback (most recent call last):
  File "<stdin>", line 7, in ?
StopIteration

Ah, I see. repeat and switch are the only words in the list that aren’t Python keywords. [I missed a clue: I looked at the data from the differences, but not at their pixel positions. If we make an image using just these pixel positions, we get another clue:]

>>> im = Image.new('1', zig.size, 0)
>>> im.putdata([p[0] == p[1] for p in zip(zigdata[1:], zigtrans[:-1])])
>>> im.save('zigclue.png')

A monochrome image with black pixels at the pixel positions where the zigzag pattern failed reveals the clue “not key word busy?”.

Level 28: bell

Title “many pairs ring-ring” and clue “RING-RING-RING say it out loud”. The picture has been overlaid with many short vertical green lines. The page grin.html says, “you are not happy - you are feeling sick.” And green.html says “yes! green!” so maybe we’re looking for something encoded in the green channel of the image. And that is where I got stuck.