Fourteen years of e-mail


E-mail messages per month, 1994–2008

This was produced by running:

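# Take the first Date: header from each stored message, turn it into
# "year month", and count how many messages fall in each month.
find ~/Library/Mail -name '*.emlx' -print0 |
xargs -0 grep -h -m 1 '^Date:' |
perl -ne 'use Date::Parse;
          s/Date: //;
          @d = strptime($_);   # empty list if the date cannot be parsed
          $y = $d[5] + 1900;   # strptime returns the year less 1900
          # $d[4] is the month, counted from 0
          printf "%d %02d\n",$y,$d[4] if 1990 < $y and $y <= 2008;' |
sort |
uniq -c > messages-per-month
# Columns in messages-per-month: count, year, month (0-11).
gnuplot <<'END'
set term aqua
set size 0.8,0.4
set key left
plot 'messages-per-month' using ($2+$3/12):1 title "Messages per month" with line
END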

And the reason I’m using Perl there, even though I normally avoid the language, is that Date::Parse really does a good job here of making sense of the kind of malformed date headers sent out by crappy excuses for mail user agents. For although (ignoring comments and folding whitespace) RFC 2822 says:

date            =       day month year
year            =       4*DIGIT / 2*DIGIT
month           =       "Jan" / "Feb" / "Mar" / "Apr" /
                        "May" / "Jun" / "Jul" / "Aug" /
                        "Sep" / "Oct" / "Nov" / "Dec"
day             =       1*2DIGIT

nonetheless I find in my e-mail history headers like these:

Date: Mon, Jan 12 2004 18:34:16 +0000
(No un-American date order for us!)
Date: 10/11/2007
(Hey, Joe, it says we've got to have a "Date" field, does it matter what I put in it?)
Date: Wed, 15 October 2003 10:13:31 EDT
(My high school teacher taught me never to abbreviate.)
Date: Sat 28 Jul, 2001  9:00:01 AM Europe/London
(No-one will mind if we move the comma, will they?)
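
To make the mismatch concrete, here is a rough Python sketch of the grammar excerpt above as a regular expression. It is only an approximation: single spaces stand in for folding whitespace, the optional day-of-week prefix is my reading of the rest of RFC 2822 rather than part of the excerpt, and the names STRICT_DATE and date_conforms are made up for this illustration.

```python
import re

MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
DAY_NAMES = "Mon|Tue|Wed|Thu|Fri|Sat|Sun"

# Approximation of "date = day month year" from the excerpt, preceded by an
# optional "day-of-week," (an assumption taken from the wider RFC).
STRICT_DATE = re.compile(
    r"Date: "
    r"(?:(?:" + DAY_NAMES + r"), )?"   # optional day of week and comma
    r"\d{1,2} "                        # day   = 1*2DIGIT
    r"(?:" + MONTHS + r") "            # month = three-letter abbreviation
    r"\d{2,}"                          # year  = 4*DIGIT / 2*DIGIT
    r"(?: |$)"                         # the date must end here
)


def date_conforms(header):
    """Return True if the header starts with a date in the strict form."""
    return STRICT_DATE.match(header) is not None


HEADERS = [
    "Date: Mon, Jan 12 2004 18:34:16 +0000",
    "Date: 10/11/2007",
    "Date: Wed, 15 October 2003 10:13:31 EDT",
    "Date: Sat 28 Jul, 2001  9:00:01 AM Europe/London",
]

for header in HEADERS:
    print(date_conforms(header), header)
```

None of the four headers listed above gets past this check, whereas a header with the day before the month, such as "Date: Mon, 12 Jan 2004 18:34:16 +0000", does.
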

Date::Parse handles these with no problems. Python, as far as I know, has nothing so useful: it has a strptime implementation (in the time module), but that only handles fixed formats.
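
A small sketch of what "fixed formats" means in practice, assuming Python 3 and the default C/English locale (the %a and %b directives are locale-dependent). The format string and the helper name parse_fixed are my own choices for illustration, and the first sample value is a hypothetical well-formed rearrangement of one of the headers above.

```python
import time

# One fixed layout: "Mon, 12 Jan 2004 18:34:16 +0000".
FIXED_FORMAT = "%a, %d %b %Y %H:%M:%S %z"


def parse_fixed(value, fmt=FIXED_FORMAT):
    """Parse value with a single fixed format; return None if it does not fit."""
    try:
        return time.strptime(value, fmt)
    except ValueError:
        return None


SAMPLES = [
    "Mon, 12 Jan 2004 18:34:16 +0000",            # hypothetical well-formed date
    "Mon, Jan 12 2004 18:34:16 +0000",            # month before day
    "10/11/2007",                                 # numeric date only
    "Wed, 15 October 2003 10:13:31 EDT",          # full month name, zone name
    "Sat 28 Jul, 2001  9:00:01 AM Europe/London", # misplaced comma, AM/PM
]

for sample in SAMPLES:
    parsed = parse_fixed(sample)
    print("ok    " if parsed else "failed", sample)
```

Only the first sample fits the format. Covering the others this way would mean guessing a separate format string for every variation that turns up, which is the work Date::Parse saves.
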