Emacs 23

Emacs 23.1 was released on . In this review I’ll describe some of the new features in Emacs 23 that seem particularly useful to me, and some features from older versions of Emacs that I’ve only discovered recently, or which I think will be interesting or novel to the Emacs users among my readers. But everyone has different requirements and preferences, so if you find something particularly useful that I’ve missed, why not add a comment? (But if all this just reminds you of how much you’ve always hated Emacs, I’d appreciate it if you could vent your spleen elsewhere, thanks.)

Installing

I had no trouble getting Emacs 23.1 working on Windows: the FSF provides native Windows distributions. Emacs has the easiest installation process of any Windows software I know of: you just unzip the distribution into C:\Program Files or wherever and you’re done. If only other Windows software were as easy to install.

On Mac OS X 10.5.7 I tried building from source following the EmacsApp instructions (./configure --with-ns && make && make install), but the build failed. I tried again after upgrading to 10.5.8 and this time it worked. I have no idea what went wrong, and have no particular desire to go back to 10.5.7 and investigate. If you have problems building Emacs 23, you might try one of the universal binary distributions at Emacs for Mac OS X (which I think could be better advertised).

On Mac OS X I experienced a couple of configuration glitches. First, I found that the Alt key had been assigned to the “Alt” modifier flag, a feature which I don’t use (I prefer the native use of the Alt key for entering accented letters and other characters). I fixed this by customizing ns-alternate-modifier. Second, there’s a problem with the PATH environment variable: Emacs gets the system PATH, not the one from my .profile. This problem is noted on the Emacs wiki together with a solution, but if this is a general problem with Mac OS X as claimed there, why am I discovering this for the first time in Emacs 23?
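
For what it’s worth, both fixes can live in your init file. This is only a sketch of one possible approach, not necessarily the solution given on the wiki: my-import-login-path is a made-up helper name, and it assumes that your login shell prints nothing else when it starts up.

```elisp
;; Leave the Option/Alt key to Mac OS X so it can still type accented letters.
(setq ns-alternate-modifier 'none)

;; Hypothetical helper (the name is an assumption for this sketch):
;; copy PATH from a login shell, which reads .profile, into Emacs.
(defun my-import-login-path ()
  "Set PATH and `exec-path' from the PATH seen by a login shell."
  (let* ((shell (or (getenv "SHELL") "/bin/sh"))
         (path (shell-command-to-string
                (concat shell " -l -c 'printf %s \"$PATH\"'"))))
    ;; Do nothing if the shell gave us an empty answer.
    (unless (string= path "")
      (setenv "PATH" path)
      (setq exec-path (append (split-string path path-separator)
                              (list exec-directory))))))

(my-import-login-path)
```

Setting exec-path as well as PATH matters because Emacs uses exec-path to find the programs it runs directly, while PATH is what subprocesses such as M-x shell inherit.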

Native Unicode support

From NEWS:

*** The Emacs character set is now a superset of Unicode. (It has about four times the code space, which should be plenty). The internal encoding used for buffers and strings is now Unicode-based and called ‘utf-8-emacs’.

What I noticed was an immediate improvement in the handling of some large UTF-8-encoded files I’m working on that contain text from several languages. These now open instantly, whereas in Emacs 22 there were delays while Lisp code was run to decode the characters and convert them to Emacs’ internal encoding.

This change probably doesn’t seem like all that much of a big deal. “It’s 2009 and Emacs has only just switched to Unicode internally?” you might well ask, “What a bunch of late-comers!” But in fact Emacs was early to the party, not late: the difficulty in switching to Unicode internally is because Emacs has had multilingual character support since August 1993 (in the form of MULE 1.0), when Unicode was at revision 1.1 and arguments about Han unification were still a live issue. At this point you had to have very good foresight or a lot of luck to see that Unicode was going to be the solution to multilingual character encoding. (So all credit to the Windows NT developers.)

From the perspective of 1993, it’s not at all obvious that Unicode is a good idea. Out in the real world, people want to edit text in all sorts of character encodings, and the most natural way to support this is to remember which encoding you are working in, and represent the data natively in that encoding. That way you can be sure that you don’t break anything, and it’s easy to extend to support more encodings. In the Unicode paradigm, you have to translate your input from its encoding into Unicode, using large mapping tables defined for this purpose by the Unicode consortium, and then when you write the data back out again you apply the mapping tables in reverse. If you’re going to adopt this approach, you’re going to need a lot of confidence that the Unicode consortium will get these mapping tables right (in particular, the tables need to be injective: different code points in other encodings need to be mapped to different code points in Unicode). You also need the disk space and memory for the tables, which was more of an issue back then (when people used to criticize Emacs for being a bloated application; nowadays Emacs.app is about the same size as iCal.app).

So I don’t think we can criticize the MULE developers for failing to adopt Unicode, and in any case they were constrained by design decisions made in MULE’s predecessor Nemacs (“Nihongo Emacs”) which was first released in June 1987, long before the publication of Unicode 1.0.0 in October 1991.

However, the very success of MULE (in 1993 Emacs was streets ahead of other editors when it came to editing multilingual text, the main difficulty for the user being acquiring the necessary fonts) ended up leading Emacs down a bit of a blind alley. The difficulties of merging MULE into GNU Emacs (not the least of which were the copyright problems) lasted until 1997, when Emacs 20.1 had multilingual support built in. And it wasn’t all that long afterwards that it became clear to everyone that Unicode was the way to go.

So the MULE internals had to be backed out and replaced with new Unicode internals, without breaking too much Emacs Lisp code that interacted with characters and their encodings. This exacting project was carried out by Ken’ichi Handa of the AIST in Tokyo. Some measure of the complexity and delicacy of this ten-year process can be gathered by browsing Handa’s e-mails to the emacs-devel mailing list. The 2003 thread “eight-bit char handling in emacs-unicode” is a good example: Simon Josefsson reports a bug whose cause is that the Emacs Lisp implementation of RFC 2104 (message authentication using cryptographic hash functions) is using Lisp strings to represent arbitrary sequences of 8-bit values (what Emacs Lisp calls “unibyte strings”). This worked fine in MULE, but because of the way it constructs these strings, in the Unicode branch of Emacs these strings end up consisting of encoded characters instead (what Emacs Lisp calls “multibyte strings”) and the message authentication fails.
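
To make the unibyte/multibyte distinction concrete, here is a small sketch you can evaluate in the *scratch* buffer. It is not the code from the thread; the variable names my-text and my-bytes are made up for the illustration.

```elisp
;; A multibyte string holding one character, U+00E9 (e with acute accent).
(defvar my-text "\u00e9")

;; Encoding it gives a unibyte string holding the two raw UTF-8 bytes.
(defvar my-bytes (encode-coding-string my-text 'utf-8))

(multibyte-string-p my-text)   ; => t
(length my-text)               ; => 1
(multibyte-string-p my-bytes)  ; => nil
(length my-bytes)              ; => 2
(string-to-list my-bytes)      ; => (195 169)
```

Code that hashes or authenticates data wants the second kind of string: if a string that is supposed to hold raw bytes turns out to hold characters instead, the lengths and values it sees are not the ones that went over the wire.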

Improved font support

From NEWS:

** New font code. Fonts are handled by new code capable of dealing with multiple font backends. This uses the freetype and fontconfig libraries.
*** Added support for local fonts (fonts installed on the machine where Emacs is running).
*** Added support for the Xft library for antialiasing.
*** Added support for the otf library for complex text layout by OpenType fonts.
*** Added support for the m17n library for text shaping.

I noticed that on Windows Emacs seems much better at finding fonts for all the characters it needs to display. For example, Emacs 22.3 on Windows was unable to find a font containing ～ (U+FF5E FULLWIDTH TILDE) even though I have several such fonts (resulting in a “missing glyph” box). Emacs 23.1 displays it fine.

Character entry by Unicode name

Suppose you want to enter a MULTIPLICATION SIGN but don’t know how to type it on your keyboard and you don’t happen to remember its Unicode character code. It used to be the case that you had to hunt around in the Character Palette (on Mac OS X) or the Character Map (on Windows). In Emacs 23.1 you can type C-x 8 RET MULTIPLICATION SIGN RET. (C-x 8 RET is the rather awkward key sequence for the ucs-insert command; I suppose I’ll get used to it eventually.)

Actually the situation is much better than that because there’s completion on the set of Unicode character names, so you can just type C-x 8 RET multip TAB s TAB RET. Another example: you can type C-x 8 RET greek capital TAB to get a list of Greek capital letters. And there’s wildcard completion too, so if you want some kind of arrow but you’re not sure which, you type C-x 8 RET *arrow TAB and get a list of all the arrows in Unicode. I think I will have a HEAVY BLACK-FEATHERED NORTH EAST ARROW, please: ➹.

The describe-char command now shows the Unicode name of the character, and a (customizable) set of character code properties, for example:

Character code properties: customize what to show
  name: HEAVY BLACK-FEATHERED NORTH EAST ARROW
  general-category: So (Symbol, Other)
  canonical-combining-class: 0 (Spacing, split, enclosing, reordrant, and Tibetan subjoined)
  bidi-class: ON (Other Neutrals)
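
The same properties can be read from Lisp with get-char-code-property. A minimal sketch, using the multiplication sign (U+00D7) from the earlier example:

```elisp
;; Look up Unicode properties of U+00D7 by character code.
(get-char-code-property #xD7 'name)              ; => "MULTIPLICATION SIGN"
(get-char-code-property #xD7 'general-category)  ; => Sm

;; Turn the character code into a one-character string.
(string #xD7)                                    ; => "×"
```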

The list-charset-chars command is also useful: M-x list-charset-chars RET unicode-bmp RET gives you a buffer containing the entire Unicode Basic Multilingual Plane.

Native word wrapping

Some history. Word wrapping has previously been a bit of a pain in Emacs. It used to be that you were expected to wrap (“fill” in Emacs terminology) all paragraphs by hand using fill-paragraph (M-q). This was obviously a waste of your time, so you could turn on auto-fill-mode to automatically break lines that got too long. But this only works at the end of a paragraph: if you’re editing lines in the middle of a paragraph, auto-fill-mode doesn’t refill the paragraph for you. (There were add-on libraries for doing this, like Per Abrahamsen’s maniac.el—tagline: “fill paragraphs like a maniac”—and Dave Love’s refill.el.)

However, the problem with all these approaches is that when you “fill” a paragraph, you lose information about which line breaks were inserted deliberately by the author (and should be left alone), and which were inserted automatically for the purposes of wrapping to a certain width (and which can be adjusted automatically). For some kinds of editing, you need to preserve this information.

Emacs 22.1 provided longlines-mode, which distinguished between these two types of line (so-called “soft” and “hard” newlines) using text properties. But this had its own problem, because the soft newlines were inserted when a file was opened, and then removed just before the file was saved, meaning that line numbers were inconsistent between the file on disk and the corresponding buffer in Emacs. And this meant that if you wanted to visit a line in the buffer based on a line number in the output of a command-line tool like grep or p4 diff, you were out of luck.

Emacs 23.1 finally gets it right by adding word wrapping to the display engine, under the control of the visual-line-mode command.
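
If you want this on by default, something like the following in your init file should do it. It is a sketch with two alternatives, and choosing text-mode-hook is my assumption about where you would want wrapping:

```elisp
;; Option 1: wrap at word boundaries in text-mode and modes derived from it.
(add-hook 'text-mode-hook 'visual-line-mode)

;; Option 2: wrap in every buffer (uncomment to use instead).
;; (global-visual-line-mode 1)
```

Because the wrapping happens only in the display, the buffer contents and the file on disk keep the same line numbers.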

Frame transparency

You can change the opacity of your frames by setting the alpha frame parameter to an integer from 0 to 100 (meaning percent opacity). There’s no user interface for this yet (as far as I can tell), but you can evaluate Lisp code like (modify-frame-parameters nil '((alpha 75))).
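
In the absence of a built-in interface, it’s easy enough to write a command of your own. This is a minimal sketch; my-set-frame-opacity is a made-up name, not something that ships with Emacs:

```elisp
;; Hypothetical command (the name is an assumption for this sketch).
(defun my-set-frame-opacity (percent)
  "Set the opacity of the selected frame to PERCENT (0 to 100)."
  (interactive "nOpacity (0-100): ")
  ;; Clamp the value so that out-of-range input is harmless.
  (set-frame-parameter nil 'alpha (max 0 (min 100 percent))))
```

Then M-x my-set-frame-opacity RET 75 RET has the same effect as setting the alpha parameter by hand. Note that the variable frame-alpha-lower-limit stops a frame from becoming too transparent to see.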

I used to think that frame transparency was just an amusing thing you could do to show off the rendering capabilities of your windowing system. But on my 12-inch Powerbook, I have found myself really wanting this feature, so that I can simultaneously look at a full-screen document (or video, image, diagram, etc) and compose text in Emacs. On a bigger display I could position the windows side-by-side, but on my laptop there’s not enough room to do that.

Incremental search

When incremental search fails, the search string you entered is coloured to show the initial portion that matched and the remainder that didn’t match. This makes it quicker to recover from typos.

Line numbering

At long last, a proper line numbering feature (with the line numbers in the editor margins, rather than inserted into the buffer): linum-mode.

File completion with wildcards

(New in Emacs 20) You can type C-x C-f *.c TAB and get a list of all your C files to select from. (This probably works best if you’re using a mouse so you can click on the completion you want; typing C-u - C-x o to move to the completion window gets a bit tedious.) You can also type C-x C-f *.c RET to open all your C files in separate buffers.

Edit file names interactively

(New in Emacs 22) Suppose you have a whole bunch of files that need to be renamed: perhaps their file extensions need to be consistently lowercase, or you’d like to suffix each one with the year it was created. This has always been a bit of a tricky problem: you can roll your own solution in shell script, but it always ends up being hairier than you’d like:

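for EXT in JPG PNG; do
    for FILE in *."$EXT"; do
        [ -e "$FILE" ] || continue  # nothing matched: skip the literal pattern
        # Quote everything so that names containing spaces survive.
        mv -- "$FILE" "$(basename "$FILE" "$EXT")$(echo "$EXT" | tr A-Z a-z)"
    done
done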

Or you can use a specialized tool like rename (if you have it). Perhaps something like this:

rename -v 's/\.(JPG|PNG)$/.\L\1/' *.{PNG,JPG}

(That’s untested! I hope you remember to use the -n option whenever you use rename.)

Anyway, in Emacs 23 you can visit a directory in Dired mode (C-x d), run the dired-toggle-read-only command (C-x C-q), and then edit the filenames by typing, by using query-replace-regexp or string-rectangle, or however you like. When you’re done, type C-x C-q again and each file whose name you changed gets renamed.

This seems like magic the first time you do it, but it’s really handy not just because you have all the editing features of Emacs at your disposal, but because you get a preview of your changes: you can look over them, correct mistakes, and revert them if you don’t like them (using the revert-buffer command as usual).

(You can also edit some of the other file metadata in the Dired buffer, such as the permissions.)

Replacement with evaluation

Emacs 21 introduced the commands (query-)replace-regexp-eval, which are like (query-)replace-regexp but instead of a replacement string, you provide an arbitrary Lisp expression to generate the replacement text. Emacs 22 obsoleted these functions with three new escape sequences for the replacement string in (query-)replace-regexp:

\? is a placeholder for a string that you enter interactively for each replacement. (It only works for replace-regexp, not query-replace-regexp, where you can always type e to edit the replacement text.) For example, you can add comments to all your uncommented integer variables by typing M-x replace-regexp RET ^\s-*int\s-+\s_+;$ RET \& /* \? */ RET.

\# expands to the count of replacements already made by the replacement command. So you can number the lines in your file (starting at zero) by typing C-M-% ^ RET \# SPC RET !. Not all that useful by itself, but very useful in combination with:

\, evaluates the following Lisp expression, converts the result to a string and includes it in the replacement text. Back-references like \1 can appear in the Lisp expression (where they get turned into (match-string 1) etc.), as can escape sequences like \# and \?. For example, to number the lines in your file starting at 1, type C-M-% ^ RET \,(1+ \#) SPC RET !. To multiply all the integers in your file by 5, type C-M-% [0-9]+ RET \,(* (string-to-number \&) 5) RET !. (A slight shame about the need for string-to-number there.)

\, is a feature that I use almost every day.
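
For comparison, here is roughly what the multiply-by-5 example looks like written out as a Lisp function. It is a sketch of the equivalent loop, not what the replace commands do internally, and my-multiply-integers is a made-up name:

```elisp
;; Hypothetical function (the name is an assumption for this sketch).
(defun my-multiply-integers (factor)
  "Multiply every integer in the current buffer by FACTOR."
  (save-excursion
    (goto-char (point-min))
    (while (re-search-forward "[0-9]+" nil t)
      ;; (match-string 0) plays the role of \& in the interactive version.
      (replace-match
       (number-to-string (* factor (string-to-number (match-string 0))))
       t t))))
```

Evaluating (my-multiply-integers 5) in a buffer containing “a 3 b 10” leaves it containing “a 15 b 50”. The interactive version gets you the same result without writing the loop.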

Emacs Lisp as a scripting language

(New in Emacs 22) You can put #!/usr/bin/emacs --script at the top of your Emacs Lisp files and run them from the shell.
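
A minimal sketch of such a script follows. The file name hello.el is made up, and /usr/bin/emacs assumes that is where Emacs lives on your system:

```elisp
#!/usr/bin/emacs --script
;; hello.el: a hypothetical script that greets each name on the command line.

;; In a script, the remaining arguments are in `command-line-args-left'.
(dolist (name command-line-args-left)
  (princ (format "Hello, %s!\n" name)))

;; Clear the list so Emacs doesn't treat the arguments as files to visit.
(setq command-line-args-left nil)
```

After chmod +x hello.el, running ./hello.el World should greet World on standard output. (Use princ rather than message if you want standard output: in batch mode message writes to standard error.)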