User interface asynchrony


Suppose we’re making a video game on a handheld console, and we want to implement ad-hoc wireless networked play over a short-distance wireless standard like Bluetooth or IEEE 802.11. We’re going to need some kind of user interface for match-making, for offering games to nearby players, finding them and joining them. What does this user interface look like internally and how should it be programmed?

A common approach is to give players a choice: they can offer a game on the local wireless network, or scan the network to find games on offer and join them. This is convenient because the offering player’s console can become the “parent” or “master” of the networked game, and the other consoles become “children” or “slaves”. (In this game programming technique, the parent console performs adjudication of all disputes, such as which level to play on, what random decisions are made, what happens if two players try to move into the same space, and so on.)

In fact, it’s convenient to start scanning for games on offer while the player is deciding whether or not to offer a game; that way the player can see if someone else offered a game first. (Otherwise there needs to be some outside-of-game negotiation among the players over who’s going to offer the game.)

These considerations result in a match-making user interface state diagram looking like this:


Each box in the diagram is a user interface state, implying particular details to display. For example, in the “Scanning” state we display a menu of games on offer, plus the option to offer a game ourselves; in the “Offering” state we show the list of players who have joined the game so far, plus an option to quit.

(There only really needs to be one “Error” state, but it makes the diagram look more symmetrical to draw it like this.)

The transitions with black labels are synchronous with respect to this state diagram: they happen because the player selects an option. The transitions with red labels are asynchronous: they happen because of some event on the network (or in the internals of the networking implementation), and interrupt whatever is going on in that state.

So much for the user interface. What about the states of the internal networking implementation? Well, the details will vary from one console to another, but this diagram should give the impression of a typical networking implementation:


This state machine has essentially the same structure as the user interface, but there are more states. This is partly because there are networking operations for which there’s no user interface (for example, finding a channel to broadcast on), and partly because operations take an unknown amount of time to complete, so you need new states to record the fact that you’re waiting. For example, when a child issues a request to join a game, there’s no telling how long it will be until the parent replies (if it ever does), so we enter a new state “Joining” to wait for a response, or possibly to time out if there is no response.
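To make the idea of a waiting state concrete, here is a minimal sketch in Python of a “Joining” state that either receives a reply or times out. The class name JoinAttempt, the five-second timeout, the reply strings and the choice of “Ready” and “Error” as outcomes are all assumptions made for illustration; a real console's networking API will differ.

```python
JOIN_TIMEOUT = 5.0  # seconds; an arbitrary illustrative value


class JoinAttempt:
    """Hypothetical waiting state for a child that has asked to join a game."""

    def __init__(self, now):
        self.state = "Joining"
        # Remember when to give up, rather than blocking until a reply arrives.
        self.deadline = now + JOIN_TIMEOUT

    def tick(self, now, reply=None):
        """Called once per frame with the current time and any reply received."""
        if self.state != "Joining":
            return self.state          # already resolved; nothing more to do
        if reply == "accepted":
            self.state = "Ready"
        elif reply == "refused" or now >= self.deadline:
            self.state = "Error"       # refused, or the parent never answered
        return self.state
```

The caller passes the current time in on each tick, so nothing here ever blocks: the rest of the program carries on running while the attempt sits in “Joining”.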

From the networking implementation’s point of view, networking API calls and network events are synchronous, labelled in black, while player options are asynchronous, labelled in red.

Because the two diagrams have the same structure, we might think that a sensible approach would be to merge the two state machines, deducing the user interface state from the networking state.

The trouble with that approach is that networking operations take time to complete. Booting the wireless hardware may take a whole second or even longer. Scanning a single wireless channel to see if there are any games on offer may take a good fraction of a second, and there may be lots of channels to scan. While any of these networking operations is underway, a player may make a decision. If we’ve merged the state machines, then the player has to wait for the current networking operation to complete before the result of their choice can be displayed.

Making the player wait for some networking operation to complete seems rather rude. If the player wants to make a decision, shouldn’t they be able to do so immediately? For maximum responsiveness, let’s decouple the two state machines as much as we can:


Since we want the player to be in control, we’ll let the user interface state determine a target state in the networking implementation (red arrows in the diagram). The networking implementation always tries to reach the current target state. For example, suppose the current user interface state is “Offering” and the networking state is “Offering a game”, and the player presses the cancel button. The new user interface state is “Scanning” and therefore the target state is “Found a game”, and the networking implementation must take four steps to get there. After taking two of these four steps the implementation has reached the “Idle” state, and perhaps now the player presses the cancel button again to return to the “Rest of game”. The new target state is “Off”, so there is no need to take the third or fourth steps: instead, the implementation takes the first of a new set of steps heading for the “Off” state.
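The cancel-twice example can be sketched in a few lines of Python. This is only an illustration under assumptions: the intermediate states “Closing offer”, “Scanning channels” and “Shutting down” are invented here to stand in for whatever steps the real implementation has, and the names NEXT_HOP, TARGET_FOR_UI and Network are hypothetical.

```python
# Next state to move to, keyed by (current state, target state).
# "Closing offer", "Scanning channels" and "Shutting down" are invented
# intermediate states; the other names come from the text.
NEXT_HOP = {
    ("Offering a game", "Found a game"): "Closing offer",
    ("Closing offer", "Found a game"): "Idle",
    ("Idle", "Found a game"): "Scanning channels",
    ("Scanning channels", "Found a game"): "Found a game",
    ("Offering a game", "Off"): "Closing offer",
    ("Closing offer", "Off"): "Idle",
    ("Idle", "Off"): "Shutting down",
    ("Shutting down", "Off"): "Off",
}

# Target networking state implied by each user interface state.
TARGET_FOR_UI = {
    "Scanning": "Found a game",
    "Rest of game": "Off",
}


class Network:
    def __init__(self, state):
        self.state = state
        self.target = state

    def set_target(self, target):
        # The target can change at any time; steps already taken are kept.
        self.target = target

    def step(self):
        """Take one step toward the current target, if not already there."""
        if self.state != self.target:
            self.state = NEXT_HOP[(self.state, self.target)]
        return self.state


def demo():
    net = Network("Offering a game")
    trace = []
    net.set_target(TARGET_FOR_UI["Scanning"])      # player cancels the offer
    trace.append(net.step())
    trace.append(net.step())                       # now at "Idle"
    net.set_target(TARGET_FOR_UI["Rest of game"])  # player cancels again
    trace.append(net.step())
    trace.append(net.step())
    return trace
```

In this sketch, demo() visits “Closing offer”, “Idle”, “Shutting down” and “Off”: the two remaining steps toward “Found a game” are never taken, because the target changed while the implementation was at “Idle”.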

The networking implementation state, by contrast, determines a set of transitions in the user interface (blue arrows), which are followed if possible, but ignored otherwise. For example, suppose the networking implementation has reached the state “Ready”. If the user interface is in the state “Offering”, then it transitions to “Ready” immediately. But if the user interface is elsewhere (for example, the player has cancelled and is now in the “Scanning” state), then nothing happens.
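One way to get “followed if possible, but ignored otherwise” is a lookup keyed on both states, with the current user interface state as the default. The following Python sketch assumes hypothetical names (UI_TRANSITIONS, on_network_state) and lists only the one transition described above; a real table would have an entry for every blue arrow.

```python
# (networking state reached, current UI state) -> new UI state.
# Only the transition described in the text is listed here.
UI_TRANSITIONS = {
    ("Ready", "Offering"): "Ready",
}


def on_network_state(ui_state, network_state):
    """Return the UI state after the network reaches network_state.

    If no transition applies to the current UI state, the event is
    ignored and the UI stays where it is.
    """
    return UI_TRANSITIONS.get((network_state, ui_state), ui_state)
```

So on_network_state("Offering", "Ready") moves the interface to “Ready”, while on_network_state("Scanning", "Ready") leaves it in “Scanning”.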

We could program the implementation to determine the steps to take to reach the target state (for example, by searching the graph of states using Dijkstra’s algorithm), but since the graph is fixed we could just write out a table of the best path from any state to any other.
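Such a table can also be generated once, ahead of time, instead of written by hand. The sketch below uses breadth-first search rather than Dijkstra's algorithm, which gives the same shortest paths when every step is assumed to cost the same. The graph is a made-up stand-in for the real diagram: “Booting”, “Shutting down”, “Scanning channels” and “Closing offer” are invented states, and the function names are hypothetical.

```python
from collections import deque

# Hypothetical state graph: each state maps to the states one step away.
GRAPH = {
    "Off": ["Booting"],
    "Booting": ["Idle"],
    "Idle": ["Shutting down", "Scanning channels", "Offering a game"],
    "Shutting down": ["Off"],
    "Scanning channels": ["Idle", "Found a game"],
    "Found a game": ["Idle"],
    "Offering a game": ["Closing offer"],
    "Closing offer": ["Idle"],
}


def build_next_hop(graph):
    """Return {(source, target): first step on a shortest path}."""
    table = {}
    for source in graph:
        first_hop = {source: None}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in first_hop:
                    # Leaving the source directly, the neighbour is the first
                    # hop; otherwise inherit it from the node we came through.
                    first_hop[neighbour] = (
                        neighbour if node == source else first_hop[node]
                    )
                    queue.append(neighbour)
        for target, hop in first_hop.items():
            if hop is not None:
                table[(source, target)] = hop
    return table


def path(table, source, target):
    """Follow the table from source to target, listing each state entered."""
    steps = []
    while source != target:
        source = table[(source, target)]
        steps.append(source)
    return steps
```

With this graph, path(build_next_hop(GRAPH), "Offering a game", "Found a game") takes four steps and passes through “Idle” after the second. At run time the implementation needs only the table lookup, not the search.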

There’s a lesson here that ought to apply to any user interface design, not just video games. In order to respond to user actions in a timely fashion, we have to decouple the user interface from the implementation it controls. This applies just as much to, say, an “Open File…” dialog, as it does to an ad-hoc wireless match-making interface.

However, almost all personal computer applications have a user interface that’s synchronous with the implementation: after selecting “Open File…” typical applications stop all user interaction until the files have been listed and the dialog is ready. On my Macintosh running OS X, it can sometimes take five seconds to display the “Open File…” dialog. If I change my mind while this is happening, I have to wait until the dialog finishes appearing before I can press “Cancel”, which is rude, and wastes my time. (I think this is because the resources for the application’s “Open File…” dialog have to be loaded from disk before the dialog can be displayed, but this is no better an excuse than it would be for me to say, “well, powering up the networking hardware takes several seconds, so you’ll just have to wait”. The OS X application could display a generic “loading a dialog” dialog and let me cancel it before it’s loaded.)

I should add that it’s not necessary (or at least, not always necessary) to use threads to implement the asynchrony, as long as you can do all the time-consuming work via a non-blocking API like Unix’s select and poll. The match-making interface described above was implemented without threads, for example.
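As a rough illustration of the threadless style, here is a Python sketch that uses select with a zero timeout to ask whether data is waiting, so the caller never blocks. It is not the console code: the socket pair stands in for the wireless hardware, and the names poll_network and demo are made up for the example.

```python
import select
import socket


def poll_network(sock, timeout=0.0):
    """Return pending bytes from sock, or None if nothing is waiting.

    With timeout 0, select returns immediately, so a main loop can call
    this once per frame and carry on handling player input.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if readable:
        return sock.recv(4096)
    return None


def demo():
    # A connected pair of local sockets stands in for the network here.
    ours, theirs = socket.socketpair()
    try:
        first = poll_network(ours)       # nothing has been sent yet
        theirs.sendall(b"offer")
        # A short timeout only to keep the demo robust on slow loopback.
        second = poll_network(ours, timeout=1.0)
        return first, second
    finally:
        ours.close()
        theirs.close()
```

The main loop would call poll_network once per frame alongside its input handling and drawing, feeding any data it gets into the networking state machine.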