garethrees.org
http://garethrees.org/
<h1>Stack nomenclature</h1>
http://garethrees.org/2018/03/12/stack/
<p style=width:300px class="box sidebar"><img src="http://garethrees.org/2018/03/12/stack/plate-dispenser.jpg"> Lakeside 992 stainless steel heated two stack plate dispenser. With two stacks you can implement a queue with amortized constant time operations.
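The caption's claim can be sketched in Python (my illustration, not code from the post): a FIFO queue built from two LIFO stacks, here plain Python lists. Each element is pushed and popped at most twice in total, so enqueue and dequeue run in amortized constant time even though an individual dequeue may occasionally do O(n) work.

```python
class TwoStackQueue:
    """A FIFO queue built from two stacks (Python lists)."""

    def __init__(self):
        self._inbox = []   # receives newly enqueued items
        self._outbox = []  # holds items in reversed order, ready to dequeue

    def enqueue(self, item):
        self._inbox.append(item)  # O(1) push onto the inbox stack

    def dequeue(self):
        if not self._outbox:
            # Reverse the inbox into the outbox. This is the only
            # expensive step, and each item takes part in it at most once.
            while self._inbox:
                self._outbox.append(self._inbox.pop())
        if not self._outbox:
            raise IndexError("dequeue from empty queue")
        return self._outbox.pop()  # O(1) pop from the outbox stack
```

The amortized argument: charge each element two pushes and two pops at enqueue time; the occasional reversal spends exactly that credit.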
<p>As long-term readers of this blog will know, I think that we would benefit from better names for a number of computing concepts. I <a href="http://garethrees.org/2013/06/11/tabular/">previously criticized</a> the obfuscatory name ‘dynamic programming’ for the technique of solving an instance of a problem by making a table of solutions to smaller instances of the same problem. In this post, I’m going to criticize the names ‘top’ and ‘bottom’ for the two ends of a stack.
<p>The trouble with these names is that they are ambiguous. If a stack is regarded as growing upwards, like the stack of plates in a plate dispenser (pictured), then the ‘top’ is the end with the youngest items, where items are pushed and popped, and the ‘bottom’ is the other end, with the oldest items. But if a stack is regarded as growing downwards, as for example the control stack does on essentially every computer architecture, then the ‘bottom’ of the stack would seem to be the lowest-numbered address, and that is the end with the youngest items.
<p>In the <a href="https://www.ravenbrook.com/project/mps/">Memory Pool System</a>, we coped for years with confusion arising from these names. The MPS normally considers a block of memory as being organized in address order, but when the control stack is considered as a block of memory, the lowest-numbered address is the ‘bottom’ of the block even though it is the logical ‘top’ of the stack. We found that the stack-handling code became much easier to follow when we adopted the names <a href="http://www.memorymanagement.org/glossary/h.html#term-hot-end"><em>hot end</em></a> for the end of the stack with the youngest items, which changes quickly, and <a href="http://www.memorymanagement.org/glossary/c.html#term-cold-end"><em>cold end</em></a> for the end with the oldest items, which changes slowly.
<p>I don’t insist on these particular terms—<em>young</em> and <em>old</em>, or <em>accessible</em> and <em>inaccessible</em> would likely do as well, if you don’t like <em>hot</em> and <em>cold</em>. The point is that in order to be unambiguous the names must be based on something that’s essential to the nature of the data structure rather than an arbitrary or accidental aspect of a pedagogical metaphor.
<p class=centred>✴
<p>Where does stack nomenclature come from? The concepts involved in a control stack were invented by Alan Turing and set out in a 1945 report, as described by Carpenter and Doran (1977). I’ve glossed Turing’s descriptions with the modern names for the concepts.<a id=noteref-1 href="http://garethrees.org/2018/03/12/stack/#note-1"><sup>1</sup></a>
<blockquote>When we wish to start on a subsidiary operation [<i>subroutine</i>] we need only make a note of where we left off the major operation [<i>the return address</i>] and then apply the first instruction of the subsidiary. When the subsidiary is over we look up the note and continue with the major operation. Each subsidiary operation can end with instructions for this recovery of the note. How is the burying and disinterring of the note to be done? There are of course many ways. One is to keep a list of these notes in one or more standard size delay lines, with the most recent last. The position of the most recent of these will be kept in a fixed TS [<i>temporary store, that is, register</i>], and this reference [<i>the stack pointer</i>] will be modified every time a subsidiary is started or finished. The burying and disinterring processes are fairly elaborate, but there is fortunately no need to repeat the instructions involved, each time, the burying being done through a standard instruction table BURY [<i>jump to subroutine</i>], and the disinterring by the table UNBURY [<i>return</i>].</blockquote>
<p>Knuth (1998) describes the history of the term ‘stack’:
<blockquote>the name “stack” stems from IPL [<a href="https://en.wikipedia.org/wiki/Information_Processing_Language">Information Processing Language</a>] terminology (although “pushdown list” was the more official IPL wording), and it was also independently introduced by E. W. Dijkstra</blockquote>
<p>I can’t find any of the IPL reports online, but here’s Dijkstra (1960):
<blockquote>The basic concept of the method is the so-called stack. One uses a stack for storing a sequence of information units that increases and decreases at one end only, i.e. when a unit of information that is no longer of interest is removed from the stack, then this is always the most recently added unit still present in the stack.</blockquote>
<p>Dijkstra used the name “top” for the hot end of the stack:
<blockquote>there was no need to store the stack in a random access memory, for our interest was at all times restricted to the youngest element in the stack. In principle we could have used a small magnetic tape that would have to move one place forward in writing and one place backward in reading. […] Inside the subroutine we store the most anonymous intermediate results in the “top” of the stack in just the same way.</blockquote>
<h2 id=section-A>References</h2>
<ul>
<li>B. E. Carpenter and R. W. Doran (1977). ‘<a href="https://academic.oup.com/comjnl/article-pdf/20/3/269/2256995/200269.pdf">The other Turing machine</a>’. <cite>The Computer Journal</cite> 20:3 pp. 269–79.
<li>Edsger W. Dijkstra (1960). ‘Recursive Programming’. <cite>Numerische Mathematik</cite> 2 pp. 312–8.
<li>Donald E. Knuth (1998). <cite>The Art of Computer Programming</cite> (3rd edition). Addison-Wesley.
<li>Alan Turing (1945). ‘Proposals for development in the mathematics division of an automatic computing engine (ACE)’. Report E882, Executive Committee, National Physical Laboratory.
</ul>
<hr>
<ol>
<li id=note-1><p><a href="http://garethrees.org/2018/03/12/stack/#noteref-1">↩</a> The story I was taught about the development of the subroutine is that the first implementation was by Maurice Wilkes and David Wheeler during the development of the <a href="https://en.wikipedia.org/wiki/Electronic_delay_storage_automatic_calculator">EDSAC</a> in the late 1940s. But Turing’s report suggests that the key ideas were well known some time before they were implemented: Turing clearly describes subroutines, the call stack, the stack pointer, and dedicated jump-to-subroutine and return instructions. He also goes on to describe the process of <a href="https://en.wikipedia.org/wiki/Linker_(computing)">link editing</a>.</p>
</ol>
<p>Mon, 12 Mar 2018
<h1>Big-O proofs</h1>
http://garethrees.org/2018/03/11/proofs/
<p>This <a href="https://twitter.com/EmilyKager/status/967165799071809536">tweet from Emily Kager</a> appeared in my timeline a couple of weeks ago:
<p class=centred><a href="https://twitter.com/EmilyKager/status/967165799071809536"><img alt="My sister (freshman in college) texted me for Big-O proof help for her Algorithms class saying she’s scared CS isn’t for her because she doesn’t love this class. RT if you don’t think about Big-O proofs at your CS job." src="http://garethrees.org/2018/03/11/proofs/kager.png" width=578></a>
<p>I have complex feelings about this. On the one hand, I sympathize with Ms Kager’s sister—it’s horrible to be struggling in class—and it is true that most programmers never do a big-O proof again after they graduate, and many never think about asymptotic complexity at all. But on the other hand, someone who can get to grips with big-O proofs will be a better programmer for it. (I’ll try to explain why below.)
<p>This is all going to be unhelpful for Ms Kager’s sister: no-one ever loved a class just because someone told them it was useful. But I do have some advice, which is that it is never too late to learn something. The reason someone is struggling with analysis of algorithms is probably that they are missing some crucial prerequisite: maybe it’s discrete mathematics, or maybe it’s a lack of hands-on experience with algorithms, or maybe something else. But it’s always possible to come back to the subject later in your career: when something about it grabs your interest the textbooks will still be there.
<p class=centred>✴
<p>So, of course, I have never had to <em>write down</em> a big-O proof at work, at least not in the kind of formal style that I might have used in my undergraduate algorithms homework. Nonetheless, I think about the asymptotic runtime of the code I’m working on <em>all the time</em>, which means doing informal analysis of algorithms in my head, and the reason I am able to do this kind of analysis quickly and reliably is that I practised writing all those formal proofs. I can be confident that my informal approach gets the right answer because I know that I could turn my quick rules of thumb into detailed proofs if I ever needed to convince a beginner or a skeptic. Perhaps it would take me a while, or I’d have to refer to a textbook for the difficult cases, but I’d get there in the end.
<p>Expertise in some area always comes with a penumbra, that is, a wider area in which you know a lot (but not as much as a real expert) and a still wider area in which you know a little (and could pick up more if you needed to). An expert in everyday analysis of algorithms almost certainly has some ability at formal proofs too.
<p>And every once in a while, you get lucky and happen upon a problem where the issue is more complex than the usual logarithmic vs linear, or linear vs quadratic, and then you have some trickier analysis to do, and are glad that you have the skills to do it.
<p class=centred>✴
<p>We automate tasks using computer programs in order to save people time, money, and effort, or to give them capabilities that would otherwise require too much of these resources. So the need to check that you are using resources efficiently is ever-present in software development. Normally this does not require particularly sophisticated mathematical techniques: the vast majority of tasks fall into one of three groups:
<ol>
<li><p>If it’s a search task (find an item matching some criteria in a collection), then it ought to run in time that’s logarithmic in the size of the collection, that is, in \(O(\log n)\) time.
<li><p>If it’s a data-processing task (read some data and compute some results), then it ought to run in time that’s linear in the size of the data, that is, in \(O(n)\) time.
<li><p>If it’s a sorting task (read some data and collate it somehow), then it ought to run in \(O(n \log n)\) time.
</ol>
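As a rough sketch of the three groups (my illustration, not code from the post; note that logarithmic search assumes the collection is sorted or otherwise indexed), Python’s standard library covers all of them directly:

```python
from bisect import bisect_left

def contains_sorted(items, target):
    """Search task: binary search on a sorted list, O(log n)."""
    i = bisect_left(items, target)
    return i < len(items) and items[i] == target

def total(xs):
    """Data-processing task: one pass over the data, O(n)."""
    acc = 0
    for x in xs:
        acc += x
    return acc

# Sorting task: Python's built-in sort runs in O(n log n).
data = [5, 3, 1, 4, 2]
data.sort()
```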
<p>In each case you’re looking for particular kinds of failure. If it’s a search task, are you accidentally looking at each item in the collection, making the runtime \(Ω(n)\)? If it’s a data processing task, are you accidentally doing something taking \(Ω(n)\) time for each item, making the overall runtime \(Ω(n^2)\)? These complexity problems are ubiquitous. In particular, programs whose runtime is \(Ω(n^2)\) when it should be \(O(n\log n)\) or \(O(n)\) (that is, they are “accidentally quadratic”) are <a href="https://accidentallyquadratic.tumblr.com/">common enough to sustain a blog</a>. Here are a few examples that I’ve encountered at Code Review:
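A minimal, hypothetical illustration of the Ω(n)-per-item pitfall (not one of the linked reviews): computing prefix sums by re-summing a slice for each item does Ω(n) work per item and is therefore Ω(n²) overall, while a running total does the same job in O(n).

```python
def prefix_sums_quadratic(xs):
    """Omega(n^2): each step re-sums a slice of up to n items."""
    out = []
    for i in range(len(xs)):
        out.append(sum(xs[:i + 1]))  # O(n) slice and sum per item
    return out

def prefix_sums_linear(xs):
    """O(n): carry a running total and append it in O(1)."""
    out, acc = [], 0
    for x in xs:
        acc += x
        out.append(acc)
    return out
```

Both functions return the same result; only the linear version scales.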
<p style=width:400px class="box sidebar"><img src="http://garethrees.org/2018/03/11/proofs/quadratic.png"> Empirical demonstration of quadratic runtime
<ul>
<li>An <a href="https://codereview.stackexchange.com/a/188784/11728">Eulerian cycle algorithm</a> that’s quadratic because it searches the whole cycle each time it needs to find a visited node which still has an unused edge.
<li>A <a href="https://codereview.stackexchange.com/a/188617/11728">random sample algorithm</a> that’s quadratic because it deletes each sample from a list.
<li>A <a href="https://codereview.stackexchange.com/a/188447/11728">local maximization algorithm</a> that’s quadratic because on each iteration it finds the local items by slicing a list.
<li>A <a href="https://codereview.stackexchange.com/a/187368/11728">merge algorithm</a> that’s quadratic because for each emitted item it looks at the heads of all the inputs to find the minimum.
<li>A <a href="https://codereview.stackexchange.com/a/186390/11728">binary search algorithm</a> that’s quadratic because it computes the wrong location of the midpoint of the list.
<li>A <a href="https://codereview.stackexchange.com/a/186355/11728">minimum spanning tree algorithm</a> that’s quadratic because it looks at all the edges on the boundary of the growing tree to find the one with the smallest cost.
<li>A <a href="https://codereview.stackexchange.com/a/171018/11728">multiple string search algorithm</a> that’s quadratic because it searches separately for each string.
<li>A <a href="https://codereview.stackexchange.com/a/169484/11728">common ancestor algorithm</a> that’s quadratic because it visits the whole subtree rooted at a node before deciding which way to descend.
<li>A <a href="https://codereview.stackexchange.com/a/153384/11728">breadth-first search algorithm</a> that’s quadratic because it implements the queue of search nodes using a list.
<li>A <a href="https://codereview.stackexchange.com/a/150226/11728">counting algorithm</a> that’s quadratic because it searches a list by examining all the items.
<li>A <a href="https://codereview.stackexchange.com/a/87537/11728">memory allocation algorithm</a> that’s quadratic because it searches the free list to find the location of the freed block.
<li>A <a href="https://codereview.stackexchange.com/a/86043/11728">solution to the Josephus problem</a> that’s quadratic because it deletes each victim from a list.
</ul>
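To illustrate just one of these pitfalls with a sketch of my own: a breadth-first search whose queue is a plain list becomes quadratic, because <code>list.pop(0)</code> shifts every remaining element. Using <code>collections.deque</code>, whose <code>popleft</code> is O(1), keeps the search linear in the size of the graph. The graph representation below (a dict mapping each node to a list of neighbours) is my assumption for the example.

```python
from collections import deque

def bfs_order(graph, start):
    """Return nodes in breadth-first order from start.

    graph: dict mapping a node to an iterable of its neighbours.
    Using deque.popleft() keeps each dequeue O(1); list.pop(0)
    here would make the whole search quadratic.
    """
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order
```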
<p>The idea of giving all these links is not to shame the programmers who accidentally wrote quadratic programs, but to try to give some idea of how common this pitfall is, and how many different kinds of algorithm might be affected. Modern programming languages like Python make it possible to write complex operations tersely, but the flipside of this is that you have to pay attention to the performance of the operations you are using. It’s easy to write <code>if item in collection:</code> and fail to notice that <code>collection</code> is a list, and so this will take time that’s proportional to the length of the list.
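The list-versus-set point can be made concrete with a small sketch (mine, not from the post): two deduplication functions that differ only in the type of <code>seen</code>, and hence in the cost of the <code>in</code> test.

```python
def unique_quadratic(items):
    """Omega(n^2) in the worst case: 'in' on a list scans every element."""
    seen = []
    out = []
    for item in items:
        if item not in seen:   # O(n) linear scan per item
            seen.append(item)
            out.append(item)
    return out

def unique_linear(items):
    """O(n) expected: 'in' on a set is a constant-time hash lookup."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:   # O(1) expected per item
            seen.add(item)
            out.append(item)
    return out
```

The two lines of code look almost identical, which is exactly why this pitfall is so easy to fall into.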
<p>Sun, 11 Mar 2018