Operator precedence by textual substitution

A technique from early Fortran

Operator precedence is a traditional feature of mathematical notation where, absent explicit parentheses, some operators bind tighter than others ("have higher precedence"). For example, a + b × c would be interpreted as a + (b × c), not as (a + b) × c, because multiplication has higher precedence than addition.

This is a bit annoying to implement in programming languages. As a result, some early languages just didn't do it. Lisp (developed in the late 1950s) sidesteps the issue by not having infix operator notation in the first place, instead writing the above expression in fully parenthesized prefix notation, (+ a (* b c)). An early compiler from IBM, the Internal Translator (IT) (1956) [pdf] had more traditional infix notation, but no operator precedence: all operators had the same precedence, and were right-associative (see pp. 1.19–1.20). Users were expected to add parentheses where needed.

Alas for this simple solution, in a 1962 paper, "A History of Writing Compilers" (Computers and Automation, Dec. 1962 [pdf], pp. 8–18), Donald Knuth claims that "The lack of operator priority (often called precedence or hierarchy) in the IT language was the most frequent single cause of errors by the users of that compiler".

Fortran (also from IBM, released 1957) gave in and did implement operator precedence. But it did so using a method that seems to me equal parts ugly hack and clever solution, a raw textual search-and-replace scheme. As Knuth summarizes:

An ingenious idea used in the first FORTRAN compiler was to surround binary operators with peculiar-looking parentheses:

+ and - were replaced by )))+((( and )))-(((
* and / were replaced by ))*(( and ))/((
** was replaced by )**(

and then an extra ((( at the left and ))) at the right were tacked on. The resulting formula is properly parenthesized, believe it or not. For example, if we consider "(X + Y) + W/Z," we obtain

((((X))) + (((Y)))) + (((W))/((Z))).

This is admittedly highly redundant, but extra parentheses need not affect the resulting machine language code. After the above replacements are made, a parenthesis-level method can be applied to the resulting formulas.

I had to work out a few examples to convince myself that this scheme really works (even Knuth seems a bit surprised).

Update with a bug!: Joomy Korkut pointed out to me that the rules as stated above (and my original demo below) aren't sufficient to correctly translate expressions where existing parentheses in the input are meaningful. Knuth's example does have parentheses in the input, and translates correctly, but the parentheses don't make a difference in this case: (X + Y) + W/Z is equivalent to X + Y + W/Z. However, consider (A+B)*C: it will get parenthesized to ((((A)))+(((B)))*((C))), which changes the order of evaluation.

Update with a fix: The fix is fortunately straightforward: add even more parentheses! To ensure the original parenthesized expressions continue to "contain" all their original contents without anything escaping, add another replacement rule: replace each '(' in the input with '((((', and each ')' with '))))' (similar to the '(((' and ')))' added around the entire expression). This additional substitution rule adapted from a 2000 paper by David Padua, The Fortran I Compiler [pdf], via Rijnard van Tonder via Wikipedia.

Try it out below!

Input expression:


The original scheme has three precedence levels. Lowest precedence are addition (+) and subtraction (-); intermediate precedence are multiplication (*) and division (/); and highest precedence is exponentiation (**). But the same pattern can be extended to an arbitrary number of levels by just adding more parentheses. The more outward-facing parentheses you add around an operator, the lower its precedence.

Nowadays, there are standard ways to handle operator precedence, such as by stratifying a grammar or making use of the precedence declarations that some parser generators offer. But if you find yourself wanting to implement an expression parser and don't want to bother handling precedence, you can do some quick-and-dirty string substitution instead! Since this is purely a textual preprocessing step, the rest of the language implementation doesn't need to know about operator precedence.