Ligatures in programming fonts

Lig­a­tures in pro­gram­ming fonts—a mis­guided trend I was hop­ing would col­lapse un­der its own il­logic. But it per­sists. Let me save you some time—

Lig­a­tures in pro­gram­ming fonts are a ter­ri­ble idea.

And not be­cause I’m a purist or a grump. (Some days, but not to­day.) Pro­gram­ming code has spe­cial se­man­tic con­sid­er­a­tions. Lig­a­tures in pro­gram­ming fonts are likely to ei­ther mis­rep­re­sent the mean­ing of the code, or cause mis­cues among read­ers. So in the end, even if they’re cute, the risk of er­ror isn’t worth it.

First, what are lig­a­tures? Lig­a­tures are spe­cial char­ac­ters in a font that com­bine two (or more) trou­ble­some char­ac­ters into one. For in­stance, in ser­ifed text faces, the low­er­case f of­ten col­lides with the low­er­case i and l. To fix this, the fi and fl are of­ten com­bined into a sin­gle shape (what pros would call a glyph).

fi fj fl ffi gg gyok
fi fj fl ffi gg gywrong
fi fj fl ffi gg gyright

In this type de­signer’s opin­ion, a good lig­a­ture doesn’t draw at­ten­tion to it­self: it sim­ply re­solves what­ever col­li­sion would’ve hap­pened. Ide­ally, you don’t even no­tice it’s there. Con­versely, this is why I loathe the Th lig­a­ture that is the de­fault in many Adobe fonts: it re­solves noth­ing, and al­ways draws at­ten­tion to itself.

Lig­a­tures in pro­gram­ming fonts fol­low a sim­i­lar idea. But in­stead of fix­ing the odd trou­ble­some com­bi­na­tion, well-in­ten­tioned am­a­teur lig­a­tur­ists are adding dozens of new & strange lig­a­tures. For in­stance, these come from Fira Code, a heav­ily lig­a­tured spin­off of the open-source Fira Mono.

So what’s the prob­lem with pro­gram­ming ligatures?

  1. They con­tra­dict Uni­code. Uni­code is a stan­dard­ized sys­tem—used by all con­tem­po­rary fonts—that iden­ti­fies each char­ac­ter uniquely. This way, soft­ware pro­grams don’t have to worry that things like the fi lig­a­ture might be stashed in some spe­cial place in the font. In­stead, Uni­code des­ig­nates a unique name and num­ber for each char­ac­ter, known as a code point. If you have an fi lig­a­ture in your font, you iden­tify it with its des­ig­nated Uni­code code point, which is 0xFB01.

    In ad­di­tion to al­pha­betic char­ac­ters, Uni­code as­signs code points to hun­dreds of sym­bols. Many of the pro­gram­ming lig­a­tures shown above are vi­su­ally sim­i­lar to ex­ist­ing Uni­code sym­bols. So in a source file that uses Uni­code char­ac­ters, how would you know if you’re look­ing at a => lig­a­ture that’s shaped like ⇒ vs. Uni­code char­ac­ter 0x21D2, which also looks like ⇒? The lig­a­ture in­tro­duces an am­bi­gu­ity that wasn’t there before.

  2. They’re guar­an­teed to be wrong some­times. There are a lot of ways for a given se­quence of char­ac­ters, like=>”, to end up in a source file. De­pend­ing on con­text, it doesn’t al­ways mean the same thing.

    The prob­lem is that lig­a­ture sub­sti­tu­tion isdumb” in the sense that it only con­sid­ers whether cer­tain char­ac­ters ap­pear in a cer­tain or­der. It’s not aware of the se­man­tic con­text. There­fore, any global lig­a­ture sub­sti­tu­tion is guar­an­teed to be se­man­ti­cally wrong part of the time.

When we’re us­ing a ser­ifed text font in or­di­nary body text, we don’t have the same con­sid­er­a­tions. An fi lig­a­ture al­ways means f fol­lowed by i. In that case, lig­a­ture sub­sti­tu­tion that ig­nores con­text doesn’t change the meaning.

Still, some ty­po­graphic trans­for­ma­tions in body text can be se­man­ti­cally wrong. For in­stance, foot and inch marks are of­ten typed with the same char­ac­ters as quo­ta­tion marks. (See straight and curly quotes.) But whereas quo­ta­tion marks want to be curly, foot and inch marks want to be straight (or slanted slightly to the up­per right). So if we ap­ply au­to­matic smart (aka curly) quotes, we have to be care­ful not to cap­ture foot and inch marks in the transformation.

Does that mean pro­gram­mers can never have nice things? It’s to­tally fine to re­design in­di­vid­ual char­ac­ters to dis­tin­guish them from oth­ers. For in­stance, in Trip­li­cate, I in­clude a spe­cialCode” vari­ant that in­cludes re­designed ver­sions of cer­tain char­ac­ters that are eas­ily confused.

`$te_fl{1234*567~890} Reg­u­lar
`$te_fl{1234*567~890} Code

But in this case, the point is dis­am­bigua­tion: we don’t want the low­er­case l to look like the digit 1, nor the zero to look like a cap O. Whereas lig­a­tures are go­ing the op­po­site di­rec­tion: mak­ing dis­tinct char­ac­ters ap­pear to be others.

Bot­tom line: this isn’t a mat­ter of taste. In pro­gram­ming code, every char­ac­ter in the file has a spe­cial se­man­tic role to play. There­fore, any kind ofpret­ti­fy­ing” that makes one char­ac­ter look like an­other—in­clud­ing lig­a­tures—leads to a swamp of de­spair. If you don’t be­lieve me, try it for 10 or 15 years.

—Matthew But­t­er­ick
29 March 2019