---
title: "Typography and R"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Typography and R}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r}
#| include: false
knitr::opts_chunk$set(
  dev = "ragg_png",
  dpi = 144,
  collapse = TRUE,
  comment = "#>",
  fig.asp = NULL,
  fig.height = 4.326,
  fig.width = 7,
  eval = FALSE
)
```
This text is meant as an introduction to the subject of typography, both in general but more importantly as it relates to R. If you are more interested in how to use systemfonts to use fonts installed on your computer during plotting then please see [the package introduction vignette](systemfonts.html).
> The code examples in this vignette is based on fonts that may not be available on the users machine. As such, you should not expect the provided examples to execute locally out of the box.
## Digital typography
Many books could be, and have been, written about the subject of typography. This blog post is not meant to be an exhaustive deep dive into all areas of this vast subject. Rather, it is meant to give you just enough understanding of core concepts and terminology to appreciate how it all plays into using fonts in R.
### Typeface or font?
There is a good chance that you, like 99% of world, use "font" as the term describing "the look" of the letters you type. You may, perhaps, have heard the term "typeface" as well and thought it synonymous. This is in fact slightly wrong, and a great deal of typography snobbery has been dealt out on that account (much like the distinction between packages and libraries in R). It is a rather inconsequential mix-up for the most part, especially because 99% of the population wouldn't bat an eye if you use them interchangeably. However, the distinction between the two serves as a good starting point to talk about other terms in digital typography as well as the nature of font files, so let's dive in.
When most people use the word "font" or "font family", what they are actually describing is a typeface. A **typeface** is a style of lettering that forms a cohesive whole. As an example, consider the well-known "Helvetica" typeface. This name embraces many different weights (bold, normal, light) as well as slanted (italic) and upright. However, all of these variations are all as much Helvetica as the others - they are all part of the same typeface.
A **font** is a subset of a typeface, describing a particular variation of the typeface, i.e. the combination of weight, width, and slant that comes together to describe the specific subset of a typeface that is used. We typically give a specific combination of these features a name, like "bold" or "medium" or "italic", which we call the **font style**[^1]. In other words, a font is a particularly style within a typeface.
[^1]: Be aware that the style name is at the discretion of the developer of the typeface. It is very common to see discrepancies between the style name and e.g. the weight reported by the font (e.g. Avenir Next Ultra Light is a *thin* weight font).
```{r}
#| echo: false
#| fig.cap: "Different fonts from the Avenir Next typeface"
#| fig.height: 3.5
st <- marquee::classic_style(30, body_font = "Avenir Next", lineheight = 1, margin = marquee::trbl(4)) |>
  marquee::modify_style("anul", weight = "thin") |>
  marquee::modify_style("anub", weight = "ultrabold")
text <- paste(
  "{.anul Avenir Next Ultra Light}  ",
  "{.anul *Avenir Next Ultra Light Italic*}  ",
  "Avenir Next  ",
  "*Avenir Next Italic*  ",
  "{.anub Avenir Next Ultra Bold}  ",
  "{.anub *Avenir Next Ultra Bold Italic*}  ",
  sep = "\n"
)
grid::grid.draw(
  marquee::marquee_grob(text, st)
)
```
{.fig}
In the rest of this document we will use the terms typeface and font with the meaning described above.
### Font files
Next, we need to talk about how typefaces are represented for use by computers. Font files record information on how to draw the individual glyphs (characters), but also instructions about how to draw sequences of glyphs like distance adjustments (kerning) and substitution rules (ligatures). Font files typically encode a single font but can encode a full typeface:
``` r
typefaces <- systemfonts::system_fonts()[, c("path", "index", "family", "style")]
# Full typeface in one file
typefaces[typefaces$family == "Helvetica", ]
#> # A tibble: 6 × 4
#>   path                                index family    style
#>                                        
#> 1 /System/Library/Fonts/Helvetica.ttc     2 Helvetica Oblique
#> 2 /System/Library/Fonts/Helvetica.ttc     4 Helvetica Light
#> 3 /System/Library/Fonts/Helvetica.ttc     5 Helvetica Light Oblique
#> 4 /System/Library/Fonts/Helvetica.ttc     1 Helvetica Bold
#> 5 /System/Library/Fonts/Helvetica.ttc     3 Helvetica Bold Oblique
#> 6 /System/Library/Fonts/Helvetica.ttc     0 Helvetica Regular
# One font per font file
typefaces[typefaces$family == "Arial", ]
#> # A tibble: 4 × 4
#>   path                                                     index family style
#>                                                          
#> 1 /System/Library/Fonts/Supplemental/Arial.ttf                 0 Arial  Regular
#> 2 /System/Library/Fonts/Supplemental/Arial Bold.ttf            0 Arial  Bold
#> 3 /System/Library/Fonts/Supplemental/Arial Bold Italic.ttf     0 Arial  Bold Italic
#> 4 /System/Library/Fonts/Supplemental/Arial Italic.ttf          0 Arial  Italic
```
Here, each row is a font, with **family** giving the name of the typeface, and **style** the font style.
It took a considerable number of tries before the world managed to nail the digital representation of fonts, leading to a proliferation of file types. As an R user, there are three formats that are particularly important:
-   **TrueType** (ttf/ttc). Truetype is the baseline format that all modern formats stand on top of. It was developed by Apple in the '80s and became popular due to its great balance between size and quality. Fonts can be encoded, either as scalable paths, or as bitmaps of various sizes, the former generally being preferred as it allows for seamless scaling and small file size at the same time.
-   **OpenType** (otf/otc). OpenType was created by Microsoft and Adobe to improve upon TrueType. While TrueType was a great success, the number of glyphs it could contain was limited and so was its support for selecting different features during [shaping](#text-shaping). OpenType resolved these issues, so if you want access to advanced typography features you'll need a font in OpenType format.
-   **Web Open Font Format** (woff/woff2). TrueType and OpenType tend to create large files. Since a large percentage of the text consumed today is delivered over the internet this creates a problem. WOFF resolves this problem by acting as a compression wrapper around TrueType/OpenType to reduce file sizes while also limiting the number of advanced features provided to those relevant to web fonts. The woff2 format is basically identical to woff except it uses the more efficient [brotli](https://en.wikipedia.org/wiki/Brotli) compression algorithm. WOFF was designed specifically to be delivered over the internet and support is still a bit limited outside of browsers.
While we have mainly talked about font files as containers for the shape of glyphs, they also carries a lot of other information needed for rendering text in a way pleasant for reading. Font level information records a lot of stylistic information about typeface/font, statistics on the number of glyphs and how many different mappings between character encodings and glyphs it contains, and overall sizing information such as the maximum descend of the font, the position of an underline relative to the baseline etc. systemfonts provides a convenient way to access this data from R:
``` r
systemfonts::font_info(family = "Helvetica")
#> # A tibble: 1 × 24
#>   path  index family style italic bold  monospace weight width kerning color scalable vertical n_glyphs n_sizes
#>                                     
#> 1 /Sys…     0 Helve… Regu… FALSE  FALSE FALSE     normal norm… FALSE   FALSE TRUE     FALSE        2252       0
#> # ℹ 9 more variables: n_charmaps , bbox , max_ascend , max_descend ,
#> #   max_advance_width , max_advance_height , lineheight , underline_pos ,
#> #   underline_size 
```
Further, for each glyph there is a range of information in addition to its shape:
``` r
systemfonts::glyph_info("j", family = "Helvetica", size = 30)
#> # A tibble: 1 × 9
#>   glyph index width height x_bearing y_bearing x_advance y_advance bbox
#>                            
#> 1 j        77     6     27        -1        21         7         0 
```
These terms are more easily understood with a diagram:
```{r}
#| echo: false
systemfonts::plot_glyph_stats("j", family = "Helvetica", size = 30)
```
{.fig}
The `x_advance` in particular is important when rendering text because it tells you how far to move to the right before rendering the next glyph (ignoring for a bit the concept of kerning)
### Text shaping {#text-shaping}
The next important concept to understand is **text shaping**, which, in the simplest of terms, is to convert a succession of characters into a sequence of glyphs along with their locations. Important here is the distinction between **characters**, the things you think of as letters, and **glyphs**, which is what the font will draw. For example, think of the character "f", which is often tricky to draw because the "hook" of the f can interfere with other characters. To solve this problem, many typefaces include **ligatures**, like "fi", which are used for specific pairs of characters. Ligatures are extremely important for languages like Arabic.
A few of the challenges of text shaping include kerning, bidirectional text, and font substitution. **Kerning** is the adjustment of distance between specific pairs of characters. For example, you can put "VM" a little closer together but "OO" needs to be a little further apart. Kerning is an integral part of all modern text rendering and you will almost solemnly notice it when it is absent (or worse, [wrongly applied](https://www.creativebloq.com/design/fonts-typography/the-popes-tomb-was-not-on-my-design-debate-bingo-card-for-2025)).
Not every language writes text in the same direction, but regardless of your native script, you are likely to use arabic numerals which are always written left-to-right. This gives rise to the challenge of **bidirectional** (or bidi) text, which mixes text flowing in different directions. This imposes a whole new range of challenges!
Finally, you might request a character that a font doesn't contain. One way to deal with this is to render a glyph representing a missing glyph, usually an empty box or a question mark. But it's typically more useful to use the correct glyph from a different font. This is called **font fallback** and happens all the time for emojis, but can also happen when you suddenly change script without bothering to pick a new font. Font fallback is an imprecise science, typically relying on an operating system font that has a very large number of characters, but might look very different from your existing font.
Once you have determined the order and location of glyphs, you are still not done. Text often needs to be wrapped to fit into a specific width, it may need a specific justification, perhaps, indentation or tracking must be applied, etc. Thankfully, all of this is generally a matter of (often gnarly) math that you just have to get right. That is, all except text wrapping which should happen at the right boundaries, and may need to break up a word and inserting a hyphen etc.
Like I said, the pit of despair is bottomless...
## Font handling in R {#font-handling-in-r}
You hopefully arrive at this section with an appreciation of the horrors that goes into rendering text. If not, maybe this [blog post](https://faultlore.com/blah/text-hates-you/) will convince you.
Are you still here? Good.
Now that you understand the basics of what goes into handling fonts and text, we can now discuss the details of fonts in R specifically.
### Fonts and text from a user perspective {#fonts-and-text-from-a-user-perspective}
The users perception of working with fonts in R is largely shaped by plots. This means using either base or grid graphics or one of the packages that have been build on top of it, like [ggplot2](https://ggplot2.tidyverse.org). While the choice of tool will affect *where* you specify the font to use, they generally agree on how to specify it.
+-------------------------------------------------------------------------------------------------------------+--------------+-------------------------------------------------------+---------------------------------------------------------------------------------------------------------------+
| Graphic system                                                                                              | Argument     |                                                       |                                                                                                               |
+=============================================================================================================+==============+=======================================================+===============================================================================================================+
|                                                                                                             | *Typeface*   | *Font*                                                | *Size*                                                                                                        |
+-------------------------------------------------------------------------------------------------------------+--------------+-------------------------------------------------------+---------------------------------------------------------------------------------------------------------------+
| **Base**                                                                                                    | `family`     | `font`                                                | `cra` (pixels) or `cin` (inches) multiplied by `cex`                                                          |
|                                                                                                             |              |                                                       |                                                                                                               |
| *Arguments are passed to `par()` to set globally or directly to the call that renders text (e.g. `text()`)* |              |                                                       |                                                                                                               |
+-------------------------------------------------------------------------------------------------------------+--------------+-------------------------------------------------------+---------------------------------------------------------------------------------------------------------------+
| **Grid**                                                                                                    | `fontfamily` | `fontface`                                            | `fontsize` (points) multiplied by `cex`                                                                       |
|                                                                                                             |              |                                                       |                                                                                                               |
| Arguments are passed to the `gp` argument of relevant grobs using the `gpar()` constructor                  |              |                                                       |                                                                                                               |
+-------------------------------------------------------------------------------------------------------------+--------------+-------------------------------------------------------+---------------------------------------------------------------------------------------------------------------+
| **ggplot2**                                                                                                 | `family`     | `face` (in `element_text()`) or `fontface` (in geoms) | `size` (points when used in `element_text()`, depends on the value of `size.unit` argument when used in geom) |
|                                                                                                             |              |                                                       |                                                                                                               |
| Arguments are set in `element_text()` to alter theme fonts or directly in the geom call to alter geom fonts |              |                                                       |                                                                                                               |
+-------------------------------------------------------------------------------------------------------------+--------------+-------------------------------------------------------+---------------------------------------------------------------------------------------------------------------+
From the table it is clear that in R `fontfamily`/`family` is used to describe the typeface and `font`/`fontface`/`face` is used to select a font from the typeface. Size settings is just a plain mess.
The major limitation in `fontface` (and friends) is that it takes a number, not a string, and you can only select from four options: `1`: plain, `2`: bold, `3`: italic, and `4`: bold-italic. This means, for example, that there's no way to select Futura Condensed Extra Bold. Another limitation is that it's not possible to specify any font variations such as using tabular numbers or stylistic ligatures.
### Fonts and text from a graphics device perspective
In R, a graphics device is the part responsible for doing the rendering you request and put it on your screen or in a file. When you call `png()` or `ragg::agg_png()` you open up a graphics device that will receive all the plotting instructions from R. Both graphics devices will ultimately produce the same file type (PNG), but how they choose to handle and respond to the plotting instructions may differ (greatly). Nowhere is this difference more true than when it comes to text rendering.
After a user has made a call that renders some text, it is funneled through the graphic system (base or grid), handed off to the graphics engine, which ultimately asks the graphics device to render the text. From the perspective of the graphics device it is much the same information that the user provided which are presented to it. The `text()` method of the device are given an array of characters, the typeface, the size in points, and an integer denoting if the style is regular, bold, italic, or bold-italic.
{.fig fig-alt="A diagram showing the flow of text rendering instructions from ggplot2, grid, the graphics engine, and down to the graphics device. Very little changes in the available information about the font during the flow"}
This means that it is up to the graphics device to find the appropriate font file (using the provided typeface and font style) and shape the text with all that that entails. This is a lot of work, which is why text is handled so inconsistently between graphics devices. Issues can range from not being able to find fonts installed on the computer, to not providing font fallback mechanisms, or even handling right-to-left text. It may also be that certain font file formats are not well supported so that e.g. color emojis are not rendered correctly.
There have been a number of efforts to resolve these problems over the years:
-   **extrafont**: Developed by Winston Chang, [extrafont](https://github.com/wch/extrafont) sought to mainly improve the situation for the `pdf()` device which generally only had access to the postscript fonts that comes with R. The package allows the `pdf()` device to get access to TrueType fonts installed on the computer, as well as provide means for embedding the font into the PDF so that it can be opened on systems where the font is not installed. (It also provides the capabilities to the Windows `png()` device).
-   **sysfonts** and **showtext**. These packages are developed by Yixuan Qiu and provide support for system fonts to all graphics devices, by hijacking the `text()` method of the graphics device to treat text as polygons or raster images. This guarantees your plots will look the same on every device, but it doesn't do advanced text shaping, so there's no support for ligatures or font substitution. Additionally, it produces large files with inaccessible text when used to produce pdf and svg outputs.
-   **systemfonts** and **textshaping**. These packages are developed by me to provide a soup-to-nuts solution to text rendering for graphics devices. [systemfonts](https://systemfonts.r-lib.org) provides access to fonts installed on the system along with font fallback mechanisms, registration of non-system fonts, reading of font files etc. [textshaping](https://github.com/r-lib/textshaping) builds on top of systemfonts and provides a fully modern engine for shaping text. The functionality is exposed both at the R level and at the C level, so that graphics devices can directly access to font lookup and shaping.