---
title: "Using `mapcan` to create choropleth maps"
author: "Andrew McCormack"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using `mapcan` to create choropleth maps}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  warning = FALSE,
  comment = "#>"
)
library(mapcan)
library(dplyr)
library(ggplot2)
```

## Description

The `mapcan()` function returns a data frame with geographic data that can be used in `ggplot2`.

## Overview

Visualizing spatial data in R often involves working with large, specialized shape files that require a fair amount of conversion and manipulation before they are ready for use in `ggplot`. `mapcan` has done most of the heavy lifting, providing flexible, `ggplot`-ready geographic data.

## Arguments

At the most basic level, `mapcan()` requires two arguments: `boundaries` and `type`.

* `boundaries`: set `boundaries = province` for geographic data at the province level, `boundaries = census` for data at the census division level, or `boundaries = ridings` for geographic data at the federal riding level.

* `type`: to produce geographic data for a standard choropleth map, set `type = standard`. For population cartogram data (for maps that alter the geography based on the population size at the province or census division level), set `type = cartogram`. For tile grid map data at the federal riding level, set `type = bins`. Note: while `type = bins` will provide *data* (i.e. coordinates) for use in tile grid maps, `mapcan::riding_binplot()` is a convenient function for creating actual tile grid *maps* in `ggplot`.

By default, `mapcan()` will provide geographic data for the entire country.
You may wish to either exclude the territories from your map or create a map of only one province:

* `province`: to produce geographic data for only one province (or territory), provide the `province` argument with a provincial alpha code (options are `NL`, `PE`, `NS`, `NB`, `QC`, `ON`, `MB`, `SK`, `AB`, `BC`, `YT`, `NT`, and `NU`). For example, setting `province = BC` will return geographic data only for British Columbia.

* `territories`: set `territories = FALSE` to exclude the territories.

## Examples

### Creating a choropleth map with provincial boundaries

```{r}
mapcan(boundaries = province, type = standard) %>%
  head()
```

`mapcan()` gives us a data frame with the necessary components (longitude, latitude, order, hole, piece, and group) for use with `geom_polygon()` in the `ggplot2` package. It also provides four different variables that describe the province, which make it easy to merge this data with provincial data of your choice.

#### Basic ingredients for `ggplot`

To create a plot with data from `mapcan()`, the following aesthetic mappings are required: `x = long` (longitude), `y = lat` (latitude), and `group = group` (this tells `geom_polygon()` how to group observations; in this case, by province). Let's initialize the plot:

```{r}
pr_map <- mapcan(boundaries = province, type = standard) %>%
  ggplot(aes(x = long, y = lat, group = group))
pr_map
```

This doesn't tell us much. We need to add a `geom` to visualize the map.

#### Using `geom_polygon` to plot the coordinates

To visualize the geographic data with `ggplot`, use `geom_polygon()`. It is important to also specify `coord_fixed()`, which fixes the relationship between longitude (the x-axis) and latitude (the y-axis):

```{r fig.width = 6, fig.height=5.5, warning = FALSE}
pr_map <- pr_map +
  geom_polygon() +
  coord_fixed()
pr_map
```

You will notice that the axis text has no substantive significance.
You can remove it, along with the axis ticks and background grid, using the `theme_mapcan()` function, a `ggplot` theme that is part of the `mapcan` package:

```{r fig.width = 6, fig.height=5.5, warning = FALSE}
pr_map +
  theme_mapcan() +
  ## Add a title
  ggtitle("Map of Canada with Provincial/Territorial Boundaries")
```

Though beautiful, this map is not very informative (unless you are unfamiliar with the shape of Canada). Let's add some province-level data.

#### Incorporate province-level statistics

It is relatively straightforward to merge your own province-level statistics into the geographic data that `mapcan()` provides. To illustrate, we will work with the `province_pop_annual` data frame that is included in the `mapcan` package. This dataset provides annual provincial/territorial population estimates dating back to 1971. Let's use the most recent population data, from 2017:

```{r}
pop_2017 <- mapcan::province_pop_annual %>%
  filter(year == 2017)

head(pop_2017)
```

The next step is to attach these numbers to every point on the polygons of the provinces. To do this, we first create the required geographic data with `mapcan()`, then we use `inner_join()` from the `dplyr` package to merge in the `pop_2017` data:

```{r, warning = FALSE}
pr_geographic <- mapcan(boundaries = province, type = standard)

pr_geographic <- inner_join(pr_geographic,
                            pop_2017,
                            by = c("pr_english" = "province"))
```

To colour the provinces according to their population size, set the population variable as a `fill` aesthetic. Because population is a continuous variable (and because I don't like the default colours), I will use the `scale_fill_viridis_c()` colour scale to colour the map.

```{r fig.width = 6, fig.height=5.5, warning = FALSE}
pr_geographic %>%
  ggplot(aes(x = long, y = lat, group = group, fill = population)) +
  geom_polygon() +
  coord_fixed() +
  theme_mapcan() +
  scale_fill_viridis_c(name = "Population") +
  ggtitle("Canadian Population by Province")
```

### Creating a choropleth map with federal riding boundaries

#### Generate geographic data with riding boundaries

To create a map with federal riding boundaries, we specify `boundaries = ridings`. For the sake of illustration, let's also look at just one province: British Columbia.

```{r}
bc_ridings <- mapcan(boundaries = ridings, type = standard, province = BC)

head(bc_ridings)
```

#### Plot geographic data with riding boundaries

```{r fig.width = 5, fig.height=5.5, warning = FALSE}
ggplot(bc_ridings, aes(x = long, y = lat, group = group)) +
  geom_polygon() +
  coord_fixed() +
  theme_mapcan() +
  ggtitle("British Columbia \nFederal Electoral Ridings")
```

#### Incorporate riding-level statistics

As with the province-level statistics above, we can also merge our own riding-level statistics into the riding-level geographic data that `mapcan()` has produced. We will work with the `federal_election_results` data frame that is included in the `mapcan` package. This dataset provides federal election results for all elections dating back to 1997. We will use the results of the 2015 federal election to colour the ridings in British Columbia.

*Note: At the moment, `mapcan()` only provides geographic data for the electoral boundaries (2013 Representation Order) of the 2015 federal election.*

```{r}
bc_results <- mapcan::federal_election_results %>%
  # Restrict data to include just 2015 election results from BC
  filter(election_year == 2015 & pr_alpha == "BC")

head(bc_results)
```

Next, we merge the two data frames (i.e.
the geographic data and the election results data):

```{r}
bc_ridings <- inner_join(bc_results, bc_ridings, by = "riding_code")
```

To colour the ridings according to the winning party of the 2015 election, set the `party` variable as a `fill` aesthetic:

```{r}
bc_riding_map <- bc_ridings %>%
  ggplot(aes(x = long, y = lat, group = group, fill = party)) +
  geom_polygon() +
  coord_fixed() +
  theme_mapcan() +
  ggtitle("British Columbia \n2015 Federal Electoral Results")
```

The default colours are not ideal. We can easily provide our own custom colours that correspond to the colours associated with the different parties with `ggplot2`'s `scale_fill_manual()`:

```{r fig.width = 5, fig.height=5.5, warning = FALSE}
bc_riding_map +
  scale_fill_manual(name = "Winning party",
                    values = c("blue", "springgreen3", "red", "orange"))
```
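
An unnamed `values` vector is matched to the parties by position, so the result depends on the order of the levels of `party`. A minimal sketch of a more explicit alternative is to pass a named vector, so that each colour is tied to a party label. The party labels and the toy data frame below are assumptions made for illustration and are not taken from `federal_election_results`; the real labels can be checked with `unique(bc_results$party)`.

```{r}
library(ggplot2)

# Hypothetical party labels (assumed for illustration only)
party_colours <- c(
  "Conservative" = "blue",
  "Green"        = "springgreen3",
  "Liberal"      = "red",
  "NDP"          = "orange"
)

# Toy stand-in for the merged riding data: two unit squares, one per "riding"
toy_ridings <- data.frame(
  long  = c(0, 1, 1, 0, 2, 3, 3, 2),
  lat   = c(0, 0, 1, 1, 0, 0, 1, 1),
  group = rep(c("riding_a", "riding_b"), each = 4),
  party = rep(c("Liberal", "NDP"), each = 4)
)

toy_map <- ggplot(toy_ridings,
                  aes(x = long, y = lat, group = group, fill = party)) +
  geom_polygon() +
  coord_fixed() +
  # Named values are matched to parties by name, not by position
  scale_fill_manual(name = "Winning party", values = party_colours)

toy_map
```

If the labels in the real data match, the same named vector could be passed to `scale_fill_manual()` on `bc_riding_map` in place of the unnamed one.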