class: center, middle, inverse, title-slide

.title[
# AIM RSF R series: Data visualisation with ggplot2
]
.subtitle[
## Based on Data Carpentry: R for Social Scientists
]
.author[
### Eirini Zormpa
]
.date[
### 22 November 2022 (last updated 2022-11-22)
]

---

# Summary of session 3: Data wrangling with `dplyr` and `tidyr`

- ✅ Subset columns or rows with `select` or `filter` and create new columns with `mutate`.
- ✅ Link the output of one function to the input of another function with the ‘pipe’ operator `%>%`.
- ✅ Combine datasets using `join`.
- ❌ Reshape a dataframe from long to wide format with the `pivot_wider` function.
- ❌ Export a dataframe to a .csv and .tsv file.

---

# Learning objectives: Data visualisation with `ggplot2`

- ✅ Create line plots, scatter plots, and bar plots using `ggplot2`.
- ✅ Apply faceting in `ggplot2`.
- ✅ Build complex and customised plots from data in a data frame.

---

# Why `ggplot2`?

<img src="data-visualisation_files/figure-html/ggplot-1.png" width="100%" />

---

class: center, middle

# `ggplot2`

[**ggplot2**](https://ggplot2.tidyverse.org/) is a package (included in the **tidyverse**) for creating highly customisable plots that are built step by step by adding layers. Separating a plot into layers gives a high degree of flexibility with minimal effort.

---

# `ggplot2` layers

.pull-left[
```r
<DATA> %>%
  ggplot(aes(<MAPPINGS>)) +
  <GEOM_FUNCTION>() +
  <CUSTOMISATION>
```
]

--

.pull-right[

.footnote[Artwork by [Allison Horst](https://allisonhorst.com/), reused under a CC-BY licence.]
]

---

class: center, middle, inverse

# Data visualisation crash-course

---

# Aesthetics

> Whenever we visualise data, we take data values and convert them in a systematic and logical way into the visual elements that make up the final graphic. [...] All data visualisations map data values into quantifiable features of the resulting graphic. We refer to these features as **aesthetics**.
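To make the idea concrete, here is a minimal sketch of mapping data columns onto aesthetics. It assumes only that `ggplot2` is installed and uses the `mpg` example data bundled with it (not the COVID data from the exercises); each argument of `aes()` maps one column to one aesthetic.

```r
library(ggplot2)

# `mpg` ships with ggplot2: fuel economy data for a set of car models
p <- ggplot(mpg, aes(
  x = displ,       # position on the x axis: engine displacement (litres)
  y = hwy,         # position on the y axis: highway miles per gallon
  colour = class,  # colour: vehicle class (categorical)
  size = cyl,      # size: number of cylinders (numeric)
  shape = drv      # shape: drive train (4 = 4wd, f = front, r = rear)
)) +
  geom_point(alpha = 0.6)

p
```

Swapping any column into a different aesthetic changes how that variable is encoded, without touching the rest of the code.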
.footnote[Quote from Claus Wilke's [*Fundamentals of Data Visualization*](https://clauswilke.com/dataviz).]

--

## Commonly used aesthetics

.pull-left[
- position (*x* and *y* coordinates)
- colour
]

.pull-right[
- size
- shape
- line type
]

---

class: middle

# Find the green dot lvl. 1 👶

<img src="data-visualisation_files/figure-html/green-dot1-1.png" width="100%" />

.footnote[Inspired by Kieran Healy's [*Data Visualization: A practical introduction*](https://socviz.co/).]

---

class: middle

# Find the green dot lvl. 2 👩‍🎓

<img src="data-visualisation_files/figure-html/green-dot2-1.png" width="100%" />

.footnote[Inspired by Kieran Healy's [*Data Visualization: A practical introduction*](https://socviz.co/).]

---

class: middle

# Find the green dot lvl. 3 👽

<img src="data-visualisation_files/figure-html/green-dot3-1.png" width="100%" />

.footnote[Inspired by Kieran Healy's [*Data Visualization: A practical introduction*](https://socviz.co/).]

---

# Colour considerations

In the previous game, people with the most common type of colour blindness would have struggled to perceive the colour distinction!

---

# Viridis palettes

.pull-left[
They are relatively colourblind-friendly...

<img src="data-visualisation_files/figure-html/viridis-1.png" width="100%" height="50%" />
<img src="data-visualisation_files/figure-html/inferno-1.png" width="100%" height="50%" />
]

--

.pull-right[
... and they're pretty 😻

<img src="data-visualisation_files/figure-html/viridis-plot-1.png" width="100%" />
]

.footnote[From the [`viridisLite` site](https://sjmgarnier.github.io/viridisLite/reference/viridis.html) by Simon Garnier.]

---

class: center, middle, inverse

# Data visualisation exercises

---

class: center, middle

# Exercise 1

🕜 **5 mins**

Filter the `covid_data_clean` dataset to contain only observations from Denmark and Sweden for the year 2020, then create a line plot of `cases_rate` by `to_date`, with each `country` shown in a different colour.
---

class: center, middle, inverse

# Exercise 1: Solution

```r
covid_data_clean %>%
  filter(country == "Denmark" | country == "Sweden",
         year == 2020) %>%
  ggplot(aes(x = to_date, y = cases_rate, colour = country)) +
  # since ggplot2 3.4.0, line thickness is set with `linewidth` (not `size`)
  geom_line(linewidth = 1.2) +
  # discrete viridis palette: relatively colourblind-friendly
  scale_colour_viridis_d()
```

<img src="data-visualisation_files/figure-html/exercise-1-sol-1.png" alt="Line plot showing the COVID case rate for Denmark and Sweden in 2020. Both are fairly low until October, when they start ticking up, with Sweden surpassing 400 cases per 100,000 people in approximately December 2020, when Denmark is at about 300 cases per 100,000 people." width="85%" />

---

class: center, middle, inverse

# Exercise 2

🕔 **10 mins**

Create a (flipped) bar chart that shows the total death count in 2020 for the countries we selected, ordered from the highest death toll to the lowest.
--

**Added challenge**: do the same, but show the years 2020, 2021, and 2022 in facets (using `facet_wrap()`).

---

class: center, middle, inverse

# Exercise 2: Solution (code)

```r
covid_data_clean %>%
  drop_na() %>%
  # `countries`: the vector of countries selected earlier in the series
  filter(country %in% countries,
         year == 2020) %>%
  group_by(country) %>%
  summarise(deaths_year = sum(deaths_count)) %>%
  ungroup() %>%
  # order the bars by total deaths; after coord_flip() the highest is on top
  mutate(country = as.factor(country),
         country = fct_reorder(country, deaths_year)) %>%
  ggplot(aes(x = country, y = deaths_year)) +
  geom_col() +
  coord_flip()
```

---

class: center, middle, inverse

# Exercise 2: Solution (plot)

<img src="data-visualisation_files/figure-html/exercise-2-sol-plot-1.png" width="100%" />

---

class: center, middle, inverse

# Exercise 2 bonus: Solution (code)

```r
covid_data_clean %>%
  drop_na() %>%
  filter(country %in% countries) %>%
  # one total per country and year
  group_by(country, year) %>%
  summarise(deaths_year = sum(deaths_count)) %>%
  ungroup() %>%
  # with several years per country, fct_reorder() orders by the median
  mutate(country = as.factor(country),
         country = fct_reorder(country, deaths_year)) %>%
  ggplot(aes(x = country, y = deaths_year)) +
  geom_col() +
  coord_flip() +
  # one panel per year
  facet_wrap(~year)
```

---

class: center, middle, inverse

# Exercise 2 bonus: Solution (plot)

<img src="data-visualisation_files/figure-html/exercise-2-sol-bonus-plot-1.png" width="100%" />

---

class: center, middle, inverse

# Data visualisation crash-course continued

---

# The problem with bar plots

.footnote[Image from [*Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm*](https://doi.org/10.1371/journal.pbio.1002128) by Weissgerber, Milic, Winham, & Garovic (2015), reused and adapted under a [CC-BY 4.0 licence](https://creativecommons.org/licenses/by/4.0/).]

---

# A better way: Raincloud plots!

<img src="images/scherer-raincloud-plots.png" alt="Four raincloud plots, each of which consists of a density plot (the cloud) to illustrate the distribution, a scatterplot (the rain) to illustrate the raw data and a boxplot between the two for further information on central tendency." width="67%" style="display: block; margin: auto;" />

.footnote[Plot created by [Cédric Scherer](https://www.cedricscherer.com/2021/06/06/visualizing-distributions-with-raincloud-plots-and-how-to-create-them-with-ggplot2/) and reused under a [CC-BY licence](https://creativecommons.org/licenses/by/4.0/).]

---

class: center, middle, inverse

# Exercise 3

🕕 **5 mins**

Build the previous plot again and experiment with at least two themes. Which do you like best?

.pull-left[
`theme_minimal`

`theme_void`

`theme_classic`
]

.pull-right[
`theme_dark`

`theme_grey`

`theme_light`
]
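In case it helps: a complete theme is added to a plot with `+`, just like any other layer. This sketch uses a hypothetical stand-in bar chart built from ggplot2's bundled `mpg` data; substitute your own plot from Exercise 2.

```r
library(ggplot2)

# Hypothetical stand-in plot: number of car models per vehicle class
base_plot <- ggplot(mpg, aes(x = class)) +
  geom_bar() +
  coord_flip()

# Complete themes are added with `+`, like layers
plot_minimal <- base_plot + theme_minimal()
plot_classic <- base_plot + theme_classic()

plot_minimal
plot_classic
```

Calling `theme_set(theme_minimal())` would instead make a theme the default for every subsequent plot in the session.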
---

class: center, middle, inverse

# Exercise 3: My preference

I prefer the white background of `theme_minimal` and I like that it retains the major grid lines, though that's slightly controversial. I also like that it gets rid of the black box around the plot.

---

## This is just the beginning!

`ggplot2` and compatible packages give you a huge amount of flexibility to create *exactly* the graph you want!

You can explore packages that let you play around with:

- beautiful palettes (e.g. `ghibli`, `wesanderson`)
- new themes (e.g. `hrbrthemes`)
- additional fonts (e.g. `extrafont`)
- animated graphs (e.g. `gganimate`)
- and so much more!

There are more resources for you to explore in the HackMD ✨

---

# References

Garnier, S., N. Ross, et al. (2022). _viridis - Colorblind-Friendly Color Maps for R_. R package version 0.4.1. DOI: [10.5281/zenodo.4679424](https://doi.org/10.5281%2Fzenodo.4679424). URL: [https://sjmgarnier.github.io/viridis/](https://sjmgarnier.github.io/viridis/).

Healy, K. (2018). _Data visualization: A practical introduction_. Princeton: Princeton University Press. URL: [http://www.socviz.co](http://www.socviz.co).

Weissgerber, T. L., N. M. Milic, S. J. Winham, et al. (2015). "Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm". In: _PLOS Biology_ 13.4, p. e1002128. DOI: [10.1371/journal.pbio.1002128](https://doi.org/10.1371%2Fjournal.pbio.1002128). URL: [https://doi.org/10.1371/journal.pbio.1002128](https://doi.org/10.1371/journal.pbio.1002128).

Wickham, H. (2016). _ggplot2: Elegant Graphics for Data Analysis_. Springer-Verlag New York. ISBN: 978-3-319-24277-4. URL: [https://ggplot2.tidyverse.org](https://ggplot2.tidyverse.org).

Wilke, C. O. (2019). _Fundamentals of data visualization: A primer on making informative and compelling figures_. Sebastopol, CA: O'Reilly Media Inc. URL: [https://clauswilke.com/dataviz/](https://clauswilke.com/dataviz/).
---

class: center, middle

# Thank you for your attention ✨ 🙏

## See you next week for literate programming with `R Markdown` 📖