ggplot2

  • An R package specifically designed to produce graphics
  • Unlike other packages, ggplot2 has its own grammar
  • The grammar is based on “Grammar of Graphics” (Wilkinson 2005)
  • Independent modules that can be combined in many forms
  • This grammar provides high flexibility

 

Grammar of graphics

The main idea is to start with a base layer of raw data and then add more layers of annotations and statistical summaries. The package lets us build graphics with the same line of thought we follow when designing an analysis, which narrows the gap between the plot we picture in our head and the final product.

Learning the grammar is crucial not only for producing a specific graph of interest, but also for thinking about more complex ones. Its main advantage is that new kinds of graphs can be created by combining elements in new ways.

 

Graph components

All ggplot2 graphs contain the following components:

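  • data - the R object (usually a data frame) with the information to be plotted
  • layers - the geometric elements (“geoms”) and statistical summaries drawn from the data, together with the aesthetic mappings (e.g. ‘x’ & ‘y’) that link variables to visual properties
  • scale - how data values are translated into aesthetic values (axis range, colors, sizes)
  • coord. - coordinate system (not modified very often)
  • facet - determines how to break the data into subplots in a multipanel plot
  • theme - controls the plot style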

These components are put together using “+”.

The most common syntax includes the data within the “ggplot” call and a “geom_” layer.

 

First install/load the package:

#install
install.packages("ggplot2") 

# load library
library(ggplot2) 

 

Scatter plots

Let’s use the “iris” data set to create scatter plots:

ggplot(data = iris, mapping = aes(x = Sepal.Length, y = Petal.Length)) + geom_point()

 

This plot is defined by 3 components:

  1. “data” - iris
  2. “aes” - Sepal.Length vs Petal.Length
  3. “layer” - points (geom)
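
The same scatter plot can also be assembled one component at a time, which makes the role of each piece of the grammar explicit. The sketch below is only illustrative: the object names (base, scatter, full) are made up for this example, and the scale, coord, facet and theme calls simply spell out what ggplot2 already uses by default.

```r
library(ggplot2)

# data only: an empty canvas
base <- ggplot(data = iris)

# add the aesthetic mapping and a layer of points
scatter <- base + aes(x = Sepal.Length, y = Petal.Length) + geom_point()

# the remaining components have defaults; here they are written out explicitly
full <- scatter +
  scale_x_continuous() +   # scale for x
  scale_y_continuous() +   # scale for y
  coord_cartesian() +      # coordinate system
  facet_null() +           # a single panel
  theme_gray()             # default theme

full
```

Because each component is a separate piece, any of them can be swapped (for example theme_gray() for theme_bw()) without touching the rest.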

 

Aesthetic attributes

We can also add other aesthetic attributes like color, shape and size. These attributes can be included within aes():

# color by species
ggplot(data = iris, mapping = aes(x = Sepal.Length, y = Petal.Length, color = Species)) + geom_point()

# color and shape by species
ggplot(data = iris, mapping = aes(x = Sepal.Length, y = Petal.Length, color = Species, shape = Species)) + geom_point()

 

Note that the aesthetic arguments can also be included in the “geom” layer:

ggplot(data = iris, mapping = aes(x = Sepal.Length, y = Petal.Length)) + geom_point(aes(color = Species, shape = Species))

 

We can also include a fixed aesthetic value:

ggplot(data = iris, mapping = aes(x = Sepal.Length, y = Petal.Length)) + geom_point(color = "red2")
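
A fixed value has to be set outside aes(). A common slip is to put it inside, in which case ggplot2 treats the string as a variable with a single level. The following sketch contrasts the two; the object names are made up for illustration, and layer_data() is used only to inspect the colors ggplot2 ends up drawing.

```r
library(ggplot2)

# fixed value: set outside aes(), every point is drawn in "red2"
p_fixed <- ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point(color = "red2")

# inside aes() the string is treated as a one-level variable,
# so ggplot2 picks a color from its default palette and adds a legend
p_mapped <- ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point(aes(color = "red2"))

# layer_data() returns the values actually used for drawing
unique(layer_data(p_fixed)$colour)
unique(layer_data(p_mapped)$colour)
```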

 

Some attributes work better with some data types:

  • Color and shape: categorical variables
  • Size: continuous variables
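
As a small sketch of this guideline, the categorical variable (Species) can be mapped to color and shape while a continuous one (Petal.Width, chosen here only for illustration) is mapped to size:

```r
library(ggplot2)

# color and shape encode the categorical variable, size the continuous one
p_size <- ggplot(iris, aes(Sepal.Length, Petal.Length,
                           color = Species, shape = Species,
                           size = Petal.Width)) +
  geom_point(alpha = 0.6)  # semi-transparent points reduce overplotting

p_size
```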

 


Exercise 1

Using the “hylaeformis” data set:

#read from website
hylaeformis_data <- read.csv("https://marceloarayasalas.weebly.com/uploads/2/5/5/2/25524573/hylaeformis_data.csv", stringsAsFactors = FALSE)

# or  download manually and read from local file
hylaeformis_data <- read.csv("hylaeformis_data.csv", stringsAsFactors = FALSE)

head(hylaeformis_data, 20)


1.1 Create a scatter plot of “duration” vs “meanfreq” (mean frequency)


1.2 Add an aesthetic attribute to show a different color for each locality


1.3 Add another aesthetic attribute to show “dfrange” (dominant frequency range) as the point size


 

Multipanel plots (Faceting)

  • Another way to visualize categorical variables
  • Creates a separate panel for each level of the variable
  • 2 types: “grid” & “wrap”

ggplot(iris, aes(Sepal.Length, Petal.Length)) + 
  geom_point() + 
  facet_wrap(~Species)

# or

ggplot(iris, aes(Sepal.Length, Petal.Length)) + 
  geom_point() + 
  facet_grid(~Species)
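
With a single faceting variable the two functions give much the same result. The difference shows with two variables: facet_grid() arranges the panels as a rows-by-columns matrix, while facet_wrap() wraps a sequence of panels. The sketch below uses the mpg data set that ships with ggplot2, an assumption made here only because iris has a single categorical variable.

```r
library(ggplot2)

base_mpg <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# rows = drive type, columns = number of cylinders;
# combinations without data remain as empty panels
p_grid <- base_mpg + facet_grid(drv ~ cyl)

# one panel per combination present in the data, wrapped into 3 columns
p_wrap <- base_mpg + facet_wrap(~ drv + cyl, ncol = 3)

p_grid
p_wrap
```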

 

The scale can be fixed or free for the x and y axis, and the number of columns and rows can be modified:

# free x
ggplot(iris, aes(Sepal.Length, Petal.Length)) + 
  geom_point() + 
  facet_wrap(~Species, scales = "free_x")

# free y and 3 rows
ggplot(iris, aes(Sepal.Length, Petal.Length)) + 
  geom_point() + 
  facet_wrap(~Species, scales = "free_y", nrow = 3)

# both free and 2 rows
ggplot(iris, aes(Sepal.Length, Petal.Length)) + 
  geom_point() + 
  facet_wrap(~Species, scales = "free", nrow = 2)

 

Note that we can also save the basic component as an R object and add other components later in the code:

p <- ggplot(iris, aes(Sepal.Length, Petal.Length)) + 
  geom_point() 

p + facet_wrap(~Species, scales = "free_x", nrow = 3)

Additional “geoms”

  • geom_smooth() - adds best fit lines (including CI)
  • geom_boxplot()
  • geom_histogram() & geom_freqpoly() - frequency distributions
  • geom_bar() - frequency distribution of categorical variables
  • geom_path() & geom_line() - add lines to scatter plots

 

geom_smooth()

Best fit regression lines can be added with geom_smooth():

# smoother and CI
ggplot(iris, aes(Sepal.Length, Petal.Length)) + 
  geom_point() + 
  geom_smooth(method = "lm") +
  facet_wrap(~Species, scales = "free", nrow = 3)

# without CI
ggplot(iris, aes(Sepal.Length, Petal.Length)) + 
  geom_point() + 
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~Species, scales = "free", nrow = 3)
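
Faceting is not the only way to get one line per group. Aesthetics set in ggplot() are inherited by every layer, so where the color mapping is placed decides whether the regression is fitted per species or on all the data. A minimal sketch, with made-up object names (formula = y ~ x is stated explicitly only to silence the message about the default formula):

```r
library(ggplot2)

# color mapped in ggplot(): inherited by all layers, one line per species
p_by_species <- ggplot(iris, aes(Sepal.Length, Petal.Length, color = Species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# color mapped only in geom_point(): the smoother sees ungrouped data,
# so a single overall line is fitted
p_overall <- ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point(aes(color = Species)) +
  geom_smooth(method = "lm", formula = y ~ x)

p_by_species
p_overall
```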

 

If the relation between X and Y is noisy, a smoothed line can help to detect patterns:

# smoother and CI
ggplot(iris, aes(Sepal.Length, Petal.Length)) + 
  geom_point() + 
  geom_smooth() +
  facet_wrap(~Species, scales = "free", nrow = 3)
## `geom_smooth()` using method = 'loess'

# without CI
ggplot(iris, aes(Sepal.Length, Petal.Length)) + 
  geom_point() + 
  geom_smooth(se = FALSE) +
  facet_wrap(~Species, scales = "free", nrow = 3)
## `geom_smooth()` using method = 'loess'


Exercise 2

Using the “msleep” example data set:


2.1 Create a scatter plot of “bodywt”(body weight) vs “brainwt” (brain weight)


2.2 Add “order” as a color aesthetic


2.3 Add a “facet” component to split plots by order using free scales


2.4 Remove the orders with fewer than 4 species in the data set and make a plot similar to 2.3


2.5 Add a smooth line to each plot in the panel


Boxplots

Again, it only takes a new “geom” component to create a boxplot:

ggplot(iris, aes(Species, Petal.Length)) + geom_boxplot()

Violin plots are an interesting alternative:

ggplot(iris, aes(Species, Petal.Length)) + geom_violin()

 

Histograms

The same goes for histograms and frequency polygons:

ggplot(iris, aes(Petal.Length)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(iris, aes(Petal.Length)) + geom_freqpoly()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(iris, aes(Petal.Length)) + geom_histogram()  + geom_freqpoly()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

 

We can control the width of the bars:

ggplot(iris, aes(Petal.Length)) + 
  geom_histogram(binwidth = 1, fill = adjustcolor("red2", alpha.f = 0.3))
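
The message printed above appears when neither bins nor binwidth is given. Either argument can be used: bins fixes the number of bars, binwidth fixes their width in the units of the variable. A short sketch (the values 15 and 0.25 are arbitrary choices for illustration):

```r
library(ggplot2)

# fix the number of bars
p_bins <- ggplot(iris, aes(Petal.Length)) + geom_histogram(bins = 15)

# fix the width of each bar, in the units of the variable (cm here)
p_width <- ggplot(iris, aes(Petal.Length)) + geom_histogram(binwidth = 0.25)

# either way, every observation is counted exactly once
sum(layer_data(p_bins)$count)
sum(layer_data(p_width)$count)
```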

 

And compare the distribution of different groups within the same histogram:

ggplot(iris, aes(Petal.Length, fill = Species)) + geom_histogram(binwidth = 0.4)
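
By default the bars of the different groups are stacked on top of each other, so only the lowest group can be read directly against the y axis. One alternative, sketched here with made-up object names, is to overlay semi-transparent bars that all start at zero:

```r
library(ggplot2)

# default: bars of the three species are stacked
p_stack <- ggplot(iris, aes(Petal.Length, fill = Species)) +
  geom_histogram(binwidth = 0.4)

# overlay: each species keeps its own baseline at zero;
# transparency lets overlapping bars show through
p_overlay <- ggplot(iris, aes(Petal.Length, fill = Species)) +
  geom_histogram(binwidth = 0.4, position = "identity", alpha = 0.5)

p_stack
p_overlay
```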

 

Bar plots

Show the distribution of discrete (categorical) variables:

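# count the species in each order
tab <- table(msleep$order)

# keep only the orders with more than 3 species
df <- as.data.frame(table(msleep$order[msleep$order %in% names(tab)[tab > 3]]))

# the heights are already computed, so plot them as they are
ggplot(df, aes(Var1, Freq)) + geom_bar(stat = "identity")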
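
The counting step does not have to be done by hand. By default geom_bar() counts the rows in each category, and geom_col() is a shortcut for heights that are already computed. A sketch of both routes (the object names are illustrative):

```r
library(ggplot2)

# keep the orders with more than 3 species
tab <- table(msleep$order)
common_orders <- msleep[msleep$order %in% names(tab)[tab > 3], ]

# geom_bar() counts the rows in each order
p_count <- ggplot(common_orders, aes(order)) + geom_bar()

# geom_col() plots precomputed heights
counts <- as.data.frame(table(common_orders$order))
p_col <- ggplot(counts, aes(Var1, Freq)) + geom_col()

p_count
p_col
```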

 

Customizing ggplots

Besides the basic components described above, ggplot2 has many other tools (both arguments and additional functions) to further customize plots. Pretty much everything can be modified. Here we look at some of the most common tools.

 

Themes

ggplot2 comes with some built-in themes that can be easily applied to modify the look of our plots:

p <- ggplot(iris, aes(Sepal.Length, Petal.Length)) + 
  geom_point() 

p + theme_classic()

p + theme_bw()

p + theme_minimal()

 

Most themes differ in the use of grids, border lines and axis labeling patterns.
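
Individual elements can be adjusted on top of a complete theme with theme(). The sketch below is one possible combination, not a recommended style. Note that theme() comes after theme_bw(); a complete theme added later would overwrite the adjustments.

```r
library(ggplot2)

p_species <- ggplot(iris, aes(Sepal.Length, Petal.Length, color = Species)) +
  geom_point()

p_custom <- p_species +
  theme_bw() +
  theme(legend.position = "bottom",              # move the legend below the plot
        panel.grid.minor = element_blank(),      # drop the minor grid lines
        axis.title = element_text(size = 14))    # larger axis titles

p_custom
```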

 

Axis customization

Axis limits can be modified as follows:

ggplot(iris, aes(Sepal.Length, Petal.Length)) + 
  geom_point()  + 
  xlim(c(0, 10))

ggplot(iris, aes(Sepal.Length, Petal.Length, col = Species)) + 
  geom_point()  + 
  xlim(c(0, 10)) + 
  ylim(c(0, 9))
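
Note that xlim() and ylim() set the limits of the scale: observations outside the range are treated as missing (ggplot2 warns about removed rows when drawing) and do not enter statistical layers such as geom_smooth(). To zoom in without dropping data, the limits can be given to coord_cartesian() instead. A small sketch; the limits 5 and 7 are arbitrary values chosen for illustration.

```r
library(ggplot2)

p_base <- ggplot(iris, aes(Sepal.Length, Petal.Length)) + geom_point()

# scale limits: points outside 5-7 are turned into missing values
p_limits <- p_base + xlim(5, 7)

# coordinate limits: same visible window, all points kept
p_zoom <- p_base + coord_cartesian(xlim = c(5, 7))

# number of points with a usable x value in each version
sum(!is.na(layer_data(p_limits)$x))  # fewer than nrow(iris)
sum(!is.na(layer_data(p_zoom)$x))    # all rows
```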

 

Axes can also be transformed:

ggplot(iris, aes(Sepal.Length, Petal.Length, col = Species)) + 
  geom_point()  + 
  scale_x_continuous(trans = "log") + 
  scale_y_continuous(trans = "log2")

 

or reversed:

ggplot(iris, aes(Sepal.Length, Petal.Length, col = Species)) + 
  geom_point()  + 
  scale_y_reverse() 

Saving ggplots

ggplots can be exported as image files using the ggsave function:

ggplot(data = msleep[msleep$order %in% names(tab)[tab > 5], ], mapping = aes(x = bodywt, y = brainwt)) + 
  geom_point() +
  facet_wrap(~order, scales = "free")
## Warning: Removed 21 rows containing missing values (geom_point).

# Export
ggsave("plot.png", width = 5, height = 5)
## Warning: Removed 21 rows containing missing values (geom_point).

 

The image file type is identified by the extension in the file name.
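
By default ggsave() saves the last plot that was displayed. Passing the plot object explicitly is less error-prone in longer scripts. The sketch below writes the same plot in two vector formats; the file names and the use of a temporary directory are assumptions made for this example.

```r
library(ggplot2)

p_save <- ggplot(iris, aes(Sepal.Length, Petal.Length)) + geom_point()

# hypothetical output location: a temporary directory
out_dir <- tempdir()
pdf_file <- file.path(out_dir, "iris_scatter.pdf")
eps_file <- file.path(out_dir, "iris_scatter.eps")

# same plot, two formats: the extension selects the graphics device
ggsave(pdf_file, plot = p_save, width = 5, height = 5)  # size in inches by default
ggsave(eps_file, plot = p_save, width = 5, height = 5)

file.exists(c(pdf_file, eps_file))
```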

 

Additional axis customizing:

# Log2 scaling of the y axis (with visually-equal spacing)
require(scales)
## Loading required package: scales
p + scale_y_continuous(trans = log2_trans())

# show exponents
p + scale_y_continuous(trans = log2_trans(),
    breaks = trans_breaks("log2", function(x) 2^x),
    labels = trans_format("log2", math_format(2^.x)))

# Percent
p + scale_y_continuous(labels = percent)

# dollar
p + scale_y_continuous(labels = dollar)

# scientific
p + scale_y_continuous(labels = scientific)

### Add "tick marks" ###

# Load libraries
library(MASS)

data(Animals)

# x and y axis are transformed and formatted
p2 <- ggplot(Animals, aes(x = body, y = brain)) + geom_point(size = 4) +
     scale_x_log10(breaks = trans_breaks("log10", function(x) 10^x),
              labels = trans_format("log10", math_format(10^.x))) +
     scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
              labels = trans_format("log10", math_format(10^.x))) +
     theme_bw()

# log-log plot without log tick marks
p2

# Show log tick marks
p2 + annotation_logticks() 

# Log ticks on left and right
p2 + annotation_logticks(sides = "lr")

# All sides
p2 + annotation_logticks(sides = "trbl")

 

Other graphs

Many other types of graphs can be generated. Here I show a single example of cool contour and “heatmap” graphs:

head(faithful)
##   eruptions waiting
## 1      3.60      79
## 2      1.80      54
## 3      3.33      74
## 4      2.28      62
## 5      4.53      85
## 6      2.88      55
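
# faithfuld (included in ggplot2) contains a 2D density estimate of the faithful data
# note: in ggplot2 >= 3.3.0, after_stat(level) replaces the older ..level.. notation
ggplot(faithfuld, aes(eruptions, waiting)) + geom_contour(aes(z = density, colour = ..level..))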

ggplot(faithfuld, aes(eruptions, waiting)) + geom_raster(aes(fill = density))
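
The two plots above rely on a density that was already computed. When only the raw observations are available, as in faithful, geom_density_2d() can estimate the density and draw the contours in one step. A minimal sketch (this geom uses MASS::kde2d() internally, so the MASS package must be installed):

```r
library(ggplot2)

# raw observations plus contours of the estimated 2D density
p_density <- ggplot(faithful, aes(eruptions, waiting)) +
  geom_point(alpha = 0.4) +
  geom_density_2d()

p_density
```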

 

Other R graphing packages

  • ggvis (interactive ggplots)
  • vcd
  • plotrix (Lemon 2006)
  • gplots (Warnes 2015)

Check the CRAN Graphics Task View for a more comprehensive list of graphical tools in R.


 

References

  • Lemon J (2006) Plotrix: a package in the red light district of R. R-News 6(4):8–12
  • Warnes GR, Bolker B, Bonebakker L, Gentleman R, Liaw WHA, Lumley T, Maechler M, Magnusson A, Moeller S, Schwartz M, Venables B (2015) gplots: various R programming tools for plotting data. R package version 2.17.0. https://CRAN.R-project.org/package=gplots
  • Wickham H (2010) A layered grammar of graphics. J Comput Graph Stat 19(1):3–28
  • Wilkinson L (2005) The grammar of graphics. Statistics and computing, 2nd edn. Springer, New York


Session information

## R version 3.4.3 (2017-11-30)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.4 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/openblas-base/libblas.so.3
## LAPACK: /usr/lib/libopenblasp-r0.2.18.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] scales_0.5.0       MASS_7.3-49        knitr_1.20        
## [4] RColorBrewer_1.1-2 ggplot2_2.2.1     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.14     magrittr_1.5     munsell_0.4.3    colorspace_1.3-2
##  [5] rlang_0.1.6      stringr_1.2.0    plyr_1.8.4       tools_3.4.3     
##  [9] grid_3.4.3       gtable_0.2.0     htmltools_0.3.6  yaml_2.1.16     
## [13] lazyeval_0.2.1   rprojroot_1.3-2  digest_0.6.13    tibble_1.4.1    
## [17] reshape2_1.4.3   evaluate_0.10.1  rmarkdown_1.8    labeling_0.3    
## [21] stringi_1.1.6    compiler_3.4.3   pillar_1.0.1     backports_1.1.2