The main idea is to start with a base layer of raw data and then add more layers of annotations and statistical summaries. The package lets us build graphics with the same structure of thought we use when designing an analysis, narrowing the gap between the graphic we picture in our head and the final product.
Learning the grammar is crucial not only for producing a particular graph of interest, but also for thinking about more complex graphs. The advantage of this grammar is that new graphs can be created from new combinations of elements.
All ggplot2 graphs contain at least the following components: the data, the aesthetic mappings (aes) and one or more layers (geoms).
These components are put together using “+”.
The most common syntax includes the data within the “ggplot” call and a “geom_” layer.
First install/load the package:
#install
install.packages("ggplot2")
# load library
library(ggplot2)
Let’s use the “iris” data set to create scatter plots:
ggplot(data = iris, mapping = aes(x = Sepal.Length, y = Petal.Length)) + geom_point()
This plot is defined by 3 components:
1. “data” - iris
2. “aes” - Sepal.Length vs Petal.Length
3. “layer” - Points (geom)
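As a rough check of this decomposition, the sketch below stores the plot in an object and looks at what the point layer will draw. `layer_data()` is a ggplot2 helper; the object names `p_iris`, `n_layers` and `pts` are only illustrative.

```r
library(ggplot2)

# Store the plot instead of printing it (p_iris is an illustrative name)
p_iris <- ggplot(data = iris, mapping = aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point()

# Only one layer was added: the points
n_layers <- length(p_iris$layers)

# Data as seen by that layer: x and y come from the aes() mapping
pts <- layer_data(p_iris, 1)
head(pts[, c("x", "y")])
```

The x and y columns hold the values of Sepal.Length and Petal.Length, one row per observation in iris.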
We can also add other aesthetic attributes like color, shape and size. These attributes can be included within aes():
# color by species
ggplot(data = iris, mapping = aes(x = Sepal.Length, y = Petal.Length, color = Species)) + geom_point()
# color and shape by species
ggplot(data = iris, mapping = aes(x = Sepal.Length, y = Petal.Length, color = Species, shape = Species)) + geom_point()
Note that the aesthetic arguments can also be included in the “geom” layer:
ggplot(data = iris, mapping = aes(x = Sepal.Length, y = Petal.Length)) + geom_point(aes(color = Species, shape = Species))
We can also include a fixed aesthetic value:
ggplot(data = iris, mapping = aes(x = Sepal.Length, y = Petal.Length)) + geom_point(color = "red2")
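The difference between a fixed value and a mapped one is easy to miss. The following sketch (object names are illustrative) compares a color set outside aes() with the same string placed inside aes(), and uses `layer_data()` to see which colors the points actually receive.

```r
library(ggplot2)

base_plot <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length))

# Fixed value: set outside aes(), every point gets the literal color
fixed <- base_plot + geom_point(color = "red2")

# Mapped value: inside aes(), "red2" is treated as a one-level variable,
# so the color scale picks its own color (and a legend is drawn)
mapped <- base_plot + geom_point(aes(color = "red2"))

fixed_cols <- unique(layer_data(fixed)$colour)
mapped_cols <- unique(layer_data(mapped)$colour)
```

Here `fixed_cols` is the color we asked for, while `mapped_cols` is whatever the default discrete color scale assigns to the single level.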
Some attributes work better with some data types: color and shape suit categorical variables, while size suits continuous variables.
Exercise 1
Using the “hylaeformis” data set:
#read from website
hylaeformis_data <- read.csv("https://marceloarayasalas.weebly.com/uploads/2/5/5/2/25524573/hylaeformis_data.csv", stringsAsFactors = FALSE)
# or download manually and read from local file
hylaeformis_data <- read.csv("hylaeformis_data.csv", stringsAsFactors = FALSE)
head(hylaeformis_data, 20)
1.1 Create a scatter plot of “duration” vs “meanfreq” (mean frequency)
1.2 Add an aesthetic attribute to show a different color for each locality
1.3 Add another aesthetic attribute to show “dfrange” (dominant frequency range) as the shape size
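Plots can be split into panels, one for each level of a variable, with facet_wrap() or facet_grid():
ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point() +
  facet_wrap(~Species)
# or
ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point() +
  facet_grid(~Species)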
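With a single variable the two functions look alike. One difference worth sketching is that facet_grid() takes a `rows ~ columns` formula, so the same variable can be laid out in either direction (object names below are illustrative).

```r
library(ggplot2)

p_base <- ggplot(iris, aes(Sepal.Length, Petal.Length)) + geom_point()

# Species in columns (same layout as facet_grid(~Species))
p_cols <- p_base + facet_grid(. ~ Species)

# Species in rows
p_rows <- p_base + facet_grid(Species ~ .)

# Either way there is one panel per species
n_panels <- length(unique(layer_data(p_rows)$PANEL))
```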
The scales can be fixed or free for the x and y axes, and the number of columns and rows can be modified:
# free x
ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point() +
  facet_wrap(~Species, scales = "free_x")
# free y and 3 rows
ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point() +
  facet_wrap(~Species, scales = "free_y", nrow = 3)
# both free and 2 rows
ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point() +
  facet_wrap(~Species, scales = "free", nrow = 2)
Note that we can also save the basic component as an R object and add other components later in the code:
p <- ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point()
p + facet_wrap(~Species, scales = "free_x", nrow = 3)
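A small sketch of why this is convenient: adding a component with “+” returns a new plot and leaves the saved object untouched, so the same base can be reused many times. The names `p_lm` and `p_facets` are illustrative.

```r
library(ggplot2)

p <- ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point()

# Adding components returns a new plot; p itself is not modified
p_lm <- p + geom_smooth(method = "lm", formula = y ~ x)
p_facets <- p_lm + facet_wrap(~Species)

length(p$layers)     # still only the point layer
length(p_lm$layers)  # points plus the regression line
```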
geom_smooth() - adds best fit lines (including CI)
geom_boxplot() - boxplots
geom_histogram() & geom_freqpoly() - frequency distributions
geom_bar() - frequency distribution of categorical variables
geom_path() & geom_line() - add lines to scatter plots
Best fit regression lines can be added with geom_smooth():
# smoother and CI
ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_wrap(~Species, scales = "free", nrow = 3)
# without CI
ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~Species, scales = "free", nrow = 3)
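Facets are not the only way to get one line per group. As a sketch, mapping Species to color makes geom_smooth() fit a separate line for each species in a single panel; the same per-species fits can be computed directly with lm() for comparison. The names `p_groups`, `fits` and `slopes` are illustrative.

```r
library(ggplot2)

# One regression line per species, drawn in a single panel
p_groups <- ggplot(iris, aes(Sepal.Length, Petal.Length, color = Species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# The same per-species fits computed directly with lm()
fits <- lapply(split(iris, iris$Species), function(d) {
  coef(lm(Petal.Length ~ Sepal.Length, data = d))
})

# Slope of Petal.Length on Sepal.Length for each species
slopes <- sapply(fits, function(cf) cf[["Sepal.Length"]])
slopes
```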
If the relation between X and Y is noisy, a smoothed line can help to detect patterns:
# smoother and CI
ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point() +
  geom_smooth() +
  facet_wrap(~Species, scales = "free", nrow = 3)
## `geom_smooth()` using method = 'loess'
# without CI
ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  facet_wrap(~Species, scales = "free", nrow = 3)
## `geom_smooth()` using method = 'loess'
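How closely the loess curve follows the points is controlled by the `span` argument of geom_smooth(). The sketch below contrasts a small and a large span on the unfaceted data; the two span values are arbitrary choices for illustration.

```r
library(ggplot2)

p_pts <- ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point()

# Small span: the curve follows local changes more closely
p_wiggly <- p_pts + geom_smooth(method = "loess", formula = y ~ x, span = 0.3)

# Large span: a smoother, stiffer curve
p_stiff <- p_pts + geom_smooth(method = "loess", formula = y ~ x, span = 1)
```

Stating `method` and `formula` explicitly also silences the message about the method being chosen automatically.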
Exercise 2
Using the “msleep” example data set:
2.1 Create a scatter plot of “bodywt” (body weight) vs “brainwt” (brain weight)
2.2 Add “order” as a color aesthetic
2.3 Add a “facet” component to split plots by order using free scales
2.4 Remove the orders with fewer than 4 species in the data set and make a plot similar to 2.3
2.5 Add a smooth line to each plot in the panel
Again, it only takes a new “geom” component to create a boxplot:
ggplot(iris, aes(Species, Petal.Length)) + geom_boxplot()
Violin plots are an interesting alternative:
ggplot(iris, aes(Species, Petal.Length)) + geom_violin()
The same goes for histograms and frequency polygons:
ggplot(iris, aes(Petal.Length)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(iris, aes(Petal.Length)) + geom_freqpoly()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(iris, aes(Petal.Length)) + geom_histogram() + geom_freqpoly()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
We can control the width of the bars:
ggplot(iris, aes(Petal.Length)) +
  geom_histogram(binwidth = 1, fill = adjustcolor("red2", alpha.f = 0.3))
And compare the distribution of different groups within the same histogram:
ggplot(iris, aes(Petal.Length, fill = Species)) + geom_histogram(binwidth = 0.4)
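By default the bars of the different groups are stacked, which can make the individual distributions hard to read. A sketch of the alternatives, using the `position` argument of geom_histogram() (object names are illustrative):

```r
library(ggplot2)

h <- ggplot(iris, aes(Petal.Length, fill = Species))

# Default: bars of different groups are stacked on top of each other
h_stack <- h + geom_histogram(binwidth = 0.4)

# Overlaid: each group starts at zero; transparency keeps all groups visible
h_overlay <- h + geom_histogram(binwidth = 0.4, position = "identity", alpha = 0.5)

# Side by side within each bin
h_dodge <- h + geom_histogram(binwidth = 0.4, position = "dodge")
```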
Show the distribution of discrete (categorical) variables:
tab <- table(msleep$order)
df <- as.data.frame(table(msleep$order[msleep$order %in% names(tab)[tab > 3]]))
ggplot(df, aes(Var1, Freq)) + geom_bar(stat = "identity")
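The table does not have to be computed by hand. As a sketch, geom_bar() counts the rows in each category when given the raw data, and geom_col() is a shortcut for geom_bar(stat = "identity") when the counts already exist. The example uses all orders in msleep rather than the filtered subset; `order_df` is an illustrative name.

```r
library(ggplot2)

# geom_bar() counts the rows in each category by default
p_count <- ggplot(msleep, aes(order)) +
  geom_bar()

# For a pre-computed table, geom_col() uses the values as bar heights
order_df <- as.data.frame(table(msleep$order))
p_col <- ggplot(order_df, aes(Var1, Freq)) +
  geom_col()

# Both approaches give the same bar heights
counts_bar <- layer_data(p_count)$count
counts_col <- layer_data(p_col)$y
```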
Besides the basic functions (i.e. components) described above, ggplot2 has many other tools (both arguments and additional functions) to further customize plots. Pretty much everything can be modified. Here we look at some of the most common tools.
ggplot2 comes with some default themes that can be easily applied to modify the look of our plots:
p <- ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point()
p + theme_classic()
p + theme_bw()
p + theme_minimal()
Most themes differ in the use of grids, border lines and axis labeling patterns.
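Individual elements of a theme can also be changed with theme(). The following is a minimal sketch that starts from theme_bw() and overrides two elements; the particular choices (font size, legend position) are arbitrary and only for illustration.

```r
library(ggplot2)

p_col <- ggplot(iris, aes(Sepal.Length, Petal.Length, color = Species)) +
  geom_point()

# Start from a complete theme, then override single elements with theme()
p_custom <- p_col +
  theme_bw(base_size = 14) +
  theme(
    panel.grid.minor = element_blank(),  # drop minor grid lines
    legend.position = "bottom"           # move the legend below the plot
  )
```

The call to theme() should come after the complete theme, since a complete theme added later replaces all earlier settings.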
Axis limits can be modified as follows:
ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point() +
  xlim(c(0, 10))
ggplot(iris, aes(Sepal.Length, Petal.Length, col = Species)) +
  geom_point() +
  xlim(c(0, 10)) +
  ylim(c(0, 9))
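One caveat when narrowing the limits rather than widening them: xlim() and ylim() discard the observations that fall outside the range, whereas coord_cartesian() only zooms in. The sketch below uses an arbitrary range of 5 to 6 to show the difference.

```r
library(ggplot2)

p <- ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point()

# xlim() sets scale limits: points outside 5-6 are dropped (with a warning)
p_limits <- p + xlim(c(5, 6))

# coord_cartesian() only zooms the view: all points are kept
p_zoom <- p + coord_cartesian(xlim = c(5, 6))

# Number of observations with a usable x value in each plot
kept_limits <- sum(!is.na(layer_data(p_limits)$x))
kept_zoom <- sum(!is.na(layer_data(p_zoom)$x))
```

This matters most when the plot includes statistical summaries such as smoothers or boxplots, because with xlim() they are computed only on the observations that remain.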
Axes can also be transformed:
ggplot(iris, aes(Sepal.Length, Petal.Length, col = Species)) +
  geom_point() +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log2")
or reversed:
ggplot(iris, aes(Sepal.Length, Petal.Length, col = Species)) +
  geom_point() +
  scale_y_reverse()
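For the common base-10 case, ggplot2 also provides the shorthand scales scale_x_log10() and scale_y_log10(). A brief sketch, which also checks that the values the layer plots are the log-transformed ones:

```r
library(ggplot2)

p_log <- ggplot(iris, aes(Sepal.Length, Petal.Length, col = Species)) +
  geom_point() +
  scale_x_log10() +  # base-10 log scale on x
  scale_y_log10()    # base-10 log scale on y

# The layer data holds the transformed values
x_plotted <- layer_data(p_log)$x
```

The axis labels still show the original units, only the spacing changes.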
ggplots can be exported as image files using the ggsave function:
ggplot(data = msleep[msleep$order %in% names(tab)[tab > 5], ], mapping = aes(x = bodywt, y = brainwt)) +
  geom_point() +
  facet_wrap(~order, scales = "free")
## Warning: Removed 21 rows containing missing values (geom_point).
# Export
ggsave("plot.png", width = 5, height = 5)
## Warning: Removed 21 rows containing missing values (geom_point).
The image file type will be identified by the extension in the file name.
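By default ggsave() saves the last plot displayed. A plot stored in an object can be saved explicitly with the `plot` argument, as in this sketch; the file name and dimensions are arbitrary, and a temporary directory is used so that nothing is written to the working directory.

```r
library(ggplot2)

p <- ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point()

# Write to a temporary folder (the file name is only an example)
out_file <- file.path(tempdir(), "iris_scatter.pdf")

# plot = p saves that object; the .pdf extension selects the file type
ggsave(out_file, plot = p, width = 12, height = 10, units = "cm")

file.exists(out_file)
```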
Additional axis customizing:
# Log2 scaling of the y axis (with visually-equal spacing)
require(scales)
## Loading required package: scales
p + scale_y_continuous(trans = log2_trans())
# show exponents
p + scale_y_continuous(trans = log2_trans(),
                       breaks = trans_breaks("log2", function(x) 2^x),
                       labels = trans_format("log2", math_format(2^.x)))
# Percent
p + scale_y_continuous(labels = percent)
# dollar
p + scale_y_continuous(labels = dollar)
# scientific
p + scale_y_continuous(labels = scientific)
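The label helpers from the scales package are ordinary functions that turn numbers into character strings, so they can be tried on a plain vector before being used in a plot. A short sketch with made-up values:

```r
library(scales)

# Label helpers take numbers and return the text shown on the axis
pct <- percent(c(0.05, 0.25, 1))    # proportions shown as percentages
usd <- dollar(c(5, 1500))           # currency formatting
sci <- scientific(c(0.001, 123456)) # scientific notation

pct
usd
sci
```

Note that percent() multiplies by 100, so it is meant for proportions; applied to raw measurements such as Petal.Length it would label a value of 1.4 as 140%.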
### Add "tick marks" ###
# Load libraries
library(MASS)
data(Animals)
# x and y axis are transformed and formatted
p2 <- ggplot(Animals, aes(x = body, y = brain)) +
  geom_point(size = 4) +
  scale_x_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x))) +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x))) +
  theme_bw()
# log-log plot without log tick marks
p2
# Show log tick marks
p2 + annotation_logticks()
# Log ticks on left and right
p2 + annotation_logticks(sides = "lr")
# All sides
p2 + annotation_logticks(sides = "trbl")
Many other types of graphs can be generated. Here I show a single example of cool contour and “heatmap” graphs:
head(faithful)
##   eruptions waiting
## 1      3.60      79
## 2      1.80      54
## 3      3.33      74
## 4      2.28      62
## 5      4.53      85
## 6      2.88      55
# note: from ggplot2 3.3.0 on, after_stat(level) replaces the older ..level.. notation
ggplot(faithfuld, aes(eruptions, waiting)) + geom_contour(aes(z = density, colour = ..level..))
ggplot(faithfuld, aes(eruptions, waiting)) + geom_raster(aes(fill = density))
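Note that the two plots above use faithfuld, a grid of pre-computed density values that ships with ggplot2, rather than the raw faithful observations. When only raw observations are available, a similar contour plot can be sketched with geom_density_2d(), which estimates the density itself. The transparency value below is an arbitrary choice.

```r
library(ggplot2)

# faithfuld: grid of eruptions x waiting with a pre-computed density column
str(faithfuld)

# From the raw observations, geom_density_2d() estimates the 2D density
p_raw <- ggplot(faithful, aes(eruptions, waiting)) +
  geom_point(alpha = 0.4) +
  geom_density_2d()
```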
Check the CRAN Graphics Task View for a more comprehensive list of graphical tools in R.
Session information
## R version 3.4.3 (2017-11-30)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/openblas-base/libblas.so.3
## LAPACK: /usr/lib/libopenblasp-r0.2.18.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] scales_0.5.0 MASS_7.3-49 knitr_1.20
## [4] RColorBrewer_1.1-2 ggplot2_2.2.1
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.14 magrittr_1.5 munsell_0.4.3 colorspace_1.3-2
## [5] rlang_0.1.6 stringr_1.2.0 plyr_1.8.4 tools_3.4.3
## [9] grid_3.4.3 gtable_0.2.0 htmltools_0.3.6 yaml_2.1.16
## [13] lazyeval_0.2.1 rprojroot_1.3-2 digest_0.6.13 tibble_1.4.1
## [17] reshape2_1.4.3 evaluate_0.10.1 rmarkdown_1.8 labeling_0.3
## [21] stringi_1.1.6 compiler_3.4.3 pillar_1.0.1 backports_1.1.2