In the last class we looked at R, from basic concepts all the way to an introduction to data wrangling. The material for the current class focuses on communicating results using graphical methods. For this, we will use the ggplot2 package. ggplot2 is the most popular and robust package for data visualization. There are plenty of resources on this package, but a good start is the package website and a very useful cheat sheet. Much of the popularity of ggplot2 comes from its quality and the numerous types of plots that it can render.
How is ggplot2 structured? This package was inspired by the book The Grammar of Graphics, by Leland Wilkinson. Specifically, the strength of ggplot2 comes from building a graphic by breaking the plot down into different parts, which gives the end-user the flexibility to build the graphic to their desired liking and goal.
The most important components of ggplot2 graphs are:
In the next sections, we will explore these different components in more detail. Each section will add one more component to our base graph. It is important to mention that we will not cover all the different types of plots that you can make with ggplot2, first because there are many different types, and second because extensions for new types of graphics are created frequently. This lesson should “arm” you with an excellent introduction so that you can explore other graphical tools available in this package and in R in general.
First, let’s start with the data that we used last class.
data_demo <- readr::read_csv("data/intro_r/data_demo.csv")
## Rows: 64 Columns: 9
## ── Column specification ───────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): trt, var
## dbl (7): plot, blk, sev, inc, yld, don, fdk
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
data_demo
## # A tibble: 64 × 9
## plot trt var blk sev inc yld don fdk
## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 107 A R 1 1.95 35 86.6 0.11 4
## 2 109 A R 1 1.2 20 80.3 0.15 4
## 3 204 A R 2 0.9 20 81.1 0 42
## 4 214 A R 2 4.05 50 85.1 0.07 25
## 5 305 A R 3 1.1 15 84.8 0 29
## 6 316 A R 3 1.8 10 93.7 0 24
## 7 406 A R 4 1.4 25 84.5 0.06 37
## 8 412 A R 4 0.45 10 84.8 0.07 42
## 9 108 A S 1 13.3 55 92.3 0.42 NA
## 10 110 A S 1 12.3 50 103. 0.26 NA
## # … with 54 more rows
Every ggplot2 plot starts with the function ggplot().
library(tidyverse) # load ggplot2 and the other tidyverse packages
ggplot()
Note: when you do not tell ggplot2 which data you want to plot, the function ggplot() will only draw a gray panel.
This is where data and aesthetics enter the discussion.
ggplot(data = data_demo, aes(x = trt, y = yld)) # aes is short for aesthetics
So far, we have only told ggplot2 that our graphic will have the treatments on the x-axis and the variable yield on the y-axis. However, there are plenty of plots that could be created with one qualitative and one quantitative variable, for example, boxplots or line plots.
At this point, the next step is to tell R exactly what type of plot we want to create. This is where the geometric elements come in.
ggplot2 native geom_*()
In this step, we indicate to R the specific type of graphic we are going to create, for example, lines, points, histograms, etc.
Below is an example plotting the points (or raw data) using geom_point().
ggplot(data = data_demo, aes(x = trt, y = yld)) + # Note: we use the "+" sign to connect lines of code
geom_point()
Because many of the points overlap, a better option for this plot is geom_jitter(), which adds a small amount of random variation to the location of each observation.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_jitter()
What occurred is that each observation was displaced horizontally by a different amount of random variation, with the goal of avoiding overlapping observations. Nonetheless, given the amount of variation, the observations appear too dispersed, making it difficult to clearly see where one treatment ends and the next one starts. We can reduce the amount of dispersion by using the argument width inside geom_jitter() (see below). (Note: all geom_* functions have additional arguments, which can be added depending on the need.)
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_jitter(width = .2) # the default value is 0.4, we reduce the dispersion by half
While points provide a good idea of the dispersion in a dataset, they are not the only way to represent the data. We will illustrate in the next set of code some of the different ways to explore the data:
Boxplot (geom_boxplot())
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()
Violin plot (geom_violin())
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_violin()
Dot plot (geom_dotplot())
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_dotplot(binaxis = "y", stackdir = "center")
## Bin width defaults to 1/30 of the range of the data. Pick better value with
## `binwidth`.
Another way to think about geom_* functions is to consider them as layers. We can stack different layers to improve the quality of the plot, provide more information about the data and its dispersion, and start to compare different factors of interest.
# Boxplot and points
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(width = .2, size = 4)
While there is no limit to the number of layers (geom_*) that you can use in a plot, remember that they are drawn in the order in which they appear in the code, so later layers can cover earlier ones and make clear comparisons more difficult.
# Multiple layers
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(width = .2, size = 4)+
geom_violin() # Note that the violin plot is the last layer to be added, therefore it is drawn on top of the other layers
One way to keep one layer from hiding the others is to change the order of the layers.
# Multiple layers
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_violin()+
geom_boxplot(width = 0.3)+ # using the argument width we can adjust the box size
geom_jitter(width = .2, size = 4)
There are plenty of other native geom_* functions in ggplot2. Below you can see other examples extracted from the ggplot2 cheatsheet from RStudio (Download eng or Download sp).
There are also extension packages that expand the geometric elements and other functions from ggplot2. You can check a gallery of different extensions following this link, and a few examples below.
The next component in ggplot2 plots is aesthetics, which define how our observations will be mapped in the plot. Aesthetics can take multiple forms. So far, we have used aesthetics for the positions x and y inside aes(). However, there are many other aesthetics, and the most common are:
These three aesthetics have some similarities. color refers to the color of points or lines, fill refers to the color of an enclosed space (box, polygon, circle, etc.), and alpha refers to the color transparency.
In the plot below we use the color aesthetic to differentiate the treatments.
ggplot(data = data_demo, aes(x = trt, y = yld, color = trt)) +
geom_boxplot()+
geom_jitter(width = 0.2, size = 4)
If we want to use color only for the points, we remove the color argument from the ggplot() function and write it inside geom_jitter().
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(color = trt), width = 0.2, size = 4) # different colors for each observation/point depending on the treatment
Or, we can use different fill colors in the respective boxplots
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(aes(fill = trt))+
geom_jitter(width = 0.2, size = 4)
Lastly, here is an example using alpha to map the transparency to the yield.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(alpha = yld), # Here we are using alpha to show a yield gradient
width = 0.2, size = 4) # increase the point size to see the differences better
If we use these arguments (color, fill, and alpha) outside of the aesthetic function, they change the whole element.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_violin(color = "red")+ # border color equal to red
geom_boxplot(width = 0.3, fill = "green")+ # fill equal to green
geom_jitter(alpha = 0.4, # 60% of transparency
width = 0.2, size = 4)
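A common slip related to the point above is placing a constant inside aes(). The sketch below contrasts the two cases using a small made-up data frame (toy_points is hypothetical and not part of the course data), so treat it as an illustration rather than part of the lesson's workflow.

```r
library(ggplot2)

# Hypothetical toy data, used only to keep the example self-contained
toy_points <- data.frame(
  trt = c("A", "A", "B", "B"),
  yld = c(85, 90, 104, 110)
)

# Constant outside aes(): every point is drawn in red and no legend is created
p_constant <- ggplot(toy_points, aes(x = trt, y = yld)) +
  geom_point(color = "red", size = 4)

# Constant inside aes(): "red" is treated as a one-level variable, so ggplot2
# picks a color from its default palette and adds a legend
p_mapped <- ggplot(toy_points, aes(x = trt, y = yld)) +
  geom_point(aes(color = "red"), size = 4)

p_constant
p_mapped
```

In short, values that come from a column go inside aes(), while fixed values go outside.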
The size aesthetic allows us to change the element size.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(size = sev), #The size of the point changes depending on the severity value
width = 0.2)
There are several options for the shape type in ggplot2, with the most common presented below.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(shape = trt),
width = 0.2, size = 4)
Note that shapes from 21 to 25 have color and fill, while the others only define the color.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(color = trt, # point border color
fill = var), # point fill defined by the variety
shape = 21, # we can change fill and color on shape 21 to 25
stroke = 1.5, # to make point border thicker
width = 0.2, size = 4)
# when we talk about scales we will learn how to choose the colors
# so far we are using default ggplot options
We can also have different line types for each level of a variable. In the example below, we use two different line types to separate the varieties by their level of genetic resistance, where R is resistant and S is susceptible.
# We use pipes to summarize the data by treatment and variety
data_line <- data_demo %>%
  group_by(trt, var) %>%
  summarise(sev = mean(sev))
## `summarise()` has grouped output by 'trt'. You can override using the `.groups`
## argument.
data_line
## # A tibble: 8 × 3
## # Groups: trt [4]
## trt var sev
## <chr> <chr> <dbl>
## 1 A R 1.61
## 2 A S 6.28
## 3 B R 1.7
## 4 B S 1.58
## 5 C R 0.475
## 6 C S 0.669
## 7 D R 0.431
## 8 D S 0.444
# Then we create a plot with different line types
ggplot(data_line) +
geom_line(aes(x = trt,
y = sev,
linetype = var, # each variety (R or S) will have a different line type
group = var), # group is telling R that it should group the data by variety
size = 1) # line thickness
So far we have built our plot using the default options; now we will start to customize it using scales. Scales control how the aesthetics are mapped; therefore, each aesthetic has a respective scale function. Let’s start by looking at the x and y aesthetics.
scale_*_continuous
For the x-axis, we can use scale_x_continuous. For the y-axis, we can use scale_y_continuous.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4, show.legend = FALSE)+
scale_y_continuous( # because our y-axis variable is continuous
name = "Title of our axis", # name of the axis
limits = c(70, 130), # here we widen the axis limits a little
breaks = c(seq(70,130,10))) # axis breaks, every 10 units from 70 to 130
We can add a second axis.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4, show.legend = FALSE)+
scale_y_continuous(name = "Yield (bu/ac)",
limits = c(70, 130),
breaks = c(seq(70,130,10)),
# Here we add a second axis
sec.axis = sec_axis( # if we only want to duplicate an axis, we can use sec.axis = dup_axis())
trans = ~ . *0.0672, # we use a transformation to change yield from bu/ac to ton/ha
name = "Yield (ton/ha)", # secondary axis name
breaks = c(seq(4.5,9,0.5)))) # secondary axis breaks
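To see why the secondary breaks were chosen between 4.5 and 9, it can help to apply the same conversion factor to the primary axis limits. The snippet below is only a small arithmetic check; the helper name bu_ac_to_ton_ha is made up for this illustration, while the factor 0.0672 and the limits come from the plot above.

```r
# Conversion factor taken from the sec_axis() call above (bu/ac to ton/ha)
bu_ac_to_ton_ha <- function(yield_bu_ac) yield_bu_ac * 0.0672

# Limits of the primary y-axis, in bu/ac
primary_limits <- c(70, 130)

# The same limits expressed in ton/ha
secondary_limits <- bu_ac_to_ton_ha(primary_limits)
secondary_limits  # 4.704 8.736

# Breaks requested for the secondary axis in the plot above
secondary_breaks <- seq(4.5, 9, 0.5)

# Only the breaks that fall inside the converted limits can be drawn
secondary_breaks[secondary_breaks >= secondary_limits[1] &
                   secondary_breaks <= secondary_limits[2]]
```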
NOTE: the second axis works for discrete axis as well (see next subsection)
scale_*_discrete
We will use the x-axis to illustrate. With scale_*_discrete you can do things like rearrange or select/exclude levels.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4, show.legend = FALSE)+
scale_x_discrete(
limits = c("B", "A", "D")) # here we change the order of treatments and exclude the treatment C
## Warning: Removed 16 rows containing missing values (`stat_boxplot()`).
## Warning: Removed 16 rows containing missing values (`geom_point()`).
NOTE: R gives a warning message informing us that there are missing values. This is because the selection made with the argument limits transformed all values of trt = C into missing values (NAs).
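If the warning is unwanted, one possible alternative (not the approach used in this lesson) is to drop the level and set the order in the data before plotting. The sketch below assumes a small made-up data frame, toy_trt, standing in for data_demo; the yield values are invented.

```r
library(ggplot2)
library(dplyr)

# Hypothetical toy data standing in for data_demo
toy_trt <- data.frame(
  trt = rep(c("A", "B", "C", "D"), each = 3),
  yld = c(85, 88, 90, 101, 104, 106, 105, 108, 110, 100, 103, 107)
)

# Remove treatment C and define the display order through the factor levels
toy_sub <- toy_trt %>%
  filter(trt != "C") %>%
  mutate(trt = factor(trt, levels = c("B", "A", "D")))

# No rows are converted to NA here, so no "Removed rows" warning is expected
p_sub <- ggplot(toy_sub, aes(x = trt, y = yld)) +
  geom_boxplot()
p_sub
```

The trade-off is that the filtering now lives in the data step rather than in the scale, so it has to be repeated for every data set that feeds the plot.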
We can also change the labels using the labels argument.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4, show.legend = FALSE)+
scale_x_discrete(
labels = c('B' = 'New Label', # provide ggplot2 with a new label for each level we want to rename
'D' = 'New Label \n in two lines')) # you can use the "\n" to break the sentence in more lines
color and fill are two aesthetics that are very similar. To avoid redundancy, we will use examples based on fill only, but the same principles apply to the color aesthetic. To change colors in R, we can use built-in names or the RGB code. There are 657 built-in color names in R. You can see their names by using the function colors(), and the figure below provides a few examples.
The RGB code is the additive combination of Red, Green, and Blue written in hexadecimal format: six characters (two per channel), each of which can take 16 possible values (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F). Two extra characters can be appended to define the color transparency.
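The relationship between the red, green, and blue values and the hexadecimal code described above can be checked with base R functions. This is a short sketch using rgb(), col2rgb(), and colors() from grDevices; the specific color values are just examples.

```r
# Build a hex code from red, green, and blue values on the 0-255 scale
hex_opaque <- rgb(159, 59, 108, maxColorValue = 255)
hex_opaque  # "#9F3B6C"

# A fourth value sets the transparency (128 out of 255 is about 50%)
hex_alpha <- rgb(159, 59, 108, 128, maxColorValue = 255)
hex_alpha  # "#9F3B6C80"

# Going the other way: decompose a hex code or a built-in name into its channels
col2rgb("#9f3b6c")
col2rgb("gray67")

# Number of built-in color names
length(colors())
```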
scale_fill_manual()
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(fill = trt),
shape = 21, width = 0.2, size = 4)+
scale_fill_manual(values = c("blue", # color name
"gray67", # color name, but from a different gray intensity
"#9f3b6c", # use RGB code to assign a color
"#9f3b6c80")) # same color as trt C, but with transparency
Although selecting a color for a plot seems trivial, it is actually very challenging. There are many things to consider when we choose a color scheme (palette). We need to consider whether the palette is readable for color-blind people and whether it follows good practices for data visualization. While we will not specifically discuss these topics during the course, we highly recommend that you look for more information on them.
There are plenty of resources for choosing colors, of which one of the most popular is the website colorbrewer2.org, conceptualized by Dr. Cynthia A. Brewer. This website offers palette suggestions for sequential, diverging, and qualitative data, as well as useful information regarding color-blind and print-friendly options.
These palettes became so popular that they were turned into a package and are now available natively in ggplot2 through the functions scale_color_brewer() and scale_fill_brewer(). An example using the palette from the figure above can be seen in the next plot.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(fill = trt),
shape = 21, width = 0.2, size = 4)+
scale_fill_brewer(palette = "Paired") # This scale automatically identifies the number of classes and uses the chosen palette to fill the colors
Interestingly, there are also developers who have created packages for color palettes. One example is the package ggsci, whose palettes are based on colors common to a select group of scientific journals.
library(ggsci)
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(fill = trt),
shape = 21, width = 0.2, size = 4)+
scale_fill_npg() # This scale automatically identifies the number of classes and uses the chosen palette to fill the colors
scale_fill_gradient*()
If our variable is quantitative, we can use a color fill based on a gradient. There are three different gradient scales: scale_fill_gradient(), scale_fill_gradient2(), and scale_fill_gradientn().
scale_fill_gradient(): creates a gradient from two colors (low and high)
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(fill = fdk),
shape = 21, width = 0.2, size = 4)+
scale_fill_gradient(low = "green",
high = "red",
na.value = "blue")
scale_fill_gradient2(): creates a gradient from three colors (low, middle, and high)
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(fill = fdk),
shape = 21, width = 0.2, size = 4)+
scale_fill_gradient2(low = "blue", # This is the low value
mid = "yellow", # middle value
high = "red", # high value
midpoint = 21, # set the middle point, otherwise it will use the default 0
na.value = "black") # we can also change the color of NA values
# Note: ggplot2 distributes the colors symmetrically around the middle point, so, depending on where the middle point is, it is possible that the low or high color may not be reached.
scale_fill_gradientn(): creates a gradient from n colors
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(fill = fdk),
shape = 21, width = 0.2, size = 4)+
scale_fill_gradientn(colours = c("#FF0000", "#FFFF00", "#00FF00",
"#00FFFF", "#0000FF", "#FF00FF"))
Alpha and size are two scales that are generally used for quantitative variables.
scale_alpha_*
scale_alpha_continuous(): plots the variable in a range of transparency. By default, this range is from 0.1 to 1.0.
# Without specifying the scale, R will assume the default values
# therefore, here we would have the same results as if we have
# scale_alpha_continuous(range = c(0.1,1))
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(alpha = fdk),
width = 0.2, size = 4)
We can change the range by using the argument range
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(alpha = fdk),
width = 0.2, size = 4)+
scale_alpha_continuous(range = c(0.3, 0.6))
Note that there are two observations in treatment A where the transparency did not change. These two observations are points without values for FDK (NAs). There are a few strategies to deal with this problem.
One option is to remove those two points from the whole data set. This, however, results in another problem, since the calculations used to create the boxplots consider those observations.
A better option is to use a different color or shape for those observations, which requires some work in our code. The function scale_alpha_continuous has the argument na.value, which assigns an alpha value to NA observations. We can make NA observations disappear by setting na.value = 0. We then plot those two observations using another layer (geom_jitter), where we can use shape or color to differentiate them. In our example, we use shape.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(alpha = fdk),
width = 0.2, size = 4)+
geom_jitter(data = filter(data_demo, is.na(fdk)), # filter the data such that there are only the two obs with NA values
aes(x = trt, y = yld), # position
shape = 8, # we will define a very different shape for NA
width = 0.2, height = 0, size = 4)+
scale_alpha_continuous(range = c(0.4, 1), # our alphas will change from 0.4 to 1
na.value = 0) # Make NAs values completely transparent
This is an example of how we can use the layer design of ggplot2 and some creativity to overcome a potential problem. Each scale has a way to assign values for NAs, which is very convenient in other scale functions, such as color or shape. However, with scale_alpha_continuous, the default transparency assigned to NAs does not make them disappear, so we need to explicitly define the transparency of the NAs (na.value = 0) and add a new layer only for those two points. This same principle can be applied in other situations, such as wanting to emphasize a particular point in a dataset.
scale_size_*
The size scale works similarly to the alpha scale.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(size = fdk),
width = 0.2)+
scale_size_continuous(range = c(1, 6)) # this is actually the default value
## Warning: Removed 2 rows containing missing values (`geom_point()`).
Note that ggplot2 gives us a warning message indicating that two values were removed. This is different from how scale_alpha_continuous handled the same issue. We emphasize that it is important to understand how ggplot2 deals with NAs. Similar to the previous example, we will add an extra layer to plot these NAs. We will also change the order, making values with lower FDK bigger, by adding the argument trans = "reverse".
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot() +
geom_jitter(aes(size = fdk),
width = 0.2) +
# same as previous example to deal with the NA problem
# warning will continue to show up, but the NAs are now added
geom_jitter(data = filter(data_demo, is.na(fdk)),
aes(x = trt, y = yld),
shape = 8,
width = 0.2, height = 0, size = 4) +
scale_size_continuous(range = c(0.5, 5),
trans = "reverse") # this reverses the size argument, meaning small values have a larger size
## Warning: Removed 2 rows containing missing values (`geom_point()`).
scale_shape_manual
The shape option is very limited in terms of customization.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(shape = var), # We will use the factor, variety, to define the different shapes
width = 0.2, size = 4)+
scale_shape_manual(values = c(8, 15)) # Here we defined two different shapes depending on the variety
scale_linetype_manual
There are a few options for linetype that we can use.
# preparing the data for our plot
line_yld <- data_demo %>%
  group_by(trt, var) %>%
  summarize(yld = mean(yld))
## `summarise()` has grouped output by 'trt'. You can override using the `.groups`
## argument.
line_yld
## # A tibble: 8 × 3
## # Groups: trt [4]
## trt var yld
## <chr> <chr> <dbl>
## 1 A R 85.1
## 2 A S 98.3
## 3 B R 104.
## 4 B S 116.
## 5 C R 107.
## 6 C S 116.
## 7 D R 103.
## 8 D S 115.
ggplot(line_yld) +
geom_line(aes(x = trt,
y = yld,
linetype = var, # each variety (R or S) will have a different line type
group = var),
size = 1)
R has a few options for line types. These can be selected by their name or by their number, and assigned with scale_linetype_manual. For example, the varieties could be drawn with the options “twodash” and “dashed”.
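A minimal sketch of how the “twodash” and “dashed” options could be assigned to the two varieties is shown below. To keep it self-contained, the data frame line_yld_demo is rebuilt by hand from the rounded means printed in the summary above; that name is an assumption made for this example, and the plot may differ in detail from the one originally shown at this point in the lesson.

```r
library(ggplot2)

# Rounded treatment-by-variety yield means, copied from the summary printed above
line_yld_demo <- data.frame(
  trt = rep(c("A", "B", "C", "D"), each = 2),
  var = rep(c("R", "S"), times = 4),
  yld = c(85.1, 98.3, 104, 116, 107, 116, 103, 115)
)

p_linetype <- ggplot(line_yld_demo) +
  geom_line(aes(x = trt,
                y = yld,
                linetype = var,  # one line type per variety
                group = var)) +  # connect the points within each variety
  # Named vector: each level of var is matched to a line type by name
  scale_linetype_manual(values = c("R" = "twodash", "S" = "dashed"))
p_linetype
```

Using a named vector in values avoids depending on the order of the levels.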
Facets allow us to create multiple plots based on a specified variable. In our previous examples, we used aesthetics such as shape to differentiate some of the variables, for example, variety. Now, we will create separate plots for each variety, but within the same graphic. To accomplish this, we use the facet_* functions, of which there are two types in ggplot2: facet_wrap and facet_grid.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
facet_wrap(~var)
What occurred? Our original plot is now split in two, with one panel for the resistant variety (R) and the other for the susceptible variety (S). The configuration of the two plots is the same, as are the scales, meaning that they provide standardized information and can be compared.
One thing to note, though, is that the data used to draw each boxplot is different from when the two varieties were combined. As a result, ggplot2 considered some of our observations to be outliers, when before we did not have this issue. This is a good example illustrating that when we change the data structure, we need to verify the output to check for problems. In this example, we need an option to suppress the outliers in geom_boxplot.
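The reason new outliers can appear after splitting the data is that the 1.5 × IQR rule is applied within each group. The sketch below illustrates this with made-up yield values (yld_R and yld_S are hypothetical) and base R's boxplot.stats(), whose quartile calculation may differ slightly from the one ggplot2 uses.

```r
# Hypothetical yields, made up to illustrate the effect of grouping
yld_R <- c(84, 85, 85, 86, 94)
yld_S <- c(92, 98, 100, 103, 105)

# Pooled data: the box is wide, so no value lies beyond 1.5 * IQR
out_pooled <- boxplot.stats(c(yld_R, yld_S))$out
out_pooled

# Variety R alone: the box is narrow, so the value 94 is now flagged
out_R <- boxplot.stats(yld_R)$out
out_R
```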
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(outlier.alpha = 0)+ # we define the outliers as completely transparent
# Note: By making these outlier observations transparent, they are still considered in the calculations for the boxplots,
# and at the same time, they are not duplicated in the plot by geom_jitter
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
facet_wrap(~var)
Below we can see an example of how to change the y-axis scale and the strip text (text above each plot)
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(outlier.alpha = 0)+ #outliers defined to be completely transparent
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
facet_wrap(~var,
scales = "free_y", # "free" makes both axes independent across panels; "free_y" or "free_x" frees only one dimension
labeller = labeller( # here is an example of how to change the strip labels
var = c("R" = "Resistant", # name of the facet variable, followed by its levels
"S" = "Susceptible"))) # then give a new name for each level
Below, we can see an example of how to change the y-axis scale and output the graph in a stacked format with labels (text above each plot)
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(outlier.alpha = 0)+ # outliers are defined to be completely transparent
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
facet_wrap(~var, labeller = labeller(var = c("R" = "Resistant", "S" = "Susceptible")),
ncol = 1, nrow = 2) # this defines how the graphics are displayed
facet_grid creates a grid of plots, using one variable to define the columns and the other to define the rows. In the example below we split the incidence between “low” (inc \(\leq\) 10%) and “high” (inc \(>\) 10%) using mutate and logical operators.
data_grid <- data_demo %>%
  # create a new variable by splitting the incidence using a 10% threshold
  mutate(inc_cat = if_else(inc <= 10, "Low Incidence", "High Incidence"))
ggplot(data = data_grid, aes(x = trt, y = yld)) +
geom_boxplot(outlier.alpha = 0)+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
facet_grid(inc_cat~var, # inc_cat = row; var=column
labeller = labeller(var = c("R" = "Resistant", "S" = "Susceptible")))
We can use facet_grid with only one variable as well.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(outlier.alpha = 0)+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
facet_grid(.~var) # grid by column
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(outlier.alpha = 0)+ # outliers defined to be completely transparent
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
facet_grid(var~.) # grid by row
Below we have an example of how to set the plot title, subtitle, caption, and tag, as well as the axis and legend titles, using labs()
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(outlier.alpha = 0)+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
labs(
# labels for the plots in general
title = "Here is a title",
subtitle = "Here is the plot subtitle",
caption = "This is figure caption",
tag = "A",
# Labels for the aesthetics
y = "y-axis title",
x = "x-axis title",
fill = "fill title"
)
So far, we built our plots focusing on how to work with and understand the data being plotted by using layers (geom), aesthetics, and facets. Now we will take a quick look at how to change some of the non-data components of the plot. We can make many of these changes using theme().
ggplot2 has complete themes and options to customize them. There are also several packages that provide extra complete themes.
ggplot2 has ten built-in themes that are very convenient to use.
theme_bw: dark-on-light, works well with presentations displayed with a projector.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(outlier.alpha = 0) +
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4) +
scale_fill_npg() +
labs(y = "Yield (bu/ac)", x = "Treatments", fill = "Treatments") +
# only enter with the theme name
theme_bw() # theme black and white
theme_minimal: no background annotations
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(outlier.alpha = 0)+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
labs(y = "Yield (bu/ac)", x = "Treatments", fill = "Treatments")+
# only enter with the theme name
theme_minimal()
theme_classic: classic design, x and y axis lines, but no gridlines
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(outlier.alpha = 0)+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
labs(y = "Yield (bu/ac)", x = "Treatments", fill = "Treatments")+
# only enter with the theme name
theme_classic()
Other complete themes in ggplot2 include: theme_grey() (default); theme_linedraw(); theme_light(); theme_dark(); theme_void(); theme_test()
There are several packages with built-in themes; here we will use only two as examples.
ggthemes package
The ggthemes package has several different complete themes. The example below (theme_economist()) is inspired by plots made by the magazine “The Economist”.
library(ggthemes)
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(outlier.alpha = 0)+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
labs(y = "Yield (bu/ac)", x = "Treatments", fill = "Treatments")+
# only enter with the theme name
theme_economist()
bbplot package
bbplot is a package developed by the BBC team.
# This package is not yet on CRAN, so we need to install it from GitHub
# to do this, we need the package 'devtools'; if you don't have it, make sure that you run both lines
# install.packages('devtools')
# devtools::install_github('bbc/bbplot')
library(bbplot)
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(outlier.alpha = 0)+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
labs(y = "Yield (bu/ac)", x = "Treatments", fill = "Treatments")+
# only enter with the theme name
bbc_style()
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family not
## found in Windows font database
Another option is to not use the built-in themes and customize the plots ourselves. ggplot2 has a lot of options for customization, from the legend position to the tick size. As you can imagine, it would be difficult to cover each of these options, but we will illustrate a few examples.
Let’s start with a plain plot where we modify a few elements, such as the legend, strip text, axis text and title
# This is the default theme
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(outlier.alpha = 0)+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
facet_grid(~var, labeller = labeller(var = c("R" = "Resistant", "S" = "Susceptible")))+
labs(y = "Yield (bu/ac)", x = "Treatments", fill = "Treatments") +
theme(
# theme function is empty, so will plot the default theme
)
# This is the customized theme
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(outlier.alpha = 0)+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
facet_grid(~var, labeller = labeller(var = c("R" = "Resistant", "S" = "Susceptible")))+
labs(y = "Yield (bu/ac)", fill = "Treatments",
x = NULL) + # we suppress the x-axis title, since the treatments are identified in the legend
theme(
# change the background color to white
panel.background = element_blank() ,
# the default color is white, so we have to change to gray to be different from the background
panel.grid.major = element_line(colour = "grey88"),
# the default theme does not have a border
panel.border = element_rect(colour = "grey80", fill = NA),
# axis text to black (was a little gray)
axis.text.y = element_text(colour = "black"),
#suppress the x-axis text, since it will be in the legend
axis.text.x = element_blank(),
# Change titles (y-axis and legend[fill]) to black text, size 12, and bold
title = element_text(colour = "black", size = 12, face = "bold"),
# bring the legend to inside of the plot
legend.position = c(0.90, .20),
# Legend key was gray, change to white
legend.key = element_rect(fill = "white"),
# Legend text attribute
legend.text = element_text(colour = "black", size = 12),
# Strip text justified almost completely to the left - (hjust = 0.01)
# hjust = 0.5 is center justification, and hjust = 1 is right justification
strip.text.x = element_text(hjust =0.01, face = "bold", size = 14),
# strip text background
strip.background = element_blank())
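When the same customizations are needed in several plots, one option is to store the `theme()` call in an object and add it to each plot. The sketch below assumes the built-in `PlantGrowth` dataset as a stand-in for `data_demo`; `my_theme` and `p` are hypothetical names, not objects defined earlier in this lesson.

```r
library(ggplot2)

# Hypothetical reusable theme; the elements mirror a few of those used above
my_theme <- theme(
  panel.background = element_blank(),
  panel.grid.major = element_line(colour = "grey88"),
  panel.border = element_rect(colour = "grey80", fill = NA),
  axis.text = element_text(colour = "black"),
  legend.position = "bottom"
)

# PlantGrowth (built into R) stands in for data_demo here
p <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot(outlier.alpha = 0) +
  geom_jitter(aes(fill = group), shape = 21, width = 0.2, size = 4) +
  my_theme # the stored theme is added like any other layer

p
```

Keeping the theme in one object means a later change, such as a different grid color, only has to be made in one place.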
Frequently, we need to combine multiple plots into a single figure. While many people copy and paste their plots into MS PowerPoint (or similar software) and save the result as a new plot, this approach is inefficient, compromises quality, and may introduce unintentional errors.
A better approach is to combine these plots into a single one in R and save it.
There are a few good packages that can help with this task, including
gridExtra, cowplot, and ggpubr, but we will explore a newer and arguably easier one called patchwork.
First, let’s build some plots to be combined using patchwork. We will
build four plots, considering the effect of treatment on the following
variables: yield, severity, incidence, and FDK.
# plots are not displayed here to save space (they appear in the combined plot)
plot_yield <- ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(outlier.alpha = 0)+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
labs(y = "Yield (bu/ac)", x = NULL)+
theme(
axis.text = element_text(colour = "black"),
title = element_text(colour = "black", size = 12, face = "bold"),
legend.position = "none")
plot_sev <- ggplot(data = data_demo, aes(x = trt, y = sev)) +
geom_boxplot(outlier.alpha = 0)+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
labs(y = "Severity (%)", x = NULL)+
theme(
axis.text = element_text(colour = "black"),
title = element_text(colour = "black", size = 12, face = "bold"),
legend.position = "none")
plot_inc <- ggplot(data = data_demo, aes(x = trt, y = inc)) +
geom_boxplot(outlier.alpha = 0)+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
labs(y = "Incidence (%)", x = NULL)+
theme(
axis.text = element_text(colour = "black"),
title = element_text(colour = "black", size = 12, face = "bold"),
legend.position = "none")
plot_fdk <- ggplot(data = data_demo, aes(x = trt, y = fdk)) +
geom_boxplot(outlier.alpha = 0)+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
labs(y = "FDK (%)", x = NULL)+
theme(
axis.text = element_text(colour = "black"),
title = element_text(colour = "black", size = 12, face = "bold"),
legend.position = "none")
library(patchwork)
# just use the "+" operator to combine the plots side by side
plot_yield + plot_sev
# or the "/" operator to put one on top of the other
plot_yield / plot_sev
# use "()" to create a hierarchical order of combinations
(plot_yield + plot_sev) / # first combine yield and severity, followed by stacking incidence and fdk
(plot_inc + plot_fdk)
## Warning: Removed 2 rows containing non-finite values (`stat_boxplot()`).
## Warning: Removed 2 rows containing missing values (`geom_point()`).
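Besides the operators, patchwork offers functions that control the layout explicitly. The sketch below is only an illustration: it assumes the built-in `PlantGrowth` dataset as a stand-in for `data_demo`, and `p1`, `p2`, `side_by_side`, and `stacked` are hypothetical names that do not appear elsewhere in this lesson.

```r
library(ggplot2)
library(patchwork)

# PlantGrowth (built into R) stands in for data_demo here
p1 <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot()
p2 <- ggplot(PlantGrowth, aes(x = weight)) +
  geom_histogram(bins = 10)

# plot_layout() sets the number of columns and the relative widths
side_by_side <- p1 + p2 + plot_layout(ncol = 2, widths = c(2, 1))

# wrap_plots() combines plots stored in a list,
# which is handy when the plots are created in a loop
stacked <- wrap_plots(list(p1, p2), ncol = 1)

side_by_side
stacked
```

With `widths = c(2, 1)` the first panel takes twice the horizontal space of the second, which can help when one plot has many more categories than the other.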
We can also add tags and change the overall theme of the combined plot.
# use "()" to create a hierarchical order of combinations
(plot_yield + plot_sev) / # first combine yield and severity, followed by stacking incidence and fdk
(plot_inc + plot_fdk) +
# add a tag for each plot; there are ways to customize this
plot_annotation(tag_levels = "A") & # very important, use "&" to apply theme changes to all plots
theme(
axis.text.x = element_text(colour = "red")) # red to make an obvious change
## Warning: Removed 2 rows containing non-finite values (`stat_boxplot()`).
## Warning: Removed 2 rows containing missing values (`geom_point()`).
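The tags themselves can be customized through other arguments of `plot_annotation()`. The following is a minimal sketch, again assuming the built-in `PlantGrowth` dataset as a stand-in for `data_demo`; the plots `p1` and `p2`, the object `tagged`, and the title text are hypothetical and used only for illustration.

```r
library(ggplot2)
library(patchwork)

# PlantGrowth (built into R) stands in for data_demo here
p1 <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot()
p2 <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_jitter(width = 0.2)

tagged <- (p1 + p2) +
  plot_annotation(
    title = "Plant weight by group", # overall title for the combined figure
    tag_levels = "a",                # lowercase letters instead of "A"
    tag_prefix = "(",                # tags are rendered as (a), (b), ...
    tag_suffix = ")"
  ) &
  theme(plot.tag = element_text(face = "bold")) # "&" applies the theme to every panel

tagged
```

Other values for `tag_levels` include `"1"` for numbers and `"I"` for Roman numerals.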
We can use the function ggsave to save a plot.
ggsave(plot = plot_yield, # name of the plot in R
filename = "figures/intro_r/example1.png", # plot will be saved in the folder figures, with name example1, format png
width = 85, height = 80, units = "mm", # plot dimension
dpi = 450 ) # resolution
Below is how our saved plot looks (this is the uploaded version of the saved file).
Note how the “saved plot” differs from the one printed in R. This is because we forced the “saved plot” to have specific dimensions, which may differ from the dimensions of the plotting window on our screen. When this happens, make the appropriate changes if necessary (point size, text size, etc.).
plot_yield
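Point sizes are specified in millimeters and text sizes in points, so they do not shrink with the canvas, and elements that look fine on screen can appear oversized in an 85 x 80 mm file. One possible adjustment is sketched below, assuming the built-in `PlantGrowth` dataset as a stand-in for `data_demo`. The specific sizes, the object `p_small`, and the output path `out_file` (a temporary file) are illustrative assumptions, not recommended values.

```r
library(ggplot2)

# PlantGrowth (built into R) stands in for data_demo here
p_small <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot(outlier.alpha = 0) +
  # smaller points than the size = 4 used on screen
  geom_jitter(shape = 21, fill = "grey70", width = 0.2, size = 2) +
  labs(y = "Weight", x = NULL) +
  theme(
    axis.text = element_text(colour = "black", size = 8),
    title = element_text(colour = "black", size = 9, face = "bold")
  )

# Hypothetical output path; a temporary file avoids touching the project folders
out_file <- file.path(tempdir(), "example_small.png")

ggsave(plot = p_small,
       filename = out_file,
       width = 85, height = 80, units = "mm", # same dimensions as before
       dpi = 450)
```

Checking the saved file itself, rather than the plot window, is the most reliable way to judge whether the sizes are right.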