Overview

This practical guides you through the basics of plotting in R. The first section introduces general graphics concepts in R; the following section shows how to use these concepts to replicate publication-quality graphs.

ggplot

R is a statistical programming language whose utility owes much to CRAN (the Comprehensive R Archive Network). Briefly, CRAN is a network of file servers to which any R user can submit their own tools in the form of R packages. As a result, there are numerous packages for plotting in R. Here we’ll exclusively use ggplot2. ggplot2 is extremely flexible, has a simple interface (relative to other packages), and is widely used because of this.

The obvious first step is installing ggplot2. You can do so with the following code. Open RStudio, make a new script, and copy and paste this code into it. You can execute a line by pressing Ctrl+Enter when your cursor is on it.

install.packages('ggplot2')

Plotting elements & Introduction to R

Now that we have it installed, we can run through the basics of the ggplot interface. First, we’ll load our installed package into memory.

require(ggplot2)
## Loading required package: ggplot2

After running this command, you can call any of the functions bundled inside the ggplot2 package. The examples here use simulated data. You can copy and paste the following code to make the dataset for the next examples.

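# Create some simulated data to plot.
# stringsAsFactors = TRUE stores Component as a factor, matching the output
# below (since R 4.0.0 the default is to keep strings as character).
dataset.wave <- data.frame(
    value = sin(seq(0, 7 * pi, length.out = 50)),
    sample = 1:50,
    Component = "Pressure",
    stringsAsFactors = TRUE)
dataset.wave <- rbind(dataset.wave,
    data.frame(
        value = cos(seq(0, 7 * pi, length.out = 50)),
        sample = 1:50,
        Component = "Particle Motion",
        stringsAsFactors = TRUE))
head(dataset.wave)  # first six rows
str(dataset.wave)   # column types and dimensions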
##       value sample Component
## 1 0.0000000      1  Pressure
## 2 0.4338837      2  Pressure
## 3 0.7818315      3  Pressure
## 4 0.9749279      4  Pressure
## 5 0.9749279      5  Pressure
## 6 0.7818315      6  Pressure
## 'data.frame':    100 obs. of  3 variables:
##  $ value    : num  0 0.434 0.782 0.975 0.975 ...
##  $ sample   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Component: Factor w/ 2 levels "Pressure","Particle Motion": 1 1 1 1 1 1 1 1 1 1 ...

Our example data is held in a data.frame object, which is R’s representation of a table. Each column can hold a different type of data, typically numeric or categorical (categorical data are called factors in R). The functions head, str (i.e. structure) and summary are useful for getting a quick idea of what’s in your data.frame.
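As a small illustration of inspecting column types, the sketch below builds a hypothetical four-row table (the name toy and its values are made up for this example and are not part of the practical’s dataset) and queries it with base R functions only.

```r
# Hypothetical toy table, used only to illustrate inspecting a data.frame
toy <- data.frame(
    value = c(0.1, 0.5, 0.9, 0.4),
    Component = factor(c("Pressure", "Pressure",
                         "Particle Motion", "Particle Motion"))
)

# Class of each column: one numeric column and one factor column
col_classes <- sapply(toy, class)
print(col_classes)

# The distinct categories (levels) of the factor column
component_levels <- levels(toy$Component)
print(component_levels)

# Number of rows in each category
component_counts <- table(toy$Component)
print(component_counts)

# Per-column summary: quartiles for numeric columns, counts for factors
print(summary(toy))
```

The same calls can be run on dataset.wave once it has been created.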

Notice how lines starting with # are greyed out? A hash denotes a comment, meaning whatever you enter after the hash on that line will not be read as code by R. You can use comments to keep useful notes about your R scripts. This is especially important if others have to read your code, but even for your own code, it’s good to get in the habit of annotating it so you can easily come back and use it later.

ggplots are composed of elements arranged in a hierarchical structure. I’ll go through this with some example code below, where we’ll plot a simulated acoustic waveform.

ggplot(data = dataset.wave, aes(x = sample, y = value, color = Component)) + geom_line()

The function ggplot initiates the plot that we are creating, and it’s also where you can define parameters that are applied to the entire plot. When we add the plotting element geom_line (read as line geometry), this element inherits the arguments we provided in ggplot.

Alternatively, we can leave ggplot empty, and introduce these arguments later in the geom_line object. The following code results in the same plot.

ggplot() +
    geom_line(data = dataset.wave,
              aes(x = sample, y = value, color = Component))
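Inheritance matters most once a plot has more than one layer. The sketch below, which uses a hypothetical five-row data frame named toy (an assumption for illustration only), builds the same two-layer plot both ways and uses ggplot2’s layer_data function to compare what the point layer will draw.

```r
library(ggplot2)

# Hypothetical toy data, used only for this illustration
toy <- data.frame(sample = 1:5, value = c(0, 1, 0, -1, 0))

# Data and mapping defined once in ggplot() and inherited by both layers
p_inherit <- ggplot(data = toy, aes(x = sample, y = value)) +
    geom_line() +
    geom_point()

# Same plot, with data and mapping repeated inside each layer
p_explicit <- ggplot() +
    geom_line(data = toy, aes(x = sample, y = value)) +
    geom_point(data = toy, aes(x = sample, y = value))

# Compare the coordinates computed for the second (point) layer of each plot
xy_inherit  <- layer_data(p_inherit, 2)[, c("x", "y")]
xy_explicit <- layer_data(p_explicit, 2)[, c("x", "y")]
same_points <- isTRUE(all.equal(xy_inherit, xy_explicit))
print(same_points)
```

Defining the shared arguments once in ggplot avoids repeating them in every layer, while per-layer arguments remain useful when layers draw from different data.frames.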

While data obviously refers to the data.frame you’re using, the function aes refers to aesthetic mapping. The aes function is where we define how the data is interpreted and uniquely represented by different shapes, colors, line types, etc. Basically, aes automatically styles your plotted data based on groupings defined within your dataset. For instance, aes(color = Gender) in a line plot will result in a unique line being drawn for each gender in your dataset, in addition to these genders being colored differently. Similarly, aes(group = Gender) will split your plot into a set of unique lines for each gender, but it will not color them based on these groupings. x and y refer to how your data is associated with the respective axes, while color will automatically color your plot based on the value that is found in the specified column.

We can also use the color argument outside of the aes function. Inside aes, color assigns data points a color according to their values; outside aes, color simply paints all your lines one specific color.

ggplot() +
    geom_line(data = dataset.wave,
              aes(x = sample, y = value, group = Component),
              color = 'red')

Notice how we replaced color with group within the aes function. Since all data points are now drawn in the same color, we must use the group argument in aes to specify that these data points need to be separated based on their Component value. If we don’t, the following happens…

# No group (or color) mapping in aes: all 100 points are joined into one line
ggplot() +
    geom_line(data = dataset.wave,
              aes(x = sample, y = value),
              color = 'red')
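The effect of color and group can also be checked without looking at the plots. The following sketch assumes a hypothetical six-row data frame named toy and a small helper, count_unique, written for this illustration; it uses ggplot2’s layer_data function to count how many groups and colors each version of the plot ends up with.

```r
library(ggplot2)

# Hypothetical toy data: two components, three samples each
toy <- data.frame(
    sample = rep(1:3, times = 2),
    value = c(0, 1, 0, 1, 0, 1),
    Component = rep(c("A", "B"), each = 3)
)

# color mapped inside aes: grouped and colored by Component
p_color <- ggplot(toy, aes(x = sample, y = value, color = Component)) +
    geom_line()

# group mapped inside aes, constant color set outside aes
p_group <- ggplot(toy, aes(x = sample, y = value, group = Component)) +
    geom_line(color = "red")

# neither mapped: nothing tells ggplot2 to split the data
p_nogroup <- ggplot(toy, aes(x = sample, y = value)) +
    geom_line(color = "red")

# Helper (hypothetical): number of distinct values in a column of the
# data that the first layer of plot p will draw
count_unique <- function(p, column) {
    length(unique(layer_data(p)[[column]]))
}

plots <- list(color = p_color, group = p_group, nogroup = p_nogroup)
n_groups  <- sapply(plots, count_unique, column = "group")
n_colours <- sapply(plots, count_unique, column = "colour")
print(rbind(n_groups, n_colours))
```

The first two plots should each report two groups, and only the first should report two colors. The last plot has a single group, which is why its points are joined into one zigzagging line.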