For our grand finale we’ll be exploring the ggplot2 package to create some visualizations. While ggplot2 is very popular, its best to think of it as the core to a cluster of extensions. In fact, ggplot2 refers to itself as belonging to the tidyverse ecosystem. You don’t need to know too much about that except that at its core, tidyverse is committed to writing R packages with common APIs (ways of communicating with
functions) and a shared philosophy of how things should work. The benefit to you is that typically everything in the ggplot2 ecosystem works pretty much the same way so you don’t have to spend a lot of time learning how each function works.

Each ggplot2 plot starts with creating a ggplot() object. A ggplot() contains data and an aesthetic (aes()) that describes how to use the data. Then you just keep adding layers onto it with other functions to create your desired visualization.

This may sound complicated and foreign to you at first. However, with a little experience (and time spent wrestling wiht other graphics functions and packages that don’t work this way), most people come to see the elegance of using ggplot2.

To get started, lets do a very simple plot. First off we are going to see a third way to get the mtcars data to generate nice labels. This time converting the numeric categories into labeled factors.


mtcars$gear <- factor(mtcars$gear,levels=c(3,4,5),
                      labels=c("3gears","4gears","5gears"))
mtcars$am <- factor(mtcars$am,levels=c(0,1),
                    labels=c("Automatic","Manual"))
mtcars$cyl <- factor(mtcars$cyl,levels=c(4,6,8),
                     labels=c("4cyl","6cyl","8cyl"))
mtcars$vs <- factor(mtcars$vs, levels = c(0,1), labels = c("V-shaped", "Straight"))

View(mtcars)

Then we’ll make our ggplot() object. Lets look at horsepower and mpg first.


> mt_plot <- ggplot(data = mtcars, aes(x=hp, y=mpg))
> mt_plot + geom_point()  # This tells ggplot to create a scatter point plot

A simple scatter point plot
Figure 1 - A simple scatter point plot using **ggplot2**.

We can add more information into the graph. Lets make the size of the points dependedent on the Displacement, the shape of the point dependent on the transmission, and the color depend on the weight.


> mt_plot <- ggplot(data = mtcars, aes(x=hp, y=mpg, size = disp, shape = am, color = wt))
> mt_plot <- geom_point()   # This add geom_point() to the plot
> mt_plot   # This displays the plot

Scatter point plot with more data displayed
Figure 2 - A scatter plot using color, size and shape to visualize more information.

Notice ggplot2 does a lot for you, such as adding legends, and automically scaling the size of the points. You can override those defaults as well.

You can keep adding additional layers, given that they work with your aesthetic. Essentially each geom_x, theme_x or stat_x will refine or change the existing plot. However, not everything will work so it can take some trial and error along with some searching on the web. The next chart doesn’t really seem to add much, but its a good example of adding geoms to a plot.


> mt_plot + geom_density_2d()

> mt_plot + stat_ellipse()

Density contour plot
Figure 3 - Adding density contours to the plot.
Scatter point plot with more data displayed
Figure 4 - Adding confidnece ellipses to teh plot.

Fitting lines to the plot is often helpful.


> mt_plot <- mt_plot + geom_smooth(method = lm, se = FALSE) 
> mt_plot

Scatter point plot with trend lines
Figure 5 - The plot with trendlines added for transmission type.

Once we get the content how we like it we can address more stylistic issues. First lets do labels and titles.


# Notice the indenting to make it easier to read and edit. This is a common style,
# there are others too. The most important thing is to be consistent.
> mt_plot <- mt_plot + scale_colour_gradient("Weight \n 1000lbs") + 
                       scale_shape("Transmission") + 
                       scale_size_continuous("Displacement\ncu.in.") + 
                       scale_x_continuous("Horsepower") + 
                       scale_y_continuous("Miles per Gallon") + 
                       ggtitle("Motor Trend Cars Analysis")
> mt_plot

Legends have been fixed
Figure 6 - The plot with legends improved.

We can also use some themes with ggthemes. If it isn’t installed and loaded do so first. This package has some preset themes that can probably make your chart cleaner and more stylish. Below are some you can try.


> mt_plot + theme_classic()
> mt_plot + theme_tufte()
> mt_plot + theme_clean()
> mt_plot + theme_economist()
> mt_plot + theme_fivethirtyeight()

> mt_plot <- mt_plot + theme_tufte()
> mt_plot

The tufte theme is derived from some classic books and thinking by Edward Tufte. His general philosophy is that your visualizations should have the minimal amount of ink to convey the story your graphic is trying tell. Anything else should be eliminated. As you can see below, a lot of what we might normally see in a graph is eliminated.

Now that we have the plot we want we can save it.


ggsave("my_final_plot.png")

The final plot
Figure 7 - The final plot with the tufte theme.

Try It Out

Explore creating other graphics in ggplot2. Perhaps use the state.x77 data. Use the reference page on the ggplot2 website to try out other geoms and other layers.