ggplot2 is a system for ‘declaratively’ creating graphics, based on “The Grammar of Graphics”. It works by adding layers to a visualization, with each layer contributing one aspect of how the data is shown.
Its grammar has seven layers: data, aesthetics, geometries, facets, statistics, coordinates, and themes.
In this article, we will use the built-in mtcars data frame to explore each of these layers.
The data layer holds the actual variables to be plotted. When you associate only data with ggplot, it has nothing to draw yet.
For example, calling ggplot(data=mtcars) on its own shows only a blank canvas.
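A minimal sketch of a data-only call, assuming ggplot2 is installed. The variable name p is only for illustration:

```r
library(ggplot2)

# Data layer only: no aesthetics and no geometry yet
p <- ggplot(data = mtcars)

# The plot object exists, but it contains no drawing layers
length(p$layers)
```

Printing p at this stage renders an empty panel, because no geometry has been added.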
The aesthetics layer lets us specify which columns (i.e. dimensions) of the data we want to plot. The aesthetic mapping describes how variables in the data are mapped to visual properties of geoms.
You typically specify this using the function aes,
aes(x, y, …),
where x, y, … are name-value pairs that link an aesthetic to a variable.
An example usage looks like:
ggplot(data=mtcars, aes(x=mpg, y=wt))
- Even this statement does not plot any data (only empty axes), because ggplot does not yet know what geometry to apply
- For the first two parameters, naming x and y is optional
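As a small sketch of the point about optional names (assuming ggplot2 is attached), the positional and named forms of aes() carry the same aesthetic names. The variables m1, m2 and m3 are only illustrative:

```r
library(ggplot2)

# Positional form: the first two arguments are taken as x and y
m1 <- aes(mpg, wt)

# Named form: the same mapping, spelled out
m2 <- aes(x = mpg, y = wt)

# Both mappings carry the same aesthetic names
names(m1)
names(m2)

# Any further aesthetic has to be named, for example colour
m3 <- aes(mpg, wt, colour = factor(cyl))
names(m3)
```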
The geometries layer associates the data with the shapes we want to use to present it. After adding this layer, ggplot knows how to show the data.
With this layer added, sample code looks like this:
ggplot(data=mtcars, aes(x=mpg, y=wt)) + geom_point()
Running this code in the console produces a scatter plot of weight against mileage.
A great thing about ggplot is that just by changing the geom, you can produce the plot you want. For example, the code below produces a dot plot:
ggplot(data=mtcars, aes(x=mpg, y=wt)) + geom_dotplot( binwidth = 2)
Once data, aesthetics and geometries are in place, the plot becomes visible.
These three layers give us the flexibility to make subtle changes in each layer to communicate our message clearly.
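One way to see this flexibility is to define the data and aesthetics once and vary only the geometry. The sketch below assumes ggplot2 is attached; the names base_plot, scatter and line_plot are illustrative:

```r
library(ggplot2)

# Data and aesthetics are defined once and reused
base_plot <- ggplot(data = mtcars, aes(x = mpg, y = wt))

# Only the geometry layer differs between the two plots
scatter   <- base_plot + geom_point()
line_plot <- base_plot + geom_line()

# Each plot has exactly one layer, each with a different geom
class(scatter$layers[[1]]$geom)[1]
class(line_plot$layers[[1]]$geom)[1]
```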
Other Geometries available as part of ggplot
At the R prompt, you can type geom_ and wait for the autocomplete suggestions. These list the shapes available for plotting your data.
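If autocompletion is not available, the same list can be obtained programmatically. This is a minimal sketch that assumes ggplot2 is attached with library(); the variable name geoms is illustrative:

```r
library(ggplot2)

# List every exported ggplot2 object whose name starts with "geom_"
geoms <- ls("package:ggplot2", pattern = "^geom_")

head(geoms)
length(geoms)
```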
Facets allow us to put multiple charts / plots on one canvas. Essentially, you can think of facets as grouping your data by a certain dimension and generating small multiple plots, each displaying a different subset of the data.
For example, the code below draws the same scatter plot once for each number of gears:
ggplot(data=mtcars, aes(x=mpg, y=wt)) + geom_point() + facet_grid(gear ~ .) #Facets
The output divides the scatter plot into three separate panels, one for each number of gears.
There are two types of facets that you can use:
- facet_grid(): Used in the example above; it lays out panels in a grid.
- facet_wrap(): Wraps a 1d ribbon of panels into 2d. In the example below, we use it to show the scatter plot for each gear count in a two-column layout:
ggplot(data=mtcars, aes(x=mpg, y=wt)) + geom_point() + facet_wrap(~gear, ncol=2)
Facets along two dimensions
So far we have used facets along only one dimension. The dot (.) on the other side of the formula indicated that no variable should be used for that dimension. However, you can also facet on two variables (rows and columns) to get a 2-dimensional view of the plot.
Try the following code and observe the output:
ggplot(data=mtcars, aes(x=mpg, y=wt)) + geom_point() + facet_grid(gear ~ cyl)
The outcome shows weight against mileage in a grid of panels, with one row per number of gears and one column per number of cylinders.
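Before reading a two-way facet grid, it can help to check how many cars fall into each panel. The sketch below uses base R's xtabs() for the counts; the names p and counts are illustrative:

```r
library(ggplot2)

# The two-way faceted plot: rows by gear, columns by cyl
p <- ggplot(data = mtcars, aes(x = mpg, y = wt)) +
  geom_point() +
  facet_grid(gear ~ cyl)

# Count cars for each gear/cylinder combination
counts <- xtabs(~ gear + cyl, data = mtcars)
counts

# Combinations with a zero count correspond to empty panels
which(counts == 0, arr.ind = TRUE)
```

A panel with no cars is still drawn by facet_grid(), which is why some cells of the grid can appear blank.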
Sometimes your data must be transformed or summarized before it is mapped to an aesthetic; this is the job of the statistics layer. Each geom has a default stat, which often works well enough for the visualization to make sense. However, supplying a different stat can often improve clarity.
For example, the code below uses stat_smooth to help the eye see patterns in the presence of overplotting by adding a smoothed trend line with a shaded confidence band:
ggplot(data=mtcars, aes(x=mpg, y=wt)) + geom_point() + facet_grid(gear ~ .) + stat_smooth()
The outcome makes the trend in each panel easier to see than with the points alone.
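By default, stat_smooth() picks a smoother automatically (loess for small data sets such as this one), which can produce warnings when a panel holds only a few cars. As a hedged alternative, the sketch below requests a straight-line fit explicitly. The choice of method = "lm" is ours for illustration and is not part of the example above:

```r
library(ggplot2)

# Same faceted scatter plot, with an explicit linear fit per panel
# instead of the automatically chosen smoother
p <- ggplot(data = mtcars, aes(x = mpg, y = wt)) +
  geom_point() +
  facet_grid(gear ~ .) +
  stat_smooth(method = "lm", formula = y ~ x, se = TRUE)

# The plot now has two layers: the points and the smoother
length(p$layers)
```

Setting se = FALSE would drop the shaded confidence band and leave only the fitted line.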
A coordinate system controls how positions are mapped to the plot. It is often used to apply limits to the x-axis or y-axis, or to adjust the x-to-y ratio, and so customize the visual as needed.
Typically we come across the following coordinate systems:
- Cartesian coordinates
- Polar coordinates
- Spherical (map) projections
The Cartesian coordinate system is the most commonly used, and you will find it in many plots.
In the following examples, the xlim parameter is used first to narrow the x-axis range and then to widen it.
You can run the following code in the R console to see both versions:
ggplot(data=mtcars, aes(x=mpg, y=wt)) + geom_point() + facet_grid(gear ~ .) + stat_smooth() + coord_cartesian(xlim = c(13, 30))
ggplot(data=mtcars, aes(x=mpg, y=wt)) + geom_point() + facet_grid(gear ~ .) + stat_smooth() + coord_cartesian(xlim = c(1, 40))
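One detail worth knowing is that coord_cartesian() only zooms the view, whereas setting limits with xlim() removes out-of-range rows before any stat is computed. The sketch below contrasts the two. It uses a linear fit and omits the facets to keep the example small, and the names base_plot, zoomed and filtered are illustrative:

```r
library(ggplot2)

base_plot <- ggplot(data = mtcars, aes(x = mpg, y = wt)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x)

# Zoom: every row still feeds the smoother, only the view is cropped
zoomed <- base_plot + coord_cartesian(xlim = c(13, 30))

# Filter: rows outside the limits are dropped before the fit
filtered <- base_plot + xlim(13, 30)

# Number of cars that fall outside the 13 to 30 mpg window
sum(mtcars$mpg < 13 | mtcars$mpg > 30)
```

Printing filtered is expected to emit warnings about removed rows, while zoomed keeps the fit computed from the full data.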
Themes allow you to enrich your data presentation with appropriate use of labels, positions, fonts, colors, etc. ggplot provides themes such as theme_bw(), theme_classic(), theme_dark() and theme_light() so that you can apply different styles to your plot. You can also create your own theme if needed.
In the example below, we apply the dark theme to the plot from the previous section:
ggplot(data=mtcars, aes(x=mpg, y=wt)) + geom_point() + facet_grid(gear ~ .) + stat_smooth() + coord_cartesian(xlim = c(13, 30)) + theme_dark()
This produces a plot that is, of course, darker than the ones in the earlier examples.
Updating an existing theme
With the dark theme above, we only changed the overall look. What if we want to make the legends bold and heavier, or change colours in the plot? Text and background styling of this kind is achievable by modifying the theme layer. (The colour of the dots themselves, for example yellow instead of black, is set in the geometry layer, e.g. geom_point(colour="yellow"), not in the theme.)
Here is sample code in which a theme is customized:
ggplot(data=mtcars, aes(x=mpg, y=wt)) + geom_point() + facet_grid(gear ~ .) + stat_smooth() + coord_cartesian(xlim = c(13, 30)) + theme_dark() + theme(axis.text.x = element_text(colour="blue", size=rel(1.2)), axis.title.x = element_text(size=rel(2)), plot.background = element_rect(fill = 'green', colour = 'red'))
In the example above, we have made the following changes:
- Changed the colour and relative size of the x-axis tick labels
- Doubled the relative size of the x-axis title
- Changed the fill and border colour of the plot background
The idea is that, using the theme function, you can override elements of the built-in themes. Further, you can create your own themes and reuse them.
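A simple way to reuse a customized theme is to wrap it in a function. The sketch below is one possible approach. The name theme_article is hypothetical, and the overrides are the ones from the example above:

```r
library(ggplot2)

# Hypothetical reusable theme: dark base plus the overrides used above
theme_article <- function() {
  theme_dark() +
    theme(
      axis.text.x     = element_text(colour = "blue", size = rel(1.2)),
      axis.title.x    = element_text(size = rel(2)),
      plot.background = element_rect(fill = "green", colour = "red")
    )
}

# Apply the same styling to any plot
p <- ggplot(data = mtcars, aes(x = mpg, y = wt)) +
  geom_point() +
  theme_article()
```

Because the theme lives in one function, changing a colour or size there updates every plot that uses it.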
I hope that through this article, I have been successful in introducing you to the different layers of ggplot and how they are used while visualizing data.