How to create a simple heatmap in R ggplot2


Here is how to quickly build a heatmap in R with ggplot2 and then format it with a color gradient, data labels, reordering, or custom grid lines. Missing values in the data can cause problems, so the end of this post shows how to deal with NA values in a ggplot2 heatmap.

 

This example uses USPersonalExpenditure, a matrix from R's built-in datasets package that holds United States personal expenditures (in billions of dollars) by category and year.

USPersonalExpenditure

#                      1940   1945  1950 1955  1960
# Food and Tobacco    22.200 44.500 59.60 73.2 86.80
# Household Operation 10.500 15.500 29.00 36.5 46.20
# Medical and Health   3.530  5.760  9.71 14.0 21.10
# Personal Care        1.040  1.980  2.45  3.4  5.40
# Private Education    0.341  0.974  1.80  2.6  3.64

library(reshape2)

ue <- melt(USPersonalExpenditure)
ue <- setNames(ue, c("categories", "year", "expenditures"))

head(ue)

#            categories year expenditures
# 1    Food and Tobacco 1940       22.200
# 2 Household Operation 1940       10.500
# 3  Medical and Health 1940        3.530
# 4       Personal Care 1940        1.040
# 5   Private Education 1940        0.341
# 6    Food and Tobacco 1945       44.500
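
If you would rather not load reshape2, the same long format can be sketched in base R. This is only an illustration, not the approach used in the rest of the post: the name ue_base is made up for the example, and unlike melt, this route returns the years as text, so they are converted to integers by hand.

```r
# Base R sketch: as.table() plus as.data.frame() gives one row per matrix cell.
ue_base <- as.data.frame(as.table(USPersonalExpenditure),
                         stringsAsFactors = FALSE)
ue_base <- setNames(ue_base, c("categories", "year", "expenditures"))

# The column names of the matrix arrive as text, so convert the years.
ue_base$year <- as.integer(ue_base$year)

head(ue_base, 3)
```

The result has 25 rows, one for each combination of category and year.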

 

Heatmap in R with custom tile borders

Here is how to create a simple heatmap in ggplot2 with geom_tile.

library(ggplot2)

ggplot(ue, aes(x = year, y = categories, fill = expenditures)) +
  geom_tile() +
  scale_fill_gradient(low = "steelblue",
                      high = "tomato",
                      guide = "none") +
  theme(text = element_text(size = 15))

You can also set a theme and hide background elements, although that is optional. For a better-looking heatmap, I remove the padding between the tiles and the axes, along with the axis ticks and titles. Adding a border to each tile makes the geom_tile elements easier to tell apart.

ggplot(ue, aes(x = year, y = categories, fill = expenditures)) +
  geom_tile(colour = "white", linewidth = 1) +
  scale_fill_gradient(low = "steelblue",
                      high = "tomato",
                      guide = "none") +
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_discrete(expand = c(0, 0)) +
  theme(text = element_text(size = 15)
        , axis.title = element_blank()
        , axis.ticks = element_blank())
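
A possible variation, if you prefer to treat the years as categories rather than numbers: with year stored as a factor, scale_x_discrete takes the place of scale_x_continuous and every tile gets the same width regardless of the gap between years. The sketch below makes two assumptions that are not part of the original example: it builds its own copy of the data (ue_disc, a made-up name) with as.data.frame(as.table()), which returns year as a factor, and it keeps the legend visible by leaving out guide = "none".

```r
library(ggplot2)

# Hypothetical copy of the data with year as a factor (assumption for this sketch).
ue_disc <- as.data.frame(as.table(USPersonalExpenditure))
names(ue_disc) <- c("categories", "year", "expenditures")

p <- ggplot(ue_disc, aes(x = year, y = categories, fill = expenditures)) +
  geom_tile(colour = "white", linewidth = 1) +
  scale_fill_gradient(low = "steelblue", high = "tomato") +
  scale_x_discrete(expand = c(0, 0)) +   # discrete scale because year is a factor
  scale_y_discrete(expand = c(0, 0)) +
  theme(text = element_text(size = 15)
        , axis.title = element_blank()
        , axis.ticks = element_blank())

p
```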

 

ggplot2 heatmap with data labels

If you want to show the value of each tile in the ggplot2 heatmap, here is how to do that. You can round the numbers to reduce the decimal places and still keep trailing zeros.

My favorite is the function digits from the formattable package, because the data stays numeric.

ue$expenditures <- formattable::digits(ue$expenditures, digits = 2)

Don’t worry about the result of rounding. In this example, the function digits rounds the displayed value just as the round function does, and it keeps the trailing zero.

round(1.799, digits = 2)

#[1] 1.8

formattable::digits(1.799, digits = 2)

#[1] 1.80
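
If you do not want to add formattable as a dependency, base R can also produce labels with trailing zeros. The trade-off, shown in this small sketch, is that the result is a character string rather than a number, so it is only suitable as a label and the numeric column should stay in place for the fill.

```r
x <- 1.799

# Both calls return text, not a number.
sprintf("%.2f", x)                # "1.80"
format(round(x, 2), nsmall = 2)   # "1.80"

is.numeric(sprintf("%.2f", x))    # FALSE
```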

It also helps to split long labels. Here I replace the first space in each category name with a newline character.

ue$categories <- sub(" ", "\n", ue$categories)
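
To make the difference between replacing the first space and replacing every space concrete, here is a short sketch on two of the category names. The vector labels is made up for the illustration.

```r
labels <- c("Food and Tobacco", "Personal Care")

# sub() replaces only the first match in each string.
first_only <- sub(" ", "\n", labels)
first_only   # "Food\nand Tobacco" "Personal\nCare"

# gsub() replaces every match, which puts each word on its own line.
every_space <- gsub(" ", "\n", labels)
every_space  # "Food\nand\nTobacco" "Personal\nCare"
```

Note that sub returns a character vector, so a factor column becomes a character column after this step.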

Here is how it looks in the R heatmap with values.

ggplot(ue, aes(x = year, y = categories, fill = expenditures)) +
  geom_tile(colour = "white", linewidth = 1) +
  geom_text(aes(label = expenditures), color = "white", size = 5) +
  scale_fill_gradient(low = "steelblue",
                      high = "tomato",
                      guide = "none") +
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_discrete(expand = c(0, 0)) +
  theme(text = element_text(size = 15)
        , axis.title = element_blank()
        , axis.ticks = element_blank()
        , axis.text.y = element_text(hjust = 0))

 

Reorder ggplot2 heatmap

Here is how to reorder the ggplot2 heatmap with the function reorder. By default, this function orders the categories by the mean of the second argument, but you can pass a different summary function in the FUN argument. To reverse the order, put a minus sign before the second argument.

In addition, I moved the x-axis tick labels to the top of the heatmap.

ggplot(ue, aes(
  x = year,
  y = reorder(categories, expenditures),
  fill = expenditures
)) +
  geom_tile(colour = "white", linewidth = 1) +
  geom_text(aes(label = expenditures), color = "white", size = 5) +
  scale_fill_gradient(low = "steelblue",
                      high = "tomato",
                      guide = "none") +
  scale_x_continuous(expand = c(0, 0), position = "top") +
  scale_y_discrete(expand = c(0, 0)) +
  labs(
    title = "United States personal expenditures (in billions of dollars)\nby category and year") +
  theme(text = element_text(size = 15)
        , axis.title = element_blank()
        , axis.ticks = element_blank()
        , axis.text.y = element_text(hjust = 0)
        , plot.title = element_text(size = 15, color = "grey20", hjust = 0.5))
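
To see what reorder does before it reaches the plot, you can inspect the factor levels directly. This sketch works on its own copy of the data (ue_demo, a made-up name, built in base R so it runs without other packages). ggplot2 draws the first level at the bottom of a discrete y-axis, so the category with the smallest mean ends up in the bottom row.

```r
ue_demo <- as.data.frame(as.table(USPersonalExpenditure))
names(ue_demo) <- c("categories", "year", "expenditures")

# Default: levels sorted by mean expenditure, smallest first.
by_mean <- reorder(ue_demo$categories, ue_demo$expenditures)
levels(by_mean)

# Minus sign before the second argument: largest mean first.
by_mean_desc <- reorder(ue_demo$categories, -ue_demo$expenditures)
levels(by_mean_desc)

# A different summary function, here the median.
by_median <- reorder(ue_demo$categories, ue_demo$expenditures, FUN = median)
levels(by_median)
```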

 

Dealing with NA values in the ggplot2 heatmap

I will create a missing value in the previously used data frame.

ue[25, 3] <- NA

tail(ue)

#             categories year expenditures
# 20   Private\nEducation 1955          2.6
# 21    Food\nand Tobacco 1960         86.8
# 22 Household\nOperation 1960         46.2
# 23  Medical\nand Health 1960         21.1
# 24       Personal\nCare 1960          5.4
# 25   Private\nEducation 1960           NA

With the previous code, this missing value causes problems: the mean for the category that contains NA also becomes NA, so that category loses its place in the ordering, and geom_text may warn about the label it has to drop.

geom_tile has no argument of its own that fixes this. Instead, pass na.rm = TRUE through reorder to the summary function, add na.rm = TRUE to geom_text, and set na.value in scale_fill_gradient to choose the color of the empty tile.

ggplot(ue, aes(
  x = year,
  y = reorder(categories, expenditures, FUN = mean, na.rm = TRUE),
  fill = expenditures
)) +
  geom_tile(colour = "white", linewidth = 1) +
  geom_text(aes(label = expenditures), color = "white", size = 5, na.rm = TRUE) +
  scale_fill_gradient(
    low = "steelblue",
    high = "tomato",
    guide = "none",
    na.value = "gray80") +
  scale_x_continuous(expand = c(0, 0), position = "top") +
  scale_y_discrete(expand = c(0, 0)) +
  labs(
    title = "United States personal expenditures (in billions of dollars)\nby category and year") +
  theme(text = element_text(size = 15)
        , axis.title = element_blank()
        , axis.ticks = element_blank()
        , axis.text.y = element_text(hjust = 0)
        , plot.title = element_text(size = 15, color = "grey20", hjust = 0.5))
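
The effect of na.rm inside reorder is easier to see without the plot. The sketch below uses its own copy of the data (ue_na, a made-up name) and the same missing cell, and compares the per-category means with and without na.rm = TRUE.

```r
ue_na <- as.data.frame(as.table(USPersonalExpenditure))
names(ue_na) <- c("categories", "year", "expenditures")
ue_na[25, 3] <- NA   # Private Education, 1960

# Without na.rm, a single NA makes the whole category mean NA.
means_raw <- tapply(ue_na$expenditures, ue_na$categories, mean)
means_raw

# With na.rm = TRUE, the mean uses the remaining four years.
means_clean <- tapply(ue_na$expenditures, ue_na$categories, mean, na.rm = TRUE)
means_clean

# reorder() forwards extra arguments such as na.rm to FUN.
ordered_cat <- reorder(ue_na$categories, ue_na$expenditures,
                       FUN = mean, na.rm = TRUE)
levels(ordered_cat)
```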

 

Please look at other visualizations on this blog made with R, for example the gradient line chart, glowing line chart, and gradient word cloud.

If you want to see more examples of how to implement color gradients in ggplot2, look at this post about color gradients in the jitter plot or gradient line chart.
