
How to create a simple heatmap in R ggplot2

Here is how to quickly build a heatmap in R with ggplot2 and add extra formatting: a color gradient, data labels, reordering, or custom grid lines. Missing values in the data can cause problems, so the post ends with an example of how to deal with NA values in a ggplot2 heatmap.

 

Here is the built-in data set with United States personal expenditures (in billions of dollars) by category and year.

USPersonalExpenditure

#                      1940   1945  1950 1955  1960
# Food and Tobacco    22.200 44.500 59.60 73.2 86.80
# Household Operation 10.500 15.500 29.00 36.5 46.20
# Medical and Health   3.530  5.760  9.71 14.0 21.10
# Personal Care        1.040  1.980  2.45  3.4  5.40
# Private Education    0.341  0.974  1.80  2.6  3.64

library(reshape2)

# melt() turns the matrix into long format: one row per category and year
ue <- melt(USPersonalExpenditure)
ue <- setNames(ue, c("categories", "year", "expenditures"))

head(ue)

#           categories  year expenditures
# 1    Food and Tobacco  1940       22.200
# 2 Household Operation  1940       10.500
# 3  Medical and Health  1940        3.530
# 4       Personal Care  1940        1.040
# 5   Private Education  1940        0.341
# 6    Food and Tobacco  1945       44.500
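If reshape2 is not installed, roughly the same long format can be sketched with base R. This is an alternative I am assuming here, not the method used in the rest of the post, and the name ue_base is only for illustration.

```r
# Base-R alternative to reshape2::melt() (an assumption, not the post's method)
ue_base <- as.data.frame(as.table(USPersonalExpenditure),
                         stringsAsFactors = FALSE)
ue_base <- setNames(ue_base, c("categories", "year", "expenditures"))

# as.table() keeps the years as text, so convert them back to integers
ue_base$year <- as.integer(ue_base$year)

head(ue_base, 3)
```

Note that categories is a character column here rather than a factor, which is enough for the plots below.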

 

Heatmap in R with custom tile borders

Here is how to create a simple heatmap in ggplot2 using geom_tile.

library(ggplot2)

ggplot(ue, aes(x = year, y = categories, fill = expenditures)) +
  geom_tile() +
  scale_fill_gradient(low = "steelblue",
                      high = "tomato",
                      guide = "none") +
  theme(text = element_text(size = 15))


You can specify a theme and hide elements that appear in the background, but it is not critical. For a better-looking heatmap, I will remove the padding between the tiles and the axes, and hide the axis ticks and axis titles. To better distinguish the tiles, you can add a border to each of them.

ggplot(ue, aes(x = year, y = categories, fill = expenditures)) +
  geom_tile(colour = "white", linewidth = 1) +
  scale_fill_gradient(low = "steelblue",
                      high = "tomato",
                      guide = "none") +
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_discrete(expand = c(0, 0)) +
  theme(text = element_text(size = 15)
    ,axis.title = element_blank()
    ,axis.ticks = element_blank())


 

ggplot2 heatmap with data labels

If you want to add the value of each tile to the ggplot2 heatmap, here is how to do that. You can round the numbers to reduce the number of decimal places and still keep trailing zeros.

My favorite for this is the function digits from the formattable package, because the data stays numeric.

ue$expenditures <- formattable::digits(ue$expenditures, digits = 2)

Don’t worry about the result of rounding. The function digits shows the same rounded value as the round function in R, only with the trailing zeros kept.

round(1.799, digits = 2)

#[1] 1.8

formattable::digits(1.799, digits = 2)

#[1] 1.80
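For comparison, base R can also keep the trailing zeros, but only by turning the numbers into text. The short sketch below uses sprintf and formatC to show the difference; the variable names are only for illustration.

```r
x <- 1.799

# round() returns a number, so the trailing zero is dropped when printed
rounded <- round(x, digits = 2)

# sprintf() and formatC() keep the trailing zero but return character strings
as_text  <- sprintf("%.2f", x)
as_text2 <- formatC(x, format = "f", digits = 2)

is.numeric(rounded)    # TRUE
is.character(as_text)  # TRUE
```

A character column could still be used for the labels, but not for a continuous fill, which is why a numeric type is convenient here.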

Another good thing to do is split lengthy text. In this case, I’m replacing the first whitespace in each category with a newline character.

ue$categories <- sub(" ", "\n", ue$categories)
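To see what this replacement does, here is a small sketch on two of the category names. It also shows gsub for contrast, which is not used in this post.

```r
labels <- c("Food and Tobacco", "Personal Care")

# sub() replaces only the first space in each string
first_only <- sub(" ", "\n", labels)

# gsub() would replace every space, putting each word on its own line
every_space <- gsub(" ", "\n", labels)

cat(first_only[1])
```

Keep in mind that sub returns a character vector, so the categories column is no longer a factor after this step.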

Here is how it looks in the R heatmap with values.

ggplot(ue, aes(x = year, y = categories, fill = expenditures)) +
  geom_tile(colour = "white", linewidth = 1) +
  geom_text(aes(label = expenditures), color = "white", size = 5) +
  scale_fill_gradient(low = "steelblue",
                      high = "tomato",
                      guide = "none") +
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_discrete(expand = c(0, 0)) +
  theme(text = element_text(size = 15)
        , axis.title = element_blank()
        , axis.ticks = element_blank()
        , axis.text.y = element_text(hjust = 0))


 

Reorder ggplot2 heatmap

Here is how to reorder the ggplot2 heatmap by using the function reorder. By default, this function orders the categories by their mean value, but you can use a different calculation through the FUN argument. To reverse the order, put a minus sign before the second argument.

In addition, I moved the ggplot2 axis tick labels to the top of the heatmap.

ggplot(ue, aes(
  x = year,
  y = reorder(categories, expenditures),
  fill = expenditures
)) +
  geom_tile(colour = "white", linewidth = 1) +
  geom_text(aes(label = expenditures), color = "white", size = 5) +
  scale_fill_gradient(low = "steelblue",
                      high = "tomato",
                      guide = "none") +
  scale_x_continuous(expand = c(0, 0), position = "top") +
  scale_y_discrete(expand = c(0, 0)) +
  labs(
    title = "United States personal expenditures (in billions of dollars)\nby category and year") +
  theme(text = element_text(size = 15)
        , axis.title = element_blank()
        , axis.ticks = element_blank()
        , axis.text.y = element_text(hjust = 0)
        , plot.title = element_text(size = 15, color = "grey20", hjust = 0.5))
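Here is a minimal sketch of what reorder does to the factor levels on its own, using made-up groups g and values v rather than the expenditure data.

```r
g <- factor(c("a", "a", "b", "b", "c", "c"))
v <- c(1, 9, 6, 6, 2, 3)

# Default: levels sorted by the group mean (c = 2.5, a = 5, b = 6)
by_mean <- levels(reorder(g, v))

# Minus sign: levels sorted by the group mean in descending order
by_mean_desc <- levels(reorder(g, -v))

# FUN = max: levels sorted by the group maximum (c = 3, b = 6, a = 9)
by_max <- levels(reorder(g, v, FUN = max))

by_mean       # "c" "a" "b"
by_mean_desc  # "b" "a" "c"
by_max        # "c" "b" "a"
```

ggplot2 draws the first level at the bottom of a discrete y axis, so with the default ordering the category with the largest mean ends up at the top of the heatmap.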


 

Dealing with NA values in the ggplot2 heatmap

I will create a missing value in the previously used data frame.

ue[25, 3] <- NA

tail(ue)

#             categories year expenditures
# 20   Private\nEducation 1955          2.6
# 21    Food\nand Tobacco 1960         86.8
# 22 Household\nOperation 1960         46.2
# 23  Medical\nand Health 1960         21.1
# 24       Personal\nCare 1960          5.4
# 25   Private\nEducation 1960           NA
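Before plotting, it can help to check whether and where values are missing. The sketch below rebuilds a copy of the data with base R so that it runs on its own; ue_check is a name I made up for this example.

```r
# Stand-alone copy of the long data with one missing value
ue_check <- setNames(as.data.frame(as.table(USPersonalExpenditure)),
                     c("categories", "year", "expenditures"))
ue_check[25, 3] <- NA

# Is anything missing, and in which rows?
has_na  <- anyNA(ue_check$expenditures)
na_rows <- which(is.na(ue_check$expenditures))

# Number of missing values per column
colSums(is.na(ue_check))
```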

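Here is how it leads to problems if I create a heatmap with the previous code: the mean of that category becomes NA, so reorder can no longer rank it and puts it last, the tile is drawn in the default gray, and geom_text warns about the removed label.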


geom_tile itself has no setting that fixes this. You can solve the problem by passing na.rm = TRUE through reorder to the mean, adding na.rm = TRUE to geom_text so the missing label is dropped silently, and setting na.value in scale_fill_gradient to choose the color of the empty tile.

ggplot(ue, aes(
  x = year,
  y = reorder(categories, expenditures, FUN = mean, na.rm = TRUE),
  fill = expenditures
)) +
  geom_tile(colour = "white", linewidth = 1) +
  geom_text(aes(label = expenditures), color = "white", size = 5,  na.rm = TRUE) +
  scale_fill_gradient(
    low = "steelblue",
    high = "tomato",
    guide = "none",
    na.value = "gray80") +
  scale_x_continuous(expand = c(0, 0), position = "top") +
  scale_y_discrete(expand = c(0, 0)) +
  labs(
    title = "United States personal expenditures (in billions of dollars)\nby category and year") +
  theme(text = element_text(size = 15)
        , axis.title = element_blank()
        , axis.ticks = element_blank()
        , axis.text.y = element_text(hjust = 0)
        , plot.title = element_text(size = 15, color = "grey20", hjust = 0.5))
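The effect of na.rm = TRUE inside reorder is easier to see on a small example. This sketch uses made-up groups g and values v, not the expenditure data.

```r
g <- factor(c("a", "a", "b", "b"))
v <- c(1, NA, 2, 3)

# Without na.rm the mean of group "a" is NA, and reorder() puts it last
with_na <- reorder(g, v)
levels(with_na)           # "b" "a"
attr(with_na, "scores")   # the per-group means used for ordering

# With na.rm = TRUE the mean of "a" is 1, so it sorts before "b" (2.5)
without_na <- reorder(g, v, FUN = mean, na.rm = TRUE)
levels(without_na)        # "a" "b"
```

The extra na.rm argument is simply passed on to the function given in FUN, which is why it works with mean.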


 

Please look at other visualizations in this blog made using R. For example, gradient line chart, glowing line chart, and gradient word cloud.

If you want to see more examples of how to implement color gradients in ggplot2, look at this post about color gradients in the jitter plot or gradient line chart.

