# Yet Another How-to on Labelling Bar Graphs in ggplot2

Thursday • October 26, 2023

Header visualization from “maischberger” (see my note below)

## Introduction

Yes, I have written about creating bar charts with `{ggplot2}` before. Bar charts are one of the most common chart types, so creating them is a task that thousands of people likely face every day. In an old blog post, I have shown

1. how to calculate the percentage values,
2. how to position the percentage labels inside the bars, and
3. how to color the bars using different colors.

Inspired by a question by one of my clients, I am now extending that list by showcasing

4. how to place the category labels above the bars.

## Data Preparation

I am using the `diamonds` data set from the `{ggplot2}` package to generate shares of diamonds for five different categories describing the quality of the cut. In a first step, I am calculating the shares per quality and turning the categories into a factor ordered by that metric.

``````library(dplyr)
library(ggplot2)

diamonds |>
summarize(prop = n() / nrow(diamonds), .by = cut) |>
mutate(cut = forcats::fct_reorder(cut, prop))``````
``````## # A tibble: 5 × 2
##   cut         prop
##   <ord>      <dbl>
## 1 Ideal     0.400
## 2 Premium   0.256
## 3 Good      0.0910
## 4 Very Good 0.224
## 5 Fair      0.0298``````

There are multiple other ways to calculate the shares, including `diamonds |> mutate(n = n()) |> summarize(prop = n() / unique(n), .by = cut)`. Instead of using the experimental `.by` argument you can also group your data first with `group_by(cut)` before summarizing per cut quality.

The last step is not needed in our example case here as the ranking by shares follows the defined order of the cut qualities. However, in most other cases you likely have to sort your categories on your own.
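To make the alternatives mentioned above a bit more tangible, here is a minimal sketch that computes the shares in three ways and checks that they agree. The object names (`shares_by`, `shares_grouped`, `shares_count`) are only for illustration, and the `count()` variant is one more option that I did not mention above.

```r
library(dplyr)
library(ggplot2)  # only needed for the diamonds data set

# variant 1: per-operation grouping via `.by` (requires dplyr >= 1.1.0)
shares_by <- diamonds |>
  summarize(prop = n() / nrow(diamonds), .by = cut)

# variant 2: persistent grouping via group_by()
shares_grouped <- diamonds |>
  group_by(cut) |>
  summarize(prop = n() / nrow(diamonds))

# variant 3: count first, then divide by the total
shares_count <- diamonds |>
  count(cut) |>
  mutate(prop = n / sum(n))

# the row order can differ between variants, so sort by cut before comparing
all.equal(arrange(shares_by, cut)$prop, shares_grouped$prop)
all.equal(arrange(shares_by, cut)$prop, shares_count$prop)
```

As far as I can tell, `.by` keeps the groups in order of first appearance while `group_by()` and `count()` sort them by factor level, which is why I sort before comparing.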

## Create a Basic Bar Chart

Now, I can easily pass the summarized data set to `ggplot()` and create a simple horizontal bar graph:

``````diamonds |>
summarize(prop = n() / nrow(diamonds), .by = cut) |>
mutate(cut = forcats::fct_reorder(cut, prop)) |>
ggplot(aes(prop, cut)) +
geom_col()``````

Alternatively, you can transform the complete data set on the fly instead of calculating shares first:

``````ggplot(diamonds, aes(y = cut, x = after_stat(count / sum(count)))) +
geom_bar()``````
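To double-check that the on-the-fly version yields the same shares, one can inspect what the stat computed. A minimal sketch using `layer_data()`: the object names `p_onfly` and `computed` are only for illustration, and I am assuming that the proportions end up in the `x` column because that is the aesthetic they are mapped to.

```r
library(ggplot2)

# same plot as above, stored in an object so that we can inspect it
p_onfly <-
  ggplot(diamonds, aes(y = cut, x = after_stat(count / sum(count)))) +
  geom_bar()

# data of the first layer after the stat has been applied
computed <- layer_data(p_onfly)

# raw counts per category and the shares derived from them
computed[, c("y", "count", "x")]

# the shares should add up to one
sum(computed$x)
```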

## Style the Visualization

If you know me a bit, you know that before moving on I have to modify the theme and fix the grid lines (read: remove them altogether in this case).

Also, I am modifying the x axis range and labels. Instead of showing proportions, I decide to show percentages (0-100). To follow good practice, I am adding the percentage sign to the axis labels using `label_percent()` from the `{scales}` package. I am also removing the padding on the left and right of the bars and adjusting the limits so that the 40% label is shown as well.

``````theme_set(theme_minimal(base_family = "Spline Sans"))
theme_update(
panel.grid.minor = element_blank(),
panel.grid.major = element_blank(),
axis.line.x = element_line(color = "grey80", linewidth = .4),
axis.ticks.x = element_line(color = "grey80", linewidth = .4),
axis.title.y = element_blank(),
plot.margin = margin(10, 15, 10, 15)
)

diamonds |>
summarize(prop = n() / nrow(diamonds), .by = cut) |>
mutate(cut = forcats::fct_reorder(cut, prop)) |>
ggplot(aes(prop, cut)) +
geom_col() +
scale_x_continuous(
expand = c(0, 0), limits = c(0, .4),
labels = scales::label_percent(),
name = "Proportion"
) ``````
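In case it is not obvious what `label_percent()` does: it does not format anything itself but returns a function, which the scale then applies to the axis breaks. A small sketch outside of any plot (the name `to_percent` is made up for this example):

```r
library(scales)

# label_percent() returns a labelling function
to_percent <- label_percent()

# applied to proportions, it multiplies by 100 and appends the percent sign
to_percent(c(0, .1, .2, .3, .4))
#> [1] "0%"  "10%" "20%" "30%" "40%"
```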

## Place Category Labels on the Top

The approach I now take to move the labels to the top of the bars is: faceting!

There are multiple options, including placing the labels with `geom_text()` and shifting them upwards. But by far the fastest way (and also likely the one that breaks last when the number of bars changes) is using the facet functionality of `{ggplot2}`.

``````diamonds |>
summarize(prop = n() / nrow(diamonds), .by = cut) |>
mutate(cut = forcats::fct_reorder(cut, prop)) |>
ggplot(aes(prop, cut)) +
geom_col() +
facet_wrap(~ cut) +
scale_x_continuous(
expand = c(0, 0), limits = c(0, .4),
labels = scales::label_percent()
)``````

It doesn’t work “out of the box”, however. But that’s a quick fix if you know about the `ncol` and the `scales` arguments in the `facet_wrap()` function! The trick is that we force all small multiples into a single column (so that bars share a common baseline again) by setting `ncol = 1`. By default, the axis ranges are kept constant across small multiples. By setting `scales = "free_y"`, we free the y axis range of each panel, which removes the redundant, empty groups and all the resulting white space.

``````diamonds |>
summarize(prop = n() / nrow(diamonds), .by = cut) |>
mutate(cut = forcats::fct_reorder(cut, -prop)) |>
ggplot(aes(prop, cut)) +
geom_col() +
facet_wrap(~ cut, ncol = 1, scales = "free_y") +
scale_x_continuous(
name = "Proportion", expand = c(0, 0),
limits = c(0, .4), labels = scales::label_percent()
)``````

Note that we also have to flip the order of our categories as now they’re ordered top to bottom, not bottom to top anymore.
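The reason for the flip: on a single discrete y axis the first factor level sits at the bottom, whereas the first level becomes the first (topmost) panel when faceting. A quick sketch to inspect the level order in both cases. The object names are only for illustration, and the `.desc = TRUE` variant is an alternative to negating `prop` that I don’t use in this post.

```r
library(dplyr)
library(ggplot2)  # only needed for the diamonds data set

shares <- diamonds |>
  summarize(prop = n() / nrow(diamonds), .by = cut)

# ascending by share: suits the single-panel bar chart (first level at the bottom)
asc <- forcats::fct_reorder(shares$cut, shares$prop)
levels(asc)
#> [1] "Fair"      "Good"      "Very Good" "Premium"   "Ideal"

# descending by share: suits the faceted version (first level in the top panel)
desc <- forcats::fct_reorder(shares$cut, -shares$prop)
levels(desc)
#> [1] "Ideal"     "Premium"   "Very Good" "Good"      "Fair"

# same result without negating the values
desc_alt <- forcats::fct_reorder(shares$cut, shares$prop, .desc = TRUE)
identical(levels(desc), levels(desc_alt))
```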

The final step is cleaning up the labels. First, let’s remove the category names on the y axis by passing `guide = "none"` in `scale_y_discrete()`.

To modify the new labels, the so-called strip texts, we address the text element `strip.text` via `theme()`. The margin of zero on the left, together with the horizontal justification (`hjust = 0`), ensures that the strip text labels are fully left-aligned with the baseline of the bars. The small margins at the top and the bottom ensure that the labels are not clipped (e.g. that the descender of y is shown completely).

``````diamonds |>
summarize(prop = n() / nrow(diamonds), .by = cut) |>
mutate(cut = forcats::fct_reorder(cut, -prop)) |>
ggplot(aes(prop, cut)) +
geom_col() +
facet_wrap(~ cut, ncol = 1, scales = "free_y") +
scale_x_continuous(
name = "Proportion", expand = c(0, 0),
limits = c(0, .4), labels = scales::label_percent()
) +
scale_y_discrete(guide = "none") +
theme(strip.text = element_text(
hjust = 0, margin = margin(1, 0, 1, 0),
size = rel(1.1), face = "bold"
))``````

To add some spacing between the last bar and the axis line, one can adjust the vertical padding of each panel by passing `expansion(add = c(.8, .6))` to the `expand` argument in `scale_y_discrete()`.

``````diamonds |>
summarize(prop = n() / nrow(diamonds), .by = cut) |>
mutate(cut = forcats::fct_reorder(cut, -prop)) |>
ggplot(aes(prop, cut)) +
geom_col() +
facet_wrap(~ cut, ncol = 1, scales = "free_y") +
scale_x_continuous(
name = "Proportion", expand = c(0, 0),
limits = c(0, .4), labels = scales::label_percent()
) +
scale_y_discrete(
guide = "none", expand = expansion(add = c(.8, .6))
) +
theme(strip.text = element_text(
hjust = 0, margin = margin(1, 0, 1, 0),
size = rel(1.1), face = "bold"
))``````
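For those wondering where the numbers in `expansion(add = c(.8, .6))` come from, here is a rough sketch of the arithmetic. It rests on my understanding (an assumption, not something I verified in the source code) that the single category of each panel sits at position 1 and that a bar with the default width of .9 spans from .55 to 1.45. All object names are made up for this example.

```r
library(ggplot2)

# expansion() returns the multiplicative and additive padding
# for the lower and the upper end of the scale
pad <- expansion(add = c(.8, .6))
pad

# one category per panel, located at position 1 (assumption, see above)
panel_range <- c(1 - .8, 1 + .6)    # lower and upper limit after padding
bar_range   <- c(1 - .45, 1 + .45)  # extent of a bar with width .9

gap_below <- bar_range[1] - panel_range[1]  # space between bar and panel bottom
gap_above <- panel_range[2] - bar_range[2]  # space between bar and strip text

c(below = gap_below, above = gap_above)
```

With these settings the gap below each bar is larger than the one above, which keeps the strip label close to its bar and leaves some room above the axis line.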

## Bonus: Style the Bars

Let’s merge this new approach with some of the tricks from my previous blog post. We add direct labels and highlight the top-ranked category.

#### Highlight Top-Ranked Category

By mapping the `cut` variable to `fill`, bars would be colored by categories. To color only the first, top-ranked bar, I am making use of the rank, which is equal to the factor level. Thus, mapping the fill to `as.numeric(cut) == 1` returns `TRUE` for “Ideal” and `FALSE` otherwise. To customize the fill colors, we add `scale_fill_manual()` to pass a vector of two custom colors. As we don’t need a legend, we also set `guide = "none"`.

``````p <-
diamonds |>
summarize(prop = n() / nrow(diamonds), .by = cut) |>
mutate(cut = forcats::fct_reorder(cut, -prop)) |>
ggplot(aes(prop, cut)) +
geom_col(aes(fill = as.numeric(cut) == 1)) +
facet_wrap(~ cut, ncol = 1, scales = "free_y") +
scale_x_continuous(
name = "Proportion", expand = c(0, 0),
limits = c(0, .4), labels = scales::label_percent()
) +
scale_y_discrete(guide = "none", expand = expansion(add = c(.8, .6))) +
scale_fill_manual(values = c("grey50", "#1D785A"), guide = "none") +
theme(strip.text = element_text(
hjust = 0, margin = margin(1, 0, 1, 0),
size = rel(1.1), face = "bold"
))

p``````
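To see which category the expression `as.numeric(cut) == 1` picks, one can evaluate it outside of the plot. A minimal sketch, with the column name `top` and the object name `ranked` being made up for this check:

```r
library(dplyr)
library(ggplot2)  # only needed for the diamonds data set

ranked <- diamonds |>
  summarize(prop = n() / nrow(diamonds), .by = cut) |>
  mutate(
    cut = forcats::fct_reorder(cut, -prop),
    # as.numeric() on a factor returns the integer codes, i.e. the rank here
    top = as.numeric(cut) == 1
  ) |>
  arrange(cut)

ranked

# only the category with the largest share is flagged
as.character(ranked$cut[ranked$top])
#> [1] "Ideal"
```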

#### Add Percentages as Direct Labels

Similarly, we can pass an expression to `color` and `hjust` inside the `geom_text()` component that we use to add the direct labels. As `TRUE` is encoded as `1`, all groups that have a share higher than 5% are right-aligned (and thus placed inside the bar) while all others are left-aligned (as `FALSE` = `0`) and placed outside. To move the labels a bit more inside and outside, respectively, I am cheating by adding some spaces before and after the label.

``````p +
geom_text(
aes(label = paste0("  ", sprintf("%2.1f", prop * 100), "%  "),
color = prop > .05, hjust = prop > .05),
size = 4, fontface = "bold", family = "Spline Sans"
) +
scale_color_manual(values = c("black", "white"), guide = "none")``````
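The two ingredients of that `geom_text()` call can be tried out on their own. A small sketch with two rounded shares from the table above (the vector and object names are made up for this example):

```r
# two example shares: one above and one below the 5% threshold
prop <- c(0.400, 0.0298)

# the comparison returns a logical vector ...
inside <- prop > .05
inside
#> [1]  TRUE FALSE

# ... which turns into 1 (right-aligned) and 0 (left-aligned) when used as a number
as.numeric(inside)
#> [1] 1 0

# labels with one decimal place, padded with two spaces on each side
label <- paste0("  ", sprintf("%2.1f", prop * 100), "%  ")
label
#> [1] "  40.0%  " "  3.0%  "
```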

Alternatively, you can pass the value for `hjust` directly by using an `ifelse` or `if_else` condition: `hjust = if_else(prop > .05, 1.2, -.2)`:

``````p +
geom_text(
aes(label = paste0(sprintf("%2.1f", prop * 100), "%"),
color = prop > .05, hjust = if_else(prop > .05, 1.2, -.2)),
size = 4, fontface = "bold", family = "Spline Sans"
) +
scale_color_manual(values = c("black", "white"), guide = "none")``````

The same logic applies when we want to control the text color, which is recommended here to increase the contrast. With the final `scale_color_manual()` I change the text color to white in case the label is placed inside the bar and black otherwise.

Another way to style the labels would be `scales::label_percent(accuracy = .1, prefix = " ", suffix = "% ")(prop)` (or make use of the superseded `scales::percent()`) but that’s rather long and also not that easy to remember.

One could of course also remove the x axis as the values are now shown as direct labels.

``````p +
geom_text(
aes(label = paste0("  ", sprintf("%2.1f", prop * 100), "%  "),
color = prop > .05, hjust = prop > .05),
size = 4, fontface = "bold", family = "Spline Sans"
) +
scale_x_continuous(guide = "none", name = NULL, expand = c(0, 0)) +
scale_color_manual(values = c("black", "white"), guide = "none")``````

## Alternative Approach

Here is an approach using `geom_text()`. The trick here is to (i) reduce the width (read: height in our case) of the bars to allow for space for the labels and (ii) add the labels with `geom_text()` in combination with a custom `vjust` or `nudge_y` setting.

``````diamonds |>
summarize(prop = n() / nrow(diamonds), .by = cut) |>
mutate(cut = forcats::fct_reorder(cut, prop)) |>
ggplot(aes(prop, cut)) +
geom_col(width = .5) +
geom_text(
aes(label = cut, x = 0),
family = "Spline Sans", fontface = "bold",
hjust = 0, vjust = -1.7, size = 4.5
) +
scale_x_continuous(
expand = c(0, 0), limits = c(0, .4),
labels = scales::label_percent(),
name = "Proportion"
) +
scale_y_discrete(guide = "none")``````

That’s a great solution, too. I see some potential issues coming up here, for example problems in case the labels become larger (can be fixed by removing the clipping and adding some margin) or the number of bars increases (and that may be especially a problem in an automated workflow). In the latter case, the space between bars may become too small and/or the placement of the labels, adjusted via `vjust` or `nudge_y`, is not perfectly above the bars anymore.

## Conclusion

To illustrate the different behavior of the two approaches, let’s run the exact same code on a new data set with more categories:

``````p1 <-
mpg |>
filter(year == 2008) |>
summarize(prop = n() / nrow(mpg), .by = manufacturer) |>
mutate(manufacturer = forcats::fct_reorder(stringr::str_to_title(manufacturer), -prop)) |>
ggplot(aes(prop, manufacturer)) +
geom_col() +
facet_wrap(~ manufacturer, ncol = 1, scales = "free_y") +
scale_x_continuous(
name = "Proportion", expand = c(0, 0),
limits = c(0, .1), labels = scales::label_percent()
) +
scale_y_discrete(
guide = "none", expand = expansion(add = c(.8, .6))
) +
theme(strip.text = element_text(
hjust = 0, margin = margin(1, 0, 1, 0),
size = rel(1.1), face = "bold"
))

p2 <-
mpg |>
filter(year == 2008) |>
summarize(prop = n() / nrow(mpg), .by = manufacturer) |>
mutate(manufacturer = forcats::fct_reorder(stringr::str_to_title(manufacturer), prop)) |>
ggplot(aes(prop, manufacturer)) +
geom_col(width = .5) +
geom_text(
aes(label = manufacturer, x = 0),
family = "Spline Sans", fontface = "bold",
hjust = 0, vjust = -1.7, size = 4.5
) +
scale_x_continuous(
name = "Proportion", expand = c(0, 0),
limits = c(0, .1), labels = scales::label_percent()
) +
scale_y_discrete(guide = "none")

library(patchwork)
p1 + p2``````

Both approaches have their pros and cons. In circumstances where you can tweak the exact setting of bar widths, font sizes, and vertical justification, the `geom_text()` approach might be easier to set up.

Using the `facet_wrap()` approach ensures that the labels are always above the bars and that the labels are not clipped by the panel or plot border. This is especially powerful in case the data changes and charts need to be updated without any further modifications, or if you want to apply a function to multiple data sets without the need to include further arguments to modify the widths and spacing. At the same time, the thinner bars make it more difficult to place labels inside the bars. However, the same issue would pop up when adjusting the widths and font sizes in the `geom_text()` example.

Finally, I should note that the facet approach will also break at some point: if the figure height is not sufficient, no bars are visible at all. But scaling the figure height based on the number of categories is something one can easily automate as well.
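One possible way to automate the figure height is a small helper that adds a fixed amount per category to a base height. This is only a sketch: the function `facet_bar_height()`, its default values, and the file name in the commented `ggsave()` call are made up and would need tuning for your font sizes and output device.

```r
library(ggplot2)  # only needed for the mpg data set

# hypothetical helper: figure height in inches, growing with the number of categories
facet_bar_height <- function(n_categories, per_category = .6, base = 1) {
  base + n_categories * per_category
}

# number of manufacturers in the mpg data set
n_cat <- length(unique(mpg$manufacturer))
fig_height <- facet_bar_height(n_cat)
fig_height

# the value can then be passed on when saving the faceted chart, e.g.
# ggsave("bars.png", plot = p1, width = 6, height = fig_height)
```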

#### Note on the Header Image

I’ve seen this bar chart on the German TV talk show “maischberger”, airing on September 27, 2023. A wonderful revival of 3D-bars, combined with a glossy, transparent gradient style. It shows the number of newly constructed apartments per year over time.

R Session Info
``````## R version 4.3.0 (2023-04-21)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.2.1
##
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: Europe/Berlin
## tzcode source: internal
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## other attached packages:
## [1] patchwork_1.1.2   ggplot2_3.4.3     dplyr_1.1.0       systemfonts_1.0.4
##
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.4      jsonlite_1.8.7    highr_0.10        compiler_4.3.0    tidyselect_1.2.0  stringr_1.5.0     jquerylib_0.1.4   scales_1.2.1
##  [9] textshaping_0.3.6 yaml_2.3.7        fastmap_1.1.1     R6_2.5.1          labeling_0.4.2    generics_0.1.3    knitr_1.42        forcats_1.0.0
## [17] tibble_3.2.1      bookdown_0.35     munsell_0.5.0     bslib_0.5.1       pillar_1.9.0      rlang_1.1.1       utf8_1.2.3        stringi_1.7.12
## [25] cachem_1.0.8      xfun_0.40         sass_0.4.7        cli_3.6.1         withr_2.5.0       magrittr_2.0.3    digest_0.6.33     grid_4.3.0
## [33] rstudioapi_0.15.0 lifecycle_1.0.3   vctrs_0.6.3       evaluate_0.20     glue_1.6.2        farver_2.1.1      blogdown_1.18     ragg_1.2.5
## [41] fansi_1.0.4       colorspace_2.1-0  rmarkdown_2.20    tools_4.3.0       pkgconfig_2.0.3   htmltools_0.5.6``````