Understanding geom_col in R and Creating Bars Next to Each Other
===========================================================
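geom_col() is a geometric layer in ggplot2, the popular data visualization package for R. It draws bars whose heights are taken directly from a y variable in the data, which makes it the natural choice for plotting precomputed values such as counts or percentages. Boxplots and histograms come from other geoms (geom_boxplot() and geom_histogram()). In this article, we’ll explore how to use geom_col() to draw the bars for different groups next to each other instead of stacked on top of each other.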
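To make the difference concrete, here is a minimal sketch that contrasts geom_col() with geom_bar(), which counts rows instead of reading bar heights from the data. The data frame df and its values are hypothetical. layer_data() is the ggplot2 function that returns the data computed for a plot layer.

```r
library(ggplot2)

# Hypothetical data: one precomputed value per marital status
df <- data.frame(
  status = c("Married", "Widowed", "Divorced"),
  value  = c(40, 25, 35)
)

# geom_col(): bar heights come straight from the y aesthetic
p_col <- ggplot(df, aes(x = status, y = value)) +
  geom_col()

# geom_bar(): bar heights are the number of rows per x value (stat = "count")
p_bar <- ggplot(df, aes(x = status)) +
  geom_bar()

# Heights ggplot2 actually computed for the first layer of each plot
col_heights <- sort(layer_data(p_col)$y)  # 25, 35, 40: the values in df
bar_heights <- layer_data(p_bar)$y        # 1, 1, 1: one row per status
```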
Background
ggplot2 is built on the Grammar of Graphics, a framework for describing plots declaratively. Instead of issuing drawing commands step by step, we state which variables map to which visual properties (x position, bar height, fill colour, and so on) and which layers to draw, and ggplot2 works out how to render them.
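A minimal sketch of this declarative style, using a hypothetical two-by-two data frame: we only declare that sex maps to the fill colour, and ggplot2’s default fill scale chooses the actual colours.

```r
library(ggplot2)

# Hypothetical values: two marital statuses x two sexes
df <- data.frame(
  status = c("Married", "Married", "Widowed", "Widowed"),
  sex    = c("Männlich", "Weiblich", "Männlich", "Weiblich"),
  value  = c(30, 28, 12, 18)
)

# Declare the mappings: status -> x position, value -> bar height, sex -> fill
base <- ggplot(df, aes(x = status, y = value, fill = sex))

# Add a layer: draw each row as a bar
p <- base + geom_col()

# The concrete colours were picked by ggplot2, not by us
fills <- unique(layer_data(p)$fill)
length(fills)  # 2: one colour per sex
```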
Setting Up Our Data
Let’s first create some sample data for this example. We’ll assume a dataset A2c with four variables: maritalStatus; Geschlecht (German for “sex”), with the values Männlich (“male”) and Weiblich (“female”); nbr_total, the number of people in each group; and Adipös (German for “obese”), the number of obese people in each group.
```r
library(tidyverse)

# Create sample data: one row per marital status and sex
set.seed(1)  # make the random counts reproducible (any seed works)
A2c <- data.frame(
  maritalStatus = c("Married", "Married", "Widowed", "Widowed", "Divorced", "Divorced", "Separated", "Separated", "Never married", "Never married"),
  Geschlecht = c("Männlich", "Weiblich", "Männlich", "Weiblich", "Männlich", "Weiblich", "Männlich", "Weiblich", "Männlich", "Weiblich"),
  nbr_total = sample(600:1500, 10),  # people per group
  Adipös = sample(1:600, 10)         # obese people per group, never more than nbr_total
)
```
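Before aggregating, it’s worth checking what one row of the data represents. This sketch runs dplyr’s count() on a reduced, hypothetical copy of the data (A2c_small, with made-up counts). Each maritalStatus and Geschlecht combination appears exactly once, so the data is already aggregated. As a result, a summarise() that counts rows with n() returns 1 for every group instead of the group size.

```r
library(dplyr)

# Hypothetical reduced copy of A2c: counts already aggregated per group
A2c_small <- data.frame(
  maritalStatus = c("Married", "Married", "Widowed", "Widowed"),
  Geschlecht    = c("Männlich", "Weiblich", "Männlich", "Weiblich"),
  nbr_total     = c(900, 1100, 700, 650),
  Adipös        = c(200, 250, 150, 120)
)

# Number of rows per maritalStatus x Geschlecht combination
rows_per_group <- A2c_small %>%
  count(maritalStatus, Geschlecht)

# Counting rows inside summarise() gives 1 per group, not the number of people
via_n <- A2c_small %>%
  group_by(maritalStatus, Geschlecht) %>%
  summarise(nbr_total = n(), .groups = "drop")
```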
The Problem
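By default, geom_col() stacks bars that share the same x position (its default is position = "stack"). Mapping fill = Geschlecht therefore produces a single bar per marital status, with the values for men and women piled on top of each other. What we want instead is one bar for men and one for women, side by side, for each marital status.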
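The default stacking is easy to confirm. In this minimal sketch with hypothetical values, geom_col() is called without a position argument. The top of the tallest bar equals the sum of the two Married values (30 + 28), and some bars start above zero because they sit on top of other bars.

```r
library(ggplot2)

# Hypothetical values: two marital statuses x two sexes
df <- data.frame(
  status = c("Married", "Married", "Widowed", "Widowed"),
  sex    = c("Männlich", "Weiblich", "Männlich", "Weiblich"),
  value  = c(30, 28, 12, 18)
)

# No position argument, so geom_col() uses position = "stack"
p_stacked <- ggplot(df, aes(x = status, y = value, fill = sex)) +
  geom_col()

ld <- layer_data(p_stacked)
tallest <- max(ld$ymax)  # 58 = 30 + 28: the Married bars are stacked
```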
Solution
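To solve this problem, we can use position_dodge() from ggplot2. When passed as the position argument of geom_col(), it places bars that share an x position side by side instead of stacking them, and its width argument controls how widely each group of bars is spread.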
```r
# Drop respondents who refused to state their marital status
# (the sample data has no "Refused" rows, but real survey data might),
# then aggregate per marital status and sex and compute the share of obese people.
# The sample data already holds counts, so we sum them instead of counting rows with n().
A2c <- A2c %>%
  filter(maritalStatus != "Refused") %>%
  group_by(maritalStatus, Geschlecht) %>%
  summarise(
    nbr_total = sum(nbr_total),
    nbr_adipos = sum(Adipös),
    .groups = "drop"
  ) %>%
  mutate(adipos_prozent = 100 * nbr_adipos / nbr_total)

# Plot one bar per sex, dodged side by side within each marital status
# ("% Adipöser" means "% obese")
ggplot(A2c, aes(x = maritalStatus, y = adipos_prozent, color = Geschlecht, fill = Geschlecht)) +
  geom_col(position = position_dodge(1)) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  labs(y = "% Adipöser", x = "Marital Status")
```
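Counting rows with n() and adding up Adipös (nbr_total = n(), nbr_adipos = sum(Adipös)) is the right approach when each row is one person and Adipös is coded 0/1. This is an assumption about how such survey data is often stored; the sample A2c is already aggregated. A minimal sketch with a hypothetical people data frame:

```r
library(dplyr)

# Hypothetical individual-level data: one row per person, Adipös coded 0/1
people <- data.frame(
  maritalStatus = c("Married", "Married", "Married", "Widowed", "Widowed"),
  Geschlecht    = c("Männlich", "Männlich", "Weiblich", "Weiblich", "Weiblich"),
  Adipös        = c(1, 0, 1, 0, 1)
)

individual <- people %>%
  group_by(maritalStatus, Geschlecht) %>%
  summarise(
    nbr_total  = n(),          # number of people in the group
    nbr_adipos = sum(Adipös),  # number of obese people in the group
    .groups = "drop"
  ) %>%
  mutate(adipos_prozent = 100 * nbr_adipos / nbr_total)

individual$adipos_prozent  # 50 (Married men), 100 (Married women), 50 (Widowed women)
```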
How It Works
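Setting position = position_dodge() in geom_col() tells ggplot2 to place bars that share an x position next to each other instead of stacking them. The argument, 1 in this case, is the dodge width: the horizontal space allotted to the whole group of bars at each marital status. geom_col() draws bars 0.9 units wide by default, so a dodge width of 0.9 makes the two bars touch, a larger value such as 1 leaves a small gap between them, and a smaller value makes them overlap.
Because the same dodge width is applied at every marital status, each pair of bars is laid out identically across the whole plot.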
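The effect of the dodge width can be measured directly. This sketch, again with hypothetical values, builds the same dodged plot with three widths and uses layer_data() to compute the horizontal gap between the Married bars for men and women. dodge_gap() is a hypothetical helper written only for this illustration.

```r
library(ggplot2)

# Hypothetical values: two marital statuses x two sexes
df <- data.frame(
  status = c("Married", "Married", "Widowed", "Widowed"),
  sex    = c("Männlich", "Weiblich", "Männlich", "Weiblich"),
  value  = c(30, 28, 12, 18)
)

# Hypothetical helper: gap between the first two dodged bars (the Married pair).
# Positive = gap, zero = touching, negative = overlap.
dodge_gap <- function(width) {
  p <- ggplot(df, aes(x = status, y = value, fill = sex)) +
    geom_col(position = position_dodge(width = width))
  ld <- layer_data(p)
  ld <- ld[order(ld$xmin), ]
  # unclass() drops ggplot2's internal position class, if present
  unclass(ld$xmin[2] - ld$xmax[1])
}

gaps <- sapply(c(0.5, 0.9, 1), dodge_gap)
```

With geom_col()’s default bar width of 0.9, the gaps come out at roughly -0.2 (overlap), 0 (touching) and 0.05 (a small gap) for dodge widths 0.5, 0.9 and 1.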
Output
Running the code produces a plot with one bar for men and one for women next to each other for each marital status. Because the dodge width (1) is slightly larger than the default bar width (0.9), a small gap separates the two bars in each pair.
Here is the complete code:
```r
library(tidyverse)

# Sample data: one row per marital status and sex
set.seed(1)  # make the random counts reproducible (any seed works)
A2c <- data.frame(
  maritalStatus = c("Married", "Married", "Widowed", "Widowed", "Divorced", "Divorced", "Separated", "Separated", "Never married", "Never married"),
  Geschlecht = c("Männlich", "Weiblich", "Männlich", "Weiblich", "Männlich", "Weiblich", "Männlich", "Weiblich", "Männlich", "Weiblich"),
  nbr_total = sample(600:1500, 10),  # people per group
  Adipös = sample(1:600, 10)         # obese people per group, never more than nbr_total
)

# Drop "Refused" answers, aggregate per group, compute the obesity percentage
A2c <- A2c %>%
  filter(maritalStatus != "Refused") %>%
  group_by(maritalStatus, Geschlecht) %>%
  summarise(
    nbr_total = sum(nbr_total),
    nbr_adipos = sum(Adipös),
    .groups = "drop"
  ) %>%
  mutate(adipos_prozent = 100 * nbr_adipos / nbr_total)

# Dodged bars: men and women side by side for each marital status
ggplot(A2c, aes(x = maritalStatus, y = adipos_prozent, color = Geschlecht, fill = Geschlecht)) +
  geom_col(position = position_dodge(1)) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  labs(y = "% Adipöser", x = "Marital Status")
```
Last modified on 2023-09-21