Bhaskar Karambelkar's Blog

Redoing some Bad Data Viz.

 

Tags: R-stats DataViz


I saw the above graph in my Twitter feed. This beauty comes from Business Insider and was part of this article describing the misery in the world. It gets a lot of visualization choices wrong, so let’s see what they are and whether we can fix them.

  • Stacked bar chart. Stacked bars are not useful when you need to compare a category whose segments don’t line up on a common axis. Here you can’t really compare inflation across countries, because the inflation segments don’t share a common baseline. Second, the two categories, unemployment and inflation, clearly have different ranges, so a shared scale of -5 to 25 is not ideal.
  • Vertical labels. Unless your head is attached to your neck at 270 degrees, vertical labels are very hard to read.
  • Confusing legend. Both values are percentages, but only the Unemployment label carries the % sign. Also notice the stray space between “Unemploy” and “ment”.
  • No order to the x-axis labels. The countries are sorted neither alphabetically nor by their inflation or unemployment values.
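To make the baseline problem concrete, here is a minimal base-R sketch that computes where each segment of a two-part stacked bar starts and ends. It assumes inflation is stacked on top of unemployment, and the two countries and their values are invented purely for illustration; they are not the figures from the chart.

```r
# Invented values for two hypothetical countries (not from the chart).
df <- data.frame(
  country      = c("A", "B"),
  unemployment = c(5, 20),
  inflation    = c(3, 3)
)

# In a stacked bar the bottom segment (assumed: unemployment) starts at 0,
# while the top segment (assumed: inflation) starts where the bottom ends.
df$infl_start <- df$unemployment
df$infl_end   <- df$unemployment + df$inflation

print(df[, c("country", "infl_start", "infl_end")])
```

Both hypothetical countries have 3% inflation, yet the inflation segment spans 5 to 8 for A and 20 to 23 for B, so the reader has to subtract each starting point in their head before comparing lengths.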

So how do we improve on this? The Business Insider article cites Société Générale’s Global Economic Outlook report as the source of the chart, but I couldn’t find that report, or the data behind the chart, anywhere on their website. The CIA World Factbook, however, publishes various indicators for each country, two of which are Inflation rate (consumer prices) and Unemployment rate. These are good enough for demonstration purposes, even if the Factbook values don’t exactly match those in the chart. Both indicator pages have a link to download the raw data in tab-separated (TSV) format.
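Reading one of these files is straightforward. The code at the end of this post treats column 2 as the country name and column 3 as the indicator value, with no header row; the base-R sketch below makes the same assumption about the layout (column 1, presumably a rank, is ignored) and uses a small invented file in place of the real download.

```r
# Write a tiny file in the assumed layout: rank, country, value, no header.
# All values here are invented; the real files come from the Factbook.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1\tCountry A\t12.5",
             "2\tCountry B\t7.25",
             "3\tKorea, South\t3.0"), tmp)

# Base-R counterpart of read_tsv(..., col_names = FALSE): tab-separated,
# no header row, so the columns are named V1, V2, V3. Because the
# separator is a tab, the comma in "Korea, South" is harmless.
raw <- read.delim(tmp, header = FALSE, stringsAsFactors = FALSE)

# Keep only the country name (column 2) and the value (column 3).
inflation <- data.frame(Country = raw$V2, inflation = as.numeric(raw$V3))
str(inflation)
```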

After downloading the raw data and doing a bit of data wrangling in R, here is the result.

Notes

  • Instead of a stacked bar chart, there are now two side-by-side charts, which makes it easy to compare inflation and unemployment across countries. Each chart also gets its own scale on the value (x) axis, so the bars make better use of the available width.
  • Instead of making you rotate your head, the chart itself is rotated so you can easily read each country label.
  • The x-axis labels are now consistent, and both indicate that the values are percentages.
  • Countries on the y-axis are sorted alphabetically instead of appearing in no particular order.
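Alphabetical order fixes the lack of ordering, but the critique also mentioned sorting by value. Here is a minimal base-R sketch of both orderings, using invented values for three placeholder countries. It relies on the fact that `coord_flip()` draws the first factor level at the bottom.

```r
# Invented inflation values for placeholder countries.
df <- data.frame(
  Country   = c("Country B", "Country A", "Country C"),
  inflation = c(1.5, 6.3, 4.4)
)

# Alphabetical order, reversed so that after coord_flip() the
# alphabetically first country is drawn at the top.
alpha_levels <- rev(sort(unique(df$Country)))

# Order by value instead: reorder() sorts the levels by the mean of the
# second argument, ascending, so the largest value becomes the last
# level and ends up at the top after coord_flip().
df$Country_by_value <- reorder(factor(df$Country), df$inflation)

alpha_levels
levels(df$Country_by_value)
```

Which order works better depends on the question: alphabetical order makes a given country easy to find, while value order makes the ranking visible at a glance.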

For those interested, the R code that produced the graph is shown below.

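library(dplyr)
library(readr)
library(gridExtra)
library(ggplot2)

# Raw "rank order" files downloaded from the CIA World Factbook:
# rawdata_2092.txt holds inflation, rawdata_2129.txt holds unemployment.
# The files have no header row, so readr names the columns X1, X2, ...;
# X2 is the country name and X3 the indicator value.
setwd('/Users/XYZ/Documents/cia-factbook/rankorder')
inf <- read_tsv('rawdata_2092.txt', col_names = FALSE)
uemp <- read_tsv('rawdata_2129.txt', col_names = FALSE)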

countries <- c('Switzerland', 'Taiwan', 'Japan', 'Korea, South', 'United States',
               'Czech Republic', 'United Kingdom', 'Poland', 'China', 'Germany',
               'Netherlands', 'Mexico', 'Australia', 'France', 'Chile',
               'European Union', 'Italy', 'Indonesia', 'Brazil', 'Russia', 'Spain')

# Join the two tables on country name. The shared value column X3 gets
# suffixed: X3.x comes from `inf` (inflation), X3.y from `uemp`.
df <- inner_join(inf, uemp, by = 'X2') %>%
  select(Country = X2, inflation = X3.x, unemployment = X3.y) %>%
  mutate(inflation = as.numeric(inflation),
         unemployment = as.numeric(unemployment)) %>%
  filter(Country %in% countries) %>%
  # coord_flip() draws the first factor level at the bottom, so reverse
  # the alphabetical order to put the first country at the top.
  mutate(Country = factor(Country, levels = rev(sort(unique(Country)))))

# Minimal theme: no y-axis ticks, panel border, or grid lines.
mytheme <- theme_bw() +
  theme(axis.ticks.y = element_blank(),
        panel.border = element_blank(),
        panel.grid = element_blank())

gInf <- ggplot(df, aes(Country, inflation)) +
  geom_col(fill = '#C29365') + coord_flip() +
  labs(x = NULL, y = 'Inflation (%)') + mytheme

gUemp <- ggplot(df, aes(Country, unemployment)) +
  geom_col(fill = '#65ADC2') + coord_flip() +
  labs(x = NULL, y = 'Unemployment (%)') + mytheme +
  # Country names are already shown on the inflation chart.
  theme(axis.text.y = element_blank())

# Place the two charts side by side.
grid.arrange(gInf, gUemp, ncol = 2)
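An alternative to building two plots and combining them with `grid.arrange()` is to reshape the data to long form and let ggplot2 facet it. This is a sketch, not the code behind the chart in this post: it assumes tidyr 1.0 or later (for `pivot_longer()`) and ggplot2 3.3.0 or later, which draws horizontal bars directly from `aes(x = value, y = Country)`. That matters because ggplot2 does not support free facet scales together with `coord_flip()`. The data frame holds invented values for placeholder countries.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Invented values for placeholder countries (not Factbook data).
df <- data.frame(
  Country      = c("Country A", "Country B", "Country C"),
  inflation    = c(2, 6, 4),
  unemployment = c(8, 5, 20)
)

long <- df %>%
  # One row per (country, indicator) pair.
  pivot_longer(c(inflation, unemployment),
               names_to = "indicator", values_to = "value") %>%
  # Reverse alphabetical levels so the first country is drawn at the top.
  mutate(Country = factor(Country, levels = rev(sort(unique(Country)))))

# Horizontal bars without coord_flip(): Country on y, value on x.
# scales = "free_x" gives each panel its own value axis.
p <- ggplot(long, aes(x = value, y = Country, fill = indicator)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ indicator, scales = "free_x",
             labeller = as_labeller(c(inflation = "Inflation (%)",
                                      unemployment = "Unemployment (%)"))) +
  scale_fill_manual(values = c(inflation = "#C29365",
                               unemployment = "#65ADC2")) +
  labs(x = NULL, y = NULL) +
  theme_bw()
print(p)
```

Because the country axis is fixed across panels, `facet_wrap()` draws the country labels only once, on the left panel, which takes the place of the `axis.text.y = element_blank()` trick.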