Replication of below article’s Data
and Visualizations
“We keep pumping money into the NHS.
Is it good value?
By Tom Calver”
Karim K. Kardous

Health Service and General Prelude

About the data/article in a nutshell: Tom uses a combination of sources to make his point → while not a bottom ‘performer’; especially compared to the US, UK’s National Health Service (NHS) did not wait for the COVID crisis to show its limits and harbor increasing consumer dissatisfaction.
That downward trend can be traced back to the early 2010’s; and that comes from the service running lean - too lean - but also inefficiently,
For more detail on the sources, feel free to visit the actual article with link embedded at the top of the page in the author’s name.
If you like to download the data, feel free to use urls here.

Overall Strategy for building first plot: The data for first plot is smoothed out/interpolated data; this is usually done to ‘well smooth’ the data and delineate clearer trends over time; in this case select European countries’ survey takers’ scores on their overall satisfaction level towards their respective countries’ public health services (ESS - European Social Survey).
In this case, there is an additional reason the author smoothed the data; it is to ‘complete’ the years given that the ESS is done once every two years; note that I am assuming here but it’s not an outrageous assumption to make. So in order to match that smoothing, I go with geom_smooth() from ggplot2 and keep span at default; after a few iterations; the data points (for non survey years) match highly to what Tom displays in the first graph.
Finally, since the graph is interactive, I use ggiraph package to emulate said interactivity; a JS based R package that lets you add tooltips/hover/highlight upon hover/downplay non-hovered, etc. all the usual things one expects from an interactive plot; without having to build a Shiny app; which for this exercise/first plot (and the rest); would be like building a Gatling gun to aim at an ant.

Steps taken:
While I try to be as detailed in my comments as possible; it’s still helpful to lay out the step by step process as a numbered list to get the overall chain of what the code is supposed to do on a high level - without having to go into the nitty gritty- the comments in the code chunks and in that regard should hopefully help:

Found the source of the data from HTML Source Page; clicked on Network tab after hovering on the plot panel; refreshed the page; and found ‘dataset’ under ‘datawrapper’.
The data was wide in structure (from raw csv):
17 columns (1 column for year, 8 hex-coded columns with imputed/smoothed values, and 8 columns for country abbreviations with survey data for even years, NULL otherwise).
Discovered the hex columns and country columns didn’t align in a straightforward way. Columns were randomly ordered within each set, requiring a ranking approach rather than pairwise matching (one hex column to the symmetrical position of the country labeled column).
Implemented a solution by sorting satisfaction scores per year, which helped group values by country through proximity of their scores. This approach works well since the values are interpolated through smoothing, making them very close to one another from row to row. This might not work in other cases, but it does here.
Combined two sorted datasets: year + hex columns, and year + country columns to create a properly aligned mapping, joined facts data (with scores/values) on newly created mapped long datasets (converted from wide- almost always much harder to work with) to then finalize the dataset for visualization. More detail on ‘finalized the dataset’ can be found in the comments of the actual code.
Prepared the final clean dataset for interactive plotting with ggiraph.

Health Service Satisfaction

Show the code

#|echo: false
#|message: false
#|warning: false
#|include: false
# check if the required package 'emo' is installed;
# if not, it might mean your renv environment is not fully restored.
# running `renv::restore()` will install all necessary packages
# to ensure consistent package versions for building this quarto document,
# effectively 'containerizing' your project and protecting it from future package changes.
if (!requireNamespace("emo", quietly = TRUE)) {
  message("\nIt looks like your environment might not be restored.\nRun `renv::restore()` to install required packages.\n")
}

# load packages
library(xml2)
library(downlit)
library(gdtools)
library(tidyverse)
library(quarto)
library(chromote)
library(here)
library(tidycensus)
library(janitor)
library(purrr)
library(ggtext)
library(ggshadow)
library(ggiraph)
library(gfonts)
library(showtext)
library(ggborderline)
library(patchwork)
library(shiny)
library(gt)
library(rsvg)
library(magick)
library(stringr)
library(ggimage)
library(emo)


theme_set_custom <- function() {
  
  # load google fonts 
  sysfonts::font_add_google("Roboto", "Roboto")
  sysfonts::font_add_google("Roboto Condensed", "Roboto Condensed")
  showtext::showtext_auto()

  # apply ggplot2 theme; can always be overwritten
  ggplot2::theme_set(
    ggplot2::theme_minimal() +
      ggplot2::theme(
        text = element_text(family = "Roboto", size = 11),
        plot.title = element_text(family = "Roboto", face = "bold", size = 16),
        plot.subtitle = element_text(family = "Roboto", size = 12),
        axis.text = element_text(family = "Roboto", size = 10),
        axis.title = element_text(family = "Roboto Condensed", size = 11)
      )
  )
}

theme_set_custom()

While above took care of importing required libraries and setting general options such as plot theme and text font to be used; below is the start of data related tasks; from initial pull, to wrangling, to finally output the visualizations.

Data Pull

Show the code

#|echo: false
#|message: false
#|warning: false
#|include: false
#|eval: false

invisible({
  b <- ChromoteSession$new()
  b$Page$navigate("https://www.thetimes.com/comment/columnists/article/we-keep-pumping-money-into-the-nhs-is-it-good-value-blq8bxc39")
  Sys.sleep(3) # allow some time for dynamic content to render
})

# extract all iframe srcs (joined by || in this case)
iframes_html <- b$Runtime$evaluate("Array.from(document.querySelectorAll('iframe')).map(el => el.src).join('||')")$result$value
# split and filter valid Datawrapper url's
chart_urls <- str_split(iframes_html, "\\|\\|")[[1]] |>
  str_subset("^https://datawrapper\\.dwcdn\\.net/[a-zA-Z0-9]+/\\d+$")

all_data <- purrr::map_dfr(chart_urls, function(url) {  
  message("Navigating to: ", url)
  b$Page$navigate(url)
  Sys.sleep(3)
  
  html <- b$Runtime$evaluate("document.documentElement.outerHTML")$result$value
  
  # match visible chart values if any
  pattern <- 'aria-datavariables="year,\\s*([A-Z]+)".*?aria-datavalues="([0-9]{4}),\\s*([0-9.]+)"'
  matches <- str_match_all(html, pattern)[[1]]
  
  # match dataset.csv url as well 
  csv_pattern <- "https://datawrapper\\.dwcdn\\.net/[a-zA-Z0-9]+/\\d+/dataset\\.csv"
  csv_link <- str_extract(html, csv_pattern)
  if (is.na(csv_link)) {
    csv_link <- str_glue("{url}/dataset.csv")
  }
  
  tibble(
    chart_url = url,
    country = if(nrow(matches)) matches[, 2] else NA,
    year = if(nrow(matches)) as.integer(matches[, 3]) else NA,
    value = if(nrow(matches)) as.numeric(matches[, 4]) else NA,
    dataset_csv = csv_link
  )
})

# add a custom gt boilerplate -from {gt} package (great tables)-to reduce code redundancy (having to copy/paste same chunks of code every # time we turn a tibble into a gt object)
gt_nyt_custom <- function(x, title = '', subtitle = '', first_10_rows_only = TRUE){
  
  x <- x |> clean_names(case = 'title')
  numeric_cols <- x |> select(where(is.double)) |> names()
  integer_cols <- x |> select(where(is.integer)) |> names()
  
  title_fmt <- if(title != "") glue::glue("**{title}**") else ""
  subtitle_fmt <- if(subtitle != "") glue::glue("*{subtitle}*") else ""
  
  x |>
    (\(x) if (first_10_rows_only) slice_head(x, n = 10) else x)() |>
    gt() |> 
    tab_header(
      title = md(title_fmt),
      subtitle = md(subtitle_fmt)
    ) |> 
    tab_style(
      style = list(
        cell_text(color = '#333333')
      ),
      locations = cells_body()
    ) |> 
    tab_style(
      style = list(
        cell_text(color = '#CC6600', weight = 'bold')
      ),
      locations = cells_column_labels(everything())
    ) |> 
    fmt_number(
      columns = c(numeric_cols),
      decimals = 1
    ) |> 
    fmt_number(
      columns = c(integer_cols),
      decimals = 0
    ) |> 
    tab_options(
      table.font.names = c("Merriweather", "Georgia", "serif"),
      table.font.size = 14,
      heading.title.font.size = 18,
      heading.subtitle.font.size = 14,
      column_labels.font.weight = "bold",
      column_labels.background.color = "#eeeeee",
      table.border.top.color = "#dddddd",
      table.border.bottom.color = "#dddddd",
      data_row.padding = px(6),
      row.striping.include_table_body = TRUE,
      row.striping.background_color = "#f9f9f9"
    )
  
}

Metadata

Show the code

#|echo: false
#|message: false
#|warning: false

# reveal dataset urls/csvs
all_data |>
  count(
    url = chart_url, download_link = dataset_csv
  ) |>
  select(-n) |>
  gt_nyt_custom(
    title = 'Dataset Ids'
  ) |>
  cols_label(
    Url = "Plot URL",
    `Download Link` = "Link to CSV"
  ) |> 
  tab_footnote(
    "In the event you download the links yourself and run your own script,
    the third and last should be treated as tsv files, otherwise csv's"
  )

Dataset Ids

Plot URL	Link to CSV
https://datawrapper.dwcdn.net/7NJRB/1	https://datawrapper.dwcdn.net/7NJRB/1/dataset.csv
https://datawrapper.dwcdn.net/Bxhol/4	https://datawrapper.dwcdn.net/Bxhol/4/dataset.csv
https://datawrapper.dwcdn.net/JH3Qn/1	https://datawrapper.dwcdn.net/JH3Qn/1/dataset.csv
https://datawrapper.dwcdn.net/Mc3q2/2	https://datawrapper.dwcdn.net/Mc3q2/2/dataset.csv
https://datawrapper.dwcdn.net/eXQPs/1	https://datawrapper.dwcdn.net/eXQPs/1/dataset.csv
In the event you download the links yourself and run your own script, the third and last should be treated as tsv files, otherwise csv's

Data Sample

Show the code

# reveal data sample for year 2004 as an example
all_data |> 
  filter(year == 2004) |>
  select(2:last_col()) |> 
  gt_nyt_custom() |> 
  tab_header(
    title = md("**Chart Data Summary**"),
    subtitle = md("*Extracted from embedded datawrapper in the HTML*")
  )

Chart Data Summary
Extracted from embedded datawrapper in the HTML
Country	Year	Value	Dataset Csv
DE	2,004	4.7	https://datawrapper.dwcdn.net/7NJRB/1/dataset.csv
PT	2,004	3.5	https://datawrapper.dwcdn.net/7NJRB/1/dataset.csv
IE	2,004	4.1	https://datawrapper.dwcdn.net/7NJRB/1/dataset.csv
NO	2,004	5.7	https://datawrapper.dwcdn.net/7NJRB/1/dataset.csv
FR	2,004	5.8	https://datawrapper.dwcdn.net/7NJRB/1/dataset.csv
ES	2,004	5.8	https://datawrapper.dwcdn.net/7NJRB/1/dataset.csv
GB	2,004	5.4	https://datawrapper.dwcdn.net/7NJRB/1/dataset.csv

Data Wrangling

Show the code

#|echo: false
#|message: false
#|warning: false

# for easier referencing, assign file types (csv where appropriate, tsv otherwise) 
file_info <- tibble(
  path = unique(all_data$dataset_csv),
  name = c("health_service_sat:csv", "value_for_money:csv", 
           "room_to_improve:tsv", "barely_beds:csv", "budget_breakdowns:tsv")
  ) |> 
  separate(
    name, into = c('dataset_name', 'file_type'), sep = "\\:"
  )

# loop thru datasets, read them in, and then assign them to the global environment
invisible({
  file_info |>
    mutate(
      data = pmap(
        list(path, file_type),
        \(path, file_type) if(file_type == "csv") read_csv(path) else read_tsv(path)
      )
    ) |>
    select(dataset_name, data) |>
    deframe() |>
    list2env(envir = .GlobalEnv)
}
)

# set country 'switch'; so that tooltip can reflect full country name (spelled out) accordingly for imputed values (non survey years smoothed values)
country_labels <- c(
  NO = "Norway", DE = "Germany", ES = "Spain", 
  FR = "France", GB = "UK", IE = "Ireland", PT = "Portugal"
  )
country_label_tibble <- c(
    NO = "Norway", DE = "Germany", ES = "Spain",
    FR = "France", GB = "UK", IE = "Ireland", PT = "Portugal"
  ) |> 
  enframe()

# clean the first dataset: health_service_sat, to prep for plotting.
# it has 17 columns: 1 year column, 8 hex-coded columns (imputed/smoothed values), 
# and 8 columns for country abbreviations (survey data, even years).
# initially assumed hex columns align pairwise with country columns; but it was not the case.s
# columns are randomly ordered within each set, so we use ranking instead.
# sorting by satisfaction score per year helps group values by country (via proximity of their scores).
# we then combine two sorted datasets: year + hex columns, and year + country columns
# note that this might not always be the go-to solution but in this case, 
# and given that the values are interpolated (through smoothing), we can safely bet that the values will be very close to one another
# from one row to the next

hex_to_country_mapping <- health_service_sat |> 
  pivot_longer(
    -year
  ) |> 
  slice_max(year) |> 
  filter(
    str_starts(name, '\\#') & !str_detect(name, 'A9FF') # looking at last values from article curves, we can infer this is Italy so 
  ) |> 
  arrange(value) |> 
  bind_cols(
    health_service_sat |> 
      pivot_longer(
        -year
      ) |> 
      slice_max(year) |> 
      filter(
        !str_starts(name, '\\#') & !str_detect(name, 'IT')
      ) |> 
      arrange(value)
  ) |> 
  select(
    years = 1, hex_code = 2, second_to_last_val = 3, 
    years_max = 4, country_abb = 5, last_val = 6
  ) |> 
  mutate(
    val_diff = abs(last_val - second_to_last_val)
  ) |> 
  arrange(val_diff) |> 
  select(
    hex_code, country_abb
  ) |> 
  # also join to country_label_tibble to get full country names for future use
  inner_join(
    country_label_tibble, 
    join_by(country_abb == name)
    )
# now we can map the randomly assigned hex value labels to the actual columns/countries, and create 8 series,
# one for each country
health_service_sat <- health_service_sat |> 
  pivot_longer(
    -year
  ) |> 
  left_join(
    hex_to_country_mapping, 
    join_by(name == hex_code)
  ) |>
  mutate(
    country_abb = coalesce(country_abb, name)
  ) |> 
  inner_join(
    hex_to_country_mapping, 
    join_by(country_abb == country_abb)
  ) |> 
  select(
    year, 
    country_abb,
    country = value,
    value = value.x
  ) |> 
  drop_na() 

extract_smooth_build <- function(tibble, country = 'GB'){
  
  initial_pull <- 
    all_data |> 
    filter(country %in% {{country}}) |> 
    ggplot(aes(x = year, y = value)) + 
    geom_smooth(method = 'loess')
  
  # fetch country abbs for ids, and ranges
  country_ids <- c(na.omit(all_data |> pull(country) |> unique()))
  country_max <- all_data |> filter(country == {{country}}) |> pull(value) |> max()
  country_min <- all_data |> filter(country == {{country}}) |> pull(value) |> min()
  
  # access smoothed, include actual years to imputed/smoothed points, cap at min max per country/series
  # and keep only columns of interest
  smoothed_df <- ggplot_build(initial_pull)[[1]] |> as.data.frame() |> as_tibble()
  
  complete_series <- 
    smoothed_df |> 
    select(year = x, value = y) |> 
    mutate(country := country) |> 
    bind_rows(
      all_data |> 
        filter(country == {{country}}) |> 
        select(year, value) 
    ) |> 
    mutate(
      year = as.integer(year),
      year_val_tie_breaker = if_else(is.na(country), 1, 0)
    ) |> 
    group_by(country, year) |> 
    arrange(desc(year_val_tie_breaker)) |> 
    mutate(ties = row_number()) |> 
    filter(
      if (n() < 4) TRUE else ties + year_val_tie_breaker != 1 # make sure every year/country combo gets same no. of obs
      # and that original values (only in the event a given year is even or survey year) take precedence over smoothed ones
      # otherwise just pass/do nothing
    ) |> 
    ungroup() |> 
    # ensuer smoothed values don't go below/beyond lower/upper bounds
    mutate(
      value = pmin(pmax(value, country_min), country_max)
    ) |> 
    arrange(year) |> 
    fill(country, .direction = 'downup') |>  # since every year starts with 
    select(year, country, value) 
  
  return(complete_series)
  
}
# country vector to loop thru
country_name_abbs <- c(na.omit(all_data |> pull(country) |> unique()))
# combine all series
all_series <- map_dfr(.x = country_name_abbs, ~extract_smooth_build(tibble = all_data, country = .x))

# set contry 'switch; so that tooltip can change accordingly for odd numebred years
country_labels <- c(
  NO = "Norway", DE = "Germany", ES = "Spain",
  FR = "France", GB = "UK", IE = "Ireland", PT = "Portugal"
)

# adding year as continuous variable (decimal years) so that points don't overlap but strech over whithin a year to year span
all_series <- 
  all_series |> 
  mutate(
    rn = row_number(), .by = c(country, year)
  ) |> 
  mutate(
    decimal_year = if_else(rn == 1, year, year + rn / 8)
  ) |> 
  mutate(
    year = decimal_year
  ) |> 
  select(-decimal_year)

# also join on country full name mapping so we can generate a consolidate data_id that links the aestethics together (for interactive simultaneus highlighting, etc.)
all_series <- 
  all_series |> 
  inner_join(
    country_labels |> enframe() |> rename(values = value), 
    join_by(country == name)
  ) |> 
  mutate(
    data_id = str_c(country, values)
  ) |> 
  select(-values) |> 
  mutate(
    country_name = str_sub(data_id, 3, 20)
  )

# also generate visible (and non visible years by exclusion) as they don't visually get the same properties; visible (even numbered years) get the country abb as a tooltip (and larger markers/circles), 
# while 'invisible' ones (odd numbered years along with year 2023) get their country names fully spelled out and get transparent marker/circle fill
visible_years <- c(seq(2002, 2022, 2), 2023)

visible_points <- 
  all_series |> 
  filter(round(year) %in% visible_years & floor(year) == ceiling(year))
invisible_points <- all_series |> 
  filter(!round(year) %in% visible_years & floor(year) != ceiling(year))

# final touchups
# set color mappings
color_map <- expr(
  case_when(
    country %in% c('NO', 'Norway') ~ '#d43b45',
    country %in% c('DE', 'Germany') ~ '#DCA825',
    country %in% c('ES', 'Spain') ~ '#b01622',
    country %in% c('FR', 'France') ~ '#487caa',
    country %in% c('GB', 'UK') ~ '#264250',
    country %in% c('IE', 'Ireland') ~ '#61A861',
    country %in% c('PT', 'Portugal') ~ '#d27e4e',
    TRUE ~ '#000000'
  )
)

# set tooltip mappings
tooltip_map <- expr(
  case_when(
    !year %in% c(seq(2002, 2022, 2), 2023) & country %in% names(country_labels) ~ country_labels[country],
    TRUE ~ country
  )
)

label_data <-
  all_series |>
  group_by(country) |>
  arrange(desc(year)) |> 
  filter(row_number() == 1) |>
  mutate(
    y_offset = case_when(
      country == 'ES' ~ value + .1,
      country == 'FR' ~ value +  0,
      country == 'DE' ~ value - .05,
      country == 'GB' ~ value - .1,
      country == 'PT' ~ value + .2,
      TRUE ~ value)
  ) |> 
  ungroup() |> 
  mutate(
    country_name = case_when(
      country == "DE" ~ "Germany",
      country == "ES" ~ "Spain",
      country == "FR" ~ "France",
      country == "GB" ~ "UK",
      country == "IE" ~ "Ireland",
      country == "NO" ~ "Norway",
      country == "PT" ~ "Portugal",
      TRUE ~ NA_character_
    ),
    country_color = case_when(
      country %in% c("DE", 'Germany') | country_name %in% 'Germany' ~ "#9b6e00",  # override DE/Germany label color here since curve color is different than country label color (only one)
      country %in% c('NO', 'Norway') ~ '#d43b45',
      country %in% c('ES', 'Spain') ~ '#b01622',
      country %in% c('FR', 'France') ~ '#487caa',
      country %in% c('GB', 'UK') ~ '#264250',
      country %in% c('IE', 'Ireland') ~ '#61A861',
      country %in% c('PT', 'Portugal') ~ '#d27e4e',
      TRUE ~ '#000000'
    )
  ) |> 
  inner_join(
    country_labels |> enframe() |> rename(values = value), join_by(country == name)
  ) |> 
  mutate(
    data_id = str_c(country, values),
    country = if_else(country == 'DE', 'Germany', country)
  )

# add caption to match Tom's
caption_text <- "<span style='color:#232323;'>0 = extremely bad, 10 = extremely good</span><br><span style='color:#939291; font-weight: normal;'>Chart: Tom Calver | The Times and The Sunday Times • Source: ESS/K. Kardous</span>"

p <- 
  all_series |> distinct() |> 
  ggplot(
    aes(x = year, 
        y = value, 
        group = data_id,
        color = country)
  ) +
  scale_color_manual(
    values = c(
      'NO' = "#d43b45",
      'DE' = '#DCA825',
      'ES' = '#b01622',
      'FR' = '#487caa',
      'GB' = '#264250',
      'IE' = '#61A861',
      'PT' = '#d27e4e')
  ) +
  scale_y_continuous(
    breaks = seq(0, 7, 1), limits = c(0, 8)
    ) + 
  scale_x_continuous(
    breaks = seq(2002, 2022, 2), 
    limits = c(2002, 2025),
    expand = c(0, 0.1)
  ) + 
  theme(
    legend.position = 'none',
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank()
  ) +
  geom_smooth_interactive(
    data = all_series,
    aes(x = year, y = value, data_id = paste0(country, country_name)),
    method = "loess",
    se = FALSE,
    linewidth = 3.5, # thick line acts as the 'border'
    alpha = 1,
    show.legend = FALSE,
    color = "white"
  ) +
  # colored interactive smooth line
  geom_smooth_interactive(
    data = all_series |> filter(!country %in% 'IE'),
    aes(data_id = paste0(country, country_name)),
    method = "loess", 
    se = FALSE, 
    linewidth = 0.9, 
    fill = NA,
    show.legend = FALSE
  ) +
  geom_smooth_interactive(
    data = all_series |> filter(country %in% 'IE'),
    aes(data_id = paste0(country, country_name)),
    method = "loess",
    se = FALSE, 
    linewidth = 0.9, 
    fill = NA, 
    show.legend = FALSE
  ) +
  scale_y_continuous(
    breaks = seq(0, 7, 1), limits = c(0, 8)
    ) + 
  scale_x_continuous(
    breaks = seq(2002, 2022, 2), 
    limits = c(2002, 2025),
    expand = c(0, 0.1)
  ) +
  labs(
    x = NULL,
    y = NULL,
    caption = caption_text
  ) +
  # final touchoups before interactive rendering thru girafe()
  theme(
    panel.spacing = unit(20, 'cm'),
    plot.margin = margin(l = 5, b = 10), # leave some space/margin at the bottom for caption 'room to breathe'
    axis.text = element_text(face = "bold"), # axis tick labels
    strip.text = element_text(face = "bold"), # facet labels
    panel.grid.major.x = element_blank(),
    axis.text.x = element_text(margin = margin(b = 10, t = -10)),
    panel.grid.major.y = element_line(color = "gray90"),
    axis.ticks.x = element_blank(),
    plot.caption = element_markdown(
      family = "Roboto",
      face = 'bold'
      ) 
  ) 

p_interactive <- p +
  geom_point_interactive(
    data = visible_points,
    aes(
      x = year,
      y = value, 
      color = country,
      data_id = paste0(country, country_name)
    ),
    alpha = 0.1, fill = 'white', show.legend = FALSE
  ) +
  geom_point_interactive(
    data = 
      all_series |> 
      mutate(
        point_size = if_else(country %in% c('NO', 'Norway', 'PT', 'Portugal'), 3, 1.5),
        point_stroke = point_size
      ),
    aes(
      x = year, 
      y = value,
      data_id = paste0(country, country_name),
      tooltip = paste0(
        "<div style='text-align:", 
        if_else(year <= 2015.250, "left", "right"), 
        "; line-height: 1.1;'>", # tightens spacing
        "<div style='font-weight:bold; font-size:16px; color:",
        if_else(country_name == "Germany", "#9b6e00", eval(color_map)), 
        ";'>", 
        eval(tooltip_map), 
        "</div>",
        "<div style='font-size:16px;'>", round(year, 0), "</div>",
        "<div style='font-size:16px;'>", round(value, 2), "</div>",
        "</div>"
      )
    ),
    color = 'white',
    fill = 'white', 
    shape = 21, 
    alpha = 0
  ) +
  geom_rect(
    inherit.aes = FALSE,
    aes(xmin = 2024, xmax = Inf, ymin = -Inf, ymax = Inf),
    color = NA, 
    fill = "white"
  ) +
  scale_color_manual(
    breaks = c("GB", "FR", "IE", "PT", "ES", "NO", 'DE', 'Germany'), # this makes sure Germany as a country label gets dark yellow while 'DE' as a curve gets a darker yellow color
    values = c("#264250", "#487caa", "#61A861", "#d27e4e", "#b01622", "#d43b45", '#DCA825', '#9b6e00') # this makes sure Germany as a country label gets dark yellow while 'DE' as a curve gets a darker yellow color
  ) +
  # scale_color_identity() +  # correctly apply the country color to the label's font
  # coord_cartesian(xlim = c(2002, 2024.5)) +
  theme(
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank()
  ) +
  # add persistent white circle that follows mouse (via selection); one to cover all data points is simply assigning data to country.year combo
  geom_point_interactive(
    data = all_series,
    aes(
      x = year,
      y = value,
      group = paste0(year, country_name)
    ),
    shape = 21,
    size = 0.4,
    stroke = 1,
    fill = 'white',
    color = "grey85",
    alpha = 0,
    show.legend = FALSE
  )

Data Visualization

Show the code

#|echo: false
#|message: false
#|warning: false

p_ggraph_ready <- 
  p_interactive + 
  theme_minimal() +  
  theme(
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank(),
    legend.position = 'none'
  ) + 
  labs(
    caption = caption_text
  ) + 
  theme(
    plot.caption = element_markdown(
      lineheight = 1.2,
      hjust = -0.01,
      margin = margin(t = 10, l = -7, r = 1),
      halign = 0
    ),
    axis.text.x = element_text(margin = margin(t = -7, l = 3, b = 7, r = -3))  
  ) +
  geom_segment(
    aes(x = 2002, xend = 2023, y = 0, yend = 0), 
    color = 'black', 
    linewidth = .1
  ) + 
    geom_label_interactive(
    data = 
      all_series |> 
      bind_rows(
        all_series |>
          slice_max(year) |>
          arrange(desc(value)) |> 
          mutate(
            country_name = str_sub(data_id, 3, 20),
            year = 2024,
            value = c(c(7, 5.8), c(5.7, 5.4, 5.1, 4.8, 4.5) - .3)
            )
        )
     |> 
      slice_max(year) |> 
      mutate(year = 2023.4),
    aes(
      x = year,
      y = value,
      group = paste0(country, country_name),
      label =  c("Norway", "Spain", "France", "Germany", "UK", "Ireland", "Portugal"),
      data_id = paste0(country, country_name)
    ),
    label.size = NA,
    fill = NA,
    color = c("#d43b45", "#b01622", "#487caa", "#9b6e00", "#264250", "#61A861", "#d27e4e"),
    size = 3.3,
    hjust = 0,
    vjust = -.2,
    # fontface = 'bold',
    inherit.aes = FALSE,
    alpha = 1
  ) 

girafe(
    ggobj = p_ggraph_ready,
    options = list(
        opts_tooltip(
        css = "
        background: transparent;
        border: none;
        box-shadow: none;
        font-family: sans-serif;
        text-shadow:
        0 0 4px rgba(234, 255, 255, 1),
        0 0 4px rgba(234, 255, 255, 1),
        0 0 4px rgba(255, 255, 255, 1);
        line-shadow:
        0 0 4px rgba(234, 255, 255, 1),
        0 0 4px rgba(234, 255, 255, 1),
        0 0 4px rgba(255, 255, 255, 1);
        border-radius: none;
        transform: translate(-50%, 20px);
        transition: all 0.3s ease-in-out;",
        delay_mouseover = 300,
        delay_mouseout = 200
        ),
        opts_hover(
          css = "stroke-width: 3; stroke-opacity: 0.9; fill-opacity: 0.9; opacity: 1; transition: all 0.3s ease-in-out;",
          nearest_distance = 10,
          reactive = FALSE
        ),
        opts_hover_inv(
          css = "stroke-width: .2; stroke-opacity: 0.2; fill-opacity: 0.2; opacity: 0.4;transition: all 0.3s ease-in-out;"
        )
    )
)

Notes on Above Plot: This graph looks deceptively simple at first glance; but was in fact by far the hardest to replicate out of all five.

Hollow Circles/Markers: I’m not sure what software Tom uses to render the interactive plots, but it was much harder than expected to replicate the persistent hollow circle marker that moves along each curve and ‘links’ the tooltip to the marker using a small vertical tick. It wasn’t for a lack of trying but I believe current ggiraph framework (I might be wrong) doesn’t natively support said functionality.

White Glow around Curve Borders: Another feature I wanted to implement/match with Tom’s original is the subtle white glow around the curves. Two functions (at least) derived from packages allow for this; geom_glowline() & geom_borderline() from the ggshadow & ggborderline packages, respectively; both of which ‘get disabled’ when being called in a ggiraph framework.

Hovering on a Curve vs. a Point: Upon hovering on a curve, original attenuates the points that make up the curve to a maximum, to only display the curve (devoid of any points shown); mine, while making other curves go far in the background, doesn’t completely eliminate the points that make up the said hovered curve because both points and curve roll up to the same data_id; an added layer from the ggplot2 extension in ggiraph which links in this case said points to the curve (and country labels together).
There might be a way for that uncoupling (between points and their curves) to happen after the fact, after the hover; but after trying arduously, I did not find it. Feel free to reach out or do a pull request to suggest an improvement (here or otherwise).
Having said all that, I believe the rest remains faithful to the original throughout.

Value for Money

Data Sample

Below is a sample of data showing the relationship between healthcare spending and life expectancy across seven countries—five of them European, with the US and Japan representing the West and East, respectively.

The data excludes spending from 2020 to 2022 (but includes 2023), as the author believes COVID-related expenditures would distort the results. This seems reasonable, although one could argue that the distortion would affect all countries similarly. Still, since changes in life expectancy tend to progress at a snail’s pace, including a short-term global health crisis like the pandemic would not make sense.

Show the code

#|echo: false
#|message: false
#|warning: false
#|include: false

value_for_money |> 
  drop_na(country) |>
  filter(
    !country %in% c('Norway', 'Australia')
    ) |>
  arrange(year) |>
  slice_max(year) |> 
  gt_nyt_custom(
    title = 'Data Sample of Value for Money Dataset',
    subtitle = 'Showing in this case only most recent year (2023)'
    )

Data Sample of Value for Money Dataset
Showing in this case only most recent year (2023)
Country	Year	Spend	Le	Label	Size
France	2,023.0	5,014.5	83.3	France	576.0
Germany	2,023.0	5,971.3	81.4	Germany	576.0
UK	2,023.0	4,443.7	81.3	UK	576.0
Italy	2,023.0	3,248.7	83.7	Italy	576.0
Canada	2,023.0	5,306.8	82.6	Canada	576.0
Japan	2,023.0	4,873.6	84.7	Japan	576.0
US	2,023.0	10,827.5	79.3	US	576.0

Data Visualization

Notes on the plot: Overall, I really liked this plot, and its replication. If I am being nitpicky, I am not a fan of the US values going outside of the plot panel (for older years; first three years); but that’s the only negative thing I have to say honestly about this plot.
So other than that, I think this is a powerful visualization that has the merit, among others, to clearly display essentially how well/intelligently does a country allocate funds do health as a service.
One thing I couldn’t quite replicate is the idea is the ‘go to back’ country labels of non-hovered elements, which I can’t really explain since the data_id is not shared from country to country and each country is unique; so while values of non hovered countries fade maximally into the white background (so this part works); country labels fail to do so; this might be a bug from the package and I might open an issue through github to shed light on said bug.
I also like the red highlighting action that red fills “UK”in the title with the white color font in bold. It provides instantly where the country of focus should/will be. In R, one way to apply that customization is by using element_markdown() function from the ggtext package - which can read in css as argument, in this case to wrap the word “UK” in css to indicate both color, fill, and font face.
Overall, I believe below plot remains highly faithful to the original.

Show the code

#|echo: false
#|message: false
#|warning: false
#|include: false

theme_set_custom()

data <- 
  value_for_money |> 
  drop_na(country) |> 
  filter(
    !country %in% c('Norway', 'Australia')
  ) |> 
  mutate(
    last_year = year == 2023,
    country_tooltip = paste(country, year, sep = ', ')
  ) |>
  arrange(country, year) |>
  mutate(
    country_fill = case_when(
      str_detect(country_tooltip, "US") ~ "US",
      str_detect(country_tooltip, "France") ~ "France",
      str_detect(country_tooltip, "Italy") ~ "Italy",
      str_detect(country_tooltip, "Germany") ~ "Germany",
      str_detect(country_tooltip, "Canada") ~ "Canada",
      str_detect(country_tooltip, "Japan") ~ "Japan",
      str_detect(country_tooltip, "UK") ~ "UK",
      TRUE ~ country_tooltip
    ),
    country = factor(country, levels = c("Canada", "France", "Germany", "Japan", "Italy", "UK", "US"))
  )

# adding a tibble for hjsut/vjust values as country label position is not uniform across all countries
hjust_vjust_tibble <- 
  data |>  
  slice_max(year) |> 
  distinct(country_fill, .keep_all = TRUE) |> 
  mutate(
    hjust_vals = if_else(country %in% c('Italy', 'US'), .5, -.25),
    vjust_vals = if_else(country %in% c('Italy', 'US'), -1.5, 0.5),
    hjust_vals = if_else(country %in% 'UK', -.7, hjust_vals)
  )

p2 <- data |>
  ggplot(
    aes(x = spend, y = le, color = last_year, fill = country_fill)
  ) +
  geom_point_interactive(
    aes(size = size, data_id = country_fill, tooltip = country_tooltip),
    shape = 21, alpha = 1
  ) +
  geom_text_interactive(
    data = data |> 
      slice_max(year) |> 
      distinct(country_fill, .keep_all = TRUE),
    aes(
      label = country_fill,
      tooltip = country_tooltip
    ),
    hjust = hjust_vjust_tibble$hjust_vals, 
    vjust = hjust_vjust_tibble$vjust_vals, 
    alpha = 1, 
    color = '#7B7B7B', 
    fontface = 'bold'
  ) +
  scale_fill_manual( 
    breaks = c("US", "France", "Italy", "Germany", "Canada", "Japan", "UK"),
    values = c("#4076A4", "#80B1E2", "#61A961", "#F5C55E", "#FFAEA9", "#DACFC0", "#E94F55")
  ) +
  scale_color_manual(
    breaks = c(FALSE, TRUE),
    values = c('white', 'black')
  ) +
  labs(
    title = '**Value for money**',
    subtitle = 
      "How life expectancy and per-capita healthcare spend have changed since 2000.<br>
     <span style='background-color:#e94f55; color:white; padding:2px 4px; font-weight:normal;'>**UK**</span> spending is rising, but life
     expectancy has stalled"
  ) +
  labs(x = NULL, y = NULL) +
  scale_x_continuous(
    breaks = seq(3000, 11000, 1000),
    labels = c(format(seq(3000, 10000, 1000), big.mark = ",", trim = TRUE), "$11,000")
  ) +
  coord_cartesian(
    xlim = c(2100, 11300),
    ylim = c(77, 86),
    expand = FALSE,
    clip = 'off'
  ) +
  # add caption for p2
  labs(
    caption = "<span style='color:#232323; font-weight:plain;'>In US Dollars, adjusted for purchasing power and inflation. Excludes 2020-22.</span>  <br>
  <span style='color:#939291; font-weight:bold;'>Chart: Tom Calver | The Times and The Sunday Times</span>"
  ) +
  theme(
    text = element_text(family = 'Roboto', color = 'black', face = 'bold'),
    plot.title = element_markdown(size = 12, lineheight = 1.2, linewidth = 1.5, margin = margin(b = 10, l = -15, r = 20)),
    plot.subtitle = element_markdown(size = 12, lineheight = 1.2, face = 'plain', margin = margin(b = 10, l = -15, r = 20)),
    axis.text.x = element_text(margin = margin(t = 5, b = -5)),
    panel.grid.major = element_line(size = 0.3, color = "#e8e8e8"),
    axis.line = element_line(color = "black", size = 0.3),
    legend.position = 'none',
    plot.caption = element_markdown(
      size = 10,
      hjust = 0,
      lineheight = 1.2,
      margin = margin(t = 20, r = 20, l = -15),
      face = 'plain'
    ),
  ) +
  annotate(
    geom = 'rect',
    xmin = 1880,
    xmax = 2130,
    ymax = 86.7,
    ymin = 86.27,
    fill = '#e94f55'
  ) +
  # we also need to annotate the years 2000 and 2023 with Germany's yellow hex code (to match what Tom has)
  # not so much for Germany but for reference in general to the range of years for the plot
  # 2000 persistent text geom; for 2023 we use text geom; for 2000, we use label with no borders to bring forward '2000'
  annotate(
    geom = 'label',
    label = '2000',
    x = 4250,
    y = 77.97,
    color = '#E0AB26',
    fill = 'white',
    label.size = NA,
    fontface = "bold"
  ) +
  # 2023 persistent text geom
  annotate(
    geom = 'text',
    label = '2023',
    x = 6400,
    y = 81,
    color = '#E0AB26',
    fontface = "bold"
  ) +
  # add x and y axes titles (within the plot itself)
  # y axis
  annotate(
    geom = 'text',
    label = 'Life expectancy',
    x = 2685,
    y = 85.8,
    color = '#7B7B7B',
    fontface = "bold",
    fontfamily = 'Roboto',
    fontsize = 15
  ) +
  # x axis; i couldn't get the text to right justify for x axis title 'Per Capita\n spend' even after using hjust = 1, so i split that text in two lines and that works/matches Tom's 
  annotate(
    geom = 'text',
    label = 'Per-capita',
    x = 11200,
    y = 77.5,
    color = '#7B7B7B',
    fontface = "bold",
    fontfamily = 'Roboto',
    fontsize = 15,
    hjust = 1,
    vjust = .6
  ) + 
  annotate(
    geom = 'text',
    label = 'spend',
    x = 11200,
    y = 77.2,
    color = '#7B7B7B',
    fontface = "bold",
    fontfamily = 'Roboto',
    fontsize = 15,
    hjust = 1,
    vjust = .6
  )

girafe(
  ggobj = p2,
  width_svg = 10, height_svg = 6,
  options = list(
    opts_tooltip(
      css = "background: white;
            border: 1px solid #ddd;
            border-radius: 4px;
            padding: 6px;
            font-size: 14px;
            font-family: Roboto;
            font-weight: bold;
            color: #232323;
            text-align: left;
            transform: translate(-50%, 20px);
            transition: all 0.1s ease-in-out;"
    ),
    opts_hover(
      css = "stroke-opacity: 1; fill-opacity: 1; opacity: 1;"
    ),
    opts_hover_inv(
      css = "fill-opacity: 0 !important;
            stroke-opacity: 0 !important;
            opacity: 0 !important;
            color: white;
            text { display: none !important; }"
    )
  )
)

Room to Improve

Dataset

Unlike previous sections, below is the full dataset (only 13 rows by 3 column) showing the room to improve for countries; the potential of each country’s health service to improve without the need to increase current spend; this piece/section of the article borrows from Zarulli et al.

Show the code

#|echo: false
#|message: false
#|warning: false
#|include: false
  
# map(room_to_improve$Country |> word(1) |> str_remove_all('\\:')[-5], emo::ji)

room_to_improve |> 
  gt_nyt_custom(
    title = 'Room to Improve Dataset', 
    subtitle = str_glue(
    "Notice the first column consists of country names preceded by emoji shortcodes per country so that for instance
    Spain can have ", md(emo::ji('es')), " as a prefix"
    )
  )

Room to Improve Dataset
Notice the first column consists of country names preceded by emoji shortcodes per country so that for instance Spain can have 🇪🇸 as a prefix
Country	Life Expectancy at Birth	Potential
:jp: Japan	84.1	84.1
:gb: United Kingdom	81.1	83.8
:it: Italy	83.0	83.2
:de: Germany	80.9	84.1
:gr: Greece	81.7	81.7
:us: United States	78.9	84.1
:ie: Ireland	81.6	84.1
:pl: Poland	78.1	80.9
:es: Spain	83.1	83.1
:pt: Portugal	81.4	82.6

Data Wrangling

Show the code

#|echo: false
#|message: false
#|warning: false

theme_set_custom()

# first isolate country names from first column values (using regex), then join on jis (emoji dataset) and filter to 
# subgroup == 'country-flag', then re-join (sef join) to capture only countries of interest's flag emojis
room_to_improve <- 
  room_to_improve |>
  mutate(
    country_names = room_to_improve$Country |> 
      word(2, sep = "[a-z]\\: ") 
  ) |>  
  inner_join(
    jis |> 
      filter(
        subgroup == 'country-flag'
        ) |> 
      select(emoji, name, aliases) |> 
      unnest(aliases) |> 
      mutate_if(is.character, str_to_lower) |> 
      distinct() |> 
      inner_join(
        tibble(
          room_to_improve$Country |> 
            word(2, sep = "[a-z]\\: ") |>
            str_to_lower() |> 
            enframe(name = NULL, value = 'name') |>
            select(name) 
            )
      ) |> 
      mutate(
        country_names = str_to_title(name)
      ) 
  ) |> 
  select(-c(aliases, name, Country)) |> 
  distinct()

Below is how the rendering looks like, now unfortunately, and because the plot uses Roboto throughout, the flag emojis weren’t rendering correctly. Also because flags are compound glyphs (made from regional indicators), and most R graphics devices don’t support fonts like “Apple Color Emoji” that can display them properly.

So since ggplot2 doesn’t allow font fallback within axis labels, I switched to embedding SVG flag images using element_markdown() and tags instead. This is better anyways in the sense that SVG’s are by design more scalable (in dimensions/sizing) and can be a good input/template/boiler plate for me and for you in the event embedding flag images into a future script/project is needed.

Show the code

room_to_improve |> 
  gt_nyt_custom(
    title = 'Room to Improve Dataset with added and rendered flags'
  )

Room to Improve Dataset with added and rendered flags

Life Expectancy at Birth	Potential	Country Names	Emoji
84.1	84.1	Japan	🇯🇵
81.1	83.8	United Kingdom	🇬🇧
83.0	83.2	Italy	🇮🇹
80.9	84.1	Germany	🇩🇪
81.7	81.7	Greece	🇬🇷
78.9	84.1	United States	🇺🇸
81.6	84.1	Ireland	🇮🇪
78.1	80.9	Poland	🇵🇱
83.1	83.1	Spain	🇪🇸
81.4	82.6	Portugal	🇵🇹

I found flag SVG urls using ‘https://static.dwcdn.net/css/flag-icons/flags/4x3/’ and then only appended at the end are the respective country abbreviations which are looped through below.

Show the code

#|echo: false
#|message: false
#|warning: false

theme_set_custom()
# generate the data
p3_prep <-
  tribble(
    ~Current,  ~Potential,  ~Country,
    78.9,      84.1,       "United States",
    80.9,      84.1,       "Germany",
    78.1,      80.9,       "Poland",
    81.1,      83.8,       "United Kingdom",
    81.6,      84.1,       "Ireland",
    81.4,      83.8,       "Finland",
    82.0,      84.1,       "Norway",
    82.3,      84.1,       "France",
    82.4,      84.1,       "Sweden",
    81.4,      82.6,       "Portugal",
    83.0,      84.1,       "Australia",
    83.0,      83.2,       "Italy",
    81.7,      81.7,       "Greece",
    83.1,      83.1,       "Spain",
    84.1,      84.1,       "Japan"
  ) |>
  select(last_col(), everything()) |>
  # order gets lost when pivoting longer so we create a integer sequence and force rank/arrange that way
  mutate(seq = 1:15)

# create country abb names (fetched from one of the html nodes within the original plot) and build urls/per flag
country_abbs <- c('us', 'de', 'pl', 'gb', 'ie', 'fi', 'no', 'fr', 'se', 'pt', 'au', 'it', 'gr', 'es', 'jp')
flag_urls <- str_glue("https://static.dwcdn.net/css/flag-icons/flags/4x3/{country_abbs}.svg")

# loop thru svg's and convert to png's
flag_paths <- map(
  flag_urls, function(url) {
    
    svg_path <- tempfile(fileext = ".svg")
    png_path <- tempfile(fileext = ".png")
    
    download.file(url, svg_path, mode = "wb")
    rsvg::rsvg_png(svg_path, png_path)
    
    return(png_path)
  }
)
flag_paths <- setNames(flag_paths, country_abbs)

# add the 2 additional columns back to p3
p3_data <- p3_prep |>
  bind_cols(
    flag_paths |> unlist() |> stack() |> rename(flag_pngs = values, country_abbs = ind)
  ) |>
  select(Country, country_abbs, Current, Potential, flag_pngs)

# also add html code straight into p3_data but first abbreviate country names
p3_data <-
  p3_data |>
  # abbreviated United States and United Kingdom because i noticed the blanks/two or more words can throw off element markdown, especially
  # when embedding svg's; while not perfectly replicating here, in the context of country names, 'US' and 'UK' are universally reconized, especially
  # if flag images are appended to them
  mutate(
    Country = if_else(
      Country == 'United Kingdom', 'UK',
      if_else(Country == 'United States', 'US', Country)
    ),
    flag_html = paste0("<img src='", flag_pngs, "' width='25' height='15'> <span style='font-family: Roboto;'>", " ", " ", Country, "</span>")
  )

# we have to turb this from wide to long; to get a tracking per country (current -> potential)
p3_data_prep <-
  p3_data |>
  # order gets lost when pivoting longer so we create a integer sequence and force rank/arrange that way
  mutate(seq = 1:15) |>
  pivot_longer(
    -c(seq, Country, country_abbs, flag_pngs, flag_html), names_to = 'progress'
  ) |>
  mutate(
    # add color codes (different for UK compared to rest)
    # for greece, default arrow shows a recession, but in Tom's plot, it's '>', it's the same value for current and for potential for Greece, so we
    # artificially add + 0.001 to the Greek score for potential for force '>' arrow direction
    # value = if_else(Country == 'Greece' & progress == 'Potential', value + 0.0001, value),
    hex_codes = if_else(Country == 'UK', '#73a3d3', '#264250'), # UK gets its own color
    arrow_end_angle = if_else(Country %in% c('Spain', 'Japan'), 90, 70)) |>
    arrange(desc(seq)) # for some reason, below plot was reversing order, so we reverse order here so that plot arranges countries properly

# prepare plot caption
plot_caption <- "<span style='color:#939292; font-weight:bold;'>Chart: The Times and The Sunday Times | Source: <span style='color:#254251; text-decoration:underline;'>Zarulli et al.</span></span>"

# since arrow() wouldn't natively recognized arrow_end_angle,
# we create an variable in the global env. to call it within arrow() later on
p3 <- 
  p3_data_prep |>
  ggplot(aes(y = fct_reorder(Country, -seq), x = value, color = hex_codes)) +
  geom_path(
    arrow = arrow(type = "open", angle = c(rep(90, 3), rep(60, 27)), length = unit(5, 'pt')),
    linewidth = 1,
    lineend = 'round'
  ) +
  geom_label(
    data = p3_data_prep |> filter(progress == 'Current'),
    aes(label = value, hjust = 1.2),
    family = "Roboto",
    size = 4,
    fill = 'white', 
    label.size = NA,
    fontface = 'bold'
  ) +
  geom_text(
    data = p3_data_prep |> filter(progress == 'Potential'),
    aes(label = value, hjust = -.3),
    family = "Roboto", 
    size = 4,
    fontface = 'bold'
  ) +
  geom_text(
    aes(x = 78.7, y = 15, label = '\nCurrent\n'),
    family = 'Roboto',
    size = 4, 
    nudge_y = .75, 
    nudge_x = -.2, 
    color = '#232323',
    margin = margin(t = -10, b = 5)
  ) +
  geom_text(
    aes(x = 84.1, y = 15, label = '\nPotential\n'),
    family = 'roboto',
    size = 4, 
    nudge_y = .75,
    nudge_x = .1, 
    color = '#232323',
    margin = margin(t = -10, b = 5)
  ) +
  scale_color_identity() +
  scale_y_discrete(
    labels = p3_data_prep |> filter(progress == "Current") |> pull(flag_html)
  ) +
  theme(
    axis.text.y = element_markdown(family = "Roboto", size = 10, hjust = 0, face = 'plain', colour = '#232323'),
    axis.text.x = element_text(family = "Roboto", size = 10, face = 'bold'),
    axis.title.x = element_text(size = 12, family = "Roboto Condensed"),
    axis.title.y = element_text(size = 12, family = "Roboto Condensed"),
    panel.grid.minor.x = element_blank(),
    legend.position = 'none'
  ) +
  labs(x = NULL, y = NULL) +
  geom_segment(
    aes(x = 78.85, xend = 78.85, y = 15.25, yend = 15.4),
    color = "grey70", linewidth = .2, inherit.aes = FALSE
  ) +
  geom_segment(
    aes(x = 84.1, xend = 84.1, y = 15.25, yend = 15.4),
    color = 'grey70', linewidth = .2, inherit.aes = FALSE
  ) +
  labs(caption = plot_caption) +
  theme(
    plot.caption = element_markdown(
      hjust = -.45,
      size = 9,
      family = "Roboto", 
      face = 'bold',
      margin = margin(t = 10)
    ),
    axis.text.x = element_text(margin = margin(t = 10, b = 10))
  ) + 
  coord_cartesian(
  xlim = c(77, 85),
  expand = FALSE,
  clip = 'off'
  ) + 
  scale_x_continuous(breaks = seq(78, 84, 2)) 

p3

Overall, I like this type of plot (typically called arrow plots); which more often than not shows spread between one value to another (so 2 values total) per group; in this case per country. The reason I like it is because it scales very well; meaning in this case we could have ‘gotten away’ with adding 5 more countries within fairly similar plot panel dimensions while not sacrificing 5 times the readability.
A couple of things I couldn’t replicate perfectly is the centering on the country labels, but that may be a limitation from ggtext as it doesn’t fully support all types of css tags including align center since I tried multiple ways to achieve that - to no avail.
Finally, while I was able to embed link to the source cited in the caption (Zaruli et al.); it would have taken more work (more than I deemed necessary) to make said link clickable.

Barely Beds

Dataset

This below shows the dataset for the barely bed section of the article; by far the easiest to replicate.
It shows hospital beds availability across countries, normalized by considering no. of beds per 1,000 people.

Show the code

#|echo: false
#|message: false
#|warning: false

barely_beds |> 
  gt_nyt_custom(
    title = 'Barely Beds Dataset', 
    subtitle = 'We can clearly see the dearth of hospital beds for the UK'
  )

Barely Beds Dataset
We can clearly see the dearth of hospital beds for the UK
Country	Beds
Japan	12.8
Germany	7.9
Austria	7.2
France	5.8
Belgium	5.6
Greece	4.2
Australia	3.8
Portugal	3.5
Finland	3.4
Italy	3.2

Data Visualization

Not the most exciting visualization but sometimes, you just need a good old bar plot !

Show the code

#|echo: false
#|message: false
#|warning: false

theme_set_custom()

barely_beds <- 
  barely_beds |> 
  select(
    country = Country,  Beds_per_1000 = Beds
    ) |> 
  mutate(
    country = factor(country, levels = rev(country)),
    bar_color_code = if_else(country != 'UK', '#264250', '#7fb1e2'),
    label_color_code = if_else(country != 'UK', 'white', 'black'),
    font_face = if_else(country != 'UK', 'bold', 'plain')
  ) 

p4 <- ggplot(barely_beds, aes(x = country, y = Beds_per_1000, fill = bar_color_code)) +
    geom_col(width = .8) +
    scale_fill_identity() + 
    coord_flip() +
    labs(x = NULL, y = NULL) + 
    theme(
        axis.line.x = element_blank(), axis.line.y = element_blank(),
        axis.text.x = element_blank(), axis.ticks = element_blank(),
        axis.text.y = element_text(size = 8, hjust = 0, colour = 'black')
    ) + 
    scale_y_discrete(expand = expansion(mult = c(0, 0.1))) +
    geom_label(
        aes(label = Beds_per_1000), 
        label.size = NA, 
        size = 4,
        fill = NA, 
        hjust = 1, 
        vjust = .6,
        nudge_x = .1, 
        fontface = barely_beds$font_face,
        colour = barely_beds$label_color_code
        ) + 
    labs(
        caption = "<span style='color:#939292; font-weight:bold; text-align: left; white-space: nowrap;'>Chart: The Times and The Sunday Times • Source: OECD.</span>"
    ) +
    theme(
        plot.caption = element_markdown(
            size = 8,
            hjust = -.225,
            family = "Roboto", 
            face = 'bold'
        )
    ) + 
    ggtitle(
      label = md("**Barely beds**"),
      subtitle = "Hospital beds per 1,000 people"
      ) + 
    theme(
        plot.title = element_markdown(size = 8, hjust = -.1265),
        plot.subtitle = element_markdown(size = 8, hjust = -.155),
        plot.margin = margin(l = 5, t = 0, r = 10, b = 10),
        panel.grid = element_blank()
    )
p4

Budget Breakdowns

Dataset

Below is a breakdown of how the budget is allocated across different cost categories in each country. There are eight categories in total, with Hospitals, unsurprisingly, accounting for the lion share in every country. The data has been normalized (same currency and per-person basis) for comparability.
The UK ranks second to last in terms of budget allocated to Medical goods and equipment, just ahead of Italy, which implies that UK’s NHS is starved on a crucial budget.

Show the code

budget_breakdowns |> 
  mutate(
    Ancillary = coalesce(Ancillary, 0)
  ) |> 
  mutate_if(is.numeric, as.integer) |> 
  rename(country = Category) |> 
  gt_nyt_custom(
    title = "Budget Allocation on Health Services per person",
    subtitle = "Ancillary spend for the US was NULL or NA, so I replaced it with 0 since all other spends have non-empty values, it's likely that there is no such spend for this category for the US"
  ) |> 
  tab_style(
    style = cell_fill(color = '#fff5ec'), 
    locations = cells_body(rows = 8)
  ) |> 
  fmt_currency(everything(), decimals = 0)

Budget Allocation on Health Services per person
Ancillary spend for the US was NULL or NA, so I replaced it with 0 since all other spends have non-empty values, it’s likely that there is no such spend for this category for the US
Country	Hospitals	Residential Long Term Care	Ambulance	Ancillary	Medicines	Preventive Care	Admin and Finance	Rest of Economy
US	$3,393	$479	$3,315	$0	$1,473	$524	$836	$40
Germany	$1,718	$573	$1,968	$104	$1,130	$261	$305	$310
Norway	$2,468	$980	$1,809	$64	$618	$179	$112	$92
Australia	$2,414	$382	$1,288	$346	$742	$250	$220	$122
Canada	$1,413	$782	$1,561	$83	$915	$335	$176	$11
France	$1,998	$577	$1,148	$168	$860	$71	$256	$56
Japan	$1,948	$445	$1,339	$27	$804	$172	$72	$1
UK	$1,986	$557	$1,067	$91	$424	$152	$79	$161
Italy	$1,416	$187	$800	$160	$494	$131	$57	$5

Data Wrangling & Visualization

The first step was to make the data as shown above go from wide (a typically very hard to work with structure) to long, so that each country name repeats 8 times, each time for a different spend category.
Second was to control how many labels would be displayed on the each segment per bar.
Third was turning spend categories into a factor variable for easier sorting, matching Tom’s sorting/order of bars.
Forth was creating a tibble/dataframe for tooltips; this will activate upon hover and will be specially useful for small-width segments as data label won’t be explicitly displayed.
Fifth step was to include a separate plot which serves as a legend that will stack (vertically) on main bar plot. This allows for easier customization and since it’s written using a function to generate each box (per category) along with labels to the right of each bar, it makes it more modular/generalizeable to other scripts in the future where needed. I used {pathwork} (p1 / p2) in this case meaning legend (p1) to go over the bar plot.
Finally, this consolidated both plots into one, before being rendered via ggiraph for interactivity.

Show the code

#|echo: false
#|message: false
#|warning: false
#|include: false

country_budget_breakdowns <-
  budget_breakdowns |> 
  pivot_longer(
    -Category, 
    names_to = 'spend_category',
    values_to = 'budget'
  ) |> 
  select(country = 1, everything()) |> 
  mutate(
    spend_category = case_when(spend_category == 'Ambulance' ~ 'Community',
                               spend_category == 'Medicines' ~ 'Medical Goods & Equip.',
                               .default = spend_category),
    spend_budget_factor = factor(spend_category, levels = rev(unique(spend_category))),
    total_spend = sum(budget), .by = country
  ) |> 
  mutate(
    # for data labels/budget, the original only prints out values > $914 or values for below budgets so I create a variable for later reference/use 
    data_labels = case_when(
      budget >= 915 ~ scales::dollar(budget),
      (budget %in% c(782, 800, 804, 836, 860)) ~ scales::dollar(budget),
      TRUE ~ ""
    ),
    data_labels_color = case_when(
      (spend_category %in% 'Medical Goods & Equip.' | budget %in% c(980, 782)) ~ 'black', 
      TRUE ~ 'white'
    ),
    data_labels_font_face = case_when(
     data_labels_color == 'white' ~ 'bold',
     TRUE ~ 'plain'
    ),
    spend_budget_factor = factor(spend_category, levels = c(
      'Hospitals',
      'Residential long-term care',
      'Community',
      'Ancillary',
      'Medical Goods & Equip.',
      'Preventive care',
      'Admin and finance',
      'Rest of economy')
    )
  ) |> 
  arrange(desc(total_spend), spend_budget_factor) 

tooltip_texts <- country_budget_breakdowns |>
  group_by(country, spend_category) |>
  summarise(
    tooltip = paste0(
      spend_category, "\n", 
      paste0(country, ": ", scales::dollar(budget), collapse = "\n")
    ),
    .groups = "drop"
  )

country_budget_breakdowns <- country_budget_breakdowns |>
  left_join(
    tooltip_texts, 
    join_by(country == country, spend_category == spend_category)
  )

p2 <- country_budget_breakdowns |> 
  ggplot(
    aes(
      x = reorder(country, total_spend), 
      y = as.numeric(as.character(budget)), 
      fill = fct_rev(spend_budget_factor), 
      data_id = spend_category
      )
  ) +
  geom_col_interactive(
    aes(
      fill = fct_rev(spend_budget_factor),
      data_id = spend_category,
      tooltip = tooltip
      )
    ) +
  geom_text_interactive(
    aes(label = data_labels, y = budget), 
    position = position_stack(vjust = 0.04), 
    hjust = 0,                               # keeps text left-aligned, starting at the segment's beginning (after coord_flip)
    size = 5,
    family = "Roboto",
    fontface = country_budget_breakdowns$data_labels_font_face,
    color = country_budget_breakdowns$data_labels_color
    ) +
  scale_y_continuous(breaks = seq(0, 10e3, 2e3), position = 'right', labels = scales::dollar_format()) + 
  coord_flip() + 
  scale_x_discrete(position = "bottom") +
  labs(x = NULL, y = NULL) + 
  theme_classic() +
  theme(
    axis.line.x = element_blank(), 
    axis.line.y = element_blank(),
    axis.ticks = element_blank(),
    axis.text.x.top = element_text(family = 'Roboto', face = 'bold', size = 12, color = 'black'), 
    axis.text.x = element_blank(),
    legend.position = 'none'
  ) +
  scale_fill_manual_interactive(
    values = c(
      'Hospitals' = "#264250",
      'Residential long-term care' = '#7fb1e2',
      'Community' = '#61a961',
      'Ancillary' = '#3392a5',
      'Medical Goods & Equip.' = '#e0ab26',
      'Preventive care' = '#dacfc2',
      'Admin and finance' = '#b01622',
      'Rest of economy' = '#95807a')
  ) +
  geom_hline_interactive(yintercept = seq(0, 10e3, 2e3), color = 'white', linewidth = 0.2, linetype = "solid") +
  # The ylim, expand, and clip arguments for coord_flip should ideally be consolidated into the main coord_flip() call
  # For example: coord_flip(ylim = c(0, 11000), expand = FALSE, clip = 'off')
  # However, based on your provided p2 snippet, the limits are primarily controlled by scale_y_continuous.
  # Ensure consistency if you have multiple coord_flip() or limit-setting calls.
  theme(
    plot.margin = margin(l = 20, r = -20),
    axis.text.y = element_text(
      hjust = 0,              
      family = 'Roboto',
      face = 'bold',
      size = 12,
      margin = margin(r = 10))
  )

legend_box <- 
  tribble(
    ~spend_category,  ~colors,
    'Hospitals',  "#264250",
    'Residential long-term care',  '#7fb1e2',
    'Community',  '#61a961',
    'Ancillary',  '#3392a5',
    'Medical Goods & Equip.', '#e0ab26',
    'Preventive care',  '#dacfc2',
    'Admin and finance', '#b01622',
    'Rest of economy', '#95807a'
  )

# function to create tiled legend to serve as first row of a two plot wrapper (stacked veritically)
legend_tile_plot <- function(label, color) {
  
  ggplot() +
    geom_rect_interactive(
      aes(xmin = 0, xmax = 1, ymin = 0, ymax = 1),  # 1x1 squares/tiles
      fill = color,
      data_id = label,
      tooltip = label
    ) +
    # move labels to the right of tiles
    annotate_interactive(
      "text", 
      x = 1.2, y = .6,  # position to the right of the box
      label = label,
      data_id = label,
      hjust = 0,  # left-align text
      size = 5,
      color = 'black',
      family = "Roboto",
      face = "bold") +
    coord_fixed(ratio = 1) +  
    xlim(0, 20) +  # useful for lengthier words, such as 'Medical Goods & Equip.'
    ylim(-1.4, 1.1) +
    theme_void() +
    theme(
      plot.margin = margin(0, 0, 0, 0)  # reduce margins for better plot values fit 
    )
}

legend_plots <- map2(
  legend_box$spend_category, 
  legend_box$colors,
  legend_tile_plot
)

# reduce spacing within p1
p1 <- wrap_plots(
  legend_plots, 
  nrow = 2, 
  ncol = 4, 
  heights = 1.2,
  widths = .6
) 

# reduce spacing not within but between p1 and p2 using plot_layout heights
girafe(
  ggobj = (p1 / p2) + plot_layout(heights = c(0.4, 3), widths = c(1, 3)),  # reduced p1 height and widths
  width_svg = 12, height_svg = 7,
  options = list(
    opts_tooltip(
      css = "background: white;
            border: 1px solid #ddd;
            border-radius: 4px;
            padding: 6px;
            font-size: 14px;
            font-family: Roboto;
            font-weight: bold;
            color: #232323;
            text-align: left;
            transform: translate(-50%, 20px);
            transition: all 0.3 ease-in-out;"
    ),
    opts_hover(
      css = "stroke-opacity: 1; fill-opacity: 1; opacity: 1; transition: all 0.3s ease-in-out;"
    ),
    opts_hover_inv(
      css = "fill-opacity: 0.3 !important;
            stroke-opacity: 0.3 !important;
            opacity: 0.6 !important;
            transition: all 0.2s ease-in-out;
            color: white;
            text { display: none !important; }"
    )
  )
)

Overall, I liked this plot - it took a fair amount of effort to replicate. The only part I wasn’t a fan of was the y-axis scale from $0 to $10,000. It can be a bit misleading: for example, the U.S. hospital spend of around 3,400 falls below the $2,000 mark. I would have removed the intermediate axis (that goes from 0 to $10,000) ticks and kept only the $10,000 label to avoid this potential confusion. Another difference where I actually deviated from the plot is the structure of the legend; where Tom uses one block across which you will find boxes along with their respective spend categories from left to right whereas I decided to just go for 2x4 where each row contains 4 spend categories (total of 8). I think this hopefully gives clearer separation and I think can be more than often useful especially for even number of categories like it’s the case in here.

One feature I couldn’t replicate was the hover interaction where small segments (like Ancillary spend) trigger the display of all corresponding values across countries. Achieving that would require a deeper dive into CSS -assuming ggiraph even supports that level of dynamic interaction. That said, for small segments like ‘Residential long-term care’, a few values are already visible without interaction, which gives a reasonable sense of the scale for the rest. For all those reasons, I didn’t think the extra complexity was worth the additional time investment.

Thanks for taking the time to read this and hopefully, if nothing else, you have enjoyed it, more to come !