Visualising trade-flow data over time in R using Chord GIFs

It's hard to satisfactorily summarise and visualise data in three dimensions. Here we are specifically concerned with flow data over time. That is, imagine we have some unit, often a geography or an individual (say, countries), and we are interested in the flows of something (goods, migration, information, whatever) between those units and how they have changed over time.

The best way I have found of visualising this data is an animated Chord diagram, a GIF, like the one below, which looks at trade in endangered crocodiles (or parts thereof) between the 10 largest players since 1975.

In [1]:
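from IPython.display import display, Image
# format='png' is deliberate: older IPython versions refuse to embed GIFs,
# while browsers detect the GIF from the bytes and still animate it
with open('trade_crocs.gif','rb') as f:
    display(Image(data=f.read(), format='png'))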

Step 1: format your data correctly

Your data needs to be a three-way balanced panel. That is, for every $i$ you have an entry for every $j$, and for every pair $ij$ there is data in every time period $t$. A good example is country trade flows; note that these can be symmetric (i.e. capturing total trade, exports plus imports) or directional.
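As a quick check of that requirement, here is a minimal R sketch on made-up data. The data frame flows and its values are purely illustrative, and filling absent cells with zero is an assumption that may not suit every dataset. It builds every exporter, importer and year combination, then fills any missing cell so the panel is balanced.

```r
# Toy flow data (hypothetical): two countries, two years,
# with the CH -> CH cell for 2001 missing
flows <- data.frame(
  exp      = c("AT", "AT", "CH", "CH", "AT", "AT", "CH"),
  imp      = c("AT", "CH", "AT", "CH", "AT", "CH", "AT"),
  year0    = c(2000, 2000, 2000, 2000, 2001, 2001, 2001),
  quantity = c(0, 5, 3, 0, 0, 7, 2),
  stringsAsFactors = FALSE
)

# Every combination a balanced panel must contain
countries <- sort(unique(c(flows$exp, flows$imp)))
years     <- sort(unique(flows$year0))
full_grid <- expand.grid(exp = countries, imp = countries, year0 = years,
                         stringsAsFactors = FALSE)

# Left-join the observed flows onto the grid; absent cells become NA
panel <- merge(full_grid, flows, by = c("exp", "imp", "year0"), all.x = TRUE)

# Assumption: a missing cell means no recorded flow, so treat it as zero
panel$quantity[is.na(panel$quantity)] <- 0

# A balanced panel has (number of countries)^2 rows per year
nrow(panel) == length(countries)^2 * length(years)
```

Whether zero is the right fill value depends on the data; for some sources a missing record means "not reported" rather than "no trade".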

The data I use here is from the Convention on International Trade in Endangered Species (CITES).

In [ ]:
// load data 
use full_complete_data.dta , clear

// here I focus on trade in crocodiles 
local type = "Crocodile" // focus on Crocs
// Chord diagrams can get very busy very quickly, so it's best to keep to a small number of countries; I pick 10
local x = 10 // focus on the 10 biggest traders (ranked by exports plus imports)

// convert type to code - some mess specific to my dataset
if "`type'" == "Crocodile" local code = 2
if "`type'" == "Orchid" local code = 1
if "`type'" == "Coral" local code = 3

// keep only the records for the chosen species type
keep if order_code == `code' 
qui {
// keep only the variables needed for the visualisation
keep exp imp quantity year

// keep the largest x trading countries
bysort exp: egen tot_exp = total(quantity)
bysort imp: egen tot_imp = total(quantity)
gen tot_trade = tot_exp + tot_imp if exp == imp
replace tot_trade = 0 if missing(tot_trade)
bysort imp (tot_trade): gen tot_trade_imp = tot_trade[_N]
bysort exp (tot_trade): gen tot_trade_exp = tot_trade[_N]
gen mtot_trade_imp = - tot_trade_imp
gen mtot_trade_exp = - tot_trade_exp
egen trade_rk_imp = group(mtot_trade_imp)
egen trade_rk_exp = group(mtot_trade_exp)

drop if trade_rk_imp > `x' + 1
drop if trade_rk_exp > `x' + 1

// must keep a tidy dataset
drop tot_exp tot_imp tot_trade* mtot_trade* trade_rk*    
    
// rename and reorder variables to prevent any ambiguity
// (exporter and importer are already named exp and imp at this point, so only year is renamed)
rename year year0
order exp imp quantity year0 
}

// export to a csv file for reading by R
export delimited using data_temp_vis.csv , replace

Step 2: parse into R

To create the diagrams and the GIF I use R packages. You may be able to do step 1 in R as well, and that's great. I'm very new to R and currently feel more comfortable doing such data manipulation in Stata, but if you can do it in R, that's better.
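If you would rather stay in R for step 1, the selection of the largest traders could look roughly like the base R sketch below. It is a loose analogue of the Stata code above, not a line-by-line translation: the toy data frame flows and the cutoff x = 2 are made up for illustration, and countries are ranked by exports plus imports.

```r
# Toy flow data (hypothetical values)
flows <- data.frame(
  exp      = c("AT", "CH", "CN", "AT", "FR"),
  imp      = c("CH", "AT", "AT", "CN", "AT"),
  quantity = c(5, 3, 4, 1, 0.5),
  stringsAsFactors = FALSE
)
x <- 2  # number of countries to keep

# Stack exporter and importer roles so each flow counts towards both countries
long <- data.frame(country  = c(flows$exp, flows$imp),
                   quantity = c(flows$quantity, flows$quantity),
                   stringsAsFactors = FALSE)

# Total trade (exports plus imports) per country
total <- tapply(long$quantity, long$country, sum)

# Names of the x largest traders
top <- names(total)[order(total, decreasing = TRUE)][seq_len(x)]

# Keep only flows where both ends are among the largest traders
kept <- flows[flows$exp %in% top & flows$imp %in% top, ]
kept
```

The result would still need to be checked for balance (every pair in every year) before it is used for the animation.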

In [ ]:
// create metadata to keep colors fixed; this prevents colors changing from frame to frame
qui {
keep exp
duplicates drop

gen col = ""
replace col = "red" in 1
replace col = "blue" in 2
replace col = "green" in 3
replace col = "cyan" in 4
replace col = "orange" in 5
replace col = "purple" in 6
replace col = "pink" in 7
replace col = "grey" in 8
replace col = "darkblue" in 9
replace col = "yellow" in 10
replace col = "darkred" in 11
}

export delimited using data_meta.csv , replace

To make things easy on the eyes in Jupyter Notebooks I set R's warn option to $-1$, which suppresses warnings. Here I also load the relevant packages; make sure you have them installed beforehand.

In [ ]:
options(warn=-1)
library(circlize)
library(gdata)
library(tidyverse)
library(tweenr)

Read the two .csv files you made in Stata into R. Again, if you can do everything in R straight away, that is preferred.

In [ ]:
d0 = read.csv("data_temp_vis.csv")
d1 = read.csv("data_meta.csv")

Next we tween the data. Tweening is the process of creating intermediate frames between images. In our case we use year as the time variable and tween linearly to make 100 frames.

In [ ]:
# tween the data (provide intermediate steps to make the animation smooth)
d2 <- d0 %>%
  mutate(corridor = paste(exp , imp , sep = " -> ")) %>%
  select(corridor, year0, quantity) %>%
  mutate(ease = "linear") %>%
  tween_elements(time = "year0", group = "corridor", ease = "ease", nframes = 100) %>%
  as_tibble()  # tbl_df() is deprecated in recent dplyr releases
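To make the idea of a linear tween concrete, here is a small base R sketch for a single corridor. It uses approx() rather than tweenr, and the years, quantities and frame count are made-up illustrative values, so treat it as an explanation of the principle rather than what tween_elements does internally.

```r
# Toy series for one corridor (hypothetical): one observation per year
years    <- c(1975, 1976, 1977)
quantity <- c(10, 20, 0)

# Linearly interpolate to 5 evenly spaced frames across the observed years
frames <- approx(x = years, y = quantity, n = 5)

# One row per frame: the interpolated "year" and quantity
tweened <- data.frame(frame = 0:4, year0 = frames$x, quantity = frames$y)
tweened
```

The half-year frames take values midway between the neighbouring observations, which is what makes the animation move smoothly instead of jumping once per year.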

Some tidying after the tweening:

In [ ]:
d2 <- d2 %>%
  separate(col = .group, into = c("orig_reg", "dest_reg"), sep = " -> ") %>%
  select(orig_reg, dest_reg, quantity, everything())

In order to make a Chord diagram over time it is useful to fix the arc positions and sizes for each country. We do this by forcing them to be equal to their maximum value. Note, however, that at the beginning some arcs appear; this is somewhat of a bug (although I've almost convinced myself it's a feature).

This chunk of code finds and fixes the arc lengths at their max.

In [ ]:
library(magrittr)

reg_max <- d0 %>%
  group_by(year0, exp) %>%
  mutate(tot_out = sum(quantity)) %>%
  group_by(year0, imp) %>%
  mutate(tot_in = sum(quantity)) %>%
  filter(exp == imp) %>%
  mutate(tot = tot_in + tot_out) %>%
  mutate(reg = exp) %>%
  group_by(reg) %>%
  summarise(tot_max = max(tot)) %$%
  setNames(tot_max, reg)  # named vector: names are countries, values are their maximum totals

Next we define the colors.

In [ ]:
col = c(AT = "red" , CH = "blue" , CN = "green" , FR = "cyan" , HK = "orange" ,
            IT = "purple" , JP = "pink" , MX = "grey" , SG = "darkblue" , TH = "yellow" ,
            US = "darkred" )
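Since data_meta.csv already pairs each country with a colour, the same named vector could also be built from it instead of being typed out by hand. The sketch below uses a small stand-in data frame (d1_demo, with made-up rows) in place of d1, assuming the columns are called exp and col as in the Stata export above.

```r
# Stand-in for d1 as read from data_meta.csv (hypothetical rows)
d1_demo <- data.frame(exp = c("AT", "CH", "CN"),
                      col = c("red", "blue", "green"),
                      stringsAsFactors = FALSE)

# Named character vector: names are country codes, values are colours
col_demo <- setNames(d1_demo$col, d1_demo$exp)
col_demo
```

Building the vector this way keeps the colours in one place, so changing a colour in the metadata file would carry through to the plots.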

This is the real meat: this block of code loops through each frame from our tweening and constructs a consistent Chord diagram for each. It uses the circlize package, which is a great resource for creating all manner of circular visualisations. Detailed descriptions of the code here, as well as much, much more, can be found in the circlize documentation.

In [ ]:
# Create a folder to hold all the images (there will be 100 in total)
dir.create("./plot-gif/")

# start the loop through each unique f (frame) in the dataset you tweened
for(f in unique(d2$.frame)){
    # construct png image
    png(file = paste0("./plot-gif/globalchord", f, ".png"), height = 7, width = 7,
      units = "in", res = 500)
    
    # reset and configure the circlize package
    circos.clear()
    par(mar = rep(0, 4), cex=1)
    circos.par(start.degree = 90, track.margin=c(-0.1, 0.1), 
             gap.degree = 4, points.overflow.warning = FALSE)

    # plot the chord diagram
    # only look at one frame at a time
    x = filter(d2 , .frame == f)
    year = floor(mean(x$year0))
    x = x[c(1:3)]
    # see the circlize documentation for a description of each option
    chordDiagram(x ,  self.link = 2, directional = 1, order = d1$exp,
        grid.col = col , col = col , annotationTrack = "grid",
        transparency = 0.25,  annotationTrackHeight = c(0.05, 0.1),
        direction.type = c("diffHeight", "arrows"), link.arr.type = "big.arrow",
        diffHeight  = -0.04, link.sort = TRUE, link.largest.ontop = TRUE, 
        xmax = reg_max)
    
    # label each sector with its country code and draw an axis on top
    circos.track(track.index = 1, bg.border = NA, panel.fun = function(x, y) {
        xlim = get.cell.meta.data("xlim")
        sector.index = get.cell.meta.data("sector.index")
        reg1 = d1 %>% filter(exp == sector.index) %>% pull(exp)

        circos.text(x = mean(xlim), y = ifelse(is.na(reg1), 3, 4),
                    labels = reg1, facing = "bending", cex = 1.1)
        circos.axis(h = "top", labels.cex = 0.8 ,
                    labels.niceFacing = FALSE, labels.pos.adjust = FALSE)
      })

    # add the year as a title in the top-left corner
    title_str <- paste("Flows: ", year)
    text(-0.8, 1, labels = title_str , cex = 1.6)
    
    # close plotting device
    dev.off()
}

Now that we have created 100 images in /plot-gif/, it only remains to squeeze them all together into a GIF using the magick package.

In [ ]:
library(magick)

img <- image_read(path = "./plot-gif/globalchord0.png")
for(f in 1:99){
  img0 <- image_read(path = paste0("./plot-gif/globalchord",f,".png"))
  img <- c(img, img0)
  message(f)
}

img1 <- image_scale(image = img, geometry = "800x800")

ani0 <- image_animate(image = img1, fps = 10)
image_write(image = ani0, path = "./plot-gif/globalchord.gif")
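One thing to watch when collecting the frames is their order. Listing the folder with list.files() sorts names as text, so globalchord10.png would come before globalchord2.png. A hedged alternative to the loop above is to build the paths from the frame numbers and read them in one go; the 0 to 99 range is taken from the loop above, and the read is guarded so the sketch does nothing if the frames or the magick package are missing.

```r
# Frame numbers as used above (assumed to run from 0 to 99)
frame_ids <- 0:99

# Build the file paths in numeric frame order
paths <- paste0("./plot-gif/globalchord", frame_ids, ".png")

# Only attempt to read if every frame exists and magick is installed
if (all(file.exists(paths)) && requireNamespace("magick", quietly = TRUE)) {
  # Read each frame and join them into a single multi-frame image
  img <- magick::image_join(lapply(paths, magick::image_read))
}
```

The resulting img can then be passed to image_scale and image_animate exactly as in the cell above.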

And voilà! The finished article.

In [2]:
with open('trade_crocs.gif','rb') as f:
    display(Image(data=f.read(), format='png'))

Software circlize citation: Gu, Z. (2014) circlize implements and enhances circular visualization in R. Bioinformatics. DOI: 10.1093/bioinformatics/btu393