Does polarization increase protest? A note on Griffin et al. 2020 (BJPS, 51, 3)

The 2020 paper “Deprivation in the Midst of Plenty: Citizen Polarization and Political Protest” by Griffin, Kiewiet de Jonge, and Velasco-Guachalla, published in the British Journal of Political Science, contains a data error that invalidates the study’s main results and its conclusion that polarization leads to higher protest activity. The error is trivial and lies in the construction of the study’s dependent variable. Once the error is fixed, the results no longer indicate a positive effect of polarization on protest, nor a moderating effect of grievances.

The paper’s original replication materials are in this Dataverse repository (DOI: 10.7910/DVN/SVXE7L).

I start with a brief summary of the theoretical argument and of the data and measures used. Next, I explain the source of the error, reproduce the original analysis, and present the corrected analysis after the data error is fixed.

Summary of Griffin et al.

The theoretical argument builds on Ted Gurr’s grievance theory to argue that the main determinant of protest is not the level of grievances itself, but the polarization of grievances. High polarization of grievances, i.e. a situation in which some groups in a society are very dissatisfied while others are very satisfied, is conducive to protest because it implies feelings of relative deprivation among the dissatisfied, who compare themselves to the satisfied.

The scope conditions, according to the authors, include a democratic context and, relatedly, a focus on peaceful anti-government protest rather than collective violence.

The measure of polarization is constructed as a sample standard deviation of responses to the question about satisfaction with democracy from various cross-national survey projects. It is measured at the country-year level.
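
To make the measure concrete, here is a minimal sketch in base R of a standard-deviation-based polarization score. The respondent-level data frame survey, its column names, and the 1-4 answer scale are made up for illustration and are not taken from the replication files. The two hypothetical countries have the same mean satisfaction but very different dispersion.

```r
# Hypothetical respondent-level data (NOT from the replication materials):
# satisfaction with democracy on an assumed 1-4 scale.
survey <- data.frame(
  country = rep(c("A", "B"), each = 4),
  year    = 2000,
  satis   = c(1, 1, 4, 4,   # country A: split between the extremes
              2, 3, 2, 3)   # country B: clustered around the middle
)

# Level of satisfaction: country-year mean.
level <- aggregate(satis ~ country + year, data = survey, FUN = mean)

# Polarization: country-year sample standard deviation.
polar <- aggregate(satis ~ country + year, data = survey, FUN = sd)
names(polar)[names(polar) == "satis"] <- "polar"

print(level)
print(polar)
```

Both countries have a mean of 2.5, so a model that only looks at the level of satisfaction cannot tell them apart, while the standard deviation is three times larger in country A.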

The level of protest is measured as the number of anti-government protests and strikes, taken from the Cross National Time Series (CNTS) dataset. It too is measured at the country-year level.

Data inspection

I should note that I have not attempted to reproduce the measure of polarization, nor have I paid much attention to how the control variables were constructed. I just focus on the protest measure, i.e. the dependent variable of the study.

The file Conflict data sample in the replication materials contains the protest data used in the article to construct the dependent variable. Here is a snippet for Poland 1981-1990. The variable protest is the one created by the authors and used in the analysis. In the article, it is described as the sum of strikes (strike) and anti-government demonstrations (demons). Eyeballing the data is enough to see that protest is not the sum of strike and demons. Rather, it looks like strike + riot.

I checked this against the original CNTS data, downloaded following the instructions on the CNTS project website. Below is the data snippet for Poland in 1981-1990. The counts of demonstrations, riots, and strikes registered for Poland in 1981-1990 match the numbers in the Griffin et al. replication data file.

The error in creating the protest variable looks trivial. The authors must have added together the wrong data columns. This is an easy mistake to make if the CNTS data are imported into data analysis software with only the short variable names (domestic2, domestic3, …) and not the full labels.
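
A check of this kind can be automated rather than eyeballed. The sketch below uses a made-up toy data frame (the counts are illustrative, not the Poland figures) in which protest is deliberately built as riot + strike, and then tests which pair of columns reproduces it. The object names toy, candidates, and matches are mine.

```r
# Toy event counts, invented for illustration only.
toy <- data.frame(
  demons = c(3, 0, 2, 5),
  riot   = c(1, 2, 0, 0),
  strike = c(2, 1, 1, 0)
)

# Mimic the suspected construction of the dependent variable.
toy$protest <- toy$riot + toy$strike

# Candidate definitions of the dependent variable.
candidates <- list(
  demons_strike = toy$demons + toy$strike,
  riot_strike   = toy$riot + toy$strike,
  demons_riot   = toy$demons + toy$riot
)

# Which candidate reproduces `protest` in every row?
matches <- sapply(candidates, function(x) isTRUE(all.equal(x, toy$protest)))
print(matches)
```

Run on the real replication file, the same comparison would only need the three event columns and the authors' protest column.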


I start by downloading the data files from the authors’ Dataverse repository.

library(tidyverse) # for manipulating data
library(dataverse) # for getting data from Dataverse
library(zoo) # for interpolating
library(skimr) # for making quick data summaries
library(kableExtra) # for making tables
library(glmmTMB) # for running generalized linear mixed models
library(sjPlot) # model tables (and much more!)

# Polarization dataset
polarization <- get_dataframe_by_name(
  filename = "Polarization",
  dataset = "10.7910/DVN/SVXE7L", 
  server = "")

# Protest dataset
conflict <- get_dataframe_by_name(
  filename = "Conflict data",
  dataset = "10.7910/DVN/SVXE7L", 
  server = "")

# Control variables dataset
qog <- get_dataframe_by_name(
  filename = "QOG",
  dataset = "10.7910/DVN/SVXE7L", 
  server = "")

The code below is a translation of the Stata code in Polarization_Protest from the replication materials to R.

combo <- polarization %>%
  # merge all datasets by country and year
  full_join(conflict, by = c("ccode", "Year")) %>%
  full_join(qog, by = c("ccode", "Year")) %>%
  mutate(# correct measure of anti-gov demonstrations + strikes
         demons_strike = demons + strike,
         # riots + strikes
         riot_strike = riot + strike,
         # other transformations as in the original data cleaning script
         polar = polar*100,
         disat = disat*100,
         GDPcap = log(pwt_rgdpch),
         Growth = wdi_gdpgr,
         Inflation = wdi_infl,
         Ln_Inflation = ifelse(Inflation >= 1, log(Inflation), 0),
         polity2 = p_polity2,
         Eth_frag = al_ethnic,
         Ln_pop = log(wdi_pop),
         Elec_leg = dpi_legelec,
         Elec_exe = dpi_exelec,
         ENP = gol_enpp,
         Presidential = as.numeric(dpi_system == 0),
         # missing system type is coded as non-presidential (0)
         Presidential = ifelse(is.na(Presidential), 0, Presidential),
         Urban = wdi_urban,
         reg_durability = p_durable,
         terror = gd_ptss,
         timetrend = Year - 2000
         ) %>%
  arrange(Country, Year) %>%
  group_by(Country) %>%
  # interpolate income inequality
  mutate(gini = zoo::na.approx(solt_ginet, na.rm = F)) %>%
  ungroup() %>%
  # keep democracies only
  filter(polity2 >= 0) %>%
  # create lagged and centered variables
  mutate_at(vars(protest, demons_strike, riot_strike, demons, riot, strike, polar, disat, gini,
                 GDPcap, Growth, Inflation, Ln_Inflation, Elec_leg, Elec_exe, polity2, reg_durability,
                 terror, Urban, Ln_pop),
            .funs = list(l = ~lag((. - mean(., na.rm = T)), 1))) %>%
  # drop cases with missing lagged polarization
  drop_na(polar_l) %>%
  select(ccode, Year, demons_strike, demons_strike_l, riot_strike, riot_strike_l,
         protest, protest_l, polar_l, disat_l, gini_l, GDPcap_l,
         Growth_l, Ln_Inflation_l, Elec_leg_l, Elec_exe_l, polity2_l,
         reg_durability_l, terror_l, Urban_l, Ln_pop_l, Presidential,
         Eth_frag, timetrend)
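
For readers less familiar with these transformations, the sketch below walks through three of the steps on a made-up vector in base R: linear interpolation of interior gaps, centering on the mean, and lagging by one row. Here stats::approx stands in for zoo::na.approx(na.rm = FALSE), and the c(NA, head(x, -1)) idiom stands in for dplyr::lag(x, 1). The values are illustrative only.

```r
# Made-up series with interior and edge gaps (not real Gini data).
gini_raw <- c(NA, 30, NA, NA, 36, NA)
idx <- seq_along(gini_raw)

# Step 1: linear interpolation of interior gaps; leading and trailing
# NAs stay NA because approx() does not extrapolate by default.
gini <- approx(x = idx, y = gini_raw, xout = idx)$y

# Step 2: center on the mean of the non-missing values.
centered <- gini - mean(gini, na.rm = TRUE)

# Step 3: lag by one row, so each row carries the previous row's value.
gini_l <- c(NA, head(centered, -1))

print(data.frame(gini_raw, gini, centered, gini_l))
```

Note that, as written, the pipeline above applies the centering and lagging after ungroup(), so the mean is the overall sample mean and the lag runs over the stacked country-year rows.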

Here’s the summary of all relevant variables. Note that the protest variable, constructed by Griffin et al., and the riot_strike variable, which I constructed as the sum of the number of riots and strikes, are the same.

skimr::skim(combo) %>%
  mutate(numeric.mean = round(numeric.mean, 3)) %>%
  mutate(numeric.sd = round(numeric.sd, 3)) %>%
  mutate(numeric.p0 = round(numeric.p0, 3)) %>%
  mutate(numeric.p100 = round(numeric.p100, 3)) %>%
  dplyr::select(skim_variable, n_missing, numeric.mean, numeric.sd, numeric.p0, numeric.p100) %>%
  print(n = 25)
## # A tibble: 24 × 6
##    skim_variable    n_missing numeric.mean numeric.sd numeric.p0 numeric.p100
##    <chr>                <int>        <dbl>      <dbl>      <dbl>        <dbl>
##  1 ccode                    0      412.       245.         8          894    
##  2 Year                     0     2000.         9.36    1967         2011    
##  3 demons_strike           71        0.784      1.63       0           12    
##  4 demons_strike_l         27       -0.145      1.58      -0.928       11.1  
##  5 riot_strike             72        0.439      1.33       0           22    
##  6 riot_strike_l           28       -0.256      1.21      -0.672       21.3  
##  7 protest                 71        0.439      1.33       0           22    
##  8 protest_l               27       -0.256      1.21      -0.672       21.3  
##  9 polar_l                  0        0          6.26     -17.5         31.0  
## 10 disat_l                  0        0          8.53     -24.0         22.5  
## 11 gini_l                 107       -2.68       9.80     -18.9         28.8  
## 12 GDPcap_l                34        0.439      0.93      -3.21         1.94 
## 13 Growth_l                12       -0.651      4.79     -48.3         14.9  
## 14 Ln_Inflation_l          12       -0.209      1.32      -1.94         6.77 
## 15 Elec_leg_l               9        0.024      0.454     -0.266        0.734
## 16 Elec_exe_l               8        0.009      0.334     -0.118        0.882
## 17 polity2_l                0        0.877      1.79      -7.82         2.18 
## 18 reg_durability_l         0        0.913     31.8      -29.7        171.   
## 19 terror_l                22       -0.241      0.984     -1.14         2.86 
## 20 Urban_l                  5        8.61      15.0      -45.5         38.5  
## 21 Ln_pop_l                 5        0.105      1.26      -3.23         4.54 
## 22 Presidential             0        0.394      0.489      0            1    
## 23 Eth_frag                 0        0.313      0.216      0.002        0.93 
## 24 timetrend                0       -0.211      9.36     -33           11

And here is a summary of the data created with the authors’ original code. The summary statistics in the table below are the same as in the table above, up to rounding error.

 Variable           Obs        Mean    Std. Dev.       Min        Max  
 protest            971     .438723    1.333588          0         22  
 l_protest        1,015   -.2562795    1.213236   -.672043   21.32796  
 l_polar          1,042    3.51e-07    6.258634  -17.48873   31.01496  
 l_disat          1,042    3.33e-06    8.526368  -23.96803   22.46431  
 l_gini             935   -2.684602    9.803594  -18.88194   28.81756  
 l_GDPcap         1,008    .4393539     .930272  -3.211872   1.935358  
 l_Growth         1,030   -.6509732    4.790745  -48.32703   14.85958  
 l_Ln_Inflation   1,030   -.2092084    1.322055  -1.936994    6.76943  
 l_Elec_leg       1,033    .0243469    .4541744  -.2660694   .7339306  
 l_Elec_exe       1,034    .0092795    .3338718  -.1183801   .8816199  
 l_polity2        1,042    .8766785    1.792215   -7.81622    2.18378  
 l_reg_durability 1,042    .9132185    31.78594  -29.67699    171.323  
 l_terror         1,020   -.2409006    .9839658  -1.144822   2.855178  
 l_Urban          1,037    8.612539    15.04106  -45.52881   38.51258  
 l_Ln_pop         1,037    .1051592    1.261732  -3.233301   4.535833  
 Presidential     1,042    .3944338    .4889634          0          1  
 Eth_frag         1,042    .3128498    .2159714    .001998    .930175  
 timetrend        1,042   -.2111324    9.355481        -33         11  


The authors estimate negative binomial count models predicting the number of events. They present both “flat” models with errors clustered by country and multilevel models with country-years nested in countries. I focus on the latter.
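
One reason a negative binomial model is a natural choice here is overdispersion: a Poisson model assumes the variance equals the mean, while the counts in the summary table above are far more dispersed. The quick check below uses the rounded means and standard deviations reported there; the object and column names are mine.

```r
# Rounded summary statistics copied from the skim output above.
stats_tbl <- data.frame(
  variable = c("protest", "demons_strike"),
  mean     = c(0.439, 0.784),
  sd       = c(1.33, 1.63)
)

# Under a Poisson model variance / mean would be close to 1.
stats_tbl$variance         <- stats_tbl$sd^2
stats_tbl$dispersion_ratio <- stats_tbl$variance / stats_tbl$mean

print(stats_tbl)
```

Both ratios are well above 1, which is the pattern the extra dispersion parameter of the negative binomial family is meant to absorb. This is an unconditional check only; it says nothing about dispersion after conditioning on the covariates.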

Main effects

The code below reproduces model specification 2 with country random effects from Table 1 in Griffin et al.; it is labeled protest in the output table below. The second model replaces the original protest variable with the sum of the counts of demonstrations and strikes, demons_strike.

The crucial coefficient is the effect of lagged polarization, polar_l. In the original model the coefficient is estimated at 0.046, significant at the 0.05 level. In the corrected model the coefficient equals 0.009 and is no longer significant.
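
Because these are log-scale coefficients from a count model, it can help to translate them into incidence rate ratios and to recompute the Wald test by hand. The sketch below does so from the rounded estimates and standard errors reported for polar_l in this note, so the p-values are approximate. The object name est is mine.

```r
# Rounded polar_l estimates and standard errors from the two models.
est <- data.frame(
  model = c("original (protest)", "corrected (demons_strike)"),
  b     = c(0.046, 0.009),
  se    = c(0.019, 0.015)
)

est$irr <- exp(est$b)               # incidence rate ratio
est$z   <- est$b / est$se           # Wald z statistic
est$p   <- 2 * pnorm(-abs(est$z))   # two-sided p-value, normal approximation

print(est)
```

An incidence rate ratio of about 1.047 means roughly 4.7% more expected events per one-unit increase in lagged polarization; in the corrected model the ratio is about 1.009.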

Note that not all coefficient values of the protest model below are exactly the same as in the original table in Griffin et al. For example, the coefficient for lagged protest in the reanalysis below is 0.197, SE = 0.065, compared to 0.195, SE = 0.065 in the Griffin et al. paper. This is likely due to small differences between Stata’s estimation routine menbreg and the R package glmmTMB. In the corrected model, the coefficient for polarization is the same whether the model is estimated in R or Stata, and equals 0.009 with SE = 0.015.

re_original <- glmmTMB(protest ~ protest_l + polar_l + disat_l + 
                         gini_l + I(gini_l^2) + GDPcap_l + Growth_l + Ln_Inflation_l + 
                         Elec_leg_l + Elec_exe_l + polity2_l + reg_durability_l + 
                         terror_l + Urban_l + Ln_pop_l + Presidential + Eth_frag + timetrend +
                  (1 | ccode),
                  data = combo,
                  ziformula = ~0,
                  family = nbinom2)

re_corrected <- glmmTMB(demons_strike ~ demons_strike_l + polar_l + disat_l + 
                          gini_l + I(gini_l^2) + GDPcap_l + Growth_l + Ln_Inflation_l + 
                          Elec_leg_l + Elec_exe_l + polity2_l + reg_durability_l + 
                          terror_l + Urban_l + Ln_pop_l + Presidential + Eth_frag + timetrend +
                  (1 | ccode),
                  data = combo,
                  ziformula = ~0,
                  family = nbinom2)

sjPlot::tab_model(re_original, re_corrected, show.ci = FALSE, show.se = TRUE, transform = NULL,
                  digits = 3, emph.p = FALSE,
                  order.terms = c(2,20,3:19,1))
                              protest                demons_strike
Predictors          Log-Mean std. Error       p  Log-Mean std. Error       p
protest_l              0.197      0.065   0.003
demons_strike_l                                     0.125      0.042   0.003
polar_l                0.046      0.019   0.015     0.009      0.015   0.548
disat_l               -0.000      0.015   0.977     0.016      0.011   0.151
gini_l                 0.020      0.021   0.340     0.034      0.016   0.037
gini_l^2              -0.000      0.001   0.831     0.000      0.001   0.812
GDPcap_l              -0.222      0.315   0.481    -0.135      0.239   0.570
Growth_l              -0.024      0.023   0.280    -0.001      0.018   0.974
Ln_Inflation_l        -0.087      0.102   0.392    -0.102      0.079   0.194
Elec_leg_l             0.199      0.201   0.324     0.182      0.158   0.250
Elec_exe_l             0.080      0.292   0.783     0.329      0.218   0.131
polity2_l             -0.119      0.073   0.105    -0.161      0.057   0.005
reg_durability_l      -0.004      0.006   0.485    -0.004      0.005   0.363
terror_l              -0.052      0.147   0.725    -0.076      0.117   0.514
Urban_l                0.027      0.014   0.048     0.019      0.010   0.060
Ln_pop_l               0.395      0.116   0.001     0.411      0.090  <0.001
Presidential          -0.593      0.461   0.199    -0.408      0.354   0.249
Eth_frag              -0.284      0.782   0.716    -0.432      0.600   0.472
timetrend             -0.038      0.015   0.011    -0.030      0.012   0.012
(Intercept)           -1.341      0.425   0.002    -0.614      0.337   0.069

Random Effects                         protest   demons_strike
σ2                                        2.07            1.57
τ00 (ccode)                               0.53            0.31
ICC                                       0.20            0.16
N (ccode)                                   84              84
Observations                               874             874
Marginal R2 / Conditional R2     0.244 / 0.397   0.303 / 0.417
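
As a quick sanity check on the random-effects part of this output, the reported ICC appears to be recoverable from the two variance components as τ00 / (τ00 + σ2). The sketch below uses the rounded values printed above, so agreement is only expected to two decimals. The object name vc is mine.

```r
# Rounded variance components from the random-effects section above.
vc <- data.frame(
  model  = c("protest", "demons_strike"),
  sigma2 = c(2.07, 1.57),   # residual (distribution-specific) variance
  tau00  = c(0.53, 0.31)    # between-country intercept variance
)

# Share of total variance attributable to differences between countries.
vc$icc <- vc$tau00 / (vc$tau00 + vc$sigma2)

print(round(vc$icc, 2))
```

The rounded results agree with the ICC values of 0.20 and 0.16 in the table.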


I now turn to model specification 3 with country random effects, which adds an interaction between polarization and dissatisfaction. The first model reproduces the one from the paper, the second one corrects the data error.

In the reproduction, the main effect of polarization is 0.044, SE = 0.019, significant at the 0.05 level. The interaction effect equals -0.004, SE = 0.002, significant at the 0.1 level, as in the published paper.

After correcting the data error the main effect of polarization and the interaction effect become much smaller and neither is statistically significant.
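
With an interaction in the model, the effect of polarization depends on the level of dissatisfaction: on the log scale it is the polar_l coefficient plus the interaction coefficient times disat_l. The sketch below evaluates this at a few illustrative values of centered lagged dissatisfaction, using the rounded coefficients from the two models (0.044 and -0.004 in the reproduction; 0.009 and -0.001 after the correction). The helper cond_effect and the chosen values of disat_l are mine.

```r
# Conditional log-scale effect of polarization at a given dissatisfaction.
cond_effect <- function(b_polar, b_inter, disat) {
  b_polar + b_inter * disat
}

# Illustrative values of centered, lagged dissatisfaction.
disat_vals <- c(-10, 0, 10)

effects <- data.frame(
  disat_l   = disat_vals,
  original  = cond_effect(0.044, -0.004, disat_vals),
  corrected = cond_effect(0.009, -0.001, disat_vals)
)

print(effects)
```

These are point estimates only. Judging whether any of them differs from zero would also require the covariance between the two coefficients, which is not reported here.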

re_original_mod <- glmmTMB(protest ~ protest_l + polar_l*disat_l + 
                         gini_l + I(gini_l^2) + GDPcap_l + Growth_l + Ln_Inflation_l + 
                         Elec_leg_l + Elec_exe_l + polity2_l + reg_durability_l + 
                         terror_l + Urban_l + Ln_pop_l + Presidential + Eth_frag + timetrend +
                  (1 | ccode),
                  data = combo,
                  ziformula = ~0,
                  family = nbinom2)

re_corrected_mod <- glmmTMB(demons_strike ~ demons_strike_l + polar_l*disat_l + 
                          gini_l + I(gini_l^2) + GDPcap_l + Growth_l + Ln_Inflation_l + 
                          Elec_leg_l + Elec_exe_l + polity2_l + reg_durability_l + 
                          terror_l + Urban_l + Ln_pop_l + Presidential + Eth_frag + timetrend +
                  (1 | ccode),
                  data = combo,
                  ziformula = ~0,
                  family = nbinom2)

sjPlot::tab_model(re_original_mod, re_corrected_mod, show.ci = FALSE, show.se = TRUE, transform = NULL,
                  digits = 3, emph.p = FALSE,
                  order.terms = c(2,21,3,4,20,5:19,1))
                              protest                demons_strike
Predictors          Log-Mean std. Error       p  Log-Mean std. Error       p
protest_l              0.198      0.065   0.002
demons_strike_l                                     0.127      0.042   0.002
polar_l                0.044      0.019   0.017     0.009      0.015   0.550
disat_l                0.008      0.015   0.589     0.019      0.012   0.111
polar_l × disat_l     -0.004      0.002   0.060    -0.001      0.002   0.454
gini_l                 0.021      0.021   0.315     0.035      0.016   0.034
gini_l^2              -0.000      0.001   0.962     0.000      0.001   0.786
GDPcap_l              -0.253      0.309   0.414    -0.140      0.237   0.555
Growth_l              -0.031      0.023   0.172    -0.002      0.018   0.898
Ln_Inflation_l        -0.096      0.101   0.345    -0.103      0.079   0.192
Elec_leg_l             0.218      0.201   0.278     0.188      0.158   0.234
Elec_exe_l             0.056      0.290   0.846     0.324      0.218   0.136
polity2_l             -0.098      0.073   0.176    -0.154      0.057   0.007
reg_durability_l      -0.003      0.006   0.665    -0.004      0.005   0.430
terror_l              -0.029      0.147   0.842    -0.072      0.116   0.536
Urban_l                0.027      0.013   0.043     0.019      0.010   0.059
Ln_pop_l               0.408      0.114  <0.001     0.412      0.089  <0.001
Presidential          -0.627      0.451   0.165    -0.417      0.351   0.235
Eth_frag              -0.409      0.772   0.596    -0.469      0.597   0.432
timetrend             -0.037      0.015   0.011    -0.030      0.012   0.012
(Intercept)           -1.286      0.416   0.002    -0.591      0.335   0.077

Random Effects                         protest   demons_strike
σ2                                        2.07            1.57
τ00 (ccode)                               0.47            0.29
ICC                                       0.19            0.16
N (ccode)                                   84              84
Observations                               874             874
Marginal R2 / Conditional R2     0.266 / 0.403   0.308 / 0.417


After fixing the data error, the analysis above provides no evidence that polarization (or dispersion) of satisfaction with democracy predicts the total number of anti-government demonstrations and strikes.

The error needs to be corrected. I contacted the authors in January 2024 and BJPS in May 2024, and nothing has been done yet.

