I wanted to flag a soon-to-be-published article by my colleagues at MIT, Mark Bell and Nicholas Miller, looking at whether and how nuclear weapons possession affects conflict.
The paper is interesting on both substantive and methodological grounds. Substantively, they find no statistically discernible difference in conflict propensity between dyads with offsetting nuclear arsenals and other dyads, and this holds even at low levels of conflict. Taken together, those findings provide insufficient evidence for a “stability-instability” paradox, in which the presence of nuclear weapons deters full-scale war while simultaneously increasing the likelihood of lower-level violence.
I think the methodological point they make is perhaps more important, at least for political science practitioners. An established paper in the field, Rauchhaus (2009), had found empirical support for the stability-instability paradox. Rauchhaus made a few mistakes, and helpfully for those of us playing along at home, they are mistakes we could easily imagine making ourselves.
Empirically, he coded the 1999 Kargil conflict between India and Pakistan as a non-war. The casualty count for that conflict is somewhat in dispute, and alas for us, it falls either immediately below or immediately above the 1,000-battle-death threshold that has come to define “war” in political science. Rauchhaus generally relied on the Correlates of War coding of conflict, except in the Kargil case, which means he accepted the Correlates of War coding for many wars with fewer casualties than Kargil. The empirical finding that nuclear dyads are less likely to fight wars depends entirely on whether Kargil is coded as a war: if it is, there is no statistically significant difference between nuclear dyads and their non-nuclear counterparts.

Though Bell and Miller do not mention it, Kargil has the potential to play a similar spoiler role in the deterministic variant of the democratic peace literature, so people would be well advised to pay attention to how Kargil is coded in their dataset whenever war is an important independent or dependent variable. Montgomery and Sagan (2009) flagged this a while ago, but it’s not clear to me that it has fully sunk in.
The other error is one that could be made more generally, and so should be of interest even to political scientists uninterested in war or nuclear weapons. Rauchhaus used a canned package in Stata called xtgee, which estimates a logit generalized estimating equation (GEE). Exciting stuff, no? The problem is that when Kargil is coded as a non-war, there are no wars between nuclear dyads at all, which creates “separation” in the data: nuclear weapons predict non-war perfectly. This should lead to non-identification in GEE, logit, or probit models, and the computer should yell at you in such instances. In this case, the xtgee command in Stata erroneously produced a coefficient estimate anyway, and hence Rauchhaus found what he found. He might have realized his results were fishy if he had reported a table of relative risks instead of just coefficient estimates: his coefficient implied that non-nuclear dyads were 2.7 million times more likely to go to war than nuclear dyads. Bell and Miller instead use a penalized-likelihood estimator developed by Firth that allows parameter estimation even with separation in the data (a downloadable firthlogit package is available for Stata users).
Separation is a fairly common problem, particularly in small datasets or datasets with dichotomously coded rare events. It should be underlined that a common response of statistical programs is to simply drop the offending variable, which yields a computational solution but probably biases the estimates on the remaining variables. This webpage by UCLA helpfully walks through how different statistical packages handle the problem. Others who are methodologically smarter than I am have convinced me to always fit a linear probability model to my data as a robustness check, since linear probability models do not suffer from separation. And if the two sets of coefficient estimates diverge radically, you should be prepared to defend your choice of estimator very strongly.