Type: | Package |
Title: | Overflow Data for Quantitative Peace Science Research |
Version: | 0.1.0 |
Depends: | R (≥ 3.5.0) |
Description: | These are data and functions to support quantitative peace science research. The data are important state-year information on democracy and wealth, which require periodic updates and regular maintenance. The functions permit some exploratory and diagnostic assessment of the kinds of data in demand by the community, but do not impose many dependencies on the user. |
License: | GPL-2 |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
Suggests: | peacesciencer, testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2025-04-09 13:01:55 UTC; steve |
Author: | Steve Miller |
Maintainer: | Steve Miller <steve@svmiller.com> |
Repository: | CRAN |
Date/Publication: | 2025-04-10 09:10:01 UTC |
Democracy (Correlates of War System)
Description
These are estimates of democracy for Correlates of War state system members.
Usage
cw_democracy
Format
A data frame with the following 6 variables.
ccode
a numeric vector for the Correlates of War state code
year
a numeric vector for the year
euds
a numeric vector for the extended Unified Democracy Scores (UDS) estimate in a given year
aeuds
a numeric vector for the adjusted, extended UDS estimate in a given year
polity2
a numeric vector for the polity2 score in a given year
v2x_polyarchy
a numeric vector for the Varieties of Democracy "polyarchy" estimate in a given year
Details
The extended Unified Democracy Scores (UDS) estimates come from version 0.5.1 of Marquez's democracyData package. The Varieties of Democracy data are version 15 and likewise come by way of V-Dem's own R package.
In the "adjusted" version of the UDS estimate, 0 represents the average cut-point for the dichotomous indicators. If it were my call to make, I think these "adjusted" estimates generally have greater face validity, certainly for obvious autocracies, even if one might object that they are somewhat less sanguine than they could or should be about obvious democracies. For the latest years in the sample, run pnorm() on the values returned for illustrative cases like Afghanistan, Australia, China, North Korea, Sweden, and the United States to get an idea of the differences between these measures (as probabilistic assessments of whether the state in question is a democracy).
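A minimal sketch of that check, written as a hypothetical helper (uds_probs() is not part of this package) and run on toy scores invented for illustration, not actual cw_democracy values. With the package loaded, one might instead pass cw_democracy, the Correlates of War codes for the six states named above (2, 380, 700, 710, 731, 900), and a recent year with non-missing UDS estimates.

```r
# Hypothetical helper (not part of this package): convert the UDS latent
# scores for selected states in one year into probabilities via pnorm().
uds_probs <- function(data, codes, yr, code_var = "ccode") {
  sub <- data[data[[code_var]] %in% codes & data$year == yr, ]
  data.frame(
    code    = sub[[code_var]],
    p_euds  = pnorm(sub$euds),   # extended UDS on the probability scale
    p_aeuds = pnorm(sub$aeuds)   # adjusted UDS on the probability scale
  )
}

# Toy scores invented for illustration; NOT actual cw_democracy estimates.
toy <- data.frame(
  ccode = c(1, 2, 3, 4),
  year  = 2020,
  euds  = c(1.2, 1.8, -1.0, -2.0),
  aeuds = c(0.9, 1.4, -1.5, -2.6)
)

uds_probs(toy, codes = c(1, 3), yr = 2020)
```

On the adjusted scale, a score of 0 maps to pnorm(0) = 0.5, which matches the reading of 0 as the average cut-point.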
References
Please cite Miller (2022) for peacesciencer. Beyond that, cite the following, depending on which democracy estimate you use.
Extended Unified Democracy Scores (UDS)
Marquez, Xavier. 2016. "A Quick Method for Extending the Unified Democracy Scores." doi: 10.2139/ssrn.2753830
Marquez, Xavier. 2020. "democracyData: A package for accessing and manipulating existing measures of democracy." https://github.com/xmarquez/democracyData.
Pemstein, Daniel, Stephen Meserve, and James Melton. 2010. "Democratic Compromise: A Latent Variable Analysis of Ten Measures of Regime Type." Political Analysis 18(4): 426-449.
Polity5
Marshall, Monty G. 2020. "Polity5: Dataset Users' Manual v2018." Center for Systemic Peace. https://www.systemicpeace.org
Varieties of Democracy
Coppedge, Michael, John Gerring, Carl Henrik Knutsen, Staffan I. Lindberg, Jan Teorell, David Altman, Fabio Angiolillo, Michael Bernhard, Agnes Cornell, M. Steven Fish, Linnea Fox, Lisa Gastaldi, Haakon Gjerløw, Adam Glynn, Ana Good God, Sandra Grahn, Allen Hicken, Katrin Kinzelbach, Joshua Krusell, Kyle L. Marquardt, Kelly McMann, Valeriya Mechkova, Juraj Medzihorsky, Natalia Natsika, Anja Neundorf, Pamela Paxton, Daniel Pemstein, Johannes von Römer, Brigitte Seim, Rachel Sigman, Svend-Erik Skaaning, Jeffrey Staton, Aksel Sundström, Marcus Tannenberg, Eitan Tzelgov, Yi-ting Wang, Felix Wiebrecht, Tore Wig, Steven Wilson, and Daniel Ziblatt. 2025. "V-Dem Country-Year Dataset v15." Varieties of Democracy (V-Dem) Project. doi: 10.23696/vdemds25.
Maerz, Seraphine, Amanda Edgell, Sebastian Hellmeier, Nina Ilchenko, and Linnea Fox. 2025. "Vdemdata: An R package to load, explore and work with the most recent V-Dem (Varieties of Democracy) and V-Party datasets." Varieties of Democracy (V-Dem) Project. https://www.v-dem.net and https://github.com/vdeminstitute/vdemdata
Examples
str(cw_democracy)
head(cw_democracy)
GDP, Population, and GDP per Capita (Correlates of War System)
Description
These are estimates of GDP, population, and GDP per capita for Correlates of War state system members.
Usage
cw_gdppop
Format
A data frame with the following 8 variables.
ccode
a numeric vector for the Correlates of War state code
year
a numeric vector for the year
mrgdppc
a numeric vector for the estimated GDP per capita in a given year. See Details.
sd_mrgdppc
a numeric vector for the standard deviation of estimated GDP per capita in a given year.
pwtrgdp
a numeric vector for the estimated real GDP in a given year. See Details.
sd_pwtrgdp
a numeric vector for the standard deviation of estimated real GDP in a given year.
pwtpop
a numeric vector for the estimated population in a given year. See Details.
sd_pwtpop
a numeric vector for the standard deviation of estimated population in a given year.
Details
Fariss et al. (2022) use the Gleditsch-Ward system to define their population of cases. The differences between Gleditsch-Ward and Correlates of War are obvious, if often overstated. However, there will be cases where merging one system into the other amounts to a collision, the wreckage of which can't go unnoticed. The canonical cases here are post-WW2 Germany, post-unification Yemen, and all of Serbia/Yugoslavia. Those merit further scrutiny by the user.
The underlying data, as they are, are at the mercy of the Gleditsch-Ward system for describing the universe of cases that could have a GDP, a population, or a GDP per capita. That means there are missing data for Serbia (1916, 1917), Morocco (1905-1912), Egypt (1856-1882), Saudi Arabia (1927-1931), and Laos (1953). I can think of a few imputation procedures under those circumstances, but the user would have to take the initiative to apply one themselves.
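Before choosing any imputation, it helps to list which state-years in a given analysis actually lack estimates. A minimal sketch using base R merge() on toy frames shaped like a Correlates of War state-year panel and cw_gdppop (all values invented): the left join keeps every panel row, so it flags gaps whether they show up as absent rows or as NA values, since this sketch does not assume which form they take.

```r
# Toy frames (invented values) shaped like a CoW state-year panel and cw_gdppop.
panel <- data.frame(ccode = c(1, 1, 2, 2), year = c(1916, 1917, 1916, 1917))
gdp   <- data.frame(ccode = c(1, 2, 2), year = c(1916, 1916, 1917),
                    mrgdppc = c(1500, 2300, NA))

# Keep every panel row; state-years absent from `gdp` get NA after the join.
joined <- merge(panel, gdp, by = c("ccode", "year"), all.x = TRUE)

# State-years with no GDP per capita estimate, whether absent or NA in `gdp`.
gaps <- joined[is.na(joined$mrgdppc), c("ccode", "year")]
gaps
```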
Based on my reading of Fariss et al. (2022), I think the following quantities from their simulations make sense as suggested defaults. You may want to get their actual simulations if you want something else, but I think what's included here is good for most use cases.
For additional clarification, the suggested defaults included in this data set are:
GDP per capita: real GDP per capita in prices constant across countries and over time (in 2011 international dollars, PPP).
GDP: expenditure-side real GDP in prices constant across countries and over time (in millions of 2017 international dollars, PPP).
Population: total population (in millions).
The GDP per capita measure is anchored to the Maddison Project Database. The GDP and population measures are anchored to the Penn World Table (version 10.0). You can create a rough estimate of GDP per capita from the Penn World Table simulations using the GDP and population columns in this data set. It's free and the cops can't stop you.
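A minimal sketch of that rough estimate, assuming pwtrgdp (millions of 2017 international dollars) and pwtpop (millions of people) are stored on the raw scale described above. The millions cancel, so the ratio is in 2017 international dollars per person, a different price base from mrgdppc (2011 dollars), so the two are not directly comparable. Values are invented for illustration.

```r
# Toy rows shaped like cw_gdppop (values invented for illustration).
toy <- data.frame(
  ccode   = c(1, 2),
  year    = 2010,
  pwtrgdp = c(500000, 12000),  # millions of 2017 international dollars
  pwtpop  = c(50, 8)           # millions of people
)

# Millions cancel: the result is 2017 international dollars per person.
toy$pwtrgdppc <- toy$pwtrgdp / toy$pwtpop
toy$pwtrgdppc
```

If the columns turn out to be stored as logs instead, the equivalent step is exp(pwtrgdp - pwtpop).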
I also follow the authors' suggestion to include the standard deviations of these estimates. Everyone likes a point estimate, but the uncertainty around the estimate matters too.
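One simple way to carry a standard deviation forward is an approximate 95% interval, estimate ± 1.96 × sd. This sketch assumes the simulated values are roughly normal around the point estimate, which is an assumption here rather than something documented for these data. Values are invented.

```r
# Toy rows shaped like cw_gdppop (values invented for illustration).
toy <- data.frame(
  ccode      = c(1, 2),
  year       = 2010,
  mrgdppc    = c(8000, 1200),
  sd_mrgdppc = c(400, 150)
)

# Approximate 95% interval, assuming rough normality of the simulations.
z <- qnorm(0.975)  # about 1.96
toy$lo <- toy$mrgdppc - z * toy$sd_mrgdppc
toy$hi <- toy$mrgdppc + z * toy$sd_mrgdppc
toy[, c("ccode", "lo", "hi")]
```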
References
Please cite Miller (2022) for peacesciencer and Fariss et al. (2022) for the simulations. You should also cite the Maddison Project Database (Bolt et al. 2018) or the Penn World Table (Feenstra et al. 2015), depending on which is the underlying source of the estimates you use.
Bolt, Jutta, Robert Inklaar, Herman de Jong, and Jan Luiten van Zanden. 2018. "Rebasing 'Maddison': New Income Comparisons and the Shape of Long-Run Economic Development." Maddison Project Working Paper 10.
Fariss, Christopher J., Therese Anders, Jonathan N. Markowitz, and Miriam Barnum. 2022. "New Estimates of Over 500 Years of Historic GDP and Population Data." Journal of Conflict Resolution 66(3): 553–91.
Feenstra, Robert C., Robert Inklaar, and Marcel P. Timmer. 2015. "The Next Generation of the Penn World Table." American Economic Review 105(10): 3150–82.
Examples
str(cw_gdppop)
head(cw_gdppop)
State-Year Panel for Merging G-W Data (Correlates of War)
Description
This is a state-year panel in which the Correlates of War state system is the population of interest. Its states are matched, as well as one can, with their counterparts in the Gleditsch-Ward system. Its primary use is merging in data keyed on Gleditsch-Ward state codes when the primary system in use is the Correlates of War system.
Usage
cw_gw_panel
Format
A data frame with the following 6 variables.
stateabb
the state abbreviation, which was the greatest source of agreement between the two data sets
year
a numeric vector for the year
gwcode
a Gleditsch-Ward state code
ccode
a Correlates of War state code
gw_statename
the state name as it appears in the Gleditsch-Ward data.
cw_statename
the state name as it appears in the Correlates of War data.
Details
The data-raw/ directory on GitHub contains more information about how these data were created. The code itself is derived from what peacesciencer did for its cow_gw_years data. It amounts to creating daily data for both systems and then doing a "full join" where there is the least friction: state abbreviations. That approach requires the least amount of clean-up.
Merges with these data should use only the state code and year. The state abbreviations and state names are there for background information, where necessary or appropriate.
peacesciencer's documentation cautions that the differences between the two systems are obvious, if often overstated. Merging one into the other, where possible, will be unproblematic in almost all cases. The biggest headaches concern German unification, Yemeni unification, and the overall history of Serbia/Yugoslavia.
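A minimal sketch of the intended use, in two base R merge() steps on toy frames: first attach the Gleditsch-Ward code to each Correlates of War state-year, then bring in the Gleditsch-Ward-keyed variable. The crosswalk rows are hypothetical stand-ins for cw_gw_panel; the Germany-like row reflects the familiar case where the Correlates of War system codes unified Germany 255 and the Gleditsch-Ward system codes it 260. The variable x and its values are invented.

```r
# Hypothetical crosswalk rows in the shape of cw_gw_panel (toy values).
crosswalk <- data.frame(
  ccode  = c(2, 2, 255),
  gwcode = c(2, 2, 260),
  year   = c(2000, 2001, 2000)
)

# A CoW-keyed analysis frame and some G-W-keyed data to bring in (toy values).
cow_frame <- data.frame(ccode = c(2, 2, 255), year = c(2000, 2001, 2000))
gw_data   <- data.frame(gwcode = c(2, 2, 260), year = c(2000, 2001, 2000),
                        x = c(1.5, 1.7, 0.9))

# Step 1: attach the G-W code to each CoW state-year (code and year only).
with_gw <- merge(cow_frame, crosswalk, by = c("ccode", "year"), all.x = TRUE)

# Step 2: bring in the G-W-keyed variable on gwcode and year.
merged <- merge(with_gw, gw_data, by = c("gwcode", "year"), all.x = TRUE)
merged
```

Around the problem cases named above, it is worth checking that the row count after each step matches the CoW frame, since a state-year matching more than one code would silently duplicate rows.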
Examples
str(cw_gw_panel)
head(cw_gw_panel)
States (Correlates of War System)
Description
These are the independent states in the Correlates of War system.
Usage
cw_system
Format
A data frame with the following 5 variables.
ccode
a numeric vector for the Correlates of War state code
cw_abb
a character vector for the state abbreviation
cw_name
a character vector for the state name
start
a date for system entry
end
a date for system exit
Details
The end column is current as of Dec. 31, 2016. That date is reflected in the end column for states still active today.
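Because states still active carry the censoring date in end, a simple equality test on that column picks them out. A minimal sketch on toy rows shaped like cw_system (names and dates invented):

```r
# Toy rows shaped like cw_system (names and dates invented).
sys <- data.frame(
  ccode   = c(1, 2, 3),
  cw_name = c("State A", "State B", "State C"),
  start   = as.Date(c("1816-01-01", "1900-05-01", "1950-01-01")),
  end     = as.Date(c("2016-12-31", "1945-05-08", "2016-12-31")),
  stringsAsFactors = FALSE
)

# States still in the system carry the censoring date in `end`.
cw_censor <- as.Date("2016-12-31")
active <- sys[sys$end == cw_censor, ]
active$cw_name
```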
References
Gleditsch, Kristian S. and Michael D. Ward. 1999. "A Revised List of Independent States since the Congress of Vienna." International Interactions 25(4): 393–413.
Examples
str(cw_system)
head(cw_system)
State-Year Panel for Merging Correlates of War Data (Gleditsch-Ward)
Description
This is a state-year panel in which the Gleditsch-Ward state system is the population of interest. Its states are matched, as well as one can, with their counterparts in the Correlates of War system. Its primary use is merging in data keyed on Correlates of War state codes when the primary system in use is the Gleditsch-Ward system.
Usage
gw_cw_panel
Format
A data frame with the following 6 variables.
stateabb
the state abbreviation, which was the greatest source of agreement between the two data sets
year
a numeric vector for the year
gwcode
a Gleditsch-Ward state code
ccode
a Correlates of War state code
gw_statename
the state name as it appears in the Gleditsch-Ward data.
cw_statename
the state name as it appears in the Correlates of War data.
Details
The data-raw/ directory on GitHub contains more information about how these data were created. The code itself is derived from what peacesciencer did for its gw_cow_years data. It amounts to creating daily data for both systems and then doing a "full join" where there is the least friction: state abbreviations. That approach requires the least amount of clean-up.
Merges with these data should use only the state code and year. The state abbreviations and state names are there for background information, where necessary or appropriate.
peacesciencer's documentation cautions that the differences between the two systems are obvious, if often overstated. Merging one into the other, where possible, will be unproblematic in almost all cases. The biggest headaches concern German unification, Yemeni unification, and the overall history of Serbia/Yugoslavia.
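Since the headaches concentrate in a few cases, one quick diagnostic is to flag any Gleditsch-Ward state-year that matches more than one Correlates of War code, because such rows would duplicate observations in a downstream merge. A minimal sketch with base R duplicated() on hypothetical rows shaped like gw_cw_panel (codes invented; dup_keys() is an illustrative helper, not part of this package):

```r
# Hypothetical rows shaped like gw_cw_panel (codes invented for illustration).
crosswalk <- data.frame(
  gwcode = c(100, 101, 101),
  ccode  = c(100, 201, 202),
  year   = c(1990, 1990, 1990)
)

# Hypothetical helper: return every row whose key combination appears more than once.
dup_keys <- function(df, keys) {
  d <- duplicated(df[, keys]) | duplicated(df[, keys], fromLast = TRUE)
  df[d, ]
}

# G-W state-years that match more than one CoW code.
res <- dup_keys(crosswalk, c("gwcode", "year"))
res
```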
Examples
str(gw_cw_panel)
head(gw_cw_panel)
Democracy (Gleditsch-Ward System)
Description
These are estimates of democracy for Gleditsch-Ward state system members.
Usage
gw_democracy
Format
A data frame with the following 6 variables.
gwcode
a numeric vector for the Gleditsch-Ward state code
year
a numeric vector for the year
euds
a numeric vector for the extended Unified Democracy Scores (UDS) estimate in a given year
aeuds
a numeric vector for the adjusted, extended UDS estimate in a given year
polity2
a numeric vector for the polity2 score in a given year
v2x_polyarchy
a numeric vector for the Varieties of Democracy "polyarchy" estimate in a given year
Details
The extended Unified Democracy Scores (UDS) estimates come from version 0.5.1 of Marquez's democracyData package. The Varieties of Democracy data are version 15 and likewise come by way of V-Dem's own R package.
In the "adjusted" version of the UDS estimate, 0 represents the average cut-point for the dichotomous indicators. If it were my call to make, I think these "adjusted" estimates generally have greater face validity, certainly for obvious autocracies, even if one might object that they are somewhat less sanguine than they could or should be about obvious democracies. For the latest years in the sample, run pnorm() on the values returned for illustrative cases like Afghanistan, Australia, China, North Korea, Sweden, and the United States to get an idea of the differences between these measures (as probabilistic assessments of whether the state in question is a democracy).
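Since 0 is the average cut-point on the adjusted scale, pnorm(aeuds) exceeds 0.5 exactly when aeuds exceeds 0. A minimal sketch of a quick dichotomous coding built on that fact, using toy values invented for illustration rather than actual gw_democracy estimates; the 0.5 threshold is an assumption of this sketch, not a recommendation from the package.

```r
# Toy adjusted-UDS scores invented for illustration; NOT actual estimates.
toy <- data.frame(
  gwcode = c(1, 2, 3),
  year   = 2020,
  aeuds  = c(-0.8, 0.0, 1.3)
)

# Probability of being a democracy, read off the standard normal CDF.
toy$p_dem <- pnorm(toy$aeuds)

# Hypothetical dichotomous coding: 1 when the probability exceeds 0.5,
# which is the same as the adjusted score exceeding the cut-point of 0.
toy$dem <- as.integer(toy$p_dem > 0.5)
toy
```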
References
Please cite Miller (2022) for peacesciencer. Beyond that, cite the following, depending on which democracy estimate you use.
Extended Unified Democracy Scores (UDS)
Marquez, Xavier. 2016. "A Quick Method for Extending the Unified Democracy Scores." doi: 10.2139/ssrn.2753830
Marquez, Xavier. 2020. "democracyData: A package for accessing and manipulating existing measures of democracy." https://github.com/xmarquez/democracyData.
Pemstein, Daniel, Stephen Meserve, and James Melton. 2010. "Democratic Compromise: A Latent Variable Analysis of Ten Measures of Regime Type." Political Analysis 18(4): 426-449.
Polity5
Marshall, Monty G. 2020. "Polity5: Dataset Users' Manual v2018." Center for Systemic Peace. https://www.systemicpeace.org
Varieties of Democracy
Coppedge, Michael, John Gerring, Carl Henrik Knutsen, Staffan I. Lindberg, Jan Teorell, David Altman, Fabio Angiolillo, Michael Bernhard, Agnes Cornell, M. Steven Fish, Linnea Fox, Lisa Gastaldi, Haakon Gjerløw, Adam Glynn, Ana Good God, Sandra Grahn, Allen Hicken, Katrin Kinzelbach, Joshua Krusell, Kyle L. Marquardt, Kelly McMann, Valeriya Mechkova, Juraj Medzihorsky, Natalia Natsika, Anja Neundorf, Pamela Paxton, Daniel Pemstein, Johannes von Römer, Brigitte Seim, Rachel Sigman, Svend-Erik Skaaning, Jeffrey Staton, Aksel Sundström, Marcus Tannenberg, Eitan Tzelgov, Yi-ting Wang, Felix Wiebrecht, Tore Wig, Steven Wilson, and Daniel Ziblatt. 2025. "V-Dem Country-Year Dataset v15." Varieties of Democracy (V-Dem) Project. doi: 10.23696/vdemds25.
Maerz, Seraphine, Amanda Edgell, Sebastian Hellmeier, Nina Ilchenko, and Linnea Fox. 2025. "Vdemdata: An R package to load, explore and work with the most recent V-Dem (Varieties of Democracy) and V-Party datasets." Varieties of Democracy (V-Dem) Project. https://www.v-dem.net and https://github.com/vdeminstitute/vdemdata
Examples
str(gw_democracy)
head(gw_democracy)
GDP, Population, and GDP per Capita (Gleditsch-Ward System)
Description
These are estimates of GDP, population, and GDP per capita for Gleditsch-Ward state system members.
Usage
gw_gdppop
Format
A data frame with the following 8 variables.
gwcode
a numeric vector for the Gleditsch-Ward state code
year
a numeric vector for the year
mrgdppc
a numeric vector for the estimated GDP per capita in a given year. See Details.
sd_mrgdppc
a numeric vector for the standard deviation of estimated GDP per capita in a given year.
pwtrgdp
a numeric vector for the estimated real GDP in a given year. See Details.
sd_pwtrgdp
a numeric vector for the standard deviation of estimated real GDP in a given year.
pwtpop
a numeric vector for the estimated population in a given year. See Details.
sd_pwtpop
a numeric vector for the standard deviation of estimated population in a given year.
Details
Based on my reading of Fariss et al. (2022), I think the following quantities from their simulations make sense as suggested defaults. You may want to get their actual simulations if you want something else, but I think what's included here is good for most use cases.
For additional clarification, the suggested defaults included in this data set are:
GDP per capita: real GDP per capita in prices constant across countries and over time (in 2011 international dollars, PPP).
GDP: expenditure-side real GDP in prices constant across countries and over time (in millions of 2017 international dollars, PPP).
Population: total population (in millions).
The GDP per capita measure is anchored to the Maddison Project Database. The GDP and population measures are anchored to the Penn World Table (version 10.0). You can create a rough estimate of GDP per capita from the Penn World Table simulations using the GDP and population columns in this data set. It's free and the cops can't stop you.
I also follow the authors' suggestion to include the standard deviations of these estimates. Everyone likes a point estimate, but the uncertainty around the estimate matters too.
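A GDP per capita figure built by dividing pwtrgdp by pwtpop has no standard deviation of its own, and the two reported standard deviations are not enough to derive one without knowing how the GDP and population estimates covary. A rough Monte Carlo sketch follows. It assumes independent normal draws for GDP and population, a simplifying assumption that may well not hold, and it uses invented values for a single state-year.

```r
set.seed(123)

# Toy values for one gw_gdppop-style state-year (invented for illustration).
gdp <- 1000; sd_gdp <- 50   # millions of 2017 international dollars
pop <- 10;   sd_pop <- 0.5  # millions of people

# Assumes independent normal draws for GDP and population.
n <- 10000
ratio <- rnorm(n, gdp, sd_gdp) / rnorm(n, pop, sd_pop)

# Approximate 95% interval for GDP per capita (2017 dollars per person).
interval <- quantile(ratio, c(0.025, 0.975))
interval
```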
References
Please cite Miller (2022) for peacesciencer and Fariss et al. (2022) for the simulations. You should also cite the Maddison Project Database (Bolt et al. 2018) or the Penn World Table (Feenstra et al. 2015), depending on which is the underlying source of the estimates you use.
Bolt, Jutta, Robert Inklaar, Herman de Jong, and Jan Luiten van Zanden. 2018. "Rebasing 'Maddison': New Income Comparisons and the Shape of Long-Run Economic Development." Maddison Project Working Paper 10.
Fariss, Christopher J., Therese Anders, Jonathan N. Markowitz, and Miriam Barnum. 2022. "New Estimates of Over 500 Years of Historic GDP and Population Data." Journal of Conflict Resolution 66(3): 553–91.
Feenstra, Robert C., Robert Inklaar, and Marcel P. Timmer. 2015. "The Next Generation of the Penn World Table." American Economic Review 105(10): 3150–82.
Examples
str(gw_gdppop)
head(gw_gdppop)
States (Gleditsch-Ward System)
Description
These are the independent states and microstates in the Gleditsch-Ward system.
Usage
gw_system
Format
A data frame with the following 6 variables.
gwcode
a numeric vector for the Gleditsch-Ward state code
gw_abb
a character vector for the state abbreviation
gw_name
a character vector for the state name
microstate
a numeric vector for whether the state is a microstate. 1 = microstate. 0 = not a microstate
start
a date for system entry
end
a date for system exit
Details
The end column is current as of Dec. 31, 2020. That date is reflected in the end column for states still active today.
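The censoring date in end combines naturally with the microstate indicator, for example to keep only states that are still active and are not microstates. A minimal sketch on toy rows shaped like gw_system (names and dates invented):

```r
# Toy rows shaped like gw_system (names and dates invented).
sys <- data.frame(
  gwcode     = c(1, 2, 3),
  gw_name    = c("State A", "State B", "State C"),
  microstate = c(0, 1, 0),
  start      = as.Date(c("1816-01-01", "1960-01-01", "1950-01-01")),
  end        = as.Date(c("2020-12-31", "2020-12-31", "1990-01-01")),
  stringsAsFactors = FALSE
)

# Still-active states carry the censoring date; drop microstates (1).
gw_censor <- as.Date("2020-12-31")
keep <- sys[sys$end == gw_censor & sys$microstate == 0, ]
keep$gw_name
```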
References
Gleditsch, Kristian S. and Michael D. Ward. 1999. "A Revised List of Independent States since the Congress of Vienna." International Interactions 25(4): 393–413.
Examples
str(gw_system)
head(gw_system)
Create a Panel of State-Years from the Correlates of War or Gleditsch-Ward System
Description
state_panel() is a function to create a panel of state-years from one of the two major state systems in international relations scholarship.
Usage
state_panel(system = "cow", mry = TRUE)
Arguments
system | a state system (either "cow" or "gw") |
mry | logical, defaults to TRUE. If TRUE, the panel created extends to the most recently concluded calendar year. If FALSE, the panel created ends at the year of last update. See Details for more. |
Details
This function leans on cw_system and gw_system in this package.
The Correlates of War system's last year is 2016. The Gleditsch-Ward system's last year is 2020. These are the years at which the panel ends when mry = FALSE.
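A minimal sketch of how a state-year panel can be built from system entry and exit dates, and how an mry-style option might extend states still active at the censoring date. toy_state_panel() is hypothetical and is not the package's implementation of state_panel(); it assumes, per the Details of cw_system and gw_system, that any row whose end equals the censoring date is a state still active today. Codes and dates are invented.

```r
# Hypothetical sketch, not the package's state_panel() implementation.
toy_state_panel <- function(sys, censor_date, mry = TRUE,
                            current_year = as.integer(format(Sys.Date(), "%Y"))) {
  start_yr <- as.integer(format(sys$start, "%Y"))
  end_yr   <- as.integer(format(sys$end, "%Y"))
  # States still active carry the censoring date; with mry = TRUE, extend
  # them to the most recently concluded calendar year.
  if (mry) {
    active <- sys$end == censor_date
    end_yr[active] <- current_year - 1L
  }
  # One row per state per year of each membership spell.
  spells <- lapply(seq_len(nrow(sys)), function(i) {
    data.frame(code = sys$code[i], year = seq(start_yr[i], end_yr[i]))
  })
  do.call(rbind, spells)
}

# Toy system data (invented codes and dates).
sys <- data.frame(
  code  = c(1, 2),
  start = as.Date(c("2014-01-01", "2015-03-01")),
  end   = as.Date(c("2016-12-31", "2016-06-30"))
)

toy_state_panel(sys, as.Date("2016-12-31"), mry = FALSE)
toy_state_panel(sys, as.Date("2016-12-31"), mry = TRUE, current_year = 2019L)
```

Matching on the exact censoring date, rather than the censoring year, keeps a state that genuinely exited earlier in that year from being extended.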
Value
state_panel() returns a data frame of state-years corresponding with either the Correlates of War or the Gleditsch-Ward system.
Examples
head(state_panel(), 10)
head(state_panel(system='gw'), 10)