Type: Package
Title: Overflow Data for Quantitative Peace Science Research
Version: 0.1.0
Depends: R (≥ 3.5.0)
Description: These are data and functions to support quantitative peace science research. The data are important state-year information on democracy and wealth, which require periodic updates and regular maintenance. The functions permit some exploratory and diagnostic assessment of the kinds of data in demand by the community, but do not impose many dependencies on the user.
License: GPL-2
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.3.2
Suggests: peacesciencer, testthat (≥ 3.0.0)
Config/testthat/edition: 3
NeedsCompilation: no
Packaged: 2025-04-09 13:01:55 UTC; steve
Author: Steve Miller [aut, cre]
Maintainer: Steve Miller <steve@svmiller.com>
Repository: CRAN
Date/Publication: 2025-04-10 09:10:01 UTC

Democracy (Correlates of War System)

Description

These are estimates of democracy for Correlates of War state system members.

Usage

cw_democracy

Format

A data frame with the following 6 variables.

ccode

a numeric vector for the Correlates of War state code

year

a numeric vector for the year

euds

a numeric vector for the extended Unified Democracy Scores (UDS) estimate in a given year

aeuds

a numeric vector for the adjusted, extended UDS estimate in a given year

polity2

a numeric vector for the polity2 score in a given year

v2x_polyarchy

a numeric vector for the Varieties of Democracy "polyarchy" estimate in a given year

Details

Extended Unified Democracy Scores (UDS) estimates come from Marquez's democracyData package (version 0.5.1). The Varieties of Democracy data are version 15 and likewise come by way of the project's R package (vdemdata).

The "adjusted" version of the UDS estimate is adjusted so that 0 represents the average cut-point for the dichotomous indicators. If it were my call to make, I think these "adjusted" estimates generally have greater face validity, certainly for obvious autocracies, even if one might object that they are somewhat less sanguine than they could or should be about obvious democracies. For the latest years in the sample, apply pnorm() to the values for illustrative cases like Afghanistan, Australia, China, North Korea, Sweden, and the United States to see how the two measures differ when read as probabilistic assessments of whether the state in question is a democracy.
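A minimal sketch of that comparison follows. The helper uds_as_probability() is hypothetical (not part of this package); it assumes a data frame with the columns documented above (a state code column, year, euds, aeuds), keeps the latest year with non-missing UDS estimates, and converts both estimates with pnorm(). The toy values are made up for demonstration and are not real UDS estimates; the six codes are the Correlates of War codes for the illustrative cases.

```r
# Correlates of War codes for the illustrative cases:
# Afghanistan, Australia, China, North Korea, Sweden, United States
illustrative_codes <- c(700, 900, 710, 731, 380, 2)

# Hypothetical helper (not part of this package): for the latest year with
# non-missing UDS estimates, convert euds and aeuds to probabilities via pnorm()
uds_as_probability <- function(data, codes, id = "ccode") {
  has_uds <- !is.na(data$euds) & !is.na(data$aeuds)
  latest <- max(data$year[has_uds])
  keep <- which(data[[id]] %in% codes & data$year == latest & has_uds)
  out <- data[keep, c(id, "year", "euds", "aeuds")]
  out$p_euds <- pnorm(out$euds)    # probability reading of the unadjusted scale
  out$p_aeuds <- pnorm(out$aeuds)  # probability reading of the adjusted scale
  out
}

# Toy values for demonstration only; these are NOT real UDS estimates
toy <- data.frame(
  ccode = c(2, 2, 731, 731),
  year  = c(2019, 2020, 2019, 2020),
  euds  = c(1.2, 1.1, -1.5, -1.6),
  aeuds = c(0.9, 0.8, -1.8, -1.9)
)
uds_as_probability(toy, illustrative_codes)
```

With the package loaded, uds_as_probability(cw_democracy, illustrative_codes) would apply the same steps to the real data. For gw_democracy, pass id = "gwcode" and check gw_system for the corresponding codes first.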

References

Please cite Miller (2022) for peacesciencer. Beyond that, cite the following, contingent on which democracy estimate you are using.

Extended Unified Democracy Scores (UDS)

Marquez, Xavier. 2016. "A Quick Method for Extending the Unified Democracy Scores." doi: 10.2139/ssrn.2753830

Marquez, Xavier. 2020. "democracyData: A package for accessing and manipulating existing measures of democracy." https://github.com/xmarquez/democracyData.

Pemstein, Daniel, Stephen Meserve, and James Melton. 2010. "Democratic Compromise: A Latent Variable Analysis of Ten Measures of Regime Type." Political Analysis 18(4): 426-449.

Polity5

Marshall, Monty G. 2020. "Polity5: Dataset Users' Manual v2018". Center for Systemic Peace. https://www.systemicpeace.org

Varieties of Democracy

Coppedge, Michael, John Gerring, Carl Henrik Knutsen, Staffan I. Lindberg, Jan Teorell, David Altman, Fabio Angiolillo, Michael Bernhard, Agnes Cornell, M. Steven Fish, Linnea Fox, Lisa Gastaldi, Haakon Gjerløw, Adam Glynn, Ana Good God, Sandra Grahn, Allen Hicken, Katrin Kinzelbach, Joshua Krusell, Kyle L. Marquardt, Kelly McMann, Valeriya Mechkova, Juraj Medzihorsky, Natalia Natsika, Anja Neundorf, Pamela Paxton, Daniel Pemstein, Johannes von Römer, Brigitte Seim, Rachel Sigman, Svend-Erik Skaaning, Jeffrey Staton, Aksel Sundström, Marcus Tannenberg, Eitan Tzelgov, Yi-ting Wang, Felix Wiebrecht, Tore Wig, Steven Wilson and Daniel Ziblatt. 2025. "V-Dem Country-Year Dataset v15" Varieties of Democracy (V-Dem) Project. doi: 10.23696/vdemds25.

Maerz, Seraphine, Amanda Edgell, Sebastian Hellmeier, Nina Ilchenko, and Linnea Fox. 2025. "Vdemdata - an R package to load, explore and work with the most recent V-Dem (Varieties of Democracy) and V-Party datasets." Varieties of Democracy (V-Dem) Project. https://www.v-dem.net and https://github.com/vdeminstitute/vdemdata

Examples


str(cw_democracy)
head(cw_democracy)



GDP, Population, and GDP per Capita (Correlates of War System)

Description

These are estimates of GDP, population, and GDP per capita for Correlates of War state system members.

Usage

cw_gdppop

Format

A data frame with the following 8 variables.

ccode

a numeric vector for the Correlates of War state code

year

a numeric vector for the year

mrgdppc

a numeric vector for the estimated GDP per capita in a given year. See Details.

sd_mrgdppc

a numeric vector for the standard deviation of estimated GDP per capita in a given year.

pwtrgdp

a numeric vector for the estimated real GDP in a given year. See Details.

sd_pwtrgdp

a numeric vector for the standard deviation of estimated real GDP in a given year.

pwtpop

a numeric vector for the estimated population in a given year. See Details.

sd_pwtpop

a numeric vector for the standard deviation of estimated population in a given year.

Details

Fariss et al. (2022) use the Gleditsch-Ward system to define their population of cases. The differences between the Gleditsch-Ward and Correlates of War systems are obvious, if often overstated. However, there will be cases where merging one system into the other amounts to a collision, the wreckage of which can't go unnoticed. The canonical cases here are post-WW2 Germany, post-unification Yemen, and all of Serbia/Yugoslavia. These merit further scrutiny by the user.

The underlying data are at the mercy of the Gleditsch-Ward system's definition of the universe of cases that could have a GDP, a population, or a GDP per capita. As a result, there are missing data for Serbia (1916, 1917), Morocco (1905-1912), Egypt (1856-1882), Saudi Arabia (1927-1931), and Laos (1953). I can think of a few imputation procedures for these circumstances, but the user would have to take the initiative to implement one.
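As one illustrative option, and not a procedure this package or Fariss et al. (2022) endorse, linear interpolation within each state can fill interior gaps (missing years with observed years on both sides). The helper interpolate_by_state() is hypothetical and uses base R's approx(). Gaps at the start or end of a state's series stay NA, and whether interpolation is defensible for, say, wartime years is a substantive call for the user. The toy values are made up.

```r
# Hypothetical helper (not part of this package): linearly interpolate interior
# gaps in one column, separately for each state. Edge gaps remain NA.
interpolate_by_state <- function(data, var, id = "ccode") {
  data <- data[order(data[[id]], data$year), ]
  pieces <- lapply(split(data, data[[id]]), function(d) {
    ok <- !is.na(d[[var]])
    if (sum(ok) >= 2 && any(!ok)) {
      # rule = 1 returns NA for years outside the observed range
      d[[var]][!ok] <- approx(d$year[ok], d[[var]][ok],
                              xout = d$year[!ok], rule = 1)$y
    }
    d
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Toy values for demonstration only (made up, not real estimates)
toy <- data.frame(
  ccode   = c(345, 345, 345, 345, 600, 600),
  year    = c(1915, 1916, 1917, 1918, 1904, 1905),
  mrgdppc = c(100, NA, NA, 130, 50, NA)
)
interpolate_by_state(toy, "mrgdppc")
```

With the package loaded, interpolate_by_state(cw_gdppop, "mrgdppc") would apply the same steps; the standard deviation columns would need their own treatment.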

Based on my reading of Fariss et al. (2022), I think the following information gathered from their simulations makes sense as suggested defaults. You may want their actual simulations if you need something else, but I think what is included here is good for most use cases.

For additional clarification, the suggested defaults included in this data set are as follows. The GDP per capita measure (mrgdppc) is anchored to the Maddison Project Database. The GDP (pwtrgdp) and population (pwtpop) measures are anchored to the Penn World Table (version 10.0). You can create a rough estimate of GDP per capita from the Penn World Table simulations with the information in this data set. It's free and the cops can't stop you.
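A minimal sketch of that calculation follows. The helper pwt_gdppc() and the output column pwtgdppc are hypothetical names. This page does not say whether pwtrgdp and pwtpop are stored in levels or in natural logs, or in what units, so the sketch makes you state the scale explicitly; check the data (e.g., with summary()) before choosing. The units of the result follow from the units of the two inputs.

```r
# Hypothetical helper (not part of this package). Whether pwtrgdp and pwtpop
# are stored in levels or natural logs is an assumption you must check.
pwt_gdppc <- function(data, logged) {
  if (logged) {
    # log(GDP / pop) = log(GDP) - log(pop); result stays on the log scale
    data$pwtgdppc <- data$pwtrgdp - data$pwtpop
  } else {
    # Levels: simple ratio of GDP to population
    data$pwtgdppc <- data$pwtrgdp / data$pwtpop
  }
  data
}

# Toy values for demonstration only (made up)
toy <- data.frame(pwtrgdp = c(100, 300), pwtpop = c(10, 20))
pwt_gdppc(toy, logged = FALSE)
```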

I also honor the authors' suggestion to include the standard deviation of each estimate. Everyone likes a point estimate, but the uncertainty around that estimate is also important.
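One simple way to carry that uncertainty forward is an approximate interval around each point estimate. The sketch assumes the uncertainty is roughly normal, which is an assumption rather than something stated on this page, and the helper name approx_interval() is hypothetical. The interval is on whatever scale the estimate is stored on.

```r
# Hypothetical helper (not part of this package): estimate +/- z * sd,
# assuming approximately normal uncertainty around each estimate
approx_interval <- function(est, sd, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
  data.frame(estimate = est, lower = est - z * sd, upper = est + z * sd)
}

# Toy values for demonstration only (made up)
approx_interval(est = c(10, 12), sd = c(0.5, 1))
```

With the package loaded, approx_interval(cw_gdppop$mrgdppc, cw_gdppop$sd_mrgdppc) would do the same for the real columns.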

References

Please cite Miller (2022) for peacesciencer. Cite Fariss et al. (2022) for the simulations. You should also cite the Maddison Project Database (Bolt et al. 2018) and Penn World Table (Feenstra et al. 2015) if that is the underlying source of the data that Fariss et al. (2022) are estimating.

Bolt, Jutta, Robert Inklaar, Herman de Jong, and Jan Luiten van Zanden. 2018. "Rebasing 'Maddison': New Income Comparisons and the Shape of Long-Run Economic Development." Maddison Project Working Paper 10.

Fariss, Christopher J., Therese Anders, Jonathan N. Markowitz, and Miriam Barnum. 2022. "New Estimates of Over 500 Years of Historic GDP and Population Data." Journal of Conflict Resolution 66(3): 553–91.

Feenstra, Robert C., Robert Inklaar, and Marcel P. Timmer. 2015. "The Next Generation of the Penn World Table." American Economic Review 105(10): 3150–82.

Examples


str(cw_gdppop)
head(cw_gdppop)


State-Year Panel for Merging G-W Data (Correlates of War)

Description

This is a state-year panel in which the Correlates of War state system is the population of interest. Its states are matched, as well as one can, with their counterparts in the Gleditsch-Ward system. Its primary use is merging in data demarcated in Gleditsch-Ward state system codes when the primary system in use is the Correlates of War system.

Usage

cw_gw_panel

Format

A data frame with the following 6 variables.

stateabb

the state abbreviation, which was the greatest source of agreement between both data sets

year

a numeric vector for the year

gwcode

a Gleditsch-Ward state code

ccode

a Correlates of War state code

gw_statename

the state name as it appears in the Gleditsch-Ward data.

cw_statename

the state name as it appears in the Correlates of War data.

Details

The data-raw/ directory on GitHub contains more information about how these data were created. The code itself is derived from what peacesciencer did for its cow_gw_years data. It creates daily data for both systems before doing a "full join" on the field with the least friction: state abbreviations. This approach requires the least clean-up.

Merges with these data should match only on the state code and year. The state abbreviations and state names are there for background information, where necessary or appropriate.
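A minimal base-R sketch of such a merge, bringing data keyed by Gleditsch-Ward code into a Correlates of War panel, follows. The helper name and the toy data frames are hypothetical; in practice panel would be cw_gw_panel and gw_data whatever G-W-coded data you have. It keeps only the codes and year from the panel and uses a left join so every CoW state-year is retained.

```r
# Hypothetical helper (not part of this package): left-join G-W-coded data onto
# a CoW-centered panel, matching only on gwcode and year
merge_gw_into_cow <- function(panel, gw_data) {
  keys <- panel[, c("ccode", "gwcode", "year")]
  out <- merge(keys, gw_data, by = c("gwcode", "year"), all.x = TRUE)
  out <- out[order(out$ccode, out$year), ]
  rownames(out) <- NULL
  out
}

# Toy stand-ins for cw_gw_panel and a G-W-coded data set (made-up values)
panel <- data.frame(ccode = c(2, 2, 20), gwcode = c(2, 2, 20),
                    year = c(2000, 2001, 2000))
gw_data <- data.frame(gwcode = c(2, 2), year = c(2000, 2001), x = c(1.5, 1.6))
merge_gw_into_cow(panel, gw_data)
```

State-years with no match in gw_data keep an NA, which makes gaps easy to spot.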

peacesciencer's documentation cautions that the differences between the two systems are obvious, if often overstated. Merging one into the other, where possible, will be unproblematic in almost all cases. The biggest headaches concern German unification, Yemeni unification, and the overall history of Serbia/Yugoslavia.

Examples


str(cw_gw_panel)
head(cw_gw_panel)



States (Correlates of War System)

Description

These are the independent states in the Correlates of War system.

Usage

cw_system

Format

A data frame with the following 5 variables.

ccode

a numeric vector for the Correlates of War state code

cw_abb

a character vector for the state abbreviation

cw_name

a character vector for the state name

start

a date for system entry

end

a date for system exit

Details

The end column is current as of Dec. 31, 2016. States still active at that time have that date in the end column.

References

Gleditsch, Kristian S. and Michael D. Ward. 1999. "A Revised List of Independent States since the Congress of Vienna." International Interactions 25(4): 393–413.

Examples


str(cw_system)
head(cw_system)


State-Year Panel for Merging Correlates of War Data (Gleditsch-Ward)

Description

This is a state-year panel in which the Gleditsch-Ward state system is the population of interest. Its states are matched, as well as one can, with their counterparts in the Correlates of War system. Its primary use is merging in data demarcated in Correlates of War state system codes when the primary system in use is the Gleditsch-Ward system.

Usage

gw_cw_panel

Format

A data frame with the following 6 variables.

stateabb

the state abbreviation, which was the greatest source of agreement between both data sets

year

a numeric vector for the year

gwcode

a Gleditsch-Ward state code

ccode

a Correlates of War state code

gw_statename

the state name as it appears in the Gleditsch-Ward data.

cw_statename

the state name as it appears in the Correlates of War data.

Details

The data-raw/ directory on GitHub contains more information about how these data were created. The code itself is derived from what peacesciencer did for its gw_cow_years data. It creates daily data for both systems before doing a "full join" on the field with the least friction: state abbreviations. This approach requires the least clean-up.

Merges with these data should match only on the state code and year. The state abbreviations and state names are there for background information, where necessary or appropriate.
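A minimal base-R sketch of such a merge, bringing CoW-coded data into a Gleditsch-Ward panel, follows, with a guard against duplicated state-years. The helper name and toy data are hypothetical; in practice panel would be gw_cw_panel and cow_data some CoW-coded data. The guard stops if the result has more than one row for any gwcode-year, which could happen if the CoW data repeat a key. This page does not say whether gw_cw_panel itself ever maps one G-W state-year to two CoW codes, so treat the check as a precaution.

```r
# Hypothetical helper (not part of this package): left-join CoW-coded data onto
# a G-W-centered panel and stop if any gwcode-year appears more than once
merge_cow_into_gw <- function(panel, cow_data) {
  keys <- panel[, c("gwcode", "ccode", "year")]
  out <- merge(keys, cow_data, by = c("ccode", "year"), all.x = TRUE)
  if (anyDuplicated(out[, c("gwcode", "year")]) > 0) {
    stop("merge produced duplicated gwcode-year rows; inspect those cases")
  }
  out <- out[order(out$gwcode, out$year), ]
  rownames(out) <- NULL
  out
}

# Toy stand-ins (made-up values)
panel <- data.frame(gwcode = c(2, 2, 260), ccode = c(2, 2, 255),
                    year = c(1991, 1992, 1991))
cow_data <- data.frame(ccode = c(2, 255), year = c(1991, 1991), y = c(10, 20))
merge_cow_into_gw(panel, cow_data)
```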

peacesciencer's documentation cautions that the differences between the two systems are obvious, if often overstated. Merging one into the other, where possible, will be unproblematic in almost all cases. The biggest headaches concern German unification, Yemeni unification, and the overall history of Serbia/Yugoslavia.

Examples


str(gw_cw_panel)
head(gw_cw_panel)


Democracy (Gleditsch-Ward System)

Description

These are estimates of democracy for Gleditsch-Ward state system members.

Usage

gw_democracy

Format

A data frame with the following 6 variables.

gwcode

a numeric vector for the Gleditsch-Ward state code

year

a numeric vector for the year

euds

a numeric vector for the extended Unified Democracy Scores (UDS) estimate in a given year

aeuds

a numeric vector for the adjusted, extended UDS estimate in a given year

polity2

a numeric vector for the polity2 score in a given year

v2x_polyarchy

a numeric vector for the Varieties of Democracy "polyarchy" estimate in a given year

Details

Extended Unified Democracy Scores (UDS) estimates come from Marquez's democracyData package (version 0.5.1). The Varieties of Democracy data are version 15 and likewise come by way of the project's R package (vdemdata).

The "adjusted" version of the UDS estimate is adjusted so that 0 represents the average cut-point for the dichotomous indicators. If it were my call to make, I think these "adjusted" estimates generally have greater face validity, certainly for obvious autocracies, even if one might object that they are somewhat less sanguine than they could or should be about obvious democracies. For the latest years in the sample, apply pnorm() to the values for illustrative cases like Afghanistan, Australia, China, North Korea, Sweden, and the United States to see how the two measures differ when read as probabilistic assessments of whether the state in question is a democracy.

References

Please cite Miller (2022) for peacesciencer. Beyond that, cite the following, contingent on which democracy estimate you are using.

Extended Unified Democracy Scores (UDS)

Marquez, Xavier. 2016. "A Quick Method for Extending the Unified Democracy Scores." doi: 10.2139/ssrn.2753830

Marquez, Xavier. 2020. "democracyData: A package for accessing and manipulating existing measures of democracy." https://github.com/xmarquez/democracyData.

Pemstein, Daniel, Stephen Meserve, and James Melton. 2010. "Democratic Compromise: A Latent Variable Analysis of Ten Measures of Regime Type." Political Analysis 18(4): 426-449.

Polity5

Marshall, Monty G. 2020. "Polity5: Dataset Users' Manual v2018". Center for Systemic Peace. https://www.systemicpeace.org

Varieties of Democracy

Coppedge, Michael, John Gerring, Carl Henrik Knutsen, Staffan I. Lindberg, Jan Teorell, David Altman, Fabio Angiolillo, Michael Bernhard, Agnes Cornell, M. Steven Fish, Linnea Fox, Lisa Gastaldi, Haakon Gjerløw, Adam Glynn, Ana Good God, Sandra Grahn, Allen Hicken, Katrin Kinzelbach, Joshua Krusell, Kyle L. Marquardt, Kelly McMann, Valeriya Mechkova, Juraj Medzihorsky, Natalia Natsika, Anja Neundorf, Pamela Paxton, Daniel Pemstein, Johannes von Römer, Brigitte Seim, Rachel Sigman, Svend-Erik Skaaning, Jeffrey Staton, Aksel Sundström, Marcus Tannenberg, Eitan Tzelgov, Yi-ting Wang, Felix Wiebrecht, Tore Wig, Steven Wilson and Daniel Ziblatt. 2025. "V-Dem Country-Year Dataset v15" Varieties of Democracy (V-Dem) Project. doi: 10.23696/vdemds25.

Maerz, Seraphine, Amanda Edgell, Sebastian Hellmeier, Nina Ilchenko, and Linnea Fox. 2025. "Vdemdata - an R package to load, explore and work with the most recent V-Dem (Varieties of Democracy) and V-Party datasets." Varieties of Democracy (V-Dem) Project. https://www.v-dem.net and https://github.com/vdeminstitute/vdemdata

Examples


str(gw_democracy)
head(gw_democracy)


GDP, Population, and GDP per Capita (Gleditsch-Ward System)

Description

These are estimates of GDP, population, and GDP per capita for Gleditsch-Ward state system members.

Usage

gw_gdppop

Format

A data frame with the following 8 variables.

gwcode

a numeric vector for the Gleditsch-Ward state code

year

a numeric vector for the year

mrgdppc

a numeric vector for the estimated GDP per capita in a given year. See Details.

sd_mrgdppc

a numeric vector for the standard deviation of estimated GDP per capita in a given year.

pwtrgdp

a numeric vector for the estimated real GDP in a given year. See Details.

sd_pwtrgdp

a numeric vector for the standard deviation of estimated real GDP in a given year.

pwtpop

a numeric vector for the estimated population in a given year. See Details.

sd_pwtpop

a numeric vector for the standard deviation of estimated population in a given year.

Details

Based on my reading of Fariss et al. (2022), I think the following information gathered from their simulations makes sense as suggested defaults. You may want their actual simulations if you need something else, but I think what is included here is good for most use cases.

For additional clarification, the suggested defaults included in this data set are as follows. The GDP per capita measure (mrgdppc) is anchored to the Maddison Project Database. The GDP (pwtrgdp) and population (pwtpop) measures are anchored to the Penn World Table (version 10.0). You can create a rough estimate of GDP per capita from the Penn World Table simulations with the information in this data set. It's free and the cops can't stop you.

I also honor the authors' suggestion to include the standard deviation of each estimate. Everyone likes a point estimate, but the uncertainty around that estimate is also important.
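One way to propagate that uncertainty into later analysis is to simulate plausible values for each state-year. The sketch is hypothetical and rests on two simplifying assumptions: each estimate is treated as normal with the reported standard deviation, and state-years are treated as independent of each other. Drop rows with missing estimates first, since rnorm() returns NA (with a warning) for them. The helper name simulate_draws() and the toy values are made up.

```r
# Hypothetical helper (not part of this package): simulated draws for each row,
# assuming Normal(estimate, sd) and independence across rows
simulate_draws <- function(est, sd, n_draws = 1000, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # rnorm() recycles est and sd; with column-major fill, row i holds draws for est[i]
  matrix(rnorm(length(est) * n_draws, mean = est, sd = sd),
         nrow = length(est), ncol = n_draws)
}

# Toy values for demonstration only (made up)
draws <- simulate_draws(est = c(8.1, 9.4), sd = c(0.2, 0.1),
                        n_draws = 2000, seed = 1)
# 95% simulation interval for each row
t(apply(draws, 1, quantile, probs = c(0.025, 0.975)))
```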

References

Please cite Miller (2022) for peacesciencer. Cite Fariss et al. (2022) for the simulations. You should also cite the Maddison Project Database (Bolt et al. 2018) and Penn World Table (Feenstra et al. 2015) if that is the underlying source of the data that Fariss et al. (2022) are estimating.

Bolt, Jutta, Robert Inklaar, Herman de Jong, and Jan Luiten van Zanden. 2018. "Rebasing 'Maddison': New Income Comparisons and the Shape of Long-Run Economic Development." Maddison Project Working Paper 10.

Fariss, Christopher J., Therese Anders, Jonathan N. Markowitz, and Miriam Barnum. 2022. "New Estimates of Over 500 Years of Historic GDP and Population Data." Journal of Conflict Resolution 66(3): 553–91.

Feenstra, Robert C., Robert Inklaar, and Marcel P. Timmer. 2015. "The Next Generation of the Penn World Table." American Economic Review 105(10): 3150–82.

Examples


str(gw_gdppop)
head(gw_gdppop)


States (Gleditsch-Ward System)

Description

These are the independent states and microstates in the Gleditsch-Ward system.

Usage

gw_system

Format

A data frame with the following 6 variables.

gwcode

a numeric vector for the Gleditsch-Ward state code

gw_abb

a character vector for the state abbreviation

gw_name

a character vector for the state name

microstate

a numeric vector for whether the state is a microstate. 1 = microstate. 0 = not a microstate

start

a date for system entry

end

a date for system exit

Details

The end column is current as of Dec. 31, 2020. States still active at that time have that date in the end column.
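Because the end column carries that date for states still active, a date filter recovers the states in the system on a given day. The helper active_on() is a hypothetical sketch; it also shows one use of the microstate indicator. Toy rows with made-up codes stand in for gw_system.

```r
# Hypothetical helper (not part of this package): rows whose membership spell
# covers `date`, optionally dropping microstates (microstate == 1)
active_on <- function(system_data, date, drop_micro = FALSE) {
  date <- as.Date(date)
  keep <- system_data$start <= date & system_data$end >= date
  if (drop_micro) keep <- keep & system_data$microstate == 0
  system_data[which(keep), ]
}

# Toy stand-in for gw_system (made-up rows)
toy <- data.frame(
  gwcode = c(1, 2, 3),
  microstate = c(0, 1, 0),
  start = as.Date(c("1816-01-01", "1990-05-01", "1900-01-01")),
  end = as.Date(c("2020-12-31", "2020-12-31", "1950-06-30"))
)
active_on(toy, "2020-12-31")
active_on(toy, "2020-12-31", drop_micro = TRUE)
```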

References

Gleditsch, Kristian S. and Michael D. Ward. 1999. "A Revised List of Independent States since the Congress of Vienna." International Interactions 25(4): 393–413.

Examples


str(gw_system)
head(gw_system)


Create a Panel of State-Years from the Correlates of War or Gleditsch-Ward System

Description

state_panel() is a function to create a panel of state-years from one of two major state systems in international relations scholarship.

Usage

state_panel(system = "cow", mry = TRUE)

Arguments

system

a state system (either "cow" or "gw")

mry

logical, defaults to TRUE. If TRUE, the panel extends to the most recently concluded calendar year. If FALSE, the panel ends at the year of the system's last update. See Details.

Details

This function leans on cw_system and gw_system in this package.

The Correlates of War system's last year is 2016. The Gleditsch-Ward system's last year is 2020. This information matters for the mry argument in the function.
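To make the role of mry concrete, the sketch below expands membership spells into state-years. It is a hypothetical re-implementation for illustration, not the source of state_panel(); the helper name make_state_years() and its last_update argument are inventions. It assumes, as the Details for cw_system and gw_system describe, that states still active carry the last-update date (Dec. 31, 2016 or Dec. 31, 2020) in end; with mry = TRUE, those open spells run to the most recently concluded calendar year.

```r
# Hypothetical re-implementation for illustration; NOT the source of state_panel().
# Expands start/end membership spells into one row per state-year.
make_state_years <- function(system_data, id, last_update, mry = TRUE) {
  last_update <- as.Date(last_update)
  end_year <- as.integer(format(last_update, "%Y"))
  if (mry) {
    # Most recently concluded calendar year
    end_year <- max(end_year, as.integer(format(Sys.Date(), "%Y")) - 1L)
  }
  start_y <- as.integer(format(system_data$start, "%Y"))
  stop_y <- as.integer(format(system_data$end, "%Y"))
  # Spells still open at the last update are extended to end_year
  stop_y[which(system_data$end == last_update)] <- end_year
  years <- Map(seq, start_y, stop_y)
  out <- data.frame(code = rep(system_data[[id]], lengths(years)),
                    year = unlist(years))
  names(out)[1] <- id
  out <- unique(out)
  out <- out[order(out[[id]], out$year), ]
  rownames(out) <- NULL
  out
}

# Toy stand-in for cw_system (made-up rows)
sys <- data.frame(ccode = c(2, 300),
                  start = as.Date(c("2014-01-01", "2012-06-01")),
                  end = as.Date(c("2016-12-31", "2015-03-01")))
make_state_years(sys, "ccode", "2016-12-31", mry = FALSE)
```

With the package loaded, make_state_years(cw_system, "ccode", "2016-12-31", mry = FALSE) could be compared against state_panel(mry = FALSE) to see where the real function's handling differs from this sketch.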

Value

state_panel() returns a data frame of state-years corresponding with either the Correlates of War or the Gleditsch-Ward system.

Examples


head(state_panel(), 10)
head(state_panel(system='gw'), 10)