---
title: "GGO states codebook"
output: rmarkdown::html_vignette
author: James Hollway
date: "2025-09-19"
vignette: >
  %\VignetteIndexEntry{GGO states codebook}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
```{r setup, echo=FALSE, message=FALSE}
library(manystates)
```
## Release 1.0
This document provides a brief overview of the coding rationale for key variables in the list of episodes of independent states and state-like entities in the international system provided in `manystates::states$GGO`. 
Note that this dataset was constructed as a complement to datasets such as the Gleditsch and Ward Revised List of Independent States (`manystates::states$GW`) and Butcher and Griffiths’ International System(s) Dataset (`manystates::states$ISD`). 
As such, it is complete in neither observations nor variables, yet it offers more specificity and some additional entries compared to those datasets.
Work on this dataset was supported by the Swiss National Science Foundation (SNSF)
[Grant Number 188976](https://data.snf.ch/grants/grant/188976): 
"Power and Networks and the Rate of Change in Institutional Complexes" (PANARCHIC).
Please direct all comments and suggestions to:
_James Hollway_
_International Relations/Political Science Department_
_Graduate Institute of International and Development Studies_
_Geneva, Switzerland_
_james.hollway@graduateinstitute.ch_
## States
### StateName, StateNameAlt
This is the name or names of the state or state-like entity. 
Since the dataset includes entities (or dates for these entities) 
that predate the modern interstate system, 
the definition of a state has changed over the period covered, 
but we include these entities here for the sake of comprehensiveness.
Where there are alternative or longer forms of the state name,
or names in other languages, these are included in the `StateNameAlt` variable.
The shorter or more common name is preferred for the `StateName` variable,
so long as it is unambiguous.
### stateID
This is the three-letter code associated with the state or state-like entity. 
These three-letter codes are based on the ISO 3166-1 alpha-3 list, and all codes are consistent with it. 
Additional codes have been added to cover historical and other states that are not covered by the ISO’s own list. 
Where possible, we use the Correlates of War three-letter codes for this purpose, or those used in the `GW` or `ISD` datasets. 
However, in some cases we must select new codes and in such situations, 
we aim to use recognisable, unique codes relying on significant consonants or vowels.
Note that we endeavour to use existing codes where possible for state episodes that are substantially similar in territory and involve some inheritance of the international legal obligations, rights, and recognitions of the predecessor states. 
For this reason there is a series of episodes associated with "RUS", for example, ranging from the Russian Empire, through the USSR, to the Russian Federation. 
However, where a state is not considered the legal successor of its predecessor, 
as with Serbia and Yugoslavia, 
we use different stateID codes (in this case "SRB" and "YUG"). 
In cases of dissolution (see below), the old stateID code should cease, 
whereas in cases of secession, the old stateID code should continue for the rump state.
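As a minimal sketch of how this continuity rule shows up in the data, the following base R snippet groups a toy table by `stateID`. The rows are illustrative stand-ins built from the examples above, not an extract from `manystates::states$GGO`, and the object name `episodes` is hypothetical.

```r
# Toy episode table; illustrative only, not taken from the dataset
episodes <- data.frame(
  stateID   = c("RUS", "RUS", "RUS", "YUG", "SRB"),
  StateName = c("Russian Empire", "USSR", "Russian Federation",
                "Yugoslavia", "Serbia"),
  stringsAsFactors = FALSE
)

# Number of episodes recorded under each code
episodes_per_code <- table(episodes$stateID)

# Names of all episodes that share the "RUS" code
rus_names <- episodes$StateName[episodes$stateID == "RUS"]
```

Codes that recur across several rows indicate a chain of successor episodes, whereas a change of code marks a break in legal succession.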
## Dates
### Begin, End
These are the dates when an episode of state independence is deemed to have begun or ended. 
Dates are coded using the `{messydates}` system, 
which implements ISO’s extended date/time format (ISO 8601-2). 
As such, some dates are entered only as a year, or are annotated with a question mark if the source is uncertain. 
For more details see `{messydates}`.
States that are currently independent have an end date of `9999-12-31`. 
This distinguishes them from missing data, which is always coded `NA`.
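The distinction between ongoing, missing, and uncertain dates can be sketched in base R as follows. This assumes the date values have first been converted to plain character strings; the vector `end` is a hypothetical toy example, and `{messydates}` offers its own tools that may be preferable in practice.

```r
# Toy End values stored as character strings; illustrative only
end <- c("1917", "1850?", "9999-12-31", NA)

# Episodes still ongoing carry the sentinel end date
ongoing <- !is.na(end) & end == "9999-12-31"

# Genuinely missing end dates are NA, not the sentinel
missing <- is.na(end)

# A trailing question mark flags a date the source is unsure about
uncertain <- !is.na(end) & grepl("\\?$", end)

# The first four characters give the year (simple four-digit years only)
end_year <- as.integer(substr(end, 1, 4))
```

Checking `is.na()` before comparing avoids treating a missing end date as an ended or an ongoing episode.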
### Basis
The basis records how the episode of state independence began. 
We adopt many of the categories offered in the ISD dataset,
but add some additional categories to improve specificity:
- _Consolidation_: state created over territory where no unified state previously existed,
often uniting smaller local polities into a single entity
- _Decolonisation_: state born from decolonisation of an empire or colonial metropole,
including the conclusion of a protectorate or trusteeship arrangement
- _Dissolution_: state born as a fragment of a larger state that broke apart 
and ceased to exist (e.g. Austro-Hungarian Empire)
- _Liberation_: state restored after a period of non-existence,
for example following occupation or annexation (e.g. Belgium after WWII occupation)
- _Secession_: state secedes or breaks away from larger state or empire that continues to exist
- _Transformation_: state continues in substance but changes its 
constitutional form, title, or status without foreign conquest or voluntary unification
(e.g. Tsardom of Russia 1721 to Russian Empire)
- _Unification_: state born from the voluntary merging of several 
(typically equally sized) states that previously existed
(e.g. UAE in 1971)
- _Other_: for unusual or unclear cases;
to be used sparingly with an explanation or elaboration required in the comments
Where the code is followed by a `?` annotation, this indicates uncertainty about the coding.
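A minimal sketch of separating the category from its uncertainty annotation, assuming the `Basis` values are available as character strings. The vector `basis` below is a hypothetical toy example, not data from the package.

```r
# Toy Basis values; illustrative only
basis <- c("Secession", "Unification?", "Decolonisation", NA)

# TRUE where the coding carries a trailing "?" annotation
basis_uncertain <- !is.na(basis) & grepl("\\?$", basis)

# Category label with the annotation stripped; NA stays NA
basis_category <- sub("\\?$", "", basis)
```

Keeping the flag in a separate vector lets the cleaned categories be tabulated without losing track of which codings were uncertain.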
### Grounds
The grounds record how the episode of state independence ended. We use the categories offered in the ISD dataset:
- _Annexation_: state taken over by conquest/foreign take-over (e.g. Aceh in 1874 by the Netherlands)
- _Colonisation_: state subjected to imperial, non-contiguous colonisation, 
or becomes a protectorate or vassal (e.g. Mewar 1818 under British protection)
- _Unification_: state ceases through process of voluntary unification or incorporation
(e.g. Croatia 1102 into Hungary)
- _Dissolution_: state ceases through dissolution of the state into several smaller states
(e.g. Gran Colombia 1830)
- _Occupation_: state ceases through occupation by outside powers
(e.g. Albania by Italy in 1939)
- _Partition_: state ceases through partition by outside powers or scission
(e.g. Poland 1795)
- _Revolution_: state ceases through internal revolution or coup
(e.g. Russian Empire 1917)
- _Transformation_: state continues in substance but changes its 
constitutional form, title, or status without foreign conquest or voluntary unification
(e.g. Tsardom of Russia 1721 to Russian Empire)
- _Other_: for unusual or unclear cases;
to be used sparingly with an explanation or elaboration required in the comments
Where the code is followed by a `?` annotation, this indicates uncertainty about the coding.
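The category list above lends itself to a simple consistency check. The sketch below, in base R, compares toy `Grounds` values against the documented categories after removing any `?` annotation. The names `grounds_levels` and `grounds` are hypothetical, and the deliberately misspelled entry is there only to show what the check catches.

```r
# Categories as listed in this codebook
grounds_levels <- c("Annexation", "Colonisation", "Unification",
                    "Dissolution", "Occupation", "Partition",
                    "Revolution", "Transformation", "Other")

# Toy Grounds values; illustrative only, with one deliberate typo
grounds <- c("Annexation", "Partition?", "Disolution", NA)

# Strip the uncertainty annotation before comparing
grounds_clean <- sub("\\?$", "", grounds)

# Non-missing entries that match no documented category
unrecognised <- !is.na(grounds_clean) & !(grounds_clean %in% grounds_levels)
flagged <- grounds[unrecognised]
```

Here `flagged` contains only the misspelled `"Disolution"`; the annotated `"Partition?"` passes because the annotation is removed before matching.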
## Places
### Capital, CapitalAlt
This is the name of the capital city. 
For the most part, this is fairly straightforward. 
In some cases, however, there is a second capital city, 
which appears in the `CapitalAlt` variable.
### Latitude, Longitude
Here we use the latitude and longitude in decimal form. 
If possible, we code the location of the capital city. 
If this is not possible, we attempt to identify the longitude and latitude of the barycentre of the territory.
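Because the coordinates are stored as decimal degrees, a quick range check can catch entry errors. This is a hedged sketch on a hypothetical toy data frame `coords`, with one deliberately invalid latitude. The values are illustrative and not taken from the dataset.

```r
# Toy coordinates in decimal degrees; the third latitude is deliberately invalid
coords <- data.frame(
  Latitude  = c(46.2, -33.9, 95.0, NA),
  Longitude = c(6.1, 18.4, 10.0, NA)
)

# Latitude must lie within [-90, 90]; missing values are not treated as errors
valid_lat <- is.na(coords$Latitude) | abs(coords$Latitude) <= 90

# Longitude must lie within [-180, 180]
valid_lon <- is.na(coords$Longitude) | abs(coords$Longitude) <= 180

# Row numbers with at least one out-of-range coordinate
bad_rows <- which(!(valid_lat & valid_lon))
```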
### Region
We code the region more specifically than in some other datasets. 
We code the region descriptively and as a character string, 
which makes it possible to search by regular expression: a pattern such as “America” returns “Northern America”, “Southern America”, “Central America”, and “Caribbean America”. 
Note that we use the adjectival form, e.g. “Southern Africa”, to distinguish the region from the country “South Africa”. 
We use “Central” to describe areas in the middle of the continent, if applicable. 
The data includes the following regions:
- _Northern America_
- _Southern America_
- _Central America_
- _Caribbean America_
- _Northern Europe_
- _Eastern Europe_
- _Southeastern Europe_
- _Southern Europe_
- _Western Europe_
- _Central Europe_
- _Eastern Asia_
- _Southeastern Asia_
- _Southern Asia_
- _Western Asia_
- _Central Asia_
- _Northern Africa_
- _Eastern Africa_
- _Southern Africa_
- _Western Africa_
- _Central Africa_
- _Oceania_
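
The regular-expression search described above can be sketched in base R as follows. The data frame `places` is a hypothetical toy example with illustrative region assignments, not an extract from the dataset.

```r
# Toy table of entities and regions; illustrative only
places <- data.frame(
  StateName = c("Canada", "Brazil", "Cuba", "Kenya", "South Africa"),
  Region    = c("Northern America", "Southern America", "Caribbean America",
                "Eastern Africa", "Southern Africa"),
  stringsAsFactors = FALSE
)

# Pattern match: every region whose label contains "America"
americas <- places[grepl("America", places$Region), ]

# Exact match: a single region
southern_africa <- places[places$Region == "Southern Africa", ]
```

Because the region label uses the adjectival form, an exact match on `"Southern Africa"` selects the region and cannot be confused with a country name.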
## Coder, Comments, Source
The `Coder` variable is a comma-separated list of the surnames of those who have added or verified data for each entry/observation. 
Where special conditions arise, the `Comments` variable offers a free text area for explanations or recording how the coding has changed from version to version. 
The `Source` variable should contain only links or bibliographic information for the sources used to add or verify information.
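Since `Coder` holds several surnames in one string, it usually needs splitting before any per-coder summary. The following is a minimal base R sketch on a hypothetical toy vector; the surname `Example` is a placeholder rather than a real coder.

```r
# Toy Coder values; "Example" is a placeholder surname
coder <- c("Hollway, Example", "Hollway", NA)

# Split each entry on commas, dropping any following whitespace
coder_list <- strsplit(coder, ",\\s*")

# Coders per entry, counting missing entries as zero
n_coders <- ifelse(is.na(coder), 0L, lengths(coder_list))

# Entries per coder across the whole vector (NA is dropped by table())
entries_per_coder <- table(unlist(coder_list))
```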