Codebooks

In short, a codebook is a document that describes the content, structure, and variables of a dataset. We normally consult the codebook to understand a dataset.

At a minimum, a codebook contains:

  • Introduction: Previous considerations, definitions, etc.
  • Description of the variables.

At a minimum, a dataset contains:

  • Units of analysis and codes.
  • Relevant variables.
  • Other variables.
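To make the two structures concrete, here is a minimal sketch in base R of a toy country-year dataset and a matching codebook stored as a dataframe. All variable names and values (country_code, source_id, and so on) are invented for illustration and do not come from any real dataset.

```r
# Toy country-year dataset: all names and values are illustrative.
dataset <- data.frame(
  country_code = c("AAA", "AAA", "BBB", "BBB"),  # code identifying the country
  year         = c(2000, 2001, 2000, 2001),      # country + year = unit of analysis
  democracy    = c(1, 1, 0, 0),                  # a relevant variable
  source_id    = c(10, 10, 11, 11),              # another variable
  stringsAsFactors = FALSE
)

# Minimal codebook: one row per variable in the dataset.
codebook <- data.frame(
  variable    = c("country_code", "year", "democracy", "source_id"),
  description = c(
    "Three-letter country code",
    "Calendar year of the observation",
    "1 = democracy, 0 = dictatorship",
    "Identifier of the coding source"
  ),
  stringsAsFactors = FALSE
)

# Check that every column of the dataset is documented in the codebook.
undocumented <- setdiff(names(dataset), codebook$variable)
print(undocumented)
```

Real codebooks are usually PDF or text documents rather than dataframes, but the same check applies: every variable in the dataset should have an entry.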

How to use codebooks (Spanish | Catalan)

Democracy and Dictatorship dataset

A simple example of a dataset is the Democracy and Dictatorship (DD) dataset, also known as the Cheibub-Gandhi-Vreeland (CGV) index (Cheibub, Gandhi, and Vreeland 2010). This dataset can be found on the personal webpage of one of the authors, José Antonio Cheibub.

Using Excel

Since the DD dataset is relatively small (9159 observations and 78 variables), a good way to start learning is to open it in Excel or Google Sheets. The video attached to this page shows how to explore the data via Google Sheets dynamic tables. The steps are the following:

  • Download the Excel file from the author’s webpage.
  • Upload the Excel file to Google Sheets.
  • Select all the rows and columns.
  • Go to Data -> Create Filter.

Using R

R requires more knowledge of data wrangling, but this type of software is essential for large datasets (the DD dataset has more than 9,000 observations). The next lines of code show how to load the DD dataset into R and display its first six rows. When we load a dataset into R, it must be stored as an R object, normally a dataframe. Table 1 shows the result of applying the function head() to the dd dataframe.

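library(foreign)  # read.dta() reads Stata .dta files
library(dplyr)    # as_tibble() and glimpse()
# Read the Stata file from the URL and convert it to a tibble (a type of dataframe)
dd <- as_tibble(read.dta("https://uofi.box.com/shared/static/bba3968d7c3397c024ec.dta"))
head(dd)  # display the first six rows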
Table 1: Democracy and Dictatorship dataset (first 11 of the 78 columns; the remaining columns are omitted)

order  ctryname     year  aclpcode  cowcode  cowcode2  ccdcodelet  ccdcodenum  aclpyear  cowcode2year  cowcodeyear
1      Afghanistan  1946  142       700      700       AFG         1           1421946   7001946       7001946
2      Afghanistan  1947  142       700      700       AFG         1           1421947   7001947       7001947
3      Afghanistan  1948  142       700      700       AFG         1           1421948   7001948       7001948
4      Afghanistan  1949  142       700      700       AFG         1           1421949   7001949       7001949
5      Afghanistan  1950  142       700      700       AFG         1           1421950   7001950       7001950
6      Afghanistan  1951  142       700      700       AFG         1           1421951   7001951       7001951
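With 78 columns, the output of head() is hard to read in full. A common way around this is to keep only the variables of interest before printing. The sketch below uses a small toy dataframe instead of the real dd object so that it runs without downloading anything. The column names ctryname, year, cowcode, and democracy are assumed to match those of the DD dataset, and the values are invented.

```r
# Toy stand-in for dd with a few of its column names; values are illustrative.
toy <- data.frame(
  order     = 1:4,
  ctryname  = c("Country A", "Country A", "Country B", "Country B"),
  year      = c(1946, 1947, 1946, 1947),
  cowcode   = c(1, 1, 2, 2),
  democracy = c(0, 0, 1, 1),
  stringsAsFactors = FALSE
)

# Keep only the columns of interest.
keep   <- c("ctryname", "year", "democracy")
narrow <- toy[, keep]

# Keep only the rows for one unit, comparable to a spreadsheet filter.
country_a <- narrow[narrow$ctryname == "Country A", ]
print(head(country_a))
```

Replacing toy with dd applies the same selection to the real data, provided those columns exist under those names.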

These are the most common functions to explore a dataframe, all applied to the dd dataframe:

  • head(dd): Displays the first 6 rows.
  • tail(dd): Displays the last 6 rows.
  • dim(dd): Displays the number of rows and columns.
  • glimpse(dd): Displays all the columns and the first observations of the dataframe in a vertical format.
  • View(dd): Opens a spreadsheet similar to Excel.
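The functions listed above can be tried without downloading the DD dataset. The following sketch applies them to a small toy dataframe whose names and values are invented for illustration. Because glimpse() comes from dplyr, the code falls back to the base function str() when dplyr is not installed, and View() is only called in an interactive session.

```r
# Toy stand-in for dd; the values are illustrative, not taken from the DD data.
toy <- data.frame(
  ctryname  = rep(c("Country A", "Country B"), each = 5),
  year      = rep(2000:2004, times = 2),
  democracy = rep(c(1, 0), each = 5),
  stringsAsFactors = FALSE
)

first_rows <- head(toy)      # first 6 rows (the default)
last_rows  <- tail(toy, 3)   # last 3 rows; the second argument overrides the default of 6
size       <- dim(toy)       # vector with the number of rows and the number of columns

print(first_rows)
print(last_rows)
print(size)

# Column-by-column overview: glimpse() if dplyr is installed, base str() otherwise.
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr::glimpse(toy)
} else {
  str(toy)
}

# View() opens a spreadsheet-style viewer, so only call it interactively.
if (interactive()) View(toy)
```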

Other codebooks

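Other codebooks you might want to explore include:

  • Correlates of War Project Trade Data Set (Barbieri and Keshk 2016; Barbieri, Keshk, and Pollins 2009).
  • UCDP Dyadic Dataset (Pettersson 2020).
  • Correlates of War IGO datasets (Pevehouse et al. 2019).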

References

Barbieri, Katherine, and Omar M. G. Keshk. 2016. “Correlates of War Project Trade Data Set Codebook, Version 4.0. Online.”
Barbieri, Katherine, Omar M. G. Keshk, and Brian M. Pollins. 2009. “Trading Data: Evaluating our Assumptions and Coding Rules.” Conflict Management and Peace Science 26 (5): 471–91.
Cheibub, José Antonio, Jennifer Gandhi, and James Raymond Vreeland. 2010. “Democracy and Dictatorship Revisited.” Public Choice 143 (1-2): 67–101.
Pettersson, Therese. 2020. “UCDP Dyadic Dataset Codebook v 20.1.”
Pevehouse, Jon C. W., Timothy Nordstrom, Roseanne W. McManus, and Anne Spencer Jamison. 2019. “Tracking Organizations in the World: The Correlates of War IGO Version 3.0 datasets.” Journal of Peace Research 57 (3): 492–503.