How to clean a very untidy data set of Freedom House country ratings? The data are saved in an Excel sheet that violates many of the principles of data organization in spreadsheets described in this paper by Karl Broman and Kara Woo, but it is otherwise an invaluable source of data on freedom in the world.
Data source: https://freedomhouse.org/content/freedom-world-data-and-resources
The full code used in this post is available here.
I would do this:
Read in the file,
Contents: Data; Packages; Varieties of Democracy (V-Dem): Dedicated package; Polyarchy: Semicolon delimited CSV file -> rio; Freedom House: Excel file with by-year sheets; Polity IV: SPSS file -> rio; Democracy Barometer: Excel file with header in top rows -> rio; The Standardized World Income Inequality Database (SWIID): Plain CSV file -> rio; World Bank’s World Development Indicators: Dedicated package; Merging all datasets; Writing to file
Shortly after writing this post on importing datasets in different formats (CSV, XLS, XLSX, SAV) to R, I got the following comment:
Contents: Data; Packages; Varieties of Democracy (V-Dem): Dedicated package; Polyarchy: Semicolon delimited CSV file; Freedom House: Excel file with by-year sheets; Polity IV: SPSS file; Democracy Barometer: Excel file with header in top rows; The Standardized World Income Inequality Database (SWIID): Plain CSV file; World Bank’s World Development Indicators: Dedicated package; Merging all datasets; Country graphs; Variable graphs; Writing to file
With Viktoriia Muliavka
Social and political scientists often need to put together datasets of country-level political, economic, and demographic variables with data from different sources.
Contents: Instructions; References
In the previous post, I wrote about downloading and exploring the Survey Data Recycling (SDR) version 1 dataset, which consists of selected harmonized variables from 22 survey projects, 1966-2013.
The SDR project will develop a website for browsing, subsetting, downloading, and visualizing its data. This website is currently under construction. Meanwhile, I made a Shiny app with the basic functionality of the future online browsing and subsetting tool (it also serves as a mock-up): https://mkolczynska.
Contents: Introduction; Downloading the SDR data; Exploring SDR: availability of variables by project; Exploring SDR: availability of variables with different formulations; Identifying surveys containing selected variables; Subsetting the Master File; Country coverage plot
Combining data from different survey projects creates new opportunities for research but, alas, at the cost of increased volume (obviously) and complexity of the data. The Survey Data Recycling project created a dataset with data from 22 international survey projects.