Data

I am involved in several ongoing data collection projects on Japanese government expenditures and personnel movements. All datasets will be made publicly available upon completion and validation:

  • Amakudata (with Sayumi Miyano, Diana Stanescu, and Hikaru Yamagishi): A dataset of all Japanese bureaucrats who retired to positions in the private sector (i.e., revolving-door or “amakudari” appointments) from 2009 to 2019. While the full dataset is forthcoming, an R Shiny dashboard, Amakudashboard, that allows users to explore the dataset is currently live.
    Data visualizations
    Data dictionary
    • Coming soon
  • jNPO: A dataset of all subsidies and contracts from the Japanese government to nonprofit organizations (NPOs) from 2011 to 2021, including the agency or ministry that provided the subsidy or made the purchase, the NPO that received the subsidy or contract, and the value of the subsidy or contract.

  • jProcurement (with Hikaru Yamagishi): A dataset of all products procured from the private sector by the Japanese government from 2003 to 2018, including the agency or ministry that made the purchase, the company the product was purchased from, and the value of the contract.

Software

read_dir: An R function that concatenates all data files with common columns in a directory into a single dataset and optionally adds a column identifying the source file for each row.

Code
# ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
# DESCRIPTION ----
# ______________________________________________________________________________

# Last updated 7 April, 2022 by Trevor Incerti

# This file contains a function that concatenates all data files with
# common columns in a directory into a single data set, and optionally
# adds a column identifying the source file for each row.

# This can be useful for e.g., administrative data provided in individual 
# files by city. The current function supports any delimited text data files 
# and Excel files. Support for other data types will be added. 

# ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
# REQUIRED LIBRARIES AND HELPER FUNCTIONS ----
# ______________________________________________________________________________

# Import/define pipe operator from magrittr ------------------------------------
`%>%` <- magrittr::`%>%`

# Helper functions -------------------------------------------------------------
# Both helpers read one file and add a `filename` column holding the file's
# base name without its extension.
read_flnm <- function(flnm, delim = NULL, skip = 0) {
  # Read every column as character so files bind without type conflicts
  readr::read_delim(flnm, delim = delim, skip = skip,
                    col_types = readr::cols(.default = "c")) %>%
    dplyr::mutate(filename = tools::file_path_sans_ext(fs::path_file(flnm)))
}

read_flnm_xl <- function(flnm, sheet = NULL, skip = 0, col_types = NULL) {
  readxl::read_excel(flnm, sheet = sheet, skip = skip, col_types = col_types) %>%
    dplyr::mutate(filename = tools::file_path_sans_ext(fs::path_file(flnm)))
}

# Main function: read in and append all files in a directory ------------------ 
# Function arguments:
# path = File path of the directory where the data files are located.
# extension = Extension of the data files. Currently accepts any extension
# compatible with readr::read_delim, and "xlsx" for Excel.
# delim = Single character used to separate fields within a record, e.g. ",".
# filename = TRUE/FALSE: whether to add a column identifying the source file
# of each row.
# sheet = Sheet to import if importing from Excel.
# skip = Number of rows to skip when importing each file.
# col_types = Column types passed to readxl::read_excel when importing Excel.

read_dir <- function(path, extension, delim = NULL, filename = FALSE,
                     sheet = NULL, skip = 0, col_types = NULL) {
  
  # Stop and display an error if conflicting arguments are entered
  if (!missing(sheet) && extension != "xlsx") {
    stop("Argument 'sheet' only applies to Excel files")
  }
  
  # List the files to read. list.files() expects a regular expression, so
  # escape the dot and anchor the extension at the end of the file name.
  files <- list.files(path = path,
                      pattern = paste0("\\.", extension, "$"),
                      full.names = TRUE)
  
  # Read in delimited text data files
  if (extension != "xlsx" && !filename) {
    files %>%
      purrr::map_df(~readr::read_delim(., delim = delim, skip = skip,
                                       col_types = readr::cols(.default = "c")))
    
  } else if (extension != "xlsx" && filename) {
    files %>%
      purrr::map_df(~read_flnm(., delim = delim, skip = skip))
    
    # Read in Excel data files
  } else if (extension == "xlsx" && !filename) {
    files %>%
      purrr::map_df(~readxl::read_excel(., sheet = sheet, skip = skip,
                                        col_types = col_types))
    
  } else {
    files %>%
      purrr::map_df(~read_flnm_xl(., sheet = sheet, skip = skip,
                                  col_types = col_types))
  }
}
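
For readers who want to see the underlying idea without installing any packages, the following is a minimal base-R sketch of the same technique: read every file in a directory, tag each row with its source file, and stack the results. It is an illustration only and not part of read_dir. The function name combine_csv_dir, its add_filename argument, and the demo file names are hypothetical, and the sketch assumes comma-separated files that share identical column names.

```r
# Hypothetical helper (not part of read_dir): a dependency-free sketch of the
# same idea using only base R.
combine_csv_dir <- function(path, add_filename = TRUE) {
  # list.files() takes a regular expression, so match names ending in ".csv"
  files <- list.files(path, pattern = "\\.csv$", full.names = TRUE)

  pieces <- lapply(files, function(f) {
    # Read every column as character so type guesses never block binding
    d <- utils::read.csv(f, colClasses = "character")
    if (add_filename) {
      # File name without directory or extension, e.g. "kyoto"
      d$filename <- tools::file_path_sans_ext(basename(f))
    }
    d
  })

  # rbind requires all files to share the same column names
  do.call(rbind, pieces)
}

# Usage with two small made-up files written to a temporary directory
demo_dir <- file.path(tempdir(), "read_dir_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, value = c("a", "b")),
          file.path(demo_dir, "kyoto.csv"), row.names = FALSE)
write.csv(data.frame(id = 3, value = "c"),
          file.path(demo_dir, "osaka.csv"), row.names = FALSE)

combined <- combine_csv_dir(demo_dir)
print(combined)
```

Reading all columns as character mirrors the choice made for delimited files in the listing above: it avoids type conflicts when stacking, at the cost of converting column types afterwards.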