CAI Yishu

Ph.D. Candidate | CityU of HK

Let every man be his own methodologist, let every man be his own theorist. ― C. Wright Mills


Drive


Codes and Repositories

Twitter Scraper without API: snscrape

Twitter discontinued its free API in September 2020, which broke API-based libraries such as GetOldTweets and twint.

snscrape mimics a user scrolling through the page and does not depend on the API. Although it is slower and less accurate, it is a satisfactory substitute for the once-popular libraries.

A Jupyter Notebook I wrote that scrapes tweets by ID and posting time is available here. The code retries automatically, so you don't have to worry about your IP address being blocked.
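The notebook's own retry logic is not reproduced here. As a rough sketch of the general idea, the helper below retries a failing call with exponentially growing pauses. The names `retry`, `fetch`, `max_attempts`, and `base_delay` are hypothetical and belong neither to snscrape nor to the notebook; `fetch` stands in for whatever scraping call you make.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    fetch: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it succeeds or max_attempts is exhausted.

    The pause doubles after every failure: base_delay, 2 * base_delay, ...
    `fetch` is a placeholder for a scraping call (an assumption, not a
    snscrape function).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception:
            # Give up and re-raise the last error once the budget is spent.
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")
```

Passing `sleep` as a parameter is only a convenience for trying the helper without real waiting; in normal use the default `time.sleep` applies.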

Web Scraping using Stata

Although Stata is not commonly used for web scraping, it performs very well on simple table-scraping tasks. One example is using Stata to scrape the CSR index from HeXun.

Code for Textual Data Processing and Topic Modeling

Management researchers apply topic modeling to textual data to capture phenomenon-based constructs. I have uploaded a notebook that processes earnings call transcripts and implements topic modeling, for replication and reference.

The code was written for my research project. The earnings call transcripts were scraped from Seeking Alpha.
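For readers new to this workflow, the preprocessing step that usually precedes topic modeling can be sketched as follows. This is not the notebook's code: the stop-word list, the function names, and the two sample sentences are made up for illustration, and only the Python standard library is used.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list (an assumption; real projects use a fuller one).
STOP_WORDS = {
    "the", "a", "an", "and", "of", "to", "in", "we", "our",
    "is", "are", "for", "on", "this",
}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words and very short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]


def document_term_counts(docs):
    """Return one bag-of-words Counter per document."""
    return [Counter(tokenize(d)) for d in docs]


# Made-up sentences standing in for earnings call transcripts.
docs = [
    "We expect revenue growth in the cloud segment this quarter.",
    "Supply chain costs are rising, and margins are under pressure.",
]
counts = document_term_counts(docs)
```

Per-document counts like these are the usual input to a topic model such as LDA, which a dedicated library would then fit.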

Batch Data Conversion from Excel to Stata

You can copy the code below into Stata to convert hundreds of Excel files to .dta format and label each variable.

clear all
set excelxlsxlargefile on
// install the fs command from SSC if it is not already installed
capture ssc install fs
fs *.xls *.xlsx
// convert all Excel files to Stata files
foreach file in `r(files)' {
	di "`file'"
	import excel "`file'", clear
	save "`file'.dta", replace
}
// assign a label to each variable through a loop statement
// (row 1 of each sheet holds the variable name, rows 2-3 the label parts)
fs *.dta
foreach file in `r(files)' {
	use "`file'", clear
	di "`file'"
	qui foreach v of varlist _all {
		local v_name = `v'[1]
		local v_label = `v'[2] + "_" + `v'[3]
		di "`v'"
		cap rename `v' `v_name'
		cap la var `v_name' "`v_label'"
	}
	// drop the three header rows and lowercase all variable names
	drop in 1/3
	ren *, lower
	// convert string columns that contain only numbers to numeric
	foreach v of varlist _all {
		cap destring `v', replace
	}
	save, replace
}

Databases

Religion, SMO & Globalization

Some open data sources on the globalization of religions and social movement organizations (SMOs):

Programs or Studies: Description (compiled by CAI Yishu)

1. Global Religious Futures Project: Analyzes religious change and its impact on societies around the world.
2. Social Movements and Religion Database: The most comprehensive archive of religious social movements. For visualization, click “Site Tutorials.”
3. Religious Landscape Study: Covers general social and political views.
4. The Religious Movements Homepage Project: Detailed profiles of more than two hundred religious groups and movements.
5. Kirmani (2008): See also the University of Birmingham’s project “Religions and Development.”

Careers of Chinese Local Officials

This file tracks the careers of Chinese local officials as of 2020. It covers officials from the county level to the provincial level.

Chinese-English Name Translation

A website supported by the Taiwan government romanizes Chinese names using Hanyu Pinyin (漢語拼音), Tongyong Pinyin (通用拼音), Wade-Giles (威妥瑪拼音), or Mandarin Phonetic Symbols II (國音第二式拼音).

Calibre Library

A mirror of my Calibre Library.