*------------------------------------------*
* Estimating formal/informal by nighttime light
*------------------------------------------*
clear
clear matrix
set more off
cd E:\Cambodia\Informality\data_program
import excel using informal_dataset.xls, sheet("data") firstrow clear

*Variable construction
*Convert monthly figures into yearly figures (in logs)
gen lsf = ln(12*salfml)
gen lsi = ln(12*salinf)
gen lef = ln(12*expfml)
gen lei = ln(12*expinf)

*Light intensity (in logs)
gen ll = ln(rlight3)

*Share of lit area in total area (capped at 1)
gen tmp = light_area3
replace tmp=area if light_area3>=area
gen la = tmp/area
gen ar = ln(area)
drop tmp*

*Control variables: squared and cubic terms
foreach X in ll la ar {
    gen `X'sq = `X'^2
    gen `X'cb = `X'^3
}

*Drop years 1992 & 2012
drop if year==1992 | year==2012

*-----------------------------------------------
*Summary statistics: table 4
qui reg lsi ll llsq la lasq ar arsq, noc robust
su lsi lei lsf lef if e(sample)
su ll la ar if ll!=.

*Regression: table 5
local Z ll llsq la lasq ar arsq
reg lsi `Z', noc robust
est store r1
reg lei `Z', noc robust
est store r2
reg lsf `Z', noc robust
est store r3
reg lef `Z', noc robust
est store r4

#delimit ;
esttab r1 r2 r3 r4 using regression.csv, replace
    b(a2) se(a2) star(+ 0.10 * 0.05 ** 0.01)
    scalar("r2 R-squared" "rmse RMSE" "N No. of obs.")
;
#delimit cr

*--------------------------------------------
*Robustness check
*1. Including cubic terms
local W ll llsq llcb la lasq lacb ar arsq arcb
reg lsi `W', noc robust
reg lei `W', noc robust
reg lsf `W', noc robust
reg lef `W', noc robust

*2. Excluding area variables
local Z ll llsq la lasq
reg lsi `Z', noc robust
reg lei `Z', noc robust
reg lsf `Z', noc robust
reg lef `Z', noc robust

*-------------------------------------------------------
*Predicting past data
*1. create an index
*2. multiply index with actual sales/expenses in 2011
local Z ll llsq la lasq
qui reg lsi `Z', noc robust
predict yhat_si
qui reg lei `Z', noc robust
predict yhat_ei
qui reg lsf `Z', noc robust
predict yhat_sf
qui reg lef `Z', noc robust
predict yhat_ef

*step 1: index series (fitted value relative to the district's 2011 fitted value)
*egen mode() copies the single non-missing 2011 value to every year within spid2
gen tmp=yhat_si if year==2011
egen tmp1=mode(tmp), by(spid2)
gen idx_inf = yhat_si/tmp1
replace idx_inf=1 if year==2011
drop tmp*

gen tmp=yhat_sf if year==2011
egen tmp1=mode(tmp), by(spid2)
gen idx_fml = yhat_sf/tmp1
replace idx_fml=1 if year==2011
drop tmp*

gen tmp=yhat_ei if year==2011
egen tmp1=mode(tmp), by(spid2)
gen idx_infe = yhat_ei/tmp1
replace idx_infe=1 if year==2011
drop tmp*

gen tmp=yhat_ef if year==2011
egen tmp1=mode(tmp), by(spid2)
gen idx_fmle = yhat_ef/tmp1
replace idx_fmle=1 if year==2011
drop tmp*

*step 2: monthly sales/expenses -> yearly sales/expenses in mil. USD
gen tmp=salinf*12/1000000 if year==2011
egen tmp1=mode(tmp), by(spid2)
gen salinf_prd = tmp1*idx_inf
drop tmp*

gen tmp=salfml*12/1000000 if year==2011
egen tmp1=mode(tmp), by(spid2)
gen salfml_prd = tmp1*idx_fml
drop tmp*

gen tmp=expinf*12/1000000 if year==2011
egen tmp1=mode(tmp), by(spid2)
gen expinf_prd = tmp1*idx_infe
drop tmp*

gen tmp=expfml*12/1000000 if year==2011
egen tmp1=mode(tmp), by(spid2)
gen expfml_prd = tmp1*idx_fmle
drop tmp*

*Country-level share: table 6
tabstat salfml_prd salinf_prd, stat(n) by(year)
tabstat salfml_prd salinf_prd, stat(sum) by(year)

*District-level trends: Figure 2
gen informal_share=100*salinf_prd/(salinf_prd+salfml_prd)
egen Min = min(informal_share), by(year)
egen Max = max(informal_share), by(year)
egen Mean = mean(informal_share), by(year)
egen Median = median(informal_share), by(year)
egen p75 = pctile(informal_share), p(75) by(year)
egen p25 = pctile(informal_share), p(25) by(year)

*The yearly statistics are identical across districts, so keep one district
preserve
keep if spid2==102
*Sort so that the connected lines run in chronological order
sort year
list year Mean Min p25 Median p75 Max

#delimit ;
sc p25 year, c(l) msymbol(o)
    title("", size(medium))
    ytitle("Percentage Share")
    ylabel(70(10)100,angle(horizontal))
    xtitle("Year")
    legend(label(1 "25th") label(2 "Median") label(3 "75th") label(4 "Mean") rows(1))
|| sc Median year, c(l) msymbol(X)
|| sc p75 year, c(l) msymbol(d)
|| sc Mean year, c(l) msymbol(Sh)
;
graph export Informal_share.tif, width(1000) replace
;
#delimit cr
restore