Question 1 Show that the univariate OLS model \(y_i = \beta x_i + \epsilon_i\) is identified when \(E(x\epsilon)=0\) and \(var(x)>0\).
Question 2 Derive the bias of the sample variance.
require(data.table)
require(kableExtra)
require(ggplot2)
require(texreg)
require(readstata13) #to install, run install.packages("readstata13")
require(sandwich)
options(knitr.table.format = "html")
Here is some relevant material:
To install these packages
install.packages("lmtest")
install.packages("sandwich")
We are going to reproduce an exercise similar to the example for the computation of standard errors. Start by downloading the CPS data from here. We first load the data into R.
# replace this with the path to your download folder
data = read.dta13("../data/CPS_2012_micro.dta")
data = data.table(data)
data$age = as.numeric(data$age)
Next, generate a fictitious policy that is randomly assigned at the state level. Run the regression and report the standard errors given by R for one draw of the policy.
set.seed(60356548) # I fix the seed to make sure the draws are reproducible
data <- data[,fp := runif(1)>0.5, statefip]
fit1 = lm(lnwage ~fp,data)
htmlreg(fit1,single.row=TRUE)
| | Model 1 |
|---|---|
| (Intercept) | 2.68 (0.00)*** |
| fpTRUE | -0.02 (0.00)** |
| R2 | 0.00 |
| Adj. R2 | 0.00 |
| Num. obs. | 65685 |

***p < 0.001; **p < 0.01; *p < 0.05
Note: We do not control for state-specific fixed effects, as these would be perfectly collinear with the policy.
Now this is surprising. We generated `fp` randomly across states, so \(E(\epsilon_i fp_i)=0\) should hold as the number of states becomes very large, and yet the coefficient on `fp` comes out significant. To understand what is happening, we will generate our own data in a way that lets us control exactly what is going on.
Let’s start by reassuring ourselves. Let’s use an IID data generating process (DGP), run the regression and check the significance.
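1. Compute the variance of `lnwage` in the sample. This is an estimate of the variance of our homoskedastic error.
2. Simulate a fictitious outcome `y2` as a normal error with the estimated variance, truly independent across individuals and unrelated to `fp`. Use `y2:=rnorm(.N)*sqrt(var_est)` inside your data.table `data` (we scale by the standard deviation, so that the error has the estimated variance).
3. Regress this outcome `y2` on `fp`, our fictitious policy, and collect the coefficient. Also save whether the coefficient is significant at 5%.

Question 3 Follow the previous steps, repeating the simulation and the regression many times, and report the rejection rate of the test on `fp`. You should find something close to 5% and you should feel better!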
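The steps above can be sketched on synthetic data. This is only an illustration under stated assumptions: the sample sizes, the value of `var_est`, and the helper `one_rep` are hypothetical stand-ins, and the CPS file is not used.

```r
# Synthetic stand-in for the CPS sample: 50 states, 200 people each (assumed sizes)
set.seed(1)
n_states <- 50
n_per    <- 200
state    <- rep(seq_len(n_states), each = n_per)
var_est  <- 0.4   # hypothetical value standing in for var(lnwage)

# One policy draw per state, copied to everyone in that state
fp <- (runif(n_states) > 0.5)[state]

# One replication: IID normal outcome unrelated to fp, then a t-test on fp
one_rep <- function() {
  y2  <- rnorm(length(fp)) * sqrt(var_est)
  fit <- lm(y2 ~ fp)
  summary(fit)$coefficients["fpTRUE", "Pr(>|t|)"] < 0.05
}

rejections <- replicate(500, one_rep())
mean(rejections)   # share of replications that reject at 5%
```

Because the errors here are independent across individuals by construction, the usual standard errors are valid and the rejection rate should sit near the nominal level.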
Now we want to compute heteroskedasticity-robust standard errors, which requires us to use some covariates. We then repeat the previous procedure, but with a different test for significance. We construct our variance-covariance matrix using the following formula:
\[ V = (X'X)^{-1} X' \Omega X (X'X)^{-1} \] where \(\Omega = diag \{ \epsilon_i^2 \}\). Using `vcovHC` with `type="HC0"` will do that for you, while `type="const"` gives the usual homoskedastic estimate for comparison.
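To see what `vcovHC` is doing, here is a minimal sketch that builds the sandwich by hand on simulated data and compares it with the package output. The data generating process and the variable names (`x`, `y`, `V_hc0`, `V_const`) are made up for the illustration.

```r
library(sandwich)

set.seed(2)
n <- 500
x <- rnorm(n)
# Error standard deviation grows with |x|, so the errors are heteroskedastic
y <- 1 + 0.5 * x + rnorm(n) * (0.5 + abs(x))
fit <- lm(y ~ x)

X    <- model.matrix(fit)
e    <- residuals(fit)
XtXi <- solve(crossprod(X))      # (X'X)^{-1}
meat <- crossprod(X * e)         # X' diag(e_i^2) X, each row of X scaled by e_i
V_hc0 <- XtXi %*% meat %*% XtXi  # heteroskedasticity-robust sandwich

# Homoskedastic counterpart: sigma^2 (X'X)^{-1}, with sigma^2 = RSS / (n - k)
V_const <- sum(e^2) / fit$df.residual * XtXi

all.equal(V_hc0,   vcovHC(fit, type = "HC0"),   check.attributes = FALSE)
all.equal(V_const, vcovHC(fit, type = "const"), check.attributes = FALSE)
```

Both comparisons should return `TRUE`, which confirms that `type="HC0"` plugs the squared residuals into \(\Omega\) while `type="const"` imposes a single error variance.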
We want to check this by simulating from a model with heteroskedastic errors. To do so, we are going to use a linear model for the variance.
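1. Run the regression `lnwage ~ yrseduc + age + I(age^2)` and regress the square of the residual on the same covariates formula to get an estimate of the heteroskedastic variance.
2. Store the predicted variance as `s`.
3. Store the predicted mean as `pred`.
4. Simulate a new outcome by drawing a normal error with variance `s` and adding the `pred`.
5. Regress the simulated outcome on the covariates and the policy, and test the significance of `fp` using `vcovHC` with `type="const"` and `type="HC0"`.

Question 4 Follow the steps and report the rejection rate for each of the two variance estimators.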
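As a rough sketch of the testing step, the snippet below runs the two tests on synthetic data. The covariate `age`, the values of `pred` and `s`, and the helper `one_rep` are hypothetical; in the exercise they would come from the two regressions on the CPS sample. It assumes the `lmtest` package is installed, and uses its `coeftest` function to plug in each covariance matrix.

```r
library(sandwich)
library(lmtest)

set.seed(3)
n_states <- 50
n_per    <- 100
state <- rep(seq_len(n_states), each = n_per)
n     <- length(state)
age   <- runif(n, 20, 60)            # synthetic covariate
pred  <- 1 + 0.03 * age              # hypothetical predicted mean
s     <- 0.05 + 0.01 * (age - 20)    # hypothetical predicted variance (always positive)
fp    <- (runif(n_states) > 0.5)[state]

one_rep <- function() {
  y2  <- pred + rnorm(n) * sqrt(s)   # independent errors with variance s_i
  fit <- lm(y2 ~ fp + age)
  p_const <- coeftest(fit, vcov. = vcovHC(fit, type = "const"))["fpTRUE", "Pr(>|t|)"]
  p_hc0   <- coeftest(fit, vcov. = vcovHC(fit, type = "HC0"))["fpTRUE", "Pr(>|t|)"]
  c(const = p_const < 0.05, HC0 = p_hc0 < 0.05)
}

# Rejection rate at 5% for each variance estimator
rej <- rowMeans(replicate(200, one_rep()))
rej
```

The number of replications is kept small here so the sketch runs quickly; more replications give a tighter estimate of each rejection rate.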
We are now going to simulate errors that are correlated within state. For this we pick a correlation parameter \(\rho\). Then, to simulate, we draw the error of the first individual in each state in an IID way and use an auto-regressive structure to compute the errors of the following people. Given \(\rho\), it can be done in the following way:
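# fit the mean of log wages and store the fitted values
fit0 = lm(lnwage ~ yrseduc + age + I(age^2),data)
data <- data[,yhat := predict(fit0)]
rho = 0.8
# build the simulated errors state by state
data <- data[, res_hat := {
  r = rep(0,.N)
  r[1] = rnorm(1)
  for (i in seq_len(.N)[-1]) {   # starts at 2, and is skipped if the state has a single row
    r[i] = rho*r[i-1] + rnorm(1)
  }
  r
},statefip]
data <- data[,y2:= yhat + res_hat]
# redraw the fictitious policy, one draw per state
data <- data[,fp := runif(1)>0.5, statefip]
fitn = lm(y2 ~ fp+yrseduc + age + I(age^2),data)
#summary(fitn)
htmlreg(fitn,single.row=TRUE,omit.coef="state")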
| | Model 1 |
|---|---|
| (Intercept) | -0.50 (0.09)*** |
| fpTRUE | -0.03 (0.01)** |
| yrseduc | 0.10 (0.00)*** |
| age | 0.07 (0.00)*** |
| age^2 | -0.00 (0.00)*** |
| R2 | 0.04 |
| Adj. R2 | 0.04 |
| Num. obs. | 65685 |

***p < 0.001; **p < 0.01; *p < 0.05
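To get a feel for the auto-regressive recursion on its own, here is a small standalone sketch for a single group. It is not part of the exercise code: the series length and the names `e`, `r_loop` and `r_filt` are chosen for illustration. It compares the explicit loop with base R's recursive linear filter, which applies the same recursion.

```r
set.seed(4)
rho <- 0.8
n   <- 1000
e   <- rnorm(n)                  # iid innovations

# Explicit loop: each error is rho times the previous one plus a fresh draw
r_loop <- numeric(n)
r_loop[1] <- e[1]
for (i in seq_len(n)[-1]) {
  r_loop[i] <- rho * r_loop[i - 1] + e[i]
}

# Same recursion through a recursive linear filter
r_filt <- as.numeric(stats::filter(e, rho, method = "recursive"))

all.equal(r_loop, r_filt)
cor(r_loop[-n], r_loop[-1])      # sample correlation between neighbouring errors
```

The correlation between neighbouring errors should be close to `rho`, which is what makes the errors of people in the same state dependent.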
Question 5 Explain the expression that starts with `data[, res_hat := {...`
Question 6 For \(\rho=0.7,0.8,0.9\) run 500 replications and report, for each value of \(\rho\), the proportion of replications in which the coefficient on our fictitious policy was significant at 5%.
We have not covered this in class yet, but one could instead try to resample the data.
Use the following procedure:
Note: do not redraw `fp`!
Question 7 Report the 0.05 and 0.95 quantiles of the regression coefficients across resamples. This corresponds to a test at 10%. Does this interval include 0?
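One possible resampling scheme is sketched below on synthetic data. It rests on an assumption that should be checked against the procedure you are asked to follow: whole states are drawn with replacement, so that the within-state dependence is kept in each resample. The data, the sizes and the names `d`, `rows_by_state` and `boot_coef` are made up for the illustration.

```r
set.seed(5)
n_states <- 40
n_per    <- 50
state <- rep(seq_len(n_states), each = n_per)
fp    <- (runif(n_states) > 0.5)[state]   # drawn once, never redrawn
# Synthetic outcome with a state-level shock, so errors are correlated within state
y2    <- rnorm(n_states)[state] + rnorm(length(state))
d     <- data.frame(state, fp, y2)
rows_by_state <- split(seq_len(nrow(d)), d$state)

boot_coef <- replicate(500, {
  drawn <- sample(names(rows_by_state), replace = TRUE)   # resample whole states
  b     <- d[unlist(rows_by_state[drawn], use.names = FALSE), ]
  unname(coef(lm(y2 ~ fp, data = b))["fpTRUE"])
})

# 0.05 and 0.95 quantiles of the resampled coefficients
ci <- quantile(boot_coef, c(0.05, 0.95), na.rm = TRUE)
ci
```

The policy column travels with each resampled state, so `fp` is never redrawn, only the set of states entering the regression changes.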