Assumptions
- Linearity of the conditional expectation of Y given X
- Strict exogeneity: errors are uncorrelated with the independent variables (X)
  - If violated, it is called endogeneity
- No multicollinearity: all regressor variables are linearly independent
- Variance of errors should be constant: this is called homoscedasticity
  - If violated, it is called heteroscedasticity
- Errors have no serial correlation/autocorrelation
- Errors are normally distributed
- Errors are independent and identically distributed
Estimation Model
- coefficients: $$\beta = (X^TX)^{-1}X^TY$$
- variance of coefficients: $$\mathrm{Var}(\beta \mid X) = \frac{\sigma_{err}^2}{(n - 1)s_x^2}$$
- More variance in the noise means \beta is more variable
- Larger sample variance of X means smaller variance of the coefficients, because spread-out X values make the coefficients easier to estimate
- A higher sampling frequency reduces the variance
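The closed-form estimator above can be sketched with numpy on simulated data (the data and the true coefficients here are illustrative assumptions, not from the notes):

```python
import numpy as np

# Hypothetical data: n observations, an intercept plus two predictors.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# beta = (X^T X)^{-1} X^T y, solved via the normal equations.
# (np.linalg.lstsq is the numerically safer choice in practice.)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to beta_true
```

With low noise and a few hundred observations, the estimate lands close to the true coefficients; solving the normal equations directly is shown only to mirror the formula.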
Variance, Sum of Squares and R^2
TSS: total sum of squares
$$TSS = \sum_i (Y_i - \overline{Y})^2$$
It is the total variation in the observed dependent variable.
Regression SS:
$$RSS = \sum_i (Y_{fit,i} - \overline{Y})^2$$
It is the total variation in the fitted dependent variable.
Residual error SS:
$$RESS = \sum_i (Y_i - Y_{fit,i})^2$$
R^2
$$R^2 = 1 - \frac{RESS}{TSS}$$
R^2 is the squared sample correlation between Y and Y_fit
- Special case: with a single X variable, R^2 is the squared sample correlation between Y and X
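The sum-of-squares decomposition and R^2 above can be checked numerically; this is a minimal sketch on simulated single-regressor data (the data is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
y = 3.0 * x + rng.normal(size=n)

# Fit simple OLS with an intercept.
X = np.column_stack([np.ones(n), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)
y_fit = X @ beta

TSS = np.sum((y - y.mean()) ** 2)       # total sum of squares
RSS = np.sum((y_fit - y.mean()) ** 2)   # regression sum of squares
RESS = np.sum((y - y_fit) ** 2)         # residual error sum of squares

R2 = 1 - RESS / TSS
# With an intercept, TSS = RSS + RESS, and in the single-X case
# R^2 equals the squared sample correlation between y and x.
print(R2, np.corrcoef(x, y)[0, 1] ** 2)
```

The decomposition TSS = RSS + RESS holds exactly whenever the model includes an intercept, which is why the two printed values agree.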
Adjusted R^2
R^2 increases with number of parameters
Adjusted R^2 is adjusted by the degrees of freedom:
$$\text{adj-}R^2 = 1 - \frac{RESS/(n - p - 1)}{TSS/(n - 1)}$$
Durbin-Watson Test
Test if there is serial correlation in residuals/autocorrelation
If the p-value from the test is low, there is probably autocorrelation in the noise.
ACF
The ACF plot is used to look for potential serial correlation at a number of lags
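Both diagnostics can be sketched by hand on a simulated residual series; the AR(1) series and its coefficient here are illustrative assumptions. The Durbin-Watson statistic is roughly 2 for uncorrelated residuals, below 2 for positive autocorrelation:

```python
import numpy as np

# Hypothetical residual series with AR(1)-style serial correlation.
rng = np.random.default_rng(2)
n = 500
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.7 * e[t - 1] + rng.normal()

# Durbin-Watson statistic: sum of squared first differences over sum of squares.
dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Sample autocorrelation at lag 1 (one point on an ACF plot).
acf1 = np.corrcoef(e[:-1], e[1:])[0, 1]

print(dw, acf1)  # dw well below 2, acf1 well above 0
```

For an AR(1) process the DW statistic is approximately 2(1 - rho), so strong positive autocorrelation pushes it well below 2.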
Testing
Test whether multiple coefficients are jointly significant (not all zero)
F-test
- This can be used to compare two nested models, where one model uses a subset of the other's variables
Model Selection Criteria
- AIC & BIC
  - The smaller the error variance, the smaller the AIC/BIC, but both penalize the number of variables
- R^2
Variance inflation factor (VIF)
- Measures how much the variance increases by including other predictor variables (test for multicollinearity)
- Calculated by running a regression of X_j on the remaining predictors and taking that regression's R_j^2:
$$VIF_j = \frac{1}{1 - R_j^2}$$
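The VIF recipe above can be sketched directly in numpy; the `vif` helper and the simulated predictors are illustrative assumptions:

```python
import numpy as np

def vif(X):
    """VIF for each column of X: regress X_j on the remaining columns
    (plus an intercept) and return 1 / (1 - R_j^2)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta = np.linalg.lstsq(others, X[:, j], rcond=None)[0]
        resid = X[:, j] - others @ beta
        r2 = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out[j] = 1 / (1 - r2)
    return out

rng = np.random.default_rng(3)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)              # independent of the others
X = np.column_stack([x1, x2, x3])
v = vif(X)
print(v)  # large for x1 and x2, near 1 for x3
```

A common rule of thumb flags VIF above 5 or 10 as a sign of problematic multicollinearity; the near-collinear pair here lands far above that while the independent column stays near 1.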
Violation of Assumptions
- Multicollinearity
If two or more variables are strongly correlated, it creates a multicollinearity problem:
- The standard errors of the coefficients increase
- It's harder to separate the effects of correlated variables
- Estimated coefficients are highly sensitive to whether the correlated variables are included
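The standard-error inflation can be seen from $\mathrm{Var}(\beta \mid X) = \sigma^2 (X^TX)^{-1}$: with correlated regressors, the relevant diagonal entries of $(X^TX)^{-1}$ blow up. A minimal sketch, with simulated predictors as an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x1 = rng.normal(size=n)
x_indep = rng.normal(size=n)            # uncorrelated second regressor
x_corr = x1 + 0.05 * rng.normal(size=n) # nearly collinear second regressor

def coef_var_diag(*cols):
    """Diagonal of (X^T X)^{-1}, proportional to coefficient variances."""
    X = np.column_stack([np.ones(len(cols[0])), *cols])
    return np.diag(np.linalg.inv(X.T @ X))

v_indep = coef_var_diag(x1, x_indep)
v_corr = coef_var_diag(x1, x_corr)
print(v_indep[1], v_corr[1])  # x1's coefficient variance is far larger with x_corr
```

Only the design matrix changed between the two fits, so the inflation comes entirely from the correlation between the regressors, not from the noise.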