2.1 Gauss-Markov Assumptions for Cross-sectional Data

It turns out that only when certain assumptions are fulfilled does OLS produce the best linear unbiased estimator (BLUE), which estimates the population parameters well. For cross-sectional data, there are six Gauss-Markov assumptions that ensure the estimators calculated using OLS are BLUE. When any one of the Gauss-Markov assumptions is violated, the sample parameters calculated using OLS no longer represent the population parameters well.

![]() A case when OLS does not generate the best regression line to describe the data

1. Linearity in parameters. This assumption requires that the parameter β is linear. However, there is no requirement for linearity in the independent variable.
2. The independent variable x and the dependent variable y are both random variables. It is worth mentioning that if x and y are both random variables, the residual term μ will not be autocorrelated.
3. No perfect collinearity between the independent variables x₁ and x₂. If there is perfect collinearity, the linear regression results will be arbitrary, as OLS cannot differentiate the contribution of x₁ from that of x₂. Typically, when the R² result is good but the t-test for each individual independent variable is poor, it indicates collinearity.
4. Exogeneity of the residual term μᵢ. To be exogenous, μᵢ must not vary with xᵢ. Endogeneity may arise from reverse causality or from measurement error in x, either of which causes cov(μᵢ, xᵢ) ≠ 0.
5. Homoscedasticity of the residual term μᵢ. It requires that the variance of μᵢ does not change with xᵢ.
6. No autocorrelation of the residual term μᵢ. Autocorrelation of μᵢ can arise from an omitted independent variable, a mis-specified regression function, measurement error in the independent variables, or clustered errors.

2.2 Gauss-Markov Assumptions for Time Series Data

Time series data is slightly different from cross-sectional data. For cross-sectional data, we draw samples from a population, and the Gauss-Markov assumptions require that the independent variable x and the dependent variable y are both random variables. For time series data, we draw samples from the same process over time, so we can no longer assume that the independent variable x is a random variable. Since x is no longer random, the requirements need to be fulfilled for all xₖ at all time points, instead of just for xᵢ at the same time point as the residual term μᵢ. Thus, the Gauss-Markov assumptions are stricter for time series data in terms of endogeneity, homoscedasticity, and no autocorrelation.

Here, we continue to use the historical AAPL_price and SPY_price obtained from Yahoo Finance. We scatter plot AAPL_price against SPY_price first. Then, to find out to what extent AAPL_price can be explained by the overall stock market price, we build a linear regression model with SPY_price as the independent variable x and AAPL_price as the dependent variable y. Linear regression can be done easily with the statsmodels library in Python.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Column name 'Close' is an assumption; the original column list
# was lost when the post was scraped.
AAPL_price = pd.read_csv('AAPL.csv', usecols=['Close'])
SPY_price = pd.read_csv('SPY.csv', usecols=['Close'])

X = sm.add_constant(SPY_price)
model = sm.OLS(AAPL_price, X)
results = model.fit()

plt.scatter(SPY_price, AAPL_price, alpha=0.3)
y_predict = results.params[0] + results.params[1] * SPY_price
plt.plot(SPY_price, y_predict, linewidth=3)
plt.xlim(240, 350)
plt.ylim(100, 350)
plt.xlabel('SPY_price')
plt.ylabel('AAPL_price')
plt.title('OLS Regression')
print(results.summary())
```

3.3 How To Interpret OLS Statistical Summary?

Now it is time to come back to the OLS Regression Results table and try to interpret the summary results. The first section of the summary table reports R² and the F-statistic, which measure the overall explanatory power of the independent variables over the dependent variable. R² is the explained sum of squares divided by the total sum of squares. The standard error is the standard deviation of the sampling distribution of a parameter estimate. The sampling distribution of β follows a t distribution, because we do not know the variance of the population residuals exactly.