Sunday, July 29, 2018

Covariance, Correlation Coefficient and R-squared


Covariance

The covariance measures the degree to which two random variables X and Y change together.

$$Cov(X, Y) = E[(X - E(X))(Y - E(Y))]$$

$$= E[XY - XE(Y) - YE(X) + E(X)E(Y)]$$

$$= E(XY) - 2E(X)E(Y) + E(X)E(Y)$$

$$= E(XY) - E(X)E(Y)$$

Correlation Coefficient ( R )

The correlation coefficient, or Pearson's R, standardises the covariance and constrains its value between -1 and +1. The two random variables X and Y have a strong negative correlation if R < -0.5, while they have a strong positive correlation if R > +0.5.

$$R(X, Y) = \frac{Cov(X, Y)}{\sqrt{Var(X)Var(Y)}}$$
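As a quick numerical check, here is a minimal NumPy sketch (the two sample arrays are made up purely for illustration) that computes the covariance from the identity above, standardises it into Pearson's R, and compares against NumPy's built-ins:

```python
import numpy as np

# Made-up sample data; any two numeric arrays of equal length work.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Cov(X, Y) = E(XY) - E(X)E(Y)
cov_manual = np.mean(x * y) - np.mean(x) * np.mean(y)

# Pearson's R standardises the covariance by the two standard deviations.
r_manual = cov_manual / (np.std(x) * np.std(y))

# Cross-check against NumPy (bias=True gives the population covariance).
print(cov_manual, np.cov(x, y, bias=True)[0, 1])
print(r_manual, np.corrcoef(x, y)[0, 1])
```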

Generating Correlated Random Variables

Suppose $X$ is a standard normal random variable. We can generate another correlated standard normal random variable $Z$ with correlation coefficient $\rho$ like this:

$$Z = \rho X + \sqrt{1 - \rho^2}\, Y$$

where $Y$ is a standard normal random variable independent of $X$.

To prove this, note that $X$ and $Y$ are independent standard normal random variables, so:

$$E(X) = E(Y) = 0$$

$$Var(X) = Var(Y) = 1$$

$$Var(X) = E(X^2) - E(X)^2 = 1 \implies E(X^2) = E(Y^2) = 1$$

$$E(XY) = E(X)E(Y) = 0 \quad \text{(by independence)}$$

First, $Z$ also has unit variance:

$$Var(Z) = Var(\rho X + \sqrt{1 - \rho^2}\, Y) = \rho^2 Var(X) + (1 - \rho^2) Var(Y) = 1$$

Since $Var(Z) = Var(X) = 1$, the correlation coefficient reduces to the covariance:

$$R(Z, X) = \frac{Cov(Z, X)}{\sqrt{Var(Z)Var(X)}} = Cov(\rho X + \sqrt{1 - \rho^2}\, Y,\ X)$$

$$= E[X(\rho X + \sqrt{1 - \rho^2}\, Y)] - E(X)\,E(\rho X + \sqrt{1 - \rho^2}\, Y)$$

$$= \rho E(X^2) + \sqrt{1 - \rho^2}\, E(XY) = \rho$$
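Here is a minimal NumPy sketch of this construction (the choice of $\rho$ and the sample size are arbitrary, just for illustration), checking empirically that $Z$ has unit variance and correlation roughly $\rho$ with $X$:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.7        # target correlation coefficient (arbitrary example value)
n = 100_000      # sample size

x = rng.standard_normal(n)                 # X ~ N(0, 1)
y = rng.standard_normal(n)                 # Y ~ N(0, 1), independent of X
z = rho * x + np.sqrt(1 - rho**2) * y      # Z = rho*X + sqrt(1 - rho^2)*Y

print(z.var())                  # ~1, Z stays standard normal
print(np.corrcoef(z, x)[0, 1])  # ~0.7, up to sampling noise
```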

For generating more than two correlated random variables, refer to Cholesky Decomposition.
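For more than two variables, a rough sketch of the Cholesky approach (the target correlation matrix below is just an example) mixes independent standard normals with the Cholesky factor of the desired correlation matrix; the two-variable case reduces exactly to the construction above:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Example target correlation matrix for three standard normal variables.
corr = np.array([
    [1.0, 0.5, 0.3],
    [0.5, 1.0, 0.2],
    [0.3, 0.2, 1.0],
])

L = np.linalg.cholesky(corr)       # corr = L @ L.T
u = rng.standard_normal((3, n))    # independent standard normals, one row per variable
v = L @ u                          # correlated standard normals

print(np.corrcoef(v))              # close to `corr`
```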

R-squared

R-squared, as the name implies, can be calculated by squaring the correlation coefficient R, so the result ranges from 0 to 1.

In regression, R-squared can also be calculated by:

$$R^2 = \frac{SST - SSE}{SST}$$

where

$$SSE = \sum_i (y_i - \hat{y}_i)^2$$

$$SST = \sum_i (y_i - \bar{y})^2$$

It measures the reduction in residual error from using the regression line instead of the mean line, or, more simply, how well the data points fit the regression line or curve.

It can also be interpreted as the proportion of variation in the dependent variable that is explained by the independent variable.
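To tie the two views of R-squared together, here is a rough NumPy sketch (the synthetic data and noise level are arbitrary) that fits a simple linear regression, computes $R^2$ as $(SST - SSE)/SST$, and checks that it matches the squared Pearson correlation:

```python
import numpy as np

# Synthetic data: y is roughly linear in x with some noise.
rng = np.random.default_rng(2)
x = np.linspace(0, 10, 50)
y = 3.0 * x + 1.0 + rng.normal(0, 2.0, size=x.size)

# Least-squares fit of a straight line.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

sse = np.sum((y - y_hat) ** 2)       # residual sum of squares
sst = np.sum((y - y.mean()) ** 2)    # total sum of squares about the mean
r_squared = (sst - sse) / sst

# For simple linear regression this equals the squared Pearson correlation.
r = np.corrcoef(x, y)[0, 1]
print(r_squared, r ** 2)             # the two values agree
```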

