Sunday, July 29, 2018

Covariance, Correlation Coefficient and R-squared


Covariance

The covariance measures the degree to which two random variables X and Y change together.

$$Cov(X, Y) = E[(X - E(X))(Y - E(Y))]$$

$$= E[XY - XE(Y) - YE(X) + E(X)E(Y)]$$

$$= E(XY) - 2E(X)E(Y) + E(X)E(Y)$$

$$= E(XY) - E(X)E(Y)$$

Correlation Coefficient ( R )

The correlation coefficient, or Pearson's R, standardises the covariance and constrains its value between -1 and +1. The two random variables X and Y have a strong negative correlation if R < -0.5, while they have a strong positive correlation if R > +0.5.

$$R(X, Y) = \frac{Cov(X, Y)}{\sqrt{Var(X)Var(Y)}}$$
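As a quick numerical check, here is a minimal NumPy sketch (the two sample arrays are made up purely for illustration) that computes the covariance from the identity above, standardises it into Pearson's R, and compares against NumPy's built-ins:

```python
import numpy as np

# Made-up sample data; any two numeric arrays of equal length work.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Cov(X, Y) = E(XY) - E(X)E(Y)
cov_manual = np.mean(x * y) - np.mean(x) * np.mean(y)

# Pearson's R standardises the covariance by the two standard deviations.
r_manual = cov_manual / (np.std(x) * np.std(y))

# Cross-check against NumPy (bias=True gives the population covariance).
print(cov_manual, np.cov(x, y, bias=True)[0, 1])
print(r_manual, np.corrcoef(x, y)[0, 1])
```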

Generating Correlated Random Variables

Suppose $X$ is a standard normal random variable. We can generate another correlated standard normal random variable $Z$ with correlation coefficient $\rho$ like this:

$$Z = \rho X + \sqrt{1 - \rho^2}\, Y$$

where $Y$ is a standard normal random variable independent of $X$.

To prove this, note that $X$ and $Y$ are independent standard normal random variables, so:

$$E(X) = E(Y) = 0$$

$$Var(X) = Var(Y) = 1$$

$$Var(X) = E(X^2) - E(X)^2 = 1 \implies E(X^2) = E(Y^2) = 1$$

$$E(XY) = E(X)E(Y) = 0 \quad \text{(by independence)}$$

First, $Z$ also has unit variance:

$$Var(Z) = Var(\rho X + \sqrt{1 - \rho^2}\, Y) = \rho^2 Var(X) + (1 - \rho^2) Var(Y) = 1$$

Since $Var(Z) = Var(X) = 1$, the correlation coefficient reduces to the covariance:

$$R(Z, X) = \frac{Cov(Z, X)}{\sqrt{Var(Z)Var(X)}} = Cov(\rho X + \sqrt{1 - \rho^2}\, Y,\ X)$$

$$= E[X(\rho X + \sqrt{1 - \rho^2}\, Y)] - E(X)\,E(\rho X + \sqrt{1 - \rho^2}\, Y)$$

$$= \rho E(X^2) + \sqrt{1 - \rho^2}\, E(XY) = \rho$$
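Here is a minimal NumPy sketch of this construction (the choice of $\rho$ and the sample size are arbitrary, just for illustration), checking empirically that $Z$ has unit variance and correlation roughly $\rho$ with $X$:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.7        # target correlation coefficient (arbitrary example value)
n = 100_000      # sample size

x = rng.standard_normal(n)                 # X ~ N(0, 1)
y = rng.standard_normal(n)                 # Y ~ N(0, 1), independent of X
z = rho * x + np.sqrt(1 - rho**2) * y      # Z = rho*X + sqrt(1 - rho^2)*Y

print(z.var())                  # ~1, Z stays standard normal
print(np.corrcoef(z, x)[0, 1])  # ~0.7, up to sampling noise
```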

For generating more than two correlated random variables, refer to Cholesky Decomposition.
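For more than two variables, a rough sketch of the Cholesky approach (the target correlation matrix below is just an example) mixes independent standard normals with the Cholesky factor of the desired correlation matrix; the two-variable case reduces exactly to the construction above:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Example target correlation matrix for three standard normal variables.
corr = np.array([
    [1.0, 0.5, 0.3],
    [0.5, 1.0, 0.2],
    [0.3, 0.2, 1.0],
])

L = np.linalg.cholesky(corr)       # corr = L @ L.T
u = rng.standard_normal((3, n))    # independent standard normals, one row per variable
v = L @ u                          # correlated standard normals

print(np.corrcoef(v))              # close to `corr`
```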

R-squared

R-squared, as the name implies, can be calculated by squaring the correlation coefficient R, so the result ranges from 0 to 1.

In regression, R-squared can also be calculated by:

$$R^2 = \frac{SST - SSE}{SST}$$

where

$$SSE = \sum_i (y_i - \hat{y}_i)^2$$

$$SST = \sum_i (y_i - \bar{y})^2$$

It measures the reduction in residual error from using the regression line instead of the mean line, or, more simply, how well the data points fit the regression line or curve.

It can also be interpreted as the proportion of variation in the dependent variable that is explained by the independent variable.
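To tie the two views of R-squared together, here is a rough NumPy sketch (the synthetic data and noise level are arbitrary) that fits a simple linear regression, computes $R^2$ as $(SST - SSE)/SST$, and checks that it matches the squared Pearson correlation:

```python
import numpy as np

# Synthetic data: y is roughly linear in x with some noise.
rng = np.random.default_rng(2)
x = np.linspace(0, 10, 50)
y = 3.0 * x + 1.0 + rng.normal(0, 2.0, size=x.size)

# Least-squares fit of a straight line.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

sse = np.sum((y - y_hat) ** 2)       # residual sum of squares
sst = np.sum((y - y.mean()) ** 2)    # total sum of squares about the mean
r_squared = (sst - sse) / sst

# For simple linear regression this equals the squared Pearson correlation.
r = np.corrcoef(x, y)[0, 1]
print(r_squared, r ** 2)             # the two values agree
```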

