In statistics, we often come across an important concept called **covariance**. **Covariance** is a measure, used frequently in probability theory and statistics, of the degree to which two variables vary together. If greater values of one variable tend to correspond with greater values of the other, and smaller values with smaller values, the variables show similar behavior and the covariance is positive. If greater values of one variable tend to correspond with smaller values of the other, the variables show opposite behavior and the covariance is negative. If greater values of one variable are paired about equally often with greater and smaller values of the other, the covariance will be near zero.

Covariance between two variables is calculated using a suitable formula. The calculation can sometimes become clumsy, but a clear understanding of covariance leads to the right result. TutorVista provides detailed information about this topic to help students build a better understanding. So go ahead and learn below about covariance: its definition, its formula, the method of finding covariance, and sample problems based on it.


The covariance of two variables x and y is denoted by Cov(x, y).

Consider two random variables X and Y.

Then the covariance between the two variables X and Y, denoted Cov(X, Y), can be measured using the covariance equation:

Cov(X, Y) = E {[X - E(X)] [Y - E(Y)]}

where E(X) and E(Y) are the expected values (means) of X and Y, respectively.

So it is the expected value of the product $\bar{X} \bar{Y}$, where $\bar{X}$ is the deviation of X from its expected mean, that is X - E(X), and $\bar{Y}$ is the deviation of Y from its expected mean, that is Y - E(Y).

The product $\bar{X} \bar{Y}$ is positive when:

- both X and Y are above their respective means, or
- both are below their respective means.

In either case, $\bar{X}$ and $\bar{Y}$ have the same sign when the deviations from the mean are calculated.

The product $\bar{X} \bar{Y}$ is negative when:

- one of the variables is above and the other is below its respective mean.

In that case, $\bar{X}$ and $\bar{Y}$ have opposite signs when the deviations from the mean are calculated.

If,

- Cov(X, Y) > 0 => X tends to be high when Y is high and low when Y is low.
- Cov(X, Y) < 0 => X tends to be high when Y is low and low when Y is high.
- Cov(X, Y) = 0 => X and Y do not show either of the above tendencies.

If the deviations in a pair of x and y have opposite signs, their product is negative, and for that pair the values of x and y vary together in opposite directions. As the magnitude of the covariance increases, the strength of the relationship also increases. The covariance can also be zero: this happens when the pairs that give positive products are cancelled by those that give negative products, so there is no overall linear relationship between the two random variables.

Now the above equation can be rewritten as an equivalent covariance formula, which is often more convenient:

Cov(x,y) = E[xy] - E[x]E[y]

When working with sample data, the covariance formula becomes

Cov(X, Y) = $\frac{\sum_{i = 1}^{N}(x_{i} - \bar{x})(y_{i} - \bar{y})}{N -1}$

This can be written using shortcut method as

Cov(X, Y) = $\frac{1}{N - 1}\left(\sum_{i = 1}^{N}x_{i}y_{i} - \frac{\sum_{i = 1}^{N}x_{i}\sum_{i = 1}^{N}y_{i}}{N} \right )$
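As a minimal sketch (plain Python, no external libraries, using the data from the worked example later on this page), both the definitional formula and the shortcut formula can be computed and compared:

```python
# Sketch: sample covariance via the definitional formula and the
# shortcut formula; the two should agree.
def cov_definitional(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)

def cov_shortcut(x, y):
    n = len(x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    return (sum_xy - sum(x) * sum(y) / n) / (n - 1)

x = [3, 4, 5, 7]
y = [10, 11, 13, 14]
print(cov_definitional(x, y))  # 3.0
print(cov_shortcut(x, y))      # 3.0
```

Both functions divide by N - 1 (the sample covariance), matching the formulas above.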

Covariance is a measure of how much two variables change together. The covariance function of a random process describes this dependence across locations: the covariance function C(x, y) gives the covariance of the values of the random field at the two locations x and y.

C(x, y) = Cov(Z(x), Z(y))

Covariance function also summarizes the dependency of observations at different locations and characterizes many of the primary properties. It is of significant interest to estimate the covariance function based on a random sample.

Covariance formula is often used to compute the covariance between two random variables:

COV(X, Y) = E(XY) - E(X)E(Y)

Let us start from LHS

COV(X, Y) = E[(X - E(X))(Y - E(Y))]

= E[XY - X E(Y) - E(X)Y + E(X)E(Y)]

= E(XY) - E(Y)E(X) - E(X)E(Y) + E(X)E(Y)

= E(XY) - E(X)E(Y)

= RHS

=> COV(X, Y) = E(XY) - E(X)E(Y)

The covariance exists and is well-defined only as long as E(X), E(Y) and E(XY) exist and are well-defined.

Covariance and correlation both describe how two variables are related, and both indicate whether the variables are positively or inversely related. Let X and Y be real-valued random variables with means E(X) and E(Y) and variances Var(X) and Var(Y), respectively.

Then, the covariance between X and Y is

Cov(X, Y) = E{[X - E(X)][Y - E(Y)]}

And the correlation between X and Y is

Cor(X, Y) = $\frac{Cov[X, Y]}{\sigma(X) \sigma(Y)}$

Correlation can be seen as the scaled version of the covariance. Comparing correlation and covariance, we find more similarities than differences.

To calculate the correlation coefficient for two variables (x, y), we use the covariance, as shown below:

r(x, y) = $\frac{COV(x, y)}{S_xS_y}$

where Cov(x, y) = covariance of the variables x and y = $\frac{\sum_{i = 1}^N (x_i - \bar x)(y_i - \bar y)}{N - 1}$,

r(x, y) = correlation of the variables x and y, and

$S_x$ and $S_y$ = the sample standard deviations of x and y.
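As a minimal sketch (plain Python, using the data from the worked example later on this page), the correlation can be computed from the covariance and the sample standard deviations:

```python
# Sketch: correlation as covariance scaled by the sample standard deviations.
import math

def sample_cov(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def sample_std(v):
    # the variance is the covariance of a variable with itself
    return math.sqrt(sample_cov(v, v))

def corr(x, y):
    return sample_cov(x, y) / (sample_std(x) * sample_std(y))

x = [3, 4, 5, 7]
y = [10, 11, 13, 14]
print(round(corr(x, y), 4))
```

The result is always between -1 and 1, unlike the covariance, whose magnitude depends on the units of x and y.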

When the points are plotted, there are mainly three possible interpretations of the graph.

**Positive correlation** - The correlation is positive, meaning it rises. If the pattern in the graph slopes from lower left to upper right, that is, an upward-sloping line, there is a **positive** correlation between the variables. In simple terms, if the data roughly follow a straight line rising from the smaller values of x and y toward the larger ones, the variables have a positive correlation.

**Negative correlation** - The correlation is negative, meaning it falls. If the pattern in the graph slopes from upper left to lower right, that is, a downward-sloping line, there is a **negative** correlation between the variables. In simple terms, if the data roughly follow a straight line falling from the higher values of y down to the higher values of x, the variables have a negative correlation.

**Zero correlation** - There can also be **no correlation**, meaning we cannot find any straight line that passes through most of the data. This does not mean the variables are independent; a nonlinear relationship may exist between them.

The main objective of an experimental design, in general, is to ensure that the results are attributable only to the treatment variable and to no other causal circumstances. For instance, a researcher studying one independent variable, X, may wish to control the influence of some uncontrolled variable, Z, which is known to be correlated with the dependent variable, Y. In that case, it is important to use the technique of analysis of covariance for a valid evaluation of the outcome of the experiment. Analysis of covariance is written in short as ANCOVA.

Covariance analysis consists in subtracting from each individual score Y the portion of it that is predictable from the uncontrolled variable Z, and then analyzing the remaining variation.

Analysis of covariance (ANCOVA) is a general linear model which blends ANOVA and regression. It is a multiple regression analysis in which there is at least one quantitative and one categorical explanatory variable. ANCOVA evaluates whether the population means of a dependent variable are equal across levels of a categorical independent variable while statistically controlling for the effects of other continuous variables.

- The covariance of a variable with itself equals the variance of that random variable:

Cov[X, X] = E [(X - E[X]) (X - E[X])]

= E [(X - E[X])^{2}]

= Var[X]

- Cov[X, Y] = Cov[Y, X], which means covariance is symmetric.
- For the variance of the sum of two random variables X and Y, we get

Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)

- Cov(aX, bY)=(ab)Cov(X,Y) where a and b are constants.

This means that if the random variables are multiplied by constants a and b, the covariance can be found by taking the constants out and multiplying them with the original covariance.

- Cov[$a_1X_1 + a_2X_2$, Y] = $a_1$Cov[$X_1$, Y] + $a_2$Cov[$X_2$, Y], where $a_1$ and $a_2$ are constants and $X_1$, $X_2$, Y are random variables, which means the covariance operation is linear.

The sample covariance for two variables x and y is defined in terms of the sample means as:

$S_{xy}$ = $\frac{\sum_{i = 1}^{N}(x_{i} - \bar{x})(y_{i} - \bar{y})}{N - 1}$

1. The covariance of two constants, a and b, is zero.

=> COV(a, b) = E[(a - E(a))(b - E(b))] = E[(0)(0)] = 0

2. The covariance of two independent random variables is zero.

=> COV(x, y) = 0

3. The covariance is symmetric, as is obvious from the definition.

=> COV(x, y) = COV(y, x)

4. Adding a constant to either or both random variables does not change their covariances.

=> COV(x + a, y + b) = COV(x, y)

5. The additive law of covariance holds that the covariance of a random variable with a sum of random variables is just the sum of the covariances with each of the random variables.

=> COV(x + y, z) = COV(x, z) + COV(y, z)
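These properties hold exactly for the sample covariance as well, so they can be checked numerically. A minimal sketch (the data vectors below are made up for illustration):

```python
# Sketch: numerically checking the covariance properties listed above
# using the sample covariance.
def cov(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

x = [2.0, 4.0, 5.0, 9.0]
y = [1.0, 3.0, 8.0, 10.0]
z = [5.0, 2.0, 7.0, 4.0]

# symmetry: Cov(x, y) = Cov(y, x)
assert abs(cov(x, y) - cov(y, x)) < 1e-9
# adding a constant does not change the covariance
shifted = [xi + 10 for xi in x]
assert abs(cov(shifted, y) - cov(x, y)) < 1e-9
# additive law: Cov(x + y, z) = Cov(x, z) + Cov(y, z)
s = [xi + yi for xi, yi in zip(x, y)]
assert abs(cov(s, z) - (cov(x, z) + cov(y, z))) < 1e-9
# scaling: Cov(a*x, y) = a * Cov(x, y)
scaled = [3 * xi for xi in x]
assert abs(cov(scaled, y) - 3 * cov(x, y)) < 1e-9
print("all properties hold")
```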

Cross-covariance refers to the covariance Cov(X, Y) between two random vectors X and Y. The cross-covariance between two random functions can be computed not only at single locations but also for pairs of locations separated by a vector. Under an assumption of joint second-order stationarity, a cross-covariance function between two random functions is defined which depends only on the separation vector.

The covariance between two linear combinations of the data can also be computed. Consider the pair of linear combinations

$Y_1 = \sum_{i=1}^{p}c_iX_i$ and $Y_2 = \sum_{j=1}^{p}d_jX_j$

where $\sigma_{ij}$ = Cov($X_i$, $X_j$). Then

=> Cov($Y_1, Y_2$) = $\sum_{i=1}^p$ $\sum_{j=1}^p$ $c_id_j \sigma_{ij}$.
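This identity follows from the linearity of covariance and can be checked numerically with the sample covariance. A minimal sketch (the data, coefficients, and dimension p = 3 below are made up for illustration):

```python
# Sketch: Cov(Y1, Y2) = sum_i sum_j c_i * d_j * sigma_ij for the linear
# combinations Y1 = sum c_i X_i and Y2 = sum d_j X_j.
def cov(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

# five observations of p = 3 variables (rows = observations)
X = [[1.0, 2.0, 3.0], [2.0, 1.0, 5.0], [4.0, 3.0, 2.0],
     [3.0, 5.0, 4.0], [5.0, 4.0, 1.0]]
cols = list(zip(*X))                 # one tuple per variable
c, d = [1.0, -2.0, 0.5], [0.5, 1.0, -1.0]
p = 3

y1 = [sum(c[i] * row[i] for i in range(p)) for row in X]
y2 = [sum(d[j] * row[j] for j in range(p)) for row in X]

direct = cov(y1, y2)
via_matrix = sum(c[i] * d[j] * cov(cols[i], cols[j])
                 for i in range(p) for j in range(p))
assert abs(direct - via_matrix) < 1e-9
```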

The covariance is equal to the product of the correlation coefficient and the standard deviations of the two variables. The covariance measures how two random variables vary together.

The covariance is related to the correlation coefficient as follows:

Cov($X_1, X_2$) = cor($X_1, X_2$) std($X_1$) std($X_2$)

where, cor($X_1, X_2$) is the correlation between the $X_1$ and $X_2$.

Since the standard deviation is always positive, the covariance has the same sign as the correlation coefficient. If the random variables are positively correlated, they have positive covariance; similarly, the covariance is negative if the random variables are negatively correlated.

The variance of a random variable X with expected value E(X) = $\mu_x$ is defined as Var(X) = E[(X - $\mu_x$)$^2$].

The correlation between Y and Z is defined as

Corr(Y, Z) = $\frac{Cov(Y, Z)}{\sqrt{Var(Y)Var(Z)}}$

=> Var(X) = E($X^{2}$) - $\mu_x^{2}$

The matrix consists of weighted covariances between every possible pairwise combination of securities, with the weights being the product of the proportions invested in the two securities forming each pair. The covariances of returns from all possible pairs of assets can be depicted by a covariance matrix.

$\begin{bmatrix}
 & Column\ 1\ (W_A) & Column\ 2\ (W_B) & Column\ 3\ (W_C)\\
Row\ 1\ (W_A) & Cov(a,a) & Cov(a,b) & Cov(a,c) \\
Row\ 2\ (W_B) & Cov(b,a) & Cov(b,b) & Cov(b,c) \\
Row\ 3\ (W_C) & Cov(c,a) & Cov(c,b) & Cov(c,c)
\end{bmatrix}$

=> $\sigma^2 = W_AW_A Cov_{aa} + W_AW_B Cov_{ab} + .... + W_CW_C Cov_{cc}$
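The portfolio variance is the double sum of weights times covariances over all asset pairs. A minimal sketch (the weights and per-period returns below are hypothetical illustration values, not from the text):

```python
# Sketch: portfolio variance sigma^2 = sum_i sum_j w_i * w_j * Cov(i, j),
# computed from a covariance matrix of asset returns.
def cov(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

returns = {  # hypothetical per-period returns for assets a, b, c
    "a": [0.01, 0.03, -0.02, 0.04],
    "b": [0.02, 0.01, 0.00, 0.03],
    "c": [-0.01, 0.02, 0.01, 0.00],
}
weights = {"a": 0.5, "b": 0.3, "c": 0.2}  # hypothetical proportions invested

assets = list(returns)
port_var = sum(weights[i] * weights[j] * cov(returns[i], returns[j])
               for i in assets for j in assets)

# sanity check: same as the variance of the combined portfolio return
port = [sum(weights[k] * returns[k][t] for k in assets) for t in range(4)]
assert abs(port_var - cov(port, port)) < 1e-12
```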

Distance correlation is a new class of multivariate dependence coefficients applicable to random vectors of arbitrary and not necessarily equal dimension. Distance covariance is analogous to product-moment covariance. The Brownian distance covariance and correlation is a very useful and elegant alternative to the standard measures of correlation and is based on several deep and non-trivial theoretical calculations.

**Step 1**:

| X | Y |
|---|----|
| 3 | 10 |
| 4 | 11 |
| 5 | 13 |
| 7 | 14 |

Calculate the mean of both sets.

The mean of X is 4.75 and that of Y is 12.

**Step 2**:

Multiply each corresponding pair of X and Y values together and find the sum of the products. Here $\sum XY$ = 30 + 44 + 65 + 98 = 237.

**Step 3**:

Find the product of the sums of X and Y. Here $\sum X$ = 19 and $\sum Y$ = 48, so the product is 19 $\times$ 48 = 912.

**Step 4**:

Divide the value obtained in Step 3 by N.

Here $\frac{912}{4}$ = 228

**Step 5**:

Subtract the value obtained in Step 4 from the value obtained in Step 2.

Here the value obtained = 237 - 228 = 9

**Step 6**:

Divide the result by N - 1. This gives the covariance.

Here value = $\frac{9}{(4-1)}$ = $\frac{9}{3}$ = 3

So covariance = 3

This positive covariance indicates that both variables tend to increase and decrease together.
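The six steps above can be traced in code as a minimal sketch (plain Python, using the same data):

```python
# Sketch of the six steps in the worked example.
X = [3, 4, 5, 7]
Y = [10, 11, 13, 14]
N = len(X)

mean_x, mean_y = sum(X) / N, sum(Y) / N     # Step 1: 4.75 and 12
sum_xy = sum(x * y for x, y in zip(X, Y))   # Step 2: sum of products = 237
prod_of_sums = sum(X) * sum(Y)              # Step 3: 19 * 48 = 912
step4 = prod_of_sums / N                    # Step 4: 912 / 4 = 228
step5 = sum_xy - step4                      # Step 5: 237 - 228 = 9
covariance = step5 / (N - 1)                # Step 6: 9 / 3
print(covariance)                           # 3.0
```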


| X | Y |
|-----|----|
| 2.1 | 8 |
| 2.5 | 12 |
| 4 | 14 |
| 3.6 | 10 |

Find the direction in which the two variables are moving.

**Step 1**: Calculate the means of X and Y.

$\bar X$ = $\frac{\sum X }{N}$

= $\frac{12.2}{4}$

= 3.05

and

$\bar Y$ = $\frac{\sum Y }{N}$

= $\frac{44}{4}$

= 11

**Step 2**:

Here N = 4, $\bar X$ = 3.05 and $\bar Y$ = 11.

| X | Y | X - $\bar X$ | Y - $\bar Y$ | (X - $\bar X$)(Y - $\bar Y$) |
|-----|----|-------|----|-------|
| 2.1 | 8 | -0.95 | -3 | 2.85 |
| 2.5 | 12 | -0.55 | 1 | -0.55 |
| 4 | 14 | 0.95 | 3 | 2.85 |
| 3.6 | 10 | 0.55 | -1 | -0.55 |
| | | | | $\sum$(X - $\bar X$)(Y - $\bar Y$) = 4.6 |

Now

Cov(x, y) = $\frac{\sum_{i = 1}^N (x_i - \bar x)(y_i - \bar y)}{N - 1}$

Cov(x, y) = $\frac{4.6}{3}$

$\approx$ 1.53, which is positive.

Since the covariance is positive, the variables are positively related. So they move together in the same direction.
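A minimal sketch checking this second example in code (plain Python, same data):

```python
# Sketch: sample covariance for the second example.
X = [2.1, 2.5, 4.0, 3.6]
Y = [8.0, 12.0, 14.0, 10.0]
N = len(X)

mx, my = sum(X) / N, sum(Y) / N   # means: 3.05 and 11
dev_products = sum((x - mx) * (y - my) for x, y in zip(X, Y))  # 4.6
covariance = dev_products / (N - 1)

print(round(covariance, 3))
assert covariance > 0  # the variables move together in the same direction
```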
