Covariance

Covariance is a concept used throughout probability theory and statistics. It is a number that measures the degree to which two variables vary together. If the greater values of one variable mainly correspond to the greater values of the other, and the same holds for the smaller values, the variables show similar behavior and the covariance is positive. If the greater values of one variable mainly correspond to the smaller values of the other, the variables show opposite behavior and the covariance is negative. If greater values of one variable are paired about equally often with greater and smaller values of the other, the covariance is near zero.

Covariance between two variables is calculated using a formula. The calculation can be tedious by hand, but a step-by-step method leads to the right result. This page covers the definition of covariance, its formula, the method of finding it, and sample problems based on it.


Definition

Covariance is a measure of how the variations of two variables are related. It describes the extent to which two random variables change together, that is, the association between them.

Covariance Symbol


The covariance of two variables x and y is denoted by Cov(x, y).

Covariance Equation


Consider two random variables X and Y with paired observations (x1, y1), (x2, y2), ..., (xn, yn).
The covariance between X and Y, denoted by Cov(X, Y), is given by the covariance equation
Cov(X, Y) = E {[X - E(X)] [Y - E(Y)]}
where E(X) and E(Y) are the means of X and Y, respectively.

So the covariance is the expected value of the product $\bar{X} \bar{Y}$, where $\bar{X}$ is the deviation of X from its mean, that is X - E(X), and $\bar{Y}$ is the deviation of Y from its mean, that is Y - E(Y).

When $\bar{X} \bar{Y}$ is positive, one of two things holds:
  • Both X and Y are above their respective means
  • Both X and Y are below their respective means
In either case, $\bar{X}$ and $\bar{Y}$ have the same sign.

When $\bar{X} \bar{Y}$ is negative, one of the variables is above its mean and the other is below its mean, so $\bar{X}$ and $\bar{Y}$ have opposite signs.

Hence the covariance between two variables X and Y provides a measure of the degree to which X and Y tend to move together.

If,
  1. Cov(X, Y) > 0 => X tends to be high when Y is high and low when Y is low.
  2. Cov(X, Y) < 0 => X tends to be high when Y is low and low when Y is high.
  3. Cov(X, Y) = 0 => X and Y do not show either of the above tendencies.
In the formula, for each pair of x and y the differences from their mean values are calculated and multiplied together. If both differences have the same sign, the product is positive, and for that pair the values of x and y vary in the same direction.

If the two differences have opposite signs, the product is negative, and for that pair the values of x and y vary in opposite directions. The larger the magnitude of the covariance, the stronger the tendency, although the magnitude also depends on the units of x and y. The covariance can also be zero. This happens when the positive products are cancelled by the negative ones, so that there is no linear relationship between the two random variables.

Formula

The above equation can be rewritten as an equivalent covariance formula that is often easier to compute:

Cov(x,y) = E[xy] - E[x]E[y]

For a data set of N pairs, with sample means $\bar{x}$ and $\bar{y}$, the covariance formula becomes
Cov(X, Y) = $\frac{\sum_{i = 1}^{N}(x_{i} - \bar{x})(y_{i} - \bar{y})}{N -1}$
This can be written using the shortcut method as
Cov(X, Y) = $\frac{1}{N - 1}\left(\sum_{i = 1}^{N}x_{i}y_{i} - \frac{\sum_{i = 1}^{N}x_{i}\sum_{i = 1}^{N}y_{i}}{N} \right )$
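As a minimal sketch, the two data formulas above can be written in Python as follows. The function names `cov_definition` and `cov_shortcut` and the data values are illustrative assumptions, not part of the original page.

```python
import math

def cov_definition(xs, ys):
    """Sample covariance from deviations about the means (divides by N - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def cov_shortcut(xs, ys):
    """Same quantity via the shortcut: (sum(xy) - sum(x) * sum(y) / N) / (N - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (sum_xy - sum(xs) * sum(ys) / n) / (n - 1)

# Hypothetical data, chosen only for illustration.
xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.0, 3.0, 7.0, 8.0]
by_definition = cov_definition(xs, ys)
by_shortcut = cov_shortcut(xs, ys)
print(by_definition, by_shortcut, math.isclose(by_definition, by_shortcut))
```

Both functions should agree up to floating-point rounding. The shortcut form needs only running sums, which is convenient for hand calculation.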


Function

Covariance is a measure of how much two variables change together. For a random field or stochastic process Z, the covariance function C(x, y) gives the covariance of the values of the random field at the two locations x and y.

C(x, y) = Cov(Z(x), Z(y))

The covariance function also summarizes the dependency of observations at different locations and characterizes many of the primary properties of the process. It is of significant interest to estimate the covariance function from a random sample.

Proof

The following covariance formula is often used to compute the covariance between two random variables:

COV(X, Y) = E(XY) - E(X)E(Y)

Proof:

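Start from the LHS, which is the definition of covariance:

COV(X, Y) = E[(X - E(X))(Y - E(Y))]

= E[XY - X E(Y) - E(X)Y + E(X)E(Y)]

= E(XY) - E(X)E(Y) - E(X)E(Y) + E(X)E(Y)   (by linearity of expectation, since E(X) and E(Y) are constants)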

= E(XY) - E(X)E(Y)

= RHS

=> COV(X, Y) = E(XY) - E(X)E(Y)

The covariance exists and is well-defined only as long as E(X), E(Y) and E(XY) exist and are well-defined.
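The identity can also be checked numerically. The sketch below assumes a small, made-up joint distribution in which every (x, y) pair is equally likely, and uses exact fractions so that both sides can be compared without rounding. The helper name `expect` is an assumption for illustration.

```python
from fractions import Fraction

# Hypothetical joint distribution: each (x, y) pair is equally likely.
pairs = [(1, 2), (2, 1), (3, 5), (6, 4)]
p = Fraction(1, len(pairs))

def expect(values):
    """Expected value when every outcome has probability p."""
    return sum(p * v for v in values)

ex = expect(x for x, _ in pairs)
ey = expect(y for _, y in pairs)
exy = expect(x * y for x, y in pairs)

lhs = expect((x - ex) * (y - ey) for x, y in pairs)  # E[(X - E(X))(Y - E(Y))]
rhs = exy - ex * ey                                   # E(XY) - E(X)E(Y)
print(lhs, rhs)
```

For this distribution both sides come out to 7/4.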

Covariance Correlation

Covariance and correlation describe how two variables are related. Both indicate whether the variables are positively or inversely related. Let X and Y be real-valued random variables with means E(X) and E(Y) and variances Var(X) and Var(Y), respectively.

Then, the covariance between X and Y is

Cov(X, Y) = E{[X - E(X)][Y - E(Y)]}

And the correlation between X and Y is

Cor(X, Y) = $\frac{Cov[X, Y]}{\sigma(X) \sigma(Y)}$

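Correlation can be regarded as a scaled version of covariance: dividing by the standard deviations σ(X) and σ(Y) removes the units and confines the value to the range from -1 to 1. In other respects, correlation and covariance have more similarities than differences.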

Covariance Correlation Formula


To calculate the correlation coefficient for two variables (x, y), we use the covariance in the formula shown below:

r(x, y) = $\frac{Cov(x, y)}{S_x S_y}$

where,
Cov(x, y) = covariance of the variables x and y = $\frac{\sum_{i = 1}^N (x_i - \bar x)(y_i - \bar y)}{N - 1}$.
r(x, y) = correlation of the variables x and y.
$S_x$ = sample standard deviation of the random variable x.
$S_y$ = sample standard deviation of the random variable y.
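A short Python sketch of this formula is given below. It reuses the data of Question 1 from the solved examples on this page. The function names are assumptions made for illustration, and the sample standard deviation is taken as the square root of the sample covariance of a variable with itself.

```python
import math

def sample_cov(xs, ys):
    """Sample covariance with the N - 1 divisor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def sample_std(xs):
    """Sample standard deviation: square root of Cov(x, x)."""
    return math.sqrt(sample_cov(xs, xs))

def correlation(xs, ys):
    """r(x, y) = Cov(x, y) / (S_x * S_y)."""
    return sample_cov(xs, ys) / (sample_std(xs) * sample_std(ys))

xs = [3, 4, 5, 7]
ys = [10, 11, 13, 14]
r = correlation(xs, ys)
print(r)
```

For this data r is about 0.96, a strong positive correlation, while the covariance itself is 3.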

Result Interpretation
Covariance and correlation always have the same sign (positive, negative, or 0). When the sign is positive, the variables are said to be positively correlated. When the sign is negative, the variables are said to be negatively correlated, and when the value is 0, the variables are said to be uncorrelated. A positive covariance indicates a positive linear relationship and a negative covariance indicates a negative linear relationship between the variables.

There are three main interpretations in terms of a graph of the plotted points.
  • Positive correlation - The pattern rises. If the pattern in the graph slopes from lower left to upper right, that is, along an upward sloping line, there is a positive correlation between the variables. In simple terms, if the data roughly follow a straight line running from low values of x and y to high values of x and y, the variables are positively correlated.
  • Negative correlation - The pattern falls. If the pattern in the graph slopes from upper left to lower right, that is, along a downward sloping line, there is a negative correlation between the variables. In simple terms, if the data roughly follow a straight line running from high values of y at low x down to low values of y at high x, the variables are negatively correlated.
  • Zero correlation - There is no correlation when no upward or downward sloping straight line describes the data points. This does not mean the variables are independent; a non-linear relationship may exist between them.
Hence both covariance and correlation measure, to an extent, a certain type of dependence between the variables.
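The zero correlation case can be illustrated with a small sketch. The data are made up for illustration: y is completely determined by x through y = x², yet the sample covariance is zero because the relationship is not linear.

```python
def sample_cov(xs, ys):
    """Sample covariance with the N - 1 divisor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]  # y depends on x, but not linearly

cov_xy = sample_cov(xs, ys)
print(cov_xy)
```

The positive and negative products of deviations cancel exactly here, so the covariance is 0 even though the variables are clearly dependent.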

Analysis of Covariance

The main objective of an experimental design in general is to ensure that the results observed can be attributed to the treatment variable and to no other causal circumstances. For instance, a researcher studying one independent variable, X, may wish to control the influence of some uncontrolled variable, Z, which is known to be correlated with the dependent variable, Y. In that case it is important to use the technique of analysis of covariance for a valid evaluation of the outcome of the experiment. Analysis of covariance is abbreviated as ANCOVA. In applying the ANCOVA technique, the influence of the uncontrolled variable is usually removed by simple linear regression, and the residual sums of squares are used to provide variance estimates, which in turn are used to make tests of significance.

Covariance analysis consists of subtracting from each individual score Yi the portion Y'i that is predictable from the uncontrolled variable Zi, and then computing the usual analysis of variance on the resulting (Y - Y') values, with due adjustment to the degrees of freedom, because estimation by the regression method costs degrees of freedom.

Analysis of covariance (ANCOVA) is a general linear model which blends ANOVA and regression. It corresponds to a multiple regression analysis in which there is at least one quantitative and one categorical explanatory variable. ANCOVA evaluates whether population means of a dependent variable are equal across levels of a categorical independent variable while statistically controlling for the effects of other continuous variables.

Properties

Properties of the Covariance:
  • The covariance of a random variable with itself is the variance of that random variable:
    Cov[X, X] = E [(X - E[X]) (X - E[X])]
    = E [$(X - E[X])^2$]
    = Var[X]
  • Cov[X, Y] = Cov[Y, X], which means covariance is symmetric.
  • The variance of the sum of two random variables X and Y is
    Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)
  • Cov(aX, bY) = (ab)Cov(X, Y), where a and b are constants.
    It means that if the random variables are multiplied by constants a and b, the constants can be taken out and multiplied with the covariance.
  • Cov[a1X1 + a2X2, Y] = a1Cov[X1, Y] + a2Cov[X2, Y], where a1 and a2 are constants and X1, X2, Y are random variables, which means the covariance operation is linear in its first argument.
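Some of these properties can be spot-checked on data. The sketch below uses sample covariances, for which the same identities hold, with made-up values and constants. The helper names `cov` and `var` are assumptions for illustration.

```python
import math

def cov(xs, ys):
    """Sample covariance with the N - 1 divisor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def var(xs):
    """Sample variance, using Cov[X, X] = Var[X]."""
    return cov(xs, xs)

# Hypothetical data and constants, for illustration only.
x = [2.0, 4.0, 6.0, 9.0]
y = [1.0, 3.0, 2.0, 7.0]
a, b = 3.0, -2.0

# Symmetry: Cov[X, Y] = Cov[Y, X]
symmetric = math.isclose(cov(x, y), cov(y, x))

# Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
x_plus_y = [xi + yi for xi, yi in zip(x, y)]
var_sum_ok = math.isclose(var(x_plus_y), var(x) + var(y) + 2 * cov(x, y))

# Cov(aX, bY) = ab Cov(X, Y)
scaled = cov([a * xi for xi in x], [b * yi for yi in y])
scaling_ok = math.isclose(scaled, a * b * cov(x, y))

print(symmetric, var_sum_ok, scaling_ok)
```

A numerical check of this kind is not a proof, but it is a quick way to catch a misremembered sign or factor.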

Sample Covariance

Sample covariance is a statistic computed from a collection of data on two or more random variables. The sample covariance matrix is a square matrix whose (i, j) element is the sample covariance between the observed values of variables i and j, and whose (i, i) element is the sample variance of the observed values of variable i.

Sample Covariance Formula


The sample covariance for two variables x and y, observed as N pairs, is defined in terms of the sample means $\bar x$ and $\bar y$ as:

$S_{xy}$ = $\frac{\sum_{i = 1}^N (x_i - \bar x)(y_i - \bar y)}{N - 1}$
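As a rough sketch, the sample covariance matrix described above can be built by applying this formula to every pair of variables. The function names and the third variable are assumptions for illustration; the first two variables are the X and Y of Question 1 on this page.

```python
import math

def sample_cov(xs, ys):
    """Sample covariance with the N - 1 divisor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def covariance_matrix(columns):
    """Element (i, j) is the sample covariance of variable i with variable j."""
    return [[sample_cov(ci, cj) for cj in columns] for ci in columns]

data = [
    [3, 4, 5, 7],      # variable 1 (X of Question 1)
    [10, 11, 13, 14],  # variable 2 (Y of Question 1)
    [1, 0, 2, 1],      # variable 3 (hypothetical)
]
matrix = covariance_matrix(data)
for row in matrix:
    print(row)
```

The diagonal holds the sample variances, and the matrix is symmetric because the sample covariance does not depend on the order of its arguments.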

Rules

Rules for the Covariance:

1. The covariance of two constants, a and b, is zero.

=> COV(a, b) = E[(a - E(a))(b - E(b))] = E[(0)(0)] = 0

2. The covariance of two independent random variables is zero.

=> COV(x, y) = 0

3. The covariance is commutative, as is obvious from the definition.

=> COV(x, y) = COV(y, x)

4. Adding a constant to either or both random variables does not change their covariance.

=> COV(x + a, y + b) = COV(x, y)

5. The additive law of covariance holds that the covariance of a random variable with a sum of random variables is just the sum of the covariances with each of the random variables.

=> COV(x + y, z) = COV(x, z) + COV(y, z)
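Rules 1, 4 and 5 can be checked on sample data, as in the sketch below. The data and constants are made up for illustration. Rule 2 is a statement about the underlying distributions, so a finite sample of independent variables would typically give a covariance near zero rather than exactly zero.

```python
import math

def cov(xs, ys):
    """Sample covariance with the N - 1 divisor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data and constants, for illustration only.
x = [1.0, 3.0, 4.0, 8.0]
y = [2.0, 2.5, 5.0, 6.5]
z = [9.0, 7.0, 4.0, 1.0]
a, b = 10.0, -4.0

# Rule 1: the covariance of two constants is zero.
rule1 = cov([a] * 4, [b] * 4)

# Rule 4: COV(x + a, y + b) = COV(x, y)
shifted = cov([xi + a for xi in x], [yi + b for yi in y])
rule4_ok = math.isclose(shifted, cov(x, y))

# Rule 5: COV(x + y, z) = COV(x, z) + COV(y, z)
summed = cov([xi + yi for xi, yi in zip(x, y)], z)
rule5_ok = math.isclose(summed, cov(x, z) + cov(y, z))

print(rule1, rule4_ok, rule5_ok)
```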

Cross Covariance

Cross-covariance refers to the covariance cov(X, Y) between two random vectors X and Y. The cross-covariance between two random functions can be computed not only at a single location x but also for pairs of locations separated by a vector. Under the assumption of joint second-order stationarity, the cross-covariance function between two random functions depends only on the separation vector.

Population Covariance

The population covariance can be computed between two linear combinations of the data. Consider the pair of linear combinations:

Y1 = $\sum_{i=1}^p c_i X_i$ and Y2 = $\sum_{j=1}^p d_j X_j$

where Y1 and Y2 are two distinct linear combinations with coefficient vectors c and d. Both Y1 and Y2 are random variables built from the same X's, so they will in general be correlated. We can assess the association between them using the covariance. The population covariance between Y1 and Y2 is obtained by summing over all pairs of variables: for each pair, multiply the respective coefficients $c_i$ and $d_j$ from the two linear combinations by $\sigma_{ij}$, the covariance between $X_i$ and $X_j$.

=> Cov($Y_1, Y_2$) = $\sum_{i=1}^p$ $\sum_{j=1}^p$ $c_id_j \sigma_{ij}$.
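The double sum can be sketched in Python as below. As an assumption for illustration, sample covariances computed from made-up observations stand in for the population values $\sigma_{ij}$; the same identity holds for them, so the double sum can be compared with the covariance computed directly from Y1 and Y2.

```python
import math

def cov(xs, ys):
    """Sample covariance with the N - 1 divisor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical observations of p = 3 variables (one list per variable).
X = [
    [1.0, 2.0, 4.0, 5.0, 8.0],
    [2.0, 1.0, 3.0, 7.0, 6.0],
    [9.0, 7.0, 8.0, 3.0, 1.0],
]
c = [1.0, -2.0, 0.5]  # hypothetical coefficients of Y1
d = [0.0, 3.0, 1.0]   # hypothetical coefficients of Y2
p, n = len(X), len(X[0])

# sigma[i][j] plays the role of sigma_ij in the formula.
sigma = [[cov(X[i], X[j]) for j in range(p)] for i in range(p)]
double_sum = sum(c[i] * d[j] * sigma[i][j] for i in range(p) for j in range(p))

# Direct route: build Y1 and Y2 observation by observation, then take Cov.
y1 = [sum(c[i] * X[i][k] for i in range(p)) for k in range(n)]
y2 = [sum(d[j] * X[j][k] for j in range(p)) for k in range(n)]
direct = cov(y1, y2)

print(double_sum, direct)
```

The two printed values should agree up to floating-point rounding.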

Covariance Standard Deviation

The covariance is equal to the product of the correlation coefficient and the standard deviations of the two variables. Covariance measures how two random variables vary together.

The covariance is related to the correlation coefficient as follows:

Cov($X_1, X_2$) = std(X1)std(X2) cor($X_1, X_2$)

where cor($X_1, X_2$) is the correlation between $X_1$ and $X_2$.

Since a standard deviation is never negative, the covariance has the same sign as the correlation coefficient. If the random variables are positively correlated, they have positive covariance. Similarly, the covariance is negative if the random variables are negatively correlated.

Variance Covariance

The variance of a random variable X with expected value E(X) = $\mu_x$ is defined as Var(X) = E[(X - $\mu_x$)$^2$]. The covariance between random variables Y and Z with expected values $\mu_y$ and $\mu_z$ is defined as Cov(Y, Z) = E[(Y - $\mu_y$)(Z - $\mu_z$)].

The correlation between Y and Z is defined as


Corr(Y, Z) = $\frac{Cov(Y, Z)}{\sqrt{Var(Y)Var(Z)}}$

Variance Covariance Formula


=> Var(X) = E($X^2$) - [E(X)]$^2$ and Cov(X, Y) = E(XY) - E(X)E(Y).

Covariance Matrices

In portfolio analysis, the covariances of returns from all the possible pairs of assets can be depicted by a covariance matrix. The portfolio variance is built from the weighted covariances between every possible pairwise combination of securities, with each weight being the product of the proportions invested in the two securities forming the pair.

$\begin{bmatrix}
& Column 1(W_A) & Column 2(W_B) & Column 3(W_C)\\
Row 1 (W_A)& Cov(a,a) & Cov(a,b) & Cov(a,c) \\
Row 2(W_B) & Cov(b,a) &Cov(b,b) &Cov(b,c) \\
Row 3(W_C)& Cov(c,a) &Cov(c,b) & Cov(c,c)
\end{bmatrix}$

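=> $\sigma^2$ = $W_AW_A Cov_{aa} + W_AW_B Cov_{ab} + \ldots + W_CW_C Cov_{cc}$

where $\sigma^2$ is the variance of the portfolio and $W_A$, $W_B$, $W_C$ are the proportions invested in securities A, B and C.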
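A minimal sketch of this weighted sum follows. The weights and the covariance matrix of returns are made-up numbers for three hypothetical securities A, B and C; only the structure of the calculation is taken from the text above.

```python
import math

names = ["A", "B", "C"]
weights = [0.5, 0.3, 0.2]  # hypothetical proportions invested, summing to 1

# Hypothetical covariance matrix of returns (symmetric, variances on the diagonal).
cov_matrix = [
    [0.04, 0.01, 0.00],
    [0.01, 0.09, 0.02],
    [0.00, 0.02, 0.16],
]

# Sum W_i * W_j * Cov_ij over every cell of the matrix.
portfolio_variance = sum(
    weights[i] * weights[j] * cov_matrix[i][j]
    for i in range(len(names))
    for j in range(len(names))
)
print(portfolio_variance)
```

With these illustrative inputs the result is about 0.0299. Each off-diagonal covariance is counted twice, once as Cov(a, b) and once as Cov(b, a), which is why the matrix form needs no explicit factor of 2.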

Brownian Distance Covariance

Distance correlation is a class of multivariate dependence coefficients applicable to random vectors of arbitrary and not necessarily equal dimension. Distance covariance is analogous to product-moment covariance. Brownian distance covariance and correlation offer an alternative to the standard measures of correlation, and they rest on non-trivial theoretical results.

Calculate Covariance

Steps for calculating the covariance:

Step 1: Calculate the mean of the first and the second variable separately.

Step 2: Multiply each data point for the first variable by the corresponding data point for the second variable, and calculate the mean of the new data.

Step 3: Follow the formula of the covariance and find the value of Cov(X, Y).

Example

Given below are some examples on covariance.

Solved Examples

Question 1: Consider the table below and find the sample covariance of the two random variables X and Y.

X     Y
3     10
4     11
5     13
7     14

Solution:
Step 1:
Calculate the mean of both sets.

       X     Y
       3     10
       4     11
       5     13
       7     14
Sum    19    48

Mean of X is $\frac{19}{4}$ = 4.75 and that of Y is $\frac{48}{4}$ = 12

Step 2:
Multiply each pair of corresponding X and Y values together and find the sum of the products.

       X     Y     XY
       3     10    30
       4     11    44
       5     13    65
       7     14    98
Sum    19    48    237

Step 3:
Find the product of the sum of X and the sum of Y.

The product of the sums is 19 $\times$ 48 = 912

Step 4:
Divide the value obtained in Step 3 by N.

Here $\frac{912}{4}$ = 228

Step 5:
Subtract the value obtained in Step 4 from the sum of products found in Step 2.

Here the value obtained = 237 - 228 = 9

Step 6:
Divide by N - 1. This gives the covariance.

Here value = $\frac{9}{(4-1)}$ = $\frac{9}{3}$ = 3

So covariance = 3

The covariance is positive, which indicates that the two variables tend to increase and decrease together.
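The steps of this solution can be retraced in a few lines of Python. The variable names are chosen here for illustration; the numbers are those of the table in Question 1.

```python
xs = [3, 4, 5, 7]
ys = [10, 11, 13, 14]
n = len(xs)

sum_x = sum(xs)                               # Step 1: 19
sum_y = sum(ys)                               # Step 1: 48
sum_xy = sum(x * y for x, y in zip(xs, ys))   # Step 2: 237
product_of_sums = sum_x * sum_y               # Step 3: 912
scaled_product = product_of_sums / n          # Step 4: 228.0
difference = sum_xy - scaled_product          # Step 5: 9.0
covariance = difference / (n - 1)             # Step 6: 3.0

print(covariance)
```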

Question 2: Consider the table below, containing the values of the variables X and Y.

X      Y
2.1    8
2.5    12
4      14
3.6    10

In which direction are the two variables moving?
Solution:
Step 1: Find the mean of both the variables, X and Y.

=> $\bar X$ = $\frac{\sum X }{N}$

= $\frac{12.2}{4}$

= 3.05

and

$\bar Y$ = $\frac{\sum Y }{N}$

= $\frac{44}{4}$

= 11

Step 2: Compute the deviations from the means and their products.

Here N = 4, $\bar X$ = 3.05 and $\bar Y$ = 11

X      Y     X - $\bar X$    Y - $\bar Y$    (X - $\bar X$)(Y - $\bar Y$)
2.1    8     -0.95    -3    2.85
2.5    12    -0.55    1     -0.55
4      14    0.95     3     2.85
3.6    10    0.55     -1    -0.55

$\sum$(X - $\bar X$)(Y - $\bar Y$) = 4.6

Now
Cov(x, y) = $\frac{\sum_{i = 1}^N (x_i - \bar x)(y_i - \bar y)}{N - 1}$

Cov(x, y) = $\frac{4.6}{3}$

= 1.53 (rounded to two decimal places), which is positive

Since the covariance is positive, the variables are positively related. So they move together in the same direction.
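The conclusion can be cross-checked with a short sketch that computes the covariance from the deviations and reports the direction from its sign. The helper names `sample_cov` and `direction` are assumptions for illustration.

```python
import math

def sample_cov(xs, ys):
    """Sample covariance with the N - 1 divisor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def direction(covariance):
    """Describe how the two variables move, based on the sign of the covariance."""
    if covariance > 0:
        return "same direction"
    if covariance < 0:
        return "opposite directions"
    return "no linear tendency"

xs = [2.1, 2.5, 4, 3.6]
ys = [8, 12, 14, 10]
cov_xy = sample_cov(xs, ys)
print(round(cov_xy, 2), direction(cov_xy))
```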
