
# Bivariate Frequency Distribution

A univariate distribution, also known as a one-variable frequency distribution (“uni” means “one”), is the simplest way of representing data. It does not deal with relationships between variables; its main purpose is to describe the data, summarize it, and find patterns in it.
But what if both the number of students and their marks in each subject were given? Data in statistics is classified according to how many variables a study involves. When two variables are involved, their joint frequency distribution is known as a “bivariate frequency distribution”.


## Definition

A bivariate frequency distribution is the frequency distribution of two variables. Let us discuss the concept with the help of an example.
The backbone of statistics is collecting, processing, and analyzing data. When only a small amount of data is available, analysis is not a problem.
For example, take the marks achieved by a student out of 100 in each subject, and find the percentage:

Mathematics = 75

Statistics = 90

English = 80

Physics = 75

Chemistry = 85

This data seems easy to work with, doesn't it?
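As a quick illustration of how little work a small data set needs, the percentage for the five marks above can be sketched as follows. The variable names are only illustrative, and the sketch assumes every subject is marked out of 100, as stated above.

```python
# Marks out of 100 for one student, taken from the example above.
marks = {
    "Mathematics": 75,
    "Statistics": 90,
    "English": 80,
    "Physics": 75,
    "Chemistry": 85,
}

total = sum(marks.values())        # sum of marks obtained
max_total = 100 * len(marks)       # maximum possible marks
percentage = 100 * total / max_total

print(f"Total: {total} out of {max_total}")
print(f"Percentage: {percentage}%")
```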
But when a large amount of data is available, listing every value becomes unwieldy.

For example, consider the marks in Mathematics for a class of 60 students: 40, 50, 80, 90, 56, 60, 40, 77, 78, 92, 95, … and so on, 60 values in all. It is then easier to represent the data in tabular form, grouping the marks into intervals and counting the students in each interval, such as:

| Marks (interval) | Frequency (number of students with marks in that interval) |
|---|---|
| 0 - 10 | 0 |
| 11 - 20 | 1 |
| 21 - 30 | 2 |
| 31 - 40 | 2 |
| 41 - 50 | 5 |
| 51 - 60 | 10 |
| 61 - 70 | 10 |
| 71 - 80 | 20 |
| 81 - 90 | 10 |
| 91 - 100 | 0 |
| Total | 60 |

This makes analysis much easier.
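The grouping step can be sketched in a few lines of Python. This is a minimal sketch, not the method used to build the table above: it uses only the eleven marks actually listed in the text (the remaining values are not given), so its counts will not match the table. The helper name `interval_label` is hypothetical.

```python
from collections import Counter

def interval_label(mark):
    """Return the class interval (as used in the table) that contains `mark`."""
    if mark <= 10:
        return "0 - 10"
    low = ((mark - 1) // 10) * 10 + 1   # 11, 21, 31, ...
    return f"{low} - {low + 9}"

# Only the marks listed in the text; the full set of 60 is not available.
sample_marks = [40, 50, 80, 90, 56, 60, 40, 77, 78, 92, 95]

frequency = Counter(interval_label(m) for m in sample_marks)

for label, count in sorted(frequency.items(), key=lambda kv: int(kv[0].split()[0])):
    print(f"{label:>8}: {count}")
```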

The following are common types of bivariate analysis:

1. Scatter plot:
A scatter plot gives an idea of the relationship between the two variables at a glance. Each observation is plotted as a point against the X and Y axes, with the independent variable on the X axis and the dependent variable on the Y axis. A cloud of points rising from left to right suggests a positive relationship, and one falling from left to right suggests a negative relationship.

2. Regression analysis:
Regression analysis makes it possible to estimate future trends in the data. It fits a straight line to the data; substituting values of the independent variable into the equation of that line then gives predicted values of the dependent variable. It also provides the slope and intercept of the line, which can be tested to see whether the relationship holds for the population from which the sample was drawn.

3. Correlation coefficients:
The correlation coefficient indicates how strongly two variables are related to each other. The steps and calculations to be performed are shown below. The value of the correlation coefficient always lies between -1 and 1: -1 means a perfect negative correlation and 1 means a perfect positive correlation. A value of zero indicates no linear relationship between x and y. [A negative relationship means that when one variable increases, the other tends to decrease. A positive relationship means that when one variable increases, the other tends to increase as well.]
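To make the regression idea in item 2 concrete, here is a minimal sketch of fitting a straight line by ordinary least squares. The data (hours studied against marks) is hypothetical and chosen only for illustration, and the function name `fit_line` is an assumption, not something defined in this article.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: independent variable on X, dependent variable on Y.
hours = [1, 2, 3, 4, 5]
marks = [52, 58, 61, 70, 74]

slope, intercept = fit_line(hours, marks)

# Predict the dependent variable for a new value of the independent variable.
predicted = intercept + slope * 6
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"predicted marks for 6 hours = {predicted:.1f}")
```

With this data the slope comes out at about 5.6 and the intercept at about 46.2. Predictions far outside the observed range of the independent variable should be treated with caution.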

## Bivariate Frequency Distribution and correlation

If the given data has numerical values for both variables, we may need to know how strongly they are related to each other. In such cases, the “correlation coefficient” ($r$) tells us whether the two variables are correlated and, if so, how strongly.

Consider the following table:

| $X_i \backslash Y_j$ | $Y_1$ | $Y_2$ | … | $Y_k$ | Total |
|---|---|---|---|---|---|
| $X_1$ | $A_{11}$ | $A_{12}$ | … | $A_{1k}$ | $T_1$ |
| $X_2$ | $A_{21}$ | $A_{22}$ | … | $A_{2k}$ | $T_2$ |
| … | … | … | … | … | … |
| $X_N$ | $A_{N1}$ | $A_{N2}$ | … | $A_{Nk}$ | $T_N$ |
| Total | $S_1$ | $S_2$ | … | $S_k$ | $G$ |

In the above table, $X_1, X_2, \ldots, X_N$ are the $N$ values of $X$, and $T_1, T_2, \ldots, T_N$ are the totals of the frequencies in the respective rows.

Similarly, $Y_1, Y_2, \ldots, Y_k$ are the $k$ values of the other variable $Y$, with respective column totals $S_1, S_2, \ldots, S_k$.

Each entry $A_{ij}$ is the frequency with which $X_i$ and $Y_j$ occur together, and $G$ is the grand total of all frequencies.
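The row totals, column totals, and grand total can be computed directly from the table of joint frequencies. The sketch below uses a small hypothetical table with $N = 3$ rows and $k = 2$ columns; the function name `marginal_totals` is illustrative.

```python
def marginal_totals(table):
    """Return (row totals T, column totals S, grand total G) of a frequency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return row_totals, col_totals, grand_total

# Hypothetical joint frequencies: table[i][j] plays the role of A_ij.
table = [
    [2, 0],
    [1, 1],
    [0, 2],
]

T, S, G = marginal_totals(table)
print("Row totals T:", T)
print("Column totals S:", S)
print("Grand total G:", G)
```

Note that the row totals and the column totals both add up to the same grand total, which is a useful check on the table.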

Now, how do we calculate the correlation coefficient for these $X$ and $Y$ values?

The formula used to calculate the correlation coefficient is

$r = \frac{\mathrm{Cov}(X, Y)}{\sigma_x \times \sigma_y}$

where

$\mathrm{Cov}(X, Y) = E[XY] - E[X] \times E[Y]$,

$\sigma_x$ is the standard deviation of $X$,

and $\sigma_y$ is the standard deviation of $Y$.

Some intermediate calculations:

$E[X] = \bar{x} = \frac{1}{\sum_i T_i} \sum_i X_i T_i$

$E[Y] = \bar{y} = \frac{1}{\sum_j S_j} \sum_j Y_j S_j$

$E[XY] = \frac{1}{G} \sum_i \sum_j X_i Y_j A_{ij}$

$\sigma_x = \sqrt{\frac{1}{\sum_i T_i} \sum_i T_i (X_i - \bar{x})^2}$

$\sigma_y = \sqrt{\frac{1}{\sum_j S_j} \sum_j S_j (Y_j - \bar{y})^2}$

Here $\sum_i T_i = \sum_j S_j = G$, since the row totals and the column totals both add up to the grand total.
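Putting the pieces together, the following is a minimal sketch of computing $r$ from a bivariate frequency table using the covariance and standard deviation definitions given in this section. The function name `correlation_from_table` and the example table are hypothetical, and the sketch assumes that neither variable is constant (otherwise a standard deviation is zero and $r$ is undefined).

```python
import math

def correlation_from_table(x_values, y_values, table):
    """Correlation coefficient r for a bivariate frequency table.

    table[i][j] is the frequency A_ij of the pair (x_values[i], y_values[j]).
    """
    T = [sum(row) for row in table]           # row totals
    S = [sum(col) for col in zip(*table)]     # column totals
    G = sum(T)                                # grand total

    mean_x = sum(x * t for x, t in zip(x_values, T)) / G   # E[X]
    mean_y = sum(y * s for y, s in zip(y_values, S)) / G   # E[Y]
    e_xy = sum(
        x * y * table[i][j]
        for i, x in enumerate(x_values)
        for j, y in enumerate(y_values)
    ) / G                                                  # E[XY]

    covariance = e_xy - mean_x * mean_y
    sigma_x = math.sqrt(sum(t * (x - mean_x) ** 2 for x, t in zip(x_values, T)) / G)
    sigma_y = math.sqrt(sum(s * (y - mean_y) ** 2 for y, s in zip(y_values, S)) / G)
    return covariance / (sigma_x * sigma_y)

# Hypothetical example: X takes values 1, 2, 3 and Y takes values 1, 2.
r = correlation_from_table([1, 2, 3], [1, 2], [[2, 0], [1, 1], [0, 2]])
print(f"r = {r:.4f}")
```

For this example, $E[X] = 2$, $E[Y] = 1.5$, and $E[XY] = 20/6$, so the covariance is $1/3$ and $r = \sqrt{2/3} \approx 0.8165$, a fairly strong positive correlation.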