
# Multiple Regression

In our daily life, we come across variables that are related to each other. To study the degree of relationship between such variables, we use correlation. To find the nature of the relationship between the variables, we have another measure, known as regression. Here, we find equations such that we can estimate the value of one variable when the values of the other variables are given.

The variables whose values are given are called independent variables, and the variables that are to be estimated are called dependent variables. The regression equations are usually explicit equations for the dependent variable in terms of the independent variables.


## Definition

Multiple regression analysis is a statistical technique that analyzes the relationship among two or more variables and uses that information to estimate the value of the dependent variable. In multiple regression, the objective is to build a model that relates a dependent variable $y$ to more than one independent variable.

### Equation:

In linear regression, there is only one independent variable and one dependent variable. In multiple regression, a set of independent variables helps us better explain or predict the dependent variable $y$.
The multiple regression equation is given by

$y = a + b_1x_1 + b_2x_2 + \dots + b_kx_k$, where $x_1, x_2, \dots, x_k$ are the $k$ independent variables and $y$ is the dependent variable.
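As a minimal sketch in Python with NumPy (hypothetical data; the least-squares routine stands in for the usual normal-equation derivation), the coefficients $a, b_1, b_2$ of a two-predictor equation can be estimated like this:

```python
import numpy as np

# Hypothetical data: y depends exactly on two predictors x1 and x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = 3.0 + 2.0 * x1 - 1.0 * x2

# Design matrix: a column of ones for the intercept a, then x1 and x2.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares estimates of (a, b1, b2).
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # recovers a = 3, b1 = 2, b2 = -1
```

Because the hypothetical $y$ here is an exact linear function of $x_1$ and $x_2$, the fit recovers the coefficients exactly; with real data the estimates would carry sampling error.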

## Multiple $R^2$

The percent of variation in the dependent variable $y$ explained by the set of independent variables $x_1, x_2, \dots, x_k$ is known as the coefficient of determination, denoted by $R^{2}$. The value of $R^{2}$ lies between $0$ and $1$ and cannot be negative. A value close to $0$ indicates little association between the set of independent variables and the dependent variable; a value near $1$ indicates a strong association. It is easy to compare and understand.

### Adjusted Multiple $R^{2}$

Adding independent variables to a multiple regression equation always makes the coefficient of determination larger, even when the added variable is not a good predictor of the dependent variable: $R^{2}$ can increase simply because the total number of independent variables has grown. To balance the effect that the number of independent variables has on the coefficient of multiple determination, we use the adjusted coefficient of determination.

Adjusted Multiple $R^{2}$ = $1 - \dfrac{SSE/(n-(k+1))}{SST/(n-1)}$, where $SSE$ is the error sum of squares, $SST$ is the total sum of squares, $n$ is the number of observations, and $k$ is the number of independent variables.
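A small sketch of both formulas with hypothetical data (Python with NumPy; $SSE$ and $SST$ are computed from an ordinary least-squares fit):

```python
import numpy as np

# Hypothetical data with k = 3 independent variables.
rng = np.random.default_rng(0)
n, k = 20, 3
X = rng.normal(size=(n, k))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)

# Least-squares fit with an intercept column.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ coef

sse = np.sum((y - y_hat) ** 2)      # error sum of squares
sst = np.sum((y - y.mean()) ** 2)   # total sum of squares
r2 = 1 - sse / sst
adj_r2 = 1 - (sse / (n - (k + 1))) / (sst / (n - 1))
print(r2, adj_r2)  # adjusted R^2 is always below R^2 when k > 0
```

The adjustment penalizes the two predictors that contribute nothing here, which is why the adjusted value comes out lower.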

## Stepwise Multiple Regression

Stepwise regression is a step-by-step procedure for determining a regression equation that adds or deletes one predictor variable at a time. The forward selection method begins with no independent variables and adds one independent variable to the regression equation at each iteration. There is another method, called backward elimination, which begins with the entire set of variables and eliminates one independent variable at each iteration. In either case, only independent variables with nonzero regression coefficients are included in the final regression equation.
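The forward selection loop described above can be sketched as follows (a simplified version using reduction in SSE as the entry criterion rather than a formal significance test; data and threshold are hypothetical):

```python
import numpy as np

def sse_of_fit(X, y):
    """Error sum of squares of a least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def forward_select(X, y, min_improvement=1e-6):
    """Add, one per iteration, the predictor that most reduces the SSE;
    stop when no candidate improves the fit appreciably."""
    n, p = X.shape
    chosen, remaining = [], list(range(p))
    best_sse = sse_of_fit(X[:, []], y)  # intercept-only model
    while remaining:
        new_sse, best_j = min((sse_of_fit(X[:, chosen + [j]], y), j)
                              for j in remaining)
        if best_sse - new_sse < min_improvement:
            break
        chosen.append(best_j)
        remaining.remove(best_j)
        best_sse = new_sse
    return chosen

# Hypothetical data: only predictors 1 and 3 actually matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
y = 2.0 * X[:, 1] - 3.0 * X[:, 3]
print(forward_select(X, y))  # picks columns 1 and 3, then stops
```

Backward elimination would run the same loop in reverse, starting from all four columns and dropping the least useful one each round.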

### Advantages of Step-wise multiple regression

• Only independent variables with nonzero regression coefficients are included in the regression equation.
• The changes in the multiple standard error of estimate and in the coefficient of determination are shown at each step.
• It is efficient in finding a regression equation with only significant regression coefficients.
• The steps involved in building the regression equation are clear.

### Residual

The variation in the dependent variable that is not explained by the regression model is called the residual or error variation. It is also known as random error, or sometimes just “error”: random variation due to sampling.
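As a small sketch with hypothetical data, the residuals are simply the gaps between the observed and fitted values; a useful check is that with an intercept in the model, least-squares residuals sum to zero up to floating-point error:

```python
import numpy as np

# Hypothetical noisy data for a two-predictor fit.
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(30), rng.normal(size=(30, 2))])
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(scale=0.3, size=30)

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef  # the unexplained (error) variation

# With an intercept column in the model, least-squares residuals
# sum to zero up to floating-point error.
print(residuals.sum())
```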

## Multiple Logistic Regression

It is an extension of logistic regression in which there are several explanatory variables. The probability of one outcome is a function of a linear combination of the set of explanatory variables. A special case of multiple logistic regression arises when the probability varies as a polynomial function of a single quantitative explanatory variable; this is similar to polynomial regression.

### Hierarchical Multiple Regression:

Hierarchical multiple regression, also called sequential regression, is similar to stepwise regression. In this regression, the independent variables are entered into the equation in an order specified by the researcher on theoretical grounds. Hierarchical regression is used to evaluate the relationship between a set of independent variables and the dependent variable while controlling for, or taking into account, the impact of a different set of independent variables on the dependent variable.
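One way to see hierarchical entry at work is to fit the blocks in the researcher-specified order and watch $R^{2}$ grow (hypothetical data; the "control" and "focal" block names are illustrative):

```python
import numpy as np

def r_squared(X, y):
    """Coefficient of determination for a least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

# Hypothetical data: two control variables entered first, then the
# focal predictor the researcher actually cares about.
rng = np.random.default_rng(6)
controls = rng.normal(size=(80, 2))
focal = rng.normal(size=(80, 1))
y = controls @ np.array([1.0, -0.5]) + 2.0 * focal[:, 0] + rng.normal(size=80)

r2_step1 = r_squared(controls, y)                            # block 1 only
r2_step2 = r_squared(np.column_stack([controls, focal]), y)  # blocks 1 + 2
print(r2_step1, r2_step2)
```

The increase from the first to the second $R^{2}$ is the variation explained by the focal predictor over and above the controls.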

## ANOVA Table

A specimen ANOVA table for multiple regression is given below, where $k$ is the number of independent variables and $n$ is the number of observations.

| Source of variation | Sum of squares | Degrees of freedom | Mean square | $F$ |
|---|---|---|---|---|
| Regression | $SSR$ | $k$ | $MSR = SSR/k$ | $MSR/MSE$ |
| Error | $SSE$ | $n-k-1$ | $MSE = SSE/(n-k-1)$ | |
| Total | $SST$ | $n-1$ | | |
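The table's entries can be computed directly from a fit (hypothetical data in Python with NumPy; note the identity $SSR + SSE = SST$):

```python
import numpy as np

# Hypothetical data with k = 2 independent variables.
rng = np.random.default_rng(3)
n, k = 40, 2
X = rng.normal(size=(n, k))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)

A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ coef

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression (explained) sum of squares
sse = np.sum((y - y_hat) ** 2)         # error sum of squares
msr, mse = ssr / k, sse / (n - k - 1)  # mean squares
print(ssr + sse, sst, msr / mse)       # SSR + SSE equals SST; last value is F
```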

## Multicollinearity

Multicollinearity is the term used to describe the case in which the intercorrelation of the predictor variables is high.

Signs of multicollinearity include

1. High correlation between pairs of predictor variables.
2. Regression coefficients whose signs or magnitudes do not make good physical sense.
3. Statistically non-significant regression coefficients on important predictors.
4. Extreme sensitivity of sign or magnitude of regression coefficients to insertion or deletion of a predictor variable.
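A common numeric check related to sign 1 is the variance inflation factor (VIF); this sketch regresses each predictor on the others (hypothetical data, with the third column built to be nearly collinear with the first):

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column: 1 / (1 - R_j^2), where
    R_j^2 is from regressing column j on the remaining columns."""
    n, p = X.shape
    factors = []
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Hypothetical data: the third column is almost a copy of the first.
rng = np.random.default_rng(4)
a = rng.normal(size=100)
b = rng.normal(size=100)
X = np.column_stack([a, b, a + 0.05 * rng.normal(size=100)])
print(vif(X))  # first and third factors are large; the second stays near 1
```

Large factors on the first and third columns flag the near-duplication, while the independent second column is unaffected.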

### Global Test

The global test is used to test the ability of the independent variables, taken together, to explain the behavior of the dependent variable $y$.
First, we state the null and alternative hypotheses of the test: the null hypothesis is that all the regression coefficients are zero, and the alternative hypothesis is that at least one of the regression coefficients is not zero.
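A sketch of carrying out this test, assuming SciPy is available for the $F$ distribution: the test statistic is the ratio of the regression and error mean squares, referred to the $F$ distribution with $k$ and $n-k-1$ degrees of freedom (hypothetical data):

```python
import numpy as np
from scipy import stats

# Hypothetical data in which y genuinely depends on the predictors.
rng = np.random.default_rng(7)
n, k = 60, 3
X = rng.normal(size=(n, k))
y = 1.0 + 1.5 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(size=n)

A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ coef

sse = np.sum((y - y_hat) ** 2)
ssr = np.sum((y_hat - y.mean()) ** 2)
f_stat = (ssr / k) / (sse / (n - k - 1))
p_value = stats.f.sf(f_stat, k, n - k - 1)  # P(F > f_stat) under H0
print(f_stat, p_value)  # a small p-value rejects H0: all coefficients are zero
```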

## Multiple Logistic Regression

Logistic regression is a type of regression analysis used to predict the outcome of a categorical variable. It is used to test whether the independent variables are associated with the outcome more than would be expected by chance. Logistic regression analysis is similar to linear regression analysis except that the outcome is dichotomous; multiple logistic regression applies when there is a single dichotomous outcome and more than one independent variable.
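A minimal sketch of fitting such a model by maximizing the log-likelihood with plain gradient ascent (hypothetical data; a real analysis would typically use a statistics library's fitting routine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.5, steps=5000):
    """Maximum-likelihood logistic regression via plain gradient ascent."""
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(steps):
        p = sigmoid(A @ beta)
        beta += lr * A.T @ (y - p) / len(y)  # gradient of the log-likelihood
    return beta

# Hypothetical dichotomous outcome driven by two explanatory variables.
rng = np.random.default_rng(5)
X = rng.normal(size=(200, 2))
p_true = sigmoid(0.5 + 2.0 * X[:, 0] - 1.0 * X[:, 1])
y = (rng.random(200) < p_true).astype(float)

beta = fit_logistic(X, y)
print(beta)  # estimates of (intercept, b1, b2)
```

The fitted probability of the outcome is the sigmoid of the linear combination of the explanatory variables, which is exactly the "linear combination" structure described above.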

### Multiple Regression Analysis:

Multiple regression analysis allows us to explicitly control for many other factors that simultaneously affect the dependent variable. The objective of regression analysis is to model the relationship between a dependent variable and one or more independent variables. Let $k$ represent the number of independent variables, denoted by $x_1, x_2, x_3, \dots, x_k$. Such an equation is useful for predicting the value of $y$ when the values of the $x$'s are known.

### Multiple Regression Model:

Multiple regression analysis is the study of how a dependent variable is related to two or more independent variables. Generally, we use $p$ to denote the number of independent variables. The equation that describes how the dependent variable $y$ is related to the independent variables $x_1, x_2, x_3, \dots, x_p$ and an error term is called the multiple regression model. In this model, $y$ is a linear function of $x_1, x_2, x_3, \dots, x_p$ plus the error term $\varepsilon$:

$y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \varepsilon$,

where $\beta_0, \beta_1, \dots, \beta_p$ are the parameters and the error term $\varepsilon$ is a random variable.

## Multiple Regression Assumptions

When we calculate a regression equation, we are attempting to use the independent variables to predict what the dependent variable will be. In the process of calculating the regression equation, we assume that certain conditions exist with regard to the data we are using. The properties of least squares estimators and the statistical analysis are based on the following assumptions:

### 1. Assumption about the form of the model

The equation that describes how the dependent variable $y$ is related to the independent variables $x_1, x_2, x_3, \dots, x_p$ is assumed to be linear in the regression parameters $\beta_0, \beta_1, \dots, \beta_p$:

$y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \varepsilon$.

### 2. Assumption about the errors:

The errors are assumed to be independent and identically distributed normal random variables each with mean zero and common variance.

### 3. Assumptions about the predictor variables

There are three assumptions for the predictor variables:
1. The predictor variables $x_1, x_2, x_3, \dots, x_p$ are non-random, and their observed values $x_{ij}$, $i = 1, 2, \dots, n$, $j = 1, 2, \dots, p$, are assumed fixed in advance.
2. The values $x_{ij}$ are measured without error.
3. The predictor variables $x_1, x_2, x_3, \dots, x_p$ are assumed to be linearly independent of each other.

### 4. Assumption about the observations

All observations are equally reliable and play an approximately equal role in determining the regression results.