How to do regression with non-numeric data in Excel
Regression analysis is a statistical method for estimating the relationship between a dependent variable and one or more independent variables. Typically, the variables are numeric. However, it is also possible to do regression analysis with non-numeric data, provided the non-numeric values are first coded as numbers.
Non-numeric data is usually categorical: data that falls into groups, such as gender, age group, or product type. Regression analysis with categorical data can be used to predict future behavior, such as which products a customer is likely to buy or which customers are likely to churn.
Excel offers two tools for analyzing relationships that involve non-numeric data:
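Using the Regression tool:
- Recode each categorical variable as 0/1 (dummy) columns, one column for every category except one. The omitted category is the baseline. The tool rejects input ranges that contain text.
- Click the Data tab.
- In the Analysis group, click Data Analysis. If the button is missing, enable the Analysis ToolPak add-in first.
- In the Data Analysis dialog box, select Regression and click OK.
- In the Regression dialog box, specify the range of the dependent variable (y) values in the Input Y Range box and the range of the independent variables (x), including the dummy columns, in the Input X Range box. The x columns must be adjacent to each other.
- Click OK.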
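Because a regression needs numeric inputs, a categorical column is typically recoded as 0/1 dummy columns before it is fitted. The following Python sketch illustrates that idea outside Excel. The data, the choice of "A" as the baseline category, and the helper names dummy_code, solve, and ols are all hypothetical, made up for illustration. It is not a description of how Excel computes its results.

```python
def dummy_code(values, reference):
    """Turn category labels into 0/1 columns, omitting the reference level."""
    levels = sorted(set(values) - {reference})
    rows = [[1.0 if v == level else 0.0 for level in levels] for v in values]
    return levels, rows


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(x_rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + row for row in x_rows]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: product type (categorical) and units sold (numeric).
product = ["A", "A", "B", "B", "C", "C"]
units = [10.0, 12.0, 20.0, 22.0, 15.0, 17.0]

levels, dummies = dummy_code(product, reference="A")
coefs = ols(dummies, units)

print(levels)  # dummy columns, in order
print(coefs)   # intercept, then one coefficient per dummy column
```

With a single categorical predictor, the intercept is the mean of the baseline category and each coefficient is the difference between that category's mean and the baseline mean.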
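Using the CHISQ.TEST function:
CHISQ.TEST does not fit a regression. It tests whether two categorical variables are independent, by comparing a table of observed counts with a table of expected counts of the same size.
Select the cell where you want to put the p-value.
Enter the following formula:
=CHISQ.TEST(actual_range, expected_range)
where:
- actual_range is the range of observed counts, one cell for each combination of categories.
- expected_range is the range of counts expected if the two variables were unrelated. For each cell, this is the row total times the column total, divided by the grand total.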
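A chi-squared test of independence compares observed counts with the counts expected if the two variables were unrelated. The Python sketch below shows how those expected counts and the test statistic can be computed. The 2x2 table of counts is hypothetical, the function names are made up for illustration, and the p-value helper is valid only for one degree of freedom (a 2x2 table).

```python
import math


def expected_counts(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))


def p_value_df1(statistic):
    """Upper-tail p-value of a chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical counts: rows are two customer segments, columns are churned / stayed.
observed = [[30, 10],
            [20, 40]]

expected = expected_counts(observed)
stat = chi_square_statistic(observed, expected)
p = p_value_df1(stat)

print(expected)
print(stat, p)
```

In a worksheet, the observed table would go in one range and the expected table in a second range of the same shape.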
The Regression tool in the Data Analysis dialog box can be used to do regression analysis with multiple independent variables, as long as every categorical variable is coded as 0/1 columns.
The CHISQ.TEST function can be used to test whether two categorical variables are related. A small p-value (commonly below 0.05) suggests that they are not independent.