07Jan2022

Why Stata drops variables from a regression

You need to use the ibn. factor-variable operator.

How do I keep all levels of my categorical variable in my model? How do I specify a cell means model?

Title: Keeping all levels of a variable in the model
Author: Kenneth Higbee, StataCorp

In the following example, we use regress as our estimation command, but the same thing applies to other estimation commands that have a noconstant option.
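Why does Stata drop a variable at all? When a model includes a constant and one indicator for every level of a categorical variable, the indicators sum to 1 in every row. That sum exactly reproduces the constant column, so the design matrix is rank-deficient and one column has to go. The Python sketch below is a hypothetical illustration, not Stata's own code. It builds such a design for three groups and computes its rank with exact fractions. In Stata itself, the cell means model combines the ibn. operator with the noconstant option, for example regress y ibn.group, noconstant, where y and group are placeholder names.

```python
from fractions import Fraction

def design_matrix(groups, levels, constant=True):
    """Rows of [1, d_1, ..., d_k]: optional constant plus one 0/1 indicator per level."""
    rows = []
    for g in groups:
        row = [1] if constant else []
        row += [1 if g == level else 0 for level in levels]
        rows.append(row)
    return rows

def rank(matrix):
    """Matrix rank by Gaussian elimination over exact fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    r = 0
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate column c from every other row.
        for i in range(n_rows):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == n_rows:
            break
    return r

groups = [1, 1, 2, 2, 3, 3]  # hypothetical group labels
levels = [1, 2, 3]

full = design_matrix(groups, levels, constant=True)       # constant + 3 indicators
no_const = design_matrix(groups, levels, constant=False)  # 3 indicators only

print(rank(full), len(full[0]))          # 3 4  (one column is redundant)
print(rank(no_const), len(no_const[0]))  # 3 3  (full column rank)
```

Dropping the constant, which is what noconstant does, leaves three linearly independent indicator columns. Because these columns never overlap, the least-squares coefficient on each one is simply that group's mean, which is the cell means model.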



It is generally very convenient to use dummy coding, but it is not the only kind of coding that can be used. As you have seen, when you use dummy coding, one of the groups becomes the reference group and all of the other groups are compared to that group.
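As a minimal sketch of dummy coding, the hypothetical function dummy_code below turns a group label into 0/1 indicators for every level except a chosen reference level (the base parameter). Rows of the reference group are recognizable because all of their indicators are zero. In a model with a constant, each indicator's coefficient therefore measures that group's difference from the reference group.

```python
def dummy_code(group, levels, base):
    """Return 0/1 indicators for every level except the reference (base) level."""
    return [1 if group == level else 0 for level in levels if level != base]

levels = [1, 2, 3]
for g in levels:
    print(g, dummy_code(g, levels, base=1))
# 1 [0, 0]   reference group: all indicators are zero
# 2 [1, 0]
# 3 [0, 1]
```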

Comparing every group with a single reference group may not be the most interesting set of comparisons. Say you want to compare group 1 with groups 2 and 3 combined, and, as a second comparison, group 2 with group 3. You need a coding scheme that forms these two comparisons. We will illustrate this using a Stata program, xi3, an enhanced version of xi that creates the variables you need for such comparisons as well as for a variety of other common comparisons. The comparisons described here (group 1 versus groups 2 and 3, then group 2 versus group 3) correspond to Helmert comparisons (see Chapter 5 for more details).

We use the h. prefix to request Helmert coding; otherwise, xi3 works much like the xi command. Both of these comparisons are significant, indicating that group 1 differs significantly from groups 2 and 3 combined, and that group 2 differs significantly from group 3.
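The two Helmert comparisons for three groups can be written as contrast weights on the group means. The weights (1, -1/2, -1/2) compare group 1 with the average of groups 2 and 3, and the weights (0, 1, -1) compare group 2 with group 3. The sketch below uses hypothetical group means to compute both comparisons. Each weight row sums to zero and the two rows are orthogonal. This shows only which quantities are being compared. xi3's actual coding may scale its variables differently, so its coefficients need not equal these values directly.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# Helmert contrast weights for three groups; each row sums to zero.
helmert = [
    [1, -HALF, -HALF],  # group 1 vs. mean of groups 2 and 3
    [0, 1, -1],         # group 2 vs. group 3
]

def contrast(weights, means):
    """Weighted sum of group means for one comparison."""
    return sum(w * m for w, m in zip(weights, means))

means = [Fraction(30), Fraction(20), Fraction(16)]  # hypothetical group means

for weights in helmert:
    print(contrast(weights, means))
# 12  (30 - (20 + 16) / 2)
# 4   (20 - 16)
```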

Using the coding scheme provided by xi3, we were able to form tests that may be more interesting than those provided by dummy coding. The xi3 program can create variables according to other coding schemes, as well as custom coding schemes that you define (see help xi3 and Chapter 5 for more information).

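As a result, cell3 is the reference cell, and the constant is the predicted value for this cell. The coefficient for the school-type indicator (year round versus non-year round) is the difference between cell3 and cell6. Since this model has only main effects, it is also the difference between cell2 and cell5, or between cell1 and cell4.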

The coefficient for mealcat group 1 is the predicted difference between cell1 and cell3. Since this model has only main effects, it is also the predicted difference between cell4 and cell6. Note that if you computed the predicted values for each cell, they would not exactly match the means in the six cells. The predicted means would be close to the observed means, but not identical. This is because a main-effects model assumes that the difference between cell1 and cell4 is exactly the same as the difference between cell2 and cell5, which in turn equals the difference between cell3 and cell6.
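To see why a main-effects model cannot reproduce all six cell means, here is a sketch for a balanced design. It assumes equal numbers of observations per cell, which gives the least-squares fit a closed form: each fitted cell is the row mean plus the column mean minus the grand mean. The cell means are hypothetical. Row 0 holds cells 1 to 3 and row 1 holds cells 4 to 6, and the columns are the three mealcat groups.

```python
def additive_fit(cells):
    """Main-effects (additive) fit for a balanced two-way table of cell means.

    Assumes equal cell sizes, so least squares gives
    fitted[i][j] = row_mean[i] + col_mean[j] - grand_mean.
    """
    n_rows, n_cols = len(cells), len(cells[0])
    grand = sum(sum(row) for row in cells) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in cells]
    col_means = [sum(cells[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    return [[row_means[i] + col_means[j] - grand for j in range(n_cols)]
            for i in range(n_rows)]

# Hypothetical observed cell means: row 0 = cells 1-3, row 1 = cells 4-6.
observed = [
    [800.0, 700.0, 600.0],
    [710.0, 620.0, 560.0],
]
fitted = additive_fit(observed)

# The fitted row difference is identical in every column...
print([fitted[0][j] - fitted[1][j] for j in range(3)])      # [70.0, 70.0, 70.0]
# ...while the observed differences are not.
print([observed[0][j] - observed[1][j] for j in range(3)])  # [90.0, 80.0, 40.0]
```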

Note that the anova command gives us the same information as the xi: regress command followed by the test command, because anova reports the results of those tests automatically. If we like, we can also request the parameter estimates afterward. However, the anova command is rigid about which group is omitted: it always drops the last group. Since this differs from the coding we used in the regress commands above, the parameter estimates from this anova command will differ from those of the regress command.
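The difference between the regress and anova parameter estimates comes only from the choice of omitted group. Here is a sketch with a hypothetical one-way layout. It computes the constant and dummy coefficients twice, once with the first group as the base and once with the last group, and confirms that both parameterizations give the same fitted group means. It relies on the fact that a model with a constant plus k-1 dummies reproduces the group means exactly. The helper names are hypothetical.

```python
def coefficients(means, base):
    """Constant = base-group mean; each other level gets (its mean - base mean)."""
    constant = means[base]
    return constant, {g: m - constant for g, m in means.items() if g != base}

def fitted_means(constant, slopes, levels):
    """Predicted mean for each level: constant plus that level's dummy coefficient."""
    return {g: constant + slopes.get(g, 0) for g in levels}

means = {1: 30.0, 2: 20.0, 3: 16.0}  # hypothetical group means
levels = sorted(means)

first = coefficients(means, base=levels[0])  # first group omitted
last = coefficients(means, base=levels[-1])  # last group omitted, as anova does

print(first)  # (30.0, {2: -10.0, 3: -14.0})
print(last)   # (16.0, {1: 14.0, 2: 4.0})
print(fitted_means(*first, levels) == fitted_means(*last, levels))  # True
```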

In summary, these results indicate that the difference between year round and non-year round schools is significant, and that the differences among the three mealcat groups are significant. When using xi, it is easy to include an interaction term, as shown below. We can test the overall interaction with the test command. This interaction effect is not significant.

It is important to note how the meaning of the coefficients changes in the presence of these interaction terms. The presence of an interaction would imply that the difference between year round and non-year round schools depends on the level of mealcat. Below we have shown the predicted values for the six cells in terms of the coefficients in the model.
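The cell-by-cell predictions of the interaction model can be sketched as follows. The sketch assumes that cells 1 to 3 are non-year round schools in mealcat groups 1 to 3, that cells 4 to 6 are year round schools, and that cell3 is the reference cell. The coefficient keys cons, yr, m1, m2, yr_m1 and yr_m2 are hypothetical labels, not the variable names xi creates, and the values are made up. The point is that the year round difference is no longer a single coefficient.

```python
def cell_predictions(b):
    """Predicted value for each of the six cells of a 2 x 3 model with interaction.

    Assumed layout (hypothetical labels): cells 1-3 are non-year round schools
    in mealcat groups 1-3, cells 4-6 are year round schools, cell3 is the reference.
    """
    return {
        "cell1": b["cons"] + b["m1"],
        "cell2": b["cons"] + b["m2"],
        "cell3": b["cons"],
        "cell4": b["cons"] + b["yr"] + b["m1"] + b["yr_m1"],
        "cell5": b["cons"] + b["yr"] + b["m2"] + b["yr_m2"],
        "cell6": b["cons"] + b["yr"],
    }

# Hypothetical coefficients.
b = {"cons": 600.0, "yr": -40.0, "m1": 200.0, "m2": 100.0, "yr_m1": -30.0, "yr_m2": -10.0}
p = cell_predictions(b)

# The year round difference now depends on the mealcat group.
print(p["cell4"] - p["cell1"], p["cell5"] - p["cell2"], p["cell6"] - p["cell3"])
# -70.0 -50.0 -40.0
```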

It can be very tricky to interpret these interaction terms if you wish to form specific comparisons. Constructing these interactions can be somewhat easier with the anova command. As you see below, the anova command gives us the tests of the overall main effects and interactions without the need for subsequent test commands. It is also easy to perform tests of simple main effects using the sme command. You can download sme from within Stata by typing search sme (see "How can I use the search command to search for programs and get additional help?").
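A simple main effect is the effect of one factor at a single level of the other factor. The sketch below computes only the point estimates behind the tests that sme performs, not the tests themselves. It takes hypothetical cell means for the same 2 x 3 layout and returns the year round minus non-year round difference within each mealcat group. The function name simple_effects is hypothetical.

```python
def simple_effects(cells):
    """Row 1 minus row 0 within each column of a 2 x k table of cell means."""
    return [cells[1][j] - cells[0][j] for j in range(len(cells[0]))]

# Hypothetical cell means: row 0 = non-year round, row 1 = year round;
# columns are mealcat groups 1, 2, 3.
cells = [
    [800.0, 700.0, 600.0],
    [730.0, 650.0, 560.0],
]
print(simple_effects(cells))  # [-70.0, -50.0, -40.0]
```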

Although this section has focused on how to handle analyses involving interactions, these particular results show no indication of interaction. Having found the interactions to be non-significant, we could decide to omit interaction terms from future analyses. This would simplify those analyses; however, including the interaction term can be useful to assure readers that it is non-significant.

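Say that we wish to analyze both continuous and categorical variables in one analysis. The coefficient for the continuous predictor is the predicted change in the outcome for each one-unit increase in that predictor; this is the slope of the lines shown in the graph above. The graph has two lines, one for the year round schools and one for the non-year round schools.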

As you can see in the graph, the top line sits a constant distance above the lower line. The intercept is where the upper line crosses the Y axis, when X is 0.

The lower line crosses the Y axis lower by that same distance. We can also run this analysis using the anova command. If we square the t-values from the regress command above, we find that they match the F-values from the anova command.
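The match between squared t-values and anova F-values holds for any test with one numerator degree of freedom. Below is a self-contained check of the simplest case, two groups with hypothetical data. The squared pooled two-sample t statistic equals the one-way ANOVA F statistic. The helper names pooled_t and oneway_f are hypothetical.

```python
from math import isclose

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    sp2 = ss / (na + nb - 2)  # pooled variance estimate
    return (ma - mb) / (sp2 * (1 / na + 1 / nb)) ** 0.5

def oneway_f(*groups):
    """One-way ANOVA F statistic: between-group mean square over within-group mean square."""
    all_x = [x for g in groups for x in g]
    k, n = len(groups), len(all_x)
    grand = sum(all_x) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

a = [3.0, 5.0, 4.0, 6.0]   # hypothetical group 0
b = [7.0, 9.0, 8.0, 10.0]  # hypothetical group 1

t = pooled_t(a, b)
f = oneway_f(a, b)
print(isclose(t ** 2, f))  # True
```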

A model with only main effects assumes that the slope is the same for the two groups. Perhaps the slopes differ between these groups.
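Here is a sketch that makes the two assumptions concrete, using hypothetical coefficients. In the common-slope model y = b0 + b1*group + b2*x, the vertical gap between the two lines is b1 at every x. Adding an interaction term b3*group*x gives group 1 a slope of b2 + b3, so the gap changes with x. The function names are hypothetical.

```python
def common_slope(x, group, b0, b1, b2):
    """Two parallel lines: intercepts b0 and b0 + b1, shared slope b2."""
    return b0 + b1 * group + b2 * x

def separate_slopes(x, group, b0, b1, b2, b3):
    """Interaction model: slope b2 for group 0 and b2 + b3 for group 1."""
    return b0 + b1 * group + b2 * x + b3 * group * x

xs = [0.0, 10.0, 20.0]
main = (680.0, -160.0, 1.5)  # hypothetical b0, b1, b2
inter = main + (6.0,)        # hypothetical b3 appended

# Vertical gap between the group 0 and group 1 lines at each x.
gap_parallel = [common_slope(x, 0, *main) - common_slope(x, 1, *main) for x in xs]
gap_interact = [separate_slopes(x, 0, *inter) - separate_slopes(x, 1, *inter) for x in xs]
print(gap_parallel)  # [160.0, 160.0, 160.0]
print(gap_interact)  # [160.0, 100.0, 40.0]
```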

Note that the regression line looks much steeper for the year round schools than for the non-year round schools. This is confirmed by the regression equations, which show a higher slope for the year round schools. Indeed, the yrXsome interaction effect is significant. We can make a graph of the regression lines for the two types of schools to show how different they are. We first create the predicted value, which we call yhata. Then we create separate variables for the two types of schools: yhata0 for non-year round schools and yhata1 for year round schools.
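Splitting one prediction variable into yhata0 and yhata1 can be sketched in Python as follows. Each new list keeps the prediction for rows of its own school type and a missing value (None) everywhere else, so each list can be plotted as its own line. The data, the coefficients, and the helper split_by_group are hypothetical. In Stata, the prediction itself would come from predict after the regression.

```python
def split_by_group(values, groups, levels=(0, 1)):
    """Return {level: list} where each list keeps values for that level and None elsewhere."""
    return {lvl: [v if g == lvl else None for v, g in zip(values, groups)] for lvl in levels}

# Hypothetical data: x values and school type (0 = non-year round, 1 = year round).
x = [10.0, 20.0, 10.0, 20.0]
yr = [0, 0, 1, 1]
b0, b1, b2, b3 = 680.0, -160.0, 1.5, 6.0  # hypothetical interaction-model coefficients

# Predicted value from the interaction model for every row.
yhata = [b0 + b1 * g + b2 * xv + b3 * g * xv for xv, g in zip(x, yr)]
split = split_by_group(yhata, yr)
yhata0, yhata1 = split[0], split[1]
print(yhata0)  # [695.0, 710.0, None, None]
print(yhata1)  # [None, None, 595.0, 670.0]
```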

You can see how the two lines have quite different slopes, consistent with the fact that the yrXsome interaction was significant. If we had used l[. The options to make dashed and dotted lines are new to Stata 7, and you can find more information via help grsym. The graph above used the same kind of dots for the data points of both types of schools. We can also make the same graph but show the points differently for the two types of schools.

blaccumpthome1976's Ownd
