16.11 Choose variables using stepwise selection
<v Voiceover>Choosing</v> variables in a model is a very important process. A way that used to be very common, though today it is perhaps a bit old-fashioned, is stepwise selection. That is, you start with a very basic model and you have a cap of a very complex model, and you step through them, adding variables, taking away variables, and in the end, you pick the one with the best score, usually a low AIC.
So to do this, let's take a look again at the housing data. We want to model value per square foot, then pick the best predictors, either for prediction or inference. There's a function in R called step. It takes a very simple model and a complex model and works its way between them.
So first, let's build the simple model. That is simply going to be value per square foot on the intercept, nothing else. It's essentially just a straight average. Then we'll make the full model. This is the most complex we'll let it get. It's value per square foot, and we're going to do this on units plus square foot times boro plus boro times class. And that will be in housing. Okay, we have that.
Now we're going to call a function that will go in both directions. We will call the result house step. The function is step. The first argument is the null model. Then you provide the scope. The scope is a list, and you provide a lower bound and an upper bound. The lower bound is the null model, and the upper bound is the full model. And then you can start at the most complex and work your way down to the simple, start at the simple and work up to the complex, or go in both directions. Going in both directions might be a little more computationally intensive, but it's worth it, and computation isn't that hard to come by these days. So direction equals both.
Now we can run this, and it spits out a lot of information. That's it working its way through the various models it can fit. So let's go through here and find out what model it liked best. So these coefficients are the resulting chosen model.
This is what it deemed to be best. So it might be simple, and it might be brute force, but this is one way to choose variables in a model. It has its critics and it has its proponents, but it's ultimately up to you whether or not you use it.
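The steps described in the video can be sketched roughly as follows. This is a minimal sketch, not the exact code from the recording: the data frame here is simulated so the example runs on its own, and the column names (`ValuePerSqFt`, `Units`, `SqFt`, `Boro`, `Class`) and object names (`nullModel`, `fullModel`, `houseStep`) are assumptions based on how they are spoken in the transcript.

```r
# Simulated stand-in for the housing data. The values are made up and the
# column names are assumed; the real data set is not reproduced here.
set.seed(42)
n <- 300
housing <- data.frame(
  Units = sample(10:300, n, replace = TRUE),
  Boro  = factor(sample(c("Bronx", "Brooklyn", "Manhattan"), n, replace = TRUE)),
  Class = factor(sample(c("R2", "R4", "R9"), n, replace = TRUE))
)
housing$SqFt <- housing$Units * round(runif(n, 600, 1200))
housing$ValuePerSqFt <- 60 + 40 * (housing$Boro == "Manhattan") +
  10 * (housing$Class == "R4") + rnorm(n, sd = 15)

# Null model: intercept only, which is just the average of the response.
nullModel <- lm(ValuePerSqFt ~ 1, data = housing)

# Full model: the most complex model the search is allowed to reach.
# In an R formula, a*b expands to a + b + a:b (main effects plus interaction).
fullModel <- lm(ValuePerSqFt ~ Units + SqFt*Boro + Boro*Class, data = housing)

# Start at the null model and step in both directions between the bounds.
# step() prints each candidate move and its AIC as it goes.
houseStep <- step(nullModel,
                  scope = list(lower = nullModel, upper = fullModel),
                  direction = "both")

# Coefficients of the model that step() settled on.
coef(houseStep)

# Compare the starting model with the chosen one; lower AIC is better.
AIC(nullModel, houseStep)
```

Because the data are simulated, the terms that end up in `houseStep` say nothing about the real housing data; the point is only the shape of the `step` call, with a starting model, a `scope` list of lower and upper bounds, and a `direction`.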