17.3 Fit nonlinear least squares
Sometimes a linear model just won't do, and you need to fit a nonlinear least squares model. This can be for any number of reasons. One good example is locating a Wi-Fi hotspot. Let's look at some data that I put up on the website, at www.jaredlander.com/data/wifi.rdata. Let's load that in and take a look at it. We can see we have a number of x, y coordinates and a distance. These are measurements from around, say, an office building. The x, y coordinates are on the coordinate plane, and you know from the strength reading how far you are from that hotspot. You don't know exactly, because there's some random noise in there, but you have a good rough idea of how far away you are. Using this information, we can perform nonlinear least squares to figure out where that hotspot is.

So before we do that, let's visualize our data. So require ggplot2, and let's say ggplot, the wifi data, aes, x equals x, y equals y, and color equals distance. Plus geom_point, and then we're going to color-code these points with scale_color_gradient2. There are a few different types of gradients, and we're going to use the one that has a low, a middle and a high. The low will be blue, the middle will be white, and the high will be red. And we'll make the midpoint equal the mean of the distance. We can run this to see that our data looks something like this. Red points are farther away from the hotspot. Blue points are closer. So it seems like there's a concentration of blue points over here.

To pin this down, we need a proper formula for the distance. Luckily that's an old mathematical problem, seen here: the distance $d_i$ is equal to the square root of beta x minus $x_i$, squared, plus beta y minus $y_i$, squared, that is, $d_i = \sqrt{(\beta_x - x_i)^2 + (\beta_y - y_i)^2}$. This is just your traditional Pythagorean theorem. The $x_i$ and $y_i$ are the x, y coordinates of all the readings, and beta x and beta y are the unknown coordinates of the hotspot. It's our job to estimate what beta x and beta y are.
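The plotting step just described can be sketched roughly as follows. To keep the example self-contained it does not download the real file; it simulates a stand-in data frame instead. The column names `x`, `y` and `Distance`, the hotspot location `hx`, `hy` and the noise level are all assumptions made for illustration, so adjust them to whatever the loaded data actually contains.

```r
library(ggplot2)

# Simulated stand-in for the wifi data (assumption: columns x, y, Distance).
set.seed(42)
hx <- 20   # hypothetical true hotspot x coordinate
hy <- 50   # hypothetical true hotspot y coordinate
wifi <- data.frame(x = runif(200, 0, 100), y = runif(200, 0, 100))

# Distance to the hotspot plus random measurement noise.
wifi$Distance <- sqrt((hx - wifi$x)^2 + (hy - wifi$y)^2) + rnorm(200, sd = 5)

# Color each reading by distance: blue = near, white = average, red = far.
p <- ggplot(wifi, aes(x = x, y = y, color = Distance)) +
  geom_point() +
  scale_color_gradient2(low = "blue", mid = "white", high = "red",
                        midpoint = mean(wifi$Distance))
print(p)
```

With the real data you would swap the simulation for a `load()` of the downloaded file and keep the plotting call as is, provided the column names match.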
That's why this is a nonlinear least squares problem. The formula is not linear in the coefficients. They're stuck underneath the square root symbol, so you can't use a regular regression to do this.

So let's go back to R and start setting up a model. We'll call it wifimod1, gets nls, nonlinear least squares. It takes a formula, so it's distance, tilde, and we're going to put in that formula we just saw: square root of beta x minus x, squared, plus beta y minus y, squared, and close off the square root. The data is coming from wifi. But now we need to give it a starting point. The fit is done through an optimization algorithm, so it's sensitive to the starting values, and you need to give it some best guess. We're going to give it a best guess of being right in the middle of the grid, at 50, 50. So we'll say start is a list where beta x equals 50 and beta y also equals 50. We run this, it comes back with an answer, and we take a summary of wifimod1. We can see it gives us an estimate, plus standard errors, of where it thinks the hotspot is. It thinks it's at about 17, 52, somewhere around here.

Let's plot that on the map to see how we did. We're going to copy the ggplot code again, and this time we're going to add another point here. It'll be geom_point, data equals as.data.frame of the transpose of the coefficients of wifimod1, that would be these right here. And then we'll set another aesthetic: we want x equals beta x, remember that's the name of the coefficient here, and y equals beta y. And then we will say that size equals five, and let's break this onto another line. We will also say that color equals green, so that it will stand out. So let's run this now, and it fails: beta y wasn't found, because capitalization matters. Fix that, run it again, and we see now that the estimated hotspot point is right in the middle of all the blue points. This gives us a pretty good sense that we nailed it.
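Here is a minimal sketch of the fit and the overlay plot walked through above. It again runs on simulated stand-in data rather than the real file, so the estimates will not match the 17, 52 reported in the video. The names `wifiMod1`, `betaX`, `betaY`, `est` and the `Distance` column are assumptions chosen for illustration.

```r
library(ggplot2)

# Simulated stand-in for the wifi data (assumption: columns x, y, Distance).
set.seed(42)
hx <- 20   # hypothetical true hotspot x coordinate
hy <- 50   # hypothetical true hotspot y coordinate
wifi <- data.frame(x = runif(200, 0, 100), y = runif(200, 0, 100))
wifi$Distance <- sqrt((hx - wifi$x)^2 + (hy - wifi$y)^2) + rnorm(200, sd = 5)

# Fit the distance formula by nonlinear least squares.
# nls() needs starting values; we guess the middle of the grid.
wifiMod1 <- nls(Distance ~ sqrt((betaX - x)^2 + (betaY - y)^2),
                data = wifi,
                start = list(betaX = 50, betaY = 50))
print(summary(wifiMod1))  # estimates and standard errors for betaX, betaY

# One-row data frame with columns betaX and betaY for plotting.
est <- as.data.frame(t(coef(wifiMod1)))

# Same plot as before, plus the estimated hotspot as a large green point.
# Names are case sensitive: aes(y = betay) would fail with "object not found".
p <- ggplot(wifi, aes(x = x, y = y, color = Distance)) +
  geom_point() +
  scale_color_gradient2(low = "blue", mid = "white", high = "red",
                        midpoint = mean(wifi$Distance)) +
  geom_point(data = est, aes(x = betaX, y = betaY),
             size = 5, color = "green")
print(p)
```

Because the fit is an iterative optimization, a starting guess far from the truth can fail to converge or can land somewhere unhelpful, so it is worth trying a couple of different `start` values and checking that the estimates agree.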
Nonlinear least squares is a useful solution when the coefficients are stuck in a nonlinear situation and you can't use an ordinary regression.