6.8 Run checks on entire vectors - Video Tutorials & Practice Problems
Video duration: 5m
Instructor: Because R is a vectorized language, almost everything works better when done with vectors, and that includes if statements. Fortunately, there's a special function for this called ifelse, so we're going to take a look at that. For people coming from Excel, this should be very similar to the IF function in Excel, but with more flexibility and a little more ease of use.

So let's just start with something simple, like does one equal one? Now, ifelse is a single function that takes three arguments. The first argument is what you're checking. The second is what to return if the check is true. The third is what to return if the check is false. So we do ifelse one equals one; if that's true, yes; if it's false, no. We run this and we see it prints yes, because the check is true. Let's go ahead and repeat that but ask, is one equal to zero? That way we get a false check. So, ifelse one equals zero, yes, no. This prints no, as we expected.

Calling ifelse on a simple check like this is actually computationally wasteful. If you're just checking one number against one other number, or one statement against another statement, it's better to use an ordinary if statement. The ifelse function pays off when checking vectors.

So let's create a small vector to play around with and test this out. We will call it toTest and give it the values one, one, zero, one, zero, one, just so we can see how the check works. We run that, and now we can call ifelse. We say ifelse toTest equals one, say yes, otherwise no. Running this, we get a series of yeses and nos corresponding to the ones and zeros. It's really great that this does one-to-one matching, so each element gets checked and the appropriate response gets put in. It's a great way to check vectors.

Now, getting a little more complicated, let's do some math with this. We say ifelse toTest equals one, and if it's true, do toTest times three.
Otherwise, just bring back toTest. Running this, we see that where toTest is equal to one, we get three. Where toTest doesn't equal one, it returns whatever toTest was originally, which in this case was zero. This is a great way of inserting special values or doing special checks. If a case doesn't hold, leave the vector alone. If there's a special case, change the vector.

Now, we have just seen a situation where the yes portion of ifelse was a vector and the no portion was a vector. That sort of makes sense, since it's all vectorized. Let's do another test where the yes portion is a vector and the no portion is just a simple string. So we do ifelse toTest equals one. If it's true, do toTest times three. If it's false, just put the string zero, and see what we get. We get the threes where toTest was one. We get the zeros where it wasn't. Importantly, they all came back as characters, because ifelse returns a single vector, and since some of the entries were characters, they all had to be characters. The other important takeaway is that even though the no argument wasn't a full-length vector, it was repeated as necessary.

As a recurring theme, missing data is very important in statistics. So R has a special missing value, NA, that can sit in a vector. Let's put that into our vector and see how it affects our checks. We'll take toTest and make the second element NA. Simple as that. Run that. If we now print toTest again, we see it's one, NA, zero, one, zero, one. So now we call ifelse again: ifelse toTest equals one, return toTest times three, otherwise zero as a string. When we run this, it works everywhere, and where the vector was NA, it returns NA. That's very important. You put NA in, you get NA out. That's because they don't want people saying, oh, my test worked and I got some result, when in fact the data should be missing. You might be tempted to think that NA doesn't equal one.
So it should return the false value, which is zero. But that could be misleading, and it could make missing data disappear, which introduces a lot of bias into your results. So it's a very good thing that if an element of a vector is NA, the result of ifelse for that element will also be NA. That's a very important safety check. The ifelse function is a great way to check a vector element by element, and it allows a lot of power and control.
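
For anyone following along without the video, here is a minimal sketch of the calls described above, in the order they were run. The vector name toTest and its values come from the walkthrough. The exact strings "Yes", "No", and "Zero", and the result names labels, tripled, mixed, and withNA, are assumptions made for illustration, since the transcript only describes them in speech.

```r
# Scalar checks: ifelse(test, yes, no)
ifelse(1 == 1, "Yes", "No")
ifelse(1 == 0, "Yes", "No")

# For a single comparison, an ordinary if statement is the lighter tool
if (1 == 1) "Yes" else "No"

# A small vector to check element by element
toTest <- c(1, 1, 0, 1, 0, 1)
labels <- ifelse(toTest == 1, "Yes", "No")

# yes and no are both vectors: triple the ones, leave everything else alone
tripled <- ifelse(toTest == 1, toTest * 3, toTest)

# yes is a numeric vector, no is a single string (assumed to be "Zero"):
# the string is recycled and the whole result is coerced to character
mixed <- ifelse(toTest == 1, toTest * 3, "Zero")

# Make the second element missing and run the same check again
toTest[2] <- NA
withNA <- ifelse(toTest == 1, toTest * 3, "Zero")

print(labels)
print(tripled)
print(mixed)
print(withNA)
```

The NA in the last result comes from the check itself: comparing NA with one using == gives NA rather than FALSE, so ifelse has neither a true nor a false branch to pick for that element.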