8.3 Use filter to choose rows
Video duration: 3m
Video transcript
Voiceover: Subsetting data is a very important step in any data process. We could subset dia the base R way, using square brackets: dia[dia$cut == "Ideal", ]. That works just fine: it returns only the rows where cut equals Ideal, and since this is a tbl, it prints just the first few rows rather than everything. But it is a bit cumbersome; you need to spell out dia$cut, and things get a little ugly. So we are going to use dplyr's filter function to make this easier. We'll say dia %>% filter(cut == "Ideal"). We run this and get the same result, but we typed it a little differently.

Now let's say we want all the rows where cut is Ideal and color equals E. We have two options. We can write dia %>% filter(cut == "Ideal", color == "E"), or we can use the ampersand: dia %>% filter(cut == "Ideal" & color == "E"). Both options work: by default, filter combines separate comma-separated conditions with a logical AND, or we can write the & ourselves.

If we want an OR, such as cut equals Ideal or cut equals Premium, we need to use the vertical bar: dia %>% filter(cut == "Ideal" | cut == "Premium"). We could also use R's %in% operator: dia %>% filter(cut %in% c("Ideal", "Premium")). That gets us the same result, just with different syntax.

There may also be cases, for instance when you're writing code inside a function, where your conditional expression is passed in as a variable stored as a string. Fortunately, we can use the filter_ function for that. We say dia %>% filter_("cut == 'Ideal'"). Since the equality test compares against a string, we need single quotes around Ideal because it sits inside the double quotes. That works just as well as when we did it normally. We can also use formula notation to supply a quoted expression: dia %>% filter_(~cut == "Ideal").
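A minimal sketch of the base R versus dplyr comparison above, assuming dia is the diamonds tibble from ggplot2 (the transcript does not show how dia was created). It runs both subsets side by side so you can check that they pick out the same rows.

```r
library(dplyr)
library(ggplot2)  # assumption: `dia` is the diamonds data shipped with ggplot2

dia <- as_tibble(diamonds)

# Base R: a logical vector in the row position; the empty column position keeps all columns
base_ideal <- dia[dia$cut == "Ideal", ]

# dplyr: the condition refers to the column by name, no `dia$` prefix needed
dplyr_ideal <- dia %>% filter(cut == "Ideal")

# Both approaches select the same rows in the same order
all.equal(as.data.frame(base_ideal), as.data.frame(dplyr_ideal))
```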
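The AND and OR variants can be sketched the same way, again assuming dia is ggplot2's diamonds data. The checks at the end confirm that the comma and & forms return identical results, and that the | and %in% forms do too.

```r
library(dplyr)
library(ggplot2)  # assumption: `dia` is ggplot2's diamonds data

dia <- diamonds

# AND: comma-separated conditions are combined with a logical AND...
and_comma <- dia %>% filter(cut == "Ideal", color == "E")
# ...which is equivalent to joining them explicitly with &
and_amp <- dia %>% filter(cut == "Ideal" & color == "E")

# OR: use the vertical bar |
or_bar <- dia %>% filter(cut == "Ideal" | cut == "Premium")
# ...or, when testing one column against several values, %in%
or_in <- dia %>% filter(cut %in% c("Ideal", "Premium"))

stopifnot(
  identical(and_comma, and_amp),
  identical(or_bar, or_in)
)
```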
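The underscore verbs such as filter_ have been deprecated since dplyr 0.7.0 in favor of tidy evaluation. The sketch below shows one way to get the same string-driven behavior with current dplyr, using rlang's parse_expr() and the !! operator, plus quo() as a rough analogue of the formula form. The helper filter_by_string() is hypothetical, and dia is again assumed to be ggplot2's diamonds data.

```r
library(dplyr)
library(ggplot2)  # assumption: `dia` is ggplot2's diamonds data
library(rlang)

dia <- diamonds

# Hypothetical helper: the condition arrives as a string, e.g. from a function argument.
# Single quotes inside the double-quoted string keep the R string literal intact.
filter_by_string <- function(df, condition) {
  # parse_expr() turns the text into an expression; !! splices it into filter()
  df %>% filter(!!parse_expr(condition))
}

from_string <- filter_by_string(dia, "cut == 'Ideal'")

# Analogue of filter_(~cut == "Ideal"): capture the expression as a quosure with quo()
condition_quo <- quo(cut == "Ideal")
from_quo <- dia %>% filter(!!condition_quo)

# Both should match an ordinary, unquoted filter call
direct <- dia %>% filter(cut == "Ideal")
stopifnot(identical(from_string, direct), identical(from_quo, direct))
```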
The filter function provides a convenient way to subset data, and it is especially useful in long data workflows, much as the WHERE clause is in SQL.
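To illustrate the WHERE comparison, here is a sketch of filter sitting at the front of a longer pipeline, assuming dia is ggplot2's diamonds data. The comment gives a rough SQL equivalent; the choice of color E and average price is only for illustration.

```r
library(dplyr)
library(ggplot2)  # assumption: `dia` is ggplot2's diamonds data

dia <- diamonds

# Roughly: SELECT cut, AVG(price) AS avg_price FROM dia WHERE color = 'E' GROUP BY cut
avg_price_e <- dia %>%
  filter(color == "E") %>%            # keep only rows meeting the condition (the WHERE step)
  group_by(cut) %>%                   # one group per cut level
  summarise(avg_price = mean(price))  # one summary row per group

avg_price_e
```

Putting filter early in the chain means every later step works on the smaller subset, which is the same reason WHERE conditions are applied before grouping in SQL.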