8.2 Use select to choose columns
Video duration: 3m
With the diamonds data stored as a tbl in the dia variable, we're going to go ahead and select a few columns. We can do this the base R way using square bracket notation: we say dia, square bracket, leave the row argument blank, and then say c, then in quotes, carat, comma, price. That selects those two columns. And since dia is stored as a tbl, it only prints out the first 10 rows. So even though we're using base square bracket notation, it's still a tbl, and it prints intelligently.

This way works, but it can be a little cumbersome. So we can use one of dplyr's built-in functions, select, to make this easier to work with. At first it might seem more verbose, but once you're in a long data flow it's actually quite nice to have. So we say dia, and we pipe that into the select function. dplyr makes great use of the magrittr package for piping arguments into functions, and it's a really nice feature. Then, inside select, we type the columns we want without quotes: carat comma price. That gets us the same result as before.

But now let's say that, perhaps because we're writing this inside a function, we need to specify the column names as quoted character strings. We need carat and price to be quoted. In that case we pipe dia into select underscore, select_. This allows us to pass quoted expressions, so we can say, in quotes, carat and price, and we get the same result again.

There might also be cases where you don't have the column names, either bare or quoted, and you need to use column indices. For that we can do dia pipe select 1 comma 7. That gets us the first and seventh columns, which are carat and price. Perhaps we want everything except the first and seventh columns. Then we can do dia pipe select, negative, and I'm going to make it a vector of 1, 7 so that I only type the negative once. And here we get every column that is not 1 or 7.
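The steps above can be sketched in R roughly as follows. This is a minimal sketch that assumes dia was built from ggplot2's diamonds data with as_tibble() (the transcript does not show that step) and that dplyr 1.0 or later is installed. Because select_() has been deprecated since dplyr 0.7.0, the quoted-names step uses select(all_of(...)) instead. That substitution is ours, not the video's.

```r
library(dplyr)    # select(), the %>% pipe, all_of(), as_tibble()
library(ggplot2)  # provides the diamonds data set

# Assumption: dia was created roughly like this in an earlier step
dia <- as_tibble(diamonds)

# Base R: leave the row argument blank, pass a character vector of names
base_cols <- dia[, c("carat", "price")]

# dplyr: bare (unquoted) column names
named_cols <- dia %>% select(carat, price)

# Column names held as strings, e.g. inside a function.
# select_() is deprecated in current dplyr; all_of() takes a character vector.
wanted <- c("carat", "price")
quoted_cols <- dia %>% select(all_of(wanted))

# By position: columns 1 and 7 of diamonds are carat and price
index_cols <- dia %>% select(1, 7)

# Everything except columns 1 and 7; the minus sign negates the whole vector
drop_index <- dia %>% select(-c(1, 7))

# A tibble with many rows prints only its first 10
print(named_cols)
```

All four selections of carat and price return the same two-column tibble; the last one keeps the remaining eight columns in their original order.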
But maybe we don't want to specify the indices of the columns we don't want; we want to specify their names. So we could say dia, pipe, select, negative carat, negative price, and that gives us every column except carat and price. The select function, which can be very reminiscent of SQL code, makes for a nice, easy experience when selecting columns out of data frames or tbls.
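Dropping columns by name can be sketched the same way, under the same assumption that dia comes from ggplot2's diamonds via as_tibble(). The second form negates a vector of bare names once, mirroring the c(1, 7) trick used with indices.

```r
library(dplyr)    # select(), the %>% pipe, as_tibble()
library(ggplot2)  # provides the diamonds data set

# Assumption: dia was created roughly like this in an earlier step
dia <- as_tibble(diamonds)

# Each minus sign removes one named column
no_carat_price <- dia %>% select(-carat, -price)

# Equivalent: negate a vector of bare names once
no_carat_price2 <- dia %>% select(-c(carat, price))

print(names(no_carat_price))
```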