7.6 Combine datasets - Video Tutorials & Practice Problems
Video duration: 3m
<v Voiceover>There are times</v> when you don't need all this complex data manipulation, and you just need to stack a few vectors together, either as columns or as rows, to make a new matrix or dataframe. That's very simply done with cbind. So let's create a vector of sports, let's say hockey, baseball and football. And let's say what leagues those are: the NHL, MLB and NFL. Notice there's a correspondence: NHL to hockey, MLB to baseball, NFL to football. The order does matter. And for the trophies, those are the Stanley Cup, the Commissioner's Trophy, and the Vince Lombardi Trophy. So if we want to combine these into a matrix where the first column is sport, the second is league, and the third is trophy, we say sports1 gets cbind of sport, league and trophy. And if we look at that, we see we have this matrix with columns sport, league and trophy.
Now, let's say we create a dataframe with a few more sports in it, basketball and golf, and their leagues and championships. This time, we'll create it straight as a dataframe, not going to do any cbinding. So we'll say sports2 gets data.frame, and this time, we will give the names right in the call. Notice before that cbind automatically picked up the names of the variables. With data.frame, since we're not building it from pre-existing variables, we have to name the columns along the way. And to do what we're about to do in a minute, the dataframe needs to have the same column names as the matrix does. So we will say sport equals basketball and golf. Put this on the next line so it's easier to see. The league is NBA and PGA. And the trophy is the Larry O'Brien Championship Trophy and the Wanamaker Trophy. So let's run this dataframe, and take a look at it. We see it's similar to the matrix we created earlier: three columns for sport, league and trophy.
Now let's say we want to combine this matrix and this dataframe into one longer dataframe.
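The steps narrated so far can be sketched in R roughly as follows. The object names (sport, league, trophy, sports1, sports2) come from the narration; the exact spelling of the values and the use of stringsAsFactors = FALSE are assumptions, since the transcript does not show the code on screen.

```r
# Three parallel vectors; position i in each describes the same sport.
sport  <- c("Hockey", "Baseball", "Football")
league <- c("NHL", "MLB", "NFL")
trophy <- c("Stanley Cup", "Commissioner's Trophy", "Vince Lombardi Trophy")

# cbind() stacks the vectors side by side as columns of a character matrix.
# The column names are picked up from the variable names.
sports1 <- cbind(sport, league, trophy)
sports1

# Build the second object directly as a data frame, naming each column
# in the call so the names match the matrix columns above.
# stringsAsFactors = FALSE is an assumption to keep the columns as character.
sports2 <- data.frame(
  sport  = c("Basketball", "Golf"),
  league = c("NBA", "PGA"),
  trophy = c("Larry O'Brien Championship Trophy", "Wanamaker Trophy"),
  stringsAsFactors = FALSE
)
sports2
```

Here sports1 is a matrix while sports2 is a data frame, which is the difference the next step relies on.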
We can do that by saying sports gets rbind this time, so we're going to bind together rows, and we can say sports1, comma, sports2. As long as they both have the same column names, it doesn't matter that one is a matrix and the other is a dataframe: we get our nice, combined dataframe showing sport, league and trophy. Easy as can be. So cbind is for when you want to take vectors, or even matrices, and stack them next to each other, side by side, as columns in a new object. And rbind is for when you want to take vectors, dataframes or matrices and stack them vertically, one underneath the other, to make a longer dataframe, or matrix, as the case may be.
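A minimal, self-contained sketch of the rbind step, under the same assumptions as before (value spellings and stringsAsFactors = FALSE are illustrative, not taken from the video):

```r
# Rebuild the two inputs: a character matrix and a data frame
# that share the column names sport, league and trophy.
sport  <- c("Hockey", "Baseball", "Football")
league <- c("NHL", "MLB", "NFL")
trophy <- c("Stanley Cup", "Commissioner's Trophy", "Vince Lombardi Trophy")
sports1 <- cbind(sport, league, trophy)

sports2 <- data.frame(
  sport  = c("Basketball", "Golf"),
  league = c("NBA", "PGA"),
  trophy = c("Larry O'Brien Championship Trophy", "Wanamaker Trophy"),
  stringsAsFactors = FALSE
)

# rbind() stacks the rows of sports1 on top of the rows of sports2.
# Because one argument is a data frame, the result is a data frame.
sports <- rbind(sports1, sports2)
sports
```

The result has five rows and three columns, with the three rows from the matrix first and the two rows from the data frame after them.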