3.2 Create and access information in lists
Video duration: 10m
Video transcript
Another powerful data storage type in R is the list. It is essentially a container object that can hold any number of arbitrary objects. That is, one element could be a single number, another could be an entire data frame, and yet another could be a vector, a matrix, or even a list inside a list. Lists can hold vast amounts of information and are incredibly convenient.
So let's start by creating a simple one: list one gets list. You create lists using the list function, where each argument to the function becomes another element of the list. We'll just put in one, two, and three, and if we look at this we see that we have a three-element list. Here's one element, here's a second, and here's a third. Each of these elements holds a vector of length one: the first shows one, the second shows two, and the third shows three. Very simple.
Now, a lot of times when people do this for the first time, they expect to get a single-element list with a vector inside it. To do that, we would do list two gets list and make the entire first argument of list the vector we want to store. Doing that, we get a single-element list that holds a three-element vector.
It is also possible to store different types of objects. So we'll create list three and give it a list. The first element will be the vector one, two, and three. The second element will be the vector three through seven. If we run this, we see we have two elements: the first is a vector with three elements, and the second is a vector with five elements. Lists can hold even bigger amounts of information, so let's store a data frame in there.
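The list-creation steps described so far can be sketched roughly as follows. The variable names `list1`, `list2`, and `list3` are assumptions based on the spoken names "list one", "list two", and "list three"; the exact names typed on screen are not shown in the transcript.

```r
# Each argument to list() becomes its own element:
# three arguments give a three-element list of length-one vectors.
list1 <- list(1, 2, 3)
length(list1)        # 3

# Wrapping the values in c() passes ONE argument,
# so the result is a one-element list holding a three-element vector.
list2 <- list(c(1, 2, 3))
length(list2)        # 1
length(list2[[1]])   # 3

# Two vectors of different lengths stored as separate elements.
list3 <- list(c(1, 2, 3), 3:7)
length(list3)        # 2
length(list3[[2]])   # 5
```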
So let's first create the data frame. We'll call it the DF and recreate one from another section using the data frame function. We'll make first equal one through five and second equal five through one, and we'll add sport, which should be capitalized, as a shorter version: hockey, lacrosse, football, curling, and tennis, each of which needs to be in quotes. We will also give it that last special argument, strings as factors equals false. Now let's break this up over a few lines so we can see everything that's happening. These four lines create the data frame, and just to confirm, we will look at it: we now have a five-row, three-column data frame.
We will now create yet another list, list four, consisting of the data frame and the vector one through 10. Looking at that, the first element is a five-by-three data frame and the second element is a ten-element vector.
Remember, lists can be recursive. Let's create list five, and this gets list: we'll put in the data frame, one through 10, and list three, which we created earlier. Looking at this, we now have a list whose first element is a data frame, whose second is a vector, and whose third is a list with two elements of its own, the first of which is a three-element vector and the second of which is a five-element vector. So there are lots of things we can do with lists.
Now, when working with a list, we want to be aware of what its names are. So let's look at list five and check out its names. We call names, just like we did with data frames, on list five, and we see it says null. That's because we never assigned names during the creation process, so let's assign names right now.
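A minimal sketch of the data frame and the two lists built from it might look like this. The object name `theDF` and the column names `First`, `Second`, and `Sport` are assumptions inferred from the narration, as is the capitalization of the sport names; `list3` is rebuilt here only so the block runs on its own.

```r
# Data frame spread over four argument lines, as described in the narration.
# Names and capitalization are assumed, not confirmed by the transcript.
theDF <- data.frame(
  First = 1:5,
  Second = 5:1,
  Sport = c("Hockey", "Lacrosse", "Football", "Curling", "Tennis"),
  stringsAsFactors = FALSE
)
dim(theDF)            # 5 3

# A list holding a data frame and a ten-element vector.
list4 <- list(theDF, 1:10)

# Lists can be recursive: the third element is itself a list.
list3 <- list(c(1, 2, 3), 3:7)
list5 <- list(theDF, 1:10, list3)
length(list5[[3]])    # 2

# No names were supplied at creation, so names() returns NULL.
names(list5)          # NULL
```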
We'll say names of list five gets, and remember this has to be a vector with the same number of elements as the list, data frame, vector, and list. Now if we look at names of list five, we see those names, and when we print out the whole list, we can see that each element got its name: the first element is data frame, the second is vector, and the third is list. And again, that third element is itself a list with two elements.
Rather than adding the names later, let's include the names while we create the list. In this new list, we will call the first element the data frame, and it gets the DF. We will call the second one the vector, which we'll make one through 10, and the last one the list, which will be list three. We can now check out the names and we see the data frame, the vector, and the list, and if we print out the whole object, we can see they're all appropriately named.
So let's clear out the console as we continue forward. There are times when you will want to create an empty list. Perhaps you plan on filling it in later. It is more memory and processor efficient to create an empty list of the full size you're going to need and then insert elements than to create a list with one element, then add a second, then a third, then a fourth. Growing a list that way is very bad computationally.
So let's create a list of nothing. We will call it empty list, and we create it using the vector function. It does seem weird that we're creating a list with vector, but lists are actually a type of vector. We don't need to get into the minutiae of the language, but technically lists are a specialized type of vector, and we declare that specialization by saying mode equals list and telling it to have four elements. If we look at empty list, we see it has four elements, each of which is null.
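The naming and preallocation steps above can be sketched as follows. The name strings, the object name `list6`, and `emptyList` are assumptions chosen to match the spoken descriptions; `theDF` and `list3` are rebuilt so the block is self-contained.

```r
# Rebuild the objects from earlier (names are assumed).
theDF <- data.frame(
  First = 1:5, Second = 5:1,
  Sport = c("Hockey", "Lacrosse", "Football", "Curling", "Tennis"),
  stringsAsFactors = FALSE
)
list3 <- list(c(1, 2, 3), 3:7)
list5 <- list(theDF, 1:10, list3)

# Assign names after creation: one name per element.
names(list5) <- c("data.frame", "vector", "list")
names(list5)

# Supply names at creation time using name = value arguments.
# list6 and its element names are hypothetical.
list6 <- list(TheDataFrame = theDF, TheVector = 1:10, TheList = list3)
names(list6)

# Preallocate an empty list of the final size; every element starts as NULL.
emptyList <- vector(mode = "list", length = 4)
length(emptyList)     # 4
```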
Now, it is possible to access an individual element of a list, so let's say we want to give the first element of empty list the value five. We access that first element by saying empty list, using double square brackets around the number of the element we want, and assigning it some value, say five. If we look at this now, we see the other elements remained null, but the first one became five.
Let's look at grabbing individual elements of lists in a little more detail. For this, we will return to list five. Let's say we want to get just the first element of list five, which, as we recall, is a data frame. So we do list five and, again in double square brackets, just put in the number one. That returns the data frame. If we remember, we gave list five names: data frame, vector, list. So it would be great to be able to specify the element we want by its name instead of its number. So we do list five, again the double brackets, and then, in quotes, the name of the element we want, and we get, once again, that data frame.
Now, once we have subsetted a list, such as by getting the first element of list five, which is that data frame, we can treat the result as the object it is. We know that this first element is a data frame and has columns, so we can use the dollar sign and say Sport to grab that third column. Of course, it returns it as a vector, because that's how the dollar sign works. We could similarly do list five, get the first element, and use the square bracket notation to grab the second column from the data frame. As we can see, it returned five, four, three, two, one as a vector. Of course, if we want that to remain a single-column data frame, we could do list five, get the first element, grab the second column from there, and say drop equals false. We now have that single-column data frame.
To find out how many elements are in our list, we could do length of list five.
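As a rough illustration of the access patterns just described, here is a self-contained sketch. Object, element, and column names are the same assumed ones as before, since the transcript only gives them in spoken form.

```r
# Rebuild the objects from earlier (names are assumed).
theDF <- data.frame(
  First = 1:5, Second = 5:1,
  Sport = c("Hockey", "Lacrosse", "Football", "Curling", "Tennis"),
  stringsAsFactors = FALSE
)
list5 <- list(theDF, 1:10, list(c(1, 2, 3), 3:7))
names(list5) <- c("data.frame", "vector", "list")

# Assign into one slot of a preallocated list; the other slots stay NULL.
emptyList <- vector(mode = "list", length = 4)
emptyList[[1]] <- 5

# Double square brackets extract a single element, by position or by name.
list5[[1]]
list5[["data.frame"]]

# The extracted element is an ordinary data frame, so the usual tools apply.
list5[[1]]$Sport                        # character vector
list5[[1]][, "Second"]                  # 5 4 3 2 1, as a vector
list5[[1]][, "Second", drop = FALSE]    # stays a one-column data frame

# Number of top-level elements in the list.
length(list5)                           # 3
```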
In addition to length, we can also use capital NROW of list five to get the same information. Let's clear the screen again. We'll pull up list five again so we can see it, and remember, it has three elements. Let's say we now want to add a fourth element. We can do this simply by saying list five and, in double square brackets, putting in an element number that doesn't exist. There is no fourth element, so we assign two to it, and when we look at list five, it now has that fourth element. Again, as I mentioned earlier, adding elements to a list like this is both memory and processor inefficient. In little one-offs like this it's okay, but generally it's frowned upon.
You can also add another element to the list by specifying a name that doesn't exist. So for instance, list five, double square brackets, and we'll call it new element, and we will assign this the numbers three through six. If we check the length of list five now, we see it has five elements. Now let's check out the names of this list. The original list five had three elements, each of which had a name. We then added an unnamed fourth element and then a named fifth element, so the names show up accordingly, with an empty name for the fourth element. Printing this out, we can see, after scrolling up, our data frame, our vector, a list that is itself a list of two elements, the single number two, and another vector.
Lists are a great way to store varying types of information, whether that is multiple vectors, a mix of vectors and data frames, or any other data type, including recursively storing lists within lists. The list is a very powerful tool that really helps pass around data.
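The steps for counting and appending elements can be sketched as below. The element name `NewElement` is an assumption for the spoken "new element", and the other names are the same assumed ones used throughout.

```r
# Rebuild list5 with its three named elements (names are assumed).
theDF <- data.frame(
  First = 1:5, Second = 5:1,
  Sport = c("Hockey", "Lacrosse", "Football", "Curling", "Tennis"),
  stringsAsFactors = FALSE
)
list5 <- list(theDF, 1:10, list(c(1, 2, 3), 3:7))
names(list5) <- c("data.frame", "vector", "list")

# For a list, NROW() reports the same count as length().
NROW(list5)                     # 3

# Assigning to an index that does not exist yet appends an unnamed element.
list5[[4]] <- 2

# Assigning to a name that does not exist yet appends a named element.
list5[["NewElement"]] <- 3:6

length(list5)                   # 5
names(list5)                    # the fourth name is the empty string ""
```

Appending one element at a time is fine for a one-off like this; when the final size is known in advance, preallocating with `vector(mode = "list", length = n)` avoids repeatedly growing the list.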