4.5 Load binary R files
R has a special type of file called an RData file. This is a binary file format that can store one or more R objects in a file on disk, which can then be transferred between R users, even on different operating systems. It is truly a great file type. Let's take a look at how we could use it.
First of all, let's create some data to save to our RData file. Let's load tomato using read.table. The file is data slash tomato first.csv. Remember to specify that header equals TRUE, 'cause there is a header, that sep equals a comma, and that stringsAsFactors equals FALSE. You run that, and we can then go ahead and look at our tomato data.
Before we can load from an RData file, let's go ahead and create one. To do this we'll use the save function. The first argument to save, or even the first few, are the objects you want to store in the file. In this case, just one, tomato. The next argument, which you need to specify by name, is file, and there you put where the file's gonna go and the name of the file, in this case that's tomato.rdata. Doing that saves the data. Let's now remove tomato, so that when we load it back in there's no doubt about whether we already had it there: rm, tomato. Notice here that when we saved the RData file it showed up in our Git pane because a file was modified. This file already existed on my computer, so when we overwrote it, it got modified, and now Git is telling us that, hey, we have a file to deal with here.
Now that we have removed tomato, let's check to see if it's there or not. Error, object tomato not found. This is because it is completely removed. Now, how can we get it back from this file on disk? Well, we do load, then the name of the file, in this case data slash tomato.rdata. We run that, then we do head of tomato, and we have our data back.
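The single-object round trip described above might look roughly like the sketch below. It is not the code typed in the video: the small data frame stands in for the tomato CSV, its column names and values are made up, and the file goes to a temporary folder rather than the data folder.

```r
# Stand-in for the tomato data; the video reads it from a CSV with read.table().
# Column names and values here are hypothetical, chosen only for illustration.
tomato <- data.frame(
    Variety = c("A", "B", "C"),
    Price = c(3.99, 2.99, 0.99),
    stringsAsFactors = FALSE
)

# Assumed location: a temporary folder, so the sketch runs anywhere
tomato_path <- file.path(tempdir(), "tomato.rdata")

# Objects to store come first; the file must be passed as a named argument
save(tomato, file = tomato_path)

# Remove the object so there is no doubt that load() is what brings it back
rm(tomato)
exists("tomato")

# load() recreates each object under its original name and
# (invisibly) returns a character vector of those names
restored <- load(tomato_path)
restored
head(tomato)
```

Capturing the return value of load is optional; it is included here only because it shows which names came back from the file.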
Now, I know that the data frame was called tomato because the RData file stores the R objects as they appeared, so if the name of the object was tomato when it was saved into the RData file, that's going to be the name of the object when it comes out.
That's saving one object. Let's just confirm that it can save multiple objects. Let's clear the console. Let's create a few simple variables: n gets 20, r gets one through 10, and w gets data.frame of n, r. Let's take a look at w. Notice that in the column n, even though it only had one value, that value was repeated as many times as necessary to match the number of values in r, which was 10.
Let's go ahead now and save all of these variables into an RData file. We will say save, n, r, w. You can put in as many arguments as you need; as long as each of these arguments is an object, these objects will be stored into the file. Now, when you're ready to specify the file name, you actually have to make it a named argument. That's file equals data slash multiple.rdata. We'll run that, and we'll see over here in Git that the multiple.rdata file that I already had was modified, 'cause it has this new information in it.
Let's go ahead and remove all those variables just to prove we have it working. rm, n, r, w; you can remove multiple variables at once. Confirm they're not there: no n, no r, no w. Now, let's load it back from the file. Load, remember, takes the file name, data slash multiple.rdata; I used tab completion there. Now, if we do n, r, w, they're all right back where they belong.
Using RData files is a handy way to store data, 'cause let's say you did all sorts of work reading from a database or a CSV, munged the data, got it into the right shape, got everything just right and ready. You don't want to repeat all that, so you save it as an RData file, and you can use it again somewhere else.
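The multiple-object case walked through above can be sketched the same way. The lowercase names n, r and w are an assumption about what was typed, taken from the data.frame call as spoken, and the temporary path is a substitute for the data folder used in the video.

```r
# Three simple objects, as described in the walkthrough
n <- 20
r <- 1:10
w <- data.frame(n, r)   # the single value of n is recycled to match the 10 values of r

# Assumed location: a temporary folder instead of the video's data folder
multiple_path <- file.path(tempdir(), "multiple.rdata")

# Any number of objects can be listed; file still has to be named
save(n, r, w, file = multiple_path)

# rm() accepts several objects at once
rm(n, r, w)

# One call to load() brings all three back under their original names
load(multiple_path)
n
r
head(w)
```

Because load restores objects under the names they were saved with, it will silently overwrite anything in the workspace that already uses those names, which is why the sketch removes them first.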