1.2 Work in the R Environment - Video Tutorials & Practice Problems
Video duration:
18m
Voiceover: So this is the R Console, and yes it does look like a 1980s movie about 1970s computers. But this is the way people had to use R for years, until JJ Allaire came out with RStudio. Now using this Console is as simple as typing commands into it. So as a basic example of some simple math, we can type in one plus one, and hit enter, to come back with two, and if two doesn't come up, there's something very wrong with your installation, and you should try to do a reinstall or maybe get a new computer at that point. Now, this Console, while perfectly workable, isn't so user friendly. So we are going to go ahead and download RStudio. We go back to our browser, open a new tab, and type in rstudio.org, which will redirect to rstudio.com. So either one works. Go ahead and click on RStudio IDE. Once you're there, you can go ahead and click on Download RStudio, and now there is both a desktop version and a server version. The server version allows you to install the software on a Linux server, and then access that computer through a web browser anywhere in the world using the RStudio IDE. We don't need to worry about that right now, we just need to download this desktop edition. So click Desktop and the website automatically figures out your operating system. In my case that's Windows. So I'll click that link and download the installer. Now that it is done downloading, you can come to the desktop and double click the file just like any other installer. The first menu is just saying there's gonna be an installation. So go ahead and click Next. This lets you choose where to install RStudio. Unlike with R, spaces are allowed in the path. So don't worry about that, and the default is just fine. So go ahead and click Next again, and now it gives you the option of creating a Start menu shortcut. Might as well have that, and now you can click Install and the process begins.
Now that RStudio is installed, you can launch it by going to Start, All Programs, scroll down to RStudio, and click RStudio. Now we are in the integrated development environment. The bottom is the R Console, and that is the same Console as the standard R Console we saw just before, like this, except that it is more usable and a little prettier. Now notice, we are using R 3.0.0. We actually want to be using the latest version that we just installed, 3.0.1. So we come up to Tools, click Global Options, and up here where it says R version, we click Change, which brings up a list of the installed versions of R on your computer. Go ahead and click 3.0.1. Click OK and a little message comes up saying that we need to close RStudio and reopen it in order to make this change. So click OK. Click OK again, and then close out of it. Now I use RStudio so much it is pinned to my task bar and I can just click this nice little icon. Now we're back in and we're using version 3.0.1. So everything is doing great right now. Down here in the Console, as I said before, it is just like the R Console and we can type in one plus one, and that will give us two. So everything's pretty good. One of the great benefits of the IDE is the text editor up here. This lets you type in complicated commands and edit them as you please without having to worry about putting them in the Console correctly. You'll go ahead and create a new file up here by clicking File, New File, R Script. We could have also pressed Ctrl, Shift, N on our keyboard. Now up here it's just a text editor. So we can type in one plus one and have it sitting there. To run it into the Console, we can come over here and click Run. By doing that, it got put into the Console, both the command and the result. Let's say we have another line for two plus two. We also could have used the keyboard by pressing Ctrl, Enter, and that runs the command as well.
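The commands typed so far can be collected into a small script, as a minimal sketch. The variable names result_one and result_two are made up for illustration and are not part of the video; the version string printed will be whatever R you have installed, not necessarily 3.0.1.

```r
# Show which version of R this session is running
# (the video switches RStudio from 3.0.0 to 3.0.1)
print(R.version.string)

# The same arithmetic typed in the video; storing the results
# in variables is an addition made for this sketch
result_one <- 1 + 1
result_two <- 2 + 2

print(result_one)  # [1] 2
print(result_two)  # [1] 4
```

Typed into the text editor, each line can be sent to the Console with Run or Ctrl, Enter, exactly as described above.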
Notice that when you have multiple lines, hitting Ctrl, Enter runs the line of code that the cursor is on. So, if we're on the first line and we hit Ctrl, Enter, it runs the first line. If we're on the second line, it runs the second line. Now let's say we have a more complicated expression such as one plus three times five. If we hit Ctrl, Enter it runs the entire line. If we highlight just a portion of it and then hit Ctrl, Enter, it runs just that one piece of code. That is a great feature to have. Now with all this typing we've done, we do see in the Console what's been done, but this Console can drag on for a while. Thankfully there is the History tab. This History tab shows what commands were entered, and if you were to double click one, it would go into the Console, at which point you can hit the enter key to run that line. That History is a very useful feature. Another pane in RStudio is the Files pane, and this shows all the files that are in the working directory. Now the working directory is an important concept, and we'll cover that later when we talk about projects. The Environment tab shows all the objects and variables that are stored in R in memory that are available to you, and as we start using some of these, you'll see that get filled up. Down at the bottom there's the Help pane. This provides all sorts of information about the functions you use in R. So let's say you want to use the mean function. You come down here to the Console and type in question mark m-e-a-n, hit enter. This brings up the Help documentation for mean. This has information such as a title and a quick description of what it does. It tells you how to use it, explains the arguments that are needed and what gets returned, and gives you plenty of examples. It is a great thing to have when learning a new programming language for the first time. The Packages pane shows all the different packages you have at your disposal.
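As a rough sketch of the two ideas above, running a whole line versus a highlighted portion, and looking up help for mean, something like the following could be typed into the editor. The names whole_line, portion and values are invented for this example, and the numbers passed to mean are arbitrary.

```r
# Running the entire line: multiplication happens before addition
whole_line <- 1 + 3 * 5
print(whole_line)  # [1] 16

# Highlighting only "3 * 5" and pressing Ctrl, Enter runs just that piece
portion <- 3 * 5
print(portion)     # [1] 15

# Typing ?mean in the Console opens the Help pane; help("mean") is equivalent.
# It is guarded here so the sketch also runs outside an interactive session.
if (interactive()) help("mean")

# A small use of the function the help page documents
values <- c(2, 4, 6, 8)
print(mean(values))  # [1] 5
```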
One of the most powerful things about R, is the package system. There are over 4900 packages available on CRAN, as of when this video was made, and there's another 6-700 or so on Bioconductor. These packages are really what make R so useful, and we will learn all about them in a little bit. Another great capability of R is its graphing capability, and when you create a graph it will be displayed in this area as we will see extensively. All of these panes we see here the Console, the text editor, all these panes, are highly customizable. You can move them around. You put the Console on the top. You can put the Plots over here at the top right hand corner, and you do all that through the Tools menu. So click Tools. Global options, which is where we were before when we changed the version of R, and in here you have many different options. There's a few important ones right on this general tab. I do not like restoring .RData into workspace at startup, or saving .RData on exit. What this means is, let's say you've done all sorts of work, created lots of variables, done all sorts of analysis, R gives you the option of saving that in a file, and the next time you open R, you can start right where you left off. While that might sound nice, that can lead to all sorts of problems. So it's best not to check restore .RData, and to never save workspace .RData. Aside from that not much else is really needed in the General tab. The Code Editing section, lets you change features about the text editor. For instance, I like having my code highly indented, but for compatibility reasons, I want to do spaces instead of actual tabs, and I like putting in four spaces instead of two. That's all about personal preferences, as is most of this stuff on this tab. A very nice feature here for people used to vim, is you can enable vim editing mode. I'm not personally a vim person, so I'm not going to do that. 
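What the Packages pane displays can also be inspected from the Console. This is only a sketch of one way to do it: the name installed is made up for the example, and the count you get depends entirely on your own installation.

```r
# installed.packages() returns a matrix with one row per installed package;
# the row names are the package names
installed <- rownames(installed.packages())

# How many packages are available on this machine (varies by installation)
print(length(installed))

# The stats package ships with R itself, so it should always be listed
print("stats" %in% installed)  # [1] TRUE
```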
Appearance lets you change the way RStudio looks. For instance, let's say you want this to look like an old timey hacker movie, you can click Cobalt and click Apply, and now your Console is gonna have this dark screen. There are many different ones available to you like Solarized, which is supposed to be very good on the eyes, and Eclipse, if you're used to Eclipse. Personally, I'm fine with the TextMate theme, and I like using that. Pane Layout is where you can move around all the different panes. For instance I like having the Console on the bottom, and I like having my Plots, Packages, and Help on the lower right hand side. That's entirely up to you. CRAN mirror lets you set where to download packages from. I like the Carnegie Mellon mirror, because it's physically the closest to me in New York. Other than that, there are lots of other options that we don't need to worry about too much right now. Sweave, or S-weave, as it is properly pronounced, is a great system for integrating code and text into one nice document. You can use this to write reports, articles, or even books. My book was written entirely in RStudio. While Sweave is the way this was done for years, relatively recently a new package called knitr was released which extends Sweave, and makes it much easier to use. So you can choose that option to use either Sweave or knitr right up here, I prefer knitr, and you can also choose what LaTeX program to use. We will learn much more about that later on when we learn about knitr extensively. Also useful when writing a book or report is a spell checker, which RStudio has built right in. RStudio also has capabilities to integrate Git or SVN for Version Control. This is incredibly useful whether you're working by yourself or especially with a team of people. It gives you the ability to track your changes, and be really careful about what you do.
Let's say you're coding something today and it works great, tomorrow you make a change and everything breaks, you can roll it back to yesterday. Now, I personally prefer Git, but both Git and SVN are supported. So we can go ahead and click OK. Another great feature of RStudio is Projects. This lets you keep your work in organized, discrete units. So let's say we want to create a new project for this video. We can come to File, New Project. This brings up a dialog for how we can create it. Now a project is essentially a directory that holds all the files that you'll be using, and we'll learn more about storing R files and different stuff you can keep in your project. We're gonna start a project from scratch. To do that, we click New Directory, and here we can give the directory name. We're going to use tutorial. Now we have the option of where we create this folder. It's gonna create a new folder in a subdirectory of my Consulting folder. That works for me. So I'll click Create Project. So now it's like we have a fresh usage of RStudio. We have just the Console open because we don't have any text editor areas. The Environment is empty, and the files in this folder only consist of tutorial.Rproj. That is the project file and you don't really ever have to worry about that. So why don't we go ahead and actually save a file. I'm going to create a new text editor file by pressing Ctrl, Shift and N on my keyboard. We now have a text editor file. Let's say we type in a very simple command such as one plus one again, and we now want to save this. That is as simple as clicking File, Save, and now you can see in Consulting we're in the tutorial folder, and we have all of our files in here, which right now consist of our project file. In here I will just call this first.r. Now it's generally considered good practice to name your R code files with a .R extension. It can be upper case or lower case, though most people think lower case is better.
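The working directory and the saved script can also be checked from the Console. The following is a minimal sketch, not what is done in the video: it writes first.r into a temporary folder (an assumption made so the example does not touch your real project), and the names project_dir, script_path and script_value are invented for illustration.

```r
# The working directory is the folder R reads from and writes to;
# inside a project this is the project folder
project_dir <- getwd()
print(project_dir)

# Write a one-line script like the one saved in the video.
# A temporary folder is used here instead of the real project folder.
script_path <- file.path(tempdir(), "first.r")
writeLines("1 + 1", script_path)

# The file now shows up when listing the folder, as it would in the Files pane
print(list.files(tempdir(), pattern = "^first\\.r$"))

# source() runs the saved script; $value holds the last result
script_value <- source(script_path)$value
print(script_value)  # [1] 2
```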
By clicking Save we now have a saved file which shows up in the Files pane as first.r. Projects can be used for many things, and a nice feature about Projects is that once you have a project running you can use Version Control. In order to use Git, you first need to have it installed. The two most popular sites on the internet right now for using Git are GitHub and Bitbucket. I suggest you sign up for an account at GitHub or Bitbucket and follow the directions for installing Git onto your computer and getting started with it. So to create a new Git repository, I'm going to go to Bitbucket and initiate one from there. So I'll come to my browser and I'll go to Bitbucket. Gonna log in. My credentials are already there, thanks to Chrome, and I'll click Log in, and now we see we have our interface here. I want to create a new repository, so I will click Create, and I will call this tutorial, and for now I'll leave it as a private repository. Going to use Git. I'll allow issue tracking and a Wiki, and I'm gonna tell it that I will be using mostly R. So I just click Create, and that will create a new repository on this site. So this gives you a little walk through of how to get started. So I'm going to click on getting started from scratch. Now some of these steps I've already done. This does require you to be familiar with the command line. But that's okay, it won't be too intimidating. To use this, the first thing I'm going to do is open a command line, which gets installed for you. It's called Git Bash, and I'll click that and I'll launch it, and I'll resize it so everyone can see things more easily. First thing I'm going to do is go to the directory where I have my RStudio project. For me that happens to be in Documents/Consulting/tutorial. I come here and I type in git init, and that starts up Git. Now I want to add this to the Bitbucket servers.
So, in order to see this I'll make this a little smaller, so I can see exactly what to type, and we type in git remote add origin ssh://git@bitbucket.org/jaredlander/tutorial.git. Now we have our repository set to be mirrored to Bitbucket. So we come back to the website and click Next, and we're going to create a README just so that we have something to push for the first time. So we'll follow their instructions even. Now don't worry about too much of this if you're not familiar with the command line. You're not going to need to use too many of these commands once you really start using RStudio. So by typing echo and typing in this is my README, and telling it to write to a README file called README.md, that creates a new file, which we can see by typing in ls. We now have a README file that just has the information that says this is my README. We're going to add this, and adding it means we're going to track it and make sure we notice any changes. Now that we have it added and tracked, we need to actually commit it and tell it, hey, we made changes, we want to commit these changes and make sure they're stored. So we type in git commit -m, because you're going to put in a message: First commit. Adding a README. Now again, don't worry too much about this. RStudio will make this a lot easier. Hit enter, and then it's time to push it up to the website. Now in this one step we're going to sync up our code with the website, and it's gonna make it track future changes. So we type in git push -u origin master, and this asks for your Bitbucket password, and now all that information is on the website. In fact we can see that by coming to the website and clicking Source, and we now have a README. Now going back to RStudio, nothing has changed just yet. So what we're going to do though is reopen this project by coming to File, Recent Projects, tutorial. Now that it's open we have a new pane called Git.
Now up here it shows all the files we have that are either being tracked or need to be tracked. So for instance first.r has two question marks next to it. That means we are not tracking this file, and it will not be protected by Version Control. In order to track it you can click the check box, and it will be set to be added, and when you're ready to actually commit it, you can click Commit. Before, we had to type git commit -m and type in a message. Here all we have to do is type Adding first.r, or any message you would like, and click Commit. By clicking that it's now being tracked. When you're ready to push up to the website you can click Push, and type in your passphrase. This makes it a lot easier. You don't need to remember all those command lines. Makes using Git a lot better. An even nicer feature about this is, let's say we go ahead, we'll close out of here, we'll make a change to first.r. Now we're going to type in two plus two. When we save that, the file is listed again here in the Git pane, and it has a big M for modified. By clicking on that and clicking Commit, we can see what the difference was. We can see how it deleted the first line, and added in a new first line. It's really the same, but because of the way Git tracks things, since we had to enter a new line, things got a little different. Pink shows you what you've deleted, and green shows you what you have added. So now, we can come here to our message, Added new addition example. Let me click Commit. It's stored again, and now you might be ready to go ahead and push it. You can click Push, or if you're ready to pull you can click Pull. It all works very similarly. Using Git is going to make your work a lot better and a lot safer. And that's an overview of the features of R and in particular RStudio, and we'll continue using these features and more and learn all about them as we go forward.