9.1 Combine strings together - Video Tutorials & Practice Problems
Video duration: 7m
<v Voiceover>In this age of big data,</v> people are dealing with more and more unstructured data, and usually that takes the form of text. So working with text is an important part of almost all analysis jobs these days. A large part of that will be combining texts, making new texts, and really working with the creation of text. In R there are two main functions for doing that: paste and sprintf. So let's look at paste first.

The paste function takes any number of arguments and stitches them together. So let's say we give it the arguments hello, Jared, and others. If you just run that, you'll see it automatically puts spaces between each of those words. We could take this line of code, copy it, paste it, and specify that sep should equal a slash. What this will do, instead of putting spaces between each of the entries, is put slashes between each of the entries. The separator is allowed to be any valid character string, and that's what will get stuck between the entries passed into paste.

Paste very nicely accepts vectorized arguments. So let's test this out with two arguments, both of which are a vector with three elements. The first one will be a vector of hello, hey, and howdy, while the second one will be three names: Jared, Bob, and David. Running that, we see it put hello with Jared, hey with Bob, and howdy with David. It created the spaces automatically, because we didn't specify a separator. This also works when one argument is a vector and the other one isn't. Say our first argument is hello, and the second argument is Jared, Bob, and David. In this case we get hello Jared, hello Bob, hello David.

As we start adding in more and more arguments, each of differing lengths, R automatically recycles some of the entries. So let's take a look at that. We do paste, and the first argument will simply be hello. The second argument will be three names: Jared, Bob, and David. And the third argument will be two ways to say goodbye: goodbye and seeya.
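The paste calls described so far could be sketched in R roughly as follows. The transcript only describes the arguments verbally, so the capitalization of the strings and the names of the result variables (basic, slashed, paired, greetAll, recycled) are assumptions for illustration.

```r
# Default behavior: entries are joined with a single space
basic <- paste("Hello", "Jared", "and others")

# sep replaces the space between entries with a slash
slashed <- paste("Hello", "Jared", "and others", sep = "/")

# Two vectors of length three are combined element by element
paired <- paste(c("Hello", "Hey", "Howdy"), c("Jared", "Bob", "David"))

# A single string is reused for every element of the longer vector
greetAll <- paste("Hello", c("Jared", "Bob", "David"))

# Arguments of lengths 1, 3, and 2: the shorter ones are recycled
recycled <- paste("Hello", c("Jared", "Bob", "David"), c("Goodbye", "Seeya"))
print(recycled)
```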
Running that, we get Hello Jared Goodbye, Hello Bob Seeya, but then by the time it gets to David it has to recycle back to goodbye. So it's Hello David Goodbye. So that's taking multiple entries and stitching them together.

There will be some instances where you are given a vector and you need to collapse it down into a single piece of text. Let's check that out. We'll create a new variable called vectorOfText, and we will assign to it a vector of four elements: hello, everyone, out there, and a period. By calling paste on this vector and specifying that collapse should be a single space, we get one nice line of text saying hello everyone out there.

Paste is often used as a way of stitching together text and variables. So let's see what that would look like. Let's create three variables and paste them together into a sentence. We will say the first variable is person, and that gets Jared. The second variable is party size. That gets eight. And wait time gets 25. What we want to say is, Hello Jared, your party of eight will be seated in 25 minutes.

The way to do that in paste is a bit clunky, but it does get the job done. So we say paste, hello. You put a space at the end of hello, because we're going to say a word afterwards, so that space needs to be there in the text. You might be tempted to put the space in the separator, but that can cause other issues. So in these situations, it's best to have an empty separator, with nothing in it. So now we're ready to paste in the variable person. You just type in person with no quotes. Then you start another quote with the comma after the person's name and a space, then your party of. And again, you need a trailing space after of. End the string, put in a comma. Then you put in party size. Put in a comma, then get back to the string with an opening space. Then you say will be seated in. And again, you need a space to close it. That's a common theme, and it's starting to get a little laborious, but it's something that needs to be done.
Close the quote, then get ready for another argument. We'll go to another new line, and here we'll put the variable wait time. After that argument we start another string, again with a leading space, and say minutes. We can end the sentence, but before we can stop doing paste, we need to set the separator equal to an empty string. The two double quotes back-to-back with no space mean no separator. When we run this, we get our nice sentence: Hello Jared, your party of eight will be seated in 25 minutes. This got the job done, but it was a bit laborious, and it's a bit fragile too. Making even a small change to the sentence could require a lot of messing around.

That's why the sprintf function is quite a bit easier. In sprintf, the first argument is a long string with special indicators anywhere you want to insert a variable. The remaining arguments are the variables you want to insert, in the correct order. So let's reconstruct our sentence in sprintf and see the difference. We still say hello. Now, where we want to insert the variable, we put %s. There are many different types of special characters besides %s, but since we're just constructing strings this whole time, %s will work just fine. We then write your party of, then another %s, because it's another variable, then will be seated in, and then another %s. And then minutes. That's much more natural to write. Just write a long sentence with special placeholders, and this last placeholder does indeed need an s. Then the remaining arguments are the variables you want to insert. In this case, person, party size, and wait time. Running this will get us Hello Jared, your party of eight will be seated in 25 minutes. The same result as paste, but it works a lot better. It's a lot easier to use. It's much more natural. Changing this string will be easier.

When trying to combine text to make new text, paste and sprintf will be your friends. They're both very valuable and have strengths in different areas.
One time paste will be better, another time sprintf will be better. It's all about the context of the situation.
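As a rough sketch of the collapse example and the seating sentence built both ways, the code below may help. The variable names partySize and waitTime, the result names (collapsed, withPaste, withSprintf), and storing the party size as the string "eight" are assumptions; the transcript names only vectorOfText and person explicitly.

```r
# Collapse a vector of four elements into one piece of text
vectorOfText <- c("Hello", "everyone", "out there", ".")
collapsed <- paste(vectorOfText, collapse = " ")

# Variables to insert into the sentence (names and types assumed)
person <- "Jared"
partySize <- "eight"
waitTime <- 25

# paste: every literal piece carries its own spaces, and sep is empty
withPaste <- paste("Hello ", person, ", your party of ", partySize,
                   " will be seated in ", waitTime, " minutes.",
                   sep = "")

# sprintf: one template string with a %s placeholder per variable
withSprintf <- sprintf("Hello %s, your party of %s will be seated in %s minutes.",
                       person, partySize, waitTime)

print(withPaste)
print(withSprintf)
```

With sprintf, changing the wording only means editing the template string, whereas the paste version requires re-checking the spacing inside each quoted piece.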