2.2 What is Machine Learning? - Video Tutorials & Practice Problems
Video duration: 16m
All right. So, machine learning actually ushered in one of the biggest changes in AI. This happened in the '90s, when it went from rule-based systems to what we call probabilistic systems. So what do we mean by those two things? Well, a rules-based system works in much the same way as a traditional software program: if this, then do that. It might look at conditions, test things, and then execute instructions. It's essentially a rule. A person can define the rule, rules have logic associated with them, and that's really how traditional software programs work. Now, the big change was to go to something probabilistic. So it wasn't a rule; instead it said, if this, then weight that. Say, for example, that it's more likely that we should do that, rather than you should absolutely, 100% do that.

So let's look at an example. This, back in the '90s, was actually the first time I was ever exposed to machine learning. I was working on a problem where we were using lots of rules. We were building a system that was trying to figure out how to show the right news stories to business people, and in this case, how to show stories to the travel industry. So for professionals working in the travel industry, how do we show them the right news stories? The problem we had was we were constantly showing them stories they had no interest in, because they were personal stories. They wanted to see the types of stories on the left. We were showing them the stories on the left for sure, but we were also showing them the stories on the right, and they did not wanna see those stories. Those were stories written for people trying to figure out what to do on vacation, not stories the industry wanted to understand.

So how did we solve the problem? Well, we tried really hard. We tried looking at all the different words in the stories, and no matter what we did, we could not find any words that really distinguished the industry stories from the personal stories, and we tried for a while. I had really smart people; I had librarians who knew how to do what's called Boolean searches, queries with ANDs and ORs and NOTs in them. It was a very complicated type of searching, very sophisticated at the time. And they worked on this problem for months and could not figure it out. Then I heard that there were people working on machine learning, and somebody said to me, you might wanna talk to them, because maybe machine learning could solve this intractable problem we were coping with. So we gave them all the data and we labeled it: here are the industry stories, here are the personal stories. Can you find some pattern that shows us the difference between industry stories and personal stories? And within a day or two, it found the pattern, and here's what the pattern was: the word "you." And you might say, what are you talking about? Well, I said the same thing when they showed this to me. Here's what they figured out: the personal stories were all written in the second person. You should do this, you should go to Tahiti on vacation, you should take a cruise. You should do this, you should do that. The industry stories were hardly ever written that way; the only time you'd find the word "you" was inside a quote.
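To make the rule-based versus probabilistic contrast concrete, here's a minimal sketch in Python. The story texts and labels are hypothetical stand-ins, not the actual news data from the video, and the probabilistic side is a bare-bones Naive-Bayes-style word weighting rather than whatever system the team actually used.

```python
# Minimal sketch: a hard rule vs. probabilistic word weighting on hypothetical data.
from collections import Counter

labeled_stories = [
    ("you should take a cruise to Tahiti this winter", "personal"),
    ("you should book your vacation flights early", "personal"),
    ("the airline reported record quarterly bookings", "industry"),
    ("hotel chains are expanding into new markets", "industry"),
]

def rule_based(text):
    # Rule-based: "if this, then do that" -- a hard, person-defined condition.
    return "personal" if "vacation" in text.lower() else "industry"

# Probabilistic: learn how strongly each word weighs toward each class.
counts = {"personal": Counter(), "industry": Counter()}
totals = {"personal": 0, "industry": 0}
for text, label in labeled_stories:
    words = text.lower().split()
    counts[label].update(words)
    totals[label] += len(words)
vocab = {w for text, _ in labeled_stories for w in text.lower().split()}

def probabilistic(text):
    # "If this, then weight that": score each class by smoothed per-word likelihoods.
    scores = {}
    for label in counts:
        score = 1.0
        for w in text.lower().split():
            score *= (counts[label][w] + 1) / (totals[label] + len(vocab))
        scores[label] = score
    return max(scores, key=scores.get)

print(rule_based("you should visit Tahiti"))     # "industry" -- the rule misses it
print(probabilistic("you should visit Tahiti"))  # "personal" -- "you" tips the weights
```

The point is the shape of the two approaches: the rule either fires or it doesn't, while the probabilistic version weighs every word, which is how a low-profile word like "you" can end up doing the heavy lifting.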
And so if we just eliminated all of the stories that were constantly saying, you should do this, you should do that, we would do a really good job of eliminating all the personal stories. Now, I don't know about you, but I don't think a person would ever have figured that out. It's very weird; we don't even think that way. We were never looking at the stories trying to figure out whether they used the word "you" or not. We were looking at all sorts of other words that we thought might make sense; that was never something that made sense to us. So we wouldn't have figured it out, but the computer did, and it figured it out right away, because computers don't have any of those preconceived notions; they're just looking for raw patterns. And it found that pattern, and it was incredibly useful that it found it. That's what we did: we solved the problem and made our customers very happy, because they were only getting the industry stories. This is an example of how machine learning works. It doesn't do things the way a person does, and that's good, because it makes it possible for the computer to solve problems that people can't solve. It also makes it possible to solve problems faster than people, even when people can solve them.

So when we look at the drawbacks of using rules, suppose we were trying to analyze social media, and we were trying to find all of the tweets about Delta Airlines, or all of the blog posts, or all of the Facebook posts. Well, you can't just search for the word delta, because it's too common. You're gonna find vaccine stories and stories about other companies like Delta Faucet and Delta Dental. And you can't say, well, let's look for "Delta Airlines," because most people writing a tweet are not gonna say Delta Airlines. They're gonna say, I hate Delta 'cause my flight's always delayed. Well, you can tell which company that's about, but the word airline was never in it, and maybe the word flight wouldn't have been in it either. So how are you gonna distinguish the stories about Delta Airlines from the other companies, or from any other story that just has the word delta in it? Because, by the way, delta is a somewhat common word: people might talk about a river delta, or they might talk about the delta of a change in data. There are all sorts of uses of the word delta, and you don't wanna pick up all of those and say these are stories about Delta Airlines.

So what can you do instead? Well, you're going to figure out how to use something that isn't a rule. And you might say, well, why do I care? Why can't I just get all the stories that have the word delta in them, and then go pick out the ones that are about the airline? Well, because there might be an awful lot of them; it would take you a really long time. And suppose what you wanna do is aggregate the data. Suppose you wanna do sentiment analysis, where you're saying, hey, 60% of the conversations with the word delta have positive sentiment. Well, what does that mean? If you haven't isolated only the stories about Delta Airlines, that's not really gonna tell you anything. Isolating the right conversation is what gives you the ability to aggregate things and really get insights from them. So we can see that just using rules has a lot of drawbacks. And you might say, well, what about those librarians I talked about, who have all those clever ways of using Boolean searches?
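Before getting to Boolean searches, here's a toy illustration of the word-spotting problem just described. The tweets and sentiment labels are invented for the example.

```python
# Toy illustration of why plain keyword spotting fails for "delta".
tweets = [
    ("I hate Delta, my flight's always delayed", "negative"),   # the airline, no "airline"
    ("The river delta flooded again this spring", "neutral"),   # geography
    ("Delta Dental approved my claim fast", "positive"),        # a different company
    ("The delta between the two datasets is tiny", "neutral"),  # change in data
]

# Rule-based word spotting: matches every sense of the word.
matches = [(t, s) for t, s in tweets if "delta" in t.lower()]
print(len(matches))  # 4 -- all of them, but only one is about the airline

# Aggregating sentiment over this unfiltered set is meaningless:
positive = sum(1 for _, s in matches if s == "positive")
print(f"{positive / len(matches):.0%} positive")  # says nothing about Delta Airlines
```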
Well, I once struggled with the problem of how I would show conversations that were about Sprint, the phone company. We tried to use all sorts of Boolean logic. We couldn't just look for the word sprint, because that would bring up all sorts of things like sprint car races and school track events. So we said, well, how about sprint AND phone? Well, then if you had a tweet that said Sprint sucks, you're pretty sure you know what that means, but you would lose it, because it didn't have the word phone in it. Or sprint AND cell? Well, if someone said, Sprint's phones are awful, now you've lost that one, because it didn't have the word cell in it. So what if you went the opposite way and tried negatives? Say, let's get rid of all the sprint car races with sprint AND NOT car. Well, then you lose the tweet that says Sprint drops calls every time I'm in the car, and you wanted that one. What if you said sprint AND NOT race? Well, a tweet might say Sprint is losing the 5G race; okay, you miss that one too. Sprint AND NOT run? Well, someone says, my Sprint app just won't run. So it's really hard to get the right conversation by using Booleans; they're just not up to the challenge.

The other thing that's hard is that it's very variable. Sometimes, like in this example with all the different models from Volkswagen, it'll work really well. Sometimes you can do this kind of word-spotting thing, where you have a word that's so unique, like Jetta or Tiguan, and when you do that, you get all of the tweets about that model of Volkswagen; it's perfect. But then you say, well, let's do it for the rest of the models: Beetle, Atlas, Fox, Polo, Golf. Uh-oh, now we're in trouble, because what happens when you try to search for the word golf? Well, you're not gonna do that well. And people are not always gonna say Volkswagen Golf or Volkswagen Fox or Volkswagen Beetle; from the context of the tweet, you can figure out that it's a Volkswagen. And so this is the problem: you can't just use word spotting, you can't just use rules, because these Boolean rules are kind of like carving with a chainsaw; they're very hard to use precisely. This is an example of a client of mine that tried to get just the right conversation using Boolean rules, and what they found out was that the tweets they got back were only accurate 15% of the time. So 85% of what they got back was still wrong. It was a lot of work, and it clearly didn't work.

So what do we do instead? This is where we make that shift to machine learning; we make that shift from rules-based to probabilistic. And so you can use a technique called relevance feedback. What happens here is that the system starts out by showing you exactly what tweets are out there, and you start labeling the data. Remember that we talked about how machine learning often starts with labeled data, where you're labeling, well, this is relevant and that is not. You could do this for spam, to get rid of all the spam tweets. That social business example I showed you on the previous slide, the one that was 15% accurate: with relevance feedback and just an hour's worth of labeling work, it ended up being 85% accurate. And obviously they wanted to get higher than that, so they labeled it for a few more hours until it was really accurate.
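To see concretely why those Boolean attempts kept failing, here's a toy sketch. The tweets are hypothetical, and the matcher is a deliberately simple AND / AND NOT filter over words, not a real query engine.

```python
# Hypothetical tweets illustrating how each Boolean query from the example misfires.
tweets = [
    "Sprint sucks",
    "Sprint drops calls every time I'm in the car",
    "Sprint is losing the 5G race",
    "Great sprint car races at the fairgrounds today",
]

def matches(text, must=(), must_not=()):
    # A minimal Boolean AND / AND NOT matcher over lowercase words.
    words = set(text.lower().split())
    return all(w in words for w in must) and not any(w in words for w in must_not)

# "sprint AND phone": drops the relevant "Sprint sucks"
print([t for t in tweets if matches(t, must=("sprint", "phone"))])  # []
# "sprint AND NOT car": drops the relevant dropped-calls tweet
print([t for t in tweets if matches(t, must=("sprint",), must_not=("car",))])
# "sprint AND NOT race": drops the relevant 5G tweet
print([t for t in tweets if matches(t, must=("sprint",), must_not=("race",))])
```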
That query for Sprint: we got accurate results with relevance feedback in just 15 minutes, eliminating all the rest of the stuff they didn't want in there. So how does this work? Well, it's a technique called active learning. Basically what's happening is that the system starts labeling the data itself, saying, we think this data is relevant, we think this data is not relevant, and we're highly confident about those calls, and a person then checks whether those calls are actually right. But as the system starts being correct about the things it's confident about, there's still a whole pile of tweets it's unsure about, and those are what it shows you when it asks for more labeling. It's going to give you the tough cases, the ones where the system really doesn't know whether something is relevant or irrelevant, and ask you to label more data. And eventually you get to the point where you don't have to label any data anymore, because you're happy with what it's doing: it's confident about everything, and it's correct about everything.

So what makes the system confident? Well, an analyst like a data scientist can choose features for the system to examine, and the system looks for patterns based on those features. What's an example of some features? Well, suppose we're trying to eliminate spam tweets. For those of you who aren't familiar with spam tweets, they're tweets that are designed to be found in search and that link to something the spammer wants you to go to. So you might be searching for something on Twitter, and a spam tweet might come up that seems like it's about what you're interested in, and when you click on it, it puts up a page with a bunch of ads on it, for example, and somebody got paid because they got you to click on that page. Now, back when we did this project of identifying spam tweets, tweets were limited to 140 characters; now they're 280. But at the time, we found that one feature of spam tweets was that they're almost always 140 characters, or only a few less than that. Why would that be? Why wouldn't there be any 80-character spam tweets? Well, think about it: finding spam isn't your goal, but the spammer's goal is for you to find their tweets, so they're gonna load them up with as many characters as they can, to cram in as many keywords as possible, so they'll be found the most times. They're not gonna put an 80-character tweet out there; they're gonna put a 138-character tweet out there that has all the words you might search for. And guess what? Those tweets are always going to contain a link. Why? Because if there's no link, the spammer doesn't get paid. That's an example of something we hadn't really thought about when we set out to look at spam tweets, but machine learning found those correlations very, very quickly: very long tweets that always have a link in them. Now, does this mean that every 138-character tweet with a link in it is a spam tweet? No. If that were true, you would only need a rule; you wouldn't need probabilistic machine learning to figure it out.
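Here's a minimal sketch of that active-learning loop, using scikit-learn. The tweets, the labels, and the two features (length and has-a-link) are hypothetical stand-ins for the real project's data; a real system would use many more features.

```python
# Minimal sketch of active learning via uncertainty sampling, on hypothetical tweets.
import numpy as np
from sklearn.linear_model import LogisticRegression

def features(tweet):
    # The two signals from the video: how long the tweet is, and whether it links out.
    return [len(tweet), 1.0 if "http" in tweet else 0.0]

labeled = [
    ("cheap watches best deals " + "buy now " * 14 + "http://x.example", 1),  # spam
    ("had a great flight today, crew was lovely", 0),
    ("win free prizes click here " + "offer " * 18 + "http://y.example", 1),  # spam
    ("anyone else stuck in traffic on I-90?", 0),
]
unlabeled = [
    "new blog post about our vacation http://z.example",  # has a link but mid-length
    "good morning everyone",
    "free followers free likes " + "free " * 20 + "http://w.example",
]

X = np.array([features(t) for t, _ in labeled])
y = np.array([label for _, label in labeled])
model = LogisticRegression(max_iter=1000).fit(X, y)

# Uncertainty sampling: the tweet whose spam probability is closest to 0.5 is the
# "tough case" the system hands to a human; confident calls just get spot-checked.
probs = model.predict_proba(np.array([features(t) for t in unlabeled]))[:, 1]
toughest = min(range(len(unlabeled)), key=lambda i: abs(probs[i] - 0.5))
print("please label:", unlabeled[toughest])
```

Each round, the human's new labels would go back into the labeled set, the model would be refit, and the pile of uncertain tweets would shrink until the system is confident, and correct, about everything.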
What it means is that when you see a tweet that has 138 characters in it, and you see that it has a link in it, you're much more heavily suspicious that it could be a spam tweet, and you're gonna look at a lot of other factors to decide whether it's a spam tweet or not. That's what the probabilistic model does.

Now, there are several different types of machine learning. I've talked about one of them, supervised machine learning, which uses training data. That's what the labeled training data is for, and that's why it wants you to label things. So if you had labels that say these pictures are stop signs and these pictures are not stop signs, that's supervised machine learning. There's also another kind called unsupervised machine learning, which uses pattern analysis. You might use something like that if you're trying to figure out, for example, the topic of each page on a particular website; maybe the website has 20, 30, 40 different topics, and you wanna know the topic of each webpage. Well, if you don't give it a set of labeled data with topics, maybe what it'll do is just look at patterns to figure out which pages are similar to each other, and it'll say, well, these pages look like one topic, and maybe those look like another. Now, it might not know what the name of the topic is, and it might be wrong about some of them, but it'll at least group together things that have patterns of word usage in common and say, hey, these pages look similar to each other. That's unsupervised machine learning.

Now, when we showed you the example of the relevant and irrelevant system, that was actually using active learning, which is called semi-supervised machine learning. Semi-supervised machine learning can start with supervised machine learning, where you then add more training data like we showed in our example, or it can start with unsupervised machine learning, where you, as the user, the human being, correct what it has found and run it again. Semi-supervised is sometimes called human-in-the-loop machine learning, because it uses a human being to take the supervised or unsupervised machine learning and make it more and more accurate. So that's a good introduction to machine learning for you. Hope it helped, and we'll be back soon.
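As a closing illustration, here's a minimal sketch of the unsupervised grouping idea, using scikit-learn. The page texts and the choice of two clusters are invented for the example.

```python
# Minimal sketch of unsupervised topic grouping: no labels, only word-usage patterns.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

pages = [
    "flight deals airline tickets airport lounge reviews",
    "airline loyalty miles and airport security tips",
    "chocolate cake recipe with frosting and sprinkles",
    "easy cookie recipe for baking with kids",
]

# The algorithm never sees a label; it only sees similarity of word usage.
X = TfidfVectorizer().fit_transform(pages)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(clusters)  # e.g. [0 0 1 1] -- it groups similar pages but can't name the topics
```

It separates the travel-ish pages from the recipe pages without ever being told what a topic is; putting names on the clusters is still up to a human, which is exactly the limitation the video describes.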