18.3 Use VAR for multivariate time series - Video Tutorials & Practice Problems
Video duration: 8m
<v Voiceover>There are times</v> when you may have multiple time series that are all interdependent. Modeling those requires multivariate techniques, the most popular of which is VAR, which stands for vector autoregressive. Returning to the GDP data, we will model all the GDPs together. As a reminder, the GDP data looks like this. We have all this information in long format; however, to use VAR we need to put it into wide format, so we will load the reshape2 package. We then cast this into wide format, so we say gdpCast gets dcast, Year onto Country, because Year will be going down and all the countries will be put across. We just need to feed it a few of the variables, so we say data equals gdp and give it just Country, Year, and PerCapGDP. Then we say value.var equals PerCapGDP. What this does is put the year down the rows, make a new column for each of the countries, and store PerCapGDP in each of those columns. So, let's run this and take a look at our data. We now have it set up in nice wide format.

It's a nice data frame, but we want to use a special time series object because that will make our lives easier. So we say gdpTS gets ts, data equals gdpCast, and we leave out the first column; we don't need the year to come along because we are going to specify that right now. We say start equals the minimum of gdpCast$Year, and end equals the max of gdpCast$Year.

If we want to visualize this, we can just use base graphics to make life a little quicker. We do plot, gdpTS, and say plot.type equals single to put them all in the same pane, and we color code them using the colors one through eight. We get this not-so-pretty graph, but it was quick and easy. We can add a legend to it if we want by doing legend, top left, and saying legend equals colnames of gdpTS. ncol will be two so we'll break it up, and lty will be one so it's a solid line.
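The reshaping and time series conversion described earlier in this passage can be sketched as follows. The video's gdp data is not reproduced in the transcript, so this sketch builds a small hypothetical stand-in with made-up values and only three countries; the column names Country, Year, and PerCapGDP follow the narration.

```r
library(reshape2)

# Hypothetical stand-in for the long-format GDP data used in the video:
# one row per Country/Year pair, with made-up PerCapGDP values.
set.seed(1)
gdp <- expand.grid(Year = 1960:1969, Country = c("CA", "JP", "US"),
                   stringsAsFactors = FALSE)
gdp$PerCapGDP <- round(runif(nrow(gdp), 1000, 5000))

# Year runs down the rows; each country becomes its own column.
gdpCast <- dcast(Year ~ Country,
                 data = gdp[, c("Country", "Year", "PerCapGDP")],
                 value.var = "PerCapGDP")

# Drop the Year column and carry the years in the ts attributes instead.
gdpTS <- ts(data = gdpCast[, -1],
            start = min(gdpCast$Year), end = max(gdpCast$Year))
head(gdpTS)
```

Keeping the years in the ts attributes rather than in a column means later steps such as diff and plot handle the time index automatically.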
Colors will be one through eight, and cex, which is the multiplier for the font size, will be nine. We now have a legend that's way too big unless we do some formatting on it. That's the problem with base graphics: it can sometimes be hard to control, which is another reason why I prefer ggplot2 whenever possible.

Before we proceed, we need to see whether the data is stationary. As this plot showed, the data is definitely climbing; it gets bigger as it goes on, and you might be able to say that the variance is getting larger as well. That means the data is not stationary, so we might need to diff the data. Luckily the forecast package has a function that works this out automatically for us. We say numDiffs gets ndiffs, gdpTS, and if we look at that, it's suggesting we diff it once, so we will go ahead and do just that. We say gdpDiffed gets diff of gdpTS, with differences equals numDiffs, and we plot this. You can see that the data looks a lot more like noise and no longer has that upward trend. There still might be a variance issue: see how compact the data is here, and then how it gets all spread out further across? So it might not be stationary just yet, but it is looking better because we got rid of that linear trend.

Now we are ready to fit a vector autoregressive model. This is done using the vars package. The formula for a vector autoregressive model is this. Here we have taken the autoregressive components that we saw in the ARIMA model, the X_{t-1} through X_{t-p} terms, and turned them into matrices. The matrices consist of all the time series, and we still have a matrix of white noise, Z_t, like the one we saw in the MA component of the ARIMA model. This does get much more complicated to fit, but again there is a function to do it for us. So, let's go ahead and fit the model.
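The formula referred to above is shown on screen rather than spoken, so it does not appear in the transcript. A standard way to write a VAR of order p that is consistent with the description is sketched here; the symbol \Phi_i for the coefficient matrices is an assumption about the notation used.

```latex
\mathbf{X}_t = \Phi_1 \mathbf{X}_{t-1} + \Phi_2 \mathbf{X}_{t-2} + \cdots + \Phi_p \mathbf{X}_{t-p} + \mathbf{Z}_t
```

With n series, each \mathbf{X}_{t-i} is a vector holding all n series at lag i, each \Phi_i is an n-by-n matrix of coefficients, and \mathbf{Z}_t is a vector of white noise. Row k of the equation is the regression for series k on p lags of every series.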
We say gdpVar gets VAR of gdpDiffed, with lag.max equals 12, because this will fit a number of different models with lag 1, lag 2, lag 3, and it will keep trying, and we're telling it to stop trying at 12. We get a few warning messages about some NaNs being produced, but they're just warnings. We can now check how many lags it decided upon by looking at gdpVar$p. p is the number of AR lags. It shows six lags. That means the United States time series depends on six lags of itself, plus six lags of Japan, six lags of Great Britain, six lags of Israel, six lags of China, and so on. Note that Great Britain here is actually the United Kingdom; the United Kingdom's country code is GB, which can lead to some misnomers. The United Kingdom's time series in turn depends on six lags of itself, plus six lags of China, six lags of Canada, and so forth.

What actually happened here is that each country got its own model. Since we have seven countries, we actually have seven models, and we can see that by doing names of gdpVar$varresult. We see that there are seven different models, and if we inspect them we can see that each one is actually an lm model. For instance, the one for Canada is just an lm model, and each model has its own coefficients. We just showed a few of them, but here are some lags: there is a lag 1 on itself, a lag 1 on China, a lag 1 on Israel, and if we run this line again, but this time with 10, we can see it starts getting to the second lags.

Since these are just lm models, we can run coefplot on each of the individual countries. So we require coefplot and then run it on each of the countries, starting with gdpVar$varresult$Canada. We see it's really the constant term that's driving everything, even though it has a wide confidence interval. Let's look at Japan next. It's a similar result: a lot of barely significant coefficients, and the big constant term.
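The fitting and inspection steps described above can be sketched as follows. The GDP data is not available here, so this sketch uses three simulated random-walk series with made-up names (CA, JP, US) and a smaller lag.max of 4 to suit the shorter series; those choices are assumptions, not the video's values. The differencing is hard-coded to one, matching the suggestion ndiffs gave in the video.

```r
library(vars)

# Simulated stand-in for the GDP series: three made-up random walks,
# differenced once to remove the trend.
set.seed(42)
gdpLevels <- ts(apply(matrix(rnorm(80 * 3), ncol = 3), 2, cumsum),
                start = 1930)
colnames(gdpLevels) <- c("CA", "JP", "US")
gdpDiffed <- diff(gdpLevels, differences = 1)

# Try lags 1 through 4; VAR keeps the order chosen by its default
# information criterion (AIC).
gdpVar <- VAR(gdpDiffed, lag.max = 4)

gdpVar$p                             # number of lags selected
names(gdpVar$varresult)              # one fitted equation per series
class(gdpVar$varresult$CA)           # each equation is an ordinary lm fit
head(coef(gdpVar$varresult$CA), 6)   # first few coefficients
```

Each equation has one coefficient per series per lag, plus a constant, which is why the number of coefficients grows quickly as series or lags are added.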
Just like with the ARIMA model, we can use this to make predictions. We call predict with gdpVar, and we say n.ahead equals 5. For each country, this prints out, amongst other things, the prediction and the lower and upper bounds for that prediction. Remember, this is statistics; it's always important to have a confidence interval. It is not uncommon to have multiple time series that are correlated with each other, and in that case you need to use a multivariate method, such as vector autoregression, to make sure you get appropriate results.
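The prediction step can be sketched as follows, again on simulated data rather than the GDP series. The series names and lag.max of 4 are assumptions made for illustration.

```r
library(vars)

# Simulated stand-in for the differenced series (three made-up countries).
set.seed(42)
gdpDiffed <- ts(matrix(rnorm(80 * 3), ncol = 3), start = 1931)
colnames(gdpDiffed) <- c("CA", "JP", "US")
gdpVar <- VAR(gdpDiffed, lag.max = 4)

# Forecast five steps ahead; predict uses a 95% interval by default.
gdpPred <- predict(gdpVar, n.ahead = 5)

# One matrix per series, holding the forecast and its lower and upper bounds.
gdpPred$fcst$CA
```

Because the model was fit to differenced data, these forecasts are predicted changes, not levels; they would need to be cumulatively added back onto the last observed value to get forecasts on the original scale.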