Yahoo Labs has just released an interesting new data set useful for research on detecting anomalies (or outliers) in time series data. There are many contexts in which anomaly detection is important. For Yahoo, the main use case is in detecting unusual traffic on Yahoo servers. Continue reading →
I spend much of my day sitting in front of a screen, coding or writing. To limit the strain on my eyes, I use a dark theme as much as possible; that is, I write with light-colored text on a dark background. I don’t know why this is not the default in more software, as it makes a big difference after a few hours of writing.
Most of the time, I am writing using either Sublime Text, RStudio or TeXstudio. Each of them can be set to use a dark theme with syntax coloring to highlight structural features in the text.
Continue reading →
From today’s email:
I have just finished reading a copy of ‘Forecasting: Principles and Practice’ and I have found the book really interesting. I have particularly enjoyed the case studies and focus on practical applications.
After finishing the book I have joined a forecasting competition to put what I’ve learnt to the test. I do have a couple of queries about the forecasting outputs required. The output required is a quantile forecast; is this the same as prediction intervals? Is there any R function to produce quantiles from 0 to 99?
If you were able to point me in the right direction regarding the above it would be greatly appreciated.
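The link between the two ideas in the email can be made concrete: a central 80% prediction interval is bounded by the 10% and 90% quantiles of the forecast distribution. The following is a minimal sketch in Python rather than R, assuming a normal forecast distribution. The point forecast, the standard deviation and the helper name `forecast_quantiles` are all made up for illustration.

```python
from statistics import NormalDist

def forecast_quantiles(mean, sd, percents):
    """Quantiles of a normal forecast distribution (normality is an assumption here)."""
    dist = NormalDist(mu=mean, sigma=sd)
    # Key each quantile by its integer percentage, e.g. 10 -> the 10% quantile.
    return {k: dist.inv_cdf(k / 100) for k in percents}

# Hypothetical one-step forecast: point forecast 100, forecast standard deviation 10.
# Percentages run from 1 to 99 because a normal distribution has no finite 0% quantile.
quantiles = forecast_quantiles(100.0, 10.0, range(1, 100))

# The central 80% prediction interval is simply the 10% and 90% quantiles.
lower, upper = quantiles[10], quantiles[90]
print(f"80% prediction interval: [{lower:.2f}, {upper:.2f}]")
```

Under these assumptions, a set of quantile forecasts carries the same information as a nested family of prediction intervals: the 1% and 99% quantiles give the 98% interval, the 25% and 75% quantiles give the 50% interval, and so on.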
I occasionally get emails from people thinking they have found a bug in one of my R packages, and I usually have to reply asking them to provide a minimal reproducible example (MRE). This post is to provide instructions on how to create an MRE. Continue reading →
When modelling data with ARIMA models, it is sometimes useful to plot the inverse characteristic roots. The following functions will compute and plot the inverse roots for any fitted ARIMA model (including seasonal models). Continue reading →
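As a rough illustration of what an inverse characteristic root is, here is a minimal sketch in Python (not the R functions from the post). It handles only the AR(2) case, where the quadratic formula suffices; the coefficients and function names are hypothetical. For the AR polynomial 1 - phi1*z - phi2*z^2, substituting z = 1/w shows that the inverse roots w solve w^2 - phi1*w - phi2 = 0, and the process is stationary when every inverse root lies inside the unit circle.

```python
import cmath

def inverse_roots_ar2(phi1, phi2):
    """Inverse roots of the AR(2) polynomial 1 - phi1*z - phi2*z^2.

    Substituting z = 1/w shows they are the solutions of w^2 - phi1*w - phi2 = 0.
    """
    disc = cmath.sqrt(phi1 ** 2 + 4 * phi2)  # complex sqrt handles negative discriminants
    return ((phi1 + disc) / 2, (phi1 - disc) / 2)

def all_inside_unit_circle(roots):
    """True when every inverse root has modulus strictly less than 1."""
    return all(abs(r) < 1 for r in roots)

# Hypothetical coefficient pairs, chosen only for illustration.
for phi in [(0.5, 0.3), (1.0, -0.5), (0.5, 0.6)]:
    roots = inverse_roots_ar2(*phi)
    print(phi, [round(abs(r), 3) for r in roots], all_inside_unit_circle(roots))
```

Plotting these points against the unit circle gives the usual diagnostic picture; a general version would need a polynomial root finder for higher orders and for the seasonal factors.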
Rolling forecasts are commonly used to compare time series models. Here are a few of the ways they can be computed using R. I will use ARIMA models as a vehicle of illustration, but the code can easily be adapted to other univariate time series models. Continue reading →
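The general shape of a rolling forecast evaluation can be sketched as follows. This is a minimal Python illustration, not the R code from the post: it swaps in a naive "last value" forecaster in place of an ARIMA model, and the series, function names and `min_train` parameter are all hypothetical. Any function taking the training history and a horizon could be passed in instead.

```python
def naive_forecast(history, h):
    """Naive forecast: the last observed value, whatever the horizon h."""
    return history[-1]

def rolling_forecast_errors(series, min_train, h=1, forecaster=naive_forecast):
    """Errors of h-step forecasts made from an expanding training window.

    The forecast origin moves forward one observation at a time, starting
    once min_train observations are available.
    """
    errors = []
    for origin in range(min_train, len(series) - h + 1):
        train = series[:origin]            # everything observed before the origin
        actual = series[origin + h - 1]    # the value being forecast
        errors.append(actual - forecaster(train, h))
    return errors

# Made-up series, purely for illustration.
series = [10, 12, 13, 12, 15, 16, 18]
errors = rolling_forecast_errors(series, min_train=3)
mae = sum(abs(e) for e in errors) / len(errors)
print(errors, mae)
```

Comparing models then amounts to running the same loop with different forecasters and comparing a summary such as the mean absolute error.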
Last week my research group discussed Hal Varian’s interesting new paper on “Big data: new tricks for econometrics”, Journal of Economic Perspectives, 28(2): 3–28.
It’s a nice introduction to trees, bagging and forests, plus a very brief entrée to the LASSO and the elastic net, and to spike-and-slab regression. Not enough to be able to use them, but OK if you have no idea what they are. Continue reading →
With the latest version of the hts package for R, it is now possible to specify rather complicated grouping structures relatively easily.
All aggregation structures can be represented as hierarchies or as cross-products of hierarchies. For example, a hierarchical time series may be based on geography: country, state, region, store. Often there is also a separate product hierarchy: product groups, product types, packet size. Forecasts of all the different types of aggregation are required; e.g., product type A within region X. The aggregation structure is a cross-product of the two hierarchies.
This framework includes even apparently non-hierarchical data: consider the simple case of a time series of deaths split by sex and state. We can consider sex and state as two very simple hierarchies with only one level each. Then we wish to forecast the aggregates of all combinations of the two hierarchies.
Any number of separate hierarchies can be combined in this way. Non-hierarchical factors such as sex can be treated as single-level hierarchies. Continue reading →
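The deaths-by-sex-and-state case above can be sketched in a few lines. This is a minimal Python illustration of the cross-product idea, not the hts package interface: the numbers, state labels and function name are made up, and each "series" is a single value rather than a full time series to keep the example short.

```python
from itertools import product

def aggregate_all(bottom):
    """Totals for every combination of the grouping factors.

    `bottom` maps a tuple of factor levels, e.g. ("F", "NSW"), to a value.
    Each factor either keeps its level or collapses to "Total", so two
    single-level hierarchies give 2 x 2 kinds of aggregate.
    """
    totals = {}
    for key, value in bottom.items():
        for mask in product([False, True], repeat=len(key)):
            agg_key = tuple(
                "Total" if collapse else level
                for level, collapse in zip(key, mask)
            )
            totals[agg_key] = totals.get(agg_key, 0) + value
    return totals

# Hypothetical death counts by (sex, state) for one time period.
deaths = {("F", "NSW"): 10, ("M", "NSW"): 12, ("F", "VIC"): 7, ("M", "VIC"): 9}
totals = aggregate_all(deaths)
for key in sorted(totals):
    print(key, totals[key])
```

With two sexes and two states this yields nine series: the four bottom-level series, two sex totals, two state totals and the grand total. Adding a third factor only lengthens the key tuples.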
Some new websites are being established offering “market places” for data science. Two I’ve come across recently are Experfy and SnapAnalytx. Continue reading →
We have an exciting new initiative at Monash University with some new positions in business analytics. This is part of a plan to strengthen our research and teaching in the data science/computational statistics area. We are hoping to make multiple appointments, at junior and senior levels. These are five-year appointments, but we hope that the positions will continue after that if we can secure suitable funding. Continue reading →