For all those people asking me how to obtain a print version of my book “Forecasting: principles and practice” with George Athanasopoulos, you now can.

- Order on Amazon.com
- Order on Amazon.co.uk
- Order on Amazon.fr

The online book will continue to be freely available. The print version is intended to help fund the development of the OTexts platform, and to be reasonably priced. By way of comparison, my previous forecasting textbook sells for US$195, and González-Rivera's for US$182. No matter how good the books are, those prices are absurdly high. OTexts is intended to be a different kind of publisher: all our books are online and free, and those in print will be reasonably priced.

The online version will continue to be updated regularly, while the print version is a snapshot of the online version today. We will release a new print edition occasionally, no more than annually, and only when the online version has changed enough to warrant it. We are also planning an offline electronic version; I'll announce it here when it is ready.

## Posts Tagged ‘R’

## Job at Center for Open Science

This looks like an interesting job.

> Dear Dr. Hyndman,
>
> I write from the Center for Open Science, a non-profit organization based in Charlottesville, Virginia in the United States, which is dedicated to improving the alignment between scientific values and scientific practices. We are dedicated to open source and open science. We are reaching out to you to find out if you know anyone who might be interested in our Statistical and Methodological Consultant position. The position is a unique opportunity to consult on reproducible best practices in data analysis and research design; the consultant will make short visits to provide lectures and training at universities, laboratories, conferences, and through virtual mediums. An especially unique part of the job involves collaborating with the White House's Office of Science and Technology Policy on matters relating to reproducibility.
>
> If you know someone with substantial training and experience in scientific research, quantitative methods, reproducible research practices, and some programming experience (at least R, ideally Python or Julia), might you please pass this along to them? Anyone may find out more about the job or apply via our website: http://centerforopenscience.org/jobs/#stats
>
> The position is full-time and located at our office in beautiful Charlottesville, VA. Thanks in advance for your time.

## Cover of my forecasting textbook

We now have a cover for the print version of my forecasting book with George Athanasopoulos. It should be on Amazon in a couple of weeks. The book is also freely available online. The cover is a variation of the most popular design in the poll conducted a month or two ago. It was produced by Scarlett Rugers, whom I can happily recommend to anyone wanting a book cover designed.

## Fast computation of cross-validation in linear models

The leave-one-out cross-validation statistic is given by
$$\text{CV} = \frac{1}{N}\sum_{i=1}^N \left[y_i - \hat{y}_{(i)}\right]^2,$$
where $y_1,\dots,y_N$ are the observations and $\hat{y}_{(i)}$ is the predicted value obtained when the model is estimated with the $i$th case deleted. This is also sometimes known as the PRESS (Prediction Residual Sum of Squares) statistic. It turns out that for linear models, we do not actually have to estimate the model $N$ times, once for each omitted case. Instead, CV can be computed after estimating the model once on the complete data set, using
$$\text{CV} = \frac{1}{N}\sum_{i=1}^N \left[\frac{e_i}{1-h_i}\right]^2,$$
where $e_i$ is the $i$th residual from the model fitted to all the data and $h_i$ is the $i$th diagonal element of the hat matrix.
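In R, this shortcut can be computed directly from a fitted `lm` object using `hatvalues()`. A minimal sketch, using the built-in `cars` data purely for illustration:

```r
# Fast leave-one-out CV via the identity CV = mean((e_i / (1 - h_i))^2),
# where e_i are the ordinary residuals and h_i the hat-matrix diagonals.
fit <- lm(dist ~ speed, data = cars)
h <- hatvalues(fit)
cv_fast <- mean((residuals(fit) / (1 - h))^2)

# Check against the brute-force computation: refit the model N times,
# each time leaving one observation out.
n <- nrow(cars)
press <- numeric(n)
for (i in seq_len(n)) {
  fit_i <- lm(dist ~ speed, data = cars[-i, ])
  press[i] <- (cars$dist[i] - predict(fit_i, newdata = cars[i, ]))^2
}
cv_slow <- mean(press)

all.equal(cv_fast, cv_slow)  # TRUE
```

The fast version requires a single model fit, so it scales to large data sets where refitting $N$ times would be prohibitive.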

## Testing for trend in ARIMA models

Today’s email brought this one:

> I was wondering if I could get your opinion on a particular problem that I have run into during the reviewing process of an article. Basically, I have an analysis where I am looking at a couple of time series and I wanted to know if, over time, there was an upward trend in the series. Inspection of the raw data suggests there is, but we want some statistical evidence for this. To achieve this I ran some ARIMA(0,1,1) models including a drift/trend term to see if the mean of the series did indeed shift upwards with time, and found that it did.
>
> However, we have run into an issue with a reviewer who argues that differencing removes trends and may not be a suitable way to detect trends. Therefore, the fact that we found a trend despite differencing suggests that differencing was not successful. I know there are a few papers and textbooks that use ARIMA(0,1,1) models as ‘random walk with drift’-type models, so I cited them as examples of this procedure in action, but they remained unconvinced. Instead it was suggested that I look for trends in the raw undifferenced time series, as these would be more reliable as no trends had been removed. At the moment I am hesitant to do this.

## Unit root tests and ARIMA models

An email I received today:

> I have a small problem. I have a time series called `x`:
>
> - If I use the default values of `auto.arima(x)`, the best model is an ARIMA(1,0,0).
> - However, I tried `ndiffs(x, test="adf")` and `ndiffs(x, test="kpss")`, as the KPSS test seems to be the default, and the number of differences is 0 for the KPSS test (consistent with the results of `auto.arima()`) but 2 for the ADF test. I then tried `auto.arima(x, test="adf")` and now I have another model, ARIMA(1,2,1).
>
> I am unsure which order of integration I should use as the tests give fairly different results. Is there a test that prevails?
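Part of the explanation is that the two tests have opposite null hypotheses: KPSS takes stationarity as the null, while ADF takes a unit root as the null, so with a borderline series they can easily disagree. The reader's comparison is easy to reproduce; a sketch with a simulated, highly persistent but stationary AR(1) series standing in for `x`:

```r
library(forecast)

# A stationary AR(1) series with strong persistence: borderline cases
# like this are where unit root tests may disagree.
set.seed(42)
x <- arima.sim(model = list(ar = 0.9), n = 100)

d_kpss <- ndiffs(x, test = "kpss")  # KPSS: null is stationarity
d_adf  <- ndiffs(x, test = "adf")   # ADF: null is a unit root
d_pp   <- ndiffs(x, test = "pp")    # Phillips-Perron, a third opinion

# auto.arima() uses the KPSS test by default; the test argument
# changes which unit root test determines d.
fit_default <- auto.arima(x)
fit_adf <- auto.arima(x, test = "adf")
```

Comparing the selected orders of integration across tests (and checking the residuals of each fitted model) is usually more informative than hoping one test "prevails".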

## Using old versions of R packages

I received this email yesterday:

> I have been using your ‘forecast’ package for more than a year now. I was on R version 2.15 until last week, but I am having issues with the lubridate package, hence decided to update to R 3.0.1. In our organization, even getting an open source application requires us to go through a whole lot of approval processes. I asked for R 3.0.1, but before I got approval for 3.0.1, a new version of R (3.0.2) came out. Unfortunately for me, the forecast package was built in R 3.0.2. Is there any version of the forecast package that works in an older version of R (3.0.1)? I just don't want to go through this entire approval war again within the organization. Please help if you have any workaround for this.

This is unfortunately very common. Many corporate IT environments lock down computers to such an extent that it cripples the use of modern software like R, which is continuously updated. It also affects universities (which should know better), and I am constantly trying to invent workarounds to the constraints that Monash IT services place on staff and student computers. Here are a few thoughts that might help.
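One workaround is that every past source version of a CRAN package remains available in the CRAN archive and can be installed directly. A sketch (the version number here is illustrative; check the archive for a release built before your R version, and note that installing from source requires build tools):

```r
# Option 1: install a specific archived version from source.
# The exact version to pick depends on your R release.
url <- "https://cran.r-project.org/src/contrib/Archive/forecast/forecast_4.8.tar.gz"
install.packages(url, repos = NULL, type = "source")

# Option 2: let a helper package resolve the archive URL for you.
# install.packages("remotes")
remotes::install_version("forecast", version = "4.8")
```

Old versions will lack recent bug fixes and features, so this is a stopgap until the approval process catches up, not a long-term solution.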

## Forecasting weekly data

This is another situation where Fourier terms are useful for handling the seasonality. Not only is the seasonal period rather long, it is non-integer (averaging 365.25/7 ≈ 52.18). So ARIMA and ETS models do not tend to give good results, even with a period of 52 as an approximation.
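The idea is dynamic harmonic regression: K pairs of Fourier terms as regressors to capture the (non-integer) seasonal pattern, with ARIMA errors handling the rest. A sketch using the `forecast` package with a simulated weekly series; in practice K should be chosen to minimise the AICc, and K = 5 here is purely illustrative:

```r
library(forecast)

# Simulated weekly series: trend + annual cycle of period 52.18 + noise.
set.seed(123)
tt <- 1:300
y <- ts(10 + 0.02 * tt + 2 * sin(2 * pi * tt / 52.18) + rnorm(300),
        frequency = 52.18)

# Fourier terms handle the seasonality, so the ARIMA part is
# non-seasonal; repeat for K = 1, 2, ... and keep the lowest AICc.
K <- 5
fit <- auto.arima(y, xreg = fourier(y, K = K), seasonal = FALSE)

# Forecast two years (104 weeks) ahead, supplying future Fourier terms.
fc <- forecast(fit, xreg = fourier(y, K = K, h = 104))
```

Because the Fourier terms are deterministic functions of time, the non-integer period 52.18 poses no difficulty, unlike seasonal ARIMA or ETS, which require an integer period.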

## Fitting models to short time series

Following my post on fitting models to long time series, I thought I'd tackle the opposite problem, which is more common in business environments. I am often asked how few data points can be used to fit a time series model. As with almost all sample size questions, there is no easy answer: it depends on the number of model parameters to be estimated and the amount of randomness in the data, and the required sample size increases with both.

## More time series data online

Earlier this week I had coffee with Ben Fulcher, who told me about his online collection comprising about 30,000 time series, including medical series (such as ECG measurements), meteorological series, birdsong, and so on. There are some finance series, but little other data from a business or economic context, although he does include my Time Series Data Library. In addition, he provides Matlab code to compute a large number of time series characteristics. Anyone wanting to test time series algorithms on a large collection of data should take a look. Unfortunately, there is no R code, and no R interface for downloading the data.