# Hyndsight

Thoughts on research, forecasting, statistics, and other distractions

### WOMBAT 2022

conferences

data science

Monash University

R

statistics

WOMBAT is back! The WOMBAT conferences are “Workshops Organized by the Monash Business Analytics Team”. The first one was held in 2016, and later editions took place in 2017 and 2019. The 2022 version will take place on 6-7 December.

### Migrating from Disqus to giscus

computing

I’ve long wanted to ditch Disqus as the commenting system on this blog, as it is bloated, adds a lot of extra and unnecessary links, and generally looks noisy. I’ve been using Disqus for more than 13 years, largely because it was the only available solution at the time I added comments. To make the Disqus interface a little cleaner, I disabled all the advertising and as much of the other noise as possible, but it still looked like something from MySpace (for those of you who remember the 20th century).

### Time series and forecasting workshop: 9-10 November 2022

forecasting

R

In most recent years, I’ve run a 2-3 day workshop, held in various locations around the world. The one this year will be in Canberra on 9-10 November, and will be taught jointly with Associate Professor Bahman Rostami-Tabar. Details are here.

### Monash time series forecasting repository

time series

R

forecasting

data science

The Monash time series forecasting repository is a comprehensive collection of time series data made available in a convenient form to encourage empirical forecast evaluations. The repository includes the data from many forecasting competitions including the M1, M3, M4, NN5, tourism, and KDD cup 2018, as well as many other data sets from diverse applications. The associated paper discusses the various data sets and their characteristics. Where a time series collection contains data with different observation frequencies, they are split into different data sets so that the series within each data set has the same frequency.

### Simulating from TBATS models

time series

R

forecasting

data science

I’ve had several requests for an R function to simulate future values from a TBATS model. We will eventually include TBATS in the `fable` package, and the facilities will be added there. But in the meantime, if you are using the `forecast` package and want to simulate from a fitted TBATS model, here is how to do it.

### Job advertisements

jobs

Employers often contact me asking how to find a good statistician, econometrician or forecaster for their organization. Students also ask me how to go about finding a job when they finish their degree. This post is for both groups, hopefully making it easier for them to pair up appropriately.

### Time series cross-validation using fable

time series

R

forecasting

data science

Time series cross-validation is handled in the `fable` package using the `stretch_tsibble()` function to generate the data folds. In this post I will give two examples of how to use it, one without covariates and one with covariates.
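
As a rough illustration of the no-covariate case, here is a minimal sketch. The toy series, the naive model and the window sizes are assumptions chosen for this example; they are not taken from the post.

```r
library(tsibble)
library(fable)
library(dplyr)

# Toy monthly series (simulated numbers, for illustration only)
set.seed(1)
toy <- tsibble(
  month = yearmonth("2015 Jan") + 0:47,
  value = 100 + cumsum(rnorm(48)),
  index = month
)

# Expanding-window folds: the first fold holds 24 observations and each
# later fold adds one more. A `.id` key identifies the folds.
folds <- stretch_tsibble(toy, .init = 24, .step = 1)

# Drop the final fold, whose one-step forecast has no observed value
# in `toy` to be compared against
folds <- filter(folds, .id < max(.id))

# Fit a naive model to every fold and forecast one step ahead
fits <- model(folds, naive = NAIVE(value))
fc <- forecast(fits, h = 1)

# Score each forecast against the value actually observed in `toy`
cv_accuracy <- accuracy(fc, toy)
cv_accuracy
```

The same pattern should carry over to other `fable` models by changing the model definition inside `model()`.
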
### Co-authorships for sale

research

publishing

journals

writing

This is an interesting development! How many papers are published by bogus authors, and what is the going price for a coauthorship? Needless to say, this is appalling and contrary to every academic integrity policy I’ve seen. See the Monash authorship policy for example.

### Terminology matters

time series

statistics

econometrics

I was reminded again this week that getting the right terminology is important. Some of my colleagues who work in machine learning wrote a paper entitled “Time series regression” which began with “This paper introduces Time Series Regression (TSR): a little-studied task …”. Statisticians and econometricians have done time series regression for many decades, so this beginning led to the paper being lampooned on Twitter.

### Excess deaths for 2020

time series

statistics

epidemiology

demography

The reported COVID19 deaths in each country are often undercounts due to different reporting practices, or people dying of COVID19 related causes without ever being tested. One way to explore the true mortality effect of the pandemic is to look at “excess deaths” — the difference between death rates this year and the same time in previous years.

### Why log ratios are useful for tracking COVID-19

time series

statistics

epidemiology

There have been some great data visualizations produced of COVID-19 case and deaths data, the best known of which is the graph from John Burn-Murdoch in the *Financial Times*. To my knowledge, it was first used by Matt Cowgill from the Grattan Institute, and has been widely copied. This is a great visualization and has helped introduce log-scale graphics to a wide audience.

### Electricity demand data in tsibble format

time series

statistics

R

tidyverts

The `tsibbledata` package contains the `vic_elec` data set, which holds half-hourly electricity demand for the state of Victoria, along with corresponding temperatures from the capital city, Melbourne. These data cover the period 2012-2014.

### ABS time series as tsibbles

time series

statistics

R

tidyverts

Australian data analysts will know how frustrating it is to work with time series data from the Australian Bureau of Statistics. They are stored as multiple ugly Excel files (each containing multiple sheets) with inconsistent formatting, embedded comments, meta data stored along with the actual data, dates stored in a painful Excel format, and so on.

### Non-Gaussian forecasting using fable

time series

graphics

statistics

R

tidyverts

forecasting

In my previous post about the new **fable** package, we saw how fable can produce forecast distributions, not just point forecasts. All my examples used Gaussian (normal) distributions, so in this post I want to show how non-Gaussian forecasting can be done.

### Tidy forecasting in R

time series

graphics

statistics

R

tidyverts

forecasting

The **fable** package for doing tidy forecasting in R is now on CRAN. Like **tsibble** and **feasts**, it is also part of the tidyverts family of packages for analysing, modelling and forecasting many related time series (stored as tsibbles).

### Feature-based time series analysis

time series

graphics

statistics

R

tidyverts

anomalies

data science

In my last post, I showed how the `feasts` package can be used to produce various time series graphics.

### Time series graphics using feasts

time series

graphics

statistics

R

tidyverts

This is the second post on the new tidyverts packages for tidy time series analysis. The previous post is here.

### Tidy time series data using tsibbles

time series

graphics

statistics

R

tidyverts

There is a new suite of packages for tidy time series analysis that integrates easily into the tidyverse way of working. We call these the `tidyverts` packages, and they are available at tidyverts.org. Much of the work on these packages has been done by Earo Wang and Mitchell O’Hara-Wild.

### Poll position: statistics and the Australian federal election

politics

statistics

R

One of the few people in Australia who did not write off a possible Coalition win at the recent federal election was Peter Ellis. We’ve invited him to come and give a talk about making sense of opinion polls and the Australian federal election on Friday this week at Monash University. Visitors are welcome. Here are the details.

### You are what you vote

politics

statistics

R

I’ve tried my hand at writing for the wider public with an **article for The Conversation** based on my paper with Di Cook and Jeremy Forbes on “Spatial modelling of the two-party preferred vote in Australian federal elections: 2001-2016”. With the next Australian election taking place tomorrow, we thought it was timely to put out a publicly accessible version of our analysis.

### Translations of “Forecasting: principles and practice”

forecasting

fpp

otexts

references

writing

There are now translations of my forecasting textbook (coauthored with George Athanasopoulos) into Chinese and Korean.

### Post-docs in wind and solar power forecasting

forecasting

energy

R

time series

We currently have two postdoc opportunities together with an industry partner in the field of wind and solar power forecasting (full time, Level B). They are suitable for recent PhD graduates who can start between now and June-July.

### Advice to PhD applicants

computing

mathematics

statistics

supervision

**For students who are interested in doing a PhD at Monash under my supervision.**

### forecast 8.5

forecasting

graphics

R

time series

The latest minor release of the forecast package has now been approved on CRAN and should be available in the next day or so.

### M4 Forecasting Conference

forecasting

R

reproducible research

time series

conferences

Following the highly successful M4 Forecasting Competition, there will be a conference held on 10-11 December at Tribeca Rooftop, New York, to discuss the results.

### MeDaScIn 2018

conferences

data science

seminars

forecasting

R

The annual Melbourne Data Science Initiative (or MeDaScIn, pronounced medicine) is on again next month (24-27 September) with lots of tutorials, and the annual datathon.

### Saving ts objects as csv files

time series

R

Occasionally R might not be the tool you want to use (hard to believe, but apparently that happens). Then you may need to export some data from R via a csv file. When the data is stored as a `ts` object, the time index can easily get lost. So I wrote a little function to make this easier, using the `tsibble` package to do almost all of the work in looking after the time index. (Thanks to Earo in the comments for greatly simplifying my original code.)
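
The function itself is in the post. As a minimal sketch of the general idea, something like the following keeps the time index as its own column. The name `ts_to_csv` and its arguments are hypothetical, and this version only handles univariate series.

```r
library(tsibble)

# Hypothetical helper (not the function from the post): write a
# univariate `ts` object to a csv file with the time index as a column.
ts_to_csv <- function(x, file) {
  tsb <- as_tsibble(x)           # gives columns `index` and `value`
  out <- data.frame(
    index = format(tsb$index),   # time index stored as text
    value = as.numeric(tsb$value)
  )
  write.csv(out, file, row.names = FALSE)
  invisible(out)
}

# Usage: export a built-in monthly series to a temporary file
path <- tempfile(fileext = ".csv")
ts_to_csv(AirPassengers, path)
head(read.csv(path))
```

Storing the index as text is a deliberate simplification here, so that the file can be read by any other tool without knowing about R date classes.
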
### A forecast ensemble benchmark

forecasting

time series

R

Forecasting benchmarks are very important when testing new forecasting methods, to see how well they perform against some simple alternatives. Every week I get sent papers proposing new forecasting methods that fail to do better than even the simplest benchmark. They are rejected without review.

### Forecasting in NYC: 25-27 June 2018

forecasting

time series

R

conferences

data science

In late June, I will be in New York to teach my 3-day workshop on **Forecasting using R**. Tickets are available at Eventbrite.

### Upcoming talks: May-July 2018

forecasting

time series

R

conferences

data science

First semester teaching is nearly finished, and that means conference season for me. Here are some talks I’m giving in the next two months. Click the links for more details.

### forecast v8.3 now on CRAN

forecasting

time series

R

The latest version of the forecast package for R is now on CRAN. This is the version used in the 2nd edition of my forecasting textbook with George Athanasopoulos. So readers should now be able to replicate all examples in the book using only CRAN packages.

### R package for M4 Forecasting Competition

forecasting

R

reproducible research

time series

The M4 forecasting competition is well under-way, and a few of my PhD students have been working on submissions.

### IJF Tao Hong Award 2018

prizes

forecasting

energy

IJF

Every two years, the *International Journal of Forecasting* awards a prize to the best paper on energy forecasting. The prize is generously funded by Professor Tao Hong. This year, we will award the prize to a paper published in the *IJF* during the period 2015-2016. The prize will be US$1000 plus an engraved plaque. The award committee is Rob J Hyndman, Pierre Pinson and James Mitchell.

### M4 Forecasting Competition update

forecasting

R

reproducible research

time series

The official guidelines for the M4 competition have now been published, and there have been several developments since my last post on this.

### Data Science for Managers: May 2018

conferences

data science

Monash University

For the last few years, I have been involved with running a 3-day short course on “Data Science for Managers”. We have run it twice each year since 2015, and it continues to prove very popular. We have some awesome presenters including Monash University professors Di Cook, Geoff Webb, and Kim Marriott, as well as several very experienced data scientists working in industry.

### Some new time series packages

R

reproducible research

time series

data science

data

anomalies

This week I have finished preliminary versions of two new R packages for time series analysis. The first (**tscompdata**) contains several large collections of time series that have been used in forecasting competitions; the second (**tsfeatures**) is designed to compute features from univariate time series data. For now, both are only on github. I will probably submit them to CRAN after they’ve been tested by a few more people.

### M4 Forecasting Competition: response from Spyros Makridakis

forecasting

R

reproducible research

time series

*Following my post on the M4 competition yesterday, Spyros Makridakis sent me these comments for posting here.*

### M4 Forecasting Competition

forecasting

R

reproducible research

time series

The “M” competitions organized by Spyros Makridakis have had an enormous influence on the field of forecasting. They focused attention on what models produced good forecasts, rather than on the mathematical properties of those models. For that, Spyros deserves congratulations for changing the landscape of forecasting research through this series of competitions.

### Come and work with me

Monash University

jobs

acems

R

data science

I have funding for a new post-doctoral research fellow, on a 2-year contract, to work with me and Professor Kate Smith-Miles on analysing large collections of time series data. We are particularly seeking someone with a PhD in computational statistics or statistical machine learning.

### Looking for a new research assistant

Monash University

jobs

R

data science

I’m currently looking for a new research assistant to help (primarily) with some modelling and R coding as part of a project on forecasting mobile phone sales. The position is likely to last for about 6–9 months, and will be casual.

### rOpenSci OzUnconference coming to Melbourne

conferences

data science

R

rOpenSci

reproducible research

For a second year running, there will be another **rOpenSci OzUnconference** in Australia. This one will be held in Melbourne, on 26-27 October 2017.

### Finding distinct rows of a tibble

R

statistics

computing

I’ve been using R or its predecessors for about 30 years, so I tend to know a lot about R, but I don’t necessarily know how to use modern R tools. Lately, I’ve been teaching my students the tidyverse approach to data analysis, which means that I need to unlearn some old approaches and to re-learn them using new tools. But old dogs and new tricks…

### Forecasting workshop in Perth

conferences

forecasting

fpp

otexts

R

teaching

On 26-28 September 2017, I will be running my 3-day workshop in Perth on “Forecasting: principles and practice” based on my book of the same name.

### IIF Tao Hong Award 2016

prizes

forecasting

energy

IJF

A generous donation from Professor Tao Hong has funded this new award for papers on energy forecasting published in the *International Journal of Forecasting*. The award for 2016 is for papers published within 2013–2014. Next year we will award a paper published in 2015–2016, and we will make the award every two years after that.

### Why I’m not celebrating the 2016 impact factors

ijf

journals

Once every year, the journal citation reports are released including journal impact factors. This year, the *International Journal of Forecasting* 2-year impact factor has increased to **2.642**, which is the highest it has been in the journal’s history, and puts the journal higher than such notable titles as *Journal of the American Statistical Association* and just below *Management Science*.

### Prediction intervals for NNETAR models

forecasting

R

time series

The `nnetar` function in the **forecast** package for R fits a neural network model to a time series with lagged values of the time series as inputs (and possibly some other exogenous inputs). So it is a nonlinear autoregressive model, and it is not possible to analytically derive prediction intervals. Therefore we use simulation.

### ISI Karl Pearson Prize for 2017

statistics

prizes

Recently I was privileged to sit on the committee that selects the winner of the Karl Pearson Prize. KP was, of course, an early mathematical statistician, famous for many commonly-used statistical methods and tools including histograms, the correlation coefficient, the method of moments, p-values, the chi-squared test and principal components analysis. He is also infamous for his highly racist views, support for eugenics, anti-semitism and for refusing a knighthood.

### Monthly seasonality

R

seasonality

time series

I regularly get asked why I don’t consider monthly seasonality in my models for daily or sub-daily time series. For example, this recent comment on my post on seasonal periods, or this comment on my post on daily data. The fact is, I’ve never seen a time series with monthly seasonality, although that does not mean it does not exist.

### Converting to blogdown

computing

R

This website has gone through several major updates over the years. It began in 1993 as some handcrafted html files, transitioned to Joomla and later to Wordpress. Then it slowly grew into a collection of ten connected Wordpress installations that became increasingly difficult to maintain, and rather slow.

### Software for honours students

LaTeX

Monash University

R

references

I spoke to our new crop of honours students this morning. Here are my slides, example files and links.

### Monash Rmarkdown templates on github

beamer

Monash University

reproducible research

research team

Rmarkdown templates for staff and students in my department are now available via the numbats package on github. This provides templates for

### Follow-up forecasting forum in Eindhoven

data science

forecasting

fpp

R

seminars

statistics

teaching

time series

Last October I gave a 3-day masterclass on “Forecasting with R” in Eindhoven, Netherlands. There is a follow-up event planned for Tuesday 18 April 2017. It is particularly designed for people who attended the 3-day class, but if anyone else wants to attend they would be welcome.

### IJF Best Paper Award 2014-2015

forecasting

IJF

journals

references

prizes

Every two years we award a prize for the best paper published in the *International Journal of Forecasting*. It is now time to identify the best paper published in the IJF during 2014 and 2015. There is always about 18 months delay after the publication period to allow time for reflection, citations, etc. The prize is US$1000 plus an engraved plaque. I will present the prize at the ISF in Cairns in late June.

### forecast 8.0

forecasting

graphics

R

time series

In what is now a roughly annual event, the forecast package has been updated on CRAN with a new version, this time 8.0.

### The Australian Macro Database

data science

econometrics

reproducible research

time series

AusMacroData is a new website that encourages and facilitates the use of quantitative, publicly available Australian macroeconomic data. The Australian Macro Database hosted at ausmacrodata.org provides a user-friendly front end for searching among over 40000 economic variables and is loosely based on similar international sites such as the Federal Reserve Economic Database (FRED).

### Forecasting practitioner talks at ISF 2017

conferences

data science

forecasting

ISF2017

The International Symposium on Forecasting is a little unusual for an academic conference in that it has always had a strong presence of forecasters working in business and industry as well as academic forecasters, mostly at universities. We value the combination and interaction as it helps the academics understand the sorts of problems facing forecasters in practice, and it helps practitioners stay abreast of new methods and developments coming out of forecasting research.

### IJF Tao Hong Award for the best paper in energy forecasting 2013-2014

energy

forecasting

IJF

prizes

Professor Tao Hong has generously funded a new prize for the best IJF paper on energy forecasting, to be awarded every two years. The first award will be for papers published in the International Journal of Forecasting during the period 2013-2014. The prize will be US$1000 plus an engraved plaque. The award committee is Rob J Hyndman, Pierre Pinson and James Mitchell.

### Q&A: predictive analytics

data science

forecasting

technology

A major news outlet interviewed me on predictive analytics. Here were my responses.

### Q&A time

consulting

data science

forecasting

R

Someone sent me some questions by email, and I decided to answer some of them here.

### Tourism forecasting competition data as an R package

forecasting

R

reproducible research

time series

The data used in the tourism forecasting competition, discussed in Athanasopoulos et al (2011), have been made available in the Tcomp package for R. The objects are of the same format as in the Mcomp package, which contains data from the M1 and M3 competitions.

### GEFCom2017: Hierarchical Probabilistic Load Forecasting

conferences

energy

forecasting

After the great success of the previous two energy forecasting competitions we have run (GEFCom2012 and GEFCom2014), we are holding another one, this time focused on hierarchical probabilistic load forecasting. Check out all the details over on Tao Hong’s blog.

### Come to Melbourne, even if not to Monash

data science

jobs

mathematics

Monash University

statistics

The University of Melbourne is advertising for a “Professor in Statistics (Data Science)”. Melbourne (the city) is fast becoming a vibrant centre for data science and applied statistics, with more than 4700 people signed up for the Data Science Meetup Group, a thriving start-up scene, the group at Monash Business School (including Di Cook and me), and the Monash Centre for Data Science (including Geoff Webb and Wray Buntine). Not to mention that Melbourne is a wonderful place to live, having won the “World’s most liveable city” award from the Economist for the last 6 years in a row.

### Forecast intervals for aggregates

forecasting

R

time series

A common problem is to forecast the aggregate of several time periods of data, using a model fitted to the disaggregated data. For example, you may have monthly data but wish to forecast the total for the next year. Or you may have weekly data, and want to forecast the total for the next four weeks.
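
One simulation-based way to approach this is sketched below; it is not necessarily the method described in the post. It assumes the `forecast` package, an automatically selected ETS model, and the built-in monthly `USAccDeaths` series, all chosen only for illustration.

```r
library(forecast)

set.seed(2024)
fit <- ets(USAccDeaths)   # model fitted to the disaggregated (monthly) data

h <- 12        # number of future periods to aggregate
nsim <- 1000   # number of simulated future sample paths

# Each simulated path is one possible future. Summing a path gives one
# draw from the distribution of the 12-month total.
totals <- replicate(nsim, sum(simulate(fit, nsim = h, future = TRUE)))

agg_mean <- mean(totals)                             # point forecast of the total
agg_interval <- quantile(totals, probs = c(0.1, 0.9)) # 80% prediction interval
agg_mean
agg_interval
```

Summing simulated paths accounts for the correlation between forecast errors at different horizons, which simply adding up the interval limits for each month would ignore.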

### R package forecast v7.2 now on CRAN

forecasting

graphics

R

seasonality

time series

I’ve pushed a minor update to the forecast package to CRAN. Some highlights are listed here.

### Sponsorship for the Cairns forecasting conference

conferences

forecasting

ISF2017

Regular readers will know that the International Symposium on Forecasting is coming to Australia in June 2017. This is the leading international forecasting conference, and one I’ve attended every year for the past 17 years.

### Rmarkdown template for a Monash working paper

LaTeX

Monash University

R

reproducible research

research team

writing

This is only directly relevant to my Monash students and colleagues, but the same idea might be useful for adapting to other institutions.

### “Forecasting with R” short course in Eindhoven

data science

forecasting

fpp

hts

R

seminars

statistics

teaching

time series

I will be giving my 3-day short-course/workshop on “Forecasting with R” in Eindhoven (Netherlands) from 19-21 October.

### Tourism time series repository

forecasting

IJF

reproducible research

time series

A few years ago, I wrote a paper with George Athanasopoulos and others about a tourism forecasting competition. We originally made the data available as an online supplement to the paper, but that has unfortunately since disappeared although the paper itself is still available.

### The latest IJF issue with GEFCom2014 results

energy

forecasting

IJF

journals

reproducible research

The latest issue of the *IJF* is a bumper issue with over 500 pages of forecasting insights.

### 2017 International Symposium on Energy Analytics

conferences

energy

forecasting

IJF

ISF2017

This will be a great conference, and it is in a great location: Cairns, Australia, right by the Great Barrier Reef. Even better, if you stay on you can attend the **International Symposium on Forecasting** which immediately follows the **International Symposium on Energy Analytics**.

### Melbourne Data Science Initiative 2016

conferences

data science

forecasting

R

seminars

In just over three weeks, the inaugural MeDaScIn event will take place. This is an initiative to grow the talent pool of local data scientists and to promote Melbourne as a world city of excellence in Data Science.

### Sample quantiles 20 years later

computing

R

statistics

Almost exactly 20 years ago I wrote a paper with Yanan Fan on how sample quantiles are computed in statistical software. It was cited 43 times in the first 10 years, and 457 times in the next 10 years, making it my third paper to receive 500+ citations.
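
As a small illustration (not part of the original post): base R's `quantile()` function has a `type` argument taking values 1 to 9, and its documentation ties these to the nine definitions discussed in that paper. The data below are made up purely to show that the definitions can disagree on a small sample.

```r
# Made-up sample of ten values, for illustration only
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 15, 20)

# Lower quartile under each of the nine definitions
q25 <- sapply(1:9, function(type) {
  quantile(x, probs = 0.25, type = type, names = FALSE)
})
names(q25) <- paste0("type", 1:9)
q25

# type = 7 is what quantile() uses when `type` is not given
default_q25 <- quantile(x, probs = 0.25, names = FALSE)
```

With large samples the definitions usually give very similar answers; the differences matter most when the sample is small.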

### Plotting overlapping prediction intervals

forecasting

graphics

R

I often see figures with two sets of prediction intervals plotted on the same graph using different line types to distinguish them. The results are almost always unreadable. A better way to do this is to use semi-transparent shaded regions. Here is an example showing two sets of forecasts for the Nile River flow.
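
A minimal base R sketch of the shading idea follows. The two ARIMA models and the approximate 95% intervals are assumptions made for this illustration and are not the forecasts shown in the post.

```r
h <- 15
fit1 <- arima(Nile, order = c(1, 0, 0))   # AR(1) with a mean
fit2 <- arima(Nile, order = c(0, 1, 1))   # ARIMA(0,1,1)

# Approximate 95% prediction intervals, assuming normal errors
interval <- function(fit, h) {
  p <- predict(fit, n.ahead = h)
  list(
    time  = as.numeric(time(p$pred)),
    lower = as.numeric(p$pred - 1.96 * p$se),
    upper = as.numeric(p$pred + 1.96 * p$se)
  )
}
pi1 <- interval(fit1, h)
pi2 <- interval(fit2, h)

# Plot the data, leaving room for the forecast period
plot(Nile,
     xlim = c(start(Nile)[1], max(pi1$time)),
     ylim = range(Nile, pi1$lower, pi1$upper, pi2$lower, pi2$upper),
     ylab = "Annual flow")

# Semi-transparent polygons let the overlap between intervals show through
shade <- function(pi, col) {
  polygon(c(pi$time, rev(pi$time)), c(pi$lower, rev(pi$upper)),
          col = adjustcolor(col, alpha.f = 0.3), border = NA)
}
shade(pi1, "blue")
shade(pi2, "red")
legend("topright", legend = c("AR(1)", "ARIMA(0,1,1)"),
       fill = adjustcolor(c("blue", "red"), alpha.f = 0.3), border = NA)
```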

### rOpenSci unconference in Brisbane, 21-22 April 2016

conferences

data science

R

rOpenSci

reproducible research

The first rOpenSci unconference in Australia will be held on Thursday and Friday (April 21-22) in Brisbane, at the Microsoft Innovation Centre.

### Making data analysis easier: Hadley Wickham at WOMBAT2016

conferences

data science

R

seminars

statistics

Slides for Hadley’s talk

### Electricity price forecasting competition

conferences

data science

energy

forecasting

The GEFCom competitions have been a great success in generating good research on forecasting methods for electricity demand, and in enabling a comprehensive comparative evaluation of various methods. But they have only considered price forecasting in a simplified setting. So I’m happy to see this challenge is being taken up as part of the European Energy Market Conference for 2016, to be held from 6-9 June at the University of Porto in Portugal.

### What’s your Hall number?

obituary

productivity

statistics

Today I attended the funeral of Peter Hall, one of the finest mathematical statisticians ever to walk the earth and easily the best from Australia. One of the most remarkable things about Peter was his astonishing productivity, with over 600 papers. As I sat in the audience I realised that many of the people there were probably coauthors of papers with Peter, and I wondered how many statisticians in the world would have been his coauthors or second-degree co-authors.

### ACEMS Business Analytics Prize 2016

data science

Monash University

acems

phd

prizes

statistics

We have established a new annual prize for research students at Monash University in the general area of business analytics, funded by the Australian Centre of Excellence in Mathematical and Statistical Frontiers (ACEMS). The rules of the award are listed below.

### Farewell Peter Hall (1951-2016)

obituary

statistics

Peter Hall passed away on Saturday after a long battle with illness over the last couple of years. No statistician will need reminding of Peter’s extensive contributions to the field. He had over 500 published papers, and had won every major award available, many of them listed on his Wikipedia page.

### Starting a career in data science

computing

data science

jobs

R

statistics

I received this email from one of my undergraduate students:

### Making data analysis easier

conferences

data science

Monash University

R

statistics

Di Cook and I are organizing a workshop on “**Making data analysis easier**” for 18-19 February 2016.

### Who’s downloading the forecast package?

computing

forecasting

fpp

otexts

R

The github page for the forecast package currently shows the following information

### The hidden benefits of open-source software

computing

consulting

data science

forecasting

jobs

R

research team

statistics

I’ve been having discussions with colleagues and university administration about the best way for universities to manage home-grown software.

### ODI looking for young postgrad statisticians

econometrics

jobs

statistics

The Overseas Development Institute Fellowship Scheme sends young postgraduate statisticians (and economists) to work in the public sectors of developing countries in Africa, the Caribbean and the Pacific on two-year contracts. This is a great way to develop skills and gain experience working within a developing country’s government. And you get to live in a fascinating place!

### Stanford seminar

forecasting

hts

R

seminars

time series

I gave a seminar at Stanford today. Slides are below. It was definitely the most intimidating audience I’ve faced, with Jerome Friedman, Trevor Hastie, Brad Efron, Persi Diaconis, Susan Holmes, David Donoho and John Chambers all present (and probably other famous names I’ve missed).

### Reproducibility in computational research

data science

LaTeX

R

reproducible research

statistics

Jane Frazier spoke at our research team meeting today on “Reproducibility in computational research”. We had a very stimulating and lively discussion about the issues involved. One interesting idea was that reproducibility is on a scale, and we can all aim to move further along the scale towards making our own research more reproducible. For example

### Chinese R conference

conferences

data science

forecasting

R

seminars

time series

I will be speaking at the Chinese R conference in Nanchang, to be held on 24-25 October, on **“Forecasting Big Time Series Data using R”.**

### Upcoming talks in California

computing

data science

forecasting

hts

R

seminars

statistics

time series

I’m back in California for the next couple of weeks, and will give the following talk at Stanford and UC-Davis.

### IJF vol 31(4): Forecasting in telecommunications and ICT

forecasting

IJF

journals

The last issue of the *International Journal of Forecasting* for 2015 has been released. This one contains the usual mix of topics, plus a special section on **Forecasting in telecommunications and ICT** including a nice review article by Nigel Meade and Towhidul Islam. Enjoy!

### Advice to other journal editors

journals

refereeing

I get asked to review journal papers almost every day, and I have to say no to almost all of them. I know it is hard to find reviewers, but many of these requests indicate very lazy editors. So to all the editors out there looking for reviewers, here is some advice.

### Mathematical annotations on R plots

computing

graphics

LaTeX

R

I’ve always struggled with using `plotmath` via the `expression` function in R for adding mathematical notation to axes or legends. For some reason, the most obvious way to write something never seems to work for me and I end up using trial and error in a loop with far too many iterations.
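
For readers who have not met `plotmath`, here is a small base R sketch of the kind of annotation involved. The labels and the plotted curve are arbitrary choices for illustration.

```r
x <- seq(-3, 3, length.out = 200)

# Build the labels first so they can be inspected or reused
lab_x <- expression(x[i])      # subscript
lab_y <- expression(phi(x))    # Greek letter used as a function name
lab_main <- expression(
  paste("Normal density with ", mu == 0, " and ", sigma^2 == 1)
)

plot(x, dnorm(x), type = "l", xlab = lab_x, ylab = lab_y, main = lab_main)

# bquote() substitutes the value of an R variable into the annotation
n <- length(x)
mtext(bquote(n == .(n)), side = 3, line = 0.2)
```

Mixing plain text with symbols needs `paste()` inside `expression()`, and inserting computed values needs `bquote()`; both are easy places to go wrong.
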
### The bias-variance decomposition

data science

statistics

teaching

This week, I am teaching my Business Analytics class about the bias-variance trade-off. For some reason, the proof is not contained in either ESL or ISL, even though it is quite simple. I also discovered that the proof currently provided on Wikipedia makes little sense in places.
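
For reference, the standard argument can be sketched as follows. The setup is an assumption stated here rather than taken from the post: $y = f(x) + \varepsilon$ with $\mathrm{E}[\varepsilon] = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$, and $\hat{f}(x)$ is estimated from training data that are independent of $\varepsilon$.

$$
\begin{aligned}
\mathrm{E}\big[(y - \hat{f}(x))^2\big]
&= \mathrm{E}\Big[\big(\varepsilon + \{f(x) - \mathrm{E}[\hat{f}(x)]\} + \{\mathrm{E}[\hat{f}(x)] - \hat{f}(x)\}\big)^2\Big] \\
&= \sigma^2 + \big(f(x) - \mathrm{E}[\hat{f}(x)]\big)^2 + \mathrm{E}\Big[\big(\hat{f}(x) - \mathrm{E}[\hat{f}(x)]\big)^2\Big] \\
&= \sigma^2 + \mathrm{Bias}\big[\hat{f}(x)\big]^2 + \mathrm{Var}\big[\hat{f}(x)\big]
\end{aligned}
$$

The three cross terms vanish in the second line: $\varepsilon$ has mean zero and is independent of $\hat{f}(x)$, the term $f(x) - \mathrm{E}[\hat{f}(x)]$ is a constant, and $\mathrm{E}\big[\mathrm{E}[\hat{f}(x)] - \hat{f}(x)\big] = 0$.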

### Murphy diagrams in R

forecasting

graphics

R

statistics

time series

At the recent *International Symposium on Forecasting*, held in Riverside, California, Tilmann Gneiting gave a great talk on **“Evaluating forecasts: why proper scoring rules and consistent scoring functions matter”**. It will be the subject of an IJF invited paper in due course.

### Useful tutorials

computing

data science

productivity

R

reproducible research

There are some tools that I use regularly, and I would like my research students and post-docs to learn them too. Here are some great online tutorials that might help.

### My Yahoo talk is now online

computing

data science

forecasting

R

seminars

time series

video

Last week I gave a talk in the Yahoo! Big Thinkers series. The video of the talk is now online and embedded below.

### IJF best paper awards

conferences

econometrics

forecasting

IJF

journals

references

time series

prizes

Today at the International Symposium on Forecasting, I announced the awards for the best paper published in the International Journal of Forecasting in the period 2012-2013.

### North American seminars: June 2015

conferences

data science

energy

forecasting

R

seminars

statistics

time series

For the next few weeks I am travelling in North America and will be giving the following talks.

### R vs Autobox vs ForecastPro vs …

computing

forecasting

IJF

R

reproducible research

time series

Every now and then a commercial software vendor makes claims on social media about how their software is so much better than the forecast package for R, but no details are provided.

### A new R package for detecting unusual time series

computing

data science

R

time series

The anomalous package provides some tools to detect unusual time series in a large collection of time series. This is joint work with Earo Wang (an honours student at Monash) and Nikolay Laptev (from Yahoo Labs). Yahoo is interested in detecting unusual patterns in server metrics.

### New in forecast 6.0

computing

forecasting

R

statistics

time series

This week I uploaded a new version of the forecast package to CRAN. As there were a lot of changes, I decided to increase the version number to 6.0.

### Nominations for IJF Best Paper 2012-2013

forecasting

IJF

journals

references

The following papers have been nominated for the best paper published in the *International Journal of Forecasting* in 2012-2013. I have included an excerpt from the nomination in each case. The papers in bold have been short-listed for the award, and the editorial board are currently voting on them.

### Thinking big at Yahoo

data science

forecasting

seminars

time series

I’m speaking in the “Yahoo Labs Big Thinkers” series on **Friday 26 June**. I hope I can live up to the title!

### Paperpile makes me more productive

organization

phd

productivity

references

research team

technology

One of the first things I tell my new research students is to use a reference management system to help them keep track of the papers they read, and to assist in creating bib files for their bibliography. Most of them use Mendeley, one or two use Zotero. Both do a good job and both are free.

### A new open source data set for detecting time series outliers

computing

data science

R

reproducible research

time series

Yahoo Labs has just released an interesting new data set useful for research on detecting anomalies (or outliers) in time series data. There are many contexts in which anomaly detection is important. For Yahoo, the main use case is in detecting unusual traffic on Yahoo servers.

### Dark themes for writing

computing

LaTeX

R

writing

I spend much of my day sitting in front of a screen, coding or writing. To limit the strain on my eyes, I use a dark theme as much as possible. That is, I write with light colored text on a dark background. I don’t know why this is not the default in more software as it makes a big difference after a few hours of writing.

### Statistical modelling and analysis of big data

conferences

data science

forecasting

graphics

seminars

statistics

I’m currently attending the one day workshop on this topic at QUT in Brisbane. This morning I spoke on “Visualizing and forecasting big time series data”. My slides are here.

### Standard error: a poem

poetry

statistics

writing

This poem was written by David Goddard from the Monash University Department of Epidemiology and Preventive Medicine. It is reproduced here with his permission. The poem won the inaugural Monash University poetry competition and will soon be published in an anthology of contemporary poetry.

### IASC Data Analysis Competition 2015

conferences

data science

statistics

The International Association for Statistical Computing (IASC) is holding a Data Analysis Competition. Winners will be invited to present their work at the Joint Meeting of IASC-ABE Satellite Conference for the 60th ISI WSC 2015 to be held at Atlântico Búzios Convention & Resort in Búzios, RJ, Brazil (August 2-4, 2015). They will also be invited to submit a manuscript for possible publication (following peer review) to IASC’s official journal, *Computational Statistics & Data Analysis*.

### RSS feeds for statistics and related journals

data science

forecasting

journals

R

references

statistics

I’ve now resurrected the collection of research journals that I follow, and set it up as a shared collection in feedly. So anyone can easily subscribe to all of the same journals, or select a subset of them, to follow on feedly.

### Seminars in Taiwan

data science

forecasting

graphics

hts

kaggle

R

seminars

statistics

I’m currently visiting Taiwan and I’m giving two seminars while I’m here — one at the National Tsing Hua University in Hsinchu, and the other at Academia Sinica in Taipei. Details are below for those who might be nearby.

### Di Cook is moving to Monash

data science

graphics

Monash University

R

research team

I’m delighted that Professor Dianne Cook will be joining Monash University in July 2015 as a Professor of Business Analytics. Di is an Australian who has worked in the US for the past 25 years, mostly at Iowa State University. She is moving back to Australia and joining the Department of Econometrics and Business Statistics in the Monash Business School, as part of our initiative in Business Analytics.

### New R package for electricity forecasting

consulting

energy

forecasting

R

Shu Fan and I have developed a model for electricity demand forecasting that is now widely used in Australia for long-term forecasting of peak electricity demand. It has become known as the “Monash Electricity Forecasting Model”. We have decided to release an R package that implements our model so that other people can easily use it. The package is called “MEFM” and is available on GitHub. We will probably also put it on CRAN eventually.

### Honoring Herman Stekler

forecasting

IJF

journals

The first issue of the *IJF* for 2015 has just been published, and I’m delighted that it includes a special section honoring Herman Stekler. It includes articles covering a range of his forecasting interests, although not all of them (sports forecasting is missing). Herman himself wrote a paper for it looking at “Forecasting—Yesterday, Today and Tomorrow”.

### Prediction competitions

data science

forecasting

IJF

kaggle

statistics

Competitions have a long history in forecasting and prediction, and have been instrumental in forcing research attention on methods that work well in practice. In the forecasting community, the M competition and M3 competition have been particularly influential. The data mining community have the annual KDD cup which has generated attention on a wide range of prediction problems and associated methods. Recent KDD cups are hosted on kaggle.

### New Australian data on the HMD

demography

R

reproducible research

The Human Mortality Database is a wonderful resource for anyone interested in demographic data. It is a carefully curated collection of high quality deaths and population data from 37 countries, all in a consistent format with consistent definitions. I have used it many times and never cease to be amazed at the care taken to maintain such a great resource.

### Visualization of probabilistic forecasts

forecasting

graphics

R

statistics

This week my research group discussed Adrian Raftery’s recent paper on “Use and Communication of Probabilistic Forecasts” which provides a fascinating but brief survey of some of his work on modelling and communicating uncertain futures. Coincidentally, today I was also sent a copy of David Spiegelhalter’s paper on “Visualizing Uncertainty About the Future”. Both are well-worth reading.

### IJF review papers

forecasting

IJF

journals

references

Review papers are extremely useful for new researchers such as PhD students, or when you want to learn about a new research field. The *International Journal of Forecasting* produced a whole review issue in 2006, and it contains some of the most highly cited papers we have ever published. Now, beginning with the latest issue of the journal, we have started publishing occasional review articles on selected areas of forecasting. The first two articles are:

### Seasonal periods

forecasting

R

seasonality

statistics

I get questions about this almost every week. Here is an example from a recent comment on this blog:

### ABS seasonal adjustment update

jobs

seasonality

statistics

Since my last post on the seasonal adjustment problems at the Australian Bureau of Statistics, I’ve been working closely with people within the ABS to help them resolve the problems in time for tomorrow’s release of the October unemployment figures.

### Jobs at Amazon

data science

econometrics

forecasting

jobs

R

statistics

I do not normally post job adverts, but this was very specifically targeted to “applied time series candidates” so I thought it might be of sufficient interest to readers of this blog.

### Prediction intervals too narrow

forecasting

R

statistics

Almost all prediction intervals from time series models are too narrow. This is a well-known phenomenon and arises because they do not account for all sources of uncertainty. In my 2002 IJF paper, we measured the size of the problem by computing the actual coverage percentage of the prediction intervals on hold-out samples. We found that for ETS models, nominal 95% intervals may only provide coverage between 71% and 87%. The difference is due to missing sources of uncertainty.
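As a minimal sketch of how such a coverage figure is computed, the Python snippet below counts how often hold-out observations fall inside their nominal intervals. The numbers are made up for illustration (they are not from the 2002 paper), and `empirical_coverage` is a hypothetical helper name.

```python
def empirical_coverage(actuals, lower, upper):
    """Percentage of hold-out observations inside their prediction intervals."""
    if not (len(actuals) == len(lower) == len(upper)):
        raise ValueError("all inputs must have the same length")
    inside = sum(lo <= y <= hi for y, lo, hi in zip(actuals, lower, upper))
    return 100 * inside / len(actuals)


# Illustrative values only: 10 hold-out observations with nominal 95% limits.
actuals = [12.1, 14.8, 10.7, 11.2, 13.9, 13.3, 11.4, 16.5, 12.9, 11.8]
lower = [10.0, 10.5, 10.2, 10.0, 10.8, 11.0, 10.9, 11.2, 11.0, 10.7]
upper = [14.0, 14.5, 14.2, 14.0, 14.8, 15.0, 14.9, 15.2, 15.0, 14.7]

coverage = empirical_coverage(actuals, lower, upper)
print(f"Empirical coverage: {coverage:.0f}% against a nominal 95%")
```

If the empirical figure sits well below the nominal level across many series, the intervals are too narrow in the sense described above.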

### hts with regressors

forecasting

hts

R

statistics

The hts package for R allows for forecasting hierarchical and grouped time series data. The idea is to generate forecasts for all series at all levels of aggregation without imposing the aggregation constraints, and then to reconcile the forecasts so they satisfy the aggregation constraints. (An introduction to reconciling hierarchical and grouped time series is available in this Foresight paper.)

### Congratulations to Dr Souhaib Ben Taieb

forecasting

research team

Souhaib Ben Taieb has been awarded his doctorate at the Université libre de Bruxelles and so he is now officially Dr Ben Taieb! Although Souhaib lives in Brussels, and was a student at the Université libre de Bruxelles, I co-supervised his doctorate (along with Professor Gianluca Bontempi). Souhaib is the 19th PhD student of mine to graduate.

### Explaining the ABS unemployment fluctuations

jobs

seasonality

statistics

Although the *Guardian* claimed yesterday that I had explained “what went wrong” in the July and August unemployment figures, I made no attempt to do so as I had no information about the problems. Instead, I just explained a little about the purpose of seasonal adjustment.

### Connect with local employers

data science

jobs

seminars

statistics

I keep telling students that there are lots of jobs in data science (including statistics), and they often tell me they can’t find them advertised. As usual, you do have to do some networking, and one of the best ways of doing it is via a Data Science Meetup. Many cities now have them including Melbourne, Sydney, London, etc. It is the perfect opportunity to meet with local employers, many of which are hiring due to the huge expansion in the use of data analysis in business (aka business analytics).

### IIF Sponsored Workshops

conferences

forecasting

IJF

The International Institute of Forecasters sponsors workshops every year, each of which focuses on a specific theme. The purpose of these workshops is to facilitate small, informal meetings where experts in a particular field of forecasting can discuss forecasting problems, research, and solutions. Over the years, our workshops have covered topics including *Predicting Rare Events*, *ICT Forecasting*, and, most recently, *Singular Spectrum Analysis*. Often these workshops are associated with a special issue of the *International Journal of Forecasting*.

### TBATS with regressors

forecasting

R

statistics

I’ve received a few emails about including regression variables (i.e., covariates) in TBATS models. As TBATS models are related to ETS models, `tbats()` is unlikely to ever include covariates as explained here. It won’t actually complain if you include an `xreg` argument, but it will ignore it.

### FPP now available as a downloadable e-book

forecasting

fpp

otexts

R

statistics

My forecasting textbook with George Athanasopoulos is already available online (for free), and in print via Amazon (for under $40). Now we have made it available as a downloadable e-book via Google Books (for $15.55). The Google Books version is identical to the print version on Amazon (apart from a few typos that have been fixed).

### Tim Harford on forecasting

forecasting

A few weeks ago I had a Skype chat with Tim Harford, the “Undercover Economist” for Britain’s *Financial Times*. He was working on an article for the FT on forecasting, and wanted my perspective as an academic forecaster. I mostly talked about what makes some things more predictable than others, as discussed in this blog post. In the end, his article headed in a different direction, so I don’t get quoted, but it is still a good read!

### Resources for the FPP book

forecasting

fpp

otexts

R

statistics

teaching

The FPP resources page has recently been updated with several new additions including

### Forecasting with R in WA

conferences

forecasting

fpp

otexts

R

teaching

On 23-25 September, I will be running a 3-day workshop in Perth on “Forecasting: principles and practice” mostly based on my book of the same name.

### biblatex for statisticians

LaTeX

references

writing

I am now using biblatex for all my bibliographic work as it seems to have developed enough to be stable and reliable. The big advantage of biblatex is that it is easy to format the bibliography to conform to specific journal or publisher styles. It is also possible to have structured bibliographies (e.g., divided into sections: books, papers, R packages, etc.).

### GEFCom 2014 energy forecasting competition is underway

data science

energy

forecasting

IJF

kaggle

R

statistics

GEFCom 2014 is the most advanced energy forecasting competition ever organized, both in terms of the data involved, and in terms of the way the forecasts will be evaluated.

### Visit of Di Cook

data science

graphics

Monash University

R

research team

seminars

statistics

Next week, Professor Di Cook from Iowa State University is visiting my research group at Monash University. Di is a world leader in data visualization, and is especially well-known for her work on interactive graphics and the XGobi and GGobi software. See her book with Deb Swayne for details.

### Student forecasting awards from the IIF

forecasting

grants

prizes

IJF

Monash University

teaching

At the IIF annual board meeting last month in Rotterdam, I suggested that we provide awards to the top students studying forecasting at university level around the world, to the tune of $100 plus IIF membership for a year. I’m delighted that the idea met with enthusiasm, and that the awards are now available. Even better, my second year forecasting subject has been approved for an award.

### Coherent population forecasting using R

demography

forecasting

R

statistics

This is an example of how to use the demography package in R for stochastic population forecasting with coherent components. It is based on the papers by Hyndman and Booth (IJF 2008) and Hyndman, Booth and Yasmeen (Demography 2013). I will use Australian data from 1950 to 2009 and forecast the next 50 years.

### Plotting the characteristic roots for ARIMA models

computing

forecasting

R

statistics

When modelling data with ARIMA models, it is sometimes useful to plot the inverse characteristic roots. The following functions will compute and plot the inverse roots for any fitted ARIMA model (including seasonal models).
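To illustrate what the inverse roots tell you, here is a minimal Python sketch restricted to a non-seasonal AR(2) polynomial $1 - \phi_1 z - \phi_2 z^2$, with hypothetical coefficient values. It is not the R code from the post and does not handle general or seasonal ARIMA models. The inverse roots solve $\lambda^2 - \phi_1\lambda - \phi_2 = 0$, and the AR part is stationary when they all lie inside the unit circle.

```python
import cmath


def ar2_inverse_roots(phi1, phi2):
    """Inverse roots of 1 - phi1*z - phi2*z^2.

    Substituting lambda = 1/z gives lambda^2 - phi1*lambda - phi2 = 0,
    which is solved here with the quadratic formula.
    """
    disc = cmath.sqrt(phi1 ** 2 + 4 * phi2)
    return (phi1 + disc) / 2, (phi1 - disc) / 2


# Hypothetical AR(2) coefficients, chosen only for illustration.
roots = ar2_inverse_roots(0.5, 0.3)
is_stationary = all(abs(r) < 1 for r in roots)

for r in roots:
    print(f"inverse root {r:.4f}, modulus {abs(r):.4f}")
print("stationary:", is_stationary)
```

Plotting these points against the unit circle gives the usual diagnostic picture: any inverse root on or outside the circle signals a problem with the fitted model.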

### I am not an econometrician

econometrics

Monash University

statistics

I am a statistician, but I have worked in a department of predominantly econometricians for the past 17 years. It is a little like an Australian visiting the United States. Initially, it seems that we talk the same language, do the same sorts of things, and have a very similar culture. But the longer you stay there, the more you realise there are differences that run deep and affect the way you see the world.

### Variations on rolling forecasts

computing

forecasting

R

statistics

Rolling forecasts are commonly used to compare time series models. Here are a few of the ways they can be computed using R. I will use ARIMA models as a vehicle of illustration, but the code can easily be adapted to other univariate time series models.

### Varian on big data

computing

econometrics

forecasting

R

references

research team

statistics

Last week my research group discussed Hal Varian’s interesting new paper on “Big data: new tricks for econometrics”, *Journal of Economic Perspectives*, **28**(2): 3-28.

### Specifying complicated groups of time series in hts

computing

forecasting

hts

R

StackExchange

statistics

time series

With the latest version of the hts package for R, it is now possible to specify rather complicated grouping structures relatively easily.

### European talks. June-July 2014

conferences

demography

forecasting

R

seminars

statistics

For the next month I am travelling in Europe and will be giving the following talks.

### Creating a handout from beamer slides

beamer

conferences

LaTeX

seminars

teaching

I’m about to head off on a speaking tour to Europe (more on that in another post) and one of my hosts has asked for my powerpoint slides so they can print them. They have made two false assumptions: (1) that I use powerpoint; (2) that my slides are static so they can be printed.

### Data science market places

computing

consulting

data science

forecasting

jobs

statistics

Some new websites are being established offering “market places” for data science. Two I’ve come across recently are Experfy and SnapAnalytx.

### Structural breaks

econometrics

forecasting

statistics

I’m tired of reading about tests for structural breaks and here’s why.

### To explain or predict?

econometrics

forecasting

references

research team

statistics

Last week, my research group discussed Galit Shmueli’s paper “To explain or to predict?”, *Statistical Science*, **25**(3), 289-310. (See her website for further materials.) This is a paper everyone doing statistics and econometrics should read as it helps to clarify a distinction that is often blurred. In the discussion, the following issues were covered amongst other things.

### Questions on the business analytics jobs

jobs

Monash University

I’ve received a few questions on the business analytics jobs advertised last week. I think it is best if I answer them here so other potential candidates can have the same information. I will add to this post if I receive more questions.

### New jobs in business analytics at Monash

computing

econometrics

forecasting

jobs

kaggle

Monash University

R

statistics

We have an exciting new initiative at Monash University with some new positions in business analytics. This is part of a plan to strengthen our research and teaching in the data science/computational statistics area. We are hoping to make multiple appointments, at junior and senior levels. These are five-year appointments, but we hope that the positions will continue after that if we can secure suitable funding.

### Great papers to read

forecasting

references

research team

statistics

My research group meets every two weeks. It is always fun to talk about general research issues and new tools and tips we have discovered. We also use some of the time to discuss a paper that I choose for them. Today we discussed Breiman’s classic (2001) two cultures paper — something every statistician should read, including the discussion.

### Publishing an R package in the Journal of Statistical Software

computing

journals

JSS

R

refereeing

I’ve been an editor of JSS for the last few years, and as a result I tend to get email from people asking me about publishing papers describing R packages in JSS. So for all those wondering, here are some general comments.

### Seven forecasting blogs

econometrics

forecasting

R

statistics

There are several other blogs on forecasting that readers might be interested in. Here are seven worth following:

### Errors on percentage errors

forecasting

fpp

IJF

R

references

reproducible research

statistics

The MAPE (mean absolute percentage error) is a popular measure for forecast accuracy and is defined as

$$\text{MAPE} = 100\,\text{mean}(|y_t - \hat{y}_t|/|y_t|)$$

where $y_t$ denotes an observation and $\hat{y}_t$ denotes its forecast, and the mean is taken over $t$.
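The definition translates directly into code. The following is a small Python sketch with made-up data; the function name `mape` and the choice to raise an error on zero observations are assumptions for illustration.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error: 100 * mean(|y_t - yhat_t| / |y_t|)."""
    if len(actuals) != len(forecasts):
        raise ValueError("actuals and forecasts must have the same length")
    if any(y == 0 for y in actuals):
        # The ratio is undefined when an observation is exactly zero.
        raise ValueError("MAPE is undefined when any observation is zero")
    ratios = [abs(y - f) / abs(y) for y, f in zip(actuals, forecasts)]
    return 100 * sum(ratios) / len(ratios)


# Illustrative observations and forecasts.
y = [100.0, 200.0, 50.0, 400.0]
yhat = [110.0, 180.0, 55.0, 400.0]

result = mape(y, yhat)
print(f"MAPE = {result:.2f}%")
```

Because each error is divided by $|y_t|$, observations close to zero can dominate the average, which is one reason to handle the zero case explicitly.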

### Generating tables in LaTeX

computing

LaTeX

productivity

tables

Typing tables in LaTeX can get messy, but there are some good tools to simplify the process. One I discovered this week is tablesgenerator.com, a web-based tool for generating LaTeX tables. It also allows the table to be saved in other formats including HTML and Markdown. The interface is simple, but it does most things. For complicated tables, some additional formatting may be necessary.

### My forecasting book now on Amazon

forecasting

fpp

otexts

publishing

R

references

For all those people asking me how to obtain a print version of my book “Forecasting: principles and practice” with George Athanasopoulos, you now can.

### Job at Center for Open Science

jobs

R

reproducible research

statistics

This looks like an interesting job.

### Fast computation of cross-validation in linear models

computing

forecasting

fpp

otexts

R

statistics

teaching

The leave-one-out cross-validation statistic is given by

$$\text{CV} = \frac{1}{N} \sum_{i=1}^N e_{[i]}^2,$$

where $e_{[i]} = y_{i} - \hat{y}_{[i]}$, the observations are given by $y_{1},\dots,y_{N}$, and $\hat{y}_{[i]}$ is the predicted value obtained when the model is estimated with the $i\text{th}$ case deleted. This is also sometimes known as the PRESS (Prediction Residual Sum of Squares) statistic.
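A minimal Python sketch of why the computation can be fast, using the well-known leverage identity for least squares, $e_{[i]} = e_i/(1-h_i)$, where $e_i$ is the ordinary residual and $h_i$ is the $i$th diagonal element of the hat matrix. The example is limited to simple linear regression with made-up data, and the function names are hypothetical. It checks that refitting $N$ times and the single-fit shortcut give the same CV.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b * xbar, b


def cv_brute_force(x, y):
    """Refit the model N times, each time with one case deleted."""
    total = 0.0
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (a + b * x[i])) ** 2
    return total / len(x)


def cv_shortcut(x, y):
    """Single fit: e_[i] = e_i / (1 - h_i), with h_i the leverage of case i."""
    n = len(x)
    a, b = fit_line(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    total = 0.0
    for xi, yi in zip(x, y):
        h = 1 / n + (xi - xbar) ** 2 / sxx  # leverage in simple regression
        total += ((yi - (a + b * xi)) / (1 - h)) ** 2
    return total / n


# Illustrative data only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]

print(cv_brute_force(x, y), cv_shortcut(x, y))
```

The shortcut needs one model fit instead of $N$, which matters when $N$ is large or when CV is used to compare many candidate models.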

### Probabilistic forecasting by Gneiting and Katzfuss (2014)

forecasting

IJF

journals

statistics

The IJF is introducing occasional review papers on areas of forecasting. We did a whole issue in 2006 reviewing 25 years of research since the International Institute of Forecasters was established. Since then, there has been a lot of new work in application areas such as call center forecasting and electricity price forecasting. In addition, there are areas we did not cover in 2006 including new product forecasting and forecasting in finance. There have also been methodological and theoretical developments over the last eight years. Consequently, I’ve started inviting eminent researchers to write survey papers for the journal.

### Highlighting the web

computing

fpp

otexts

teaching

technology

Users of my new online forecasting book have asked about having a facility for personal highlighting of selected sections, as students often do with print books. We have plans to make this a built-in part of the platform, but for now it is possible to do it using a simple browser extension. This approach allows any website to be highlighted, so is even more useful than if we only had the facility on OTexts.org.

### Forecasting weekly data

forecasting

R

statistics

This is another situation where Fourier terms are useful for handling the seasonality. Not only is the seasonal period rather long, it is non-integer (averaging 365.25/7 = 52.18). So ARIMA and ETS models do not tend to give good results, even with a period of 52 as an approximation.
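As a rough illustration of the idea, the Python sketch below builds pairs of sine and cosine regressors for a non-integer period of 365.25/7. The function name `fourier_terms`, the number of harmonics `K = 3`, and the series length of 104 weeks are all assumptions chosen for the example.

```python
import math


def fourier_terms(n_obs, period, K):
    """Rows of [sin(2*pi*k*t/period), cos(2*pi*k*t/period)] for k = 1..K, t = 1..n_obs."""
    rows = []
    for t in range(1, n_obs + 1):
        row = []
        for k in range(1, K + 1):
            angle = 2 * math.pi * k * t / period
            row.extend([math.sin(angle), math.cos(angle)])
        rows.append(row)
    return rows


period = 365.25 / 7  # average number of weeks in a year
X = fourier_terms(n_obs=104, period=period, K=3)

print(f"period = {period:.2f}, matrix is {len(X)} x {len(X[0])}")
```

Since the period enters only through the angle, nothing requires it to be an integer. The resulting columns can then be used as covariates, with the number of harmonics `K` chosen by a criterion such as the AIC.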

### The forecast mean after back-transformation

forecasting

R

statistics

Many functions in the forecast package for R will allow a Box-Cox transformation. The models are fitted to the transformed data and the forecasts and prediction intervals are back-transformed. This preserves the coverage of the prediction intervals, and the back-transformed point forecast can be considered the **median** of the forecast densities (assuming the forecast densities on the transformed scale are symmetric). For many purposes, this is acceptable, but occasionally the mean forecast is required. For example, with hierarchical forecasting the forecasts need to be aggregated, and medians do not aggregate but means do.

### Statistical politicians

mathematics

statistics

Last week we had the pleasure of Professor Stephen Pollock (University of Leicester) visiting our Department, best known in academic circles for his work on time series filtering (see his papers, and his excellent book). But he has another career as a member of the UK House of Lords (under the name Viscount Hanworth – he is a hereditary peer).

### Backcasting in R

forecasting

R

statistics

Sometimes it is useful to “backcast” a time series, that is, to forecast in reverse time. Although there are no in-built R functions to do this, it is very easy to implement. Suppose `x` is our time series and we want to backcast for h periods. Here is some code that should work for most univariate time series. The example is non-seasonal, but the code will also work with seasonal data.

### Hierarchical forecasting with hts v4.0

computing

forecasting

fpp

hts

R

statistics

A new version of my `hts` package for R is now on CRAN. It was completely re-written from scratch. Not a single line of code survived. There are some minor syntax changes, but the biggest change is speed and scope. This version is many times faster than the previous version and can handle hundreds of thousands of time series without complaining.

### Interview for the Capital of Statistics

consulting

forecasting

fpp

otexts

R

StackExchange

statistics

teaching

Earo Wang recently interviewed me for the Chinese website Capital of Statistics. The English transcript of the interview is on Earo’s personal website.

### Top papers in the International Journal of Forecasting

forecasting

IJF

references

statistics

Every year or so, Elsevier asks me to nominate five *International Journal of Forecasting* papers from the last two years to highlight in their marketing materials as “Editor’s Choice”. I try to select papers across a broad range of subjects, and I take into account citations and downloads as well as my own impression of the paper. That tends to bias my selection a little towards older papers as they have had more time to accumulate citations. Here are the papers I chose this morning (in the order they appeared):

### Computational Actuarial Science with R

demography

forecasting

R

reproducible research

statistics

I recently co-authored a chapter on “Prospective Life Tables” for this book, edited by Arthur Charpentier. R code to reproduce the figures and to complete the exercises for our chapter is now available on github. Code for the other chapters should also be available soon. The book can be pre-ordered on Amazon.

### Monash Econometrics in the top 10

forecasting

Monash University

statistics

Dave Giles pointed out on his blog yesterday that my department is currently ranked in the top 10 in the world for econometrics, according to IDEAS. We are also ranked 13th in the world in forecasting. Since IDEAS only covers the economics literature, the forecasting rank does not take account of our work in other areas such as demographic forecasting, and electricity demand forecasting.

### Automatic time series forecasting in Granada

computing

conferences

forecasting

R

references

seminars

statistics

time series

In two weeks I am presenting a workshop at the University of Granada (Spain) on *Automatic Time Series Forecasting*.

### Free books on statistical learning

otexts

R

references

statistics

teaching

Hastie, Tibshirani and Friedman’s *Elements of Statistical Learning* first appeared in 2001 and is already a classic. It is my go-to book when I need a quick refresher on a machine learning algorithm. I like it because it is written using the language and perspective of statistics, and provides a very useful entry point into the literature of machine learning which has its own terminology for statistical concepts. A free downloadable pdf version is available on the website.

### Online collaborative writing

LaTeX

productivity

writing

Everyone who has written a paper with another author will know it can be tricky making sure you don’t end up with two versions that need to be merged. The good news is that the days of sending updated drafts by email backwards and forwards are finally over (having lasted all of 25 years – I can barely imagine writing papers before email).

### Looking for a new post-doc

computing

forecasting

jobs

Monash University

R

research team

statistics

We are looking for a new post-doctoral research fellow to work on the project “Macroeconomic Forecasting in a Big Data World”. Details are given at the link below.

### Judgmental forecasting experiment

forecasting

R

statistics

The Centre for Forecasting at Lancaster University is conducting some research on judgmental forecasting and model selection. They hope to compare the performance of judgmental model selection with statistical model selection, in order to learn how to best design forecasting support systems. They would like forecasting students, practitioners and researchers to participate, and are offering £50 Amazon Gift Cards as prizes. Here is a brief description from Fotios Petropoulos:

### How to get your paper rejected quickly

forecasting

IJF

journals

refereeing

I sent this rejection letter this morning about a paper submitted to the International Journal of Forecasting.

### Probabilistic Energy Forecasting

energy

forecasting

IJF

journals

kaggle

statistics

The *International Journal of Forecasting* is calling for papers on probabilistic energy forecasting. Here are the details (taken from Tao Hong’s blog).

### MAXIMA research centre at Monash Uni

mathematics

maxima

Monash University

statistics

The “Monash Academy for Cross and Interdisciplinary Mathematical Applications” (MAXIMA) is a new research centre that aims to maximise the potential of mathematics to deliver impact to society. It will be led by Kate Smith-Miles. I will also be involved along with several other mathematicians at Monash. Our mission at MAXIMA is to find solutions to 21st century problems by dismantling mathematical barriers.

### Reflections on UseR! 2013

computing

conferences

forecasting

graphics

R

reproducible research

statistics

This week I’ve been at the R Users conference in Albacete, Spain. These conferences are a little unusual in that they are not really about research, unlike most conferences I attend. They provide a place for people to discuss and exchange ideas on how R can be used.

### Facts and fallacies of the AIC

forecasting

R

statistics

Akaike’s Information Criterion (AIC) is a very useful model selection tool, but it is not as well understood as it should be. I frequently read papers, or hear talks, which demonstrate misunderstandings or misuse of this important tool. The following points should clarify some aspects of the AIC, and hopefully reduce its misuse.

### Forecasting annual totals from monthly data

computing

consulting

forecasting

R

StackExchange

statistics

This question was posed on crossvalidated.com:

### Establishing priority

journals

phd

progress

refereeing

references

supervision

writing

The nature of research is that other people are probably working on similar ideas to you, and it is possible that someone will beat you to publishing them.

### The difference between prediction intervals and confidence intervals

forecasting

statistics

Prediction intervals and confidence intervals are not the same thing. Unfortunately the terms are often confused, and I am frequently correcting the error in students’ papers and articles I am reviewing or editing.

### ETS models now in EViews 8

computing

forecasting

R

statistics

The ETS modelling framework developed in my 2002 IJF paper (with Koehler, Snyder and Grose), and in my 2008 Springer book (with Koehler, Ord and Snyder), is now available in EViews 8. I had no idea they were even working on it, so it was quite a surprise to be told that EViews now includes ETS models.

### Removing white space around R figures

beamer

computing

graphics

LaTeX

R

seminars

When I want to insert figures generated in R into a LaTeX document, it looks better if I first remove the white space around the figure. Unfortunately, R does not make this easy as the graphs are generated to look good on a screen, not in a document.

### Out-of-sample one-step forecasts

computing

forecasting

R

statistics

It is common to fit a model using training data, and then to evaluate its performance on a test data set. When the data are time series, it is useful to compute one-step forecasts on the test data. For some reason, this is much more commonly done by people trained in machine learning rather than statistics.
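As a rough illustration of the idea (not the code from the post, which is tagged R), here is a minimal Python sketch. It assumes a simple AR(1) model fitted by least squares; the function names `fit_ar1` and `one_step_forecasts` and the toy data are hypothetical. The key point is that the parameters are estimated once on the training data, and each test observation is then forecast from the actual observation immediately before it.

```python
def fit_ar1(train):
    """Least-squares estimates of (c, phi) in y[t] = c + phi * y[t-1] + e[t]."""
    x = train[:-1]          # lagged values
    y = train[1:]           # current values
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    phi = sxy / sxx
    c = my - phi * mx
    return c, phi


def one_step_forecasts(train, test, c, phi):
    """One-step forecasts over the test set with parameters held fixed.

    Each forecast uses the actual previous observation, so the model is
    never re-estimated on test data.
    """
    previous = [train[-1]] + list(test[:-1])
    return [c + phi * prev for prev in previous]


# Toy data, invented purely for illustration.
series = [12.0, 13.1, 12.7, 13.9, 14.2, 13.8, 14.9, 15.3, 15.0, 16.1]
train, test = series[:7], series[7:]

c, phi = fit_ar1(train)
forecasts = one_step_forecasts(train, test, c, phi)
errors = [actual - f for actual, f in zip(test, forecasts)]
mae = sum(abs(e) for e in errors) / len(errors)
print("one-step test MAE:", round(mae, 3))
```

The same pattern applies to any model: estimate on the training data, then step through the test data one observation at a time without refitting.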

### Batch forecasting in R

computing

forecasting

R

statistics

I sometimes get asked about forecasting many time series automatically. Here is a recent email, for example:

### Man vs Wild Data

seminars

statistics

teaching

ysc2013

I’m speaking on this topic at the Young Statisticians Conference, 7-8 February 2013.

### New in forecast 4.0

computing

forecasting

R

statistics

A few days ago I released version 4.0 of the forecast package for R. There were quite a few changes and new features, so I thought it deserved a new version number. I keep a list of changes in the Changelog for the package, but I doubt that many people look at it. So for the record, here are the most important changes to the forecast package made since v3.0 was released.

### Makefiles for R/LaTeX projects

computing

LaTeX

organization

productivity

R

reproducible research

**Updated:** 21 November 2012

### LaTeX loops

computing

graphics

LaTeX

productivity

Today I was writing a report which included 20 figures, with the names `demandplot1.pdf`, `demandplot2.pdf`, …, `demandplot20.pdf`, and all with similar captions. Clearly a loop was required. After all, LaTeX is a programming language, so we should be able to take advantage of its capabilities.

### The Young Stats Communication Challenge

graphics

statistics

video

ysc2013

The Australian Young Statisticians Conference (Feb 2013) is organizing a communication competition. They invite all early-career statisticians (studying, or within 5 years of graduation) to produce a short (3-5 minute) video for the ABS YSC2013 Video Competition, or a static infographic for the ABS YSC2013 Infographic Competition.

### Forecasting research grants

forecasting

The International Institute of Forecasters and SAS have an annual research grant scheme that has been offered for the past ten years. The amounts offered are small (a total of $10K per year, usually split between 2 or 3 projects), but it might be useful for young researchers wanting a bit of funding to help with their forecasting research. The deadline for 2012 has just been extended to 26 October, which is a sure sign that they don’t have enough applications yet.

### Why are some things easier to forecast than others?

forecasting

R

statistics

Forecasters are often met with skepticism. Almost every time I tell someone that I work in forecasting, they say something about forecasting the stock market, or forecasting the weather, usually suggesting that such forecasts are hopelessly inaccurate. In fact, forecasts of the weather are amazingly accurate given the complexity of the system, while anyone claiming to forecast the stock market deserves skepticism. So what is the difference between these two types of forecasts, and can we say anything about what can reasonably be forecast and what can’t?

### COMPSTAT2012

computing

R

reproducible research

statistics

This week I’m in Cyprus attending the COMPSTAT2012 conference. There’s been the usual interesting collection of talks, and interactions with other researchers. But I was struck by two side comments in talks this morning that I’d like to mention.

### Blogs about research

journals

organization

productivity

progress

references

research team

statistics

supervision

If you find this blog helpful (or even if you don’t but you’re interested in blogs on research issues and tools), there are a few other blogs about doing research that you might find useful. Here are a few that I read.

### Read the literature

journals

refereeing

references

I’ve just finished another reviewer report for a journal, and yet again I’ve had to make comments about reading the literature. It’s not difficult. Before you write a paper, read what other people have done. A simple search on Google scholar will usually do the trick. And before you submit a paper, check again that you haven’t missed anything important.

### Put your pre-prints online

journals

writing

I have argued previously that research papers should be posted online at the same time as they are submitted to a journal. Sometimes people claim that journals don’t allow it, which is nonsense. Almost every journal allows it, and many also allow the published version of a paper to appear on your personal website.

### Bare bones beamer

beamer

LaTeX

seminars

Beamer is far and away the most popular software for presentations amongst researchers in mathematics and statistics. Most conference and seminar talks I attend these days use beamer. Unfortunately, they all look much the same. I think people find beamer themes too hard to modify easily, so a small number of templates get shared around. Even the otherwise wonderful LaTeX Templates site has no beamer examples.

### My new forecasting textbook

forecasting

fpp

otexts

R

references

writing

After years of saying that I was going to write a book to replace Makridakis, Wheelwright and Hyndman (1998), I’m finally ready to make an announcement!

### Blog aggregators

journals

LaTeX

productivity

R

statistics

A very useful way of keeping up with blogs in a particular area is to subscribe to a blog aggregator. These will syndicate posts from a large number of blogs and provide links back to the original sources. So you only need to subscribe once to get all the good stuff in that area.

### Seeking help

productivity

StackExchange

Every day I receive emails, or comments on this blog, asking for help with R, forecasting, LaTeX, possible research topics, how to install software, or some other thing I’m supposed to know something about. Unfortunately, I cannot provide a one-man help service to the rest of the world. I used to reply to all the requests explaining where to go for help, but I stopped replying a while ago as it took too much time to do even that.

### Measuring time series characteristics

forecasting

R

statistics

time series

A few years ago, I was working on a project where we measured various characteristics of a time series and used the information to determine what forecasting method to apply or how to cluster the time series into meaningful groups. The two main papers to come out of that project were:

### Data visualization

graphics

R

statistics

For those who have not read the seminal works of Tufte and Cleveland, please hang your heads in shame. To salvage some sense of self-worth, you can then head over to Solomon Messing’s blog where he is starting a series on data visualization based on the principles developed by Tufte and Cleveland (with R examples).

### Exponential smoothing and regressors

forecasting

R

I have thought quite a lot about including regressors (i.e. covariates) in exponential smoothing (ETS) models, and I have done it a couple of times in my published work. See my 2008 exponential smoothing book (chapter 9) and my 2008 *Tourism Management* paper. However, there are some theoretical issues with these approaches, which have come to light through the research of Ahmad Farid Osman, one of our PhD students at Monash University. Basically, they are never forecastable in the sense explained in Section 10.2 of my 2008 book (forecastability is the ETS equivalent of invertibility in ARIMA models).

### Mailing lists

computing

Staying in touch with other researchers is important. You need to know about up-coming conferences, seminars, jobs, etc. To this end, it is worth finding a few key email lists to join. A long list of lists in econometrics and statistics is provided by EconometricLinks. Browse through it to find topics of interest. No doubt researchers in other disciplines have their own lists, but I don’t know about them.

### Table design

tables

writing

Almost every research paper and thesis in statistics contains at least some tables, yet students are rarely taught how to make good tables. While the principles of good graphics are slowly becoming part of a statistical education (although not an econometrics education!), the principles of good tables are often ignored. Perhaps people think they are obvious, although the results I see in papers and theses suggest otherwise.

### Following authors on Google Scholar

computing

journals

organization

productivity

references

A great new feature has been added to Google Scholar Citations. For those authors who have set up a citations page, it is now possible to get email alerts for any new articles they publish, or for any new citations of their articles. So you can track citations to your own work this way, and stay up-to-date with key authors in your field.

### Organizing travel

organization

productivity

technology

Whether travelling to a seminar or conference, or just having a holiday, using a travel organizer can make the process simpler and easier. A good travel organizer keeps all your travel details (flights, hotels, car rentals, meetings, weather forecasts, etc.) organized and synced to whatever devices you use (two computers, an iPad and an iPhone in my case).

### The art of R programming

computing

R

references

This is a gem of a book. It will become the book I give PhD students when they are learning how to write good R code. That is, if I ever see it again. I had hoped to write a review of it, but I haven’t seen it since it arrived in the mail a couple of weeks ago because a research student or research assistant has always had it on loan. I guess that’s a testament to how useful it is.

### Researcher portals

computing

journals

organization

productivity

references

A researcher portal is a website that attempts to list all the publications of a given researcher. Some portals also allow sharing papers, interacting with other researchers, calculating citation statistics, etc. Every researcher wants their work read and cited, so these websites can be useful tools for getting your work noticed. They can also function as a de facto home page if you don’t already have a personal website. Conversely, they can be a good way to find new work by researchers in your field. However, unless a site provides a relatively complete list of your publications, and covers a large proportion of the research community in your discipline, it is of limited value.

### Help for forecasting practitioners

forecasting

I often get email from forecasters wanting assistance. As much as I’d like to provide a free forecasting advice service to the world, that’s not what I’m paid to do, and I choose to spend my unpaid time on other things. However, there are some very helpful resources available for forecasting practitioners.

### The scourge of the academic publishers

journals

Academic publishing is built on an old model where publishers were needed to print and distribute journals to libraries. Under this system, it makes sense that the journals are distributed by publishing companies who charge fees for their work. On the other hand, the academics who write for the journals, the peer reviewers and (almost all) editors, have always contributed their time and expertise without cost. Essentially, they are being paid by universities and other research organizations to do this work. While we had print-based distribution, this model largely worked.

### Time series cross-validation: an R example

forecasting

R

time series

I was recently asked how to implement time series cross-validation in R. Time series people would normally call this “forecast evaluation with a rolling origin” or something similar, but it is the natural and obvious analogue to leave-one-out cross-validation for cross-sectional data, so I prefer to call it “time series cross-validation”.
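To make the rolling-origin idea concrete, here is a minimal Python sketch (an illustration only, not the R code from the post). The function names and the two simple forecasting methods are hypothetical choices for the example. At each origin the method sees only the observations up to that point, produces a one-step forecast, and the error against the next observation is recorded.

```python
def rolling_origin_errors(series, min_train, forecast_fn):
    """One-step forecast errors from a rolling forecast origin.

    For each origin from min_train to the end of the series, forecast_fn
    is given only the observations before the origin and must return a
    forecast of the observation at the origin.
    """
    errors = []
    for origin in range(min_train, len(series)):
        train = series[:origin]
        errors.append(series[origin] - forecast_fn(train))
    return errors


def naive_forecast(train):
    """Forecast the next value as the last observed value."""
    return train[-1]


def mean_forecast(train):
    """Forecast the next value as the mean of all observed values."""
    return sum(train) / len(train)


def mean_absolute_error(errors):
    return sum(abs(e) for e in errors) / len(errors)


# Toy data, invented purely for illustration.
series = [12.0, 13.1, 12.7, 13.9, 14.2, 13.8, 14.9, 15.3, 15.0, 16.1]

for name, method in [("naive", naive_forecast), ("mean", mean_forecast)]:
    errs = rolling_origin_errors(series, min_train=4, forecast_fn=method)
    print(name, "MAE:", round(mean_absolute_error(errs), 3))
```

Averaging the errors over all origins gives a single accuracy measure per method, which plays the same role as the leave-one-out cross-validation statistic for cross-sectional data.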

### Major changes to the forecast package

forecasting

R

The forecast package for R has undergone a major upgrade, and I’ve given it version number 3 as a result. Some of these changes were suggestions from the forecasting workshop I ran in Switzerland a couple of months ago, and some have been on the drawing board for a long time. Here are the main changes in version 3, plus a few earlier additions that I thought deserved a mention.

### Crowd sourcing forecasts

forecasting

kaggle

Forecasting Ace is looking for participants to develop improved methods for predicting future events and outcomes. Their goal is to develop methods for aggregating many individual judgments in a manner that yields more accurate predictions than any one person or small group alone could provide. Potential applications of the system include forecasting economic conditions, political changes, technological development and medical breakthroughs.

### Learn Machine Learning at Stanford for free

computing

statistics

Andrew Ng’s machine learning course at Stanford is being offered free to anyone online in the (northern) fall of 2011. I’ve seen some of the notes from this course and it looks to be an excellent broad introduction to machine learning and data mining, covering topics such as support vector machines, neural networks, kernels, clustering and dimension reduction.

### Recommended survey papers

journals

references

Survey articles are particularly helpful in getting a foothold in a new research area, or in looking for important papers that you may have overlooked. Whatever area of research you are in, look out for survey papers and journals dedicated to publishing survey papers.

### Social networking for researchers

computing

StackExchange

technology

It would be nice to have a place to share ideas, links, comments in a very informal way with others involved in research in statistical methodology and data science. CrossValidated.com is great for specific questions, but is not suitable for commenting on papers or sharing ideas and links.

### I’m switching to TeXstudio

computing

LaTeX

writing

I’ve happily used WinEdt for all my LaTeX editing for about 15 years and I’ve encouraged my whole research team to use it. But I’m tired of problems with WinEdt that take up my time. I regularly have requests for help from one of my research team because something in WinEdt is not working properly — such as pdf synchronization problems, or it is using an old version of MikTeX that no longer updates, or that it has switched to using another pdf viewer without warning. These aren’t that hard to fix, but they shouldn’t happen. When a coauthor at another university has a request for help, it is much more difficult. If a new person joins our research team, there is always a hassle getting WinEdt configured for their use. Jeromy Anglim has a nice post on configuring WinEdt 6.0, but it should work nicely without needing this sort of configuration.

### Ten rules for data analysis

statistics

Peter Kennedy was an associate editor of the *International Journal of Forecasting* and a superb applied econometrician. He died unexpectedly in August 2010. He was best known for his excellent book *A Guide to Econometrics* as well as his “Ten Commandments of Applied Econometrics”. He provided a variation on his ten commandments in advice to his students in the form of the following ten rules:

### Statistical tests for variable selection

forecasting

R

statistics

I received an email today with the following comment:

### RStudio: just what I’ve been looking for

computing

R

For many years I used RWinEdt as my text editor for R code, but when WinEdt 6.0 came out, RWinEdt stopped working. So I’ve been looking for something to replace it. I’ve tried Tinn-R, NppToR, Eclipse with StatET and a couple of other editors, but nothing was quite right.

### Authorship ethics

journals

writing

With the constant pressure on academics to publish research papers, there is a temptation for research groups to include “coauthors” who have not really made any contribution to the manuscript. This seems more prevalent in some fields (e.g., the health sciences) than others.

### In praise of Dropbox

computing

organization

productivity

technology

Every couple of years, a new technology has a big impact on how I work. Gmail was one. My iPhone was another. And I rank Dropbox in the same category.

### CrossValidated Journal Club

journals

R

research team

StackExchange

Journal Clubs are a great way to learn new research ideas and to keep up with the literature. The idea is that a group of people get together every week or so to discuss a paper of joint interest. This can happen within your own research group or department, or virtually online.

### Hamming on research

productivity

progress

Richard Hamming was an excellent mathematician who worked at the interface of mathematics and computer science. In 1986 he gave a wonderful talk entitled *You and Your Research*. Derek Smith on the AMS Graduate Student blog reminded me of it today. If you haven’t read it previously, stop work immediately and **read it now**.

### Forecasting workshop: Switzerland, June 2011

forecasting

R

seminars

I will be running a workshop on *Statistical Forecasting: Principles and Practice* in Switzerland, 20-22 June 2011. Check out the venue: Waldhotel Doldenhorn, Kandersteg! So if you fancy a trip to the beautiful Swiss Alps next June, read on…

### Data visualization videos

graphics

R

seminars

statistics

video

Probably everyone has seen Hans Rosling’s famous TED talk by now. If not, here it is:

### Initializing the Holt-Winters method

computing

forecasting

R

The Holt-Winters method is a popular and effective approach to forecasting seasonal time series. But different implementations will give different forecasts, depending on how the method is initialized and how the smoothing parameters are selected. In this post I will discuss various initialization methods.

### A LaTeX template for a CV

jobs

LaTeX

Every researcher needs a Curriculum Vitae (Latin for “course of life”) or CV. You will need it for job applications, for annual performance appraisal, and just for keeping track of your publications. A CV typically contains lists of achievements including qualifications, publications, presentations, awards, plus teaching experience.

### CrossValidated launched!

computing

R

StackExchange

statistics

**The CrossValidated Q&A site is now out of beta and the new design and site name is live.**

### How to avoid annoying a referee

journals

R

refereeing

references

reproducible research

StackExchange

statistics

writing

It’s not a good idea to annoy the referees of your paper. They make recommendations to the editor about your work and it is best to keep them happy. There is an interesting discussion on stats.stackexchange.com on this subject. This inspired my own list below.

### Happy World Statistics Day!

R

StackExchange

statistics

The United Nations has declared today “World Statistics Day”. I’ve no idea what that means, or why we need a WSD. Perhaps it is because the date is 20.10.2010 (except in North America where it is 10.20.2010). But then, what happens from 2013 to 2099? And do we just forget the whole idea after 3112?

### Always listen to reviewers

journals

refereeing

This week I was asked to review a paper that I had seen before. It had been submitted to a journal a few months ago and I had written a detailed report describing some problems with the paper, and noting a large number of typos that needed fixing. That journal had rejected the paper, the authors had submitted it to a second journal, and the paper ended up on my desk again for review. I was interested to see what the authors had done about the problems I had described. Alas, nothing had changed. Not even the typos. It was identical to the previous version with every error still there. So I sent the same report off to the second journal advising the editor of the situation.

### Joining an editorial board

IJF

journals

refereeing

Being on the editorial board of a journal is a lot of work. I’m currently Editor-in-Chief of the *International Journal of Forecasting* and previously I’ve been Theory & Methods Editor of the *Australian & New Zealand Journal of Statistics*. Although it is time-consuming and often goes un-noticed, there are some important rewards that make it worthwhile in my opinion.

### The ARIMAX model muddle

forecasting

R

statistics

There is often confusion about how to include covariates in ARIMA models, and the presentation of the subject in various textbooks and in R help files has not helped the confusion. So I thought I’d give my take on the issue. To keep it simple, I will only describe non-seasonal ARIMA models although the ideas are easily extended to include seasonal terms. I will include only one covariate in the models although it is easy to extend the results to multiple covariates. And, to start with, I will assume the data are stationary, so we only consider ARMA models.

### Why every statistician should know about cross-validation

forecasting

StackExchange

statistics

Surprisingly, many statisticians see cross-validation as something data miners do, but not a core statistical technique. I thought it might be helpful to summarize the role of cross-validation in statistics, especially as it is proposed that the Q&A site at stats.stackexchange.com should be renamed CrossValidated.com.

### That syncing feeling

computing

productivity

Like many people, I use more than one computer and I like to have all my files, bookmarks and other settings synchronized across my computers. Fortunately, that is getting easier as more tools are made available for keeping computers synchronized. So I thought it might be timely to review how to keep computers “synced”.

### Forecasting with long seasonal periods

forecasting

R

I am often asked how to fit an ARIMA or ETS model with data having a long seasonal period such as 365 for daily data or 48 for half-hourly data. Generally, seasonal versions of ARIMA and ETS models are designed for shorter periods such as 12 for monthly data or 4 for quarterly data.

### Tourism forecasting competition results: part one

forecasting

kaggle

The first stage of the tourism forecasting competition on kaggle has finished. This stage involved forecasting 518 annual time series. Twenty-one teams beat our Theta method benchmark, which is a great result and well beyond our expectations. Congratulations to Lee Baker for winning stage one.

### How to fail a PhD

phd

productivity

progress

supervision

I read an interesting post today by Matt Might on “10 reasons PhD students fail”, and I thought it might be helpful to reflect on some of the barriers to PhD completion that I’ve seen. Matt’s ideas are not all relevant to Australian PhDs, so I have come up with my own list below. Here are the seven steps to failure.

### Benchmarks for forecasting

forecasting

IJF

Every week I reject papers submitted to the *International Journal of Forecasting* because they present new methods without ever attempting to demonstrate that the new methods are better than existing methods. It is a policy of the journal that every new method must be compared to standard benchmarks and existing methods before the paper will even be considered for publication.

### The tourism forecasting competition

forecasting

IJF

kaggle

Recently I wrote a paper entitled “The tourism forecasting competition” in which we (i.e., George Athanasopoulos, Haiyan Song, Doris Wu and I) compared various forecasting methods on a relatively large set of tourism-related time series. The paper has been accepted for publication in the *International Journal of Forecasting*. (When I submit a paper to the *IJF* it is always handled by another editor. In this case, Mike Clements handled the paper and it went through several revisions before it was finally accepted. Just to show the process is unbiased, I have had a paper rejected by the journal during the period I have been Editor-in-Chief.)

### Twenty rules for good graphics

graphics

journals

R

One of the things I repeatedly include in referee reports, and in my responses to authors who have submitted papers to the *International Journal of Forecasting*, are comments designed to improve the quality of the graphics. Recently someone asked on stats.stackexchange.com about best practices for producing plots. So I thought it might be helpful to collate some of the answers given there and add a few comments of my own taken from things I’ve written for authors.

### More StackExchange sites

computing

LaTeX

R

StackExchange

writing

The StackExchange site on Statistical Analysis is about to go into private beta testing. This is your last chance to commit if you want to be part of the private beta testing. Don’t worry if you miss out — it will only be a week before it is then open to the public.

### The falling standard of English in research

journals

writing

It seems that most journals no longer do any serious copy-editing, and the standard of English is falling. Today I was reading an article from the *European Journal of Operational Research*, which is supposedly a good OR journal (current impact factor over 2). Take this for an example from the first page of this paper:

### Academic citations in the popular press

forecasting

IJF

references

It is very unusual for a newspaper article to cite an academic paper, unless it is in *Nature*, *Science* or the *Lancet*. Mostly, what we write is too technical and assumes too much background knowledge for it to be accessible to anyone but specialists. So I was pleasantly surprised to find a reference to the *International Journal of Forecasting* in a recent *Wall Street Journal* article. It is a citation of a 1996 article, so in terms of scientific research it is a bit like quoting the *Magna Carta*, but a citation nevertheless.

### Update on a StackExchange site for statistical analysis

computing

StackExchange

statistics

About six weeks ago, I proposed that there should be a Stack Exchange site for questions on data analysis, statistics, data mining, machine learning, etc. I can finally report that there has been substantial progress on this.

### Google scholar alerts

computing

journals

references

A couple of weeks ago, Google scholar added a facility to provide email alerts on new articles associated with specific search queries. First do the search, then click the envelope at top left of screen. For example, here is a search on “exponential smoothing” since 2000.

### Online mathematical resources

computing

mathematics

For nearly 50 years, a standard reference in mathematical work has been Abramowitz and Stegun’s (1964) *Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables*. It has provided a marvellous collection of results and tables that have been indispensable for a generation of mathematicians. I’ve used it to look up computationally efficient methods for calculating Bessel functions or gamma functions, or to find one of those trigonometric identities I learned in high school and no longer remember. Apparently nearly 1 million copies of the handbook have been printed and it has also been scanned and put online.

### A StackExchange site for statistical analysis?

computing

StackExchange

statistics

Regular readers of this site will know I’m a fan of using Stack Overflow for questions about LaTeX, R and other areas of programming. Now the people who produce Stack Overflow are planning on setting up several new sites for asking questions about other topics, and are seeking proposals. I have proposed that there should be a site for questions on data analysis, statistics, data mining, machine learning, etc.

### Making a poster in beamer

beamer

LaTeX

This week, I made my first poster. Although I’ve been an academic for more than 20 years, I’ve never had to make a poster before. Some of my coauthors have made posters about our joint research, and two of them have even won prizes (although I can’t take any credit for them). But this week, our department is displaying posters from all research staff about our recent work.

### My standard LaTeX preamble

LaTeX

When I was a PhD student, I found I needed a lot of LaTeX functionality that did not then exist. So I wrote my own package which has served me well for about 20 years. It is called HyTeX.sty (the name being a shameless take-off of LaTeX from Leslie Lamport as well as a homonym of High-Tech). The advantage of having my own package is that almost every file starts with

### Writing a referee report

IJF

journals

refereeing

As an editor, I like to see referee reports comprising three sections:

### Replications and reproducible research

reproducible research

One of the best ways to get started with research in a new area is to try to replicate some existing research. In doing so, you will usually gain a much better understanding of the topic, and you will often discover some problems with the research, or develop ideas that will lead to a new research paper.

### Controlling figure and table placement in LaTeX

LaTeX

It can be frustrating trying to get your figures and tables to appear where you want them in a LaTeX document. Sometimes, they just seem to float off onto another page of their own accord. Here is a collection of tools and ideas that help you get control of those pesky floats.

### “Elements of Statistical Learning” now online

references

In the past couple of days, the authors of several blogs have noted that the wonderful book *The Elements of Statistical Learning: Data Mining, Inference, and Prediction* by Hastie, Tibshirani and Friedman (2nd ed., 2009) is now available for **free download in pdf format**.

### Attending research seminars

seminars

Most research students don’t seem to attend seminars. When asked, they usually say the seminars are not on their topic, or they don’t understand them, or they find them boring, or some other similar reason. I think this is because students don’t understand the purpose of research seminars, and have not learned how to listen to them.

### Squeezing space with LaTeX

LaTeX

writing

I’ve been writing a grant application with a 10-page limit, and as usual it is difficult to squeeze everything in. No, I can’t just change the font as it has to be 12 point with at least 2 cm margins on an A4 page. Fortunately, LaTeX is packed full of powerful features that help in squeezing it all in. Here are some of the tips I’ve used over the years.

### Converting eps to pdf

computing

LaTeX

Simply include the package `epstopdf`. Then when you use pdflatex, the eps files will be automatically converted to pdf at compile time. (The conversion only happens the first time you process the file, and is skipped if there is already a pdf file with the same name.)

### The 7 secrets of highly successful PhD students

organization

phd

productivity

progress

seminars

It seems everyone has 7 secrets to success, and now someone has hopped on the 7-secrets bandwagon with something for PhD students. Thinkwell is an Australian company offering a seminar and associated workbook on “The 7 secrets of highly successful PhD students”. I bought the book out of curiosity, but “book” is a gross exaggeration: it is only eleven pages of fairly simplistic advice. I hope the seminar has more substance. For what it’s worth, here are the so-called seven secrets.

### Writing an abstract

writing

The abstract is probably the most important part of a paper. Many readers will not read anything else, so you need to grab their attention and get your main message across as clearly and succinctly as possible. It is not meant to be an introduction to the paper, but a summary of the paper. In a single paragraph, a reader can learn the purpose of the research, your general approach to the problem, your main results, and the most important conclusions. Write as if you have one minute to explain the paper to an interested colleague, assuming that she will not read the paper herself.

### Workflow in R

productivity

R

This came up recently on StackOverflow. One of the answers was particularly helpful and I thought it might be worth mentioning here. The idea presented there is to break the code into four files, all stored in your project directory. These four files are to be processed in the following order.

### Statistics education journals

journals

teaching

In many research universities, there can be a tension that arises when great teachers don’t publish much. I believe there is a place for excellent teachers who do limited research within a strong research university, but their contribution is considerably enhanced if they share their teaching insights. There are at least three reputable research journals for publishing articles on statistics education:

### Mathematical research and the internet

computing

technology

On Monday night I attended a lecture by Terry Tao on “Mathematical research and the internet”. Terry is Australia’s most famous mathematician, our only Fields Medalist, and one of the most active mathematical bloggers in the world. He has been described as the “Mozart of mathematics” for his remarkable precocity and prolific output. The slides of his talk are available on his blog site.

### How good are economic forecasts?

forecasting

I wrote last week that “macroeconomic forecasts are little better than shooting blindfold”. I don’t know if it was connected or not, but on the same day a journalist (Richard Pullin) from Reuters phoned me to ask about assessing some economic forecasts. He wanted to compare the accuracy of several economic forecasts for Japan and he wasn’t sure how to go about it. I helped him to calculate the MASE for the different forecasts and the results have now been published.
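As a rough illustration of the calculation, here is a minimal Python sketch of the MASE (mean absolute scaled error) for non-seasonal data: the mean absolute forecast error divided by the in-sample mean absolute error of the one-step naive forecast. The function name `mase` and all the numbers are hypothetical, chosen only to show the arithmetic; they are not the Reuters forecasts.

```python
def mase(train, actual, forecast):
    """Mean absolute scaled error, scaled by the in-sample one-step naive MAE.

    train    : historical observations used to fit the forecasts
    actual   : observed values over the forecast period
    forecast : forecasts for the same period
    """
    if len(train) < 2:
        raise ValueError("need at least two training observations")
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")

    # In-sample MAE of the naive forecast (each value predicted by the previous one)
    scale = sum(abs(b - a) for a, b in zip(train, train[1:])) / (len(train) - 1)
    if scale == 0:
        raise ValueError("training series is constant, so the scale is zero")

    # Out-of-sample mean absolute error of the forecasts being assessed
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


# Made-up example: five historical values and two competing forecasts
history = [10, 12, 11, 13, 15]
observed = [16, 14]
print(mase(history, observed, [15, 15]))
print(mase(history, observed, [18, 18]))
```

A value below 1 means the forecasts were, on average, more accurate than the naive method was within the training data; the forecaster with the smaller MASE did better.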

### Research supervision workshop

research team

supervision

Today I gave a workshop for supervisors of postgraduate students. Mostly I talked about creating a team environment for postgraduate students rather than the traditional model (at least in statistics and econometrics) of each student working in isolation.

### Seek help when it’s needed

welfare

I don’t think I’ve had a research student who did not think about giving up at some point. It was partway through my second year when I felt like giving up. I felt I was not going to be able to finish my thesis, and that I would be better off throwing in the towel and doing something else. Fortunately, I couldn’t think of anything better to do, plus I hate giving up on anything, so I persevered and it turned out ok. I was also fortunate to have a very supportive wife and a great associate supervisor in Gary Grunwald who kept me going.

### Why I don’t like statistical tests

forecasting

statistics

It may come as a shock to discover that a statistician does not like statistical tests. Isn’t that what statistics is all about? Unfortunately, in some disciplines statistical analysis does seem to consist almost entirely of hypothesis testing, and therein lies the problem.

### R help on StackOverflow

computing

R

Ever since I began using R about ten years ago, the best place to find R help has been the R-help mailing list. But it is time-consuming searching through the archives trying to find something from a long time ago, and there is no way to sort out the good advice from the bad advice.

### Backing up

computing

productivity

Ever since I deleted the only copy of my honours thesis, one week before it was due to be handed in, I’ve been obsessive about backups, often to the amusement of my family and colleagues. But every time one of them loses a file or has a hard-disk fail, the smiles fade and they ask for advice.

### Forecasting the recession

forecasting

Forecasters are under the pump with a recession that many didn’t see coming. As I don’t do any macroeconomic forecasting, I can sit back and smile smugly at some of my colleagues while I work on simpler problems such as forecasting in epidemiology, demography and energy demand.

### Maintaining local LaTeX files

computing

LaTeX

If you use LaTeX, then you probably have a bib file, a database of all the papers and books that you have cited. It is much more efficient to keep one database in one location than to have multiple copies of it floating around your hard drive. (Or even worse, to have different bib files created for different papers.) You might also have a few of your own style files, and again it is best to keep these in a central location and not have duplicates all over the place. So you need a central place to store these files where LaTeX will find them.

### Songs of Statistics

statistics

If you love statistics (don’t we all?) and can write Chinese (which rules me out), you might like to contribute to the Chinese National Bureau of Statistics celebrations of the 60th anniversary of the “founding of New China”. They are calling for submissions of prose, poetry or song which will “enhance people’s patriotic feelings, statistics and confidence”. Here is an English translation of the page.

### Writing responses to referee reports

journals

writing

I’ve been spending time writing response letters lately. I’ve also been reading lots of response letters from authors wanting their stuff published in the International Journal of Forecasting. I thought it might be useful to collate a few thoughts on the subject.

### Managing a bibliographic database

computing

LaTeX

references

All researchers need to maintain a database of papers they have read, cited, or simply noted for later reference. For those of us using LaTeX, the database is in the BibTeX format and is stored as a simple text file (a bib file) that can be edited using a text editor such as WinEdt.

### Why Word is a bad choice for academic writing

LaTeX

writing

For years I’ve been telling everyone who would listen that MS-Word may sometimes be useful for short notes or for making a “Back in 5 minutes” sign to stick on your door, but if you want to write a serious document like an academic paper, a book or a thesis, then you should use a serious tool such as LaTeX. For those who are not yet convinced, Ben Klemens has a nice article entitled “Why Word is a terrible program”. It’s well worth reading.

### Searching the research literature

journals

references

Most students seem to go to Google first. This is not a good strategy. Google Scholar is much better as it filters out all the junk. Scopus is another engine that aims to do a similar thing. It is better organized but not so complete. ISI WOK is also not as complete as Google Scholar but is particularly good at tracking citations.

### Clive Granger (1934-2009)

forecasting

obituary

Sir Clive Granger has died at the age of 74. There are some nice obituaries in the New York Times and the Daily Telegraph. Also, his Wikipedia page has some good information. I met Clive on several occasions and he was “a scholar and a gentleman”, a remarkably humble man given his outstanding achievements and someone who was always willing to help young researchers. The world of forecasting will miss him.

### Neil Postman on technological change

technology

Neil Postman was Professor of Communication at New York University until his death in 2003. He wrote many wonderfully insightful and thought-provoking articles and books about television, education, technology and childhood. I recently came across a speech he gave in 1998 on “Five things we need to know about technological change”. Here is an online transcript. The five things are:

### Time series packages on R

R

time series

There is now an official CRAN Task View for Time Series. This will replace my earlier list of time series packages for R, and provide a more visible and useful entry point for people wanting to use R for time series analysis. If I have missed anything on the list, please let me know.

### Tracking changes in LaTeX files

computing

LaTeX

When I write a paper, it usually goes through many versions before being submitted to a journal. I keep track of the different versions by renaming the file when I’m about to make major changes, or when I receive a new version from a coauthor. The files are named `file1.tex`, `file2.tex`, etc. where “`file`” is replaced by something more meaningful.

### Tracking changes in text files

computing

A common issue that arises with text files (e.g., R code) is to identify changes that have been made between versions. I usually number my R files as file1.R, file2.R, etc. (with “file” replaced by something more meaningful), with the number indicating the version of the file. Version numbers change whenever I send the file to someone else to modify, or whenever I make major changes myself.
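One generic way to see what changed between two numbered versions is a line-by-line diff. The following is a minimal sketch using Python's standard `difflib` module; it is not necessarily the tool the full post recommends, and the function name `diff_versions` and the two tiny file contents are made up for illustration.

```python
import difflib


def diff_versions(old_text, new_text, old_name="file1.R", new_name="file2.R"):
    """Return a unified diff (as a list of lines) between two versions of a text file."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile=old_name, tofile=new_name, lineterm=""
        )
    )


# Hypothetical contents of two versions of an R script
old = "x <- rnorm(100)\nfit <- lm(y ~ x)\n"
new = "x <- rnorm(200)\nfit <- lm(y ~ x)\n"

# Lines starting with "-" were removed and lines starting with "+" were added
for line in diff_versions(old, new):
    print(line)
```

To compare real files, the two strings could be read from disk first, for example with `open("file1.R").read()`.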

### Supervision award

supervision

prizes

Last night I received the Vice-Chancellor’s postgraduate supervision award at a function at Government House. I am deeply honoured that my students thought to nominate me for the award. I think I was as surprised as anyone to win, and some people have asked me what I did to deserve it. Actually, I’m not sure that I did deserve it, but I can tell you what I told the award committee who chose me.

### Creating a BibTeX file from a Google Library

references

As you will have seen if you poke around these pages, I have a Google Library of books in statistics and forecasting. This is intended to be a complete copy of what is on the shelves in my office (about 400 books), plus books that I would like on my shelves if I had more space.

### Dodgy forecasting

forecasting

A few years ago I did some forecasting work for a commonwealth government department and found that they were forecasting a $5 billion budget using the FORECAST command in Excel. Worse, they were fitting a regression through only three observations and they were not even the most recent observations.
