Subject ▸ Computing

Finding distinct rows of a tibble

I’ve been using R or its predecessors for about 30 years, so I tend to know a lot about R, but I don’t necessarily know how to use modern R tools. Lately, I’ve been teaching my students the tidyverse approach to data analysis, which means that I need to unlearn some old approaches and to re-learn them using new tools. But old dogs and new tricks… Yesterday, I was teaching a class where I needed to extract some rows of a data set.
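As a rough illustration of the kind of task the title refers to, here is a minimal sketch using `distinct()` from dplyr. The tibble `df` and its columns `id` and `group` are made up for the example and are not the data from the class.

```r
library(dplyr)
library(tibble)

# Hypothetical data: some rows are exact duplicates, and some ids repeat
df <- tibble(
  id    = c(1, 1, 2, 2, 3),
  group = c("a", "a", "b", "c", "c")
)

# Keep one copy of each row that is identical across all columns
unique_rows <- distinct(df)

# Keep the first row for each value of id, retaining the other columns
first_per_id <- distinct(df, id, .keep_all = TRUE)
```

Without `.keep_all = TRUE`, `distinct(df, id)` would return only the `id` column, which is often not what is wanted when the aim is to extract whole rows.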

Read More…

Converting to blogdown

This website has gone through several major updates over the years. It began in 1993 as some handcrafted HTML files, transitioned to Joomla and later to WordPress. Then it slowly grew into a collection of ten connected WordPress installations that became increasingly difficult to maintain, and rather slow. So I’ve now converted the entire site to Blogdown/Hugo. Nearly 700 pages of WordPress content have been translated to markdown. I decided to drop a few parts of the site, notably the pages for my 1998 forecasting textbook.

Read More…

Hadley Wickham Master R Developer course coming to Melbourne

Hadley Wickham’s popular R developer course is coming to Melbourne on 12-13 December 2016. Bookings can be made via Eventbrite. Hadley, of course, is the developer of the wonderful tidyverse set of R packages including ggplot2, dplyr, tidyr, readr, purrr, tibble, and many more. He is the author of several books including the new “R for Data Science”, the chief scientist at RStudio, and a fellow cocktail enthusiast. From the course blurb:

Read More…

Sample quantiles 20 years later

Almost exactly 20 years ago I wrote a paper with Yanan Fan on how sample quantiles are computed in statistical software. It was cited 43 times in the first 10 years, and 457 times in the next 10 years, making it my third paper to receive 500+ citations. So what happened in 2006 to suddenly increase the citations? I think it was a combination of things: I wrote a new quantile() function (with Ivan Frohne) which made it into R core v2.
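To see why the choice of definition matters, here is a small sketch using the `type` argument of base R's `quantile()`, which selects between nine sample quantile definitions. The data vector `x` is invented for illustration.

```r
# Hypothetical small sample, where the definitions disagree most visibly
x <- c(1, 2, 3, 4, 10)

# Lower quartile under each of the nine definitions available in quantile()
q <- sapply(1:9, function(t) quantile(x, probs = 0.25, type = t, names = FALSE))
names(q) <- paste0("type", 1:9)

# The default used when type is not specified (type 7)
q_default <- quantile(x, probs = 0.25, names = FALSE)

print(round(q, 3))
```

With only five observations the nine answers are not all the same, whereas for large samples the differences become negligible.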

Read More…

Starting a career in data science

I received this email from one of my undergraduate students: I’m writing to you asking for advice on how to start a career in Data Science. Other professions seem a bit more straight forward, in that accountants for example simply look for Internships and ways into companies from there. From my understanding, the nature of careers in data science seem to be on a project-to-project basis. I’m not sure how to get my foot stuck in the door.

Read More…

RStudio just keeps getting better

RStudio has been a life-changer for the way I work, and for how I teach data analysis. I still have a couple of minor frustrations with it, but they are slowly disappearing as RStudio adds features. I use dual monitors and I like to code on one monitor and have the console and plots on the other monitor. Otherwise I see too little context, and long lines get wrapped, making the code harder to read.

Read More…

Who's downloading the forecast package?

The GitHub page for the forecast package currently shows the following information. Note the downloads figure: 264K/month. I know the package is popular, but that seems crazy. Also, the downloads figure on GitHub only counts the downloads from the RStudio mirror, and ignores downloads from the other 125 mirrors around the world.

Here are the top ten downloaded packages from the last month:

    library(cranlogs)
    cran_top_downloads(when='last-month')

    rank package    count  from       to
       1 zoo        308290 2015-11-09 2015-12-08
       2 forecast   263797 2015-11-09 2015-12-08
       3 Rcpp       260636 2015-11-09 2015-12-08
       4 lmtest     258810 2015-11-09 2015-12-08
       5 fpp        244989 2015-11-09 2015-12-08
       6 expsmooth  244179 2015-11-09 2015-12-08
       7 fma        243556 2015-11-09 2015-12-08
       8 tseries    243172 2015-11-09 2015-12-08
       9 stringi    199384 2015-11-09 2015-12-08
      10 ggplot2    192072 2015-11-09 2015-12-08

OK, that is very weird.

Read More…

The hidden benefits of open-source software

I’ve been having discussions with colleagues and university administration about the best way for universities to manage home-grown software. The traditional business model for software is that we build software and sell it to everyone willing to pay. Very often, that leads to a software company spin-off that has little or nothing to do with the university that nurtured the development. Think MATLAB, S-Plus, Minitab, SAS and SPSS, all of which grew out of universities or research institutions.

Read More…

forecast package v6.2

It is a while since I last updated the CRAN version of the forecast package, so I uploaded the latest version (6.2) today. The GitHub version remains the most up-to-date version and is already two commits ahead of the CRAN version. This update is mostly bug fixes and additional error traps. The full ChangeLog is listed below.

Many unit tests added using testthat.
Fixed bug in ets() when very short seasonal series were passed in a data frame.

Read More…

Upcoming talks in California

I’m back in California for the next couple of weeks, and will give the following talk at Stanford and UC Davis.

Optimal forecast reconciliation for big time series data

Time series can often be naturally disaggregated in a hierarchical or grouped structure. For example, a manufacturing company can disaggregate total demand for their products by country of sale, retail outlet, product type, package size, and so on. As a result, there can be millions of individual time series to forecast at the most disaggregated level, plus additional series to forecast at higher levels of aggregation.
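To make the idea of a hierarchical structure concrete, here is a toy sketch in base R with a total and two disaggregated series. The series names, the forecast values, and the use of a simple least-squares projection are all assumptions chosen for illustration. This is one basic way to reconcile forecasts, not necessarily the method presented in the talk.

```r
# Hypothetical two-level hierarchy: Total = A + B.
# Each row of the summing matrix S says which bottom-level series it adds up.
S <- rbind(
  Total = c(1, 1),
  A     = c(1, 0),
  B     = c(0, 1)
)

# Hypothetical base forecasts made independently for each series.
# They are incoherent because 45 + 50 does not equal 100.
y_hat <- c(Total = 100, A = 45, B = 50)

# Least-squares reconciliation: project the base forecasts onto the
# set of forecasts that add up correctly through S
P <- solve(t(S) %*% S) %*% t(S)
y_tilde <- drop(S %*% P %*% y_hat)

print(round(y_tilde, 2))
```

After the projection, the reconciled forecast for Total equals the sum of the reconciled forecasts for A and B. With millions of bottom-level series, S becomes very large, which is where the computational challenge comes from.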

Read More…