Why God never received tenure

  1. He had only one major publication.
  2. It was in Hebrew.
  3. It had no references.
  4. It wasn’t pub­lished in a ref­er­eed journal.
  5. Some even doubt he wrote it by himself.
  6. It may be true that he cre­ated the world, but what has he done since then?
  7. The sci­en­tific com­mu­nity has had a hard time repli­cat­ing his results.
  8. He never applied to the ethics board for per­mis­sion to use human subjects.
  9. When one exper­i­ment went awry he tried to cover it by drown­ing his subjects.
  10. When sub­jects didn’t behave as pre­dicted, he deleted them from the sample.
  11. He rarely came to class, just told stu­dents to read the book.
  12. Some say he had his son teach the class.
  13. He expelled his first two stu­dents for learning.
  14. Although there were only 10 require­ments, most of his stu­dents failed his tests.
  15. His office hours were infre­quent and often held on lim­ited access moun­tain tops.
  16. No record of work­ing well with colleagues.

This list must have appeared on thou­sands of sites and I’ve not been able to track down the source. In fact, a search on the phrase yields over 43,000 results on Google. There are another 3,700 where it is titled “Why God never received a PhD”. If any­one knows the orig­i­nal source, please post a com­ment. (After all, this is a research site and we have to credit sources appropriately.)


How to fail a PhD

I read an inter­est­ing post today by Matt Might on “10 rea­sons PhD stu­dents fail”, and I thought it might be help­ful to reflect on some of the bar­ri­ers to PhD com­ple­tion that I’ve seen. Matt’s ideas are not all rel­e­vant to Aus­tralian PhDs, so I have come up with my own list below.  Here are the seven steps to failure.

1. Wait for your super­vi­sor to tell you what to do

A good super­vi­sor will not tell you what to do. PhD stu­dents are not meant to be research assis­tants, and a PhD is not an extended under­grad­u­ate assign­ment. So wait­ing to be told what to do next will usu­ally get you nowhere.

By the time you grad­u­ate with a PhD, you are sup­posed to be an inde­pen­dent researcher. That means hav­ing your own ideas, set­ting your own research direc­tions, and choos­ing what to do your­self. In prac­tice, your super­vi­sor will usu­ally need to tell you what to do for the first year, but even­tu­ally you need to set the research agenda your­self. By the third year you should cer­tainly know more about your topic than your super­vi­sor, and so are in a bet­ter posi­tion to know what to do next.

2. Wait for inspiration

Sitting around waiting for great ideas to pop into your head is unlikely to work. Most of my best ideas come after a lot of work trying different things and becoming totally immersed in the problem.

A good way to start is often to try to repli­cate some­one else’s research, or apply someone’s method on a dif­fer­ent data set. In the process you might notice some­thing that doesn’t quite work, or you might think of a bet­ter way to do it. At the very least you will have a deeper under­stand­ing of what they have done than you will get by sim­ply read­ing their paper.

Research often involves dead-ends, wrong turns, and fail­ures. It’s a lit­tle like explor­ing a pre­vi­ously unmapped part of the world. You have no idea what you’ll find there, but unless you start wan­der­ing around you’ll never dis­cover anything.

3. Aim for perfection

Per­fec­tion takes for­ever, and so stu­dents who are aim­ing for per­fec­tion never fin­ish. Instead they spend years try­ing to make the the­sis that lit­tle bit bet­ter, pol­ish­ing every sen­tence until it gleams. Every researcher needs to accept that research involves mak­ing mis­takes, often pub­licly. That’s the nature of the activity.

Don’t wait until your paper or the­sis is per­fect. Work through a few drafts, and then stop, rec­og­niz­ing that there are prob­a­bly still some errors remaining.

4. Aim too high

Many stu­dents imag­ine they will write a the­sis that will rev­o­lu­tionise the field and lead to wide acclaim and a bril­liant aca­d­e­mic career. Occa­sion­ally that does hap­pen, but extremely rarely. A PhD is an appren­tice­ship in research, and like all appren­tice­ships, you are learn­ing the craft, mak­ing mis­takes, and you are unlikely to pro­duce your best work at such an early stage in your research career.

It really doesn’t mat­ter what your topic is pro­vided you find it inter­est­ing and that you find some­thing to say about it. Your PhD is a demon­stra­tion that you know how to do research, but your most impor­tant and high impact research will prob­a­bly come later.

My own PhD research was on sto­chas­tic non­lin­ear dif­fer­en­tial equa­tions and I haven’t touched them since. It showed I could do high level research, but I’d lost inter­est by the time I fin­ished and I’ve moved onto other things. Few peo­ple ever cite the research that came out of my PhD, but it served its purpose.

5. Aim too low

My rule-of-thumb for an Aus­tralian PhD is about three to four pieces of pub­lish­able work. They don’t have to actu­ally be pub­lished, but the exam­in­ers like to see enough mate­r­ial to make up three papers that would be accept­able in a rep­utable schol­arly jour­nal. Just writ­ing 200 pages is not enough if the mate­r­ial is not suf­fi­ciently orig­i­nal or inno­v­a­tive to be pub­lish­able in a jour­nal. Point­ing out errors in every­one else’s work is usu­ally not enough either, as most jour­nals will expect you to have some­thing to say your­self in addi­tion to what­ever cri­tiques you make of pre­vi­ous work.

6. Fol­low every side issue

Just because you use a max­i­mum like­li­hood method, doesn’t mean you have to read the entire like­li­hood lit­er­a­ture. Of course you will learn some­thing if you do, but that isn’t the point. The pur­pose of a PhD is not so that you can learn as much as you can about every­thing. A PhD is train­ing in research, and researchers need to be able to pub­lish their find­ings with­out hav­ing to be expert in every area that is some­how related to their cho­sen topic.

Of course, you do need to read as much of the rel­e­vant lit­er­a­ture as pos­si­ble. A key skill in research is learn­ing what is rel­e­vant and what is not. Ask your super­vi­sor if you are not sure.

7. Leave all the writ­ing to the end

In some fields it seems to be stan­dard prac­tice to have a “writ­ing up” phase after doing the research. Per­haps that works in exper­i­men­tal sci­ences, but it doesn’t work in the math­e­mat­i­cal sci­ences. You haven’t a hope of remem­ber­ing all the good ideas you had in first and sec­ond year if you don’t attempt to write them down until near the end of your third year.

I encour­age all my stu­dents to start writ­ing from the first week. In the first year, write a series of notes sum­ma­riz­ing what you’ve learned and what research ideas you’ve had. It can be help­ful to use these notes to show your super­vi­sor what you’ve been up to each time you meet. In the sec­ond year, you should have fig­ured out your spe­cific topic and have a rough idea of the table of con­tents. So start writ­ing the parts you can. You should be able to turn some of your first-year notes into sec­tions of the rel­e­vant chap­ters. By the third year you are fill­ing in the gaps, adding sim­u­la­tion results, tidy­ing up proofs, etc.



Econometrics and R

Econo­me­tri­cians seem to be rather slow to adopt new meth­ods and new tech­nol­ogy (com­pared to other areas of sta­tis­tics), but slowly the use of R is spread­ing. I’m now receiv­ing requests for ref­er­ences show­ing how to use R in econo­met­rics, and so I thought it might be help­ful to post a few sug­ges­tions here.

  • A useful free online resource is “Econometrics in R” by Grant Farnsworth. It covers some common econometric methods including heteroskedasticity in regression, probit and logit models, tobit regression, and quantile regression. In the time series area, it covers ARIMA, ARFIMA, ARCH and GARCH models, as well as a few of the standard tests for unit roots and autocorrelation. It’s brief but it does provide code that will help people familiar with econometrics to get started using R.
  • If you are prepared to pay, an excellent book is Kleiber and Zeileis’s Applied Econometrics with R. It covers similar ground to Farnsworth but in more detail. This is the book I usually recommend to anyone with an econometrics background who wants to get started with R. It would also be very suitable for someone studying econometrics at about upper undergraduate level. Achim Zeileis is a well-known expert in R programming, so you can be sure the code in this book is efficient and well-written.
  • Another useful book is Pfaff’s Analysis of Integrated and Cointegrated Time Series with R, which covers unit root tests, cointegration, VECM models, etc.
  • Vinod’s Hands-On Intermediate Econometrics Using R contains a lot of examples and code snippets, which can be very helpful. Unfortunately, the examples do not always show best practice in R coding.
  • More detailed case studies using R are provided in Advances in Social Science Research Using R, edited by H.D. Vinod. Many of the case studies are from econometrics, including an excellent chapter by Bruce McCullough on econometric computing.

There are of course dozens of books on R with a more sta­tis­ti­cal per­spec­tive, includ­ing sev­eral on time series. But I will leave them for another post.


Job advertisements

Employ­ers often con­tact me ask­ing how to find a good sta­tis­ti­cian, econo­me­tri­cian or fore­caster for their orga­ni­za­tion. Stu­dents also ask me how to go about find­ing a job when they fin­ish their degree. This post is for both groups, hope­fully mak­ing it eas­ier for them to pair up appropriately.

First, the main­stream media out­lets are not usu­ally good places to adver­tise. It seems that few peo­ple read printed news­pa­pers anymore.

The gen­eral online job sites such as seek.com.au are ok, but job-seekers can find it hard to find the rel­e­vant open­ings because job titles are so var­ied. In the gen­eral area of sta­tis­tics, a job can appear under the titles “sta­tis­ti­cian”, “ana­lyst”, “data miner”, “data man­ager”, “finan­cial engi­neer” and a few dozen other labels. Many employ­ers don’t place the job in the best cat­e­gory, often because they don’t under­stand what skills are required to do the job. Nev­er­the­less, if I was look­ing for a job, I would cer­tainly set up some auto­mated searches on these sites.

In sta­tis­tics, there are well-established job web­sites that are the best places for both employ­ers and poten­tial employ­ees to meet up.

  • Aus­tralia & New Zealand: www.statsci.org/jobs. This is a fan­tas­tic ser­vice from the Sta­tis­ti­cal Soci­ety of Aus­tralia and includes a lot of jobs, par­tic­u­larly those requir­ing higher degrees.
  • United States:  amstat.org/jobweb. This is a sim­i­lar ser­vice from the Amer­i­can Sta­tis­ti­cal Asso­ci­a­tion for jobs in the USA.

Unfor­tu­nately, there is no sim­i­lar ser­vice in the UK, and I do not know what is pro­vided in other countries.

There is a list of econo­met­ric jobs sites at econo­met­ri­clinks.

There are e-mail lists that are widely sub­scribed and often con­tain job postings.

A lot more email lists are men­tioned on econo­met­ri­clinks, some of which may be appro­pri­ate for job advertisements.

If I’ve missed any good places to adver­tise jobs, please add them in the comments.


Benchmarks for forecasting

Every week I reject papers sub­mit­ted to the Inter­na­tional Jour­nal of Fore­cast­ing because they present new meth­ods with­out ever attempt­ing to demon­strate that the new meth­ods are bet­ter than exist­ing meth­ods. It is a pol­icy of the jour­nal that every new method must be com­pared to stan­dard bench­marks and exist­ing meth­ods before the paper will even be con­sid­ered for publication.

For uni­vari­ate time series meth­ods, it is not dif­fi­cult. As a min­i­mum, com­par­isons should be made against a naive method and a stan­dard method such as an ARIMA model.

  1. The naive method for non-seasonal data is based on a ran­dom walk — all fore­casts are equal to the last obser­va­tion. For sea­sonal data, the best naive method is to use the last obser­va­tion from the same sea­son. That is, for monthly data, fore­casts for Feb­ru­ary are all equal to the last Feb­ru­ary observation.
  2. Com­par­isons with ARIMA mod­els used to be prob­lem­atic because some authors did not have suf­fi­cient exper­tise to fit a good ARIMA model, and so com­par­isons were some­times made, for exam­ple, against a non-seasonal AR model when the data were obvi­ously sea­sonal. This should no longer be a prob­lem as there are now good auto­matic ARIMA algo­rithms such as auto.arima() in the fore­cast pack­age for R.
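
As a rough illustration, the two naive benchmarks described above can be sketched in a few lines of Python. This is only a sketch: the function names and the toy data are made up for this example and do not come from any package.

```python
def naive_forecast(y, h):
    """Random-walk (naive) forecast: repeat the last observation h times."""
    if not y:
        raise ValueError("need at least one observation")
    return [y[-1]] * h


def seasonal_naive_forecast(y, h, m):
    """Seasonal naive forecast with seasonal period m.

    Each forecast equals the most recent observation from the same season,
    e.g. with monthly data (m = 12) every February forecast equals the last
    observed February.
    """
    if len(y) < m:
        raise ValueError("need at least one full season of observations")
    last_season = y[-m:]
    return [last_season[i % m] for i in range(h)]


# Toy quarterly series (m = 4); the values are invented for illustration.
y = [10, 20, 30, 40, 12, 22, 32, 42]
print(naive_forecast(y, 3))              # [42, 42, 42]
print(seasonal_naive_forecast(y, 6, 4))  # [12, 22, 32, 42, 12, 22]
```

Any proposed method would then be scored on the same hold-out data as these forecasts.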

For mul­ti­vari­ate time series, the same uni­vari­ate bench­marks can be used.

For methods involving covariates, a standard linear regression can often provide a basic benchmark. Authors sometimes argue that linear regression is not appropriate for their data (e.g., because of non-linear relationships or correlations), but that is not the point. I don’t care if the linear regression is appropriate; I just want them to be able to show that their method provides better predictions than a standard and simple benchmark. If the new method can’t beat a simple standard regression, especially when that regression is inappropriate for the data, there is not much point proceeding.

The best bench­marks are those that are already pub­lished. For exam­ple, new uni­vari­ate time series meth­ods can be com­pared with the M-competition or M3 com­pe­ti­tion data where there are already pub­lished eval­u­a­tions on large num­bers of obser­va­tions.  In this case, authors do not even have to imple­ment the bench­marks them­selves. All they have to do is use the same test sets and com­pare their MAPE or sMAPE val­ues with those pub­lished for other methods.
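
For readers unfamiliar with these measures, here is a minimal Python sketch of MAPE and sMAPE. Several variants of sMAPE are in use; the one below (absolute error divided by the average of the absolute actual and absolute forecast) is an assumption made for this example, so check the exact definition used in the published results before comparing numbers. The function names and data are hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent (undefined if any actual is 0)."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def smape(actual, forecast):
    """Symmetric MAPE, in percent, scaled by the mean of |actual| and |forecast|."""
    return 100 * sum(
        abs(a - f) / ((abs(a) + abs(f)) / 2) for a, f in zip(actual, forecast)
    ) / len(actual)


# Invented test-set values and forecasts, for illustration only.
actual = [100, 200, 400]
forecast = [110, 180, 400]
print(round(mape(actual, forecast), 2))   # 6.67
print(round(smape(actual, forecast), 2))  # 6.68
```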

Just beat­ing the bench­marks is not, of itself, jus­ti­fi­ca­tion for pub­li­ca­tion, but it helps. It is also nec­es­sary to be able to describe your new method in enough detail and clar­ity that oth­ers could imple­ment it. It is usu­ally also nec­es­sary to show that the method works on more than one data set. It is rel­a­tively easy to find a method that out­per­forms the bench­marks on a sin­gle data set; but that is no rea­son to think it will be use­ful on other data sets. The M-competitions are use­ful as they pro­vide a large set of data for com­par­isons. If a method does well on 1001 or 3003 time series, then I know it is not a fluke.

Sim­i­larly, not being able to beat the bench­marks does not, of itself, mean the paper is dead. It may be that the new method is not far behind the bench­marks but has other advan­tages. Or the new method may be par­tic­u­larly good in some cir­cum­stances or for a small sub­set of problems.

The job of the author is to care­fully and per­sua­sively present the case for their pro­posed method. As an edi­tor, I am look­ing for authors to con­vince me of the value of their ideas. Papers propos­ing new fore­cast­ing meth­ods must include com­par­isons with stan­dard bench­marks, and should involve large scale empir­i­cal evaluations.


Transforming data with zeros

I’m cur­rently work­ing with a hydrol­o­gist and he raised a ques­tion that occurs quite fre­quently with real data — what do you do when the data look like they need a log trans­for­ma­tion, but there are zero values?

I asked the ques­tion on stats.stackexchange.com and received some use­ful sug­ges­tions. What fol­lows is a sum­mary based on these answers, my own expe­ri­ence, plus a few papers I dis­cov­ered that deal with the topic. In gen­eral, the most appro­pri­ate course of action depends on the model and the con­text. Zeros can arise for sev­eral dif­fer­ent rea­sons each of which may have to be treated differently.

Box-Cox (BC) transformations

There is a two-parameter ver­sion of the Box-Cox trans­for­ma­tion that allows a shift before transformation:

g(y;\lambda_{1}, \lambda_{2}) =
\begin{cases}
\frac {(y+\lambda_{2})^{\lambda_1} - 1} {\lambda_{1}} & \mbox{when } \lambda_{1} \neq 0 \\
\log (y + \lambda_{2}) & \mbox{when } \lambda_{1} = 0
\end{cases}.

The usual Box-Cox transformation sets \lambda_2=0. One common choice with the two-parameter version is \lambda_1=0 and \lambda_2=1 which has the neat property of mapping zero to zero. There is even an R function for this: log1p(). More generally, both parameters can be estimated. In R, the boxcoxfit() function in package geoR (boxcox.fit() in some older versions of the package) will fit the parameters.

Alter­na­tively, when \lambda_1=0, it has been sug­gested that \lambda_2 should be approx­i­mately one half of the small­est, non-zero value. Another sug­ges­tion is that \lambda_2 should be the square of the first quar­tile divided by the third quar­tile (Sta­hel,  2002).
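
To make the two-parameter transformation and the two shift heuristics concrete, here is a small Python sketch. The function names are hypothetical, and two details are assumptions of this example and not taken from the sources cited: the quartiles are computed with Python's default `statistics.quantiles` method, and in the usage lines they are computed on the positive values only.

```python
import math
import statistics


def boxcox2(y, lambda1, lambda2=0.0):
    """Two-parameter Box-Cox transformation of a single value y."""
    shifted = y + lambda2
    if shifted < 0 or (shifted == 0 and lambda1 <= 0):
        raise ValueError("y + lambda2 is outside the domain for this lambda1")
    if lambda1 == 0:
        return math.log(shifted)
    return (shifted ** lambda1 - 1) / lambda1


def half_min_shift(data):
    """Shift heuristic: half of the smallest non-zero value."""
    return min(v for v in data if v > 0) / 2


def quartile_shift(data):
    """Shift heuristic: (first quartile)^2 / (third quartile)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q1 ** 2 / q3


# Invented data with some zeros, for illustration only.
data = [0.0, 0.0, 0.4, 1.2, 3.5, 8.0, 20.0]

# lambda1 = 0 and lambda2 = 1 gives log(1 + y), which maps zero to zero.
print(boxcox2(0.0, 0, 1))                                   # 0.0
print(math.isclose(boxcox2(3.5, 0, 1), math.log1p(3.5)))    # True

# log(y + lambda2) with each of the two suggested shifts.
shift_a = half_min_shift(data)
shift_b = quartile_shift([v for v in data if v > 0])
print([round(boxcox2(v, 0, shift_a), 3) for v in data])
print([round(boxcox2(v, 0, shift_b), 3) for v in data])
```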

I’ve used func­tions like this sev­eral times includ­ing in Hyn­d­man & Grun­wald (2000) where we used \log(y+\lambda_2) applied to daily rain­fall data.

One sim­ple spe­cial case is the square root where \lambda_2=0 and \lambda_1=0.5. This works fine with zeros (although not with neg­a­tive val­ues). How­ever, often the square root is not a strong enough trans­for­ma­tion to deal with the high lev­els of skew­ness seen in real data.

Inverse hyper­bolic sine (IHS) transformation

An alter­na­tive trans­for­ma­tion fam­ily was pro­posed by John­son (1949) and is defined by

f(y,\theta) = \text{sinh}^{-1}(\theta y)/\theta = \log(\theta y + (\theta^2y^2+1)^{1/2})/\theta,

where \theta>0. For any value of \theta, zero maps to zero. There is also a two para­me­ter ver­sion allow­ing a shift, just as with the two-parameter BC trans­for­ma­tion. Bur­bidge, Magee and Robb (1988) also dis­cuss the IHS trans­for­ma­tion includ­ing esti­ma­tion of \theta.

The IHS trans­for­ma­tion works with data defined on the whole real line includ­ing neg­a­tive val­ues and zeros. For large val­ues of y it behaves like a log trans­for­ma­tion, regard­less of the value of \theta (except 0). As \theta\rightarrow0, f(y,\theta)\rightarrow y.
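
The properties listed above are easy to check numerically. The following Python sketch uses `math.asinh` from the standard library; the function name `ihs` is made up for this example.

```python
import math


def ihs(y, theta):
    """Inverse hyperbolic sine transformation: asinh(theta * y) / theta."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    return math.asinh(theta * y) / theta


# Zero maps to zero for any theta > 0.
print(ihs(0.0, 2.0))  # 0.0

# Agrees with the explicit logarithmic form of the definition.
y, theta = 3.0, 0.5
explicit = math.log(theta * y + math.sqrt(theta ** 2 * y ** 2 + 1)) / theta
print(math.isclose(ihs(y, theta), explicit))  # True

# Defined for negative values too, and antisymmetric about zero.
print(math.isclose(ihs(-y, theta), -ihs(y, theta)))  # True

# For large y it behaves like log(2 * theta * y) / theta.
print(math.isclose(ihs(1e6, 1.0), math.log(2e6), rel_tol=1e-9))  # True

# As theta approaches zero, the transformation approaches the identity.
print(math.isclose(ihs(5.0, 1e-6), 5.0, rel_tol=1e-6))  # True
```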

Mixture models

For continuous data, there can be a discrete spike at zero which can be associated with the sensitivity of the measurements. For example in wind energy, wind below 2 m/s is often recorded as zero and the distribution of wind energy produced is continuous with a spike at zero.

With rainfall data, there is a spike at zero for a different reason: it didn’t rain. These are genuine zeros (rather than undetectably small values).

With insur­ance data, a sim­i­lar phe­nom­e­non occurs — the dis­tri­b­u­tion of claims is con­tin­u­ous with a large spike at zero.

A fourth exam­ple might be income data — zero if some­one is not in paid work, but a con­tin­u­ous pos­i­tive value otherwise.

In each of these cases, a mix­ture model is prob­a­bly the most appro­pri­ate where part of the model deter­mines the prob­a­bil­ity of a zero, and the other part of the model deter­mines the dis­tri­b­u­tion of the data when it is pos­i­tive. We also used some­thing like this in Hyn­d­man and Grun­wald (2000).
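
A very simple two-part model of this kind can be sketched as follows. The choice of a log-normal distribution for the positive part is an assumption made purely for illustration (it is not necessarily what was used in the paper cited above), and the function names are hypothetical.

```python
import math
import statistics


def fit_zero_lognormal(data):
    """Fit a two-part model: P(Y = 0) plus a log-normal for Y > 0."""
    if any(v < 0 for v in data):
        raise ValueError("data must be non-negative")
    positives = [v for v in data if v > 0]
    if len(positives) < 2:
        raise ValueError("need at least two positive values")
    logs = [math.log(v) for v in positives]
    return {
        "p_zero": 1 - len(positives) / len(data),  # probability of a zero
        "mu": statistics.mean(logs),               # mean of log(Y) given Y > 0
        "sigma": statistics.stdev(logs),           # sd of log(Y) given Y > 0
    }


def mixture_mean(fit):
    """Mean of the mixture: P(Y > 0) times the log-normal mean."""
    return (1 - fit["p_zero"]) * math.exp(fit["mu"] + fit["sigma"] ** 2 / 2)


# Invented data: half the values are exact zeros.
data = [0.0, 0.0, 0.0, 1.0, math.e, math.e ** 2]
fit = fit_zero_lognormal(data)
print(fit["p_zero"])       # 0.5
print(mixture_mean(fit))
```

In a regression setting, each part would have its own covariates, for example a logistic regression for the probability of a zero and a separate model for the positive values.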


The tourism forecasting competition

Recently I wrote a paper enti­tled “The tourism fore­cast­ing com­pe­ti­tion” in which we (i.e., George Athana­sopou­los, Haiyan Song, Doris Wu and I) com­pared var­i­ous fore­cast­ing meth­ods on a rel­a­tively large set of tourism-related time series. The paper has been accepted for pub­li­ca­tion in the Inter­na­tional Jour­nal of Fore­cast­ing. (When I sub­mit a paper to the IJF it is always han­dled by another edi­tor. In this case, Mike Clements han­dled the paper and it went through sev­eral revi­sions before it was finally accepted. Just to show the process is unbi­ased, I have had a paper rejected by the jour­nal dur­ing the period I have been Editor-in-Chief.)

We are now open­ing up the com­pe­ti­tion to any­one who thinks they can do bet­ter than the best meth­ods we imple­mented in the paper. Meth­ods will be eval­u­ated based on the small­est MASE (Mean Absolute Scaled Error) — see Hyn­d­man & Koehler (2006) for details of this statistic.
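
For anyone who has not met the MASE before, here is a minimal Python sketch of the idea: out-of-sample absolute errors are scaled by the in-sample mean absolute error of a naive forecast. The function name, the seasonal period argument `m`, and the toy numbers are assumptions for this example; see the paper for the exact definition used in the competition.

```python
def mase(train, actual, forecast, m=1):
    """Mean Absolute Scaled Error.

    The scale is the in-sample mean absolute error of the naive forecast
    (m = 1) or the seasonal naive forecast (m = seasonal period).
    """
    if len(train) <= m:
        raise ValueError("training series is too short for this period")
    scale = sum(abs(train[t] - train[t - m]) for t in range(m, len(train))) / (
        len(train) - m
    )
    if scale == 0:
        raise ValueError("in-sample naive errors are all zero")
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


# Invented yearly series, for illustration only.
train = [10, 12, 11, 15, 14]
actual = [16, 15]
forecast = [14, 14]
print(mase(train, actual, forecast))  # 0.75
```

A value below 1 means the forecasts are, on average, more accurate than the in-sample one-step naive forecasts.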

To make it interesting, there is a prize. The overall winner will collect $AUD500 and will be invited to contribute a discussion paper to the International Journal of Forecasting describing their methodology and giving their results, provided either the monthly MASE results are better than 1.38, the quarterly results are better than 1.43 or the yearly results are better than 2.28. These thresholds are the MASE values achieved by the best-performing methods in the analysis of these data described in Athanasopoulos et al (2010). In other words, the winner has to beat the best results in this paper for at least one of the three sets of series. It will also be necessary that the winner be able to describe their method clearly, in sufficient detail to enable replication and in a form suitable for the International Journal of Forecasting. The paper would appear in the April 2011 issue of the IJF.

The com­pe­ti­tion is being hosted by the inno­v­a­tive folks at kaggle.com. Head over to kaggle.com/tourism1 to get the data and enter the competition.

The com­pe­ti­tion will be in two stages. Stage 1 involves only the annual data — 518 time series. You need to sub­mit fore­casts of the next four obser­va­tions for each series before 20 Sep­tem­ber 2010. Stage 2 will involve the monthly and quar­terly data and will begin after Stage 1 closes.

Good luck!


Twenty rules for good graphics

Something I repeatedly include in referee reports, and in my responses to authors who have submitted papers to the International Journal of Forecasting, is a set of comments designed to improve the quality of the graphics. Recently someone asked on stats.stackexchange.com about best practices for producing plots. So I thought it might be helpful to collate some of the answers given there and add a few comments of my own taken from things I’ve written for authors.

The fol­low­ing “rules” are in no par­tic­u­lar order.

  1. Use vec­tor graph­ics such as eps or pdf. These scale prop­erly and do not look fuzzy when enlarged. Do not use jpeg, bmp or png files as these will look fuzzy when enlarged, or if saved at very high res­o­lu­tions will be enor­mous files. Jpegs in par­tic­u­lar are designed for pho­tographs not sta­tis­ti­cal graphics.
  2. Use read­able fonts. For graph­ics I pre­fer sans-serif fonts such as Hel­vetica or Arial. Make sure the font size is read­able after the fig­ure is scaled to what­ever size it will be printed.
  3. Avoid clut­tered leg­ends. Where pos­si­ble, add labels directly to the ele­ments of the plot rather than use a leg­end at all. If this won’t work, then keep the leg­end from obscur­ing the plot­ted data, and make it small and neat.
  4. If you must use a leg­end, move it inside the plot, in a blank area.
  5. No dark shaded backgrounds. Excel always adds a nasty dark gray background by default, and I’m always asking authors to remove it. Graphics print much better with a white background. The ggplot package for R also uses a gray background (although it is lighter than the Excel default). I don’t mind the ggplot version so much as it is used effectively with white grid lines. Nevertheless, even the light gray background doesn’t lend itself to printing/photocopying. White is better.
  6. Avoid dark, dom­i­nat­ing grid lines (such as those pro­duced in Excel by default). Grid lines can be use­ful, but they should be in the back­ground (light gray on white or white on light gray).
  7. Keep the axis lim­its sen­si­ble. You don’t have to include a zero (even if Excel wants you to). The defaults in R work well. The basic idea is to avoid lots of white space around the plot­ted data.
  8. Make sure the axes are scaled prop­erly. Another Excel prob­lem is that the hor­i­zon­tal axis is some­times treated cat­e­gor­i­cally instead of numer­i­cally. If you are plot­ting a con­tin­u­ous numer­i­cal vari­able, then the hor­i­zon­tal axis should be prop­erly scaled for the numer­i­cal variable.
  9. Do not for­get to spec­ify units.
  10. Tick inter­vals should be at nice round numbers.
  11. Axes should be prop­erly labelled.
  12. Use linewidths big enough to read. 1pt lines tend to dis­ap­pear if plots are shrunk.
  13. Avoid over­lap­ping text on plot­ting char­ac­ters or lines.
  14. Fol­low Tufte’s prin­ci­ples by remov­ing chart junk and keep­ing a high data-ink ratio.
  15. Plots should be self-explanatory, so include detailed captions.
  16. Use a sen­si­ble aspect ratio. I think width:height of about 1.6 works well for most plots.
  17. Pre­pare graph­ics in the final aspect ratio to be used in the pub­li­ca­tion. Dis­torted fonts look awful.
  18. Use points not lines if ele­ment order is not relevant.
  19. When prepar­ing plots that are meant to be com­pared, use the same scale for all of them. Even bet­ter, com­bine plots into a sin­gle graph if they are related.
  20. Avoid pie-charts. Espe­cially 3d pie-charts. Espe­cially 3d pie-charts with explod­ing wedges. I promise all my stu­dents an instant fail if I ever see any­thing so appalling.

The clas­sic books on graph­ics are:

These are both highly rec­om­mended. (If you can’t see the books above, turn off your ad-blocker.)


Statistical Analysis StackExchange site now available

The Q&A site for sta­tis­ti­cal analy­sis, data min­ing, data visu­al­iza­tion, and every­thing else to do with data analy­sis has finally been launched. Please head over to

stats.StackExchange.com

and start ask­ing and answer­ing questions.

Also, spread the word to every­one else who may be inter­ested — work col­leagues, stu­dents, etc. The more peo­ple who use the site, the bet­ter it will be. There are already 170 ques­tions, 513 answers and 387 users.

Even­tu­ally the site will move to a dif­fer­ent domain name and have its own logo, etc.  For now it is in “pub­lic beta” which means that it is fully func­tional, but we are still work­ing out some of the details (such as what it will be called, who will be the mod­er­a­tors, etc.).

R ques­tions are allowed on this new site as well as on the orig­i­nal StackOverflow.com. We are still fig­ur­ing out how to avoid the prob­lem of hav­ing answers on two sites. For now, more sta­tis­ti­cal ques­tions should be directed to stats.StackExchange.com and more programming-oriented ques­tions should go to StackOverflow.com.


More StackExchange sites

The Stack­Ex­change site on Sta­tis­ti­cal Analy­sis is about to go into pri­vate beta test­ing. This is your last chance to com­mit if you want to be part of the pri­vate beta test­ing. Don’t worry if you miss out — it will only be a week before it is then open to the public.

There is also a Stack­Ex­change site pro­posal for TeX, LaTeX and friends. Pre­sum­ably that means that most of the LaTeX ques­tions on Stack­Over­flow will then move to this new site. It still needs a cou­ple of hun­dred more peo­ple to com­mit before it can be launched, so if you are inter­ested in LaTeX, please com­mit to being part of it.

Another site pro­posal that may be of inter­est to read­ers of this blog is the one on Eng­lish lan­guage usage.

A few pro­pos­als are already open to the pub­lic for beta test­ing. One that I’ve been using a lit­tle is Web Apps which is use­ful for ques­tions on Gmail, Google reader, Word­Press, etc.
