Posts tagged computing

Statistical Analysis StackExchange site now available

The Q&A site for sta­tis­ti­cal analy­sis, data min­ing, data visu­al­iza­tion, and every­thing else to do with data analy­sis has finally been launched. Please head over to

stats.StackExchange.com

and start ask­ing and answer­ing questions.

Also, spread the word to every­one else who may be inter­ested — work col­leagues, stu­dents, etc. The more peo­ple who use the site, the bet­ter it will be. There are already 170 ques­tions, 513 answers and 387 users.

Even­tu­ally the site will move to a dif­fer­ent domain name and have its own logo, etc.  For now it is in “pub­lic beta” which means that it is fully func­tional, but we are still work­ing out some of the details (such as what it will be called, who will be the mod­er­a­tors, etc.).

R ques­tions are allowed on this new site as well as on the orig­i­nal StackOverflow.com. We are still fig­ur­ing out how to avoid the prob­lem of hav­ing answers on two sites. For now, more sta­tis­ti­cal ques­tions should be directed to stats.StackExchange.com and more programming-oriented ques­tions should go to StackOverflow.com.


More StackExchange sites

The Stack­Ex­change site on Sta­tis­ti­cal Analy­sis is about to go into pri­vate beta test­ing. This is your last chance to com­mit if you want to be part of the pri­vate beta test­ing. Don’t worry if you miss out — it will only be a week before it is then open to the public.

There is also a Stack­Ex­change site pro­posal for TeX, LaTeX and friends. Pre­sum­ably that means that most of the LaTeX ques­tions on Stack­Over­flow will then move to this new site. It still needs a cou­ple of hun­dred more peo­ple to com­mit before it can be launched, so if you are inter­ested in LaTeX, please com­mit to being part of it.

Another site pro­posal that may be of inter­est to read­ers of this blog is the one on Eng­lish lan­guage usage.

A few proposals are already open to the public for beta testing. One that I’ve been using a little is Web Apps, which is useful for questions on Gmail, Google Reader, WordPress, etc.


Stack exchange for statistical analysis needs you!

The pro­posal to cre­ate a Stack­Ex­change site for sta­tis­ti­cal analy­sis is steadily mov­ing for­ward. We have now com­pleted the scop­ing stage which involved find­ing enough peo­ple will­ing to express an inter­est in the idea, and vot­ing on some exam­ple ques­tions to define what is allowed and what is not allowed on the site. The on-topic ques­tions that have been selected are these:

  1. What is a ‘stan­dard deviation’?
  2. Which of the fol­low­ing three graph­ics best dis­plays this data set? Why?
  3. What’s the best way to iden­tify an out­lier in mul­ti­vari­ate data?
  4. Can you give an exam­ple of where I might pre­fer to use a z-test vs a t-test?
  5. What are the dif­fer­ences between Bayesian and Fre­quen­tist reasoning?

Exam­ples of ques­tions con­sid­ered off-topic are:

  1. How do I win in Poker?
  2. I have two chil­dren. One is a boy born on a Tues­day. What is the prob­a­bil­ity I have two boys?
  3. Joe is 8 years old, Mike is 10 years old, and Alice is 13. What is their MEDIAN age?
  4. Where can I access NASA’s data archives?
  5. How much should I expect to pay for a SAS licence?

The next phase is to get peo­ple to com­mit to con­tribut­ing to the site. Many read­ers of this blog have already reg­is­tered as “fol­low­ers” — now you have to make a com­mit­ment to be a con­trib­u­tor as well. The site won’t launch until there are enough peo­ple com­mit­ted to being part of it.

Just go to the site and indi­cate that you are will­ing to be an active par­tic­i­pant once it launches.

If you’re wondering what this is all about, and why this is a much better approach than the various Usenet and email help groups, there’s a nice summary on Tal Galili’s blog.


Use fake data and real data

When devel­op­ing new sta­tis­ti­cal meth­ods, it is very use­ful to test them on both fake data (i.e., sim­u­la­tions) and real data.

Testing on fake data is useful because then you know the “true” answer and can check the procedure under ideal conditions. If your method doesn’t work when the data are designed for the task, it is unlikely to work in real conditions. Fake data also enable you to test the robustness of your method when the conditions aren’t perfect: for example, try adding some nasty outliers and see if the method still works. With fake data, you can generate as many samples as you need, thus ensuring that what you see is real (statistically significant) rather than just an odd example.

A fur­ther advan­tage of fake data is that any­one can repro­duce your work and check (or extend) your results. Some­times real data can­not be dis­trib­uted due to restric­tions imposed by the owner of the data. But there are never restric­tions on fake data. You just have to make sure you explain the data gen­er­at­ing process suf­fi­ciently clearly that other peo­ple can repli­cate what you’ve done.
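As a minimal sketch of this idea (an illustration only, not code from any of the projects mentioned here), the following Python compares two estimators of a known mean on simulated normal data, with and without a few planted outliers. The function name `simulate_rmse`, the sample sizes, the outlier value and the seed are all assumptions made for the example.

```python
import random
import statistics


def simulate_rmse(estimator, n_reps=2000, n=50, true_mean=10.0,
                  n_outliers=0, outlier_value=100.0, seed=1):
    """Root mean squared error of `estimator` against the known true mean."""
    rng = random.Random(seed)  # fixed seed so that others can replicate the run
    squared_errors = []
    for _ in range(n_reps):
        # Fake data: n draws from a normal distribution with known mean.
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        # Optionally contaminate the sample by overwriting the first few points.
        for i in range(n_outliers):
            sample[i] = outlier_value
        squared_errors.append((estimator(sample) - true_mean) ** 2)
    return statistics.fmean(squared_errors) ** 0.5


if __name__ == "__main__":
    for k in (0, 5):
        print("outliers:", k,
              "mean RMSE:", simulate_rmse(statistics.fmean, n_outliers=k),
              "median RMSE:", simulate_rmse(statistics.median, n_outliers=k))
```

Because the true mean is known, the error of each estimator can be measured directly, and because the seed and the data generating process are stated, anyone can rerun exactly the same experiment.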

Test­ing on real data is use­ful because it gives some indi­ca­tion of whether your method will be use­ful in real­ity and not just in theory.

Yeas­min Khan­dakar and I once devel­oped a neat method for select­ing the order of an ARIMA model which worked won­der­fully well on fake data that were gen­er­ated from ARIMA processes, but failed on any real data. The prob­lem seemed to be that it was par­tic­u­larly sen­si­tive to model mis-specification. So when the data had any fea­tures that were not typ­i­cal of ARIMA processes, the method failed. No real data are gen­uinely ARIMA processes, and so the method is not par­tic­u­larly use­ful (and has never been published).

On the other hand, damped expo­nen­tial smooth­ing works bet­ter than you would expect, even on data that come from processes for which damped expo­nen­tial smooth­ing is far from the­o­ret­i­cally opti­mal. In chap­ter 7 of my expo­nen­tial smooth­ing book, we showed (with real data) that using a damped expo­nen­tial smooth­ing model for all series gives results that are almost as good as those obtained after a com­pu­ta­tion­ally inten­sive search for an opti­mal model over the entire model space.
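For readers unfamiliar with the method, here is a rough sketch of additive damped trend exponential smoothing in Python. It is not the implementation used in the book: the function name, the simple initial states and the fixed values of the smoothing parameters (`alpha`, `beta`) and damping parameter (`phi`) are assumptions for illustration, whereas in practice the parameters would be estimated from the data.

```python
def damped_trend_forecast(y, horizon, alpha=0.5, beta=0.3, phi=0.9):
    """Forecast `horizon` steps ahead with additive damped trend smoothing."""
    if len(y) < 2:
        raise ValueError("need at least two observations")
    # Simple heuristic starting values for the level and trend (an assumption).
    level, trend = y[0], y[1] - y[0]
    for obs in y[1:]:
        prev_level = level
        # Level: weighted average of the observation and the damped one-step forecast.
        level = alpha * obs + (1 - alpha) * (prev_level + phi * trend)
        # Trend: weighted average of the latest level change and the damped old trend.
        trend = beta * (level - prev_level) + (1 - beta) * phi * trend
    forecasts = []
    damp_sum = 0.0
    for h in range(1, horizon + 1):
        damp_sum += phi ** h  # phi + phi^2 + ... + phi^h
        forecasts.append(level + damp_sum * trend)
    return forecasts


if __name__ == "__main__":
    series = [float(t) for t in range(1, 11)]
    print(damped_trend_forecast(series, horizon=5))
```

With `phi` below 1, each successive forecast adds a smaller share of the trend, so the forecasts flatten out instead of following a straight line forever.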


Backing up Gmail

I recommend Gmail to everyone who asks, and many who don’t, as it is far superior to every other email platform around. But being paranoid, I don’t like all that valuable email in someone else’s hands. What if Google goes bust one day? Or the Australian government’s internet filter blocks Gmail? Or I move to China? So I need a local backup just in case. I also need the backup to be painless and not require much attention.

The solution is Thunderbird. There is a bit of setting up to do at first, but then you can sit back and let it do its work. The instructions are here. You need to follow them: simply setting up Thunderbird to access your Gmail is not enough, as Thunderbird won’t download your mail for local storage by default.

Once you’ve set up Thunderbird to download everything, all you need to do is open Thunderbird every few weeks and leave it to do its stuff.

If that’s too much work, you can always have Thun­der­bird open auto­mat­i­cally at start up but stay min­i­mized to the tray.


Recommended freeware

Today my new Win­dows note­book arrived and I have gone through the process of rein­stalling all my software.  Mostly I use free­ware, not just because it is free but also because most of this soft­ware is bet­ter than any­thing avail­able com­mer­cially. I thought it would be use­ful to update my post on what I’ve installed and what I recommend.

Ninite

    The fastest way to down­load and install free­ware is via Ninite. It doesn’t include every­thing, but does cover a wide range of soft­ware which it auto­mat­i­cally down­loads and installs (using defaults) with­out any user inter­ven­tion. It is amaz­ing how much faster this makes it. I installed the fol­low­ing programs.

  • Chrome (my favourite browser — super fast)
  • Fire­fox (the other browser I occa­sion­ally use)
  • Skype (for long-distance conversations)
  • Thunderbird (which I only use as a backup for my Gmail account)
  • iTunes (for music and podcasts)
  • VLC (for play­ing video)
  • Audac­ity (for edit­ing sound files)
  • Picasa (for photos)
  • GIMP (for edit­ing images)
  • Inkscape (for cre­at­ing line draw­ings using vec­tor graphics)
  • OpenOf­fice (so I can read the files some peo­ple send to me)
  • Adobe Reader (which I still use for most pdf reading)
  • Avast (my pre­ferred virus checker)
  • Flash (for all those web­sites that use it)
  • Java (for all those web­sites that use it)
  • Google Earth (my very favourite way of wast­ing time)
  • CCleaner (for clean­ing up my old files, unin­stalling unwanted pro­grams, edit­ing what pro­grams run at start up, etc.)
  • Recuva (just in case I delete some­thing by mistake)
  • Filezilla (for mov­ing files to one of my websites)
  • Notepad++ (an excel­lent sim­ple text editor)

All that down­loaded and installed in about 15 min­utes with­out need­ing any of my attention!

Then I installed the fol­low­ing pro­grams which are not part of Ninite.

R

  • R. The stan­dard com­put­ing plat­form for almost all applied sta­tis­ti­cal research these days.
  • Rtools. All the tools needed to develop your own R packages.

LaTeX

  • MiKTeX. I cannot understand why anyone who writes about mathematics uses anything other than a LaTeX system. This is the simplest install for Windows.
  • WinEdt. The best Win­dows text edi­tor for LaTeX and it inter­faces seam­lessly with MiK­TeX. (Actu­ally, this is share­ware rather than free­ware.) I’m now using WinEdt 6.0. See this post for some help­ful con­fig­u­ra­tion instructions.
  • JabRef. For man­ag­ing Bib­TeX databases.
  • SumatraPDF. For viewing PDFs created by MiKTeX. The big advantage over Adobe Reader is that the PDFs have forwards and backwards syncing with the TeX file, and the program doesn’t complain when the PDF file is updated.

Com­puter management

  • Google Pack. Lots of useful utilities including Google Desktop (for finding files), etc. Many of these are on Ninite, but Google Desktop isn’t, so I still need it.

Bible

  • e-sword. For those want­ing an elec­tronic Bible, this is a great resource with zil­lions of add-ons. The only has­sle is you have to install every add-on separately.

Graph­ics

Util­i­ties


Update on a StackExchange site for statistical analysis

About six weeks ago, I pro­posed that there should be a Stack Exchange site for ques­tions on data analy­sis, sta­tis­tics, data min­ing, machine learn­ing, etc. I can finally report that there has been sub­stan­tial progress on this.

The for­mal pro­posal is now at Area 51 where the scope of the new site is being devel­oped and voted on in a demo­c­ra­tic way. The site has been in a pri­vate beta state for a week or so, but is now open for any­one to join in.

So if you’re inter­ested in this pro­posed site for questions/answers on sta­tis­ti­cal analy­sis, please head over to Area 51 and join in the dis­cus­sion and vot­ing on what the site should cover. It would be a good idea to first read the FAQ so you under­stand how the sys­tem works.


Google scholar alerts

A couple of weeks ago, Google Scholar added a facility to provide email alerts on new articles associated with specific search queries. First do the search, then click the envelope at the top left of the screen. For example, here is a search on “exponential smoothing” since 2000.

Note the enve­lope at the top marked New! Click it to get the fol­low­ing screen.

Those results show some of the flaws in Google Scholar — the dates are not always cor­rect (the first paper listed above appeared in 2004) and there are unre­solved duplicates.

Despite the prob­lems, if you’re want­ing to keep an eye out for new papers on par­tic­u­lar top­ics, this looks like it could be use­ful. Unfor­tu­nately, there is no RSS feed available.


Online mathematical resources

DLMF

For nearly 50 years, a stan­dard ref­er­ence in math­e­mat­i­cal work has been Abramowitz and Stegun’s (1964) Hand­book of Math­e­mat­i­cal Func­tions with For­mu­las, Graphs, and Math­e­mat­i­cal Tables. It has pro­vided a mar­vel­lous col­lec­tion of results and tables that have been indis­pens­able for a gen­er­a­tion of math­e­mati­cians. I’ve used it to look up com­pu­ta­tion­ally effi­cient meth­ods for cal­cu­lat­ing Bessel func­tions or gamma func­tions, or to find one of those trigono­met­ric iden­ti­ties I learned in high school and no longer remem­ber. Appar­ently nearly 1 mil­lion copies of the hand­book have been printed and it has also been scanned and put online.

Lately, the hand­book has fallen out of favour a lit­tle, partly because there is not such a need for it. We no longer need tables for trigono­met­ric func­tions or log­a­rithms, and a lot of func­tions are built into R, includ­ing Bessel func­tions and vari­a­tions on the gamma func­tion. Another rea­son for its declin­ing pop­u­lar­ity has been the rise of online resources: if you want to know some­thing about orthog­o­nal poly­no­mi­als, there is a good chance it is cov­ered in the Wikipedia arti­cle.

Now the hand­book has been reis­sued as the NIST Hand­book of Math­e­mat­i­cal Func­tions (Cam­bridge Uni­ver­sity Press) with a free web edi­tion called the NIST Dig­i­tal Library of Math­e­mat­i­cal Func­tions (DLMF). It has been updated to include colour graph­ics, point­ers to rec­om­mended soft­ware, and lots of new top­ics to reflect work from the last 50 years.

Wol­fra­mAl­pha

Wol­fra­mAl­pha is now a year old and it has become a remark­able resource for some things. It was orig­i­nally com­pared to Google which is inap­pro­pri­ate — they are intended for dif­fer­ent pur­poses. Google indexes the web, while Wol­fra­mAl­pha is a knowl­edge engine.

Recently I needed to find the integral of $2\tan(2x)\sec^6(2x)$. Typing integral 2tan(2x)sec^6(2x) gave me the result straight away. Of course, I could use Mathematica or Maple for this, but it is much easier to use my browser. It also means such algebraic results are available to everyone without needing specialist symbolic algebra software.
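For anyone who wants to check this particular integral by hand, a substitution does the job. The derivation below is a sketch added for illustration; it is not copied from WolframAlpha's output.

```latex
\begin{align*}
u &= \sec(2x), \qquad du = 2\sec(2x)\tan(2x)\,dx \\
\int 2\tan(2x)\sec^6(2x)\,dx
  &= \int \sec^5(2x)\,\bigl(2\sec(2x)\tan(2x)\bigr)\,dx
   = \int u^5\,du \\
  &= \frac{u^6}{6} + C = \frac{\sec^6(2x)}{6} + C
\end{align*}
```

Differentiating $\sec^6(2x)/6$ gives back $2\tan(2x)\sec^6(2x)$, which confirms the result.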

A few days later, I was work­ing on a project involv­ing mod­el­ling elec­tric­ity demand as a func­tion of tem­per­a­ture. The tem­per­a­ture data looked odd and I sus­pected it was all out by one day. To check, I typed melbourne temperature 21 February 2010 into Wol­fra­mAl­pha and it promptly gave me the tem­per­a­ture data for Mel­bourne Air­port for that day, and with one more click of the mouse I had the data for the whole week, con­firm­ing my suspicion.

For the sorts of things that Wol­fra­mAl­pha is good at, see the exam­ples page.

Wikipedia

Wikipedia needs no introduction and it is surprisingly good in some areas of mathematics (e.g., probability distributions) but not very good for some areas of statistics (e.g., see the article on ARIMA models or the one on Cronbach’s alpha). The good news is that the statistics articles are improving and are now starting to be usable as a first port of call when looking up an unfamiliar method.


A StackExchange site for statistical analysis?

Reg­u­lar read­ers of this site will know I’m a fan of using Stack Over­flow for ques­tions about LaTeX, R and other areas of pro­gram­ming. Now the peo­ple who pro­duce Stack Over­flow are plan­ning on set­ting up sev­eral new sites for ask­ing ques­tions about other top­ics, and are seek­ing pro­pos­als. I have pro­posed that there should be a site for ques­tions on data analy­sis, sta­tis­tics, data min­ing, machine learn­ing, etc.

It is more likely that my idea will turn into a func­tion­ing site if peo­ple who agree with me vote for it. So if you agree, please head over to meta.stackexchange.com and vote! (You will need to reg­is­ter first, but that’s free.)

If you dis­agree or have any com­ments about the idea, I’d also like to hear from you. But please add your com­ments to my pro­posal rather than here.
