A blog by Rob J Hyndman 


Makefiles for R/LaTeX projects

Published on 31 October 2012

Updated: 21 November 2012

Make is a marvellous tool used by programmers to build software, but it can be used for much more than that. I use make whenever I have a large project involving R files and LaTeX files, which means I use it for almost all of the papers I write, and almost all of the consulting reports I produce.

If you are using a Mac or Linux, you will most likely already have make installed (on a Mac, it is part of Apple's command line developer tools). If you are using Windows and have Rtools installed, then you will also have make. Otherwise, Windows users will need to install it. One implementation is in GnuWin.

A typical project of mine will include several R files containing code that fits some models and generates tables and graphs. I try to set things up so I can re-create all the results by simply running the R files. Then I will have a LaTeX file which contains the paper or report I am writing. The tables and graphs produced by R are pulled into the LaTeX file. Consequently, all I need to do is run all the R files, and then process the tex file, and the paper/report is generated.

Make relies on a Makefile to determine what it must do. Essentially, a Makefile specifies what files must be generated first, and how to generate them. So I need a Makefile that specifies that all the R files must be processed first, and then the LaTeX file.

The beauty of a Makefile is that it will only process the files that have been updated. It is smart enough not to re-run code if it has already been run. So if nothing has changed, running make does nothing. If only the tex file changes, running make will re-compile the tex document. If the R code has changed, running make will re-run the R code to generate the new tables and graphs, and then re-compile the tex document. All I do is type make and it figures out what is required.
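As a rough illustration of that logic, here is a minimal sketch of two chained rules. The file names (fit.R, fit.Rout, report.tex, report.pdf) are hypothetical and chosen only for this example; the point is the shape of a rule and how time stamps drive it.

```make
# General shape of a rule:
#   target: prerequisites
#   <TAB>command that creates the target
# make runs the command only when the target is missing
# or older than at least one of its prerequisites.

# Hypothetical file names, for illustration only.
report.pdf: report.tex fit.Rout
	latexmk -pdf -quiet report

fit.Rout: fit.R
	R CMD BATCH fit.R
```

Under these assumptions, editing only report.tex leads make to re-run latexmk. Editing fit.R leads it to re-run R first (which refreshes fit.Rout) and then latexmk. With no edits, make reports that report.pdf is up to date.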

A Makefile for LaTeX

It is easy to tell if the LaTeX document needs compiling: make simply has to check whether the pdf version of the document is older than the tex version of the document. Here is a simple Makefile that will just handle a LaTeX document.

TEXFILE= paper
$(TEXFILE).pdf: $(TEXFILE).tex
	latexmk -pdf -quiet $(TEXFILE)

The first line specifies the name of my file, in this case paper.tex. The second line specifies that the pdf file must be created from the tex file, and the last line explains how to do that. MiKTeX users might prefer pdftexify instead of latexmk.

To use the above Makefile, copy the code into a plain text file called Makefile and store it in the same directory as your tex file. Note that the command line under each rule must start with a tab character, not spaces. Change the first line so the name of your tex file (without the extension) is used. Then type make from a command prompt within the same directory as the tex file, and it should do whatever is necessary to convert your tex to pdf.

Of course, you wouldn’t normally bother with a Makefile if that is all it did. But throw in a whole lot of R files, and it becomes very worthwhile.

A Makefile for R and LaTeX

We need a way for make to tell whether an R file has been run. If the R files are run using

R CMD BATCH file.R

then the output is saved as file.Rout. Then make only has to check if file.Rout is older than file.R.

I also like to strip out all the white space from the pdf figures created in R before I put them in a LaTeX document. There is a nice command pdfcrop which does that. (You should already have it on a Mac or Linux, and also on Windows provided you are using MiKTeX.) So I also want my Makefile to crop all images that have not already been cropped. Once an image is cropped, an empty file of the form file.pdfcrop is created to indicate that file.pdf has already been cropped.

OK, now we are ready for my marvellous Makefile.

# Usually, only these lines need changing
TEXFILE= paper
RDIR= .
FIGDIR= ./figs
 
# list R files
RFILES := $(wildcard $(RDIR)/*.R)
# pdf figures created by R
PDFFIGS := $(wildcard $(FIGDIR)/*.pdf)
# Indicator files to show R file has run
OUT_FILES:= $(RFILES:.R=.Rout)
# Indicator files to show pdfcrop has run
CROP_FILES:= $(PDFFIGS:.pdf=.pdfcrop)
 
all: $(TEXFILE).pdf $(OUT_FILES) $(CROP_FILES)
 
# May need to add something here if some R files depend on others.
 
# RUN EVERY R FILE
$(RDIR)/%.Rout: $(RDIR)/%.R $(RDIR)/functions.R
	R CMD BATCH $< $@
 
# CROP EVERY PDF FIG FILE
$(FIGDIR)/%.pdfcrop: $(FIGDIR)/%.pdf
	pdfcrop $< $< && touch $@
 
# Compile main tex file and show errors
$(TEXFILE).pdf: $(TEXFILE).tex $(OUT_FILES) $(CROP_FILES)
	latexmk -pdf -quiet $(TEXFILE)
 
# Run R files
R: $(OUT_FILES)
 
# View main tex file
view: $(TEXFILE).pdf
	evince $(TEXFILE).pdf &
 
# Clean up stray files
clean:
	rm -fv $(OUT_FILES) 
	rm -fv $(CROP_FILES)
	rm -fv *.aux *.log *.toc *.blg *.bbl *.synctex.gz
	rm -fv *.out *.bcf *blx.bib *.run.xml
	rm -fv *.fdb_latexmk *.fls
	rm -fv $(TEXFILE).pdf
 
.PHONY: all clean R view

Download the file here. For most projects I copy this file into the main directory of my project, then all I have to do is modify the first few lines. RDIR specifies where the R files are kept and FIGDIR specifies where the figures are kept. Normally I keep these together, but sometimes they might be in separate directories.

Now make will do everything necessary: run the R files, crop the pdf graphics, and process the latex document. But it won’t do any steps that don’t need doing.

make R will only process the R files.

make view will run the pdf viewer, after updating the pdf file if necessary.

make clean will delete all the files generated by latex or by make, so that the entire process must be run again at the next make command.

Notice that my R files all depend on functions.R. This is a file that contains project-specific functions. If this file is updated, all the other R files will need to be re-run as well.

For many projects, some R files will depend on some others having already run. For example, read.R may read in the data and reformat it for analysis, while plot.R might produce some graphs assuming that read.R has already run. To ensure make knows about this dependency, we need to add a line

$(RDIR)/plot.Rout: $(RDIR)/plot.R $(RDIR)/functions.R $(RDIR)/read.R
	R CMD BATCH $< $@

This should be inserted where I have the comment # May need to add something here if some R files depend on others.
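Note that listing read.R as a prerequisite only tells make to re-run plot.R when read.R changes; by itself it does not force read.R to be run first. One possible variant, sketched here as an assumption rather than as part of the original file, is to depend on the indicator file read.Rout instead, so that make brings read.Rout up to date before it processes plot.R:

```make
# Variant (an assumption, not part of the original Makefile):
# depend on read.Rout rather than read.R, so that read.R is
# run (if needed) before plot.R is processed.
# $< is the first prerequisite (plot.R); $@ is the target (plot.Rout),
# passed here as the explicit output file for R CMD BATCH.
$(RDIR)/plot.Rout: $(RDIR)/plot.R $(RDIR)/functions.R $(RDIR)/read.Rout
	R CMD BATCH $< $@
```

Because read.Rout is itself rebuilt whenever read.R changes, a change to read.R should still trigger a re-run of plot.R under this variant.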

This Makefile works on Linux. Mac and Windows users will need to replace evince with whatever pdf viewer they prefer.
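One way to avoid editing the rule by hand is to put the viewer in a variable. The sketch below is only an assumption about how that could look, not part of the original file: PDFVIEWER is a hypothetical variable name, and the rule would replace the existing view rule.

```make
# Hypothetical variable: defaults to evince, but can be overridden
# on the command line, e.g. `make view PDFVIEWER=open` on a Mac.
PDFVIEWER ?= evince

# View main tex file with the chosen viewer
view: $(TEXFILE).pdf
	$(PDFVIEWER) $(TEXFILE).pdf &
```

The ?= assignment is a GNU make feature; it sets the variable only if it has not already been defined, so a value given in the environment or on the command line takes precedence.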


17 Comments
  • Anony­mous

    If I may suggest, I’d replace the tex invocation with “rubber --pdf” and rubber-info. That should cleanly take care of a lot of machinery.

    • http://yihui.name/ Yihui Xie

      or “texi2pdf -c”

    • http://robjhyndman.com Rob J Hyndman

      Thanks. That’s much bet­ter. I’ve updated the file.

      • http://twitter.com/bioinformatics Michael Bar­ton

        Rubber is no longer actively maintained as far as I am aware. I prefer to use latexmk: http://www.phys.psu.edu/~collins/software/latexmk-jcc/

        • Peter Baker

          I second that. Even though I’ve just started using latexmk in my R makefiles, it seems more portable too. Nice summary, Rob.

  • MakeLover

    This makefile should be called GNUmakefile, not “makefile”, because it relies on non-portable GNU extensions :(

  • szan­taii
  • Simon

    I’m won­der­ing why you have cho­sen to do it this way rather than using sweave or to use some­thing like “projects” in RStudio?

    • http://robjhyndman.com Rob J Hyndman

      I choose not to use sweave or knitr because I like to keep my R code and my LaTeX file sep­a­rate. This is partly because of co-​​authors. Often I will write a doc­u­ment with a co-​​author and it’s hard enough to get them to use LaTeX instead of Word, with­out try­ing to get them to use sweave. Also, I just think it is neater to sep­a­rate out the R code into sep­a­rate files rather than stuff every­thing into one bloated file.

      I haven’t tried RStu­dio projects yet. I’ll prob­a­bly blog about them at some point if I like them.

  • Dam­jan Vukcevic

    You might want to use “R CMD BATCH --no-save” so that R doesn’t save the workspace after processing each script. Otherwise when subsequent scripts are executed they will load the previously saved workspace (including any variables, data and functions defined in them), which is probably not what you want.

    • http://robjhyndman.com Rob J Hyndman

      Actually, I do want to load the workspace. For example, one R file usually reads the data, cleans it, and sets up the relevant objects in R. The next one will take those objects and do some statistical modelling. At the end of each R file I remove all objects that I don’t need any longer, keeping only the objects required for later use.

  • Vivi

    Your talk (R meetup) made me want to learn make, but in googling it I found out about Scons (Python based con­struct tool). Have you heard of it? Do you have any “feel­ings” about Scons vs Make?

    • http://robjhyndman.com Rob J Hyndman

      Yes, I’ve heard of Scons, but never used it. As make is avail­able on all sys­tems, and my needs are fairly sim­ple, I fig­ured it was bet­ter to stick with it.

  • Pingback: Automatically Setup New R and LaTeX Projects | You Study Politics, Right?

  • Pingback: Hyndsight - Removing white space around R figures

  • Jim

    I was able to make depen­den­cies work with a slight mod­i­fi­ca­tion. I had to change .R to .Rout. For exam­ple, I would change:

    $(RDIR)/plot.Rout: $(RDIR)/plot.R $(RDIR)/read.R
    	R CMD BATCH $<

    to

    $(RDIR)/plot.Rout: $(RDIR)/plot.R $(RDIR)/read.Rout
    	R CMD BATCH $<

    I do get an error about a cir­cu­lar depen­dency, but at least it’s work­ing for me now. Before that change, all files were load­ing in alpha­bet­i­cal order and seem­ingly ignor­ing the dependencies.

  • Pingback: Hyndsight - Reflections on UseR! 2013