Thursday, 4 July 2013

My ProjectTemplate/Markdown/RStudio/knitr Routine

In January of this year, I posted a summary of my Essential PhD Toolkit. In that post, I discussed the trifecta of RStudio, Markdown, and knitr. I also mentioned ProjectTemplate, an R package that helps keep files organized and can automate some analyses. I learned about and started using all of these tools at the same time, so I’m not always clear on how to use them independently. That said, I learned an integrated routine from a colleague and have adopted it for myself. I’ll share that routine in this post. I’m sure it can be improved, so please feel free to leave tips in the comments, and ask me if anything is unclear.

Pre-steps: Install R, RStudio, and packages
I'm assuming you have a basic understanding of R. Install RStudio and then the ProjectTemplate package. I don't recall whether knitr has to be installed separately or comes along automatically, so install it too, just in case.
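One way around the knitr uncertainty is to check which packages are missing before installing anything. This is a minimal sketch; `missing_packages` is a hypothetical helper written for this example, not part of any package:

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("ProjectTemplate", "knitr"))

# Install only in an interactive session, so sourcing this never triggers downloads.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

If `to_install` comes back empty, both packages are already available and nothing needs to be installed.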

Step 1: Create the project
Within RStudio, run the following lines of code. Change "MyProjectName" to whatever you would like your project to be called.
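A minimal version of those lines might look like the sketch below. It assumes ProjectTemplate is installed and treats "yourdirectory" as a placeholder for an existing parent folder; the `parent_dir` variable and the guard around the call are additions for illustration only.

```r
# Assumption: "yourdirectory" is a placeholder; point it at an existing parent folder.
parent_dir <- "yourdirectory"

if (dir.exists(parent_dir) && requireNamespace("ProjectTemplate", quietly = TRUE)) {
  library(ProjectTemplate)
  setwd(parent_dir)
  create.project("MyProjectName")  # creates yourdirectory/MyProjectName
} else {
  message("Install ProjectTemplate and set parent_dir to an existing folder first.")
}
```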
This will create a new folder inside "yourdirectory", filled with several sub-folders, like so:
[Figure: folder structure created by ProjectTemplate]
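The result can also be checked from R by listing the top-level folders of the new project. This is a small sketch, again treating "yourdirectory" as a placeholder. The exact set of folders depends on the ProjectTemplate version, so no expected output is shown.

```r
project_dir <- file.path("yourdirectory", "MyProjectName")

if (dir.exists(project_dir)) {
  # Names of the sub-folders created directly inside the project
  print(list.dirs(project_dir, full.names = FALSE, recursive = FALSE))
} else {
  message("Project folder not found: ", project_dir)
}
```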

Step 2: Open the project within RStudio
Within RStudio, click the Project menu, and select "Create Project".
Within the pop-up window, select "Existing Directory".
Using the Browse... button, find the folder you just created using create.project(), above. In this example, it's called "MyProjectName".
Click "Create Project".
[Figures: selecting "Create Project" from the Project menu, then "Existing Directory" in the pop-up window; browsing for the folder created in Step 1]
The project will load, and RStudio shows "MyProjectName" as the currently loaded project in the top right-hand and left-hand corners of the window.

Using Markdown and knitr with ProjectTemplate
Open a new Markdown file by clicking File > New > R Markdown. I won't talk about the syntax of Markdown. For that, you can click on the "MD" button next to the "Knit HTML" button. Or you can Google for Markdown syntax.

I also won't dwell on the specifics and mechanics of ProjectTemplate - for that, visit the website, which contains much more detail than I will give here. Here I'm simply providing the extra bits of code I add to ensure that my scripts run smoothly without any working directory issues.

ProjectTemplate relies on the folder structure it created to automate some scripts and analyses. I don't take advantage of this as much as I should, but I do benefit from auto-loading packages and auto-loading any basic scripts (i.e. custom functions I wrote for this project). In order for scripts to be run automatically, they must be saved in specific places and the working directories must be specified in specific ways.
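To illustrate the kind of script that gets auto-loaded, here is a sketch of a file of custom functions. In ProjectTemplate's layout such files live in the "lib" folder. The file name `lib/helpers.R` and the function `standard_error` are hypothetical examples, not part of ProjectTemplate.

```r
# lib/helpers.R (hypothetical file name)
# Functions defined here should be available after load.project(), assuming
# ProjectTemplate sources the scripts in "lib" when the project is loaded.

# Standard error of the mean, ignoring missing values
standard_error <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}
```

Once the project is loaded, an analysis script in "src" could call `standard_error()` directly without sourcing the file by hand.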

I keep my analysis scripts in my "src" folder. Any data-manipulation and pre-processing scripts are in the "munge" folder. My custom scripts are in the "lib" folder. The first chunk of any of my scripts is a Startup script, which checks which folder the file is saved in (e.g. "src" or "munge") and then sets the working directory one level higher, to the project root - this step is necessary to load the project using ProjectTemplate.
``` {r startup}
# If knitting from inside "src", move up one level to the project root
if (basename(getwd()) == "src") setwd("..")
```
Depending on how your configuration file is set up, the load.project() function will load the packages specified in your config file, load any files within the data folder, load any scripts in the lib folder, and run any scripts in the munge folder. I usually keep munging and data loading turned off because my files are big and my munging can take hours; I prefer to have more customized control.
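To see what load.project() is set up to do before calling it, the settings can be read straight from the config file. The sketch below assumes the config lives at config/global.dcf and that it uses setting names such as data_loading and munging. Both are assumptions about the ProjectTemplate layout and may differ between versions. `read_project_config` is a hypothetical helper built on base R's read.dcf().

```r
# Hypothetical helper: return the settings in a DCF config file as a named list.
read_project_config <- function(path = file.path("config", "global.dcf")) {
  if (!file.exists(path)) stop("No config file found at: ", path)
  settings <- read.dcf(path)  # character matrix, one row per record
  setNames(as.list(unname(settings[1, ])), colnames(settings))
}

# Only attempt to load when run from a project root with ProjectTemplate installed.
if (file.exists(file.path("config", "global.dcf")) &&
    requireNamespace("ProjectTemplate", quietly = TRUE)) {
  # Assumed setting names; check them against your own config file.
  print(read_project_config()[c("data_loading", "munging")])
  library(ProjectTemplate)
  load.project()
}
```

Printing the settings first makes it harder to kick off an hours-long munging run by accident.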

To my understanding, knitr resets the working directory to the "src" (or "munge") folder for each chunk, so I add a conditional working directory change at the start of each one. This conditional setwd() lets me run the script within RStudio and also lets knitr knit it, without the working directories getting confused. So, for example:
``` {r load_files}
# If running from the project root (e.g. interactively in RStudio), move into "src"
if (basename(getwd()) == "MyProjectName") setwd("src")
```
Note: My colleague recently updated me on how to specify working directories. He said that instead of if(basename(getwd()) == "src") setwd(".."), I should use require(knitr); opts_knit$set(root.dir=".."). To be honest, my strategy works for me and I'm comfortable with it so I likely won't be changing... but I wanted to make sure you're aware of it.
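For comparison, the colleague's suggestion could look like the sketch below, placed in the first chunk of a script saved in "src". It rests on two assumptions worth checking against the knitr documentation: that root.dir only affects chunks after the one where it is set, and that it only applies while knitting. If so, an interactive RStudio session still needs its working directory set separately. The requireNamespace() guard is an addition for illustration.

```r
if (requireNamespace("knitr", quietly = TRUE)) {
  # Evaluate later chunks from the parent of the document's folder (the project root)
  knitr::opts_knit$set(root.dir = "..")
}
```

With this in place, the per-chunk conditional setwd() calls should no longer be needed when knitting.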

Those are the basics of how I start up a new project using ProjectTemplate, and how I navigate some of the issues with working directories using Markdown/knitr/RStudio. If you have any questions, leave a comment. I'm not an expert, but I'll try my best to address your question.