Advanced Course: R programming and development

15-16 January 2015 (Thursday-Friday), EMBL Heidelberg

Teachers: Laurent Gatto, Robert Stojnic (University of Cambridge/Dataprogrammers)
Organiser: Wolfgang Huber (EMBL)

R is a language and computational environment for statistics, advanced data analysis, machine learning and visualisation; it is the most widely used programming language in bioinformatics.

This course is designed for users who have experience with writing R scripts, and who now want to advance one step further, into producing more durable and robust software projects and code that is usable by others.

The first part of the course will introduce object-oriented programming using R's S3 and S4 systems and describe how to define classes, generic functions and methods. It will also present how to create and document Bioconductor-compliant R packages. The second part will focus on various advanced topics in R programming such as unit testing, debugging, profiling and calling C/C++ code. It will also describe how to write efficient and elegant code using vectorisation, parallelisation and the functional programming paradigm. Finally, creating web applications with shiny will be discussed.
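
To give a flavour of the S4 part, the following is a minimal sketch of a class with a validity check, a generic and a method. The class `DnaSeq`, its slots and the generic `gcContent` are hypothetical names chosen for illustration only; they are not taken from the course material.

```r
library(methods)  # attached by default in interactive R; explicit for Rscript

# Hypothetical S4 class: a named DNA sequence with a validity check
setClass("DnaSeq",
         slots = c(id = "character", sequence = "character"),
         validity = function(object) {
           if (length(object@sequence) != 1L)
             return("'sequence' must be a single string")
           if (grepl("[^ACGT]", object@sequence))
             return("'sequence' may only contain A, C, G and T")
           TRUE
         })

# A generic function and its method for the DnaSeq class
setGeneric("gcContent", function(object, ...) standardGeneric("gcContent"))

setMethod("gcContent", "DnaSeq", function(object, ...) {
  bases <- strsplit(object@sequence, "")[[1]]
  mean(bases %in% c("G", "C"))  # fraction of G and C bases
})

# new() runs the validity check when slots are supplied
x <- new("DnaSeq", id = "seq1", sequence = "ACGGCT")
gcContent(x)
```

Calling `new("DnaSeq", id = "bad", sequence = "ACGU")` would signal an error, because the validity function rejects any character other than A, C, G and T.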

These topics will be illustrated using a small real-life bioinformatics case study. At the end of the course, participants will have produced a fully fledged, Bioconductor-compliant R package.

Pre-requisites: Good knowledge of R and active programming experience. Familiarity with object-oriented programming and LaTeX is helpful, but not essential.

The course will take place on Thursday and Friday, 15-16 January 2015, at EMBL Heidelberg, Courtyard A+B room (travel information). Participants are expected to bring their own laptop with R version >= 3.1.0 installed.

Registration

The number of participants is limited to 35; registrations will be processed on a first-come, first-served basis. The registration deadline is 19 December 2014.

The course is fully booked and registration is closed.

This course is now finished. We wish to thank all the participants!

Course content

  • Introduction
  • R programming:
    • The S3 and S4 OO systems.
    • More details about the S4 system: defining classes, virtual classes, generics and methods, accessors and replacement methods, object validity, defining prototype instances, object initialisation.
    • Functional programming: Reduce, Map, Filter, Negate.
    • Overview of the *apply/sweep/replicate et al. functions.
    • Vectorisation.
    • Parallelisation.
  • Documentation:
    • Documenting functions, methods and data, providing executable examples
    • In-source documentation
    • Vignettes and reproducible research
  • Writing R packages:
    • Minimum package structure
    • Building, checking and installing packages
    • Best practices
  • Other programming topics
    • Testing/debugging code, unit testing.
    • Profiling and debugging R code.
    • Calling C[++] code from R; debugging C code.
    • Building web interfaces with shiny.
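
As an illustration of the functional programming and vectorisation topics listed above, here is a short sketch using base R only. The `counts` data are made up for the example and do not come from the course case study.

```r
# Hypothetical data: read counts per gene across three samples
counts <- list(geneA = c(10, 0, 5),
               geneB = c(0, 0, 0),
               geneC = c(3, 7, 2))

# Map: apply a function to every element, keeping the names
totals <- Map(sum, counts)

# Filter: keep the elements for which a predicate is TRUE
isExpressed <- function(x) any(x > 0)
expressed <- Filter(isExpressed, counts)

# Negate: invert a predicate, here to keep the non-expressed genes
silent <- Filter(Negate(isExpressed), counts)

# Reduce: fold a binary function over the list (per-sample totals)
pooled <- Reduce(`+`, counts)

# Vectorised alternative: bind into a matrix and use rowSums()
m <- do.call(rbind, counts)
rowSums(m)
```

For rectangular data such as this, the matrix-based `rowSums()` call is usually the more idiomatic and faster choice, while `Map`, `Filter` and `Reduce` remain useful for lists whose elements differ in length or type.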

Course material

Course slides are on GitHub.

Programme

Day 1

09:15 - 10:30  Introduction
Break
11:00 - 12:30  S3/S4 object-oriented programming
Lunch
13:30 - 15:30  Package development
Break
16:00 - 17:30  Advanced issues in package development and documentation
19:00 - Social event / dinner (optional)

Day 2

09:15 - 10:30  Unit testing and debugging
Break
11:00 - 12:30  Profiling and calling C/C++ code
Lunch
13:30 - 15:30  Vectorisation, functional programming and parallelisation
Break
16:00 - 17:30  Building web interfaces: shiny; wrap-up