Advanced Data Handling with R
Syllabus and other information
Syllabus
PNG0096 Advanced Data Handling with R, 2.0 Credits
Subjects
Mathematical Statistics
Education cycle
Postgraduate level
Grading scale
Language
English
Prior knowledge
A basic knowledge of the R language, enough to use it as a statistical tool for research. For example: reading and writing files, simple manipulation and indexing of R objects, basic analyses, and plotting data.
Objectives
After completing the course, students should be able to:
1. Write reproducible code
a) Write code that you can reuse yourself and that a complete stranger can use and modify
b) Standardize code and data structure
c) Basic use of GitHub
2. Confidently manipulate data and R-objects (never touch Excel ever again)
a) Understand data manipulation tools such as the apply family and the plyr package
b) Index, group and aggregate data using the above functions
c) Use advanced techniques for string manipulation
3. Check for and fix errors in data and code
a) Data cleaning and error checking (tests such as looking for outliers, NAs, and patterns)
b) Diagnostic plotting
c) Basic functions for debugging and the most common errors
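To give a flavour of the data checks named under the third outcome, here is a minimal base R sketch. The data frame `obs` and its columns are invented for illustration and are not course material, and the 1.5 × IQR rule is only one common convention for flagging outliers.

```r
# Hypothetical example data (not course material): one missing value
# and one suspiciously large value
obs <- data.frame(
  site   = c("A", "A", "B", "B", "C", "C"),
  height = c(10.2, 11.1, NA, 9.8, 10.5, 98.0)
)

# Count missing values in each column
na_counts <- colSums(is.na(obs))

# Flag values more than 1.5 * IQR outside the quartiles
q   <- quantile(obs$height, c(0.25, 0.75), na.rm = TRUE)
iqr <- IQR(obs$height, na.rm = TRUE)
is_outlier <- !is.na(obs$height) &
  (obs$height < q[[1]] - 1.5 * iqr | obs$height > q[[2]] + 1.5 * iqr)

# Inspect the flagged rows before deciding how to handle them
print(na_counts)
print(obs[is_outlier, ])
```

Flagged rows are candidates for a closer look rather than automatic deletion; a diagnostic plot such as `boxplot(obs$height)` would show the same value visually.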
Content
The course will be split across the three themes of the learning outcomes above. In each theme, the students will be given some learning materials (online video and example code) and a task to complete. There will be a non-compulsory online question and answer session for each theme, before a class-wide workshop where students will present their work.
1. Write reproducible code
Individual study: Here students will practice how to write reproducible code. Students will consult style guides and then be given a simple exercise using one of two sample data sets. The idea here is that the functions used in the exercise should already be familiar to the students, but that the students will write the code in a reproducible way. Students will create a new (or use an existing) GitHub account to upload their work to a course project site. Students will then be assigned into groups of two and will have to use each other’s code to first re-run the original exercise, and then use the same code to complete the exercise with the other data set. Students will discuss together how their code could be improved to become more understandable and reproducible.
Workshop: Each pair will present the results of their exercise, and teachers will lead a discussion based on the students’ experiences.
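One way to make the same code work on either data set, as the exercise requires, is to wrap each step in a function with explicit inputs. The following is a minimal sketch only: the function name `summarise_by_group` is hypothetical, and the built-in `iris` and `mtcars` data stand in for the two course data sets, which are not specified here.

```r
# Sketch of a reusable analysis step; all names are illustrative.

# Take the data and column names as arguments instead of relying on
# objects that happen to exist in the workspace
summarise_by_group <- function(data, value_col, group_col) {
  means <- tapply(data[[value_col]], data[[group_col]], mean, na.rm = TRUE)
  data.frame(group = names(means), mean = as.numeric(means))
}

# Run the original exercise on one data set ...
result_a <- summarise_by_group(iris, "Sepal.Length", "Species")
print(result_a)

# ... and reuse the same code, unchanged, on the other data set
result_b <- summarise_by_group(mtcars, "mpg", "cyl")
print(result_b)

# Record the R version and loaded packages alongside the results
print(sessionInfo())
```

Because nothing in the function refers to a particular data set, a partner can re-run it on their own data by changing only the three arguments.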
2. Confidently manipulate data and R-objects
Individual study: Study material will teach the students different ways to manipulate large and heterogeneous datasets in a reproducible way. Students will then be given one of two data sets and be asked to produce a set of specific figures. As in the first theme, students will be assigned to groups of two and check and comment on each other’s code for clarity and reproducibility.
Workshop: Each pair will present the results of their exercise, and teachers will lead a discussion based on the students’ experiences.
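The kinds of operations this theme covers can be sketched briefly in base R. This is an illustration under assumptions, not the course material: it uses the built-in `iris` data, and base R functions stand in for the plyr package named in the objectives.

```r
# Group and aggregate: mean of every numeric column per species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)

# Apply family: sapply returns a named vector with one value per column
col_ranges <- sapply(iris[, 1:4], function(x) max(x) - min(x))

# String manipulation: split names such as "Sepal.Length" at the dot
# and keep the first piece
parts      <- strsplit(names(iris)[1:4], ".", fixed = TRUE)
part_names <- sapply(parts, `[`, 1)

print(species_means)
print(col_ranges)
print(part_names)
```

The same grouping could be written with `tapply()` or with plyr; which reads most clearly is the sort of question the paired code review is meant to raise.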
Additional information
The course will follow the current recommendations on public health and the Covid-19 pandemic from the Public Health Agency of Sweden (FHM) and SLU.
Scheduled course meetings: November 4, November 11 and December 2.
The course is organized in collaboration with the postgraduate research schools "Ecology - basics and applications" and "Focus on Soils and Water".
Responsible department
Department of Ecology