
PNG0096

Advanced Data Handling with R

The course will be split across the three themes of the learning outcomes listed under Objectives. For each theme, students will be given learning materials (online videos and example code) and a task to complete. Each theme will have an optional online question-and-answer session, followed by a class-wide workshop in which students will present their work.

1. Write reproducible code

Individual study: Students will practice writing reproducible code. They will consult style guides and then complete a simple exercise using one of two sample data sets. The functions used in the exercise should already be familiar; the aim is for students to write the code in a reproducible way. Students will upload their work to a course project site using a new or existing GitHub account. They will then be assigned to pairs, and each student will use their partner’s code first to re-run the original exercise and then to complete the exercise with the other data set. The pairs will discuss how their code could be improved to become more understandable and reproducible.

Workshop: Each pair will present the results of their exercise, and teachers will lead a discussion based on the students’ experiences.

2. Confidently manipulate data and R-objects

Individual study: Study material will teach students different ways to manipulate large and heterogeneous data sets reproducibly. Students will then be given one of two data sets and asked to produce a set of specific figures. As in the first theme, students will be assigned to pairs and will check and comment on each other’s code for clarity and reproducibility.

Workshop: Each pair will present the results of their exercise, and teachers will lead a discussion based on the students’ experiences.

Syllabus and other information

PNG0096 Advanced Data Handling with R, 2.0 Credits

Subjects

Mathematical Statistics

Education cycle

Postgraduate level

Grading scale

Pass / Failed

Language

English

Prior knowledge

Basic knowledge of the R language, sufficient to use it as a statistical tool for research: for example, reading and writing files, simple manipulation and indexing of R objects, basic analyses and plotting data.

Objectives

After completing the course, students should be able to:

1. Write reproducible code

a) Write code that you can reuse yourself and that a complete stranger can use and modify

b) Standardize code and data structure

c) Make basic use of GitHub

2. Confidently manipulate data and R-objects (never touch Excel ever again)

a) Understand data manipulation tools such as the apply family and the plyr package

b) Index, group and aggregate data using the above functions

c) Use advanced techniques for string manipulation

3. Check for and fix errors in data and code

a) Clean data and check for errors (for example, tests for outliers, NAs and patterns)

b) Produce diagnostic plots

c) Use basic debugging functions and recognize the most common errors
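
As a rough illustration of the kind of task these objectives point to, the sketch below uses base R only to tidy a text column, count missing values, flag implausible values and aggregate by group. The data frame, its column names and the plausibility threshold of 10 are all invented for this example and are not taken from the course material. The plyr package named above offers comparable grouping tools; base R is used here so the sketch runs without extra packages.

```r
# Hypothetical toy data set; names and values are invented for illustration.
plots <- data.frame(
  site    = c("A", "A", "B", "B", "B"),
  species = c(" Poa annua", "poa annua ", "Festuca rubra",
              "festuca rubra", "Festuca rubra"),
  biomass = c(2.1, 2.4, NA, 3.0, 30.5)
)

# String manipulation: trim whitespace and standardize capitalization
# (first letter upper case, remainder lower case).
trimmed <- trimws(plots$species)
plots$species <- paste0(toupper(substring(trimmed, 1, 1)),
                        tolower(substring(trimmed, 2)))

# Error checking: count missing values in every column with sapply().
na_counts <- sapply(plots, function(column) sum(is.na(column)))

# Error checking: flag rows above an assumed plausibility threshold.
# which() silently drops the NA comparison, so only real values are flagged.
suspect <- which(plots$biomass > 10)

# Group and aggregate: mean biomass per site, ignoring missing values.
site_means <- tapply(plots$biomass, plots$site, mean, na.rm = TRUE)
```

Keeping each step as a named object (na_counts, suspect, site_means) rather than overwriting the data makes it easier for a partner to re-run the script on a second data set and see where the results differ.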

Additional information

The course will follow the current recommendations concerning public health and the Covid-19 pandemic from the Public Health Agency of Sweden (FHM) and SLU.

Scheduled course meetings: November 4, November 11 and December 2.

The course is organized in collaboration with the postgraduate research schools “Ecology - basics and applications” and “Focus on Soils and Water”.

Responsible department

Department of Ecology

Subject: Mathematical Statistics

Course code: PNG0096
Location: Uppsala
Distance course: Yes
Language: English
Responsible department: Department of Ecology
Pace: 100%
Credits: 2.0
Dates: 2021-03-01 - 2021-03-05

Please note that further information on course location, type of teaching and other relevant information can be found under “Additional information” in the course Syllabus.

Coordinator

alistair.auffret@slu.se
lorenzo.menichetti@slu.se
