PNG0100

Data Handling with R

Syllabus

PNG0100 Data Handling with R, 3.0 credits

Subjects

Mathematical statistics

Level of education

Doctoral level

Prerequisites

A basic knowledge of the R language, enough to use it as a statistical tool for research: for example, being able to read and write files, perform simple manipulation and indexing of R objects, run basic analyses and plot data. Participants must be admitted to PhD studies.

Objectives

The course aims to make the scientific code you write more effective by tapping into less commonly used capabilities of R and its software ecosystem. In particular, the course will teach you how to write better-organised code that can easily be reused and incorporated into other projects, and how to automate data and code handling tasks. This will allow you to save time and handle bigger data sets.



Learning outcomes

After completing the course, students should be able to:

Write reproducible code

•Use GitHub and R Markdown at a basic level

•Write code that you can reuse yourself, or that a complete stranger can use and modify

•Standardize code and data structure
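
As an illustration of the second and third points, here is a minimal sketch, not course material. The helper name summarise_by_group and the settings block are hypothetical; the example assumes the built-in ToothGrowth data set so that it runs without any external files. The idea is that a stranger would only need to edit the two settings lines to apply the same code to another data set.

```r
# Hypothetical settings block: the only lines a new user should need to edit.
value_col <- "len"   # numeric column to summarise
group_col <- "supp"  # column that defines the groups

# Hypothetical helper: count and mean of one column within each group.
# Column names are passed as arguments instead of being hard-coded.
summarise_by_group <- function(data, value_col, group_col) {
  # Fail early with a clear error if the input is not what we expect.
  stopifnot(is.data.frame(data),
            value_col %in% names(data),
            group_col %in% names(data),
            is.numeric(data[[value_col]]))
  groups <- split(data[[value_col]], data[[group_col]])
  data.frame(
    group = names(groups),
    n     = vapply(groups, length, integer(1)),
    mean  = vapply(groups, mean, numeric(1), na.rm = TRUE),
    row.names = NULL
  )
}

# ToothGrowth ships with base R (datasets package).
result <- summarise_by_group(ToothGrowth, value_col, group_col)
print(result)
```

Keeping the data-specific names in one place at the top is one possible convention; others, such as a separate configuration file, would work equally well.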



Confidently manipulate data and R-objects (never touch Excel ever again)

•Understand data manipulation tools such as the apply family

•Index, group and aggregate data

•Write simple loops and functions
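
The three points above can be sketched in a few lines of base R. This is only an illustrative example that assumes the built-in airquality data set; the function name f_to_c and the object names are hypothetical and not taken from the course.

```r
# airquality ships with base R; some Ozone and Solar.R values are missing.
aq <- airquality

# Index: July rows with a temperature above 85 degrees Fahrenheit.
hot_july <- aq[aq$Month == 7 & aq$Temp > 85, c("Day", "Temp", "Ozone")]

# apply family: count the missing values in every column.
na_counts <- vapply(aq, function(x) sum(is.na(x)), integer(1))

# Group and aggregate: mean temperature per month.
monthly_temp <- aggregate(Temp ~ Month, data = aq, FUN = mean)

# A simple function (hypothetical name): Fahrenheit to Celsius.
f_to_c <- function(f) (f - 32) * 5 / 9

# A simple loop that applies the function to one month at a time.
monthly_c <- numeric(nrow(monthly_temp))
for (i in seq_len(nrow(monthly_temp))) {
  monthly_c[i] <- f_to_c(monthly_temp$Temp[i])
}

# The loop is shown for illustration; the vectorised call gives the same result.
monthly_c_vec <- f_to_c(monthly_temp$Temp)
```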



Check for and fix errors in data and code

•Clean data and check it for errors (for example, by looking for outliers, NAs and patterns)

•Produce diagnostic plots

•Use basic debugging functions and recognise the most common errors
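
A minimal sketch of such checks is given below. The plants data frame and its values are invented for illustration and are not one of the course data sets; the 1.5 * IQR rule used by boxplot.stats is just one common outlier heuristic among several.

```r
# Hypothetical 'buggy' data: one missing value and one likely typing error.
plants <- data.frame(
  plot   = c("A", "A", "B", "B", "C", "C"),
  height = c(12.1, 11.8, NA, 12.6, 125, 12.3)
)

# 1. Missing values per column.
na_counts <- colSums(is.na(plants))

# 2. Outliers by the 1.5 * IQR rule (the same rule a boxplot uses).
out_values <- boxplot.stats(plants$height)$out
suspect <- plants[plants$height %in% out_values, ]

# A diagnostic plot makes the same problem visible; run it interactively:
# boxplot(height ~ plot, data = plants)

# 3. A common error: numbers read in as text, here with a decimal comma.
raw <- c("12.1", "12,5")
converted <- tryCatch(
  as.numeric(raw),
  warning = function(w) {
    # Report the problem instead of silently producing NA.
    message("Conversion problem: ", conditionMessage(w))
    suppressWarnings(as.numeric(raw))
  }
)

# For interactive debugging, base R also provides traceback(), browser()
# and debug().
```

Flagged values such as the one in `suspect` should be inspected rather than deleted automatically, since an outlier can also be a genuine measurement.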

Content

The course will start with a short introductory workshop. Thereafter the content will be split across the three themes of the learning outcomes above. In each theme, the students will be given some learning materials and a task to complete. Students will have the opportunity to communicate online among themselves and with the teachers using asynchronous tools while working on each theme, before a class-wide online workshop where students will discuss the assignments and work on additional tasks. If the situation allows, it might be possible to organise an in-person, student-led session on the Ultuna campus before each workshop where a teacher can be present.



Introduction: The course will start with a short online introduction where students and teachers will get to know each other and the structure of the course will be explained. Time will then be given for students to install and connect to the relevant software that will be used during the course.



1. Write reproducible code

Individual study: Here students will acquaint themselves with the very basics of using GitHub for code backup, archiving, sharing and editing, and will upload their work to a course project site. Working in small groups, students will practise writing reproducible code using one of two sample data sets. The functions used in the exercise should already be familiar to the students; the aim is to write the code so reproducibly that a stranger could run it on an alternative data set with very little editing.

Workshop: Students and teachers will discuss the students’ experiences in writing reproducible code and additional tasks will be provided.



2. Confidently manipulate data and R-objects

Individual study: Study material will teach the students different ways to manipulate large and heterogeneous data sets in a reproducible way. Students will then be given one of two data sets and be asked to organise the data in a specific way. As in the first theme, students will work via GitHub in small groups, checking and commenting on each other’s code for clarity and reproducibility.

Workshop: Students and teachers will discuss the students’ experiences in writing reproducible code and additional tasks will be provided.



3. Check for and fix errors in data and code

Individual study: In this theme, students will learn some ways to identify and deal with errors in code and/or data sets. They will then receive a ‘buggy’ data set (or, optionally, use their own data set) and use these skills, together with the knowledge gained in the rest of the course, to clean and restructure the data in order to produce a set of specified figures. Again, students will work via GitHub in small groups, checking and commenting on each other’s code for clarity and reproducibility.

Workshop: Students and teachers will discuss the students’ experiences in writing reproducible code and additional tasks will be provided.

Examination formats and requirements for passing the course

To receive course credits, students are required to have completed all individual exercises and played an active part in all workshops.

Additional information

Apply for the course no later than 30 November 2021 by sending an email to Alistair Auffret (alistair.auffret@slu.se), who will lead the course together with Lorenzo Menichetti (lorenzo.menichetti@slu.se).

Timetable 2022:

11 January, Kick-off workshop

20 January, Workshop: Theme 1

4 February, Workshop: Theme 2

21 February, Workshop: Theme 3

Responsible department or equivalent

Department of Ecology

Course facts

Subject: Mathematical statistics
Course code: PNG0100
Distance course: No
Responsible department: Department of Ecology