PNG0100
Data Handling with R
The course will start with a short introductory workshop. Thereafter the content will be split across the three themes of the learning outcomes above. In each theme, the students will be given some learning materials and a task to complete. Students will have the opportunity to communicate online among themselves and with the teachers using asynchronous tools while working on each theme, before a class-wide online workshop where students will discuss the assignments and work on additional tasks. If the situation allows, it might be possible to organise an in-person, student-led session on the Ultuna campus before each workshop where a teacher can be present.
Introduction: The course will start with a short online introduction where students and teachers will get to know each other and the structure of the course will be explained. Time will then be given for students to install and connect to the relevant software that will be used during the course.
1. Write reproducible code
Individual study: Here students will acquaint themselves with the very basics of using GitHub for code backup, archiving, sharing and editing, and will upload their work to a course project site. Working in small groups, students will practice writing reproducible code using one of two sample data sets. The functions used in the exercise should already be familiar to the students; the aim is to write the code in a reproducible way, so that a stranger could run the same code on an alternative data set with very little editing.
Workshop: Students and teachers will discuss the students’ experiences in writing reproducible code and additional tasks will be provided.
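As a rough illustration of what "very little editing" could mean in practice, the sketch below keeps every data-set-specific detail (file path, column names) as a function argument. The function name, the column names and the example data are all invented for this illustration; they are not taken from the course materials.

```r
# Hypothetical helper: everything specific to one data set is an argument,
# so running it on another file only means changing the argument values.
summarise_by_group <- function(path, group_col, value_col) {
  dat <- read.csv(path, stringsAsFactors = FALSE)

  # Fail early with a readable message if the expected columns are missing
  missing_cols <- setdiff(c(group_col, value_col), names(dat))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }

  # Mean of the value column within each group, ignoring missing values
  means <- tapply(dat[[value_col]], dat[[group_col]], mean, na.rm = TRUE)
  data.frame(group = names(means), mean = as.numeric(means),
             row.names = NULL, stringsAsFactors = FALSE)
}

# Invented example data, written to a temporary file so the sketch runs anywhere
example_path <- tempfile(fileext = ".csv")
write.csv(data.frame(site = c("A", "A", "B", "B"),
                     height_cm = c(10, 14, 20, NA)),
          example_path, row.names = FALSE)

result <- summarise_by_group(example_path,
                             group_col = "site",
                             value_col = "height_cm")
print(result)
```

A stranger with a different file would, under these assumptions, only need to change the three arguments in the final call.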
2. Confidently manipulate data and R-objects
Individual study: Study material will teach the students different ways to manipulate large and heterogeneous data sets in a reproducible way. Students will then be given one of two data sets and be asked to organise the data in a specific way. As in the first theme, students will work via GitHub in small groups, checking and commenting on each other’s code for clarity and reproducibility.
Workshop: Students and teachers will discuss the students’ experiences in writing reproducible code and additional tasks will be provided.
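The learning outcomes for this theme mention indexing, grouping and aggregating, the apply family, and simple loops and functions. The sketch below shows one base R example of each on a small invented data frame; the data and the helper name `value_range` are assumptions made for illustration only.

```r
# Invented example data: two sites, three years each
plots <- data.frame(
  site    = c("A", "A", "A", "B", "B", "B"),
  year    = c(2019, 2020, 2021, 2019, 2020, 2021),
  species = c(12, 15, 11, 8, 9, 14),
  cover   = c(0.40, 0.55, 0.35, 0.20, 0.25, 0.60)
)

# Index: keep rows from 2020 onwards and two columns only
recent <- plots[plots$year >= 2020, c("site", "species")]

# Apply family: one mean per numeric column
col_means <- sapply(plots[, c("species", "cover")], mean)

# Group and aggregate: mean species count per site
site_means <- aggregate(species ~ site, data = plots, FUN = mean)

# A small function, used inside a simple loop over the sites
value_range <- function(x) max(x) - min(x)

ranges <- numeric(0)
for (s in unique(plots$site)) {
  ranges[s] <- value_range(plots$species[plots$site == s])
}

print(recent)
print(col_means)
print(site_means)
print(ranges)
```

The loop is written out for clarity; `tapply(plots$species, plots$site, value_range)` would be one way to get the same per-site ranges without it.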
3. Check for and fix errors in data and code
Individual study: In this theme, students will learn some ways to identify and deal with errors in code and/or data sets. They will then receive a ‘buggy’ data set (optionally, their own data set) and use these skills, together with the knowledge gained in the rest of the course, to clean and restructure the data in order to produce a set of specified figures. Again, students will work via GitHub in small groups, checking and commenting on each other’s code for clarity and reproducibility.
Workshop: Students and teachers will discuss the students’ experiences in writing reproducible code and additional tasks will be provided.
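A minimal sketch of the kind of checks this theme refers to (missing values, outliers, unexpected patterns, and basic error handling) might look as follows. The data set is invented and deliberately 'buggy', and the choice to set the implausible value to NA is only one possible fix, not a recommendation from the course.

```r
# Invented 'buggy' data: a missing value, an implausible value
# and an inconsistently written site label
buggy <- data.frame(
  site   = c("A", "A", "a", "B", "B", "B"),
  mass_g = c(5.1, 4.8, 5.3, NA, 5.0, 500),
  stringsAsFactors = FALSE
)

# Count missing values per column
na_counts <- colSums(is.na(buggy))

# Flag outliers with the rule a boxplot uses (1.5 x IQR beyond the hinges)
outliers <- boxplot.stats(buggy$mass_g)$out

# Look for unexpected patterns in categorical labels
label_counts <- table(buggy$site)

# One possible fix: standardise the labels and set flagged values to NA
clean <- buggy
clean$site <- toupper(clean$site)
clean$mass_g[clean$mass_g %in% outliers] <- NA

# Handle an error explicitly instead of letting the whole script stop
safe_log <- function(x) {
  tryCatch(log(x), error = function(e) {
    message("Could not take log: ", conditionMessage(e))
    NA_real_
  })
}

print(na_counts)
print(outliers)
print(label_counts)
print(clean)
print(safe_log("five"))
```

When an error is not caught, base R's `traceback()`, `browser()` and `debug()` are common starting points for finding where it came from.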
Syllabus and other information
Syllabus
PNG0100 Data Handling with R, 3.0 Credits
Subjects
Mathematical Statistics
Education cycle
Postgraduate level
Grading scale
Pass / Failed
The requirements for attaining different grades are described in the course assessment criteria which are contained in a supplement to the course syllabus. Current information on assessment criteria shall be made available at the start of the course.
Prior knowledge
A basic knowledge of the R language, enough to use it as a statistical tool for research. For example: reading and writing files, simple manipulation and indexing of R objects, basic analyses, and plotting data. Students must be admitted to PhD studies.
Objectives
The course aims to improve the effectiveness of the scientific code you write by tapping into generally less utilized capabilities of R and its software ecosystem. In particular, the course will teach you how to write more ordered code that can be easily reused and incorporated into other projects, and how to automate data and code handling tasks. This will allow you to save time and handle bigger data sets.
Learning outcomes
After completing the course, students should be able to:
Write reproducible code
•Basic use of GitHub and R Markdown
•Write code that can be reused by yourself or used and modified by a complete stranger
•Standardize code and data structure
Confidently manipulate data and R-objects (never touch Excel ever again)
•Understand data manipulation tools such as the apply family
•Index, group and aggregate data
•Write simple loops and functions
Check for and fix errors in data and code
•Data cleaning and error checking (tests such as looking for outliers, NAs and patterns)
•Diagnostic plotting
•Basic functions for debugging and most common errors
Content
The course will start with a short introductory workshop. Thereafter the content will be split across the three themes of the learning outcomes above. In each theme, the students will be given some learning materials and a task to complete. Students will have the opportunity to communicate online among themselves and with the teachers using asynchronous tools while working on each theme, before a class-wide online workshop where students will discuss the assignments and work on additional tasks. If the situation allows, it might be possible to organise an in-person, student-led session on the Ultuna campus before each workshop where a teacher can be present.
Introduction: The course will start with a short online introduction where students and teachers will get to know each other and the structure of the course will be explained. Time will then be given for students to install and connect to the relevant software that will be used during the course.
1. Write reproducible code
Individual study: Here students will acquaint themselves with the very basics of using GitHub for code backup, archiving, sharing and editing, and will upload their work to a course project site. Working in small groups, students will practice writing reproducible code using one of two sample data sets. The functions used in the exercise should already be familiar to the students; the aim is to write the code in a reproducible way, so that a stranger could run the same code on an alternative data set with very little editing.
Workshop: Students and teachers will discuss the students’ experiences in writing reproducible code and additional tasks will be provided.
2. Confidently manipulate data and R-objects
Individual study: Study material will teach the students different ways to manipulate large and heterogeneous data sets in a reproducible way. Students will then be given one of two data sets and be asked to organise the data in a specific way. As in the first theme, students will work via GitHub in small groups, checking and commenting on each other’s code for clarity and reproducibility.
Workshop: Students and teachers will discuss the students’ experiences in writing reproducible code and additional tasks will be provided.
3. Check for and fix errors in data and code
Individual study: In this theme, students will learn some ways to identify and deal with errors in code and/or data sets. They will then receive a ‘buggy’ data set (optionally, their own data set) and use these skills, together with the knowledge gained in the rest of the course, to clean and restructure the data in order to produce a set of specified figures. Again, students will work via GitHub in small groups, checking and commenting on each other’s code for clarity and reproducibility.
Workshop: Students and teachers will discuss the students’ experiences in writing reproducible code and additional tasks will be provided.
Formats and requirements for examination
To receive course credits, students are required to have completed all individual exercises and played an active part in all workshops. If a student fails a test, the examiner may give the student a supplementary assignment, provided this is possible and there is reason to do so.
If a student has been granted targeted study support because of a disability, the examiner has the right to offer the student an adapted test, or provide an alternative form of assessment.
If this course is discontinued, SLU will decide on transitional provisions for the examination of students admitted under this syllabus who have not yet been awarded a Pass grade.
For the assessment of an independent project (degree project), the examiner may also allow a student to add supplemental information after the deadline for submission. For more information, please refer to the Education Planning and Administration Handbook.
Additional information
Apply for the course no later than 30 November 2021 by sending an email to Alistair Auffret: alistair.auffret@slu.se, who will lead the course together with Lorenzo Menichetti: lorenzo.menichetti@slu.se.
Timetable 2022:
11 January, Kick-off workshop
20 January, Workshop: Theme 1
4 February, Workshop: Theme 2
21 February, Workshop: Theme 3
Responsible department
Department of Ecology
Course facts
Subject:
Mathematical Statistics
Course code: PNG0100
Distance course: No
Responsible department: Department of Ecology