PVG0033

Big Data and Machine Learning

Syllabus

PVG0033 Big Data and Machine Learning, 3.0 credits

Subjects

Mathematical statistics

Education level

Postgraduate level

Prerequisites

Admission to a postgraduate program in animal science, biology, veterinary medicine, informatics or a related subject, or to a residency program in veterinary science.

The course is primarily intended for graduate students, but post-doctoral researchers are also welcome to attend.

Objectives

After the course, participants will be familiar with the concepts of big data and machine learning. The big data part covers databases, distributed storage, and parallel and cloud computing. Students will practice retrieving information from large-scale databases and become familiar with state-of-the-art tools for distributed computing. The machine learning part covers methods for unsupervised and supervised learning. Students will implement machine learning models, mainly in the R programming language, and use existing machine learning algorithms to analyze large and/or complex datasets, make predictions and estimate the uncertainty of those predictions.
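As an illustration of what "make predictions and estimate the uncertainty" can mean in practice, the sketch below fits a straight line to a small data set and uses bootstrap resampling to put an interval around the predicted mean at a new point. It is not course material: the course works mainly in R, Python is used here only to keep the example self-contained, and the data values and the function names `fit_line` and `bootstrap_prediction` are made up for illustration.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def bootstrap_prediction(xs, ys, x_new, n_boot=500, seed=0):
    """Point prediction at x_new plus a 95% bootstrap percentile
    interval for the fitted mean (not for a single new observation)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    n = len(xs)
    preds = []
    while len(preds) < n_boot:
        # Resample (x, y) pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # slope is undefined when all x values coincide
        by = [ys[i] for i in idx]
        a, b = fit_line(bx, by)
        preds.append(a + b * x_new)
    preds.sort()
    lower = preds[int(0.025 * n_boot)]
    upper = preds[int(0.975 * n_boot) - 1]
    a, b = fit_line(xs, ys)
    return a + b * x_new, lower, upper


# Made-up example data (hypothetical measurements).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

point, lower, upper = bootstrap_prediction(xs, ys, x_new=9.0)
print(f"prediction at x=9: {point:.2f} (95% interval {lower:.2f} to {upper:.2f})")
```

The same idea carries over to more complex models: refit on resampled data many times and report the spread of the resulting predictions alongside the point estimate.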

Participants will gain the ability to:

A. Collect and store big data.

B. Manage databases and computational frameworks for large scale analysis.

C. Use and understand up-to-date methods for unsupervised and supervised machine learning.

D. Apply the methods to relevant biological data.

Content

• Introduction to big data

• Databases using NoSQL with a focus on MongoDB

• The Hadoop framework for distributed storage and computing

• The Spark cluster computing environment

• Introduction to R

• Unsupervised machine learning in R

• Supervised machine learning in R



During the course, participants will practice data management and analysis on their own data, with plenty of opportunities to discuss the methodology and get advice. Each day starts with lectures on theory, methods and their application to biological data. In addition to the lectures, participants will analyze either provided data sets or their own data. These laboratory exercises form an important part of the course, and students are required to submit lab reports that will be evaluated by the teachers. Before the course starts, participants are expected to have read several papers and texts on big data and machine learning and to have organized the data they will use during the course.

Examination formats and requirements for passing the course

Active participation in at least 80% of the course activities.

Approved computer-lab exercises.

Additional information

Once approved to attend the course, applicants are asked to send a short project description summarizing the type of data and the goals of their studies.

Responsible department or equivalent

Department of Animal Breeding and Genetics (Institutionen för husdjursgenetik)