PVG0033
Big data and machine learning (BDML)
• Introduction to big data
• Databases using NoSQL with a focus on MongoDB
• The Hadoop framework for distributed storage and computing
• The Spark cluster computing environment
• Introduction to R
• Unsupervised machine learning in R
• Supervised machine learning in R
During the course, participants will practice data management and analysis on their own data, with plenty of opportunities to discuss the methodology and get useful advice. Each day starts with lectures on theory, methods and their application to biological data. In addition to the lectures, participants will analyze either provided data sets or their own data. These laboratory exercises are an important part of the course, and students are required to submit lab reports that will be evaluated by the teachers. Before the course, participants are expected to read several papers and texts on big data and machine learning and to organize the data they will use during the course.
Syllabus and other information
Syllabus
PVG0033 Big data and machine learning (BDML), 3.0 Credits
Subjects
Mathematical Statistics
Education cycle
Postgraduate level
Grading scale
Pass / Failed
Prior knowledge
Admitted to a postgraduate program in animal science, biology, veterinary medicine, informatics or related subjects, or to a residency program in veterinary science. The course is primarily intended for graduate students, but post-doctoral researchers are also welcome to attend.
Objectives
After the course, participants will be familiar with the concepts of big data and machine learning. The big data part covers databases, distributed storage, and parallel and cloud computing. Students will practice information retrieval from large-scale databases and become familiar with state-of-the-art tools for distributed computing. The machine learning part covers methods for unsupervised and supervised learning. Students will implement machine learning models, mainly in the R programming language, and use existing machine learning algorithms to analyze large and/or complex datasets, make predictions and estimate the uncertainty of these predictions.
The participants will gain the ability to:
A. Collect and store big data.
B. Manage databases and computational frameworks for large-scale analysis.
C. Use and understand up-to-date methods for unsupervised and supervised machine learning.
D. Apply the methods to relevant biological data.
Content
• Introduction to big data
• Databases using NoSQL with a focus on MongoDB
• The Hadoop framework for distributed storage and computing
• The Spark cluster computing environment
• Introduction to R
• Unsupervised machine learning in R
• Supervised machine learning in R
During the course, participants will practice data management and analysis on their own data, with plenty of opportunities to discuss the methodology and get useful advice. Each day starts with lectures on theory, methods and their application to biological data. In addition to the lectures, participants will analyze either provided data sets or their own data. These laboratory exercises are an important part of the course, and students are required to submit lab reports that will be evaluated by the teachers. Before the course, participants are expected to read several papers and texts on big data and machine learning and to organize the data they will use during the course.
Additional information
After being approved to attend the course, applicants are requested to send a short summary with a project description focusing on the type of data and the goals of their studies.
Responsible department
Department of Animal Breeding and Genetics