PVG0045

Introduction to Python for data science

In the era of big data, programming skills are required to handle datasets efficiently. Scientists in the natural sciences routinely face large datasets, which makes analysis a demanding task. SLU PhD students often have to analyze datasets that call for skills beyond traditional tools such as Excel. Datasets also frequently need to be converted between formats before specialized software can analyze them, and for large datasets this conversion is practical only when done programmatically. Python is currently one of the most popular programming languages for data science, in large part thanks to the Pandas library, which greatly facilitates a wide range of data analysis operations, such as converting between data formats, combining data stored in different files and producing insightful summaries that support data quality assessment and interpretation. In addition, Python's extensive graphics ecosystem, including the Seaborn library, offers many possibilities for constructing informative graphs, both to aid data interpretation and for publication.



The course consists of morning lectures followed by practical exercises. The Jupyter interactive development environment (www.jupyter.org) is used throughout the course. Basic Python syntax is introduced first, after which students gradually build core data science skills. In particular, students are introduced to the Pandas library and practice data manipulation and aggregation techniques on large datasets. Finally, students gain experience in producing informative graphs using Seaborn or a similar library.



Expected study time

Total: 54 hours

Own study prior to course: 10 hours

Lectures: 14 hours

Computer assignments: 30 hours

Syllabus

PVG0045 Introduction to Python for data science, 2.0 credits (hp)

Subjects

Animal Science

Level of education

Doctoral level

Prerequisites

Admitted to a PhD or residency program in biology, medicine, nursing, veterinary medicine, animal science, food science, nutrition or similar topics. No prior programming experience is required.

Objectives

After completing this course, the students should be able to:



• Explain basic Python syntax

• Write simple functions in Python

• Transform data in various formats using the Pandas library

• Combine data sources stored in different files using the Pandas library

• Produce insightful summaries of datasets using the Pandas library

• Produce publication-quality plots using the Seaborn library
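
As a rough illustration of the kind of task these objectives describe, the sketch below writes a simple function, reshapes a table, combines two tables and produces two summaries with Pandas. The data, the column names (`animal_id`, `week_1`, `week_2`, `breed`, `weight_kg`) and the helper `load_table` are hypothetical and are not taken from the course material. The two "files" are held in memory so that the example is self-contained.

```python
import io

import pandas as pd

# Hypothetical data: two small CSV "files", kept in memory for this sketch.
weights_csv = io.StringIO(
    "animal_id,week_1,week_2\n"
    "A1,10.0,12.0\n"
    "A2,11.0,\n"
    "A3,9.5,10.5\n"
)
breeds_csv = io.StringIO(
    "animal_id,breed\n"
    "A1,X\n"
    "A2,Y\n"
    "A3,X\n"
)


def load_table(source):
    """A simple function: read a CSV source into a DataFrame."""
    return pd.read_csv(source)


weights = load_table(weights_csv)
breeds = load_table(breeds_csv)

# Transform the format: wide (one column per week) to long (one row per measurement).
long_weights = weights.melt(
    id_vars="animal_id", var_name="week", value_name="weight_kg"
)

# Combine data from the two sources on their shared key.
combined = long_weights.merge(breeds, on="animal_id", how="left")

# Summaries: missing values per column (data quality) and mean weight per breed.
missing_per_column = combined.isna().sum()
mean_weight = combined.groupby("breed")["weight_kg"].mean()

print(missing_per_column)
print(mean_weight)
```

A long-format table such as `combined` is also the shape that Seaborn plotting functions typically expect, for example `seaborn.barplot(data=combined, x="breed", y="weight_kg")`.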



Examination formats and requirements for passing the course

Approved computer assignments

Additional information

The course will be given as distance learning via Zoom or a similar platform. It consists of five full-day meetings made up of lectures and computer exercises. In addition, students are expected to work individually before the course starts and between meetings.

Responsible department/equivalent

Institutionen för Husdjursgenetik