# Data manipulation, analysis and visualisation in Python

## Introduction

Handling data is a recurring task for data analysts. Reading in experimental data, checking its properties,
and creating visualisations are crucial steps in the research process, so making these steps more efficient benefits anyone who works
with data. Spreadsheet-based software supports this process poorly because it lacks automation and repeatability.
A high-level scripting language such as Python is well suited to these tasks.

This course trains participants to use Python effectively for these tasks. It focuses on the manipulation and cleaning of tabular data,
exploratory analysis and visualisation using important packages such as Pandas, Matplotlib and Seaborn.

The course does not cover statistics, data mining, machine learning, or predictive modelling. It aims to provide participants with the means to
tackle commonly encountered data handling tasks effectively and so increase their overall efficiency. These skills are useful for both data cleaning
and feature engineering.

The course was developed as part of the Transferable Skills courses of the Doctoral School of Ghent University, but can be taught to others upon request.

## Course info

### Aim & scope

This course is intended for researchers who have at least basic programming skills. A basic (scientific) programming course that is part of
the regular curriculum should suffice. For those who have experience in another programming language (e.g. Matlab, R, ...), following a Python
tutorial prior to the course is advised.

The course is intended for professionals who wish to enhance their general data manipulation and visualisation skills in Python, with a specific
focus on tabular data. The course is NOT intended to be a course on statistics or machine learning.

### Program

The course starts by setting up the programming environment with the required packages using the conda package manager, followed by an introduction
to the Jupyter notebook environment. The data analysis package Pandas and the plotting packages Matplotlib and Seaborn are then introduced. Advanced
usage of Pandas for different data cleaning and manipulation tasks is taught, and the acquired skills are immediately put into practice on real-world
data sets. Applications include time series handling, categorical data, merging data, tidy data, ...
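To give a flavour of the applications listed above, here is a minimal sketch that reshapes a small table to tidy form, merges it with a lookup table, converts a column to a categorical, and resamples a time series. The data, column names and station names are invented for illustration and are not taken from the course material.

```python
import pandas as pd

# Hypothetical wide-format measurements: one column per station.
wide = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-02", "2024-01-08", "2024-01-09"]
        ),
        "station_a": [1.0, 3.0, 5.0, 7.0],
        "station_b": [2.0, 4.0, 6.0, 8.0],
    }
)

# Tidy data: one row per (date, station) observation.
tidy = wide.melt(id_vars="date", var_name="station", value_name="value")

# Merging data: attach (invented) station metadata to every observation.
stations = pd.DataFrame(
    {"station": ["station_a", "station_b"], "region": ["north", "south"]}
)
merged = tidy.merge(stations, on="station", how="left")

# Categorical data: store the repeated region labels as a category.
merged["region"] = merged["region"].astype("category")

# Time series handling: weekly mean over all stations.
weekly = merged.set_index("date").sort_index()["value"].resample("W").mean()

print(merged.head())
print(weekly)
```

Each step uses a single Pandas method (`melt`, `merge`, `astype`, `resample`), which is what makes such a workflow easy to rerun when the input data changes.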

The course closes with a discussion of the scientific Python ecosystem and the visualisation landscape, teaching
participants to create interactive charts.

## Getting started

The course uses Python 3 and some data analysis packages such as Pandas, Seaborn, NumPy and Matplotlib. To install the required libraries,
we recommend Anaconda or Miniconda ([https://www.anaconda.com/download/](https://www.anaconda.com/download/)) or another Python distribution that
includes the scientific libraries (this recommendation applies to all platforms: Windows, Linux and Mac).

For detailed instructions to get started on your local machine, see the [setup instructions](./setup.html).
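As a quick sanity check after installation, a short script along these lines can report whether the packages named above are importable in the active environment. This is only a sketch: the helper name `missing_packages` is hypothetical, and the list covers only the packages mentioned in this section, not necessarily everything the course needs.

```python
import importlib.util

# Import names of the packages mentioned in this section (assumed list).
REQUIRED = ("pandas", "numpy", "matplotlib", "seaborn")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are available.")
```

The check only looks the packages up without importing them, so it runs quickly and does not fail when one of them is absent.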

In case you do not want to install everything and just want to try out the course material, use the environment set up by
Binder [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/plovercode/DS-python-data-analysis/HEAD) and open the notebooks
right away (inside the `notebooks` directory).

## Slides

For the course slides, click [here](https://plovercode.github.io/DS-python-data-analysis/slides.html).

## Contributing

Found a typo or have a suggestion? See [how to contribute](./contributing.html).

## Meta

Authors: Joris Van den Bossche, Stijn Van Hoey

With the support of the Flemish Government.

<img src="./static/img/logo_flanders+richtingmorgen.png" width="79%">
<img src="./static/img/doctoralschoolsprofiel_hq_rgb_web.png" width="20%">

