Line Plot (1) : Datacamp Courses-Jhu-Genomics-Demo

The document describes a chapter from a DataCamp course on genomics data visualization. It includes the title "Demo Matplotlib", a description of the chapter, which teaches various Matplotlib plot types and customization, and a link to additional slides. The chapter introduces line plots in Matplotlib with sample code, and its exercises ask students to plot world population data from 1950 to 2100 and GDP and life expectancy data for different countries in 2007. Scatter plots are identified as better for assessing correlations between variables.

Uploaded by

Andrea Đokić

datacamp / courses-jhu-genomics-demo

master courses-jhu-genomics-demo / chapter1.md

westonstearns Update chapter1.md 2b82c1f on Jun 8 2016

1 contributor

1286 lines (976 sloc) 45.7 KB

title : Demo Matplotlib
description : Data Visualization is a key skill for aspiring data scientists. Matplotlib makes it easy to create meaningful and insightful plots. In this chapter, you will learn to build various types of plots and to customize them to make them more visually appealing and interpretable.
attachments :
  slides_link : https://s3.amazonaws.com/assets.datacamp.com/course/intermediate_python/intermediate_python_ch1_slides.pdf
free_preview : TRUE

--- type:NormalExercise lang:python xp:100 skills:2 key:bd57d4ff35

Line plot (1)


With matplotlib, you can create a bunch of different plots in Python. The most basic plot is the line plot. A general recipe is
given here.

import matplotlib.pyplot as plt


plt.plot(x,y)
plt.show()

In the video, you already saw how much the world population has grown over the past years. Will it continue to do so? The
World Bank has estimates of the world population for the years 1950 up to 2100. The years are loaded in your workspace as a
list called year, and the corresponding populations as a list called pop.
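As a minimal sketch of the recipe above, the snippet below plots only the first five entries of year and pop. The `Agg` backend line is an assumption that lets the script run without a display; in an interactive session you would leave it out and call `plt.show()` to see the figure.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# First five entries of the year and pop lists used in this exercise
year = [1950, 1951, 1952, 1953, 1954]
pop = [2.53, 2.57, 2.62, 2.67, 2.71]

# plt.plot() returns a list of Line2D objects, one per plotted line
lines = plt.plot(year, pop)
line = lines[0]

# Read the plotted data back from the line to confirm the axis mapping:
# year went to the horizontal axis, pop to the vertical axis
x_values = list(line.get_xdata())
y_values = list(line.get_ydata())
print(x_values[-1], y_values[-1])
```

The first argument always ends up on the horizontal axis and the second on the vertical axis, so swapping them would flip the plot.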

*** =instructions

• print() the last item from both the year and the pop list to see what the predicted population for the year 2100 is.
Use two print() functions.
• Before you can start, you should import matplotlib.pyplot as plt. pyplot is a submodule of matplotlib, hence the
dot.
• Use plt.plot() to build a line plot. year should be mapped on the horizontal axis, pop on the vertical axis. Don't
forget to finish off with the show() function to actually display the plot.

*** =hint

• To access the last element of a regular Python list, use [-1] .


• You'll need matplotlib.pyplot in your import call.
• Use plt.plot() and plt.show() to build the line plot.

*** =pre_exercise_code

import matplotlib.pyplot as plt; import importlib; importlib.reload(plt)


plt.clf()
year = list(range(1950, 2101))
pop =
[2.53,2.57,2.62,2.67,2.71,2.76,2.81,2.86,2.92,2.97,3.03,3.08,3.14,3.2,3.26,3.33,3.4,3.47,3.54,3.62,3.69,3.77,3.84,3.92,4.,4.07,4.15,4.22,4.3,4

*** =sample_code
# Print the last item from year and pop

# Import matplotlib.pyplot as plt

# Make a line plot: year on the x-axis, pop on the y-axis

*** =solution

# Print the last item from year and pop


print(year[-1])
print(pop[-1])

# Import matplotlib.pyplot as plt


import matplotlib.pyplot as plt

# Make a line plot: year on the x-axis, pop on the y-axis


plt.plot(year, pop)
plt.show()

*** =sct

test_function("print", 1,
incorrect_msg = "Use [`print()`](https://docs.python.org/3/library/functions.html#print) to print the last element of `year`, `year[-1]`.")
test_function("print", 2,
incorrect_msg = "Use [`print()`](https://docs.python.org/3/library/functions.html#print) to print the last element of `pop`, `pop[-1]`.")

test_import("matplotlib.pyplot",
not_imported_msg = "You can import pyplot by using `import matplotlib.pyplot`.",
incorrect_as_msg = "You should set the correct alias for `matplotlib.pyplot`, import it `as plt`.")

msg = "Use `plt.plot(year, pop)` to plot what's instructed."


test_function("matplotlib.pyplot.plot",
not_called_msg = msg,
incorrect_msg = msg)

msg = "Use [`plt.show()`](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.show) to show the plot."


test_function("matplotlib.pyplot.show",
not_called_msg = msg,
incorrect_msg = msg)

success_msg("Great! Let's interpret the plot you just created.")

--- type:MultipleChoiceExercise lang:python xp:50 skills:2 key:3d440a66c5

Line Plot (2): Interpretation


Have another look at the plot you created in the previous exercise; it's shown on the right. What is the first year in which there
will be more than ten billion human beings on this planet?

*** =instructions

• 2041
• 2062
• 2083
• 2094
*** =hint You can check the population for a particular year by checking out the plot. If you want the exact result, use
pop[year.index(2030)] to get the population for 2030, for example.
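The lookup from the hint can be sketched on a shortened version of the data. The snippet below assumes only the first five entries of year and pop; the threshold of 2.6 billion is an illustrative value, not the one asked for in the exercise.

```python
# First five entries of the year and pop lists used in this exercise
year = list(range(1950, 1955))
pop = [2.53, 2.57, 2.62, 2.67, 2.71]

# year.index(1953) gives the position of 1953 in year;
# the element at the same position in pop is the population for that year
pop_1953 = pop[year.index(1953)]

# First year in which the population exceeds a threshold
# (2.6 billion is an illustrative value)
threshold = 2.6
first_year = next(y for y, p in zip(year, pop) if p > threshold)

print(pop_1953)
print(first_year)
```

The same pattern works on the full lists, as long as year and pop have the same length and the same ordering.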

*** =pre_exercise_code

import matplotlib.pyplot as plt; import importlib; importlib.reload(plt)


plt.clf()
year = list(range(1950, 2101))
pop =
[2.53,2.57,2.62,2.67,2.71,2.76,2.81,2.86,2.92,2.97,3.03,3.08,3.14,3.2,3.26,3.33,3.4,3.47,3.54,3.62,3.69,3.77,3.84,3.92,4.,4.07,4.15,4.22,4.3,4
plt.plot(year,pop)
plt.show()

*** =sct

msg1 = "In 2041 the population will be around 9 billion."


msg2 = "Correct! Time to take your data visualization skills to the next level!"
msg3 = "By 2083 the world population will have exceeded 10 billion for a long time already."
msg4 = "By 2094 the world population will have exceeded 10 billion for a long time already."
test_mc(2, msgs = [msg1, msg2, msg3, msg4])

--- type:NormalExercise lang:python xp:100 skills:2 key:6b262ab724

Line plot (3)


Now that you've built your first line plot, let's start working on the data that professor Hans Rosling used to build his beautiful
bubble chart. It was collected in 2007. Two lists are available for you:

• life_exp which contains the life expectancy for each country and
• gdp_cap, which contains the GDP per capita for each country, expressed in US dollars.

GDP stands for Gross Domestic Product. It basically represents the size of the economy of a country. Divide this by the
population and you get the GDP per capita.
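As a small worked example of that division, the snippet below uses made-up figures for an imaginary country; they are not taken from the data used in this chapter.

```python
# Hypothetical figures for an imaginary country (not from the course data)
gdp_usd = 50_000_000_000   # total GDP in US dollars
population = 10_000_000    # number of inhabitants

# GDP per capita is the total GDP divided by the population
gdp_per_capita = gdp_usd / population
print(gdp_per_capita)
```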

matplotlib.pyplot is already imported as plt , so you can get started straight away.

*** =instructions

• Print the last item from both the list gdp_cap , and the list life_exp ; it is information about Zimbabwe.
• Build a line chart, with gdp_cap on the x-axis, and life_exp on the y-axis. Does it make sense to plot this data on a line
plot?
• Don't forget to finish off with a plt.show() command, to actually display the plot.

*** =hint

• Use [-1] to get the last item of a list.


• As before, use plt.plot(). The first argument is gdp_cap, the second argument is life_exp.
• Write plt.show() at the end of your script to actually display the plot.

*** =pre_exercise_code

import matplotlib.pyplot as plt; import importlib; importlib.reload(plt)


import pandas as pd
plt.clf()

df = pd.read_csv('http://assets.datacamp.com/course/intermediate_python/gapminder.csv', index_col = 0)
gdp_cap = list(df.gdp_cap)
life_exp = list(df.life_exp)

*** =sample_code
# Print the last item of gdp_cap and life_exp

# Make a line plot, gdp_cap on the x-axis, life_exp on the y-axis

# Display the plot

*** =solution

# Print the last item of gdp_cap and life_exp


print(gdp_cap[-1])
print(life_exp[-1])

# Make a line plot, gdp_cap on the x-axis, life_exp on the y-axis


plt.plot(gdp_cap, life_exp)

# Display the plot


plt.show()

*** =sct

test_function("print", 1,
incorrect_msg = "Use [`print()`](https://docs.python.org/3/library/functions.html#print) to print the last element of `gdp_cap`, `gdp_cap[-1]`.")
test_function("print", 2,
incorrect_msg = "Use [`print()`](https://docs.python.org/3/library/functions.html#print) to print the last element of `life_exp`, `life_exp[-1]`.")
msg = "Use `plt.plot(gdp_cap, life_exp)` to plot what's instructed."
test_function("matplotlib.pyplot.plot",
not_called_msg = msg,
incorrect_msg = msg)
msg = "Use [`plt.show()`](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.show) to show the plot."
test_function("matplotlib.pyplot.show",
not_called_msg = msg,
incorrect_msg = msg)
success_msg("Well done, but this doesn't look right. Let's build a plot that makes more sense.")

--- type:NormalExercise lang:python xp:100 skills:2 key:90b213e819

Scatter Plot (1)


When you have a time scale along the horizontal axis, the line plot is your friend. But in many other cases, when you're trying
to assess if there's a correlation between two variables, for example, the scatter plot is the better choice. Below is an example
of how to build a scatter plot.

import matplotlib.pyplot as plt


plt.scatter(x,y)
plt.show()

Let's continue with the gdp_cap versus life_exp plot, the GDP and life expectancy data for different countries in 2007.
Maybe a scatter plot will be a better alternative?

Again, the matplotlib.pyplot package is available as plt.
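To make the scatter recipe concrete, here is a minimal sketch with five made-up data points that span several orders of magnitude on the x-axis. The values and the `Agg` backend line are assumptions for illustration; in an interactive session you would call `plt.show()` at the end.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical values, chosen only to span several orders of magnitude
gdp_cap = [500, 1500, 5000, 15000, 45000]
life_exp = [48.0, 57.5, 66.0, 73.5, 80.0]

# Each (gdp_cap, life_exp) pair becomes one dot; no lines are drawn between them
points = plt.scatter(gdp_cap, life_exp)

# A logarithmic x-axis spreads out the low values instead of squeezing them together
plt.xscale('log')

x_scale = plt.gca().get_xscale()
n_points = len(points.get_offsets())
print(x_scale, n_points)
```

On a linear axis the three poorest entries would sit almost on top of each other; on the log axis each factor of ten takes up the same horizontal space.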

*** =instructions

• Change the line plot that's coded in the script to a scatter plot.
• A correlation will become clear when you display the GDP per capita on a logarithmic scale. Add the line plt.xscale('log').
• Finish off your script with plt.show() to display the plot.

*** =hint

• In the plt.plot() command, replace plot with scatter.


• Put plt.xscale('log') between the plt.scatter() and the plt.show() call.

*** =pre_exercise_code

import matplotlib.pyplot as plt; import importlib; importlib.reload(plt)


import pandas as pd
plt.clf()

df = pd.read_csv('http://assets.datacamp.com/course/intermediate_python/gapminder.csv', index_col = 0)
gdp_cap = list(df.gdp_cap)
life_exp = list(df.life_exp)

*** =sample_code

# Change the line plot below to a scatter plot


plt.plot(gdp_cap, life_exp)

# Put the x-axis on a logarithmic scale

# Show plot

*** =solution

# Change the line plot below to a scatter plot


plt.scatter(gdp_cap, life_exp)

# Put the x-axis on a logarithmic scale


plt.xscale('log')

# Show plot
plt.show()

*** =sct

msg = "Use `plt.scatter(gdp_cap, life_exp)` to make a scatter plot."


test_function("matplotlib.pyplot.scatter",
not_called_msg = msg,
incorrect_msg = msg)

msg = "Use `plt.xscale('log')` to put the x-axis on a log scale."


test_function("matplotlib.pyplot.xscale",
not_called_msg = msg,
incorrect_msg = msg)

msg = "Use [`plt.show()`](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.show) to show the plot."


test_function("matplotlib.pyplot.show",
not_called_msg = msg,
incorrect_msg = msg)

success_msg("Great! That looks much better!")

--- type:NormalExercise lang:python xp:100 skills:2 key:e11cc0f7ef


Scatter plot (2)
In the previous exercise, you saw that a higher GDP usually corresponds to a higher life expectancy. In other words,
there is a positive correlation.
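If you want a number to go with the visual impression, one common measure is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it by hand; the helper name `pearson_r` and the data are hypothetical and not part of the course material.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the mean (unnormalized covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: log10 of GDP per capita and life expectancy
log_gdp = [2.7, 3.2, 3.7, 4.2, 4.7]
life_exp = [48.0, 57.5, 66.0, 73.5, 80.0]

r = pearson_r(log_gdp, life_exp)
print(round(r, 3))
```

A value close to 1 indicates a strong positive correlation, a value close to -1 a strong negative one, and a value near 0 no clear linear relationship.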

Do you think there's a relationship between population and life expectancy of a country? The list life_exp from the previous
exercise is already available. In addition, now also pop is available, listing the corresponding populations for the countries in
2007. The populations are in millions of people.

*** =instructions

• Start from scratch: import matplotlib.pyplot as plt .


• Build a scatter plot, where pop is mapped on the horizontal axis, and life_exp is mapped on the vertical axis.
• Finish the script with plt.show() to actually display the plot. Do you see a correlation?

*** =hint

• Use plt.scatter() , with the correct arguments, followed by plt.show() .

*** =pre_exercise_code

import matplotlib.pyplot as plt; import importlib; importlib.reload(plt)


import pandas as pd
plt.clf()

df = pd.read_csv('http://assets.datacamp.com/course/intermediate_python/gapminder.csv', index_col = 0)
pop = list(df['population']/1000000)
life_exp = list(df.life_exp)

*** =sample_code

# Import package

# Build Scatter plot

# Show plot

*** =solution

# Import package
import matplotlib.pyplot as plt

# Build Scatter plot


plt.scatter(pop, life_exp)

# Show plot
plt.show()

*** =sct

test_import("matplotlib.pyplot",
not_imported_msg = "You can import pyplot by using `import matplotlib.pyplot`.",
incorrect_as_msg = "You should set the correct alias for `matplotlib.pyplot`, import it `as plt`.")

msg = "Use `plt.scatter(pop, life_exp)` to make a scatter plot."


test_function("matplotlib.pyplot.scatter",
not_called_msg = msg,
incorrect_msg = msg)

msg = "Use [`plt.show()`](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.show) to show the plot."


test_function("matplotlib.pyplot.show",
not_called_msg = msg,
incorrect_msg = msg)

success_msg("Nice! There's no clear relationship between population and life expectancy, which makes perfect sense.")

--- type:VideoExercise lang:python xp:50 skills:2 key:5c0d466cfa

Histograms
*** =video_link //player.vimeo.com/video/148376003

--- type:NormalExercise lang:python xp:100 skills:2 key:1c9335a8ac

Build a histogram (1)


life_exp , the list containing data on the life expectancy for different countries in 2007, is available in your Python shell.

To see how life expectancy in different countries is distributed, let's create a histogram of life_exp .

matplotlib.pyplot is already available as plt .

*** =instructions

• Use plt.hist() to create a histogram of the values in life_exp. Do not specify the number of bins; matplotlib will set the
number of bins to 10 by default for you.
• Add plt.show() to actually display the histogram. Can you tell which bin contains the most observations?

*** =hint

• plt.hist(life_exp) will create a histogram of the data in life_exp .

*** =pre_exercise_code

import matplotlib.pyplot as plt; import importlib; importlib.reload(plt)


import pandas as pd
plt.clf()
df = pd.read_csv('http://assets.datacamp.com/course/intermediate_python/gapminder.csv', index_col = 0)
life_exp = list(df.life_exp)

*** =sample_code

# Create histogram of life_exp data

# Display histogram

*** =solution

# Create histogram of life_exp data


plt.hist(life_exp)

# Display histogram
plt.show()

*** =sct

msg = "Use `plt.hist(life_exp)` to plot what's requested."
test_function("matplotlib.pyplot.hist", 1, not_called_msg = msg, incorrect_msg = msg)
msg = "Have you included [`plt.show()`](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.show) to actually display the plot?"
test_function("matplotlib.pyplot.show", 1, not_called_msg = msg, incorrect_msg = msg)

success_msg("Great job!")

--- type:NormalExercise lang:python xp:100 skills:2 key:e23a209115

Build a histogram (2): bins


In the previous exercise, you didn't specify the number of bins. By default, matplotlib sets the number of bins to 10 in that case.
The number of bins is pretty important. Too few bins oversimplify reality and hide the details. Too many
bins overcomplicate reality and obscure the bigger picture.

To control the number of bins to divide your data in, you can set the bins argument.
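To see what the bins argument does, the sketch below builds a histogram of twelve made-up values and inspects what plt.hist() returns. The data and the `Agg` backend line are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical life expectancy values
values = [43.8, 52.3, 56.7, 59.4, 64.1, 68.9, 71.2, 72.4, 74.0, 75.6, 78.3, 80.7]

# plt.hist() returns the per-bin counts, the bin edges, and the drawn patches
counts, edges, patches = plt.hist(values, bins=5)

n_bins = len(counts)     # one count per bin
n_edges = len(edges)     # always one more edge than there are bins
total = int(sum(counts)) # every value falls into exactly one bin
print(n_bins, n_edges, total)
```

Changing bins only changes how the same values are grouped; the total number of observations stays the same.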

That's exactly what you'll do in this exercise. You'll be making two plots here. The code in the script already includes plt.show()
and plt.clf() calls; plt.show() displays a plot; plt.clf() cleans it up again so you can start afresh.

As before, life_exp is available and matplotlib.pyplot is imported as plt.

*** =instructions

• Build a histogram of life_exp , with 5 bins. Can you tell which bin contains the most observations?
• Build another histogram of life_exp , this time with 20 bins. Is this better?

*** =hint

• In both cases, your call should look like this: plt.hist(life_exp, bins = ___) . Make sure to specify ___ appropriately.

*** =pre_exercise_code

import matplotlib.pyplot as plt; import importlib; importlib.reload(plt)


import pandas as pd
plt.clf()
df = pd.read_csv('http://assets.datacamp.com/course/intermediate_python/gapminder.csv', index_col = 0)
life_exp = list(df.life_exp)

*** =sample_code

# Build histogram with 5 bins

# Show and clean up plot


plt.show()
plt.clf()

# Build histogram with 20 bins

# Show and clean up again


plt.show()
plt.clf()

*** =solution

# Build histogram with 5 bins


plt.hist(life_exp, bins = 5)

# Show and clear plot


plt.show()
plt.clf()

# Build histogram with 20 bins


plt.hist(life_exp, bins = 20)

# Show and clear plot again


plt.show()
plt.clf()

*** =sct

msg = "For the first histogram, use `plt.hist(life_exp, bins = 5)`."


test_function("matplotlib.pyplot.hist", 1,
not_called_msg = msg,
incorrect_msg = msg)

msg = "You don't have to change or remove any [`plt.show()`](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.show) functions."
test_function("matplotlib.pyplot.show", 1,
not_called_msg = msg,
incorrect_msg = msg)

msg = "You don't have to change or remove the [`plt.clf()`](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.clf) functions."
test_function("matplotlib.pyplot.clf", 1,
not_called_msg = msg,
incorrect_msg = msg)

msg = "For the second histogram, use `plt.hist(life_exp, bins = 20)` to plot what's requested."
test_function("matplotlib.pyplot.hist", 2,
not_called_msg = msg,
incorrect_msg = msg)

msg = "You don't have to change or remove any [`plt.show()`](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.show) functions."
test_function("matplotlib.pyplot.show", 2,
not_called_msg = msg,
incorrect_msg = msg)

msg = "You don't have to change or remove the [`plt.clf()`](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.clf) functions."
test_function("matplotlib.pyplot.clf", 2,
not_called_msg = msg,
incorrect_msg = msg)

success_msg("Nice! You can use the buttons to browse through the different plots you've created.")

--- type:NormalExercise lang:python xp:100 skills:2 key:896519cdaa

Build a histogram (3): compare


In the video, you saw population pyramids for the present day and for the future. Because we were using a histogram, it was
very easy to make a comparison.

Let's do a similar comparison. life_exp contains life expectancy data for different countries in 2007. You also have access to
a second list now, life_exp1950 , containing similar data for 1950. Can you make a histogram for both datasets?

You'll again be making two plots. The plt.show() and plt.clf() commands to render everything nicely are already
included. Also matplotlib.pyplot is imported for you, as plt .

*** =instructions

• Build a histogram of life_exp with 15 bins.


• Build a histogram of life_exp1950 , also with 15 bins. Is there a big difference with the histogram for the 2007 data?

*** =hint

• plt.hist(life_exp) is not enough: you'll also need to specify the bins argument!
• The code to build the histogram for the 1950 data is the same as for the 2007 data, except for the name of the list you
want to plot the data for.

*** =pre_exercise_code

import matplotlib.pyplot as plt; import importlib; importlib.reload(plt)


import pandas as pd
plt.clf()
df = pd.read_csv('http://assets.datacamp.com/course/intermediate_python/gapminder.csv', index_col = 0)
life_exp = list(df.life_exp)
life_exp1950 =
[28.8,55.23,43.08,30.02,62.48,69.12,66.8,50.94,37.48,68.0,38.22,40.41,53.82,47.62,50.92,59.6,31.98,39.03,39.42,38.52,68.75,35.46,38.09,54.74,4

*** =sample_code

# Histogram of life_exp, 15 bins

# Show and clear plot


plt.show()
plt.clf()

# Histogram of life_exp1950, 15 bins

# Show and clear plot again


plt.show()
plt.clf()

*** =solution

# Histogram of life_exp, 15 bins


plt.hist(life_exp, bins = 15)

# Show and clear plot


plt.show()
plt.clf()

# Histogram of life_exp1950, 15 bins


plt.hist(life_exp1950, bins = 15)

# Show and clear plot again


plt.show()
plt.clf()

*** =sct

msg = "For the first histogram, use `plt.hist(life_exp, bins = 15)` to plot what's requested."
test_function("matplotlib.pyplot.hist", 1, not_called_msg = msg, incorrect_msg = msg)

msg = "You don't have to change or remove any [`plt.show()`](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.show) functions."
test_function("matplotlib.pyplot.show", 1, not_called_msg = msg, incorrect_msg = msg)

msg = "You don't have to change or remove the first [`plt.clf()`](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.clf) function."
test_function("matplotlib.pyplot.clf", 1, not_called_msg = msg, incorrect_msg = msg)

msg = "For the second histogram, use `plt.hist(life_exp1950, bins = 15)` to plot what's requested."
test_function("matplotlib.pyplot.hist", 2, not_called_msg = msg, incorrect_msg = msg)

msg = "You don't have to change or remove any [`plt.show()`](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.show) functions."
test_function("matplotlib.pyplot.show", 2, not_called_msg = msg, incorrect_msg = msg)
msg = "You don't have to change or remove the second [`plt.clf()`](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.clf) function."
test_function("matplotlib.pyplot.clf", 2, not_called_msg = msg, incorrect_msg = msg)

success_msg("Great! Neither one of these histograms is useful to better understand the life expectancy data.")

--- type:MultipleChoiceExercise lang:python xp:50 skills:2 key:c74a9400bf

Choose the right plot (1)


You're a professor teaching Data Science with Python, and you want to visually assess if the grades on your exam follow a
normal distribution. Which plot do you use?

*** =instructions

• Line plot
• Scatter plot
• Histogram

*** =hint The word distribution should ring a bell!

*** =pre_exercise_code

# pec

*** =sct

msg1 = "Incorrect. A line plot won't give you a very clear visual on the distribution of the values in a list."
msg2 = "Incorrect. Although a scatter plot can give you an idea, it's not the best option."
msg3 = "Excellent choice!"
test_mc(3, msgs = [msg1, msg2, msg3])

--- type:MultipleChoiceExercise lang:python xp:50 skills:2 key:6344d3ddd3

Choose the right plot (2)


You're a professor in Data Analytics with Python, and you want to visually assess if longer answers on exam questions lead to
higher grades. Which plot do you use?

*** =instructions

• Line plot
• Scatter plot
• Histogram

*** =hint Do you still remember which type of plot is most suited if you want to see if there's a correlation between two
variables?

*** =pre_exercise_code

# pec

*** =sct

msg1 = "Making a line plot of this data will cause the lines to be all over the place. Do you still remember how the `gdp_cap` versus `life_exp` plot wasn't a good fit for the line plot?"
msg2 = "Good choice!"
msg3 = "There are two variables involved: the length of the answers and the corresponding grades. A histogram is not a suitable option in this case."
test_mc(2, msgs = [msg1, msg2, msg3])

--- type:VideoExercise lang:python xp:50 skills:2 key:ec967d615b

Customization
*** =video_link //player.vimeo.com/video/148376009

--- type:NormalExercise lang:python xp:100 skills:2 key:b376a92c55

Labels
It's time to customize your own plot. This is the fun part: you will see your plot come to life!

You're going to work on the scatter plot with world development data: GDP per capita on the x-axis (logarithmic scale), life
expectancy on the y-axis. The code for this plot is available in the script.

As a first step, let's add axis labels and a title to the plot. You can do this with the xlabel(), ylabel() and title()
functions, available in matplotlib.pyplot. This module is already imported as plt.

*** =instructions

• The strings xlab and ylab are already set for you. Use these variables to set the label of the x- and y-axis.
• The string title is also coded for you. Use it to add a title to the plot.
• After these customizations, finish the script with plt.show() to actually display the plot.

*** =hint

• For the first instruction, use plt.xlabel(xlab) and plt.ylabel(ylab) .


• Use plt.title(title) to add a title to your plot.
• Don't forget to finish off with plt.show() to display the plot.

*** =pre_exercise_code

import matplotlib.pyplot as plt; import importlib; importlib.reload(plt)

plt.clf()

import pandas as pd
df = pd.read_csv('http://assets.datacamp.com/course/intermediate_python/gapminder.csv', index_col = 0)
gdp_cap = list(df.gdp_cap)
life_exp = list(df.life_exp)

*** =sample_code

# Basic scatter plot, log scale


plt.scatter(gdp_cap, life_exp)
plt.xscale('log')

# Strings
xlab = 'GDP per Capita [in USD]'
ylab = 'Life Expectancy [in years]'
title = 'World Development in 2007'

# Add axis labels

# Add title
# After customizing, display the plot

*** =solution

# Basic scatter plot, log scale


plt.scatter(gdp_cap, life_exp)
plt.xscale('log')

# Strings
xlab = 'GDP per Capita [in USD]'
ylab = 'Life Expectancy [in years]'
title = 'World Development in 2007'

# Add axis labels


plt.xlabel(xlab)
plt.ylabel(ylab)

# Add title
plt.title(title)

# After customizing, display the plot


plt.show()

*** =sct

msg = "You don't have to change or remove the predefined scatter plot."
test_function("matplotlib.pyplot.scatter",
not_called_msg = msg,
incorrect_msg = msg)

msg = "You don't have to change or remove the predefined `xscale` function."
test_function("matplotlib.pyplot.xscale",
not_called_msg = msg,
incorrect_msg = msg)

msg = "Add the correct label to the x-axis using `plt.xlabel(...)`. Fill in the dots with a pre-defined variable."
test_function("matplotlib.pyplot.xlabel",
not_called_msg = msg,
incorrect_msg = msg)

msg = "Add the correct label to the y-axis using `plt.ylabel(...)`. Fill in the dots with a pre-defined variable."
test_function("matplotlib.pyplot.ylabel",
not_called_msg = msg,
incorrect_msg = msg)

msg = "Add the correct title to the plot using `plt.title(...)`. Fill in the dots with a pre-defined variable."
test_function("matplotlib.pyplot.title",
not_called_msg = msg,
incorrect_msg = msg)

msg = "Use [`plt.show()`](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.show) to show the plot in the end."
test_function("matplotlib.pyplot.show",
not_called_msg = msg,
incorrect_msg = msg)

success_msg("This looks much better already!")

--- type:NormalExercise lang:python xp:100 skills:2 key:5a5e2de1a1

Ticks
The customizations you've coded up to now are available in the script, in a more concise form.

In the video, Filip has demonstrated how you could control the y-ticks by specifying two arguments:
plt.yticks([0,1,2], ["one","two","three"])

In this example, the ticks corresponding to the numbers 0, 1 and 2 will be replaced by one, two and three, respectively.

Let's do a similar thing for the x-axis of your world development chart, with the xticks() function. The tick values 1000 ,
10000 and 100000 should be replaced by 1k , 10k and 100k . To this end, two lists have already been created for you:
tick_val and tick_lab .
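The two-argument form of xticks() can be sketched as follows. The three scatter points and the `Agg` backend line are assumptions for illustration; tick_val and tick_lab are the lists described above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data points, only there to give the axes something to show
plt.scatter([800, 9000, 60000], [50.0, 70.0, 80.0])
plt.xscale('log')

tick_val = [1000, 10000, 100000]
tick_lab = ['1k', '10k', '100k']

# First argument: where the ticks go; second argument: what each tick should say.
# plt.xticks() returns the tick locations and the Text objects for the labels.
locs, labels = plt.xticks(tick_val, tick_lab)

tick_positions = [float(v) for v in plt.gca().get_xticks()]
tick_texts = [label.get_text() for label in labels]
print(tick_positions, tick_texts)
```

Both lists must have the same length, because each label is paired with the tick value at the same position.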

*** =instructions

• Use tick_val and tick_lab as inputs to the xticks() function to make the plot more readable.
• As usual, display the plot with plt.show() after you've added the customizations.

*** =hint

• The first argument of xticks() is tick_val , the second argument is tick_lab .

*** =pre_exercise_code

import matplotlib.pyplot as plt; import importlib; importlib.reload(plt)


plt.clf()

import pandas as pd
df = pd.read_csv('http://assets.datacamp.com/course/intermediate_python/gapminder.csv', index_col = 0)
gdp_cap = list(df.gdp_cap)
life_exp = list(df.life_exp)

*** =sample_code

# Scatter plot
plt.scatter(gdp_cap, life_exp)

# Previous customizations
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')

# Definition of tick_val and tick_lab


tick_val = [1000,10000,100000]
tick_lab = ['1k','10k','100k']

# Adapt the ticks on the x-axis

# After customizing, display the plot

*** =solution

# Scatter plot
plt.scatter(gdp_cap, life_exp)

# Previous customizations
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')

# Definition of tick_val and tick_lab


tick_val = [1000,10000,100000]
tick_lab = ['1k','10k','100k']

# Adapt the ticks on the x-axis


plt.xticks(tick_val, tick_lab)

# After customizing, display the plot


plt.show()

*** =sct

msg = "You don't have to change or remove the predefined scatter plot."
test_function("matplotlib.pyplot.scatter",
not_called_msg = msg,
incorrect_msg = msg)

msg = "Don't change any of the customizations that were already coded."
test_function("matplotlib.pyplot.xscale",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.xlabel",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.ylabel",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.title",
not_called_msg = msg,
incorrect_msg = msg)

msg = "Set the correct ticks using [`plt.xticks()`](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.xticks). `tick_val` should be the first argument, `tick_lab` should be the second."
test_function("matplotlib.pyplot.xticks",
not_called_msg = msg,
incorrect_msg = msg)

msg = "Use [`plt.show()`](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.show) to show the plot."


test_function("matplotlib.pyplot.show",
not_called_msg = msg,
incorrect_msg = msg)

success_msg("Great! Your plot is shaping up nicely!")

--- type:NormalExercise lang:python xp:100 skills:2 key:1f075b9f03

Sizes
Right now, the scatter plot is just a cloud of blue dots, indistinguishable from each other. Let's change this. Wouldn't it be nice
if the size of the dots corresponded to the population?

To accomplish this, there is a list pop loaded in your workspace. It contains population numbers for each country expressed
in millions. You can see that this list is added to the scatter method, as the argument s , for size.

*** =instructions

• Run the script to see how the plot changes.
• Looks good, but increasing the size of the bubbles will make things stand out more.
◦ Import the `numpy` package as `np`.
◦ Use `np.array()` to create a NumPy array from the list `pop`. Call this NumPy array `np_pop`.
◦ Double the values in `np_pop` by assigning `np_pop * 2` to `np_pop` again. Because `np_pop` is a NumPy array, each array element will be doubled.
◦ Change the `s` argument inside `plt.scatter()` to be `np_pop` instead of `pop`.
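
Why the conversion to a NumPy array matters before doubling can be sketched as follows. This is only an illustration: `pop_demo` is a hypothetical stand-in for `pop`, and its values are made-up placeholders, not the Gapminder figures.

```python
import numpy as np

# Hypothetical population values in millions (placeholders, not real data)
pop_demo = [31.9, 3.6, 33.3]

# Multiplying a plain list by 2 repeats the list; it does not double the values
doubled_list = pop_demo * 2

# Multiplying a NumPy array by 2 doubles every element
np_pop_demo = np.array(pop_demo) * 2

print(len(doubled_list))  # 6 elements: the list was concatenated with itself
print(np_pop_demo)        # three elements, each twice the original value
```

Passing the repeated list as `s` would no longer give one size per country, which is why the exercise doubles the array rather than the list.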

*** =hint

To convert `pop` to an array, use `np.array(pop)`.

*** =pre_exercise_code
import matplotlib.pyplot as plt; import importlib; importlib.reload(plt)
import numpy as np
plt.clf()

# data
import pandas as pd
df = pd.read_csv('http://assets.datacamp.com/course/intermediate_python/gapminder.csv', index_col = 0)
gdp_cap = list(df.gdp_cap)
life_exp = list(df.life_exp)
pop = list(df['population']/1e6)

*** =sample_code

# Import numpy as np

# Store pop as a numpy array: np_pop

# Double np_pop

# Update: set s argument to np_pop
plt.scatter(gdp_cap, life_exp, s = pop)

# Previous customizations
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000, 10000, 100000],['1k', '10k', '100k'])

# Display the plot
plt.show()

*** =solution

# Import numpy as np
import numpy as np

# Store pop as a numpy array: np_pop
np_pop = np.array(pop)

# Double np_pop
np_pop = np_pop * 2

# Update: set s argument to np_pop
plt.scatter(gdp_cap, life_exp, s = np_pop)

# Previous customizations
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000, 10000, 100000],['1k', '10k', '100k'])

# Display the plot
plt.show()

*** =sct

msg = "Have you correctly imported the `numpy` package as `np`?"
test_import("numpy", 1,
not_imported_msg = msg,
incorrect_as_msg = msg)

msg = "You should use `np.array(pop)` to create a NumPy array from `pop`."
test_function("numpy.array",
not_called_msg = msg,
incorrect_msg = msg)

msg = 'Make sure you correctly create `np_pop` and multiply it by `2`.'
test_object("np_pop",
undefined_msg = msg,
incorrect_msg = msg)

msg = "Change `plt.scatter(gdp_cap, life_exp, s = ...)`; use `np_pop` instead of `pop`."
test_function("matplotlib.pyplot.scatter",
not_called_msg = msg,
incorrect_msg = msg)

msg = "You don't have to change or remove the customizations that were already coded for you."
test_function("matplotlib.pyplot.xscale",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.xlabel",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.ylabel",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.title",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.xticks",
not_called_msg = msg,
incorrect_msg = msg)

msg = "Use [`plt.show()`](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.show) to show the plot."
test_function("matplotlib.pyplot.show",
not_called_msg = msg,
incorrect_msg = msg)

success_msg("Bellissimo! Can you already tell which bubbles correspond to which countries?")

--- type:NormalExercise lang:python xp:100 skills:2 key:6180ea1a99

Colors
The code you've written up to now is available in the script on the right.

The next step is making the plot more colorful! To do this, a list `col` has been created for you. It contains one color per country, chosen according to the continent the country belongs to.

How did we make the list `col`, you ask? The Gapminder data contains a list `continent` with the continent each country belongs to. A dictionary is constructed that maps continents onto colors:

dict = {
'Asia':'red',
'Europe':'green',
'Africa':'blue',
'Americas':'yellow',
'Oceania':'black'
}

Nothing to worry about now; you will learn about dictionaries in the next chapter.
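
For the curious, the lookup step can be sketched as below. This is a minimal illustration, assuming the continent labels are available as a plain Python list; `cont_demo` and `col_demo` are hypothetical names with made-up contents, and the dictionary is called `lut` here to avoid shadowing Python's built-in `dict`.

```python
# Hypothetical continent labels for four countries (illustrative only)
cont_demo = ['Asia', 'Europe', 'Africa', 'Asia']

# Lookup table mapping each continent onto a color, as in the text above
lut = {
    'Asia': 'red',
    'Europe': 'green',
    'Africa': 'blue',
    'Americas': 'yellow',
    'Oceania': 'black'
}

# Look up the color for every continent label, preserving the order
col_demo = [lut[c] for c in cont_demo]
print(col_demo)  # ['red', 'green', 'blue', 'red']
```

Because the colors keep the same order as the countries, the resulting list can be handed to `plt.scatter()` alongside the x and y values.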

*** =instructions

• Add `c = col` to the arguments of the `plt.scatter()` function.
• Change the opacity of the bubbles by setting the `alpha` argument to `0.8` inside `plt.scatter()`. Alpha can be set from zero to one, where zero is totally transparent and one is not transparent at all.
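
To get a feel for what an alpha value does, here is a simplified sketch of blending one color over an opaque background. It is an assumption-laden model for intuition only, not a description of how Matplotlib renders internally; the helper `blend` is hypothetical.

```python
def blend(fg, bg, alpha):
    """Blend an RGB color over an opaque background (simplified model)."""
    return tuple(round(alpha * f + (1 - alpha) * b, 3) for f, b in zip(fg, bg))

blue = (0.0, 0.0, 1.0)
white = (1.0, 1.0, 1.0)

print(blend(blue, white, 1.0))  # fully opaque: the bubble's own color
print(blend(blue, white, 0.8))  # mostly the bubble's color, slightly lightened
print(blend(blue, white, 0.0))  # fully transparent: only the background shows
```

With `alpha = 0.8`, overlapping bubbles stay distinguishable because some of whatever lies underneath still shows through.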

*** =hint

• Add `c = col` to `plt.scatter()`.
• Add `alpha = 0.8` to `plt.scatter()`.

*** =pre_exercise_code

import matplotlib.pyplot as plt; import importlib; importlib.reload(plt)
plt.clf()
import numpy as np
import pandas as pd
df = pd.read_csv('http://assets.datacamp.com/course/intermediate_python/gapminder.csv', index_col = 0)
gdp_cap = list(df.gdp_cap)
life_exp = list(df.life_exp)
pop = list(df['population']/1e6)
cont = list(df.cont)
lut = {
'Asia':'red',
'Europe':'green',
'Africa':'blue',
'Americas':'yellow',
'Oceania':'black'
}
col = [lut[x] for x in cont]

*** =sample_code

# Specify c and alpha inside plt.scatter()
plt.scatter(x = gdp_cap, y = life_exp, s = np.array(pop) * 2)

# Previous customizations
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000,10000,100000], ['1k','10k','100k'])

# Show the plot
plt.show()

*** =solution

# Specify c and alpha inside plt.scatter()
plt.scatter(x = gdp_cap, y = life_exp, s = np.array(pop) * 2, c = col, alpha = 0.8)

# Previous customizations
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000,10000,100000], ['1k','10k','100k'])

# Show the plot
plt.show()

*** =sct

msg = "Change `plt.scatter(gdp_cap, life_exp, s = size)` by specifying two more arguments. Add `c = col` and `alpha = 0.8`."
test_function("matplotlib.pyplot.scatter",
not_called_msg = msg,
incorrect_msg = msg)
msg = "You don't have to change or remove the customizations that you already coded earlier."
test_function("matplotlib.pyplot.xscale",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.xlabel",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.ylabel",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.title",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.xticks",
not_called_msg = msg,
incorrect_msg = msg)

msg = "Use [`plt.show()`](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.show) to show the plot."
test_function("matplotlib.pyplot.show",
not_called_msg = msg,
incorrect_msg = msg)

success_msg("Nice! This is looking more and more like Hans Rosling's plot!")

--- type:NormalExercise lang:python xp:100 skills:2 key:4c5e5e0a8f

Additional Customizations
If you have another look at the script, under `# Additional customizations`, you'll see that there are now two `plt.text()` calls. They add the words "India" and "China" to the plot.
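
How `plt.text()` positions a label can be sketched in isolation as below. The two data points are made-up placeholders rather than Gapminder values, and the `Agg` backend line is an assumption added only so the sketch runs without a display; the exercise environment does not need it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical points standing in for two countries (not the Gapminder values)
x_demo = [1500, 6000]
y_demo = [70, 79]

plt.scatter(x_demo, y_demo)
plt.xscale('log')

# plt.text(x, y, s) places the string s at data coordinates (x, y)
plt.text(1550, 71, 'India')
plt.text(5700, 80, 'China')

# Collect the labels that were added to the current axes
labels = [t.get_text() for t in plt.gca().texts]
print(labels)  # ['India', 'China']
```

The first two arguments are data coordinates, so a label moves with its point if the axis limits or scale change.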

*** =instructions

• Add plt.grid(True) after the plt.text() calls so that gridlines are drawn on the plot.

*** =hint

• Simply add a new line, `plt.grid(True)`, to the script and hit Submit Answer.

*** =pre_exercise_code

import matplotlib.pyplot as plt; import importlib; importlib.reload(plt)
plt.clf()
import numpy as np
import pandas as pd
df = pd.read_csv('http://assets.datacamp.com/course/intermediate_python/gapminder.csv', index_col = 0)
gdp_cap = list(df.gdp_cap)
life_exp = list(df.life_exp)
pop = list(df['population']/1e6)
cont = list(df.cont)
lut = {
'Asia':'red',
'Europe':'green',
'Africa':'blue',
'Americas':'yellow',
'Oceania':'black'
}
col = [lut[x] for x in cont]

*** =sample_code

# Scatter plot
plt.scatter(x = gdp_cap, y = life_exp, s = np.array(pop) * 2, c = col, alpha = 0.8)

# Previous customizations
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000,10000,100000], ['1k','10k','100k'])

# Additional customizations
plt.text(1550, 71, 'India')
plt.text(5700, 80, 'China')

# Add grid() call

# Show the plot
plt.show()

*** =solution

# Scatter plot
plt.scatter(x = gdp_cap, y = life_exp, s = np.array(pop) * 2, c = col, alpha = 0.8)

# Previous customizations
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000,10000,100000], ['1k','10k','100k'])

# Additional customizations
plt.text(1550, 71, 'India')
plt.text(5700, 80, 'China')

# Add grid() call
plt.grid(True)

# Show the plot
plt.show()

*** =sct

msg = "Do not change the [`plt.scatter()`](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.scatter) call."
test_function("matplotlib.pyplot.scatter", not_called_msg = msg, incorrect_msg = msg)

msg = "You don't have to change or remove the customizations that have already been coded for you."
test_function("matplotlib.pyplot.xscale",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.text", 1,
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.text", 2,
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.xlabel",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.ylabel",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.title",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.xticks",
not_called_msg = msg,
incorrect_msg = msg)

msg = "Have you correctly added the [`plt.grid()`](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.grid) call?"
test_function("matplotlib.pyplot.grid", not_called_msg = msg, incorrect_msg = msg)

msg = "Use [`plt.show()`](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.show) to show the plot."
test_function("matplotlib.pyplot.show",
not_called_msg = msg,
incorrect_msg = msg)

--- type:MultipleChoiceExercise lang:python xp:50 skills:2 key:860968968a

Interpretation
If you have a look at your colorful plot, it's clear that people live longer in countries with a higher GDP per capita. No high-income countries have a really short life expectancy, and no low-income countries have a very long one. Still, there is a huge difference in life expectancy between countries on the same income level. Most people live in middle-income countries, where lifespan differs greatly from country to country depending on how income is distributed and how it is used.
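
The direction of such a relationship can also be checked numerically. The sketch below uses made-up placeholder values, not the Gapminder data, and takes the base-10 logarithm of GDP because the plot shows GDP on a log scale; `gdp_demo` and `life_demo` are hypothetical names.

```python
import numpy as np

# Made-up illustrative values, NOT taken from the Gapminder data
gdp_demo = np.array([500.0, 2000.0, 8000.0, 30000.0])
life_demo = np.array([50.0, 62.0, 72.0, 80.0])

# Correlate log10(GDP) with life expectancy, matching the plot's log x-axis
r = np.corrcoef(np.log10(gdp_demo), life_demo)[0, 1]

print(r > 0)  # True: both placeholder series increase together
```

A coefficient above zero means the two quantities tend to rise together; it says nothing about how much individual countries deviate from that trend.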

What can you say about the plot?

*** =instructions

• The countries in blue, corresponding to Africa, have both low life expectancy and a low GDP per capita.
• There is a negative correlation between GDP per capita and life expectancy.
• China has both a lower GDP per capita and a lower life expectancy than India.

*** =hint

• There is a clearly positive correlation, so you can rule out the second option.

*** =pre_exercise_code

import matplotlib.pyplot as plt; import importlib; importlib.reload(plt)
plt.clf()
import numpy as np
import pandas as pd
df = pd.read_csv('http://assets.datacamp.com/course/intermediate_python/gapminder.csv', index_col = 0)
gdp_cap = list(df.gdp_cap)
life_exp = list(df.life_exp)
pop = list(df['population']/1e6)
cont = list(df.cont)
lut = {
'Asia':'red',
'Europe':'green',
'Africa':'blue',
'Americas':'yellow',
'Oceania':'black'
}
col = [lut[x] for x in cont]
plt.scatter(x = gdp_cap, y = life_exp, s = np.array(pop) * 2, c = col, alpha = 0.8)
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000,10000,100000], ['1k','10k','100k'])
plt.text(1550, 71, 'India')
plt.text(5700, 80, 'China')
plt.grid(True)
plt.show()

*** =sct

msg1 = "Correct! On to the next chapter, on dictionaries!"
msg2 = "Incorrect; there is a clearly positive correlation between GDP per capita and life expectancy."
msg3 = "China's GDP per capita and life expectancy are both higher than India's."
test_mc(1, msgs = [msg1, msg2, msg3])
