Line Plot (1) : Datacamp Courses-Jhu-Genomics-Demo
title : Demo Matplotlib
description : Data Visualization is a key skill for aspiring data scientists. Matplotlib makes it easy to create meaningful and insightful plots. In this chapter, you will learn to build various types of plots and to customize them to make them more visually appealing and interpretable.
attachments :
  slides_link : https://s3.amazonaws.com/assets.datacamp.com/course/intermediate_python/intermediate_python_ch1_slides.pdf
free_preview : TRUE
In the video, you already saw how much the world population has grown over the past years. Will it continue to do so? The World Bank has estimates of the world population for the years 1950 up to 2100. The years are loaded in your workspace as a list called year, and the corresponding populations as a list called pop.
*** =instructions
• print() the last item from both the year and the pop list to see what the predicted population for the year 2100 is. Use two print() calls.
• Before you can start, you should import matplotlib.pyplot as plt. pyplot is a submodule of matplotlib, hence the dot.
• Use plt.plot() to build a line plot. year should be mapped on the horizontal axis, pop on the vertical axis. Don't forget to finish off with plt.show() to actually display the plot.
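A minimal sketch of these steps outside the course environment. The lists below are hypothetical toy values standing in for the World Bank data the exercise loads for you, and matplotlib.use("Agg") is an assumption that lets the script run without a display; in the course you would simply finish with plt.show().

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical toy values, not the real World Bank estimates
year = [2000, 2001, 2002]
pop = [1.0, 1.5, 2.0]

# Index -1 selects the last item of a list
print(year[-1])
print(pop[-1])

# First argument goes on the horizontal axis, second on the vertical axis
lines = plt.plot(year, pop)
# plt.show()  # uncomment in an interactive session to display the plot
```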
*** =hint
*** =pre_exercise_code
*** =sample_code
# Print the last item from year and pop
*** =solution
*** =sct
test_function("print", 1,
incorrect_msg = "Use [`print()`](https://docs.python.org/3/library/functions.html#print) to print the last element of `year`, `year[-1]`.")
test_function("print", 2,
incorrect_msg = "Use [`print()`](https://docs.python.org/3/library/functions.html#print) to print the last element of `pop`, `pop[-1]`.")
test_import("matplotlib.pyplot",
not_imported_msg = "You can import pyplot by using `import matplotlib.pyplot`.",
incorrect_as_msg = "You should set the correct alias for `matplotlib.pyplot`, import it `as plt`.")
*** =instructions
• 2041
• 2062
• 2083
• 2094
*** =hint
You can check the population for a particular year by looking at the plot. If you want the exact result, use pop[year.index(2030)] to get the population for 2030, for example.
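The hint works because year and pop are parallel lists: the population at position i belongs to the year at position i. A small sketch with hypothetical toy values:

```python
# Hypothetical toy values standing in for the exercise's year and pop lists
year = [2028, 2029, 2030, 2031]
pop = [1.0, 1.1, 1.2, 1.3]

# list.index() returns the position of the first matching element
# (it raises ValueError if the value is not in the list)
i = year.index(2030)
print(i)       # 2
print(pop[i])  # 1.2
```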
*** =pre_exercise_code
*** =sct
• life_exp, which contains the life expectancy for each country, and
• gdp_cap, which contains the GDP per capita for each country, expressed in US dollars.
GDP stands for Gross Domestic Product. Roughly speaking, it represents the size of a country's economy. Divide this by the population and you get the GDP per capita.
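As a worked example of that division, here is a tiny helper; the function name gdp_per_capita and the round numbers are hypothetical, not taken from the Gapminder data.

```python
def gdp_per_capita(gdp_usd, population):
    """Return GDP per person: total GDP divided by the number of inhabitants."""
    return gdp_usd / population

# Hypothetical economy: 2 billion USD of GDP shared by 1 million people
print(gdp_per_capita(2_000_000_000, 1_000_000))  # 2000.0
```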
matplotlib.pyplot is already imported as plt, so you can get started straight away.
*** =instructions
• Print the last item of both gdp_cap and life_exp; these values describe Zimbabwe.
• Build a line chart with gdp_cap on the x-axis and life_exp on the y-axis. Does it make sense to plot this data on a line plot?
• Don't forget to finish off with plt.show() to actually display the plot.
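One way to see why a line plot is a questionable choice here: plt.plot() joins consecutive points in the order they appear in the lists, and countries have no natural order. The sketch below uses hypothetical toy values (gdp_toy and life_toy are illustrative names) and a headless backend (an assumption, so it runs without a display), and counts how often the line jumps back to a smaller x value.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical toy values: each position is a different country, in no particular order
gdp_toy = [30000, 1000, 12000, 500]
life_toy = [80, 55, 72, 50]

# plt.plot() connects the points in list order, not in order of x
line, = plt.plot(gdp_toy, life_toy)
xs = list(line.get_xdata())

# Count the segments where the line steps back to a smaller x value
backtracks = sum(1 for a, b in zip(xs, xs[1:]) if b < a)
print(backtracks)  # 2
```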
*** =hint
*** =pre_exercise_code
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('http://assets.datacamp.com/course/intermediate_python/gapminder.csv', index_col = 0)
gdp_cap = list(df.gdp_cap)
life_exp = list(df.life_exp)
*** =sample_code
# Print the last item of gdp_cap and life_exp
*** =solution
*** =sct
test_function("print", 1,
incorrect_msg = "Use [`print()`](https://docs.python.org/3/library/functions.html#print) to print the last element of `gdp_cap`, `gdp_cap[-1]`.")
test_function("print", 2,
incorrect_msg = "Use [`print()`](https://docs.python.org/3/library/functions.html#print) to print the last element of `life_exp`, `life_exp[-1]`.")
msg = "Use `plt.plot(gdp_cap, life_exp)` to plot what's instructed."
test_function("matplotlib.pyplot.plot",
not_called_msg = msg,
incorrect_msg = msg)
msg = "Use [`plt.show()`](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.show) to show the plot."
test_function("matplotlib.pyplot.show",
not_called_msg = msg,
incorrect_msg = msg)
success_msg("Well done, but this doesn't look right. Let's build a plot that makes more sense.")
Let's continue with the gdp_cap versus life_exp plot, which shows the GDP and life expectancy data for different countries in 2007. Maybe a scatter plot will be a better alternative?
*** =instructions
• Change the line plot that's coded in the script to a scatter plot.
• A correlation will become clear when you display the GDP per capita on a logarithmic scale. Add the line plt.xscale('log').
• Finish off your script with plt.show() to display the plot.
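A minimal sketch of the scatter version, assuming hypothetical toy values in place of gdp_cap and life_exp and a headless backend so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical toy values, not the Gapminder data
gdp_toy = [500, 1000, 12000, 30000]
life_toy = [50, 55, 72, 80]

# Same arguments as plt.plot(), but the points are not connected
points = plt.scatter(gdp_toy, life_toy)
# Space the x-axis by orders of magnitude
plt.xscale('log')
print(plt.gca().get_xscale())  # log
```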
*** =hint
*** =pre_exercise_code
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('http://assets.datacamp.com/course/intermediate_python/gapminder.csv', index_col = 0)
gdp_cap = list(df.gdp_cap)
life_exp = list(df.life_exp)
*** =sample_code
plt.plot(gdp_cap, life_exp)
# Show plot
*** =solution
plt.scatter(gdp_cap, life_exp)
plt.xscale('log')
# Show plot
plt.show()
*** =sct
Do you think there's a relationship between population and life expectancy of a country? The list life_exp from the previous exercise is already available. In addition, pop is now available, listing the corresponding populations for the countries in 2007. The populations are in millions of people.
*** =instructions
*** =hint
*** =pre_exercise_code
import pandas as pd
df = pd.read_csv('http://assets.datacamp.com/course/intermediate_python/gapminder.csv', index_col = 0)
pop = list(df['population']/1000000)
life_exp = list(df.life_exp)
*** =sample_code
# Import package
# Show plot
*** =solution
# Import package
import matplotlib.pyplot as plt
# Build scatter plot
plt.scatter(pop, life_exp)
# Show plot
plt.show()
*** =sct
test_import("matplotlib.pyplot",
not_imported_msg = "You can import pyplot by using `import matplotlib.pyplot`.",
incorrect_as_msg = "You should set the correct alias for `matplotlib.pyplot`, import it `as plt`.")
success_msg("Nice! There's no clear relationship between population and life expectancy, which makes perfect sense.")
Histograms
*** =video_link //player.vimeo.com/video/148376003
To see how life expectancy in different countries is distributed, let's create a histogram of life_exp.
*** =instructions
• Use plt.hist() to create a histogram of the values in life_exp. Do not specify the number of bins; matplotlib uses 10 bins by default.
• Add plt.show() to actually display the histogram. Can you tell which bin contains the most observations?
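plt.hist() also returns what it computed, which lets you answer the question above without eyeballing the bars. A sketch with hypothetical toy values and a headless backend (both assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical toy life expectancies, not the Gapminder data
life_toy = [45, 52, 58, 61, 64, 66, 70, 72, 73, 75, 76, 78, 79, 81]

# Returns the count per bin, the bin edges and the drawn bars
counts, edges, bars = plt.hist(life_toy)
print(len(counts))  # 10 bins with the default settings

# Locate the fullest bin and print its edges and count
top = np.argmax(counts)
print(edges[top], edges[top + 1], counts[top])
```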
*** =hint
*** =pre_exercise_code
*** =sample_code
# Display histogram
*** =solution
# Create histogram of life_exp data
plt.hist(life_exp)
# Display histogram
plt.show()
*** =sct
msg = "Use `plt.hist(life_exp)` to plot what's requested."
test_function("matplotlib.pyplot.hist", 1, not_called_msg = msg, incorrect_msg = msg)
msg = "Have you included [`plt.show()`](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.show) to actually display the plot?"
test_function("matplotlib.pyplot.show", 1, not_called_msg = msg, incorrect_msg = msg)
success_msg("Great job!")
To control the number of bins your data is divided into, you can set the bins argument.
That's exactly what you'll do in this exercise. You'll be making two plots here. The code in the script already includes plt.show() and plt.clf() calls: plt.show() displays a plot, and plt.clf() clears it again so you can start afresh.
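A sketch of the two-plot workflow with the bins argument, again with hypothetical toy values and a headless backend (assumptions). plt.clf() clears the current figure, so the second histogram does not stack on top of the first.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical toy life expectancies
life_toy = [45, 52, 58, 61, 64, 66, 70, 72, 73, 75, 76, 78, 79, 81]

counts5, edges5, _ = plt.hist(life_toy, bins=5)
# plt.show()  # in the course, this would display the 5-bin histogram
plt.clf()  # wipe the figure before drawing the next plot

counts20, edges20, _ = plt.hist(life_toy, bins=20)
print(len(counts5), len(counts20))  # 5 20
print(len(plt.gca().patches))  # 20 bars: only the second histogram remains
```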
*** =instructions
• Build a histogram of life_exp with 5 bins. Can you tell which bin contains the most observations?
• Build another histogram of life_exp, this time with 20 bins. Is this better?
*** =hint
• In both cases, your call should look like this: plt.hist(life_exp, bins = ___). Make sure to specify ___ appropriately.
*** =pre_exercise_code
*** =sample_code
*** =solution
*** =sct
msg = "For the second histogram, use `plt.hist(life_exp, bins = 20)` to plot what's requested."
test_function("matplotlib.pyplot.hist", 2,
not_called_msg = msg,
incorrect_msg = msg)
success_msg("Nice! You can use the buttons to browse through the different plots you've created.")
Let's do a similar comparison. life_exp contains life expectancy data for different countries in 2007. You also have access to a second list now, life_exp1950, containing similar data for 1950. Can you make a histogram for both datasets?
You'll again be making two plots. The plt.show() and plt.clf() commands to render everything nicely are already included. matplotlib.pyplot is also imported for you as plt.
*** =instructions
*** =hint
• plt.hist(life_exp) is not enough: you'll also need to specify the bins argument!
• The code to build the histogram for the 1950 data is the same as for the 2007 data, except for the name of the list you want to plot.
*** =pre_exercise_code
*** =sample_code
*** =solution
*** =sct
msg = "For the first histogram, use `plt.hist(life_exp, bins = 15)` to plot what's requested."
test_function("matplotlib.pyplot.hist", 1, not_called_msg = msg, incorrect_msg = msg)
msg = "For the second histogram, use `plt.hist(life_exp1950, bins = 15)` to plot what's requested."
test_function("matplotlib.pyplot.hist", 2, not_called_msg = msg, incorrect_msg = msg)
success_msg("Great! Neither one of these histograms is useful to better understand the life expectancy data.")
*** =instructions
• Line plot
• Scatter plot
• Histogram
*** =pre_exercise_code
# pec
*** =sct
msg1 = "Incorrect. A line plot won't give you a clear picture of the distribution of the values in a list."
msg2 = "Incorrect. Although a scatter plot can give you an idea, it's not the best option."
msg3 = "Excellent choice!"
test_mc(3, msgs = [msg1, msg2, msg3])
*** =instructions
• Line plot
• Scatter plot
• Histogram
*** =hint
Do you still remember which type of plot is best suited to checking whether there's a correlation between two variables?
*** =pre_exercise_code
# pec
*** =sct
msg1 = "Making a line plot of this data will cause the lines to be all over the place. Do you still remember how the `gdp_cap` versus `life_exp` plot wasn't a good fit for a line plot?"
msg2 = "Good choice!"
msg3 = "There are two variables involved: the time taken for the exam and the corresponding grades. A histogram is not a suitable option in this case."
test_mc(2, msgs = [msg1, msg2, msg3])
Customization
*** =video_link //player.vimeo.com/video/148376009
Labels
It's time to customize your own plot. This is the fun part: you will see your plot come to life!
You're going to work on the scatter plot with world development data: GDP per capita on the x-axis (logarithmic scale), life
expectancy on the y-axis. The code for this plot is available in the script.
As a first step, let's add axis labels and a title to the plot. You can do this with the xlabel(), ylabel() and title() functions, available in matplotlib.pyplot, which is already imported as plt.
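A minimal sketch of the three labelling functions, assuming hypothetical toy data and a headless backend; the getters at the end only confirm what was set.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical toy values, not the Gapminder data
plt.scatter([500, 1000, 12000], [50, 55, 72])
plt.xscale('log')

xlab = 'GDP per Capita [in USD]'
ylab = 'Life Expectancy [in years]'
title = 'World Development in 2007'

# Each call sets one piece of text on the current axes
plt.xlabel(xlab)
plt.ylabel(ylab)
plt.title(title)

ax = plt.gca()
print(ax.get_xlabel(), ax.get_ylabel(), ax.get_title(), sep=" | ")
```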
*** =instructions
• The strings xlab and ylab are already set for you. Use these variables to set the labels of the x- and y-axis.
• The string title is also coded for you. Use it to add a title to the plot.
• After these customizations, finish the script with plt.show() to actually display the plot.
*** =hint
*** =pre_exercise_code
import matplotlib.pyplot as plt
plt.clf()
import pandas as pd
df = pd.read_csv('http://assets.datacamp.com/course/intermediate_python/gapminder.csv', index_col = 0)
gdp_cap = list(df.gdp_cap)
life_exp = list(df.life_exp)
*** =sample_code
# Basic scatter plot, log scale
plt.scatter(gdp_cap, life_exp)
plt.xscale('log')
# Strings
xlab = 'GDP per Capita [in USD]'
ylab = 'Life Expectancy [in years]'
title = 'World Development in 2007'
# Add axis labels
# Add title
# After customizing, display the plot
*** =solution
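plt.scatter(gdp_cap, life_exp)
plt.xscale('log')
# Strings
xlab = 'GDP per Capita [in USD]'
ylab = 'Life Expectancy [in years]'
title = 'World Development in 2007'
# Add axis labels
plt.xlabel(xlab)
plt.ylabel(ylab)
# Add title
plt.title(title)
plt.show()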
*** =sct
msg = "You don't have to change or remove the predefined scatter plot."
test_function("matplotlib.pyplot.scatter",
not_called_msg = msg,
incorrect_msg = msg)
msg = "You don't have to change or remove the predefined `xscale` function."
test_function("matplotlib.pyplot.xscale",
not_called_msg = msg,
incorrect_msg = msg)
msg = "Add the correct label to the x-axis using `plt.xlabel(...)`. Fill in the dots with a pre-defined variable."
test_function("matplotlib.pyplot.xlabel",
not_called_msg = msg,
incorrect_msg = msg)
msg = "Add the correct label to the y-axis using `plt.ylabel(...)`. Fill in the dots with a pre-defined variable."
test_function("matplotlib.pyplot.ylabel",
not_called_msg = msg,
incorrect_msg = msg)
msg = "Add the correct title to the plot using `plt.title(...)`. Fill in the dots with a pre-defined variable."
test_function("matplotlib.pyplot.title",
not_called_msg = msg,
incorrect_msg = msg)
Ticks
The customizations you've coded up to now are available in the script, in a more concise form.
In the video, Filip demonstrated how you can control the y-ticks by specifying two arguments:
plt.yticks([0,1,2], ["one","two","three"])
In this example, the ticks corresponding to the numbers 0, 1 and 2 will be replaced by one, two and three, respectively.
Let's do a similar thing for the x-axis of your world development chart, with the xticks() function. The tick values 1000, 10000 and 100000 should be replaced by 1k, 10k and 100k. To this end, two lists have already been created for you: tick_val and tick_lab.
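A sketch of xticks() on a logarithmic axis. The scatter points are hypothetical toy values and the Agg backend is an assumption; drawing the canvas once makes matplotlib fill in the tick label text so it can be read back.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical toy values spanning the tick positions
plt.scatter([500, 2000, 20000, 90000], [50, 60, 75, 82])
plt.xscale('log')

tick_val = [1000, 10000, 100000]
tick_lab = ['1k', '10k', '100k']
# First list: where the ticks go; second list: the text shown at each of them
plt.xticks(tick_val, tick_lab)

plt.gcf().canvas.draw()  # render once so the tick labels are populated
print([t.get_text() for t in plt.gca().get_xticklabels()])
```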
*** =instructions
• Use tick_val and tick_lab as inputs to the xticks() function to make the plot more readable.
• As usual, display the plot with plt.show() after you've added the customizations.
*** =hint
*** =pre_exercise_code
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('http://assets.datacamp.com/course/intermediate_python/gapminder.csv', index_col = 0)
gdp_cap = list(df.gdp_cap)
life_exp = list(df.life_exp)
*** =sample_code
# Scatter plot
plt.scatter(gdp_cap, life_exp)
# Previous customizations
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
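plt.title('World Development in 2007')
# Definition of tick_val and tick_lab
tick_val = [1000, 10000, 100000]
tick_lab = ['1k', '10k', '100k']
# Adapt the ticks on the x-axis
# After customizing, display the plot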
*** =solution
# Scatter plot
plt.scatter(gdp_cap, life_exp)
# Previous customizations
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
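plt.title('World Development in 2007')
# Definition of tick_val and tick_lab
tick_val = [1000, 10000, 100000]
tick_lab = ['1k', '10k', '100k']
# Adapt the ticks on the x-axis
plt.xticks(tick_val, tick_lab)
# After customizing, display the plot
plt.show()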
*** =sct
msg = "You don't have to change or remove the predefined scatter plot."
test_function("matplotlib.pyplot.scatter",
not_called_msg = msg,
incorrect_msg = msg)
msg = "Don't change any of the customizations that were already coded."
test_function("matplotlib.pyplot.xscale",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.xlabel",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.ylabel",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.title",
not_called_msg = msg,
incorrect_msg = msg)
Sizes
Right now, the scatter plot is just a cloud of blue dots, indistinguishable from each other. Let's change this. Wouldn't it be nice if the size of the dots corresponded to the population?
To accomplish this, there is a list pop loaded in your workspace. It contains population numbers for each country expressed in millions. You can see that this list is passed to the scatter() function as the argument s, for size.
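The reason the exercise converts pop to a NumPy array before doubling it: multiplying a plain Python list by 2 repeats the list instead of doubling each value. A sketch with hypothetical toy populations and a headless backend (assumptions); matplotlib interprets s as the marker area in points squared.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

pop_toy = [1.5, 30.0, 120.0]  # hypothetical populations, in millions

# Multiplying a list repeats it; it does not scale the values
print(pop_toy * 2)  # [1.5, 30.0, 120.0, 1.5, 30.0, 120.0]

# A NumPy array multiplies element-wise
np_pop = np.array(pop_toy)
np_pop = np_pop * 2
print(np_pop)

# One marker area per point, taken from np_pop
dots = plt.scatter([500, 5000, 50000], [50, 65, 80], s=np_pop)
print(dots.get_sizes())
```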
*** =instructions
*** =pre_exercise_code
import matplotlib.pyplot as plt; import importlib; importlib.reload(plt)
import numpy as np
plt.clf()
# data
import pandas as pd
df = pd.read_csv('http://assets.datacamp.com/course/intermediate_python/gapminder.csv', index_col = 0)
gdp_cap = list(df.gdp_cap)
life_exp = list(df.life_exp)
pop = list(df['population']/1e6)
*** =sample_code
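# Import numpy as np
# Store pop as a numpy array: np_pop
# Double np_pop
# Update: set s argument to np_pop
plt.scatter(gdp_cap, life_exp, s = pop)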
# Previous customizations
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000, 10000, 100000],['1k', '10k', '100k'])
*** =solution
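# Import numpy as np
import numpy as np
# Store pop as a numpy array: np_pop
np_pop = np.array(pop)
# Double np_pop
np_pop = np_pop * 2
# Update: set s argument to np_pop
plt.scatter(gdp_cap, life_exp, s = np_pop)
# Previous customizations
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000, 10000, 100000], ['1k', '10k', '100k'])
# Display the plot
plt.show()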
*** =sct
msg = 'Make sure you correctly create `np_pop` and multiply it with `2`.'
test_object("np_pop",
undefined_msg = msg,
incorrect_msg = msg)
msg = "You don't have to change or remove the customizations that were already coded for you."
test_function("matplotlib.pyplot.xscale",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.xlabel",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.ylabel",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.title",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.xticks",
not_called_msg = msg,
incorrect_msg = msg)
success_msg("Bellissimo! Can you already tell which bubbles correspond to which countries?")
Colors
The code you've written up to now is available in the script on the right.
The next step is making the plot more colorful! To do this, a list col has been created for you. It's a list with a color for each corresponding country, depending on the continent the country is part of.
How did we make the list col, you ask? The Gapminder data contains a list continent with the continent each country belongs to. A dictionary is constructed that maps continents onto colors:
dict = {
'Asia':'red',
'Europe':'green',
'Africa':'blue',
'Americas':'yellow',
'Oceania':'black'
}
Nothing to worry about now; you will learn about dictionaries in the next chapter.
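The text does not show how col was built from the dictionary; one plausible way is a list comprehension that looks up each country's continent. The continent values below are hypothetical, and the mapping is renamed continent_colors here (an assumption) because naming a variable dict hides Python's built-in dict type.

```python
# Same mapping as above, under a name that does not shadow the built-in dict
continent_colors = {
    'Asia': 'red',
    'Europe': 'green',
    'Africa': 'blue',
    'Americas': 'yellow',
    'Oceania': 'black',
}

# Hypothetical continent list, one entry per country
continent = ['Asia', 'Africa', 'Europe', 'Asia']

# Look up one color per country, keeping the country order
col = [continent_colors[c] for c in continent]
print(col)  # ['red', 'blue', 'green', 'red']
```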
*** =instructions
*** =hint
*** =pre_exercise_code
*** =sample_code
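# Specify c and alpha inside plt.scatter()
plt.scatter(x = gdp_cap, y = life_exp, s = np.array(pop) * 2)
# Previous customizations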
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000,10000,100000], ['1k','10k','100k'])
*** =solution
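# Specify c and alpha inside plt.scatter()
plt.scatter(x = gdp_cap, y = life_exp, s = np.array(pop) * 2, c = col, alpha = 0.8)
# Previous customizations
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000,10000,100000], ['1k','10k','100k'])
# Show the plot
plt.show()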
*** =sct
msg = "Change `plt.scatter(x = gdp_cap, y = life_exp, s = np.array(pop) * 2)` by specifying two more arguments. Add `c = col` and `alpha = 0.8`."
test_function("matplotlib.pyplot.scatter",
not_called_msg = msg,
incorrect_msg = msg)
msg = "You don't have to change or remove the customizations that you already coded earlier."
test_function("matplotlib.pyplot.xscale",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.xlabel",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.ylabel",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.title",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.xticks",
not_called_msg = msg,
incorrect_msg = msg)
success_msg("Nice! This is looking more and more like Hans Rosling's plot!")
Additional Customizations
If you have another look at the script, under # Additional customizations, you'll see that there are two plt.text() calls now. They add the words "India" and "China" to the plot.
*** =instructions
• Add plt.grid(True) after the plt.text() calls so that gridlines are drawn on the plot.
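A sketch of plt.text() and plt.grid(), assuming hypothetical toy points and a headless backend. plt.text(x, y, s) places the string s at the data coordinates (x, y) and returns the created Text object.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical toy points, not the Gapminder data
plt.scatter([1000, 3000, 8000], [65, 72, 78])
plt.xscale('log')

# Place text at data coordinates (x, y)
india = plt.text(1550, 71, 'India')
china = plt.text(5700, 80, 'China')

# Draw gridlines on both axes
plt.grid(True)
print(india.get_text(), india.get_position())
```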
*** =hint
• Simply add a new line, plt.grid(True), to the script and hit Submit Answer.
*** =pre_exercise_code
*** =sample_code
# Scatter plot
plt.scatter(x = gdp_cap, y = life_exp, s = np.array(pop) * 2, c = col, alpha = 0.8)
# Previous customizations
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000,10000,100000], ['1k','10k','100k'])
# Additional customizations
plt.text(1550, 71, 'India')
plt.text(5700, 80, 'China')
*** =solution
# Scatter plot
plt.scatter(x = gdp_cap, y = life_exp, s = np.array(pop) * 2, c = col, alpha = 0.8)
# Previous customizations
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000,10000,100000], ['1k','10k','100k'])
# Additional customizations
plt.text(1550, 71, 'India')
plt.text(5700, 80, 'China')
# Add grid() call
plt.grid(True)
# Show the plot
plt.show()
*** =sct
msg = "You don't have to change or remove the customizations that have already been coded for you."
test_function("matplotlib.pyplot.xscale",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.text", 1,
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.text", 2,
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.xlabel",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.ylabel",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.title",
not_called_msg = msg,
incorrect_msg = msg)
test_function("matplotlib.pyplot.xticks",
not_called_msg = msg,
incorrect_msg = msg)
Interpretation
If you have a look at your colorful plot, it's clear that people live longer in countries with a higher GDP per capita. No high-income countries have really short life expectancy, and no low-income countries have very long life expectancy. Still, there is a huge difference in life expectancy between countries on the same income level. Most people live in middle-income countries, where differences in lifespan between countries are huge, depending on how income is distributed and how it is used.
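To put a number on the direction of the relationship, you could compute a Pearson correlation coefficient with np.corrcoef(). This goes slightly beyond the plotting covered here, and the values below are hypothetical toy numbers rather than the Gapminder data. A positive coefficient means that higher GDP per capita tends to go with higher life expectancy.

```python
import numpy as np

# Hypothetical toy values, not the Gapminder data
gdp_toy = [600, 1200, 5000, 15000, 40000]
life_toy = [52, 58, 66, 74, 80]

# Correlate log10(GDP per capita) with life expectancy,
# matching the logarithmic x-axis used in the plot
r = np.corrcoef(np.log10(gdp_toy), life_toy)[0, 1]
print(round(r, 2))
```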
*** =instructions
• The countries in blue, corresponding to Africa, have both low life expectancy and a low GDP per capita.
• There is a negative correlation between GDP per capita and life expectancy.
• China has both a lower GDP per capita and a lower life expectancy than India.
*** =hint
• There is clearly a positive correlation, so you can rule out the second option.
*** =pre_exercise_code
*** =sct