Data Visualization With Altair
Last Updated: 13 Sep, 2024
Data is central to how companies understand their customers, and visualization is an important step in processing it. Altair is a declarative statistical visualization library for Python, built on Vega and Vega-Lite. It offers a user-friendly and efficient way to create high-quality, interactive plots with minimal code. This tutorial will guide you through the core features of Altair and how to use it for data visualization.
Altair is designed with a declarative syntax, which allows you to define what you want to visualize without specifying the underlying computational details. It automatically handles data transformations, scale management, and encodings.
Data visualization is the technique of presenting data as graphs, charts, and similar graphics. It is important because:
- It helps to draw conclusions easily.
- It is used to analyze the trends and patterns in the data.
- It makes comparisons easier, such as predictions versus target values or old versus new trends.
Installing and Setting Up Altair
To start using Altair, you need to install it. You can do so using pip:
pip install altair vega_datasets
Creating Basic Charts with Altair
The general syntax to create a chart in Altair is as follows:
alt.Chart(data).mark_type().encode(x=val1, y=val2)
- alt.Chart: A chart is an object in Altair. It acts as a container that holds the data and the visualization.
- mark_type(): Marks define the type of graph in which the data is displayed, for example mark_bar(), mark_point(), mark_line(), or mark_area().
- encode(): Maps data columns to visual properties (encoding channels) of the graph such as position (x, y), color, size, and stroke width.
1. Bar chart
A bar chart is one of the most commonly used charts. It compares a numeric value across the categories of a categorical variable.
Syntax:
alt.Chart(data).mark_bar().encode(x=val1, y=val2)
Python
import altair as alt
import pandas as pd

# Sample data
data = pd.DataFrame({
    'Category': ['A', 'B', 'C', 'D'],
    'Value': [4, 7, 1, 5]
})

# Create a bar plot
bar_chart = alt.Chart(data).mark_bar().encode(
    x='Category',
    y='Value'
).properties(
    title='Bar Plot'
)
bar_chart.display()
Output:
Bar plot
2. Line Chart
A line chart displays how a dependent variable changes with an independent variable, often time.
Syntax:
alt.Chart(data).mark_line().encode(x=val1, y=val2)
Python
import altair as alt
import pandas as pd

# Sample data
data = pd.DataFrame({
    'Year': [2000, 2001, 2002, 2003, 2004],
    'Value': [10, 15, 8, 12, 18]
})

# Create a line plot
line_chart = alt.Chart(data).mark_line().encode(
    x='Year',
    y='Value'
).properties(
    title='Line Plot'
)
line_chart.display()
Output:
Line Chart
3. Scatter Plot
A scatter plot displays the relationship between two quantitative variables as points.
Syntax:
alt.Chart(data).mark_point().encode(x=val1, y=val2)
Python
import altair as alt
import pandas as pd

# Sample data
data = pd.DataFrame({
    'X': [1, 2, 3, 4, 5],
    'Y': [10, 15, 13, 17, 19]
})

# Create a scatter plot
scatter_plot = alt.Chart(data).mark_point().encode(
    x='X',
    y='Y'
).properties(
    title='Scatter Plot'
)
scatter_plot.display()
Output:
Scatter Plot
4. Histogram
A histogram shows the distribution of a continuous variable by counting how many values fall into each bin.
Syntax:
alt.Chart(data).mark_bar().encode(alt.X('Value:Q', bin=True), y='count()')
Python
import altair as alt
import pandas as pd

# Sample data
data = pd.DataFrame({
    'Value': [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5]
})

# Create a histogram: bin the quantitative column and count rows per bin
histogram = alt.Chart(data).mark_bar().encode(
    alt.X('Value:Q', bin=True),
    y='count()'
).properties(
    title='Histogram'
)
histogram.display()
Output:
Histogram
5. Box Plot
A box plot summarizes a distribution through its median and quartiles and makes outliers easy to spot.
Syntax:
alt.Chart(data).mark_boxplot().encode(x=val1, y=val2)
Python
import altair as alt
import pandas as pd

# Sample data
data = pd.DataFrame({
    'Category': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Value': [1, 2, 3, 4, 5, 6]
})

# Create a box plot
box_plot = alt.Chart(data).mark_boxplot().encode(
    x='Category',
    y='Value'
).properties(
    title='Box Plot'
)
box_plot.display()
Output:
Box Plot
Customizing Plots With Altair
Customizing a plot makes it clearer and more engaging. Altair provides many options for improving how charts look.
The title is an important part of a graph because it briefly describes the chart. In Altair we can adjust its font, size, and color, along with the font sizes of axis titles and labels.
Python
import altair as alt
import pandas as pd

# Sample data
data = pd.DataFrame({
    'Category': ['A', 'B', 'C', 'D'],
    'Value': [4, 7, 1, 5]
})

# Create a bar plot with custom configurations
bar_chart = alt.Chart(data).mark_bar().encode(
    x='Category',
    y='Value'
).properties(
    title='Bar Plot with Custom Configurations'
).configure(
    title={
        "fontSize": 20,
        "font": "Arial",
        "color": "blue"
    },
    axis={
        "titleFontSize": 14,
        "labelFontSize": 12
    }
)
bar_chart
Output:
Customizing Title
We can change the colors of the marks based on the values in a particular column.
Python
import altair as alt
import pandas as pd

# Sample data
data = pd.DataFrame({
    'Category': ['A', 'B', 'C', 'D'],
    'Value': [4, 7, 1, 5],
    'Type': ['X', 'Y', 'X', 'Y']
})

# Create a bar plot with a color scale
bar_chart = alt.Chart(data).mark_bar().encode(
    x='Category',
    y='Value',
    color=alt.Color(
        'Type:N',
        scale=alt.Scale(domain=['X', 'Y'], range=['#1f77b4', '#ff7f0e'])
    )
).properties(
    title='Bar Plot with Color Scale'
)
bar_chart
Output:
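Customizing color
Altair supports themes that restyle a whole chart (for example alt.themes.enable('dark') in Altair 5; the exact entry point depends on the installed version). For a simpler change, we can set a different background color directly on the chart.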
Python
import altair as alt
import pandas as pd

# Sample Data
data = pd.DataFrame({
    'x': ['A', 'B', 'C', 'D'],
    'y': [5, 10, 15, 20]
})

# Create the chart
chart = alt.Chart(data).mark_bar().encode(
    x='x',
    y='y'
).configure(
    background='lightgray'  # Setting the background color
)

# Display the chart
chart
Output:
Customizing Themes
We can customize the X and Y axes of a graph: add gridlines, modify labels and titles, change the angle at which labels are displayed, and so on.
Python
import altair as alt
import pandas as pd

# Sample Data
data = pd.DataFrame({
    'x': ['A', 'B', 'C', 'D'],
    'y': [5, 10, 15, 20]
})

# Create the chart with axis customizations
chart = alt.Chart(data).mark_bar().encode(
    x=alt.X('x', axis=alt.Axis(
        title='Categories',   # Title of the x-axis
        titleFontSize=15,     # Font size for the axis title
        labelFontSize=12,     # Font size for the axis labels
        labelAngle=0,         # Angle of the axis labels
        labelColor='blue',    # Color of the axis labels
        titleColor='red'      # Color of the axis title
    )),
    y=alt.Y('y', axis=alt.Axis(
        title='Values',
        titleFontSize=15,
        labelFontSize=12,
        grid=True,            # Show grid lines
        titleAngle=90,        # Rotation of the y-axis title, in degrees
        titleColor='green'    # Color of the y-axis title
    ))
).properties(
    title='Customized Axes'
)

# Display the chart
chart
Output:
Customizing Axes
Example Code for Creating Charts with Altair
Here we use the Iris dataset, loaded through scikit-learn, to create several charts with Altair. The resulting DataFrame has five columns: 'sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)', and 'species'. We will explore the relationships between these features.
Python
import altair as alt
from sklearn.datasets import load_iris
import pandas as pd

# Load the Iris dataset
iris_data = load_iris()
iris_df = pd.DataFrame(iris_data.data, columns=iris_data.feature_names)
iris_df['species'] = pd.Categorical.from_codes(iris_data.target, iris_data.target_names)
print(iris_df.columns)

# Scatter plot with hover (tooltip)
scatter_plot = alt.Chart(iris_df).mark_point().encode(
    x=alt.X('sepal length (cm)', axis=alt.Axis(title='Sepal Length (cm)')),
    y=alt.Y('sepal width (cm)', axis=alt.Axis(title='Sepal Width (cm)')),
    color='species',
    tooltip=['species', 'sepal length (cm)', 'sepal width (cm)']  # Tooltip on hover
).properties(
    title='Iris Dataset: Sepal Length vs Sepal Width'
)
scatter_plot.display()

# Bar chart to show average petal length per species
bar_chart = alt.Chart(iris_df).mark_bar().encode(
    x='species:N',
    y='mean(petal length (cm)):Q',
    color='species:N'
).properties(
    title='Average Petal Length by Species'
)
bar_chart.display()

# Histogram to show distribution of sepal width
histogram = alt.Chart(iris_df).mark_bar().encode(
    alt.X('sepal width (cm):Q', bin=True, title='Sepal Width'),
    y='count()',
    color='species:N'
).properties(
    title='Distribution of Sepal Width by Species'
)
histogram.display()

# Box plot for petal length by species
box_plot = alt.Chart(iris_df).mark_boxplot().encode(
    x='species:N',
    y='petal length (cm):Q',
    color='species:N'
).properties(
    title='Box Plot of Petal Length by Species'
)
box_plot.display()
# Create a selection bound to the legend, keyed on species.
# selection_point and add_params are the Altair 5 names; the older
# selection_multi and add_selection are deprecated.
selection = alt.selection_point(fields=['species'], bind='legend')

# Scatter plot with hover and linked selection
scatter_plot = alt.Chart(iris_df).mark_point().encode(
    x=alt.X('sepal length (cm)', axis=alt.Axis(title='Sepal Length (cm)')),
    y=alt.Y('sepal width (cm)', axis=alt.Axis(title='Sepal Width (cm)')),
    color=alt.condition(selection, 'species:N', alt.value('lightgray')),  # Highlight selected species
    tooltip=['species', 'sepal length (cm)', 'sepal width (cm)']
).add_params(
    selection  # Add the selection to the scatter plot
).properties(
    title='Iris Dataset: Sepal Length vs Sepal Width'
)

# Bar chart with linked selection
bar_chart = alt.Chart(iris_df).mark_bar().encode(
    x='species:N',
    y='mean(petal length (cm)):Q',
    color=alt.condition(selection, 'species:N', alt.value('lightgray'))  # Highlight selected species
).properties(
    title='Average Petal Length by Species'
).add_params(
    selection  # Add the same selection to the bar chart
)

# Combine the charts side by side (horizontally)
combined_chart = alt.hconcat(bar_chart, scatter_plot)

# Display the combined chart
combined_chart.display()
Output:
The code above creates several plots: a scatter plot, a bar chart, a histogram, and a box plot. Finally, it combines a bar chart and a scatter plot that share one legend selection, so clicking a species in the legend highlights it in both charts and shows how petal length and sepal dimensions differ between species.
You can create faceted or layered visualizations to compare multiple plots:
1. Faceting
Python
import altair as alt
from vega_datasets import data

# Load the dataset
cars = data.cars()

facet_chart = alt.Chart(cars).mark_point().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin'
).facet(
    column='Origin'
)
facet_chart
Output:
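Explanation:
- .mark_point() specifies the type of plot (scatter plot).
- .encode() maps data columns to visual encodings like x, y, and color.
- .facet(column='Origin') splits the chart into one panel per value of Origin, arranged in columns.
- .properties(), not used in this example, sets the title and other properties of a chart.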
2. Layering
Python
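import altair as alt
from vega_datasets import data

# Load the dataset
stocks = data.stocks()

# Line layer
line = alt.Chart(stocks).mark_line().encode(
    x='date:T',
    y='price',
    color='symbol'
)

# Point layer drawn on top of the lines
points = alt.Chart(stocks).mark_point().encode(
    x='date:T',
    y='price',
    color='symbol'
)

# The + operator layers the two charts on shared axes
layered_chart = line + points
layered_chart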
Output:
Saving and Exporting Visualizations
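You can save Altair visualizations in various formats, including PNG, SVG, and HTML. The format is chosen from the file extension:
# Save as PNG (needs an extra export package, such as vl-convert-python in Altair 5)
chart.save('scatter_plot.png')
# Save as HTML
chart.save('scatter_plot.html')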
Conclusion
Altair offers an intuitive, powerful way to create both simple and complex visualizations in Python. Its declarative syntax and built-in interactivity make it a go-to tool for data scientists and analysts. This tutorial covered the basics, but Altair’s potential goes far beyond what’s shown here. Explore the documentation for advanced topics like data transformations, more complex interactivity, and more chart types.