
Data Visualization With Altair

Last Updated : 13 Sep, 2024

Data is an important asset today, and companies need to process it in ways that reveal how their customers behave. Data visualization is a key step in that process. Altair is a declarative statistical visualization library for Python, built on Vega and Vega-Lite. It offers a user-friendly and efficient way to create high-quality, interactive plots with minimal code. This tutorial walks through the core features of Altair and how to use it for data visualization.

Getting Started With Altair

Altair is designed with a declarative syntax: you describe what you want to visualize, that is, which data columns map to which visual properties (encodings), and Altair works out the underlying details such as scales, axes, and legends.

Data visualization is the technique of representing data graphically, in the form of graphs, charts, plots etc. It is important because:

  • It helps to draw conclusions easily.
  • It is used to analyze trends and patterns in the data.
  • It makes comparisons easy, for example between predicted and target values, or between old and new trends.

Installing and Setting Up Altair

To start using Altair, you need to install it. You can do so using pip:

pip install altair vega_datasets

Creating Basic Charts with Altair

The general syntax to create a chart in Altair is as follows:

alt.Chart(data).mark_type().encode(x=val1, y=val2)

  • alt.Chart(data): A chart is an object in Altair. It holds the data and acts as the container for the visualization.
  • mark_type(): The mark defines the type of graph in which the data will be displayed, for example mark_bar(), mark_point(), mark_line(), mark_area() etc.
  • encode(): Maps data columns to visual properties (channels) of the graph, such as x and y position, color, size, shape etc.

1. Bar chart

A bar chart is one of the most commonly used charts. It compares a quantitative value across the categories of a categorical variable, with one bar per category.

Syntax:

alt.Chart(data).mark_bar().encode(x=val1, y=val2)

Python
import altair as alt
import pandas as pd

# Sample data
data = pd.DataFrame({
    'Category': ['A', 'B', 'C', 'D'],
    'Value': [4, 7, 1, 5]
})

# Create a bar plot
bar_chart = alt.Chart(data).mark_bar().encode(
    x='Category',
    y='Value'
).properties(
    title='Bar Plot'
)

bar_chart.display()

Output:

Bar plot

2. Line Chart

A line chart connects data points in order along the x axis. It is typically used to show how a dependent (quantitative) variable changes with an ordered independent variable such as time.

Syntax

alt.Chart(data).mark_line().encode(x=val1, y=val2)

Python
import altair as alt
import pandas as pd

# Sample data
data = pd.DataFrame({
    'Year': [2000, 2001, 2002, 2003, 2004],
    'Value': [10, 15, 8, 12, 18]
})

# Create a line plot
line_chart = alt.Chart(data).mark_line().encode(
    x='Year:O',  # treat years as ordered categories so they are not formatted like 2,000
    y='Value'
).properties(
    title='Line Plot'
)

line_chart.display()

Output:

Line Chart

3. Scatter plot

A scatter plot displays the relationship between two quantitative variables by drawing one point per observation.

Syntax:

alt.Chart(data).mark_point().encode(x=val1, y=val2)

Python
import altair as alt
import pandas as pd

# Sample data
data = pd.DataFrame({
    'X': [1, 2, 3, 4, 5],
    'Y': [10, 15, 13, 17, 19]
})

# Create a scatter plot
scatter_plot = alt.Chart(data).mark_point().encode(
    x='X',
    y='Y'
).properties(
    title='Scatter Plot'
)

scatter_plot.display()

Output:

Scatter Plot

4. Histogram

A histogram shows the distribution of a continuous variable by grouping its values into bins and counting how many observations fall into each bin.

Syntax:

alt.Chart(data).mark_bar().encode(alt.X('Value:Q', bin=True), y='count()')

Python
import altair as alt
import pandas as pd

# Sample data
data = pd.DataFrame({
    'Value': [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5]
})

# Create a histogram
histogram = alt.Chart(data).mark_bar().encode(
    alt.X('Value:Q', bin=True),  # bin the quantitative values along x
    y='count()'
).properties(
    title='Histogram'
)

histogram.display()

Output:

Histogram

5. Boxplot

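A box plot summarizes a distribution through its median, quartiles, and whiskers, and draws outliers as individual points. It is useful when we want to spot outliers and compare distributions across groups.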

Syntax:

alt.Chart(data).mark_boxplot().encode(x=val1, y=val2)

Python
import altair as alt
import pandas as pd

# Sample data
data = pd.DataFrame({
    'Category': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Value': [1, 2, 3, 4, 5, 6]
})

# Create a box plot
box_plot = alt.Chart(data).mark_boxplot().encode(
    x='Category',
    y='Value'
).properties(
    title='Box Plot'
)

box_plot.display()

Output:

Box Plot

Customizing Plots With Altair

Customizing plots is an important step, as it makes our graphs clearer and more engaging. Altair provides many options to control how our charts look.

1. Customizing Title

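The title is an important part of a graph, as it briefly describes what the chart shows. In Altair we can adjust its font, size, color, style etc. Here the title text is set with .properties(), and the title and axis styling is applied with .configure().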

Python
import altair as alt
import pandas as pd

# Sample data
data = pd.DataFrame({
    'Category': ['A', 'B', 'C', 'D'],
    'Value': [4, 7, 1, 5]
})

# Create a bar plot with custom configurations
bar_chart = alt.Chart(data).mark_bar().encode(
    x='Category',
    y='Value'
).properties(
    title='Bar Plot with Custom Configurations'
).configure(
    title={
        "fontSize": 20,
        "font": "Arial",
        "color": "blue"
    },
    axis={
        "titleFontSize": 14,
        "labelFontSize": 12
    }
)

bar_chart

Output:

Customizing Title

2. Customizing color

We can change the colors of the marks based on a particular column.

Python
import altair as alt
import pandas as pd

# Sample data
data = pd.DataFrame({
    'Category': ['A', 'B', 'C', 'D'],
    'Value': [4, 7, 1, 5],
    'Type': ['X', 'Y', 'X', 'Y']
})

# Create a bar plot with a color scale
bar_chart = alt.Chart(data).mark_bar().encode(
    x='Category',
    y='Value',
    color=alt.Color('Type:N', scale=alt.Scale(domain=['X', 'Y'], range=['#1f77b4', '#ff7f0e']))
).properties(
    title='Bar Plot with Color Scale'
)

bar_chart

Output:

Customizing color

3. Customizing Themes

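Altair ships with built-in themes (such as 'dark', 'ggplot2' or 'fivethirtyeight') that are enabled globally with alt.themes.enable() (alt.theme.enable() in Altair 5.5 and later). For a one-off change, we can instead configure individual properties, such as the background color, on a single chart: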

Python
import altair as alt
import pandas as pd

# Sample Data
data = pd.DataFrame({
    'x': ['A', 'B', 'C', 'D'],
    'y': [5, 10, 15, 20]
})

# Create the chart
chart = alt.Chart(data).mark_bar().encode(
    x='x',
    y='y'
).configure(
    background='lightgray'  # Setting the background color
)


# Display the chart
chart

Output:

Customizing Themes

4. Customizing Axes

We can customize the X and Y axes of a graph: add or remove gridlines, modify labels and titles, change the angle at which labels are displayed, and so on.

Python
import altair as alt
import pandas as pd

# Sample Data
data = pd.DataFrame({
    'x': ['A', 'B', 'C', 'D'],
    'y': [5, 10, 15, 20]
})

# Create the chart with axis customizations
chart = alt.Chart(data).mark_bar().encode(
    x=alt.X('x', axis=alt.Axis(
        title='Categories',            # Title of the x-axis
        titleFontSize=15,              # Font size for the axis title
        labelFontSize=12,              # Font size for the axis labels
        labelAngle=0,                  # Angle of the axis labels
        labelColor='blue',             # Color of the axis labels
        titleColor='red'               # Color of the axis title
    )),
    y=alt.Y('y', axis=alt.Axis(
        title='Values',
        titleFontSize=15,
        labelFontSize=12,
        grid=True,                     # Show grid lines
        titleAngle=-90,                # Title angle (-90 reads bottom to top, the usual y-axis orientation)
        titleColor='green'             # Color of the y-axis title
    ))
).properties(
    title='Customized Axes'
)

# Display the chart
chart

Output:

Customizing Axes

Example Code for Creating Charts with Altair

Here we use the Iris dataset to create charts with Altair. The Iris DataFrame built below has five columns: 'sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)' and 'species'. We will use the charts to explore the relationships between these features.

Python
import altair as alt
from sklearn.datasets import load_iris
import pandas as pd

# Load the Iris dataset
iris_data = load_iris()
iris_df = pd.DataFrame(iris_data.data, columns=iris_data.feature_names)
iris_df['species'] = pd.Categorical.from_codes(iris_data.target, iris_data.target_names)
print(iris_df.columns)
# Scatter plot with hover (tooltip)
scatter_plot = alt.Chart(iris_df).mark_point().encode(
    x=alt.X('sepal length (cm)', axis=alt.Axis(title='Sepal Length (cm)')),
    y=alt.Y('sepal width (cm)', axis=alt.Axis(title='Sepal Width (cm)')),
    color='species',
    tooltip=['species', 'sepal length (cm)', 'sepal width (cm)']  # Tooltip on hover
).properties(
    title='Iris Dataset: Sepal Length vs Sepal Width'
)
scatter_plot.display()

# Bar chart to show average petal length per species
bar_chart = alt.Chart(iris_df).mark_bar().encode(
    x='species:N',
    y='mean(petal length (cm)):Q',
    color='species:N'
).properties(
    title='Average Petal Length by Species'
)
bar_chart.display()


# Histogram to show distribution of sepal width
histogram = alt.Chart(iris_df).mark_bar().encode(
    alt.X('sepal width (cm):Q', bin=True, title='Sepal Width'),
    y='count()',
    color='species:N'
).properties(
    title='Distribution of Sepal Width by Species'
)
histogram.display()

# Box plot for petal length by species
box_plot = alt.Chart(iris_df).mark_boxplot().encode(
    x='species:N',
    y='petal length (cm):Q',
    color='species:N'
).properties(
    title='Box Plot of Petal Length by Species'
)
box_plot.display()

# Create a selection object (Altair 5 API; Altair 4 used alt.selection_multi and .add_selection)
selection = alt.selection_point(fields=['species'], bind='legend')  # Select species by clicking the legend

# Scatter plot with hover and linked selection
scatter_plot = alt.Chart(iris_df).mark_point().encode(
    x=alt.X('sepal length (cm)', axis=alt.Axis(title='Sepal Length (cm)')),
    y=alt.Y('sepal width (cm)', axis=alt.Axis(title='Sepal Width (cm)')),
    color=alt.condition(selection, 'species:N', alt.value('lightgray')),  # Highlight selected species
    tooltip=['species', 'sepal length (cm)', 'sepal width (cm)']
).add_params(
    selection  # Add the selection to the scatter plot
).properties(
    title='Iris Dataset: Sepal Length vs Sepal Width'
)

# Bar chart with linked selection
bar_chart = alt.Chart(iris_df).mark_bar().encode(
    x='species:N',
    y='mean(petal length (cm)):Q',
    color=alt.condition(selection, 'species:N', alt.value('lightgray'))  # Highlight selected species
).properties(
    title='Average Petal Length by Species'
).add_params(
    selection  # Add the same selection to the bar chart
)

# Combine the charts side by side (horizontally)
combined_chart = alt.hconcat(bar_chart, scatter_plot)

# Display the combined chart
combined_chart.display()

Output:

The code above creates several plots: a scatter plot, a bar plot, a histogram and a box plot. Finally, it places the bar plot and the scatter plot side by side with a linked selection, so we can see how petal length and sepal length and width differ across the species of flower.

Faceting and Combining Charts in Altair

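Faceting splits one chart into several panels, one per category, while layering draws several charts on top of each other in the same axes. Both are useful for comparing multiple plots: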

1. Faceting

Python
import altair as alt
from vega_datasets import data

# Load the dataset
cars = data.cars()
facet_chart = alt.Chart(cars).mark_point().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin'
).facet(
    column='Origin'
)

facet_chart

Output:


Explanation:

  • .mark_point() specifies the type of plot (scatter plot).
  • .encode() maps data columns to visual encodings like x, y, and color.
  • .facet(column='Origin') splits the chart into one panel (column) per value of Origin.

2. Layering

Python
import altair as alt
from vega_datasets import data

# Load the stock prices dataset
stocks = data.stocks()
line = alt.Chart(stocks).mark_line().encode(
    x='date:T',
    y='price',
    color='symbol'
)

points = alt.Chart(stocks).mark_point().encode(
    x='date:T',
    y='price',
    color='symbol'
)

layered_chart = line + points
layered_chart

Output:


Saving and Exporting Visualizations

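You can save Altair visualizations in various formats, including PNG, SVG, and HTML. HTML and JSON output work out of the box, while PNG and SVG export needs an extra package (vl-convert-python in Altair 5; older versions used altair_saver):

Python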
# Save as PNG
chart.save('scatter_plot.png')

# Save as HTML
chart.save('scatter_plot.html')


Conclusion

Altair offers an intuitive, powerful way to create both simple and complex visualizations in Python. Its declarative syntax and built-in interactivity make it a go-to tool for data scientists and analysts. This tutorial covered the basics, but Altair’s potential goes far beyond what’s shown here. Explore the documentation for advanced topics like data transformations, more complex interactivity, and more chart types.

