How to Access a Column in a DataFrame with Pandas
Last Updated: 13 Jan, 2025
This article explores several techniques for accessing a column in a pandas DataFrame, each with a concise explanation and a practical example.
Method 1: Accessing a Single Column Using Bracket Notation
Bracket notation is the most straightforward way to access a column. Use the syntax df['column_name'] to retrieve the column as a pandas Series. It works for any column label that exists in the DataFrame, including labels that contain spaces or other special characters.
Python
import pandas as pd
data = {
'Name': ['John', 'Alice', 'Bob', 'Eve'],
'Age': [25, 30, 22, 35],
'Salary': [50000, 55000, 40000, 70000]
}
df = pd.DataFrame(data)
# Accessing the 'Salary' column using bracket notation
salary_column = df['Salary']
print(salary_column)
Output
0    50000
1    55000
2    40000
3    70000
Name: Salary, dtype: int64
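As a minimal sketch of how bracket notation behaves at the edges, the example below uses a hypothetical column label containing a space ('Annual Salary') and a hypothetical missing label ('Bonus'). Neither name appears in the data above; they are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical data: 'Annual Salary' is an illustrative label with a space
df = pd.DataFrame({
    'Name': ['John', 'Alice'],
    'Annual Salary': [50000, 55000],
})

# Bracket notation accepts labels that are not valid Python identifiers
salary = df['Annual Salary']
print(type(salary).__name__)   # Series

# Requesting a label that does not exist raises KeyError
try:
    df['Bonus']                # 'Bonus' is a hypothetical missing column
    missing_raises = False
except KeyError:
    missing_raises = True
print(missing_raises)          # True
```

If a missing column should not stop the program, catching KeyError as shown here is one option.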
Method 2: Accessing a Single Column Using Dot Notation
In addition to bracket notation, you can access a column using dot notation (df.column_name). This is more concise, but it only works when the column name is a valid Python identifier (no spaces, does not start with a digit) and does not clash with an existing DataFrame attribute or method such as count or shape.
Python
import pandas as pd
data = {
'Name': ['Michael', 'Sarah', 'David', 'Emma'],
'Age': [40, 28, 33, 25],
'Salary': [60000, 62000, 45000, 75000]
}
df = pd.DataFrame(data)
# Accessing the 'Name' column using dot notation
name_column = df.Name
print(name_column)
Output
0    Michael
1      Sarah
2      David
3       Emma
Name: Name, dtype: object
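The limitation of dot notation can be seen in a short sketch. Here the column is deliberately named 'count' (an illustrative choice, not part of the data above) so that it collides with the DataFrame.count method.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Michael', 'Sarah'],
    'count': [3, 5],   # illustrative label that matches DataFrame.count
})

# Dot notation resolves to the built-in method, not the column
dot_result = df.count
print(callable(dot_result))       # True

# Bracket notation still returns the column as a Series
bracket_result = df['count']
print(bracket_result.tolist())    # [3, 5]
```

When column names come from an external file and cannot be predicted, bracket notation is the safer default.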
Method 3: Accessing Multiple Columns Using Bracket Notation
You can access multiple columns by passing a list of column names inside the brackets. This returns a new DataFrame containing only the specified columns.
Python
import pandas as pd
data = {
'Name': ['Michael', 'Sarah', 'David', 'Emma'],
'Age': [40, 28, 33, 25],
'Salary': [60000, 62000, 45000, 75000]
}
df = pd.DataFrame(data)
# Accessing 'Name' and 'Salary' columns using bracket notation
subset_columns = df[['Name', 'Salary']]
print(subset_columns)
Output
      Name  Salary
0  Michael   60000
1    Sarah   62000
2    David   45000
3     Emma   75000
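A detail that often causes confusion is that a one-element list still returns a DataFrame, while a plain string returns a Series. The sketch below reuses the same sample data to show the difference, and also shows that the order of names in the list sets the column order of the result.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Michael', 'Sarah', 'David', 'Emma'],
    'Age': [40, 28, 33, 25],
    'Salary': [60000, 62000, 45000, 75000],
})

as_series = df['Salary']      # single brackets with a string
as_frame = df[['Salary']]     # double brackets with a one-element list

print(type(as_series).__name__)   # Series
print(type(as_frame).__name__)    # DataFrame
print(as_frame.shape)             # (4, 1)

# The list order determines the column order in the result
reordered = df[['Salary', 'Name']]
print(list(reordered.columns))    # ['Salary', 'Name']
```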
Method 4: Accessing Columns by Index Using iloc
If you know a column's position but not its name, use the iloc indexer to access it by integer position, counting from 0. This is useful when column names are unknown or duplicated, or when you need to access columns programmatically.
Python
import pandas as pd
data = {
'Name': ['Michael', 'Sarah', 'David', 'Emma'],
'Age': [40, 28, 33, 25],
'Salary': [60000, 62000, 45000, 75000]
}
df = pd.DataFrame(data)
# Accessing the second column (Age) using iloc
age_column = df.iloc[:, 1]
print(age_column)
Output
0    40
1    28
2    33
3    25
Name: Age, dtype: int64
For a detailed explanation, refer to the article: Extracting rows using Pandas .iloc[] in Python
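Position-based access is not limited to one column. As a short sketch on the same sample data, iloc also accepts slices and lists of positions, and df.columns.get_loc translates a label into its position when you need to move between the two styles.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Michael', 'Sarah', 'David', 'Emma'],
    'Age': [40, 28, 33, 25],
    'Salary': [60000, 62000, 45000, 75000],
})

# A slice selects a range of positions (the end position is excluded)
first_two = df.iloc[:, :2]
print(list(first_two.columns))        # ['Name', 'Age']

# A list selects arbitrary positions; negative values count from the end
first_and_last = df.iloc[:, [0, -1]]
print(list(first_and_last.columns))   # ['Name', 'Salary']

# Translate a column label into its integer position
position = df.columns.get_loc('Salary')
print(position)                       # 2
salary_by_position = df.iloc[:, position]
```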
Method 5: Accessing Columns by Condition Using Boolean Indexing
Boolean indexing filters a DataFrame by a condition. A comparison on a column produces a Series of True/False values, and passing that Series inside the brackets keeps only the rows where it is True, along with all of their columns.
Python
import pandas as pd
data = {
'Name': ['Michael', 'Sarah', 'David', 'Emma'],
'Age': [40, 28, 33, 25],
'Salary': [60000, 62000, 45000, 75000]
}
df = pd.DataFrame(data)
# Accessing rows where Salary is greater than or equal to 60000
high_salary = df[df['Salary'] >= 60000]
print(high_salary)
Output
      Name  Age  Salary
0  Michael   40   60000
1    Sarah   28   62000
3     Emma   25   75000
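To retrieve a specific column from the matching rows rather than the whole row, one common pattern is to pass the boolean mask and the column label to loc together. The sketch below uses the same sample data and also combines two conditions with the & operator; each condition needs its own parentheses.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Michael', 'Sarah', 'David', 'Emma'],
    'Age': [40, 28, 33, 25],
    'Salary': [60000, 62000, 45000, 75000],
})

# Boolean mask: True for rows where Salary is at least 60000
mask = df['Salary'] >= 60000

# Names of the rows that satisfy the mask
names = df.loc[mask, 'Name']
print(names.tolist())             # ['Michael', 'Sarah', 'Emma']

# Combine two conditions with & and keep only two columns
young_high_earners = df.loc[
    (df['Salary'] >= 60000) & (df['Age'] < 35),
    ['Name', 'Salary'],
]
print(young_high_earners)
```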
Method 6: Accessing Columns Using loc for Label-Based Indexing
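The loc indexer accesses rows and columns by label rather than by position. It accepts row labels, column labels, and boolean masks, so a single expression can both filter rows and select columns. The example below passes a boolean mask to loc and returns all columns for the matching rows.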
Python
import pandas as pd
data = {
'Name': ['Michael', 'Sarah', 'David', 'Emma'],
'Age': [40, 28, 33, 25],
'Salary': [60000, 62000, 45000, 75000]
}
df = pd.DataFrame(data)
# Accessing rows where 'Age' is greater than 30 using loc
age_above_30 = df.loc[df['Age'] > 30]
print(age_above_30)
Output
      Name  Age  Salary
0  Michael   40   60000
2    David   33   45000
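Label-based access is easier to see when the rows have meaningful labels. In the sketch below, the Name column is promoted to the index with set_index (a choice made here for illustration; the example above keeps the default integer index), and loc is then used with row labels, column labels, and a label slice.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Michael', 'Sarah', 'David', 'Emma'],
    'Age': [40, 28, 33, 25],
    'Salary': [60000, 62000, 45000, 75000],
}).set_index('Name')   # illustrative: use names as row labels

# A single value addressed by row label and column label
sarah_salary = df.loc['Sarah', 'Salary']
print(sarah_salary)                    # 62000

# A label slice includes both endpoints, unlike a positional slice
age_to_salary = df.loc[:, 'Age':'Salary']
print(list(age_to_salary.columns))     # ['Age', 'Salary']

# One whole column, selected by label for all rows
ages = df.loc[:, 'Age']
print(ages.tolist())                   # [40, 28, 33, 25]
```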
Method 7: Accessing Columns Dynamically
Sometimes, you may need to access columns dynamically based on variables or user input. You can achieve this by using a variable that stores the column name and accessing it using bracket notation.
Python
import pandas as pd
data = {
'Name': ['Michael', 'Sarah', 'David', 'Emma'],
'Age': [40, 28, 33, 25],
'Salary': [60000, 62000, 45000, 75000]
}
df = pd.DataFrame(data)
# Access the column whose name is stored in a variable
column_name = 'Salary'
salary_column = df[column_name]
print(salary_column)
Output
0    60000
1    62000
2    45000
3    75000
Name: Salary, dtype: int64
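When the column names come from user input, some of them may not exist in the DataFrame. A minimal sketch of one way to handle this is shown below: the requested list (which includes a hypothetical 'Bonus' column that is not in the data) is filtered against df.columns before selection. The select_dtypes call at the end is a separate way to pick columns dynamically, by data type instead of by name.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Michael', 'Sarah', 'David', 'Emma'],
    'Age': [40, 28, 33, 25],
    'Salary': [60000, 62000, 45000, 75000],
})

# Hypothetical user input; 'Bonus' does not exist in df
requested = ['Salary', 'Bonus']

# Keep only the names that are actual columns, to avoid a KeyError
available = [name for name in requested if name in df.columns]
subset = df[available]
print(list(subset.columns))       # ['Salary']

# Select columns by data type instead of by name
numeric = df.select_dtypes(include='number')
print(list(numeric.columns))      # ['Age', 'Salary']
```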
In summary, bracket or dot notation works best for simple column access. Use iloc when you need position-based access, loc when you need label-based access, and boolean indexing when you need to filter rows by a condition. Experiment with these techniques to find the best approach for your data manipulation tasks.