Reading specific columns of a CSV file using Pandas
Last Updated :
21 Nov, 2024
When working with large datasets stored in CSV (Comma-Separated Values) files, it’s often unnecessary to load the entire dataset into memory. Instead, you can selectively read specific columns using Pandas in Python.
Read Specific Columns From CSV File
Let us see how to read specific columns of a CSV file using Pandas. This can be done with the help of the pandas.read_csv() method. We will pass the first parameter as the CSV file and the second parameter as the list of specific columns in keyword usecols. It will return the data of the CSV file of specific columns.
The usecols
parameter in the read_csv()
function filters the columns to be loaded into the DataFrame. This is particularly useful when:
- The dataset contains hundreds or thousands of columns.
- You are only interested in analyzing a subset of the data.
Below are some examples by which we can read specific columns of a CSV file using Pandas.
Read Entire Columns of a CSV File
In this example, the Pandas library is imported, and the code reads the entire content of the "student_scores2.csv" file into a DataFrame 'df' using Pandas. The printed output displays the entire dataset for further examination.
Python
import pandas as pd
# read specific columns of csv file using Pandas
df = pd.read_csv("student_scores2.csv")
print(df)
Output:

Read Specific Columns of a CSV File Using usecols
In this example, the Pandas library is imported, and the code uses it to read only the 'IQ' and 'Scores' columns from the "student_scores2.csv" file, storing the result in the DataFrame 'df'. The printed output displays the selected columns for analysis.
Python
import pandas as pd
# read specific columns of csv file using Pandas
df = pd.read_csv("student_scores2.csv", usecols=['IQ', 'Scores'])
print(df)
Output:

With another example, the code reads the 'Survived' and 'Pclass' columns from the "titanic.csv" file using Pandas. The resulting DataFrame 'df' displays the selected columns for analysis.
Python
import pandas as pd
# read specific columns of csv file using Pandas
df = pd.read_csv("titanic.csv", usecols = ['Survived','Pclass'])
print(df)
Output:

Selecting Specific Columns by Index
If you don’t know the column names or prefer working with indices, you can pass a list of integers representing column positions:
Python
# Read only the thid and last column (indices 2 and 3)
df = pd.read_csv('student_scores2.csv', usecols=[2, 3])
df
Output:
Scores Pass
0 18 0
1 45 1
2 25 0
3 72 1
4 30 0
5 20 0
6 88 1
..
..
Using Lambda for Dynamic Selection
You can also use functions or regular expressions to dynamically select columns. For example:
Python
# Dynamically select columns containing "Name" or "Salary"
df = pd.read_csv('/content/student_scores2.csv', usecols=lambda col: 'IQ' in col or 'Pass' in col)
df
Output:
IQ Pass
0 80 0
1 80 1
2 70 0
3 90 1
4 70 0
5 80 0
6 100 1
7 90 1
..
Similar Reads
Non-linear Components In electrical circuits, Non-linear Components are electronic devices that need an external power source to operate actively. Non-Linear Components are those that are changed with respect to the voltage and current. Elements that do not follow ohm's law are called Non-linear Components. Non-linear Co
11 min read
Linear Regression in Machine learning Linear regression is a type of supervised machine-learning algorithm that learns from the labelled datasets and maps the data points with most optimized linear functions which can be used for prediction on new datasets. It assumes that there is a linear relationship between the input and output, mea
15+ min read
Spring Boot Tutorial Spring Boot is a Java framework that makes it easier to create and run Java applications. It simplifies the configuration and setup process, allowing developers to focus more on writing code for their applications. This Spring Boot Tutorial is a comprehensive guide that covers both basic and advance
10 min read
Logistic Regression in Machine Learning Logistic Regression is a supervised machine learning algorithm used for classification problems. Unlike linear regression which predicts continuous values it predicts the probability that an input belongs to a specific class. It is used for binary classification where the output can be one of two po
11 min read
Class Diagram | Unified Modeling Language (UML) A UML class diagram is a visual tool that represents the structure of a system by showing its classes, attributes, methods, and the relationships between them. It helps everyone involved in a projectâlike developers and designersâunderstand how the system is organized and how its components interact
12 min read
K means Clustering â Introduction K-Means Clustering is an Unsupervised Machine Learning algorithm which groups unlabeled dataset into different clusters. It is used to organize data into groups based on their similarity. Understanding K-means ClusteringFor example online store uses K-Means to group customers based on purchase frequ
4 min read
K-Nearest Neighbor(KNN) Algorithm K-Nearest Neighbors (KNN) is a supervised machine learning algorithm generally used for classification but can also be used for regression tasks. It works by finding the "k" closest data points (neighbors) to a given input and makesa predictions based on the majority class (for classification) or th
8 min read
Backpropagation in Neural Network Back Propagation is also known as "Backward Propagation of Errors" is a method used to train neural network . Its goal is to reduce the difference between the modelâs predicted output and the actual output by adjusting the weights and biases in the network.It works iteratively to adjust weights and
9 min read
3-Phase Inverter An inverter is a fundamental electrical device designed primarily for the conversion of direct current into alternating current . This versatile device , also known as a variable frequency drive , plays a vital role in a wide range of applications , including variable frequency drives and high power
13 min read
Polymorphism in Java Polymorphism in Java is one of the core concepts in object-oriented programming (OOP) that allows objects to behave differently based on their specific class type. The word polymorphism means having many forms, and it comes from the Greek words poly (many) and morph (forms), this means one entity ca
7 min read