Select Columns that Satisfy a Condition in PySpark

Last Updated : 29 Jun, 2021

In this article, we are going to select columns from a dataframe and keep only the rows that satisfy a condition, using the where() function in PySpark.

Let's create a sample dataframe with employee data.

Python3

# importing module
import pyspark

# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of employee data
data = [[1, "sravan", "company 1"],
        [2, "ojaswi", "company 1"],
        [3, "rohith", "company 2"],
        [4, "sridevi", "company 1"],
        [1, "sravan", "company 1"],
        [4, "sridevi", "company 1"]]

# specify column names
columns = ['ID', 'NAME', 'Company']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

# display dataframe
dataframe.show()

Output: the six employee rows are displayed as a table with the columns ID, NAME and Company.

The where() method

This method returns the rows of the dataframe that satisfy the given condition.
It takes a condition and returns a new dataframe that contains only the matching rows.

Syntax: dataframe.where(dataframe.column condition)

Here, dataframe is the input dataframe.
The column is the column name on which the condition is applied.

The select() method

The select() method picks the columns to return. Chaining where() onto it keeps only the rows that satisfy the condition.

Syntax: dataframe.select('column_name').where(dataframe.column condition)

Here, dataframe is the input dataframe.
The column is the column name on which the condition is applied.

Example 1: Python program to return ID based on condition

Python3

# importing module
import pyspark

# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of employee data
data = [[1, "sravan", "company 1"],
        [2, "ojaswi", "company 1"],
        [3, "rohith", "company 2"],
        [4, "sridevi", "company 1"],
        [1, "sravan", "company 1"],
        [4, "sridevi", "company 1"]]

# specify column names
columns = ['ID', 'NAME', 'Company']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

# select ID where ID less than 3
dataframe.select('ID').where(dataframe.ID < 3).show()

Output: a single ID column with the three rows whose ID is less than 3 (two rows with ID 1 and one row with ID 2).

Example 2: Python program to select ID and name where ID = 4
Python3

# importing module
import pyspark

# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of employee data
data = [[1, "sravan", "company 1"],
        [2, "ojaswi", "company 1"],
        [3, "rohith", "company 2"],
        [4, "sridevi", "company 1"],
        [1, "sravan", "company 1"],
        [4, "sridevi", "company 1"]]

# specify column names
columns = ['ID', 'NAME', 'Company']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

# select ID and name where ID = 4
dataframe.select(['ID', 'NAME']).where(dataframe.ID == 4).show()

Output: the ID and NAME columns for the two rows with ID 4 (both have the name sridevi).

Example 3: Python program to select all columns based on condition

Python3

# importing module
import pyspark

# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of employee data
data = [[1, "sravan", "company 1"],
        [2, "ojaswi", "company 1"],
        [3, "rohith", "company 2"],
        [4, "sridevi", "company 1"],
        [1, "sravan", "company 1"],
        [4, "sridevi", "company 1"]]

# specify column names
columns = ['ID', 'NAME', 'Company']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

# select all columns where name = sridevi
dataframe.select(['ID', 'NAME', 'Company']).where(
    dataframe.NAME == 'sridevi').show()

Output: all three columns for the two rows whose NAME is sridevi.

Author: sravankumar_171fa07058