Create PySpark DataFrame from list of tuples

Last Updated : 30 May, 2021

In this article, we are going to discuss the creation of a PySpark DataFrame from a list of tuples. To do this, we will use the createDataFrame() method from PySpark. This method creates a DataFrame from an RDD, a list, or a pandas DataFrame. Here, data will be the list of tuples and columns will be a list of column names.

Syntax:

dataframe = spark.createDataFrame(data, columns)

Example 1:

Python3

# importing module
import pyspark

# importing SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession

# creating a SparkSession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of tuples of college data
data = [("sravan", "IT", 80),
        ("jyothika", "CSE", 85),
        ("harsha", "ECE", 60),
        ("thanmai", "IT", 65),
        ("durga", "IT", 91)]

# giving column names of the dataframe
columns = ["Name", "Branch", "Percentage"]

# creating a dataframe
dataframe = spark.createDataFrame(data, columns)

# show the dataframe
dataframe.show()

Output:

Example 2:

Python3

# importing module
import pyspark

# importing SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession

# creating a SparkSession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of tuples of plants data
data = [("mango", "AP", "Guntur"),
        ("mango", "AP", "Chittor"),
        ("sugar cane", "AP", "amaravathi"),
        ("paddy", "TS", "adilabad"),
        ("wheat", "AP", "nellore")]

# giving column names of the dataframe
columns = ["Crop Name", "State", "District"]

# creating a dataframe
dataframe = spark.createDataFrame(data, columns)

# show the dataframe
dataframe.show()

Output:

Example 3: Python code to count the records (tuples) in the list

Python3

# importing module
import pyspark

# importing SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession

# creating a SparkSession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of tuples of plants data
data = [("mango", "AP", "Guntur"),
        ("mango", "AP", "Chittor"),
        ("sugar cane", "AP", "amaravathi"),
        ("paddy", "TS", "adilabad"),
        ("wheat", "AP", "nellore")]

# giving column names of the dataframe
columns = ["Crop Name", "State", "District"]

# creating a dataframe
dataframe = spark.createDataFrame(data, columns)

# count the records in the list
dataframe.count()

Output:

5
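As noted above, createDataFrame() also accepts an RDD or a pandas DataFrame, not just a plain list of tuples. The sketch below is a minimal illustration of those two alternative inputs, reusing the college data from Example 1; the variable names rdd and pandas_df are our own, and pandas must be installed for the second variant.

Python3

# importing modules
import pandas as pd
from pyspark.sql import SparkSession

# creating a SparkSession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# same list of tuples and column names as Example 1
data = [("sravan", "IT", 80),
        ("jyothika", "CSE", 85),
        ("harsha", "ECE", 60),
        ("thanmai", "IT", 65),
        ("durga", "IT", 91)]
columns = ["Name", "Branch", "Percentage"]

# 1. building an RDD from the list of tuples, then a dataframe from the RDD
rdd = spark.sparkContext.parallelize(data)
dataframe_from_rdd = spark.createDataFrame(rdd, columns)
dataframe_from_rdd.show()

# 2. building a pandas DataFrame from the list of tuples,
#    then a PySpark dataframe from it (column names come from pandas)
pandas_df = pd.DataFrame(data, columns=columns)
dataframe_from_pandas = spark.createDataFrame(pandas_df)
dataframe_from_pandas.show()

Both variants produce the same result as passing the list of tuples directly.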