How to Check PySpark Version

Last Updated : 10 Jul, 2024

Knowing the version of PySpark you're working with is crucial for compatibility and troubleshooting purposes. In this article, we will walk through the steps to check the PySpark version in your environment.

What is PySpark?

PySpark is the Python API for Apache Spark, a powerful distributed computing system that allows for large-scale data processing. It enables data scientists and analysts to harness Spark's capabilities using Python, making it easier to build and deploy data pipelines and machine learning models.

Setting Up PySpark

Before setting up PySpark, make sure that a recent version of Python is installed on the system. Now let us see how we can set up PySpark on our systems.
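
You can confirm which Python version is available by running the following command in your terminal (on some systems the interpreter is invoked as python3 instead of python):

python --version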

Install JDK

The first step required to use PySpark is to install the Java Development Kit (JDK), available from the Oracle website. Run the installer and follow the on-screen instructions to install Java on your system.

Set JDK Environment Variable

Once the JDK is installed on the system, add its installation path to the system environment variables. You can do this by going to System Properties. Under the Advanced tab, click on Environment Variables. Under the System variables section, click New, enter "JAVA_HOME" as the variable name and the location where the JDK is installed on your system as the value, and then press OK.

Setting up JDK Environment Path
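
Alternatively, you can set the variable from the command prompt with setx; the path below is only an example, so substitute your actual JDK installation directory:

setx JAVA_HOME "C:\Program Files\Java\jdk-17"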

Verify Java Installation

Once this is done, you can check whether Java is properly installed on your system by typing the following command in your command prompt and pressing Enter. This command prints the Java version installed on the system.

java -version

Verifying Java Installation

Install PySpark

Next, install PySpark on your system by opening the terminal, typing the following command, and pressing Enter.

pip install pyspark
PySpark Installation
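
If you need a specific release, for example to match the Spark version on an existing cluster, you can pin the version with pip (3.5.1 below is only an example):

pip install pyspark==3.5.1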

Set PySpark Environment Variable

Similar to the JDK environment variable, set a "SPARK_HOME" environment variable for PySpark as well, pointing to the Spark installation directory.
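
When PySpark is installed with pip, one way to find a suitable directory to point SPARK_HOME at is to locate the installed pyspark package. The snippet below is a minimal sketch that assumes a pip-based installation:

Python
# locate the directory where pip installed the pyspark package
import os
import pyspark

# this directory contains Spark's jars and scripts and can serve as SPARK_HOME
print(os.path.dirname(pyspark.__file__))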

Checking PySpark Version

Once you are done with all the necessary installations and have set up the environment variables, you can verify the PySpark installation and check its version. Let us see a few different ways to check the PySpark version.

Using Command Prompt

The simplest and most straightforward way to check any software's version is by using the command prompt or the terminal of your system. To check the PySpark version, simply type the following command in the terminal. As soon as you press Enter, it will print the PySpark version installed on the system.

pyspark --version
PySpark Version
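
The same information is also printed by spark-submit, which ships with PySpark, so you can use it as an alternative:

spark-submit --version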

Using pyspark.__version__

This method requires you to run a basic Python script. The pyspark.__version__ attribute returns the current PySpark version installed on the system.

Python
# import pyspark module
import pyspark

# Get the version of PySpark
pyspark_version = pyspark.__version__
print(f"PySpark Version: {pyspark_version}")

Output:

PySpark Version: 3.5.1
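
You can also read the version from a running SparkSession. The snippet below is a minimal sketch; the application name "VersionCheck" is arbitrary:

Python
# create (or reuse) a SparkSession and read its version
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("VersionCheck").getOrCreate()
print(f"Spark Version: {spark.version}")

# stop the session when done
spark.stop()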


