
24 December

PySpark, the Python library for Apache Spark, offers two fundamental data structures that serve as the building blocks of distributed data processing: Resilient Distributed Datasets (RDDs) and DataFrames.

Resilient Distributed Datasets:-

At the core of the PySpark data model lies the RDD: immutable, distributed collections of objects that can be processed in parallel across a cluster. RDDs provide fault tolerance through lineage information, enabling efficient re-computation of lost data partitions. RDDs are perfect for low-level transformations, making them ideal for complex data manipulations and custom computations.
Example :-

rdd = sc.parallelize([1, 2, 3, 4, 5])
# square each element
squared_rdd = rdd.map(lambda x: x**2)
result = squared_rdd.collect()
print(result)

DataFrames :-

DataFrames provide a higher-level abstraction in PySpark, offering a more user-friendly way to work with distributed data.
A PySpark DataFrame organizes data into named columns, making querying and manipulation a breeze. DataFrames leverage Spark's Catalyst optimizer for efficient queries and integrate seamlessly with popular data formats like JSON, Parquet and CSV. DataFrames are well-suited for structured data processing, machine
learning tasks and data exploration.

from pyspark.sql import SparkSession


spark = SparkSession.builder.getOrCreate()

data = [{'name':'Nitya', 'Age': 25},


{'name':'Nityaa', 'Age': 35},
{'name':'Nityaaa', 'Age': 34}
]
df = spark.createDataFrame(data)
filtered_df = df.filter(df.Age>30)
filtered_df.show()

+---+-------+
|Age| name|
+---+-------+
| 35| Nityaa|
| 34|Nityaaa|
+---+-------+

Spark Architecture :-
At the heart of Spark's efficiency lies its powerful architecture, designed to handle complex big data workloads seamlessly.

Cluster Manager :-
Spark architecture operates on a master-slave model, where a central cluster manager oversees the distribution of tasks across worker nodes. The
cluster manager ensures fault tolerance, load balancing and resource allocation, making it the backbone of Spark processing.


Transformation:-

PySpark RDD transformations are lazily evaluated and are used to transform one RDD into another. When executed on an RDD, a transformation results in one or more new RDDs. Transformations always create a new RDD without updating an existing one; hence, a chain of RDD transformations creates an RDD lineage.

RDD Transformations are Lazy:-

RDD transformations are lazy operations, meaning none of the transformations get executed until you call an action on a PySpark RDD. Since RDDs are immutable, any transformation on an RDD results in a new RDD, leaving the current one unchanged.

Narrow Transformation:

Narrow transformations are the result of functions like map() and filter(); they compute data that lives on a single partition, meaning there is no data movement between partitions to execute them.

Wider Transformation:

Wider transformations are the result of functions like groupByKey() and reduceByKey(); they compute data that lives on many partitions, meaning data must move between partitions to execute them. Since these shuffle the data, they are also called shuffle transformations. A small RDD sketch contrasting the two follows below.
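A minimal sketch (assuming the same sc SparkContext used in the earlier RDD example) contrasting a narrow transformation with a wide one:

rdd = sc.parallelize([("a", 1), ("b", 2), ("a", 3), ("b", 4)])

# Narrow: mapValues works within each partition, no shuffle needed
doubled = rdd.mapValues(lambda v: v * 2)

# Wide: reduceByKey must shuffle records with the same key to the same partition
totals = doubled.reduceByKey(lambda x, y: x + y)

print(totals.collect())  # nothing executes until this action is called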

Use of StructType:-
It acts as a blueprint for creating structured data. It allows us to define a schema by specifying a sequence of StructField objects.
Each StructField represents a column with a name, a data type and an optional flag indicating nullability.

from pyspark.sql import SparkSession


from pyspark.sql.types import StructType, StructField, StringType, IntegerType
spark= SparkSession.builder.appName("Demo").getOrCreate()
schema = StructType([StructField("id", IntegerType(), False),
StructField("name", StringType(), True),
StructField("age", IntegerType(), True)])
df = spark.createDataFrame([], schema)
df.show()

+---+----+---+
| id|name|age|
+---+----+---+
+---+----+---+

Use of StructField:

StructField helps us specify the characteristics of each column. Here's how we can use it to define a single-column schema:

from pyspark.sql import SparkSession


from pyspark.sql.types import StructType, StructField, StringType, IntegerType
spark = SparkSession.builder.appName("demo").getOrCreate()
name_field = StructField("name", StringType(), True)
schema = StructType([name_field])
df = spark.createDataFrame([], schema)
df.show()

+----+
|name|
+----+
+----+


VACUUM:-

VACUUM is more than just tidying up; it reclaims storage space by physically removing files that are no longer needed due to deletes and updates. For Delta Lake tables, this cleanup helps keep your storage efficient and query performance snappy.

Where to use it:

Version Retention:- Delta Lake retains multiple versions of data for auditing and time travel. However, over time unused versions pile up. VACUUM can be your friend here by removing older versions that are no longer relevant.

Small File CleanUp :-


As data evolves, small files can accumulate, leading to overhead and performance issues. VACUUM, used alongside compaction, helps keep storage and queries efficient by clearing out the files left behind.

Deleted data Cleanup:-


When data is deleted or updated, the old files are retained for some time. Use VACUUM to clean up these files.

from pyspark.sql import SparkSession


spark = SparkSession.builder.appName("VacuumDemo").getOrCreate()
spark.sql("VACUUM <table_name>")

Handling row duplication in PySpark


Dropping duplicates:-
The simplest approach is to drop duplicate rows based on a subset of columns. This can be done using the
dropDuplicates() method.

from pyspark.sql import SparkSession


data = [("Alice", 25), ("Bob", 30), ("Alice", 25), ("Katy", 35)]
columns = ["Name", "Age"]
spark = SparkSession.builder.appName("handling").getOrCreate()
df = spark.createDataFrame(data, columns)
df.show()
# dropping duplicates based on Name and Age
dff = df.dropDuplicates(['Name', 'Age'])
dff.show()

# using distinct
df.distinct().show()
+-----+---+
| Name|Age|
+-----+---+
|Alice| 25|
|  Bob| 30|
|Alice| 25|
| Katy| 35|
+-----+---+

+-----+---+
| Name|Age|
+-----+---+
|Alice| 25|
| Bob| 30|
| Katy| 35|
+-----+---+

+-----+---+
| Name|Age|
+-----+---+
|Alice| 25|
| Bob| 30|
| Katy| 35|
+-----+---+


from pyspark.sql import SparkSession


from pyspark.sql.window import Window
from pyspark.sql.functions import row_number
data = [("Alice", 25), ("Bob", 30), ("Alice", 25), ("Katy", 35)]
columns = ["Name", "Age"]
spark = SparkSession.builder.appName("handling").getOrCreate()
df = spark.createDataFrame(data, columns)
window_spec = Window.partitionBy("Name", "Age").orderBy("Name")

# adding a row number column


df_with_row_number = df.withColumn("row_number", row_number().over(window_spec))

# filter out duplicate row


deduplicated_windowfn_df = df_with_row_number.filter(df_with_row_number.row_number == 1).drop("row_number")

deduplicated_windowfn_df.show()

+-----+---+
| Name|Age|
+-----+---+
|Alice| 25|
| Bob| 30|
| Katy| 35|
+-----+---+

PySpark UDF
from pyspark.sql.functions import udf
from pyspark.sql.types import FloatType
data = [("Kolkata", 19),
("Mumbai", 25),
("Delhi", 30)]
columns = ["City", "Temperatures"]
df = spark.createDataFrame(data, columns)
# convert a Fahrenheit temperature to Celsius
def fahre_to_Celsisu(fareh_temp):
    celsisu_temp = (fareh_temp - 32)*5/9
    return round(celsisu_temp, 2)

# register the UDF
convert_to_celsius_udf = udf(fahre_to_Celsisu, FloatType())

# apply the UDF to create a new column


df_with_celsius = df.withColumn("Temp_C", convert_to_celsius_udf("Temperatures"))
df_with_celsius.show()

+-------+------------+------+
| City|Temperatures|Temp_C|
+-------+------------+------+
|Kolkata| 19| -7.22|
| Mumbai| 25| -3.89|
| Delhi| 30| -1.11|
+-------+------------+------+


Unveiling the Power of PySpark Writer API and Its Dynamic Options!
PySpark, the Python library behind Apache Spark's magic, has completely transformed the landscape of big data processing. The Writer API
offers an elegant solution for writing data to diverse storage systems, while granting you an array of dynamic options to fine-tune your output.

1. Adaptable Data Formats: The Writer API effortlessly handles an array of formats - think Parquet, Avro, JSON, and more. It's like a universal key to your storage possibilities!
2. Optimized Performance: Engineered for speed, this API lets you optimize performance with features like partitioning, compression, and bucketing. Say goodbye to sluggish data writes and hello to precision!
3. Dynamic Partitioning: Forget the limitations of static partitioning. With the Writer API, you can dynamically partition data based on column values, boosting storage efficiency and query performance.
4. Flexible Schema Evolution: Embrace changing data structures with grace. The PySpark Writer API seamlessly adapts to evolving schemas, so your pipeline remains robust as your information grows.
5. Transactional Confidence: Ensure data integrity with transactional writes. The API ensures that either the entire write operation succeeds or none of it does, maintaining the integrity of your precious data.

🔹 mode: Command the writing behavior - choose 'overwrite', 'append', 'ignore', or 'error', based on your needs.
🔹 compression: Compress data like a pro. Opt for codecs such as 'snappy', 'gzip', or 'none' to optimize space and performance.
🔹 partitionBy: Embrace dynamic data partitioning by columns, streamlining organization and boosting query efficiency.
🔹 bucketBy: Distribute data into buckets for a smooth querying experience in Hive-based systems.
🔹 dateFormat: Define date and timestamp formats for consistent and structured data representation.

from pyspark.sql import SparkSession

# Initialize a Spark session


spark = SparkSession.builder.appName("PySparkWriterExample").getOrCreate()

# Sample data
data = [("Alice", 28), ("Bob", 22), ("Charlie", 24)]
columns = ["Name", "Age"]
df = spark.createDataFrame(data, columns)

# Write data using the PySpark Writer API with dynamic options
# note: partitionBy is a DataFrameWriter method, not an .option(), so it is called directly


df.write.mode("overwrite") \
    .format("parquet") \
    .option("compression", "snappy") \
    .partitionBy("Age") \
    .save("/path/to/output")

When can we use selectExpr in PySpark?

''' selectExpr() comes in handy when you need to select particular columns while at the same time applying some sort of SQL expression or transformation to those
column(s) '''

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("selectExprExamples").getOrCreate()

data = [(1, "Alice", "2021-01-15", 100),


(2, "Bob", "2021-03-20", 200),
(3, "Charlie", "2021-02-10", 150)]

columns = ["id", "name", "birthdate", "salary"]

df = spark.createDataFrame(data, columns)
df.show()

+---+-------+----------+------+
| id| name| birthdate|salary|
+---+-------+----------+------+


| 1| Alice|2021-01-15| 100|
| 2| Bob|2021-03-20| 200|
| 3|Charlie|2021-02-10| 150|
+---+-------+----------+------+

#🔹 Selecting Columns with Alias:


df.selectExpr("name AS full_name", "salary * 1.1 AS updated_salary").show()

#🔹 Mathematical Transformations:
df.selectExpr("salary", "salary * 1.5 AS increased_salary").show()

#🔹 String Manipulation:
# note: an alias defined in the same selectExpr cannot be referenced again, so the substring expression is repeated
df.selectExpr("name", "substring(birthdate, 1, 4) AS birth_year", "concat(name, ' - ', substring(birthdate, 1, 4)) AS name_year").show()

#🔹 Conditional Expressions:
df.selectExpr("name", "CASE WHEN salary > 150 THEN 'High Salary' ELSE 'Low Salary' END AS salary_category").show()

#🔹 Type Casting:
df.selectExpr("name", "cast(salary AS double) AS double_salary").show()
+-------+----------+--------------+
|   name|birth_year|     name_year|
+-------+----------+--------------+
|  Alice|      2021|  Alice - 2021|
|    Bob|      2021|    Bob - 2021|
|Charlie|      2021|Charlie - 2021|
+-------+----------+--------------+

+-------+---------------+
| name|salary_category|
+-------+---------------+
| Alice| Low Salary|
| Bob| High Salary|
|Charlie| Low Salary|
+-------+---------------+

+-------+-------------+
| name|double_salary|
+-------+-------------+
| Alice| 100.0|
| Bob| 200.0|
|Charlie| 150.0|
+-------+-------------+

String Manipulation in PySpark

from pyspark.sql import SparkSession


from pyspark.sql.functions import trim

# Create a Spark session


spark = SparkSession.builder.appName("TrimDemo").getOrCreate()

# Sample data
data = [(" Apple ",), (" Banana ",), (" Cherry ",)]
df = spark.createDataFrame(data, ["fruits"])
df.show()

+--------+
| fruits|
+--------+
| Apple |
| Banana |
| Cherry |
+--------+


# 1. Using trim.
from pyspark.sql.functions import trim
# Trim leading and trailing spaces
df = df.withColumn("cleaned_data", trim(df["fruits"]))
df.show()

+--------+------------+
| fruits|cleaned_data|
+--------+------------+
| Apple | Apple|
| Banana | Banana|
| Cherry | Cherry|
+--------+------------+

# 2. ltrim: Leading Character Whisperer


# ltrim specializes in removing those pesky leading characters (spaces or any you specify) from your strings.

from pyspark.sql.functions import ltrim

# Remove leading spaces


df = df.withColumn("cleaned_data", ltrim(df["fruits"]))
df.show()

# 3. rtrim: Remove trailing spaces

from pyspark.sql.functions import rtrim


df = df.withColumn("cleaned_data", rtrim(df["fruits"]))
df.show()

+--------+------------+
| fruits|cleaned_data|
+--------+------------+
| Apple | Apple |
| Banana | Banana |
| Cherry | Cherry |
+--------+------------+

+--------+------------+
| fruits|cleaned_data|
+--------+------------+
| Apple | Apple|
| Banana | Banana|
| Cherry | Cherry|
+--------+------------+


In a PySpark DataFrame named "orders_df" with columns (OrderID, CustomerID, ProductID, OrderDate), remove duplicate orders so that only the first order is kept for each (CustomerID, ProductID) pair.

from pyspark.sql import SparkSession


from pyspark.sql.window import Window
from pyspark.sql.functions import row_number

# Create a Spark session


spark = SparkSession.builder.appName("DeduplicateOrders").getOrCreate()
data = [(1, 101, 201, "2023-01-15"),
(2, 102, 202, "2023-01-16"),
(3, 101, 201, "2023-01-17"),
(4, 103, 203, "2023-01-18")]
columns = ["OrderID", "CustomerID", "ProductID", "OrderDate"]
orders_df = spark.createDataFrame(data, columns)

# Deduplicate based on CustomerID and ProductID


window_spec = Window.partitionBy(orders_df["CustomerID"], orders_df["ProductID"]).orderBy('OrderID')
deduplicated_df = orders_df.withColumn("RowNum", row_number().over(window_spec))
deduplicated_df = deduplicated_df.filter(deduplicated_df["RowNum"] == 1).drop("RowNum")

deduplicated_df.show()

+-------+----------+---------+----------+
|OrderID|CustomerID|ProductID| OrderDate|
+-------+----------+---------+----------+
| 1| 101| 201|2023-01-15|
| 2| 102| 202|2023-01-16|
| 4| 103| 203|2023-01-18|
+-------+----------+---------+----------+

''' Imagine you work for a retail company that sells a wide range of products across different categories. You have a massive dataset of daily sales
with the following columns: "Date," "ProductID," "Category," "QuantitySold," and "Revenue."

Your task is to perform sales analysis to identify trends and patterns within each product category. Specifically, you want to calculate a rolling sum of revenue
for each product category to understand how sales are evolving over time '''

from pyspark.sql import Window


from pyspark.sql import functions as func

# Define a window specification partitioned by "Category" and ordered by "Date"


window = Window.partitionBy("Category").orderBy("Date").rowsBetween(Window.currentRow, 2)

# Calculate the rolling sum of revenue for each product category (current row plus the next two rows)


sales_df = sales_df.withColumn("RollingRevenueSum", func.sum("Revenue").over(window))

NameError: name 'sales_df' is not defined
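The cell above fails because sales_df was never created in this notebook. A toy stand-in (with hypothetical values, reusing the Window spec and func alias defined above) would let it run:

data = [("2023-01-01", 1, "Electronics", 3, 300.0),
        ("2023-01-02", 1, "Electronics", 2, 200.0),
        ("2023-01-03", 2, "Electronics", 1, 100.0),
        ("2023-01-01", 3, "Clothing", 5, 250.0),
        ("2023-01-02", 3, "Clothing", 4, 200.0)]
sales_df = spark.createDataFrame(data, ["Date", "ProductID", "Category", "QuantitySold", "Revenue"])

# Re-apply the same frame: current row plus the next two rows within each category
sales_df.withColumn("RollingRevenueSum", func.sum("Revenue").over(window)).show()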


'''
Imagine you're managing sales data for an e-commerce platform. Your dataset contains information about products, the quantity
sold, and the quantity returned. However, not all data is perfect, and some quantities are missing, represented as NaN (Not-a-
Number). Find net Quantity sold from the data.
'''
from pyspark.sql import SparkSession
from pyspark.sql.functions import nanvl
from pyspark.sql.functions import lit
data = [(1,10.0, 2.0), (2,8.0, float('nan')), (3,12.0, 3.0), (4,float('nan'), 5.0)]
df = spark.createDataFrame(data, ["product_id","quantity_sold", "quantity_returned"])

# Calculate net quantity sold, handling NaN values


net_sales_df = df.withColumn(
"quantity_sold_withoutNull", nanvl(df["quantity_sold"], lit(0.0))
).withColumn(
"quantity_returned_withoutNull", nanvl(df["quantity_returned"], lit(0.0))
)
result_df = net_sales_df.withColumn(
"net_quantity_sold", net_sales_df["quantity_sold_withoutNull"] - net_sales_df["quantity_returned_withoutNull"]
)

# Show the result


result_df.drop('quantity_sold','quantity_returned').show()

+----------+-------------------------+-----------------------------+-----------------+
|product_id|quantity_sold_withoutNull|quantity_returned_withoutNull|net_quantity_sold|
+----------+-------------------------+-----------------------------+-----------------+
| 1| 10.0| 2.0| 8.0|
| 2| 8.0| 0.0| 8.0|
| 3| 12.0| 3.0| 9.0|
| 4| 0.0| 5.0| -5.0|
+----------+-------------------------+-----------------------------+-----------------+

%sql
/* Leveraging Managed and External Tables for Real-World Data Management */

CREATE TABLE managed_product_data (
  product_id INT,
  product_name STRING,
  price DECIMAL,
  stock_quantity INT
);

OK
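For contrast, a hedged sketch of the external counterpart: the LOCATION path below is a hypothetical mount point, and dropping such a table removes only the metadata, not the underlying files.

spark.sql("""
  CREATE TABLE external_product_data (
    product_id INT,
    product_name STRING,
    price DECIMAL,
    stock_quantity INT
  )
  USING DELTA
  LOCATION '/mnt/data/external_product_data'
""")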

How and when to use the broadcast function in PySpark?

The broadcast function in PySpark should be used when you want to optimize join operations between DataFrames, particularly when
one DataFrame is significantly smaller than the other. Broadcasting the smaller DataFrame can greatly improve query performance
by reducing data shuffling and network overhead.

Suppose we have two DataFrames: sales_data (with millions of records) and customer_info (small, with a few thousand records).

In this scenario, the customer_info DataFrame is relatively small compared to the sales_data DataFrame. Broadcasting the smaller
DataFrame (customer_info) is beneficial when:
🔹 Joining Large and Small DataFrames: You are joining a large DataFrame (e.g., sales_data) with a significantly smaller DataFrame (customer_info).
🔹 Reducing Data Shuffling: Broadcasting helps reduce the amount of data that needs to be shuffled across worker nodes during the join operation, improving performance.
A small sketch of such a join follows below.
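A minimal sketch with toy stand-ins for the sales_data and customer_info DataFrames described above:

from pyspark.sql.functions import broadcast

# Toy stand-ins: sales_data plays the large table, customer_info the small one
sales_data = spark.createDataFrame(
    [(1, 101, 250.0), (2, 102, 75.0), (3, 101, 30.0)],
    ["order_id", "customer_id", "amount"])
customer_info = spark.createDataFrame(
    [(101, "Alice"), (102, "Bob")],
    ["customer_id", "customer_name"])

# Hint Spark to ship the small DataFrame to every executor instead of shuffling the large one
joined_df = sales_data.join(broadcast(customer_info), on="customer_id", how="inner")
joined_df.show()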


Explain the use of df.explain() and its parameters?

Spark provides a powerful tool called df.explain() that gives you a backstage pass to the inner workings of your DataFrame
operations.
Optimization Insights: Understand how Spark optimizes your queries to boost performance.
Bottleneck Detection: Spot potential bottlenecks and fine-tune your code for speed.
Shuffle and Partitioning: Get a grip on data shuffling and partitioning strategies.
Efficiency Boost: Ensure your code runs efficiently, especially with large-scale datasets.

from pyspark.sql.functions import broadcast

There are four types of plans: Logical Plan, Analyzed Logical Plan, Optimized Logical Plan and Physical Plan.
🔹 Logical Plan: Represents the abstract representation of a query without optimization.
🔹 Analyzed Logical Plan: Represents the query plan after parsing and semantic analysis but before optimization.
🔹 Optimized Logical Plan: Incorporates query optimizations to improve query efficiency.
🔹 Physical Plan: Specifies how the query will be executed physically, including details about data shuffling, joins, and partitioning strategies.
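A minimal usage sketch, reusing the broadcast import from the cell above (mode="formatted" requires Spark 3.0+):

df1 = spark.createDataFrame([(1, "Alice"), (2, "Bob")], ["ID", "Name"])
df2 = spark.createDataFrame([(1, "Math"), (2, "Science")], ["ID", "Subject"])
joined = df1.join(broadcast(df2), "ID")

# extended=True prints all four plans: parsed, analyzed, optimized and physical
joined.explain(extended=True)

# mode="formatted" prints a more readable summary of the physical plan
joined.explain(mode="formatted")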

Mastering Sorting in PySpark: nulls in focus

from pyspark.sql import SparkSession


data = [
(1, "Alice", "2023-01-15"),
(2, "Bob", "2022-12-10"),
(3, "Charlie", None),
(4, "David", "2023-02-20"),
(5, "Eve", None),
]
columns = ["customer_id", "customer_name", "subscription_start_date"]
df = spark.createDataFrame(data, columns)
df.orderBy(df.subscription_start_date.asc_nulls_first()).show()
df.orderBy(df.subscription_start_date.asc_nulls_last()).show()

+-----------+-------------+-----------------------+
|customer_id|customer_name|subscription_start_date|
+-----------+-------------+-----------------------+
| 3| Charlie| null|
| 5| Eve| null|
| 2| Bob| 2022-12-10|
| 1| Alice| 2023-01-15|
| 4| David| 2023-02-20|
+-----------+-------------+-----------------------+

+-----------+-------------+-----------------------+
|customer_id|customer_name|subscription_start_date|
+-----------+-------------+-----------------------+
| 2| Bob| 2022-12-10|
| 1| Alice| 2023-01-15|
| 4| David| 2023-02-20|
| 3| Charlie| null|
| 5| Eve| null|
+-----------+-------------+-----------------------+


pyspark.sql.Window: the rowsBetween method.

'''
rowsBetween is used to specify the range of rows considered in a windowed operation. It determines the set of rows relative to the
current row that should be included in the window frame. The frame is used for performing calculations like aggregations,
ranking, and other window functions.

The rowsBetween method accepts two arguments: start and end, which define the boundaries of the frame. These boundaries are
relative to the current row and are specified using specific constants.

Here are the main constants you can use with rowsBetween:
🔹 Window.unboundedPreceding: Represents the earliest possible row. It means all rows from the beginning of the partition up to and including the current row.
🔹 Window.unboundedFollowing: Represents the latest possible row. It means all rows from the current row up to the end of the partition.
🔹 Window.currentRow: Represents the current row.
🔹 Any integer value, interpreted as an offset relative to the current row.
'''

from pyspark.sql.window import Window


from pyspark.sql.functions import sum
data = [("Aman", 10),
("Bahadur", 20),
("Anjali", 30),
("Babita", 40),
("Aditya", 50)]

columns = ["category", "value"]


df = spark.createDataFrame(data, columns)

# Define a window specification


window_spec = Window.partitionBy("category").orderBy("value")

# Calculate a cumulative sum considering all rows from the start of the partition up to the current row
df.withColumn("cumulative_sum", sum("value").over(window_spec.rowsBetween(Window.unboundedPreceding, Window.currentRow))).show()
df.withColumn("cumulative_sum", sum("value").over(window_spec.rowsBetween(Window.currentRow, Window.unboundedFollowing))).show()
df.withColumn("cumulative_sum", sum("value").over(window_spec.rowsBetween(Window.unboundedPreceding,
Window.unboundedFollowing))).show()
df.withColumn("cumulative_sum", sum("value").over(window_spec.rowsBetween(Window.currentRow,1))).show()

+--------+-----+--------------+
|category|value|cumulative_sum|
+--------+-----+--------------+
| Aditya| 50| 50|
| Aman| 10| 10|
| Anjali| 30| 30|
| Babita| 40| 40|
| Bahadur| 20| 20|
+--------+-----+--------------+

+--------+-----+--------------+
|category|value|cumulative_sum|
+--------+-----+--------------+
| Aditya| 50| 50|
| Aman| 10| 10|
| Anjali| 30| 30|
| Babita| 40| 40|
| Bahadur| 20| 20|
+--------+-----+--------------+


How and when to use datediff() in PySpark?

'''
The datediff function in PySpark is used to calculate the difference in days between two dates. It is a valuable tool in various
real-life scenarios where you need to perform date-based calculations and analysis. Here are some common use cases for datediff
in PySpark:

🔹 Employee Tenure Analysis: You can use datediff to calculate the tenure of employees in an organization. By subtracting the hire date from the current date, you can determine how long each employee has been with the company.
🔹 Customer Churn Analysis: When analyzing customer behavior, datediff can help calculate the time elapsed between a customer's first and last purchase. This information is essential for identifying and predicting customer churn.
🔹 Loan and Mortgage Calculations: In the financial sector, you can use datediff to calculate the duration of loans or mortgages. This helps in determining interest accrued over time and remaining payment periods.
🔹 Event Scheduling: When scheduling events or appointments, datediff can be used to calculate the time remaining until an event or the time passed since an event occurred.
🔹 Inventory Aging: For managing inventory, you can calculate the age of each item in stock using datediff. This helps in identifying and managing aging or obsolete inventory.
🔹 Healthcare Analytics: In healthcare, datediff can be used to calculate the length of hospital stays, the time between medical procedures, or the duration of treatment plans.
'''

from pyspark.sql.functions import datediff, current_date, lit


from pyspark.sql.types import DateType,StructField, StructType, StringType

data = [
('2023-04-08',),
('2023-04-09',),
('2023-04-10',),
('2023-04-11',),
('2023-04-12',),
('2023-04-13',)
]
columns = ['d1']

schema = StructType([
StructField("d1", StringType(), True)
])

df = spark.createDataFrame(data, schema=schema)

# Convert 'd1' to DateType


df = df.withColumn("d1", df["d1"].cast(DateType()))

# Create a new DataFrame with the current date in 'd2'


df_with_current_date = df.withColumn("d2", lit(current_date()))
df_with_current_date.show()
# Calculate the difference in days
df_with_current_date.select(datediff(df_with_current_date.d2, df_with_current_date.d1).alias('diff')).show()

+----+
|diff|
+----+
| 280|
| 279|
| 278|
| 277|
| 276|
| 275|
+----+

1️⃣ Ingest Data in Real-Time: As users browse your site, their actions are immediately ingested into Delta Live Tables, creating a real-time data stream.
2️⃣ Transform Data on the Fly: Using Databricks' user-friendly interface, you can apply transformations to this data stream in real-time. For instance, you can enrich user profiles with up-to-the-second information.
3️⃣ Make Instant Decisions: With this enriched data, you can power real-time dashboards that show which products are trending, personalize product recommendations instantly, and even detect unusual behavior indicative of fraud, all in the blink of an eye.
4️⃣ Ensure Data Reliability: Delta Live Tables ensures that your data is reliable and transactional, maintaining data integrity even as you process it in real-time. A pipeline sketch follows below.
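A minimal, hypothetical Delta Live Tables sketch of steps 1 and 2. The dlt module is only available inside a DLT pipeline (not a plain notebook), and the /mnt/clickstream path and event_type column are assumptions:

import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Raw clickstream events ingested continuously via Auto Loader")
def raw_events():
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/mnt/clickstream"))

@dlt.table(comment="Enriched events feeding real-time dashboards")
def enriched_events():
    return dlt.read_stream("raw_events").filter(col("event_type").isNotNull())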

%sql
CREATE TABLE orders (
  order_id INT,
  order_date STRING,
  customer_id INT,
  order_status STRING
)
USING DELTA
TBLPROPERTIES (delta.enableChangeDataFeed = true)

OK

from pyspark.sql import SparkSession


data = [("Alice", 25), ("Bob", 30), ("Charlie", 22), ("David", 28), ("Eve", 35)]
df = spark.createDataFrame(data, ["Name", "Age"])

# Transformation 1: Group by Age


grouped_data = df.groupBy("Age").count()

# Transformation 2: Filter Ages above 25


filtered_data = grouped_data.filter(grouped_data.Age > 25)

# Action 1: Show the grouped data (Lazy Evaluation)


print("Grouped Data (Lazy):")
grouped_data.show()

# Action 2: Show the filtered data (Lazy Evaluation)


print("Filtered Data (Lazy):")
filtered_data.show()

Grouped Data (Lazy):


+---+-----+
|Age|count|
+---+-----+
| 25| 1|
| 30| 1|
| 22| 1|
| 28| 1|
| 35| 1|
+---+-----+

Filtered Data (Lazy):


+---+-----+
|Age|count|
+---+-----+
| 30| 1|


| 28| 1|
| 35| 1|
+---+-----+

In PySpark, transformations are categorized into two types: narrow transformations and wide transformations. These categories are
based on how they impact the execution plan and data shuffling in a Spark job.

Narrow Transformations:
Narrow transformations are those transformations where each output partition depends on a single input partition.
They do not require data shuffling or data movement across partitions, making them more efficient.
Examples of narrow transformations include map, filter, and union.

# Sample DataFrame
data = [("Alice", 25), ("Bob", 30), ("Charlie", 22), ("David", 28), ("Eve", 35)]
df = spark.createDataFrame(data, ["Name", "Age"])

# Narrow Transformation: Filtering ages above 25


filtered_df = df.filter(df.Age > 25)

Wide Transformations:
Wide transformations are those transformations where each output partition depends on multiple input partitions.
They require data shuffling or redistribution across partitions, which can be resource-intensive and time-consuming.
Examples of wide transformations include groupByKey and join.

# Sample DataFrames
df1 = spark.createDataFrame([(1, "Alice"), (2, "Bob"), (3, "Charlie")], ["ID", "Name"])
df2 = spark.createDataFrame([(1, "Math"), (2, "Science"), (3, "History")], ["ID", "Subject"])

# Wide Transformation: Joining two DataFrames


joined_df = df1.join(df2, "ID")


How to handle out-of-memory errors in Databricks?

Here are some steps you can take to handle and mitigate out-of-memory errors in Databricks:
🔹 Increase Cluster Memory: You can try scaling up your cluster by adding more worker nodes or increasing the instance types of the existing nodes. This can provide more memory to your Spark jobs.
🔹 Optimize Your Code: Review your Spark code and optimize it to use memory efficiently. Make use of Spark transformations and actions that minimize data shuffling and memory usage, such as filter, map, and reduce.
🔹 Partition Your Data: Ensure that your data is properly partitioned. Well-distributed and properly-sized partitions can significantly reduce memory pressure during processing.
🔹 Use Caching and Persisting: Cache or persist intermediate DataFrames or RDDs that you need to reuse. This can help avoid recomputation and reduce memory pressure.
🔹 Increase Spark Driver Memory: If you're running into driver memory issues, consider increasing the driver memory configuration for your Spark job.
🔹 Monitor and Tune Memory Settings: Use Databricks' built-in monitoring tools to track the memory usage of your Spark jobs. Adjust Spark memory configurations like spark.driver.memory and spark.executor.memory based on your cluster's available resources and job requirements.
🔹 Data Sampling and Filtering: If your dataset is too large to fit in memory, consider sampling or filtering it to work with smaller subsets. This may be necessary for exploratory data analysis.
🔹 Use Off-Heap Memory: Spark allows you to use off-heap memory for certain data structures, which can help avoid Java heap space issues. You can configure this using the spark.memory.offHeap.enabled configuration.
🔹 Consider Cluster Autoscaling: Enable cluster autoscaling in Databricks so that your cluster can automatically add or remove nodes based on workload, ensuring you have the necessary resources when needed.
🔹 Use External Storage: Consider using external storage solutions like Delta Lake or data lakes to store and manage large datasets efficiently without consuming too much memory.
🔹 Regularly Clean Up Unused Data and Resources: Periodically clean up temporary tables, cached DataFrames, and other resources that are no longer needed to free up memory. A few of these patterns are sketched below.
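A short sketch of the code-level patterns from the list above. The path, filter column and shuffle-partition value are hypothetical, and note that driver/executor memory and off-heap settings are cluster-level configurations set when the cluster is created, not at runtime:

# Right-size shuffle partitions for the workload (illustrative value)
spark.conf.set("spark.sql.shuffle.partitions", "400")

big_df = spark.read.parquet("/mnt/data/big_table")          # hypothetical path
recent_df = big_df.filter("event_date >= '2023-01-01'")     # filter early to shrink the working set

recent_df.cache()      # reuse the filtered data without recomputation
recent_df.count()      # action that materializes the cache

# ... downstream aggregations reuse the cached DataFrame ...

recent_df.unpersist()  # release the memory when finished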

from pyspark.sql import SparkSession


from pyspark.sql.functions import col, regexp_replace

spark = SparkSession.builder.appName("ProductDescriptionCleanup").getOrCreate()

data = [("Product A: $19.99!",),


("Special Offer on Product B - $29.95",),
("Product C (Limited Stock)",)]

df = spark.createDataFrame(data, ["description"])

# Clean and preprocess the descriptions using regexp_replace


cleaned_df = df.withColumn("cleaned_description",
regexp_replace(col("description"), r'[^a-zA-Z0-9\s]', ''))
cleaned_df.show(truncate=False)

+-----------------------------------+--------------------------------+
|description |cleaned_description |
+-----------------------------------+--------------------------------+
|Product A: $19.99! |Product A 1999 |
|Special Offer on Product B - $29.95|Special Offer on Product B 2995|
|Product C (Limited Stock) |Product C Limited Stock |
+-----------------------------------+--------------------------------+


# Using the na.fill() method:


# Replace null values in a specific column with a constant value. Here's an example:
from pyspark.sql import SparkSession
data = [(1, None), (2, None), (3, "value")]
columns = ["ID", "column_name"]
df = spark.createDataFrame(data, columns)
filledDF = df.na.fill("replacement_value", subset=["column_name"])
filledDF.show()

+---+-----------------+
| ID| column_name|
+---+-----------------+
| 1|replacement_value|
| 2|replacement_value|
| 3| value|
+---+-----------------+

from pyspark.sql import SparkSession

data = [("Product A", "2023-01"), ("Product B", "2023-02"), ("Product C", "2023-03")]
columns = ["product", "sale_date"]

# Create a DataFrame
df = spark.createDataFrame(data, columns)

# Extract the year and month from the sale_date


result_df = df.withColumn("year_month", df["sale_date"].substr(1, 7))
result_df.show()

+---------+---------+----------+
| product|sale_date|year_month|
+---------+---------+----------+
|Product A| 2023-01| 2023-01|
|Product B| 2023-02| 2023-02|
|Product C| 2023-03| 2023-03|
+---------+---------+----------+


Dealing with Data Skewness in PySpark 📊


Data skewness can be a silent performance killer in our PySpark data processing jobs. When a few partitions or keys in our data
set have significantly more data than others, it can lead to imbalanced workloads, slower processing times, and sometimes even
out-of-memory errors.

🔍 Identifying Data Skewness


Before solving a problem, we must first recognize it. In PySpark, you can identify data skewness by checking how many records each
key or partition holds, for example with df.groupBy("key").count() or by counting rows per partition (see the sketch below). If some keys or partitions
have substantially more records than others, you likely have a skewness issue.
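A minimal sketch, assuming df is the DataFrame under inspection:

from pyspark.sql.functions import spark_partition_id

# Count the records that land in each Spark partition; large outliers indicate skew
df.withColumn("partition_id", spark_partition_id()) \
  .groupBy("partition_id") \
  .count() \
  .orderBy("count", ascending=False) \
  .show()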

🔧 Solving Data Skewness in PySpark

Here are a few strategies to address data skewness in PySpark:
🔹 Salting Your Data: Add a random value (salt) to your data using functions like rand() to distribute the data more evenly across partitions. Then, repartition the DataFrame. A salting sketch follows below.
🔹 Bucketing: Use bucketing to pre-organize your data into a fixed number of buckets based on a specific column. This can help evenly distribute data and improve join performance.
🔹 Custom Partitioning: Implement custom partitioning logic based on your domain knowledge to evenly distribute the data.
🔹 Use Appropriate Joins: Choose the appropriate join type, like broadcast joins or bucketed joins, depending on your data and query requirements.
🔹 Sampling: In some cases, you might consider using random sampling to reduce the data size, making it more manageable and balanced.
🔹 Caching: Caching heavily accessed DataFrames or tables can reduce the overhead of repeatedly computing the same data, improving query performance.

🚀 Data skewness is a common challenge in distributed data processing, but with these strategies and careful monitoring, we can keep our PySpark jobs running smoothly.
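A hedged sketch of the salting idea; skewed_df, its hot customer_id key and the amount column are assumptions for illustration only:

from pyspark.sql import functions as F

NUM_SALTS = 8  # illustrative salt count

# Spread each hot key across NUM_SALTS synthetic sub-keys
salted = (skewed_df
          .withColumn("salt", (F.rand() * NUM_SALTS).cast("int"))
          .withColumn("salted_key", F.concat_ws("_", "customer_id", "salt")))

# Aggregate on the salted key first, then roll the partial results up to the real key
partial = salted.groupBy("salted_key", "customer_id").agg(F.sum("amount").alias("partial_sum"))
result = partial.groupBy("customer_id").agg(F.sum("partial_sum").alias("total_amount"))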

GroupBy

'''
GroupBy is a fundamental operation in PySpark that allows you to group rows of a DataFrame based on one or more columns and
perform aggregate functions on each group. This operation is essential for summarizing, analyzing, and transforming data
'''

from pyspark.sql.functions import avg, sum, count


data = [("Movie_A", "Drama", 4.5),
("Movie_B", "Comedy", 3.8),
("Movie_C", "Drama", 4.2),
("Movie_D", "Action", 4.0),
("Movie_E", "Comedy", 3.5)]

schema = ["movie", "genre", "rating"]


df = spark.createDataFrame(data, schema=schema)

df.groupBy("genre").agg(avg("rating").alias("avg_rating"),count("genre").alias("Movie in each genre")).show()

+------+----------+-------------------+
| genre|avg_rating|Movie in each genre|
+------+----------+-------------------+
| Drama| 4.35| 2|
|Comedy| 3.65| 2|
|Action| 4.0| 1|
+------+----------+-------------------+


from pyspark.sql.functions import col, to_timestamp , year, month,datediff,current_date, date_add, date_trunc,date_format

data = [("EventA", "2023-11-15 08:30:00"),


("EventB", "2023-12-20 15:45:30"),
("EventC", "2023-12-10 12:00:00")]

schema = ["event_name", "timestamp_str"]

df = spark.createDataFrame(data, schema)
df_timestamps = df.withColumn("event_time", to_timestamp(col("timestamp_str"), "yyyy-MM-dd HH:mm:ss"))
df_timestamps = df_timestamps.withColumn("year", year(col("event_time")))
df_timestamps = df_timestamps.withColumn("month", month(col("event_time")))
df_timestamps = df_timestamps.withColumn("days_diff", datediff(current_date(), col("event_time")))
df_timestamps = df_timestamps.withColumn("next_week", date_add(col("event_time"), 7))
df_timestamps = df_timestamps.withColumn("truncated_hour", date_trunc("hour", col("event_time")))
df_timestamps = df_timestamps.withColumn("formatted_date", date_format(col("event_time"), "dd/MM/yyyy HH:mm:ss"))
df_timestamps.show()

+----------+-------------------+-------------------+----+-----+---------+----------+-------------------+-------------------+
|event_name| timestamp_str| event_time|year|month|days_diff| next_week| truncated_hour| formatted_date|
+----------+-------------------+-------------------+----+-----+---------+----------+-------------------+-------------------+
| EventA|2023-11-15 08:30:00|2023-11-15 08:30:00|2023| 11| 59|2023-11-22|2023-11-15 08:00:00|15/11/2023 08:30:00|
| EventB|2023-12-20 15:45:30|2023-12-20 15:45:30|2023| 12| 24|2023-12-27|2023-12-20 15:00:00|20/12/2023 15:45:30|
| EventC|2023-12-10 12:00:00|2023-12-10 12:00:00|2023| 12| 34|2023-12-17|2023-12-10 12:00:00|10/12/2023 12:00:00|
+----------+-------------------+-------------------+----+-----+---------+----------+-------------------+-------------------+

filter or where Transformation:
• Purpose: Filters rows based on specified conditions.
• Syntax: df.filter(condition)
• Example: filtered_df = df.filter(df["age"] > 25)
• Use Cases:
  • Filtering rows based on specific column values.
  • Complex conditions involving multiple columns.

dropDuplicates Transformation:
• Purpose: Removes duplicate rows based on specified columns.
• Syntax: df.dropDuplicates(subset=columns)
• Example: unique_department_df = df.dropDuplicates(subset=["department"])
• Use Cases:
  • Ensuring unique values in specific columns.
  • Preprocessing data before aggregation.


from pyspark.sql import SparkSession


data = [("Alice", 28, 60000, "HR"),
("Bob", 35, 75000, "Engineering"),
("Charlie", 22, 50000, "Marketing"),
("Alice", 28, 60000, "HR"), # Duplicate row
("David", 40, 90000, "Engineering")]

# Define the schema


schema = ["name", "age", "salary", "department"]

df = spark.createDataFrame(data, schema)

#Filtering Employees with Age Over 30:


senior_employees_df = df.where(df["age"] > 30)
senior_employees_df.show()

#Dropping Duplicate Rows Based on All Columns:

unique_records_df = df.dropDuplicates()
unique_records_df.show()

+-------+---+------+-----------+
| name|age|salary| department|
+-------+---+------+-----------+
| Alice| 28| 60000| HR|
| Bob| 35| 75000|Engineering|
|Charlie| 22| 50000| Marketing|
| David| 40| 90000|Engineering|
+-------+---+------+-----------+

'''
You have a DataFrame containing information about products, including their names and prices. You are tasked with creating a new
column, "PriceCategory," based on the following conditions:
If the price is less than 50, categorize it as "Low."
If the price is between 50 (inclusive) and 100 (exclusive), categorize it as "Medium."
If the price is 100 or greater, categorize it as "High."
'''

from pyspark.sql import SparkSession


from pyspark.sql.functions import when, col

# Sample data
data = [("ProductA", 30),
("ProductB", 75),
("ProductC", 110)]

# Define the schema


schema = ["ProductName", "Price"]

# Create the DataFrame


productsData = spark.createDataFrame(data, schema)

# Use when and otherwise to categorize product prices


result_df = productsData.withColumn("PriceCategory",
when(col("Price") < 50, "Low")
.when((col("Price") >= 50) & (col("Price") < 100), "Medium")
.otherwise("High")
)

result_df.show()


+-----------+-----+-------------+
|ProductName|Price|PriceCategory|
+-----------+-----+-------------+
| ProductA| 30| Low|
| ProductB| 75| Medium|
| ProductC| 110| High|
+-----------+-----+-------------+

• ArrayType is a data type in PySpark that represents an array or a list of elements.
• It's commonly used when dealing with structured data where a column needs to contain multiple values. A short schema sketch follows below.
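A minimal sketch of declaring an ArrayType column (the same Name/Fruits shape reappears in the explode example further below):

from pyspark.sql.types import StructType, StructField, StringType, ArrayType

schema = StructType([
    StructField("Name", StringType(), True),
    StructField("Fruits", ArrayType(StringType()), True)   # a list of strings per row
])
array_df = spark.createDataFrame([("Alice", ["apple", "banana"])], schema)
array_df.printSchema()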

Question:
You've been provided with a dataset containing information about stock transactions for an investment portfolio.
1) Calculate the total transaction amount for each transaction. Create a new column named total_transaction in the DataFrame.
2) Compute the cumulative transaction amount for each stock symbol. Create a new column named cumulative_transaction for each
stock symbol, representing the sum of total transaction amounts for all transactions of that stock.
3) Identify the most traded stock symbol for each month. Create a new column named top_stock_monthly that contains the stock
symbol with the highest total quantity traded in each month.
4) Determine the average unit price for each stock symbol.
5) Identify the stocks with the highest lifetime transaction value (LTV).
A partial sketch of tasks 1, 2 and 4 follows below.
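A hedged, partial sketch of tasks 1, 2 and 4. The column names (Date, StockSymbol, Quantity, UnitPrice) and sample rows are assumptions, since the actual dataset is not shown:

from pyspark.sql import functions as F
from pyspark.sql.window import Window

data = [("2023-01-05", "AAPL", 10, 150.0),
        ("2023-01-20", "AAPL", 5, 155.0),
        ("2023-01-10", "MSFT", 8, 240.0)]
txns = spark.createDataFrame(data, ["Date", "StockSymbol", "Quantity", "UnitPrice"])

# 1) Total amount per transaction
txns = txns.withColumn("total_transaction", F.col("Quantity") * F.col("UnitPrice"))

# 2) Cumulative transaction amount per stock symbol, ordered by date
w = (Window.partitionBy("StockSymbol").orderBy("Date")
           .rowsBetween(Window.unboundedPreceding, Window.currentRow))
txns = txns.withColumn("cumulative_transaction", F.sum("total_transaction").over(w))

# 4) Average unit price per stock symbol
txns.groupBy("StockSymbol").agg(F.avg("UnitPrice").alias("avg_unit_price")).show()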

Unlocking Data Transformation: Using explode Function in PySpark

from pyspark.sql import SparkSession

# Instantiate a Spark session
spark = SparkSession.builder.appName("PySparkExplodeFunctionUsage").getOrCreate()
from pyspark.sql.functions import explode

# Sample DataFrame
data = [("Alice", ["apple", "banana", "cherry"]),
("Bob", ["orange", "peach"]),
("Cathy", ["grape", "kiwi", "pineapple"])]

df = spark.createDataFrame(data, ["Name", "Fruits"])

# Using explode function


exploded_df = df.select("Name", explode("Fruits").alias("Fruit"))
exploded_df.show()

+-----+---------+
| Name| Fruit|
+-----+---------+
|Alice| apple|
|Alice| banana|
|Alice| cherry|
| Bob| orange|
| Bob| peach|
|Cathy| grape|
|Cathy| kiwi|
|Cathy|pineapple|
+-----+---------+
