
Big Data Analytics [1010206714] [2107020701005]

PRACTICAL – 1
AIM : Implement the following using Map-Reduce: a. Matrix multiplication, b. Sorting, c. Indexing.

Code:

A. Matrix Multiplication

• Mapper Function:
def mapper(matrix_entry):
    matrix, i, j, value = matrix_entry
    if matrix == 'A':
        for k in range(1, N + 1):
            yield (i, k), ('A', j, value)
    else:
        for k in range(1, N + 1):
            yield (k, j), ('B', i, value)

• Reducer Function:

from collections import defaultdict

def reducer(index, values):
    A = defaultdict(int)
    B = defaultdict(int)

    for matrix, idx, value in values:
        if matrix == 'A':
            A[idx] = value
        else:
            B[idx] = value

    product = sum(A[i] * B[i] for i in A if i in B)

    return (index, product)

• Example Usage:

N = 2  # Dimension of the matrices

matrix_entries = [
    ('A', 1, 1, 2), ('A', 1, 2, 3),
    ('B', 1, 1, 4), ('B', 2, 1, 5)
]


mapped_entries = []
for entry in matrix_entries:
    mapped_entries.extend(mapper(entry))

# Group by key
grouped_entries = defaultdict(list)
for key, value in mapped_entries:
    grouped_entries[key].append(value)

# Reduce phase
result = []
for key, values in grouped_entries.items():
    result.append(reducer(key, values))

# Display the result
for ((i, j), value) in result:
    print(f"Element ({i}, {j}) = {value}")

Output:

B. Sorting
def mapper(value):
    yield (value, None)

def reducer(key, values):
    yield key

data = [3, 1, 4, 1, 5, 9, 2, 6, 5]

mapped_data = []
for value in data:
    mapped_data.extend(mapper(value))

# Sort by key
sorted_data = sorted(mapped_data, key=lambda x: x[0])


# Reduce phase
sorted_result = []
for key, _ in sorted_data:
    sorted_result.extend(reducer(key, None))

print(sorted_result)
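In a real MapReduce job the framework's shuffle phase performs this sorting of keys; here it is simulated with sorted(). For the sample list the printed result is [1, 1, 2, 3, 4, 5, 5, 6, 9]; duplicates are preserved because every occurrence is emitted as its own key/value pair.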

Output:

C. Indexing
def mapper(document_id, document):
    for word in document.split():
        yield (word, document_id)

from collections import defaultdict

def reducer(word, document_ids):
    yield (word, list(set(document_ids)))

documents = [
    ("doc1", "hello world"),
    ("doc2", "hello mapreduce world")
]

mapped_data = []
for doc_id, text in documents:
    mapped_data.extend(mapper(doc_id, text))

# Group by key
grouped_data = defaultdict(list)
for key, value in mapped_data:
    grouped_data[key].append(value)

# Reduce phase
index = {}
for word, document_ids in grouped_data.items():
    index.update(reducer(word, document_ids))

print(index)
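For the two sample documents the resulting inverted index is {'hello': ['doc1', 'doc2'], 'world': ['doc1', 'doc2'], 'mapreduce': ['doc2']}; the order within each document list may vary because the reducer deduplicates with set().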

Output:


PRACTICAL – 2
AIM : Distributed Cache & Map-Side Join, Reduce-Side Join; building and running a Spark application; word count in Hadoop and Spark; manipulating RDDs.

Code :

• Map Side Join:


from collections import defaultdict

print("Map Side Join:")


# Assuming we have two datasets, dataset1 and dataset2, where dataset2 is small enough to fit into memory
dataset1 = [("A", 1), ("B", 2), ("C", 3)]
dataset2 = [("A", "X"), ("B", "Y")]

# Distributed Cache - dataset2 is cached
cache_dict = {key: value for key, value in dataset2}

# Mapper function
def mapper(record):
    key, value = record
    if key in cache_dict:
        yield (key, (value, cache_dict[key]))

mapped_result = []
for record in dataset1:
    mapped_result.extend(mapper(record))

print(mapped_result)
print("-------------------------------------------")

• Reduce Side Join


print("Reduce Side Join")

# Assuming dataset1 and dataset2 are large datasets
dataset1 = [("A", 1), ("B", 2), ("C", 3)]
dataset2 = [("A", "X"), ("B", "Y")]

# Mapper function
def mapper1(record):
    key, value = record
    yield key, ("dataset1", value)

def mapper2(record):
    key, value = record
    yield key, ("dataset2", value)

mapped_result1 = []
mapped_result2 = []

for record in dataset1:
    mapped_result1.extend(mapper1(record))
for record in dataset2:
    mapped_result2.extend(mapper2(record))

# Combine both mapped results
mapped_result = mapped_result1 + mapped_result2

# Group by key
grouped_result = defaultdict(list)
for key, value in mapped_result:
    grouped_result[key].append(value)

# Reducer function
def reducer(key, values):
    dataset1_values = [v for source, v in values if source == "dataset1"]
    dataset2_values = [v for source, v in values if source == "dataset2"]
    return [(key, (v1, v2)) for v1 in dataset1_values for v2 in dataset2_values]

reduced_result = []
for key, values in grouped_result.items():
    reduced_result.extend(reducer(key, values))

print(reduced_result)
print("-------------------------------------------")

Output :


• Word Count in Spark:


from pyspark.sql import SparkSession

# Initialize Spark Session
spark = SparkSession.builder.appName("WordCount").getOrCreate()

# Read input file
input_file = "path/to/input.txt"
text_file = spark.read.text(input_file).rdd

# Word Count Logic
words = text_file.flatMap(lambda line: line.value.split())
word_counts = words.map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)

# Collect and print the result
output = word_counts.collect()
for word, count in output:
    print(f"{word}: {count}")

spark.stop()
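The script can be submitted with spark-submit (for example, spark-submit wordcount.py, assuming the code is saved under that name and input_file points to an existing text file), or run interactively in the PySpark shell.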

• Word Count in Hadoop


Word Count Mapper:
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String[] tokens = value.toString().split("\\s+");
        for (String token : tokens) {
            word.set(token);
            context.write(word, one);
        }
    }
}


Word Count Reducer:


import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
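Note that the mapper and reducer classes alone are not runnable: a driver class is also required to configure the Hadoop Job (mapper, reducer, output key/value types, input and output paths), and the compiled JAR is then submitted with the hadoop jar command.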

• Manipulating RDD in Spark


Basic RDD Operations:
from pyspark.sql import SparkSession

# Initialize Spark Session
spark = SparkSession.builder.appName("RDDExamples").getOrCreate()

# Create an RDD
data = [1, 2, 3, 4, 5]
rdd = spark.sparkContext.parallelize(data)

# Map Transformation
squared_rdd = rdd.map(lambda x: x * x)

# Filter Transformation
filtered_rdd = rdd.filter(lambda x: x % 2 == 0)

# Reduce Action
sum_of_elements = rdd.reduce(lambda a, b: a + b)

# Collect Action
collected_elements = rdd.collect()

print("Squared RDD:", squared_rdd.collect())


print("Filtered RDD:", filtered_rdd.collect())
print("Sum of elements:", sum_of_elements)


print("Collected elements:", collected_elements)

spark.stop()

Output :


PRACTICAL – 3
AIM : Implementation of Matrix algorithms in Spark Sql programming.

Code:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("MatrixMultiplication").getOrCreate()

# Creating DataFrame for Matrix A
data_a = [(1, 1, 1), (1, 2, 2), (2, 1, 3), (2, 2, 4)]
df_a = spark.createDataFrame(data_a, ["row", "col", "value"])

# Creating DataFrame for Matrix B
data_b = [(1, 1, 5), (1, 2, 6), (2, 1, 7), (2, 2, 8)]
df_b = spark.createDataFrame(data_b, ["row", "col", "value"])

df_a.createOrReplaceTempView("matrix_a")
df_b.createOrReplaceTempView("matrix_b")

result = spark.sql("""
SELECT a.row AS row, b.col AS col, SUM(a.value * b.value) AS value
FROM matrix_a a
JOIN matrix_b b
ON a.col = b.row
GROUP BY a.row, b.col
ORDER BY a.row, b.col
""")

result.show()
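As a sanity check, the sample data encodes A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]], so result.show() should display the product [[19, 22], [43, 50]], i.e. the rows (1, 1, 19), (1, 2, 22), (2, 1, 43) and (2, 2, 50).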

Output:


PRACTICAL – 4
AIM : Implementing K-Means Clustering algorithm using Map-Reduce.

Code:
import math

def mapper(data_point, centroids):
    min_dist = float('inf')
    nearest_centroid = None
    for centroid in centroids:
        dist = math.sqrt(sum((data_point[i] - centroid[i]) ** 2 for i in range(len(data_point))))
        if dist < min_dist:
            min_dist = dist
            nearest_centroid = centroid
    yield nearest_centroid, data_point

from collections import defaultdict
import numpy as np

def reducer(centroid, data_points):
    data_points = np.array(data_points)
    new_centroid = data_points.mean(axis=0)
    return centroid, new_centroid

def k_means_map_reduce(data, initial_centroids, max_iterations=10):
    centroids = initial_centroids
    for _ in range(max_iterations):
        # Map step
        mapped = []
        for point in data:
            mapped.extend(mapper(point, centroids))

        # Group by centroid
        grouped = defaultdict(list)
        for centroid, point in mapped:
            grouped[centroid].append(point)

        # Reduce step
        new_centroids = []
        for centroid, points in grouped.items():
            _, new_centroid = reducer(centroid, points)
            new_centroids.append(tuple(new_centroid))

        # Check for convergence
        if set(new_centroids) == set(centroids):
            break
        centroids = new_centroids

    return centroids

# Example usage
data = [
    (1.0, 2.0), (1.5, 1.8), (5.0, 8.0),
    (8.0, 8.0), (1.0, 0.6), (9.0, 11.0)
]
initial_centroids = [(1.0, 1.0), (5.0, 5.0)]

final_centroids = k_means_map_reduce(data, initial_centroids)
print("Final centroids:", final_centroids)


Output:


PRACTICAL – 5
AIM : Implementing any one Frequent Itemset algorithm using Map-Reduce.

Code:
def mapper(transaction):
    items = transaction.split()
    item_pairs = []

    # Generate all possible pairs of items
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            item_pairs.append((frozenset([items[i], items[j]]), 1))

    return item_pairs

from collections import defaultdict

def reducer(item_pairs):
    pair_counts = defaultdict(int)

    for item_pair, count in item_pairs:
        pair_counts[item_pair] += count

    return pair_counts

transactions = [
    "bread milk",
    "bread butter",
    "milk butter",
    "bread milk butter",
    "bread",
    "milk"
]

# Map step
mapped_data = []
for transaction in transactions:
    mapped_data.extend(mapper(transaction))


# Reduce step
reduced_data = reducer(mapped_data)

# Print all itemset counts
for itemset, count in reduced_data.items():
    print(f"Itemset: {itemset}, Count: {count}")

min_support = 2

frequent_itemsets = {itemset: count for itemset, count in reduced_data.items() if count >= min_support}

for itemset, count in frequent_itemsets.items():
    print(f"Frequent Itemset: {itemset}, Count: {count}")

Output:


PRACTICAL – 6
AIM : Create A Data Pipeline Based On Messaging Using PySpark And Hive
- Covid-19 Analysis.

Step 1: Data Ingestion


First, gather the COVID-19 data from various sources like APIs, CSV files, or databases.
Code:
from pyspark.sql import SparkSession

# Initialize Spark session
spark = SparkSession.builder \
    .appName("COVID-19 Analysis") \
    .enableHiveSupport() \
    .getOrCreate()

# Read data from a CSV file
covid_data = spark.read.csv("path/to/covid_data.csv", header=True, inferSchema=True)

Step 2: Data Processing


Process the data to clean and transform it for analysis.
Code:
from pyspark.sql.functions import col

# Select relevant columns and clean data
covid_data_cleaned = covid_data.select(
    col("date"),
    col("state"),
    col("confirmed_cases"),
    col("deaths"),
    col("recovered")
).filter(col("confirmed_cases").isNotNull())

Output:


Step 3: Data Storage in Hive


Store the processed data in a Hive table for querying.
Code:
# Save data to a Hive table

covid_data_cleaned.write.mode("overwrite").saveAsTable("covid_analysis.covid_data")

# Verify data is stored in Hive
spark.sql("SELECT * FROM covid_analysis.covid_data").show()

Output:

Step 4: Data Analysis


Run queries on the Hive table to perform analysis.
Code:
# Run a query to get total confirmed cases per state
total_cases_per_state = spark.sql("""
SELECT state, SUM(confirmed_cases) as total_cases
FROM covid_analysis.covid_data
GROUP BY state
ORDER BY total_cases DESC
""")
total_cases_per_state.show()

Output:


Step 5: Messaging and Notification


Set up a messaging system to notify users about significant data insights.

Code:

import smtplib
from email.mime.text import MIMEText

def send_email(subject, body, to):
    msg = MIMEText(body)
    msg["Subject"] = subject
    msg["From"] = "sender@example.com"  # placeholder sender address
    msg["To"] = to

    # Send email
    with smtplib.SMTP("smtp.example.com") as server:
        server.login("sender@example.com", "password")
        server.sendmail("sender@example.com", to, msg.as_string())

# Example usage
send_email("COVID-19 Update", "Total confirmed cases have increased.", "recipient@example.com")

Output:

An email will be sent to the recipient with the subject "COVID-19 Update" and
body "Total confirmed cases have increased."


PRACTICAL – 7
AIM : Case Study: Stage 1: Selection of case study topics and formation of
small working groups of 2–3 students per group. Students engage with the
cases, read through background material provided in the session and work
through an initial set of questions to deepen the understanding of the case.
Sample applications and data will be provided to help students familiarize
themselves with the cases and available (big) data.

Stage 2: The groups are given a specific task relevant to the case in question
and are expected to develop a corresponding big data concept using the
knowledge gained in the course and the parameters set by the case study
scenario. A set of questions that help guide through the scenarios will be
provided.

Stage 3: Each group prepares a short 2 – 5 page report on their results and
a 10 min oral presentation of their big data concept.

Case Study on Amazon


1. Introduction

Amazon is a multinational technology company that was founded in 1994 by Jeff Bezos in
Seattle, Washington. Initially conceived as an online bookstore, Amazon has since expanded
into a variety of other e-commerce categories, including electronics, apparel, groceries, and
digital services like cloud computing (AWS), streaming (Amazon Prime Video), and artificial
intelligence.

Amazon's meteoric rise can be attributed to its pioneering approach to online shopping, focus
on customer-centric services, vast product offerings, innovation in logistics, and continuous
diversification. Today, Amazon is one of the world’s most valuable companies and a dominant
force in both the e-commerce and technology sectors.

2. Key Business Segments

Amazon operates across a variety of business segments, with its revenue and profits driven by
the following key areas:


a. E-commerce Retail

Amazon Marketplace: This is Amazon’s core business, where it allows third-party sellers to
list products alongside its own inventory. This segment includes categories like books,
electronics, clothing, toys, and more.

Amazon Prime: A subscription service that offers free shipping, access to streaming media,
and other benefits. It is a significant driver of customer loyalty and recurring revenue.

Amazon Fresh & Whole Foods: With acquisitions like Whole Foods and the introduction of
Amazon Fresh, Amazon is now a major player in the grocery retail industry.

b. Amazon Web Services (AWS)

Cloud Computing: AWS is the largest cloud computing provider globally, offering services
such as computing power, storage, and databases to businesses. AWS is a critical part of
Amazon’s profitability, contributing a significant portion of its total operating income.

c. Digital Streaming

Amazon Prime Video: Competing with services like Netflix and Disney+, Prime Video offers
a range of original content and licensed films and TV shows. This has helped Amazon penetrate
the entertainment and media sector.

d. Amazon Devices & AI

Alexa & Echo Devices: Amazon’s entry into AI and smart home technology with its Alexa
voice assistant and Echo devices has been a major success. Alexa enables users to control smart
devices, stream music, and access services.

Kindle: Amazon revolutionized digital reading with the Kindle e-reader, which has become
synonymous with e-books.

3. Business Model

Amazon’s business model is primarily based on the following strategies:

a. Customer-Centricity

Amazon has built its business around the philosophy of being "Earth’s most customer-centric
company." It consistently prioritizes customer experience through fast delivery, easy returns,
competitive pricing, and personalized recommendations.

b. Diversification

Amazon's continuous diversification into new industries—cloud computing, entertainment, grocery retail, AI, and logistics—has reduced its reliance on any single market and created a robust revenue model with multiple income streams.


c. Economies of Scale & Logistics

Amazon operates a vast network of fulfillment centers, warehouses, and delivery systems that
allow it to achieve economies of scale. This gives Amazon a competitive advantage in both
product availability and delivery speed, with options like same-day or two-day delivery for
Prime members.

d. Data-Driven Decisions

Amazon uses vast amounts of data to inform its business decisions. Customer browsing
behavior, purchase patterns, and search trends help Amazon optimize its product
recommendations, pricing strategy, and inventory management. This data also powers Alexa
and other AI-driven products.

e. Subscription Revenue

Through Amazon Prime, the company has built a substantial recurring revenue stream. With
benefits extending beyond shipping (e.g., streaming, exclusive deals), Prime has become a
powerful customer retention tool.

4. Challenges Faced by Amazon

a. Competition

Amazon faces competition from both traditional brick-and-mortar retailers (like Walmart and
Target) and online-only rivals (like eBay, Alibaba, and other specialized e-commerce
platforms). Additionally, AWS competes with Microsoft Azure, Google Cloud, and other cloud
providers.

b. Regulation and Antitrust Scrutiny

As Amazon’s dominance continues to grow, it has faced increased scrutiny from regulators,
particularly concerning issues like data privacy, market dominance, tax practices, and labor
rights. The company has been subject to antitrust investigations in several countries.

c. Profitability in Retail Business

While Amazon’s retail business is a significant revenue driver, it often operates on thin profit
margins. The company frequently reinvests its profits into expanding its infrastructure,
logistics, and new services, which can limit overall profitability in the short term.

d. Labor and Ethical Concerns

Amazon has faced criticism over its treatment of workers, including reports of high turnover
rates, safety concerns in warehouses, and issues regarding wages and benefits for fulfillment
center employees. The company has also faced accusations of undercutting small businesses
and squeezing suppliers with low-cost demands.


5. Strategic Initiatives and Innovation

a. Amazon Go and Automation

Amazon has ventured into physical retail with Amazon Go stores, which use sensors and AI to
allow customers to shop without checkout lines. This aligns with the company’s focus on
streamlining operations through automation and technology.

b. Acquisitions and Partnerships

Over the years, Amazon has acquired several companies to expand its reach and capabilities.
Key acquisitions include Whole Foods (grocery retail), Ring (smart security), Zoox
(autonomous driving), and PillPack (online pharmacy). These acquisitions reflect Amazon’s
strategy of entering and transforming different industries.

c. Sustainability and Green Initiatives

Amazon has made significant strides toward sustainability, pledging to reach net-zero carbon
by 2040. The company has invested in renewable energy, electric delivery vehicles, and
sustainable packaging to reduce its environmental footprint.

6. Financial Performance

Revenue Growth: Amazon has shown impressive revenue growth over the years, driven by
its diversified business model. For instance, its annual revenue exceeded $500 billion in 2023.

Profit Margins: While Amazon's retail business operates on low margins, AWS delivers high profit margins, making it a key driver of the company's overall profitability.

Stock Performance: Amazon's stock has performed exceptionally well since its IPO in 1997,
with the company now being one of the most valuable in the world.

7. Future Outlook

Amazon's future growth is likely to continue through:

Global Expansion: Amazon is expanding its e-commerce footprint in international markets like India and Europe.

Technological Innovations: With advancements in AI, machine learning, and robotics, Amazon is well-positioned to continue leading in logistics, customer experience, and product offerings.

Amazon Prime: The continued growth of Amazon Prime will likely drive recurring revenue, customer loyalty, and data insights.


Sustainability: Given the increasing focus on climate change and environmental responsibility, Amazon's green initiatives will play a crucial role in its long-term brand positioning and regulatory standing.

8. Conclusion

Amazon has evolved from a small online bookstore to one of the most influential companies
in the world. Its commitment to customer-centricity, continuous innovation, and strategic
diversification has allowed it to dominate various industries. However, as it expands into new
territories, Amazon will need to address challenges related to competition, regulation, and labor
practices. The company's ability to adapt and innovate in these areas will be key to its future
success.

