CCS368 Stream Processing Record
NAME :
REGISTER NUMBER :
YEAR/SEMESTER : III/VI
Certified to be the bonafide record of work done by Ms.________________________
Register Number ____________________________ of III Year VI Semester,
B.TECH Artificial Intelligence and Data Science course in the practical CCS368 –
Stream Processing.
Submitted for the UNIVERSITY PRACTICAL EXAMINATION held at Arunachala College of
Engineering for Women on …………………
INDEX
1. INSTALL MONGODB
2. DESIGN AND IMPLEMENT A SIMPLE APPLICATION IN MONGODB
3. QUERY THE DESIGNED SYSTEM USING MONGODB
4. CREATE AN EVENT STREAM WITH APACHE KAFKA
5. CREATE A REAL-TIME STREAM PROCESSING APPLICATION USING SPARK STREAMING
6. BUILD A MICRO-BATCH APPLICATION
7. REAL-TIME FRAUD AND ANOMALY DETECTION
8. REAL-TIME PERSONALIZATION, MARKETING AND ADVERTISING
Exp.no: 1
Date:
INSTALL MONGODB
AIM:
To install MongoDB and verify the installation.

PROCEDURE:
To install MongoDB, begin by visiting the official MongoDB website and navigate to the 'Downloads' section.
Select the appropriate version of MongoDB for your operating system (Windows, macOS, or Linux) and
download the installer package. Once the download is complete, follow the installation instructions provided
by MongoDB. For most operating systems, this involves running the installer package and following the
prompts in the installation wizard. MongoDB is a leading NoSQL database solution renowned for its flexibility,
scalability, and ease of use. It employs a document-oriented data model, storing data in JSON-like documents,
which allows for seamless integration with modern development practices. With its distributed architecture,
MongoDB excels in handling large volumes of data and high throughput applications. Its powerful querying
capabilities, including support for complex aggregations and secondary indexes, make it suitable for a wide
range of use cases, from content management to real-time analytics.
Step 1: Visit the official MongoDB website using any web browser.
Step 2: Click on Download; a new webpage will open with different installers of MongoDB.
Step 3: Downloading of the executable file will start shortly. It is a 64-bit file and may take some time.
Step 4: Review the terms and conditions, select "I agree", and click Next.
Step 5: When prompted for confirmation to make changes to your system, click Next.
Step 8: Next, open the command prompt and type mongod --version. The installed version will be displayed.
Step 9: The next step is setting the path. Open Windows search and type "environment variables".
Step 10: Choose the Path variable, click New, paste the path to the MongoDB bin directory, and click OK. MongoDB
is now installed successfully.
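As an optional check, a short pymongo script (a sketch, assuming the server is running on the default port 27017) can confirm that the installation works:

from pymongo import MongoClient

# Connect to the local MongoDB server (default port 27017)
client = MongoClient('mongodb://localhost:27017/', serverSelectionTimeoutMS=2000)

# The 'ping' command succeeds only if the server is reachable
client.admin.command('ping')
print('MongoDB is up, server version:', client.server_info()['version'])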
RESULT:
Thus, MongoDB has been installed and the protocols have been verified successfully.
Exp.no: 2
Date:
DESIGN AND IMPLEMENT A SIMPLE APPLICATION IN MONGODB
AIM:
To design and implement a simple application using MongoDB.
PROCEDURE:
Let's say we're building a web-based task management application called "Taskify."
Data Model:
Each task will have fields such as title, description, dueDate, priority, and status.
We’ll have a collection named tasks to store task documents, each representing a single task.
User Interaction:
Users can register, log in, and manage their own tasks through the application interface.
Key Functionalities:
Creating, viewing, updating, and deleting tasks, with options to filter tasks by status, priority, or due date.
Step 3: Design Data Model
1. Users Collection:
Each user in the system will have a unique identifier (_id).
User documents will contain fields like username, email, and password for authentication
purposes.
Optionally, additional fields like fullName or profilePicture can be included.
2. Tasks Collection:
Each task will have a unique identifier (_id).
Task documents will contain fields such as title, description, dueDate, priority, status, and
userId to associate tasks with users.
Optionally, we can include fields like projectId to associate tasks with projects if project
management functionality is implemented.
Use the MongoDB shell or client to create a new database and collections based on your data model.
Write functions or methods to perform CRUD (Create, Read, Update, Delete) operations on your MongoDB
collections. Here's a small algorithm for each operation:
Create: Insert a new task document into the tasks collection (e.g., with insert_one()).
Read: Retrieve task documents matching a query (e.g., with find() or find_one()).
Update: Modify fields of an existing task (e.g., with update_one() and the $set operator).
Delete: Remove a task document (e.g., with delete_one()).
Integrate your application with MongoDB by using a MongoDB driver for your programming language (e.g.,
pymongo for Python).
Test the application thoroughly to ensure that it functions correctly and handles edge cases gracefully.
Deploy the application to a production environment, making sure it's accessible to users.
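A minimal pymongo sketch of these CRUD operations (database name and field values are assumed for illustration, following the Taskify model above):

from pymongo import MongoClient
from datetime import datetime

client = MongoClient('mongodb://localhost:27017/')
db = client['taskify']
tasks = db['tasks']

# Create: insert a new task document
task_id = tasks.insert_one({
    'title': 'Write lab record',
    'description': 'Document the MongoDB experiment',
    'dueDate': datetime(2024, 5, 15),
    'priority': 'High',
    'status': 'Pending'
}).inserted_id

# Read: fetch all pending tasks
for task in tasks.find({'status': 'Pending'}):
    print(task)

# Update: mark the task as completed
tasks.update_one({'_id': task_id}, {'$set': {'status': 'Completed'}})

# Delete: remove the task
tasks.delete_one({'_id': task_id})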
OUTPUT:
RESULT:
Thus, the design and implementation of a simple application in MongoDB is executed and verified successfully.
Exp.no: 3
Date:
QUERY THE DESIGNED SYSTEM USING MONGODB
AIM:
To query the designed system using MongoDB.
ALGORITHM:
1. Connect to the MongoDB database and define the task schema.
2. Build a query object describing the tasks to retrieve.
3. Use the find() method to fetch the matching task documents.
4. Display the results, or report an error if the query fails.
PROGRAM:
const mongoose = require('mongoose');

// Connect to the database (database name assumed)
mongoose.connect('mongodb://localhost:27017/taskify');

// Define the task schema
const taskSchema = new mongoose.Schema({
  title: String,
  description: String,
  dueDate: Date,
  priority: String,
  status: String,
  userId: mongoose.Schema.Types.ObjectId
});
const Task = mongoose.model('Task', taskSchema);

// Query: all pending high-priority tasks
const query = { status: 'Pending', priority: 'High' };

Use the ‘find()’ method to retrieve tasks that match our criteria.

Task.find(query)
  .then(tasks => {
    console.log(tasks);
  })
  .catch(error => {
    console.error(error);
  });
OUTPUT:
[
  {
    _id: 607c87122bb8541b90831a68,
    priority: 'High',
    status: 'Pending',
    userId: 607c86fd2bb8541b90831a67
  },
  {
    _id: 607c872f2bb8541b90831a69,
    dueDate: 2024-05-15T00:00:00.000Z,
    priority: 'High',
    status: 'Pending',
    userId: 607c86fd2bb8541b90831a67
  }
]
RESULT:
Thus, the experiment to query the designed system using MongoDB is executed and verified successfully.
Exp.no: 4
Date:
CREATE AN EVENT STREAM WITH APACHE KAFKA
AIM:
To create an event stream with Apache Kafka.
PROCEDURE:
Download and install Apache Kafka from the official website: https://kafka.apache.org/downloads
Follow the installation instructions provided in the documentation for your operating system.
Kafka depends on ZooKeeper for coordination. Start ZooKeeper by running the following command in the Kafka
installation directory:
bin/zookeeper-server-start.sh config/zookeeper.properties
Start the Kafka server by running the following command in the Kafka installation directory:
bin/kafka-server-start.sh config/server.properties
Create a Kafka topic to represent your event stream. Topics are used to categorize events. Run the following
command to create a topic named "events":
bin/kafka-topics.sh --create --topic events --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
Write a Kafka producer application to publish events to the "events" topic. Here's an example using Python and
the confluent_kafka library:

from confluent_kafka import Producer

# Delivery callback: reports whether each event reached the broker
def delivery_report(err, msg):
    if err:
        print('Delivery failed:', err)
    else:
        print('Event delivered to', msg.topic())

p = Producer({'bootstrap.servers': 'localhost:9092'})

# Produce events
for i in range(10):
    p.produce('events', f'event {i}'.encode('utf-8'), callback=delivery_report)

p.flush()
Write a Kafka consumer application to subscribe to the "events" topic and process the events. Here's an example
using Python and the confluent_kafka library:

from confluent_kafka import Consumer, KafkaError, KafkaException

c = Consumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'my_consumer_group',
    'auto.offset.reset': 'earliest'
})
c.subscribe(['events'])

try:
    while True:
        msg = c.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            if msg.error().code() == KafkaError._PARTITION_EOF:
                # End of partition
                continue
            elif msg.error():
                raise KafkaException(msg.error())
        else:
            print('Received event:', msg.value().decode('utf-8'))
except KeyboardInterrupt:
    pass
finally:
    c.close()
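As a quick check, Kafka's bundled console consumer can also read back the published events (run from the Kafka installation directory):

bin/kafka-console-consumer.sh --topic events --from-beginning --bootstrap-server localhost:9092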
OUTPUT:
RESULT:
Thus, the experiment to create an event stream with Apache Kafka is executed and verified successfully.
Ex No:5
Date:
Real-Time Stream Processing Application using Spark Streaming
Aim:
To create a Real-Time Stream processing application using Spark Streaming.
Procedure:
Setup Apache Spark:
Ensure you have Apache Spark installed and configured on your system. You can download it from the
official Apache Spark website and follow the installation instructions provided there.
Choose a Data Source:
Determine the source of your streaming data. Common sources include Apache Kafka, Apache Flume,
Kinesis, TCP sockets, or even files in a directory that are continuously updated.
Initialize Spark:
In your Python script, import the necessary modules from PySpark and initialize a SparkContext and
StreamingContext.
Create DStream:
Create a DStream that connects to the chosen source, for example with ssc.socketTextStream() for a TCP socket.
Process the Data:
Apply transformations and actions to the DStream to process the data. This could include operations
like flatMap, map, reduceByKey, etc.
Output the Results:
Decide what to do with the processed data. You can print it to the console, save it to a file, push it to
another system, or perform further analysis.
Program:

# --- DStream version ---
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

# Create a local StreamingContext with two working threads and batch interval of 1 second
sc = SparkContext("local[2]", "NetworkWordCount")
ssc = StreamingContext(sc, 1)

# Read lines from a TCP socket and count words
lines = ssc.socketTextStream("localhost", 9999)
words = lines.flatMap(lambda line: line.split(" "))
word_pairs = words.map(lambda word: (word, 1))
word_counts = word_pairs.reduceByKey(lambda x, y: x + y)
word_counts.pprint()

ssc.start()
ssc.awaitTermination()

# --- Structured Streaming version ---
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession \
    .builder \
    .appName("StructuredNetworkWordCount") \
    .getOrCreate()

lines = spark \
    .readStream \
    .format("socket") \
    .option("host", "localhost") \
    .option("port", 9999) \
    .load()

words = lines.select(
    explode(
        split(lines.value, " ")
    ).alias("word")
)

wordCounts = words.groupBy("word").count()

# Start running the query that prints the running counts to the console
query = wordCounts \
    .writeStream \
    .outputMode("complete") \
    .format("console") \
    .start()

query.awaitTermination()
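To try either version, start a text source with netcat in one terminal and submit the script (file name assumed) with spark-submit in another:

nc -lk 9999
spark-submit network_wordcount.py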
Output:
Result:
Thus, we have successfully created a Real-time Stream processing application using Spark Streaming.
Ex No:6
Date:
Build a Micro-Batch Application for the Telephone System
Aim:
To build a micro-batch application that processes telephone call records in batches.
Procedure:
Select the appropriate technologies for your application based on your requirements and preferences.
For example, you might choose Python with libraries like SQLAlchemy for database access and Pandas
for data manipulation, or Java with Spring Batch framework.
Ensure that you have access to the data source where your telephone call records are stored.
This could be a relational database, a NoSQL database, or any other data storage system.
Design a data model that represents the structure of your telephone call records.
Write code to connect to the data source and retrieve call records in batches.
Process each batch by applying the required business logic.
This could involve calculations, aggregations, filtering, or any other data manipulation operations.
Write the processed results to a destination.
This could be a database table, a file, a message queue, or any other suitable output method.
Program:
from sqlalchemy import create_engine, select
from sqlalchemy.orm import sessionmaker
from datetime import datetime, timedelta
from collections import defaultdict

# CallRecord is assumed to be an ORM-mapped class with caller_number,
# call_start_time and call_duration columns
from models import CallRecord

engine = create_engine('mysql://username:password@localhost/telephone_system')
Session = sessionmaker(bind=engine)
session = Session()

batch_size = 1000
offset = 0
user_call_duration = defaultdict(lambda: defaultdict(timedelta))

# Fetch and process the call records batch by batch
while True:
    records = session.execute(select(CallRecord).offset(offset).limit(batch_size)).scalars().all()
    if not records:
        break
    for record in records:
        # Calculate total call duration per user per day
        user_call_duration[record.caller_number][record.call_start_time.date()] += record.call_duration
    offset += batch_size
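A short follow-up loop (sketch) can then report the aggregated totals:

# Print the total call duration per user per day
for user, per_day in user_call_duration.items():
    for day, total in per_day.items():
        print(user, day, total)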
Output:
Result:
Thus, we have successfully built a micro-batch application for the telephone system.
Ex No:7
Date:
Real-Time Fraud and Anomaly Detection
Aim:
To build a real-time fraud and anomaly detection system using MongoDB.
Procedure:
Database Setup:
Install MongoDB: Download and install MongoDB from the official website
(https://www.mongodb.com/try/download/community).
Start MongoDB: Start the MongoDB service using the appropriate command for your operating system.
Access MongoDB Shell: Access the MongoDB shell to create a database and collection(s) for storing
transaction data.
Data Ingestion:
Establish a data pipeline to ingest real-time transaction data into MongoDB. This can be done using
various methods such as MongoDB Change Streams, messaging queues (e.g., Kafka), or directly through
API integration with transaction systems.
Ensure that each transaction record includes relevant information such as timestamp, transaction amount,
user ID, transaction type, etc.
Write scripts or applications to continuously insert incoming transaction data into the MongoDB
collection.
Real-time Processing:
Implement real-time processing logic to analyze incoming transactions for anomalies and fraudulent
patterns.
Use MongoDB Aggregation Pipeline to perform real-time aggregation, filtering, and analysis of
transaction data.
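For instance, a minimal aggregation pipeline (a sketch; field names assumed from the transaction schema above) can compute per-user totals over the last hour:

from datetime import datetime, timedelta
from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')
db = client['fraud_detection']

# Total amount and number of transactions per user over the last hour
one_hour_ago = datetime.utcnow() - timedelta(hours=1)
pipeline = [
    {'$match': {'timestamp': {'$gte': one_hour_ago}}},
    {'$group': {'_id': '$user_id',
                'total_amount': {'$sum': '$amount'},
                'count': {'$sum': 1}}}
]
for doc in db['transactions'].aggregate(pipeline):
    print(doc)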
Anomaly Detection:
Develop algorithms or rules for detecting anomalies based on transaction attributes, historical patterns,
user behavior, etc.
Define thresholds or rules for identifying suspicious transactions, such as unusually large amounts,
frequent transactions within a short time, transactions from unusual locations, etc.
24
Program:
from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient('mongodb://localhost:27017/')
db = client['fraud_detection']
transactions_collection = db['transactions']

# Record an incoming transaction
def record_transaction(transaction):
    transactions_collection.insert_one(transaction)

# Scan the stored transactions for anomalies
def detect_anomalies():
    # For demonstration purposes, let's assume any transaction amount above $1000 is considered an anomaly
    for transaction in transactions_collection.find():
        if transaction['amount'] > 1000:
            print('Anomaly detected:', transaction)

# Record some sample transactions (illustrative values)
transactions = [
    {'user_id': 1, 'amount': 250},
    {'user_id': 2, 'amount': 5000},
]
for transaction in transactions:
    record_transaction(transaction)

# Detect anomalies
detect_anomalies()
Output:
Result:
Thus, we have successfully built a real-time fraud and anomaly detection system.
Ex No:8
Date:
Real-Time Personalization, Marketing and Advertising
Aim:
To build a real-time personalization, marketing and advertising application using MongoDB.
Procedure:
Design your MongoDB schema to efficiently store and retrieve this data. Consider using collections for
users, products, campaigns, and events.
Use MongoDB Change Streams to listen for changes in relevant collections. Change Streams allow you
to subscribe to real-time data changes in the database (see the sketch after this procedure).
Set up triggers to react to changes in user behavior, product updates, or campaign statuses. For example,
when a user makes a purchase, update their profile and trigger relevant marketing actions.
Use MongoDB to store marketing campaign data, such as email templates, audience segments, and
campaign performance metrics.
Integrate with advertising platforms like Google Ads or Facebook Ads to create targeted advertising
campaigns.
Use MongoDB to store ad creative assets, targeting criteria, and campaign performance data.
Optimize your MongoDB queries and indexes for performance, especially for real-time analytics and
personalization.
Use MongoDB's built-in tools or third-party monitoring solutions to monitor database metrics, query
performance, and resource utilization.
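A minimal pymongo Change Stream sketch (collection names follow the model in the program below; Change Streams require MongoDB to run as a replica set):

from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')
db = client['marketing']

# Watch the events collection for new inserts
with db['events'].watch([{'$match': {'operationType': 'insert'}}]) as stream:
    for change in stream:
        print('New event:', change['fullDocument'])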
Program:
from pymongo import MongoClient
import datetime
import time

# Connect to MongoDB
client = MongoClient('mongodb://localhost:27017/')
db = client['marketing']

# Define collections
users = db['users']
products = db['products']
campaigns = db['campaigns']
events = db['events']

# Record a user event
def record_event(user_id, event_type, event_data):
    event = {
        'user_id': user_id,
        'type': event_type,
        'data': event_data,
        'timestamp': datetime.datetime.utcnow()
    }
    events.insert_one(event)

def get_personalized_recommendations(user_id):
    # For simplicity, let's just return some random products for now
    return list(products.find().limit(5))

# Send a promotional email (simulated with a print statement)
def send_promotional_email(user_id, subject, body):
    print(f"Email sent to user {user_id}: Subject - {subject}, Body - {body}")

def simulate_user_activity():
    user_id = 123
    product_id = 456

    # Record a product-view event
    record_event(user_id, 'view_product', {'product_id': product_id})

    # Update user profile with the latest activity time
    users.update_one({'_id': user_id},
                     {'$set': {'last_active': datetime.datetime.utcnow()}},
                     upsert=True)

    recommendations = get_personalized_recommendations(user_id)
    print('Personalized Recommendations:', recommendations)

    send_promotional_email(user_id, 'Check out our latest products!', '...')

# Main program
while True:
    simulate_user_activity()
    time.sleep(10)
Output:
Personalized Recommendations: [{'_id': 1, 'name': 'Product A', 'price': 100}, {'_id': 2, 'name': 'Product B',
'price': 150}, {'_id': 3, 'name': 'Product C', 'price': 200}, {'_id': 4, 'name': 'Product D', 'price': 120}, {'_id': 5,
'name': 'Product E', 'price': 180}]
Email sent to user 123: Subject - Check out our latest products!, Body - ...
Result:
Thus, we have successfully built a real-time personalization, marketing and advertising application using MongoDB.