
LABORATORY RECORD

NAME :

REGISTER NUMBER :

YEAR/SEMESTER : III/VI

DEPARTMENT : ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

SUBJECT : CCS368 STREAM PROCESSING

ACADEMIC YEAR : 2024-2025(EVEN)

Certified to be the bonafide record of work done by Ms.________________________
Register Number ____________________________ of III Year VI Semester,
B.TECH Artificial Intelligence and Data Science course in the practical CCS368 –

STREAM PROCESSING Laboratory during the academic year 2024-2025.

Faculty in-charge Head of the Department

Submitted for the UNIVERSITY PRACTICAL EXAMINATION held at Arunachala College of Engineering
for Women on …………………

Internal Examiner External Examiner

INDEX

S.NO.   DATE   EXPERIMENT NAME                                                   PAGE NO.   SIGNATURE

1.             INSTALL MONGODB

2.             DESIGN AND IMPLEMENT SIMPLE APPLICATION USING MONGODB

3.             QUERY THE DESIGNED SYSTEM USING MONGODB

4.             CREATE AN EVENT STREAM WITH APACHE KAFKA

5.             CREATE A REAL-TIME STREAM PROCESSING APPLICATION USING SPARK

6.             BUILD A MICRO-BATCH APPLICATION

7.             REAL-TIME FRAUD AND ANOMALY DETECTION

8.             REAL-TIME PERSONALIZATION, MARKETING AND ADVERTISING

Exp.no: 1

Date:

INSTALL MONGODB

AIM:

To install MongoDB and explore the various protocols.

INSTALL MONGODB

To install MongoDB, begin by visiting the official MongoDB website and navigating to the 'Downloads' section.
Select the appropriate version of MongoDB for your operating system (Windows, macOS, or Linux) and
download the installer package. Once the download is complete, follow the installation instructions provided
by MongoDB. For most operating systems, this involves running the installer package and following the
prompts in the installation wizard. MongoDB is a leading NoSQL database solution renowned for its flexibility,
scalability, and ease of use. It employs a document-oriented data model, storing data in JSON-like documents,
which allows for seamless integration with modern development practices. With its distributed architecture,
MongoDB excels in handling large volumes of data and high throughput applications. Its powerful querying
capabilities, including support for complex aggregations and secondary indexes, make it suitable for a wide
range of use cases, from content management to real-time analytics.

Installing MongoDB on Windows:

Follow the steps below to install MongoDB on Windows:

Step 1: Visit the official MongoDB website using any web browser.

Step 2: Click on Download; a new webpage will open with the different MongoDB installers.

Step 3: The download of the executable file will start shortly. It is a 64-bit installer and may take some time.

Step 4: Review the terms and conditions, select "I agree", and click Next.

Step 5: A prompt will ask for confirmation to make changes to your system. Click Next.

Step 6: The setup screen will appear; click Install.

Step 7: The next screen shows the installation progress.

Step 8: Next, open a command prompt and type mongod --version. The installed version will be displayed.

Step 9: The next step is setting the path. Open Windows search and look for "Edit the system environment variables".

Step 10: Choose the Path variable, click New, paste the path to the MongoDB bin directory, and click OK. MongoDB is now installed
successfully.
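Once the path is set, the installation can also be verified from code. The following is a minimal connectivity check using the pymongo driver (an optional Python package, not part of the installer); it assumes the MongoDB server is running on the default port 27017.

# Minimal connectivity check with pymongo (install with: pip install pymongo)
from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/', serverSelectionTimeoutMS=3000)

# The 'ping' admin command raises an exception if the server is unreachable
client.admin.command('ping')
print("MongoDB server version:", client.server_info()["version"])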

RESULT:

Thus, MongoDB has been installed and the protocols have been verified successfully.

Exp.no: 2

Date:

DESIGN AND IMPLEMENT SIMPLE APPLICATION USING MONGODB

AIM:

To design and implement a simple application using MongoDB.

PROCEDURE:

Step 1: Application Requirements

Let's say we're building a web-based task management application called "Taskify."

Data Model:

 Each task will have fields such as title, description, dueDate, priority, and status.
 We’ll have a collection named tasks to store task documents, each representing a single task.

User Interaction:

 Users will log in to the Taskify website.


 Upon logging in, they’ll see a dashboard displaying a list of their tasks.
 From the dashboard, users can add new tasks, view task details, update task information, mark tasks
as completed, and delete tasks.
 Users can filter tasks based on status (pending/completed) or priority level using dropdown filters.
 Users can search for tasks by entering keywords in a search bar.
 Users can create projects and categorize tasks under each project.

Key Functionalities:

 Task Management: Users can perform CRUD operations on tasks.


 Task Filtering: Users can filter tasks based on status and priority.
 Search: Users can search for tasks by title or description.
 Project Management: Users can create projects and organize tasks within them.

Step 2: Set Up MongoDB

1. Install MongoDB on your system if you haven't already.


2. Start the MongoDB server.
3. Connect to MongoDB using a MongoDB client or shell.

Step 3: Design Data Model

1. Users Collection:
 Each user in the system will have a unique identifier (_id).
 User documents will contain fields like username, email, and password for authentication
purposes.
 Optionally, additional fields like fullName or profilePicture can be included.

2. Tasks Collection:
 Each task will have a unique identifier (_id).
 Task documents will contain fields such as title, description, dueDate, priority, status, and
userId to associate tasks with users.
 Optionally, we can include fields like projectId to associate tasks with projects if project
management functionality is implemented.

Step 4: Create a New Database and Collections

Use the MongoDB shell or client to create a new database and collections based on your data model.
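As an illustration, the following pymongo sketch creates the database and collections from the data model above by inserting one example user and one example task. The field values are made up for illustration; MongoDB creates the database and its collections lazily on the first insert.

from datetime import datetime
from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')
db = client['taskify']                 # database is created on first insert

# Insert an example user document
user_id = db.users.insert_one({
    "username": "asha",
    "email": "asha@example.com",
    "password": "<hashed-password>"
}).inserted_id

# Insert an example task document linked to that user
db.tasks.insert_one({
    "title": "Prepare lab record",
    "description": "Complete experiments 1-8",
    "dueDate": datetime(2025, 3, 31),
    "priority": "High",
    "status": "Pending",
    "userId": user_id
})

print(db.list_collection_names())      # ['users', 'tasks']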

Step 5: Implement CRUD Operations

Write functions or methods to perform CRUD (Create, Read, Update, Delete) operations on your MongoDB
collections. Here's a small algorithm for each operation:

Create: Build a task document from the user's input and insert it into the tasks collection.

Read: Query the tasks collection by user, status, priority, or keyword and return the matching documents.

Update: Locate a task by its _id and apply the changed fields (for example, set status to completed).

Delete: Locate a task by its _id and remove it from the collection.
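A minimal pymongo sketch of these four operations on the tasks collection (continuing the db object created in Step 4) might look like this:

# CRUD helpers for the tasks collection (illustrative sketch)
def create_task(db, task):
    return db.tasks.insert_one(task).inserted_id

def read_tasks(db, filter_query=None):
    return list(db.tasks.find(filter_query or {}))

def update_task(db, task_id, changes):
    return db.tasks.update_one({"_id": task_id}, {"$set": changes}).modified_count

def delete_task(db, task_id):
    return db.tasks.delete_one({"_id": task_id}).deleted_count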

Step 6: Connect application to MongoDB

Integrate your application with MongoDB by using a MongoDB driver for your programming language (e.g.,
pymongo for Python).

Step 7: Implement Application Logic


Write the logic of the application using the CRUD operations defined earlier. Handle user input, perform data
validation, and execute database operations.

Step 8: Test the Application

Test the application thoroughly to ensure that it functions correctly and handles edge cases gracefully.

Step 9: Deploy the Application

Deploy the application to a production environment, making sure it's accessible to users.

OUTPUT:

RESULT:

Thus, the design and implementation of a simple application using MongoDB is executed and verified successfully.

Exp.no: 3

Date:

QUERY THE DESIGNED SYSTEM USING MONGODB

AIM:

To query the designed system using MongoDB.

ALGORITHM:

Step 1: Connect to MongoDB

Step 2: Choose a Collection

Step 3: Choose a Query Method

Step 4: Construct Query Parameters

Step 5: Execute the Query

PROGRAM:

Step 1: Connect to MongoDB

const mongoose = require('mongoose');

mongoose.connect('mongodb://localhost:27017/taskify', { useNewUrlParser: true, useUnifiedTopology: true })
  .then(() => console.log('Connected to MongoDB'))
  .catch(error => console.error('Error connecting to MongoDB:', error));

Step 2: Choose a Collection

const Task = mongoose.model('Task', {
  title: String,
  description: String,
  dueDate: Date,
  priority: String,
  status: String,
  userId: mongoose.Schema.Types.ObjectId
});

Step 3: Choose a Query Method

Use the ‘find()’ method to retrieve tasks that match our criteria.

Step 4: Construct Query Parameters

const query = { priority: 'High' };

Step 5: Execute the Query

Task.find(query)
  .then(tasks => {
    // Handle the results
    console.log('Tasks with priority High:', tasks);
  })
  .catch(error => {
    console.error('Error querying tasks:', error);
  });

OUTPUT:

Tasks with priority High: [
  {
    _id: 607c87122bb8541b90831a68,
    title: 'Complete project proposal',
    description: 'Write a detailed proposal for the upcoming project.',
    dueDate: 2024-05-10T00:00:00.000Z,
    priority: 'High',
    status: 'Pending',
    userId: 607c86fd2bb8541b90831a67
  },
  {
    _id: 607c872f2bb8541b90831a69,
    title: 'Review code changes',
    description: 'Review and provide feedback on the latest code changes.',
    dueDate: 2024-05-15T00:00:00.000Z,
    priority: 'High',
    status: 'Pending',
    userId: 607c86fd2bb8541b90831a67
  }
]

RESULT:

Thus, the experiment Query the designed system using MongoDB is executed and verified successfully.

Exp.no: 4

Date:

CREATE AN EVENT STREAM WITH APACHE KAFKA

AIM:

To create an event stream with Apache Kafka.

PROCEDURE:

Step 1 : Install Apache Kafka

Download and install Apache Kafka from the official website: https://kafka.apache.org/downloads
Follow the installation instructions provided in the documentation for your operating system.

Step 2: Start ZooKeeper

Kafka depends on ZooKeeper for coordination. Start ZooKeeper by running the following command in the Kafka
installation directory:

bin/zookeeper-server-start.sh config/zookeeper.properties

Step 3: Start Kafka Server

Start the Kafka server by running the following command in the Kafka installation directory:

bin/kafka-server-start.sh config/server.properties

Step 4: Create a Topic

Create a Kafka topic to represent your event stream. Topics are used to categorize events. Run the following
command to create a topic named "events":

bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic events

Step 5: Produce Events

Write a Kafka producer application to publish events to the "events" topic. Here's an example using Python and
the confluent_kafka library:

from confluent_kafka import Producer

def delivery_callback(err, msg):
    if err:
        print('Message delivery failed:', err)
    else:
        print('Message delivered to', msg.topic())

p = Producer({'bootstrap.servers': 'localhost:9092'})

# Produce events
for i in range(10):
    p.produce('events', f'Event {i}', callback=delivery_callback)

p.flush()

Step 6: Consume Events

Write a Kafka consumer application to subscribe to the "events" topic and process the events. Here's an example
using Python and the confluent_kafka library:

from confluent_kafka import Consumer, KafkaError, KafkaException

c = Consumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'my_consumer_group',
    'auto.offset.reset': 'earliest'
})

c.subscribe(['events'])

try:
    while True:
        msg = c.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            if msg.error().code() == KafkaError._PARTITION_EOF:
                # End of partition
                print('%% %s [%d] reached end at offset %d\n' %
                      (msg.topic(), msg.partition(), msg.offset()))
            else:
                raise KafkaException(msg.error())
        else:
            print('Received message: {}'.format(msg.value().decode('utf-8')))
except KeyboardInterrupt:
    pass
finally:
    # Leave group and commit final offsets
    c.close()

Step 7: Run producer and consumer

 Run the producer application to publish events to the "events" topic.


 Run the consumer application to subscribe to the "events" topic and consume events.
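To verify the stream end to end without writing a consumer, the console consumer script bundled with the Kafka distribution can also be used to read back the published events:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic events --from-beginning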

OUTPUT:

RESULT:

Thus, the experiment to create an event stream with Apache Kafka is executed and verified successfully.

Ex No:5

Date:

Create a Real-time Stream Processing Application using Spark Streaming

Aim:
To create a Real-Time Stream processing application using Spark Streaming.

Procedure:
Setup Apache Spark:

 Ensure you have Apache Spark installed and configured on your system. You can download it from the
official Apache Spark website and follow the installation instructions provided there.

Choose a Streaming Source:

 Determine the source of your streaming data. Common sources include Apache Kafka, Apache Flume,
Kinesis, TCP sockets, or even files in a directory that are continuously updated.

Initialize SparkContext and StreamingContext:

 In your Python script, import the necessary modules from PySpark and initialize a SparkContext and
StreamingContext.

Create DStream:

 Define a DStream (discretized stream) by connecting to the streaming source.

Define Transformations and Actions:

 Apply transformations and actions to the DStream to process the data. This could include operations
like flatMap, map, reduceByKey, etc.

Output the Result:

 Decide what to do with the processed data. You can print it to the console, save it to a file, push it to
another system, or perform further analysis.

Program:

Program 1: DStream-based word count (NetworkWordCount)

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

# Create a local StreamingContext with two working threads and a batch interval of 1 second
sc = SparkContext("local[2]", "NetworkWordCount")
ssc = StreamingContext(sc, 1)

# Create a DStream connected to hostname:port
lines = ssc.socketTextStream("localhost", 9999)

# Split each line into words
words = lines.flatMap(lambda line: line.split(" "))

# Count each word in each batch
word_pairs = words.map(lambda word: (word, 1))
word_counts = word_pairs.reduceByKey(lambda x, y: x + y)

# Print the result to the console
word_counts.pprint()

ssc.start()             # Start the computation
ssc.awaitTermination()  # Wait for the computation to terminate

Program 2: Structured Streaming word count (run as a separate script)

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession \
    .builder \
    .appName("StructuredNetworkWordCount") \
    .getOrCreate()

# Create a streaming DataFrame representing lines read from the socket
lines = spark \
    .readStream \
    .format("socket") \
    .option("host", "localhost") \
    .option("port", 9999) \
    .load()

# Split the lines into words
words = lines.select(
    explode(
        split(lines.value, " ")
    ).alias("word")
)

# Generate the running word count
wordCounts = words.groupBy("word").count()

# Start running the query that prints the running counts to the console
query = wordCounts \
    .writeStream \
    .outputMode("complete") \
    .format("console") \
    .start()

query.awaitTermination()
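Both programs read text from a TCP socket on localhost:9999, so a simple text source must be running before either script is started. One option (assuming the netcat utility is available) is to run the following in a separate terminal and type words into it:

nc -lk 9999

Each program is a separate script and blocks on awaitTermination(), so run them one at a time with spark-submit (for example, spark-submit network_wordcount.py, where the file name is whatever the script was saved as).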
Output:

Result:
Thus, we have successfully created a Real-time Stream processing application using Spark Streaming.

Ex No:6
Date:

Build a Micro-batch application


Aim:
To build a Micro-batch application for a telephone system.

Procedure:

 Clearly outline the requirements for your micro-batch application.

 Select the appropriate technologies for your application based on your requirements and preferences.
For example, you might choose Python with libraries like SQLAlchemy for database access and Pandas
for data manipulation, or Java with the Spring Batch framework.

 Ensure that you have access to the data source where your telephone call records are stored. This
could be a relational database, a NoSQL database, or any other data storage system.

 Design a data model that represents the structure of your telephone call records (a minimal sketch is
given after this list).

 Write code to connect to the data source and retrieve call records in batches.

 Implement logic to process each batch of call records. This could involve calculations, aggregations,
filtering, or any other data manipulation operations.

 Write the calculated statistics to an output destination. This could be a database table, a file, a message
queue, or any other suitable output method.
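The program below assumes a mapped CallRecord class for the call records table. A minimal SQLAlchemy model consistent with the columns the program reads might look like the following sketch (the table name, id column, and column types are assumptions; adjust them to your actual schema):

from sqlalchemy import Column, DateTime, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class CallRecord(Base):
    """Minimal mapped class for telephone call records (illustrative schema)."""
    __tablename__ = 'call_records'   # assumed table name

    id = Column(Integer, primary_key=True)
    caller_number = Column(String(20))
    callee_number = Column(String(20))
    call_start_time = Column(DateTime)
    call_end_time = Column(DateTime)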

Program:
from collections import defaultdict

from sqlalchemy import create_engine, select
from sqlalchemy.orm import sessionmaker

# Define SQLAlchemy engine (requires a MySQL driver such as mysqlclient or PyMySQL)
engine = create_engine('mysql://username:password@localhost/telephone_system')

# Define SQLAlchemy session
Session = sessionmaker(bind=engine)
session = Session()

# Function to process call records in batches
def process_call_records(batch_size):
    # Query call records in batches using OFFSET/LIMIT paging
    offset = 0
    while True:
        # CallRecord is the mapped model defined above; scalars() returns model instances
        records = session.execute(
            select(CallRecord).offset(offset).limit(batch_size)
        ).scalars().all()
        if not records:
            break

        # Process batch of call records
        process_batch(records)
        offset += batch_size

# Function to process a batch of call records
def process_batch(records):
    user_call_duration = defaultdict(lambda: defaultdict(int))

    for record in records:
        caller_number = record.caller_number
        call_start_time = record.call_start_time
        call_end_time = record.call_end_time
        call_duration = (call_end_time - call_start_time).total_seconds()

        # Calculate total call duration per user per day
        user_call_duration[caller_number][call_start_time.date()] += call_duration

    # Write statistics to output (e.g., another table or file)
    write_statistics(user_call_duration)

# Function to write statistics to output
def write_statistics(user_call_duration):
    for user, durations_per_day in user_call_duration.items():
        for date, total_duration in durations_per_day.items():
            print(f"User: {user}, Date: {date}, Total Duration: {total_duration}")

# Call the function to process call records in batches
process_call_records(batch_size=1000)

# Close the session
session.close()

Output:

Result:
Thus, we successfully built a Micro-batch application for the telephone system.

Ex No:7

Date:

Real-time Fraud and Anomaly Detection

Aim:

To write a program for Real-time Fraud and Anomaly Detection.

Procedure:

Database Setup:

 Install MongoDB: Download and install MongoDB from the official website
(https://www.mongodb.com/try/download/community).
 Start MongoDB: Start the MongoDB service using the appropriate command for your operating system.
 Access MongoDB Shell: Access the MongoDB shell to create a database and collection(s) for storing
transaction data.

Data Ingestion:

 Establish a data pipeline to ingest real-time transaction data into MongoDB. This can be done using
various methods such as MongoDB Change Streams, messaging queues (e.g., Kafka), or directly through
API integration with transaction systems.
 Ensure that each transaction record includes relevant information such as timestamp, transaction amount,
user ID, transaction type, etc.
 Write scripts or applications to continuously insert incoming transaction data into the MongoDB
collection.

Real-time Processing:

 Implement real-time processing logic to analyze incoming transactions for anomalies and fraudulent
patterns.
 Use the MongoDB Aggregation Pipeline to perform real-time aggregation, filtering, and analysis of
transaction data (a minimal sketch is given after the Anomaly Detection notes below).

Anomaly Detection:

 Develop algorithms or rules for detecting anomalies based on transaction attributes, historical patterns,
user behavior, etc.
 Define thresholds or rules for identifying suspicious transactions, such as unusually large amounts,
frequent transactions within a short time, transactions from unusual locations, etc.
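As a sketch of the aggregation-based rules described above (the field names userId, timestamp, and amount as well as the thresholds are assumptions for illustration), a pipeline such as the following can flag users with unusually many or unusually large transactions in the last hour:

from datetime import datetime, timedelta
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
transactions = client['fraud_detection']['transactions']

window_start = datetime.now() - timedelta(hours=1)

pipeline = [
    # Keep only transactions from the last hour
    {"$match": {"timestamp": {"$gte": window_start}}},
    # Count transactions and sum amounts per user
    {"$group": {"_id": "$userId",
                "txn_count": {"$sum": 1},
                "total_amount": {"$sum": "$amount"}}},
    # Flag users exceeding simple rule-based thresholds
    {"$match": {"$or": [{"txn_count": {"$gt": 10}},
                        {"total_amount": {"$gt": 5000}}]}},
]

for suspicious in transactions.aggregate(pipeline):
    print("Suspicious activity:", suspicious)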

Program:

from pymongo import MongoClient

from datetime import datetime, timedelta

# Connect to MongoDB
client = MongoClient('localhost', 27017)
db = client['fraud_detection']
transactions_collection = db['transactions']

def record_transaction(transaction):
    """Record a transaction in the MongoDB database."""
    transactions_collection.insert_one(transaction)

def detect_anomalies():
    """Detect anomalies in transactions."""
    # Define time window for detecting anomalies (e.g., last 24 hours)
    window_start = datetime.now() - timedelta(hours=24)

    # Query transactions within the time window
    transactions = transactions_collection.find({"timestamp": {"$gte": window_start}})

    for transaction in transactions:
        # Implement your anomaly detection algorithm here
        # For demonstration purposes, any transaction amount above $1000 is considered an anomaly
        if transaction['amount'] > 1000:
            print("Anomaly detected: ", transaction)

if __name__ == "__main__":
    # Simulate transaction data (replace with your real-time data source)
    transactions = [
        {"timestamp": datetime.now(), "amount": 500},
        {"timestamp": datetime.now() - timedelta(hours=12), "amount": 1500},
        {"timestamp": datetime.now() - timedelta(hours=20), "amount": 700},
        {"timestamp": datetime.now() - timedelta(hours=3), "amount": 1200},
    ]

    # Record transactions in MongoDB
    for transaction in transactions:
        record_transaction(transaction)

    # Detect anomalies
    detect_anomalies()

Output:

Anomaly detected:  {'_id': ObjectId('609dc45cb127f47b9d18d274'), 'timestamp': datetime.datetime(2024, 5, 1, 3, 58, 52, 985747), 'amount': 1500}

Anomaly detected:  {'_id': ObjectId('609dc45cb127f47b9d18d275'), 'timestamp': datetime.datetime(2024, 4, 30, 9, 58, 52, 985805), 'amount': 1200}

Result:
Thus, we have successfully built a Real-Time Fraud and Anomaly Detection application.

Ex No:8
Date:

Real-time Personalization, Marketing and Advertising


Aim:
To write a program for Real-Time personalization, Marketing and Advertising.

Procedure:
 Design your MongoDB schema to efficiently store and retrieve user, product, campaign, and event data.
Consider using collections for users, products, campaigns, and events.

 Use MongoDB Change Streams to listen for changes in relevant collections. Change Streams allow you
to subscribe to real-time data changes in the database (a minimal sketch is given after this list).

 Set up triggers to react to changes in user behavior, product updates, or campaign statuses. For example,
when a user makes a purchase, update their profile and trigger relevant marketing actions.

 Use MongoDB's aggregation framework to perform real-time analytics on user data.

 Use MongoDB to store marketing campaign data, such as email templates, audience segments, and
campaign performance metrics.

 Integrate with advertising platforms like Google Ads or Facebook Ads to create targeted advertising
campaigns.

 Use MongoDB to store ad creative assets, targeting criteria, and campaign performance data.

 Utilize real-time user data to dynamically adjust ad targeting or creative content.

 Optimize your MongoDB queries and indexes for performance, especially for real-time analytics and
personalization.

 Use MongoDB's built-in tools or third-party monitoring solutions to monitor database metrics, query
performance, and resource utilization.
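As a sketch of the Change Stream idea described above (Change Streams require MongoDB to run as a replica set; the collection and field names follow the program below, and the follow-up action is only illustrative):

from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')
db = client['marketing']

# Watch only newly inserted purchase events
pipeline = [{"$match": {"operationType": "insert",
                        "fullDocument.type": "purchase"}}]

with db['events'].watch(pipeline) as stream:
    for change in stream:
        event = change["fullDocument"]
        print("Purchase detected for user", event["user_id"])
        # Trigger a follow-up marketing action, e.g. stamp the user profile
        db['users'].update_one({"_id": event["user_id"]},
                               {"$set": {"last_purchase": event["timestamp"]}})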

Program:

import time
import datetime

from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient('mongodb://localhost:27017/')
db = client['marketing']

# Define collections
users = db['users']
products = db['products']
campaigns = db['campaigns']
events = db['events']

# Function to update user profile
def update_user_profile(user_id, data):
    users.update_one({'_id': user_id}, {'$set': data}, upsert=True)

# Function to track user events
def track_event(user_id, event_type, event_data):
    event = {
        'user_id': user_id,
        'type': event_type,
        'data': event_data,
        'timestamp': datetime.datetime.utcnow()
    }
    events.insert_one(event)

# Function to retrieve personalized recommendations for a user
def get_personalized_recommendations(user_id):
    # Your recommendation algorithm implementation here
    # This can involve querying the user's past behavior, preferences, etc.
    # For simplicity, let's just return some products for now
    return list(products.find().limit(5))

# Function to send a personalized marketing email
def send_personalized_email(user_id, subject, body):
    # Your email sending implementation here
    print(f"Email sent to user {user_id}: Subject - {subject}, Body - {body}")

# Simulate user activity
def simulate_user_activity():
    # Simulate user behavior
    user_id = 123
    product_id = 456
    track_event(user_id, 'view_product', {'product_id': product_id})

    # Update user profile
    update_user_profile(user_id, {'last_activity': datetime.datetime.utcnow()})

    # Get personalized recommendations
    recommendations = get_personalized_recommendations(user_id)
    print("Personalized Recommendations:", recommendations)

    # Send personalized marketing email
    send_personalized_email(user_id, 'Check out our latest products!', '...')

# Main program
if __name__ == "__main__":
    # Simulate user activity every 10 seconds
    while True:
        simulate_user_activity()

        # Sleep for 10 seconds
        time.sleep(10)

Output:

Personalized Recommendations: [{'_id': 1, 'name': 'Product A', 'price': 100}, {'_id': 2, 'name': 'Product B',
'price': 150}, {'_id': 3, 'name': 'Product C', 'price': 200}, {'_id': 4, 'name': 'Product D', 'price': 120}, {'_id': 5,
'name': 'Product E', 'price': 180}]

Email sent to user 123: Subject - Check out our latest products!, Body - ...

Result:

Thus, we successfully built a model for Real-time Personalization, Marketing and Advertising.
