
School of Computer Science and Information Technology

Programme: BCA
2024-2025

SEMESTER: VI

COURSE NAME:
Activity#2: HADOOP

Submitted by
22BCAR0503
SYED ARSALAN

Date of Submission: 31/03/2025


​ ​ ​ ​

Name of Faculty In-Charge:


Dr. K. Suneetha
Professor & Head
CS and IT
​ ​

EVALUATION CRITERIA
Report Submission (15) | Oral Presentation (05) | Viva (05) | Total (25) | Convert 25 into 15 marks

​ ​ ​ ​ ​ DECLARATION

I declare that Activity-2 has been carried out by me following all ethical practices of Jain
(Deemed-to-be-University) for the partial fulfillment of the General Course of BCA

in the year 2024-2025 (6th Semester).

SYED ARSALAN, 22BCAR0503


​ ​ ​ ​ ​

​ 2 | Page
INDEX

Sl. No. Table of Contents Page No.

1 Introduction 4-5

2 Interface and Installation Steps 6-9

3 Basic Commands and Execution 10-16

4 Case Study diagram or workflow where applicable 17-20

5 Advantages and Disadvantages 20-23

6 Conclusion and Summary 24

7 References 25

3 | Page
INTRODUCTION
Big Data refers to extremely large and complex datasets that cannot be efficiently processed using
traditional data management tools. These datasets originate from various sources, including social
media, sensors, financial transactions, healthcare records, and more.

Characteristics of Big Data (5Vs Model)

1. Volume – Large amounts of data generated every second.

2. Velocity – The speed at which data is generated and processed.

3. Variety – Different formats like structured (databases), semi-structured (JSON, XML), and
unstructured (videos, images, text).

4. Veracity – Data accuracy and reliability.

5. Value – The ability to extract useful insights from data.

Challenges of Big Data

1. Storage and Management – Traditional databases struggle to store and manage vast
amounts of data.

2. Processing Speed – Handling real-time or batch processing efficiently.

3. Data Integration – Combining data from multiple sources with different formats.

4. Security and Privacy – Ensuring data protection against cyber threats.

5. Scalability – Systems must scale efficiently as data grows.

What is Hadoop?

Apache Hadoop is an open-source framework designed for storing and processing large datasets in a
distributed computing environment. It enables organizations to handle vast amounts of data
efficiently and cost-effectively.

Key Components of Hadoop

1. Hadoop Distributed File System (HDFS) – A distributed storage system that breaks data into
chunks and stores it across multiple nodes.

2. MapReduce – A processing model that distributes computation tasks across multiple servers.

3. YARN (Yet Another Resource Negotiator) – Manages resources and schedules tasks in the
Hadoop ecosystem.

4. HBase, Hive, Pig, and Spark (Hadoop Ecosystem) – Additional tools for data querying, real-time processing, and analytics.
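To make the MapReduce programming model concrete, the following minimal Python sketch simulates the map, shuffle, and reduce phases of a word count locally. This is only an illustration of the model's data flow, not code that runs on a Hadoop cluster:

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in an input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group values by key, as Hadoop does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts emitted for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data needs big tools", "hadoop handles big data"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
print(counts["big"])   # 3
print(counts["data"])  # 2
```

On a real cluster the map and reduce functions run in parallel on many nodes, but the three-phase structure is the same.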

4 | Page
Why Hadoop?

• Scalability – Handles large-scale data across multiple machines.

• Fault Tolerance – Data replication ensures reliability.

• Cost-Effective – Runs on commodity hardware.

• Flexibility – Supports structured, semi-structured, and unstructured data.

5 | Page
INTERFACE AND INSTALLATION PROCESS

Hadoop does not have a single user-friendly interface like traditional software. Instead, it
provides multiple ways to interact with the system:
1. Command Line Interface (CLI) – Most Hadoop operations are performed using
terminal commands, such as HDFS file management and running MapReduce jobs.

2. Web User Interfaces (Web UI) – Hadoop provides web-based monitoring tools:
o Hadoop ResourceManager UI – Monitors and manages cluster resources.
o HDFS NameNode UI – Tracks file system metadata and block locations.
3. Hadoop Ecosystem Interfaces – Additional tools provide user-friendly interfaces:
o Apache Hive – SQL-like query interface for Hadoop.

o Hue – A web-based UI for Hadoop services.


o Apache Spark UI – Interactive data processing and monitoring tool.
Installing Hadoop on Windows requires additional configurations since Hadoop is designed
to run on Linux. Below is a step-by-step guide to setting up Hadoop on Windows 10/11
(Single Node Cluster).
1. System Requirements
• Operating System: Windows 10/11 (64-bit)
• Java Development Kit (JDK): JDK 8 or later

• Hadoop Version: Latest stable release (e.g., Hadoop 3.3.4)


• RAM: Minimum 8GB recommended
• Storage: At least 50GB free space

2. Install Java JDK


1. Download Java from Oracle JDK or OpenJDK.
2. Install it and set up environment variables:

6 | Page
o Open System Properties → Advanced System Settings → Environment
Variables.
o Under System Variables, create/edit:

▪ JAVA_HOME = C:\Program Files\Java\jdk-8 (or your JDK installation path)
▪ Add %JAVA_HOME%\bin to the Path variable.
3. Verify Java installation:

java -version

3. Download and Extract Hadoop


1. Download Hadoop Binary for Windows from Apache Hadoop Releases.
2. Extract the ZIP file to C:\hadoop (or any preferred location).

4. Install and Configure Hadoop

(A) Configure core-site.xml


1. Navigate to C:\hadoop\etc\hadoop\core-site.xml.
2. Open it in a text editor (Notepad++ or VS Code) and add:
<configuration>
<property>
<name>fs.defaultFS</name>

<value>hdfs://localhost:9000</value>
</property>
</configuration>

7 | Page
(B) Configure hdfs-site.xml
1. Open C:\hadoop\etc\hadoop\hdfs-site.xml.
2. Add the following configuration:

<configuration>
<property>

<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
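As an illustrative sanity check, a Hadoop *-site.xml fragment like the one above can be parsed with Python's standard library to read back a property value. This sketch is not part of the official setup; the helper name is my own:

```python
import xml.etree.ElementTree as ET

CONFIG = """<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>"""

def get_property(xml_text, name):
    """Return the <value> of the named property, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None

print(get_property(CONFIG, "dfs.replication"))  # 1
```

A check like this catches malformed XML or a misspelled property name before the daemons are started, when such mistakes are cheapest to fix.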

(C) Configure Hadoop Environment File


1. Edit C:\hadoop\etc\hadoop\hadoop-env.cmd.

2. Set the Java path:


set JAVA_HOME=C:\Program Files\Java\jdk-8

5. Format the Hadoop Namenode


1. Open Command Prompt as Administrator.
2. Run:


8 | Page
hdfs namenode -format
6. Verify Installation
• Start the Hadoop services (start-dfs.cmd and start-yarn.cmd), then open your browser and check:

o HDFS Web UI: http://localhost:9870/


o YARN Web UI: http://localhost:8088/
• Run the following to check running services:
jps
Expected output:
NameNode
DataNode
ResourceManager
NodeManager

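The expected jps output above can also be checked programmatically. The sketch below is purely illustrative: it parses captured jps text rather than invoking the real command, and the function name is my own:

```python
# Illustrative check of `jps` output: confirms the four Hadoop daemons
# of a single-node cluster are all present in captured text.
REQUIRED = {"NameNode", "DataNode", "ResourceManager", "NodeManager"}

def missing_daemons(jps_output: str) -> set:
    """Return the set of required daemons absent from `jps` output."""
    # Each jps line looks like "<pid> <ProcessName>"; take the name.
    running = {line.split()[-1] for line in jps_output.splitlines() if line.strip()}
    return REQUIRED - running

sample = """4321 NameNode
4510 DataNode
4712 ResourceManager
4899 NodeManager
5021 Jps"""
print(missing_daemons(sample))  # an empty set means all daemons are up
```

If any daemon is missing from the set of running processes, its log file under the Hadoop installation directory is the first place to look.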
9 | Page
COMMANDS AND EXECUTION

[Pages 10–13 contain screenshots of basic HDFS commands and their execution; the key upload steps are reproduced below.]
Steps to Upload a File into HDFS from Local
1. Start Hadoop Services
start-dfs.cmd
start-yarn.cmd
jps

2. Create a Directory in HDFS



14 | Page
hdfs dfs -mkdir /user
hdfs dfs -mkdir /user/yourusername
hdfs dfs -mkdir /user/yourusername/input

3. Upload a File from Local to HDFS



hdfs dfs -put C:\localpath\filename.txt /user/yourusername/input/

4. Verify the File in HDFS


hdfs dfs -ls /user/yourusername/input/
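The upload commands above follow a fixed pattern, so they are easy to assemble from a script. Purely as an illustration (the helper name is hypothetical), the function below builds the same `hdfs dfs -put` argument list from a local path and an HDFS target directory:

```python
def build_put_command(local_path, hdfs_dir):
    # Assemble the argument list for: hdfs dfs -put <local path> <hdfs dir>
    return ["hdfs", "dfs", "-put", local_path, hdfs_dir]

cmd = build_put_command(r"C:\localpath\filename.txt", "/user/yourusername/input/")
print(" ".join(cmd))
```

On a machine with Hadoop installed, such a list could be handed to subprocess.run; passing arguments as a list avoids shell-quoting problems with Windows paths.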

Steps to Upload a Folder into HDFS from Local


1. Start Hadoop Services

start-dfs.cmd
start-yarn.cmd

jps # Verify services


2. Create a Directory in HDFS (If Not Exists)

15 | Page
hdfs dfs -mkdir -p /user/yourusername/input
3. Upload a Folder from Local to HDFS
hdfs dfs -put C:\path\to\local\folder /user/yourusername/input
4. Verify Upload
hdfs dfs -ls /user/yourusername/input

16 | Page
CASE STUDY
Case Study: Hadoop in Healthcare for Patient Data Analysis

1. Introduction

1.1 Overview of Big Data in Healthcare

The healthcare sector generates vast amounts of data daily from electronic
health records (EHRs), medical imaging, wearable devices, and insurance
claims. Managing and analyzing this large volume of data efficiently is crucial
for improving patient care and operational efficiency. Traditional database
systems often struggle to handle such massive and diverse datasets, leading to
delays in decision-making and inefficiencies in healthcare delivery.

1.2 Role of Hadoop in Healthcare

Apache Hadoop, an open-source framework, provides a scalable and cost-effective
solution for handling big data in healthcare. By leveraging Hadoop's
distributed computing model, healthcare institutions can store, process, and
analyze large datasets efficiently. Hadoop enables predictive analytics,
real-time data processing, and machine learning applications that help improve
patient care and reduce costs.

2. Problem Statement
2.1 Challenges in Healthcare Data Management

17 | Page
Hospitals and healthcare institutions face several challenges in managing
patient data:
• Data Volume: Huge amounts of structured and unstructured data from EHRs, medical scans, and IoT devices.
• Data Variety: Different formats, including text, images, and real-time sensor data.
• Processing Speed: Traditional systems struggle to process large datasets quickly.
• Predicting Readmissions: Identifying high-risk patients for early intervention to reduce hospital readmission rates.
2.2 Need for Predictive Analytics
Predicting patient readmission risks is a critical challenge for hospitals.
Readmissions increase healthcare costs and indicate gaps in post-discharge
care. Analyzing historical patient data using Hadoop can help predict which
patients are at high risk of being readmitted and allow proactive intervention.
3. Solution Using Hadoop
3.1 Hadoop-Based Predictive Analytics Model
Hadoop enables efficient storage and processing of vast healthcare datasets.
The process involves:
1. Data Collection: Patient data from EHRs, IoT devices (wearables),
medical imaging, and hospital visit records.
2. Data Storage: Storing structured and unstructured data in Hadoop
Distributed File System (HDFS).
3. Data Processing: Using MapReduce and Apache Spark to clean,
transform, and process data.
4. Machine Learning: Applying predictive analytics to identify patients at
risk of readmission.

18 | Page
5. Visualization & Decision Making: Displaying results in dashboards (e.g.,
Tableau, Power BI) to help healthcare providers make informed
decisions.
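The machine-learning step above (step 4) can be sketched with a toy scoring function. The weights and threshold below are invented for demonstration only and have no clinical meaning; a real pipeline would train a model on historical data in Spark or a similar framework:

```python
# Toy illustration of readmission-risk scoring. Weights and the 0.7
# threshold are made up for demonstration, not clinically derived.
def readmission_risk(age, prior_admissions, chronic_conditions):
    score = 0.01 * age + 0.3 * prior_admissions + 0.25 * chronic_conditions
    return min(score, 1.0)  # cap the score at 1.0

patients = [
    {"id": "P1", "age": 72, "prior_admissions": 3, "chronic_conditions": 2},
    {"id": "P2", "age": 35, "prior_admissions": 0, "chronic_conditions": 0},
]
for p in patients:
    risk = readmission_risk(p["age"], p["prior_admissions"], p["chronic_conditions"])
    flag = "HIGH" if risk >= 0.7 else "low"
    print(p["id"], round(risk, 2), flag)
```

In the Hadoop pipeline described above, a trained model would be applied to the full patient dataset stored in HDFS, and the flagged high-risk patients surfaced in the dashboards of step 5.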

Work Flow:

[The workflow diagram for the Hadoop-based predictive analytics pipeline appears on pages 19–20.]
Advantages of Hadoop
1. Scalability

Hadoop is highly scalable because it distributes data across multiple machines.
As data grows, new nodes can be added easily without significant changes to
the existing infrastructure.
2. Cost-Effective
Since Hadoop is open-source, organizations can use commodity hardware (low-
cost servers) instead of expensive, high-end servers. This makes Hadoop an
affordable solution for big data storage and processing.
3. Fault Tolerance
Hadoop replicates data across multiple nodes. If a node fails, the system
automatically recovers the data from another node, ensuring high availability
and reliability.
4. Fast Data Processing
With its parallel processing capabilities, Hadoop processes large volumes of
data efficiently. MapReduce allows data to be processed in parallel across
multiple nodes, reducing execution time.
5. Flexibility in Data Processing
Hadoop supports structured, semi-structured, and unstructured data,
including text, images, videos, and logs. This makes it ideal for handling diverse
datasets.
6. Wide Adoption and Community Support
Being open-source, Hadoop has a strong developer community, extensive
documentation, and a large number of contributors, making it easy to get
support and updates.

21 | Page
Disadvantages of Hadoop
1. Complexity in Setup and Management
Hadoop requires expertise in Java, Linux, and distributed computing, making it
difficult to install, configure, and manage, especially for beginners.
2. High Latency for Small Data
Hadoop is designed for batch processing and is not ideal for real-time data
analytics. For small datasets, traditional databases perform better with lower
latency.
3. Security Issues
By default, Hadoop lacks built-in security features like authentication and
encryption. It needs external security mechanisms such as Kerberos for secure
access.
4. High Memory and CPU Usage
MapReduce operations require significant computational resources, making
Hadoop inefficient for applications that demand low-latency and high-speed
processing.
5. Inefficiency with Iterative Processing
Hadoop’s MapReduce model is not efficient for iterative machine learning and
real-time data analytics. Frameworks like Apache Spark provide better
performance in such cases.
6. Data Integrity Challenges

22 | Page
Managing large-scale data replication can sometimes lead to data
inconsistencies, requiring additional monitoring and maintenance.

23 | Page
Summary
This report explored the implementation of Hadoop, where we successfully
installed and configured the framework and performed basic commands to
understand its working principles. Hadoop's distributed storage (HDFS) and
MapReduce processing enable efficient handling of large datasets, making it a
powerful tool for big data analytics.
Additionally, a case study on Hadoop in healthcare was conducted, detailing its
workflow and how Hadoop-based predictive analytics enhance patient data
management, readmission prediction, and clinical decision-making. The case
study demonstrated how such systems leverage machine learning and big data
analytics to improve healthcare efficiency while addressing challenges like
data privacy and high computational requirements.
Furthermore, the report covered the advantages and disadvantages of Hadoop,
highlighting its scalability, cost-effectiveness, and fault tolerance, along with
challenges such as complex setup, security issues, and inefficiency in real-time
data processing.

Conclusion
Hadoop remains a fundamental tool in big data analytics, offering a robust
infrastructure for processing massive datasets. Its implementation in
data-driven healthcare proves beneficial in managing and analyzing complex
medical data.
However, the challenges associated with Hadoop, such as security concerns
and inefficiencies in real-time analytics, suggest that organizations must
complement Hadoop with other big data technologies like Apache Spark for
better performance.
The integration of AI and big data in healthcare presents vast opportunities to
enhance patient care, optimize operations, and support medical research.
While AI-driven systems continue to evolve, addressing ethical, regulatory, and
data security challenges remains critical for widespread adoption. The synergy
of Hadoop’s big data capabilities and AI innovations in healthcare will likely play
a significant role in the future of medical advancements.

24 | Page
References
1. Elgendy, N., & Elragal, A. (2014). Big Data Analytics: A Literature Review
   Paper. Lecture Notes in Computer Science, 8557, 214–227.
   https://doi.org/10.1007/978-3-319-08976-8_16

2. Batko, K., & Ślęzak, A. (2022). The use of Big Data Analytics in
   healthcare. Journal of Big Data, 9(3).
   https://doi.org/10.1186/s40537-021-00553-4

3. Elgendy, N., & Elragal, A. (2016). Big Data Analytics in Support of the
   Decision Making Process. Procedia Computer Science.
   https://doi.org/10.1016/j.procs.2016.09.251

25 | Page
