ORIENTAL INSTITUTE OF SCIENCE & TECHNOLOGY (OIST)

BHOPAL

Session: 2025
LAB MANUAL

[Link]
(Computer Science and Engineering)
VII-Semester
Big Data
(CS702(D))

Submitted By:
Name of Student:
Enrolment No:

Name of Faculty: Dr. Ahtesham Farooqui


Designation
Associate Professor (CSE)
Department Of Computer Science & Engineering
Index

S. No. Particulars Status


1 Vision and Mission of the Institute
2 Vision and Mission of the Department
3 Programme Educational Objectives (PEOs)
4 Programme Specific Outcomes (PSOs)
5 Programme Outcomes (POs)
6 Course Outcomes (COs)
7 CO-PO Matrix
8 CO-PSO Matrix
9 University Scheme
10 Syllabus
11 Academic Calendar
12 List of Experiments
13 Lab Time Table (Individual & Class)
14 Laboratory Plan
15 Lab Manual
16 Important Viva questions
17 Attendance Record
18 Internal Assessment Record of attainment of Course Outcomes

19 Measurement of CO attainment through Internal Assessments



VISION AND MISSION OF THE INSTITUTE

VISION

To become a pioneer in the field of engineering and research by providing quality, skilled and
compatible engineers who are proficient in their domain knowledge

MISSION

To create awareness of cutting-edge technologies so that outgoing engineers are acceptable to employers and meet their on-job requirements.

To develop an in-house facility for providing solutions to industrial problems.

To inculcate professional ethics, leadership qualities, and communication and entrepreneurial skills that satisfy societal needs.

Vision And Mission Of The Department

VISION

The department commits to adopting the latest developments in the field of technology and industry to make undergraduates employable and to excel in the fields of research and entrepreneurship.

MISSION

M1: Create awareness of the latest knowledge and technology amongst undergraduates for their professional growth.
M2: Develop the art of logical thinking to solve complex industrial problems related to Computer Science and Engineering, resulting in research and innovation.
M3: Seek cooperation of industry to make undergraduates aware of industrial work culture and its environment.

Programme Specific Outcomes (PSOs)

PSO1: Develop skills to analyze problems, design algorithms, and implement them using recent computer languages.

PSO2: Impart skills to describe web intelligence, cloud computing, cyber security, machine learning, and data science & analytics in order to design systems.

PROGRAMME OUTCOMES (POs)

Engineering Graduates should possess the following:

1. Engineering knowledge: Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems.

2. Problem analysis: Identify, formulate, review research literature, and analyze complex engineering problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences.

3. Design / development of solutions: Design solutions for complex engineering problems and design system components or processes that meet the specified needs with appropriate consideration for the public health and safety, and the cultural, societal, and environmental considerations.

4. Conduct investigations of complex problems: Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.

5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern engineering and IT tools including prediction and modeling to complex engineering activities with an understanding of the limitations.

6. The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional engineering practice.

7. Environment and sustainability: Understand the impact of the professional engineering solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for, sustainable development.

8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the engineering practice.

9. Individual and team work: Function effectively as an individual, and as a member or leader in diverse teams, and in multidisciplinary settings.

10. Communication: Communicate effectively on complex engineering activities with the engineering community and with society at large, such as, being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions.

11. Project management and finance: Demonstrate knowledge and understanding of the engineering and management principles and apply these to one's own work, as a member and leader in a team, to manage projects and in multidisciplinary environments.

Course Outcomes (COs)

Course Name: Big Data (CS702)

Year of Study: 2020-21 (VII Semester)

CS-702.1: Students should be able to understand the concepts and challenges of Big Data.

CS-702.2: Students should be able to demonstrate knowledge of Big Data analytics.

CS-702.3: Students should be able to develop Big Data solutions using the Hadoop ecosystem.

CS-702.4: Students should be able to gain hands-on experience with large-scale analytics tools.
Big Data
CS702(D)

Syllabus

Unit 1: Introduction to Big Data, Big Data characteristics, Types of Big Data, Traditional versus Big Data, Evolution of Big Data, challenges with Big Data, Technologies available for Big Data, Infrastructure for Big Data, Use of Data Analytics, Desired properties of a Big Data system.

Unit 2: Introduction to Hadoop, Core Hadoop components, Hadoop ecosystem, Hive Physical Architecture, Hadoop limitations, RDBMS versus Hadoop, Hadoop Distributed File System, Processing Data with Hadoop, Managing Resources and Applications with Hadoop YARN, MapReduce programming.

Unit 3: Introduction to Hive, Hive Architecture, Hive Data types, Hive Query Language, Introduction to Pig, Anatomy of Pig, Pig on Hadoop, Use Case for Pig, ETL Processing, Data types in Pig, running Pig, Execution model of Pig, Operators, functions, Data types of Pig.

Unit 4: Introduction to NoSQL, NoSQL Business Drivers, NoSQL Data architectural patterns, Variations of NoSQL architectural patterns, using NoSQL to Manage Big Data, Introduction to MongoDB.

Unit 5: Mining Social Network Graphs: Introduction, Applications of social network mining, Social Networks as a Graph, Types of social networks, Clustering of social graphs, Direct discovery of communities in a social graph, Introduction to recommender systems.

Academic Calendar
Academic Calendar for Odd Sem, Session JAN-JULY 2024

B.E. III, V, VII SEM


SCHOOL OF COMPUTER SCIENCE & TECHNOLOGY

GENERAL LABORATORY INSTRUCTIONS

1. Students are advised to come to the laboratory at least 5 minutes before the starting time; those who come after 5 minutes will not be allowed into the lab.
2. Plan your task properly well before the commencement; come prepared to the lab with the synopsis / program / experiment details.
3. Students should enter the laboratory with:
   a. Laboratory observation notes with all the details (Problem statement, Aim, Algorithm, Procedure, Program, Expected Output, etc.) filled in for the lab session.
   b. Laboratory Record updated up to the last session's experiments, and any other materials needed in the lab.
   c. Proper dress code and identity card.
4. Sign in the laboratory login register, write the TIME-IN, and occupy the computer system allotted to you by the faculty.
5. Execute your task in the laboratory, record the results / output in the lab observation notebook, and get it certified by the concerned faculty.
6. All students should be polite and cooperative with the laboratory staff, and must maintain discipline and decency in the laboratory.
7. Computer labs are equipped with sophisticated, high-end branded systems, which should be utilized properly.
8. Students and faculty must keep their mobile phones SWITCHED OFF during the lab sessions. Misuse of the equipment or misbehaviour with the staff or systems will attract severe punishment.
9. Students must take the permission of the faculty in case of any urgency to go out; anybody found loitering outside the lab / class without permission during working hours will be treated seriously and punished appropriately.
10. Students should LOG OFF / SHUT DOWN the computer system before leaving the lab after completing the task (experiment) in all aspects, and must ensure the system / seat is left in proper condition.

HEAD OF THE DEPARTMENT
List of Experiments

[Link] Name of the Experiment Date Sign

1. Install, configure VMware.

2. Install, configure and run Python, NumPy and Pandas.

3. Install, configure and run Hadoop and HDFS.

4. Visualize data using basic plotting techniques in Python.

5. Implement a MapReduce program that processes a dataset.

6. Install, configure PIG LATIN Language – PIG.

7. Execute PIG commands.

8. PIG LATIN modes, programs.

9. Install and configure HIVE.

10. Basic HIVE operations.

Signature
Dr. Ahtesham Farooqui
Asso. Prof.
CSE, OIST

Experiment-01
INSTALL VMWARE

OBJECTIVE:

To install VMware.
RESOURCES:

VMware stack, 4 GB RAM, web browser, 80 GB hard disk.

PROGRAMLOGIC:

1. Enter the official site of VMware and download VMware Workstation: [Link]/tryvmware/?p=workstation-w

2. After downloading VMware Workstation, install it on your PC.

3. The Welcome Screen appears.

INPUT/OUTPUT

PRE-LAB VIVA QUESTIONS:

1. What is the VMware stack?
2. List out various data formats.
3. List out the characteristics of big data.

LAB ASSIGNMENT:
1. Install Pig.
2. Install Hive.

POST-LAB VIVA QUESTIONS:

1. List out various terminologies in Big Data environments.
2. Define big data analytics.

Experiment-02
Install, Configure and Run Python, NumPy and Pandas

PROGRAM:
AIM: To install and run applications on Python, NumPy and Pandas.
How to Install Anaconda on Windows

Anaconda is open-source software that contains Jupyter, Spyder, etc., that are used for large data processing, data analytics, [Link] language. Spyder (a sub-application of Anaconda) is used for Python; OpenCV for Python will work in Spyder. Package versions are managed by the package management system called conda.

To begin working with Anaconda, one must get it installed first. Follow the instructions below to download and install Anaconda on your system.

Download and install Anaconda:
[Link]
Choose "Python 3.7 Version" for the appropriate architecture.

Begin with the installation process:
 Getting Started

Getting through the License Agreement:
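Once Anaconda is installed, the stack can be sanity-checked from any Python prompt. This is a minimal sketch that only confirms NumPy and Pandas import and do basic work; it is not part of the installer itself:

```python
# check_stack.py - quick sanity check that NumPy and Pandas are usable
import numpy as np
import pandas as pd

print("numpy version:", np.__version__)
print("pandas version:", pd.__version__)

# a tiny end-to-end exercise: build an array, wrap it in a DataFrame
arr = np.arange(6).reshape(3, 2)            # 3x2 array: [[0,1],[2,3],[4,5]]
df = pd.DataFrame(arr, columns=["a", "b"])  # two named columns
print(df["b"].sum())                        # 1 + 3 + 5 = 9
```

If both imports succeed and the sum prints, the environment is ready for the later experiments.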
Experiment-03
Install, Configure and Run Hadoop and HDFS
PROGRAM:
AIM: To install and run applications on Hadoop and HDFS.
HADOOP INSTALLATION IN WINDOWS
1. Prerequisites

Hardware Requirement
* RAM — min. 8 GB; if you have an SSD in your system, then 4 GB RAM would also work.
* CPU — min. quad core, with at least 1.80 GHz.

2. JRE 1.8 — offline installer for JRE.
3. Java Development Kit — 1.8.
4. Software for un-zipping, like 7-Zip or WinRAR.
* I will be using 64-bit Windows for the process; please check and download the version supported by your system (x86 or x64) for all the software.
5. Download the Hadoop zip.
* I am using Hadoop 2.9.2; you can use any other STABLE version of Hadoop.

Once we have downloaded all the above software, we can proceed with the next steps of installing Hadoop.

2. Unzip and Install Hadoop
After downloading Hadoop, [Link].

Once extracted, [Link].
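A standard Hadoop setup also expects environment variables such as JAVA_HOME and HADOOP_HOME to be set before the daemons start. The helper below is a hypothetical sketch (the variable names are the conventional ones, not taken from the steps above) that reports which of them are missing:

```python
# env_check.py - report which required environment variables are unset
import os

def missing_vars(names, env=None):
    """Return the subset of `names` that is absent or empty in `env`."""
    env = os.environ if env is None else env
    return [n for n in names if not env.get(n)]

# conventional variable names for a Hadoop 2.x setup (assumption)
required = ["JAVA_HOME", "HADOOP_HOME"]
gaps = missing_vars(required)
if gaps:
    print("Set these before starting Hadoop:", ", ".join(gaps))
else:
    print("All required variables are set.")
```

Running this before `start-dfs` saves a round of cryptic startup errors when a path was forgotten.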
Experiment-04
Visualize Data Using Basic Plotting Techniques in Python
PROGRAM:
AIM: To visualize data using basic plotting techniques in Python.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

crime = pd.read_csv('[Link]')
crime

[Link]([Link], [Link]);

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

data = pd.read_csv('[Link]')
x = data.Population   # column name assumed from the xlabel below
y = data.CarTheft     # column name assumed from the ylabel below
plt.scatter(x, y)
plt.xlabel('Population')
plt.ylabel('Car Theft')
plt.title('Population Vs Car Theft')
plt.show()
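If the CSV file used above is not at hand, the same scatter-plot technique can be exercised on synthetic data. This is a self-contained sketch (the data is random, and the Agg backend is forced so it also runs on machines without a display):

```python
# plot_demo.py - scatter plot of synthetic population vs. car-theft data
import matplotlib
matplotlib.use("Agg")            # render to a file; no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
population = rng.integers(10_000, 1_000_000, size=30)
thefts = population * 0.001 + rng.normal(0, 50, size=30)  # rough linear trend

plt.scatter(population, thefts)
plt.xlabel("Population")
plt.ylabel("Car Theft")
plt.title("Population Vs Car Theft")
plt.savefig("population_vs_theft.png")
print("saved population_vs_theft.png")
```

Opening the saved PNG should show the roughly linear relationship built into the synthetic data.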
Experiment-05
Implement Word Count / Frequency Programs Using MapReduce

PROGRAM:
AIM: To count word frequencies in a given input using MapReduce functions.

The Hadoop Streaming API helps us pass data between our Map and Reduce code via STDIN (standard input) and STDOUT (standard output).

Note: make sure the file has execute permission (chmod +x /home/hduser/[Link]).
Mapper program

#!/usr/bin/env python
"""mapper.py"""
import sys

# input comes from STDIN (standard input)
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # split the line into words
    words = line.split()
    # increase counters
    for word in words:
        # write the results to STDOUT (standard output);
        # what we output here will be the input for the
        # Reduce step, i.e. the input for reducer.py;
        # tab-delimited; the trivial word count is 1
        print('%s\t%s' % (word, 1))
Reducer program

#!/usr/bin/env python
"""reducer.py"""
from operator import itemgetter
import sys

current_word = None
current_count = 0
word = None

# input comes from STDIN
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # parse the input we got from mapper.py
    word, count = line.split('\t', 1)
    # convert count (currently a string) to int
    try:
        count = int(count)
    except ValueError:
        # count was not a number, so silently ignore/discard this line
        continue
    # this IF-switch only works because Hadoop sorts map output
    # by key (here: word) before it is passed to the reducer
    if current_word == word:
        current_count += count
    else:
        if current_word:
            # write the result to STDOUT
            print('%s\t%s' % (current_word, current_count))
        current_count = count
        current_word = word

# do not forget to output the last word, if needed
if current_word == word:
    print('%s\t%s' % (current_word, current_count))
Shuffle step (pickle-based variant)

import pickle
from collections import defaultdict

# `mapped` is assumed to be a list of (key, value) pairs produced by a
# mapper; the lines that build it are missing from the source
shuffled = defaultdict(list)
for j in mapped:
    shuffled[j[0]].append(j[1])

file = open('[Link]', 'wb')   # 'wb': write a single fresh pickle
pickle.dump(shuffled, file)
file.close()

print("Data has been shuffled, run [Link] to reduce the contents.")

Reducer Program

import pickle

file = open('[Link]', 'rb')
shuffled = pickle.load(file)
file.close()

def reduce(shuffled_dict):
    reduced = {}
    for i in shuffled_dict:
        reduced[i] = sum(shuffled_dict[i]) / len(shuffled_dict[i])
    return reduced

final = reduce(shuffled)
print("Average volatile acidity in different classes of wine:")
for i in final:
    print(i, ':', final[i])
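Before running on the cluster, the whole map, shuffle, and reduce flow for word count can be simulated in plain Python, which makes it easy to check the logic locally. This sketch mirrors what Hadoop Streaming does between mapper and reducer (the sample sentences are made up):

```python
# wordcount_local.py - simulate map, shuffle (sort by key), and reduce
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Emit (word, 1) for every word, like the mapper writes to STDOUT."""
    for line in lines:
        for word in line.strip().split():
            yield (word, 1)

def reduce_phase(pairs):
    """Sum counts per word; pairs are sorted by key first, as Hadoop does."""
    for word, group in groupby(sorted(pairs, key=itemgetter(0)),
                               key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

text = ["foo foo quux", "labs foo bar quux"]
result = dict(reduce_phase(map_phase(text)))
print(result)   # {'bar': 1, 'foo': 3, 'labs': 1, 'quux': 2}
```

The `sorted` call plays the role of Hadoop's shuffle-and-sort phase, which is exactly why the streaming reducer's IF-switch over `current_word` works.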

Experiment-07
Install, Configure PIG LATIN LANGUAGE - PIG

OBJECTIVE:
1. Installation of PIG.

PROGRAM LOGIC:
STEPS FOR INSTALLING APACHE PIG
1) [Link]
2) Set the environment of PIG in the bashrc file.
3) Pig can run in two modes: Local Mode and Hadoop Mode.
   pig -x local and pig
4) Grunt Shell
   grunt>
5) LOADING data into the Grunt Shell
   DATA = LOAD <CLASSPATH> USING PigStorage(DELIMITER) as (ATTRIBUTE : DataType1, ATTRIBUTE : DataType2, .....);
6) Describe Data
   DESCRIBE DATA;
7) DUMP Data
   DUMP DATA;

INPUT/OUTPUT:
Input as Website Click Count Data

Experiment-08
PIG COMMANDS

OBJECTIVE:
Write and execute Pig Latin scripts to sort, group, join, project, and filter your data.

PROGRAM LOGIC:
FILTER Data
FDATA = FILTER DATA BY ATTRIBUTE = VALUE;
GROUP Data
GDATA = GROUP DATA BY ATTRIBUTE;
Iterating Data
FOR_DATA = FOREACH DATA GENERATE GROUP AS GROUP_FUN, ATTRIBUTE = <VALUE>;
Sorting Data
SORT_DATA = ORDER DATA BY ATTRIBUTE WITH CONDITION;
LIMIT Data
LIMIT_DATA = LIMIT DATA COUNT;
JOIN Data
JOIN DATA1 BY (ATTRIBUTE1, ATTRIBUTE2, ....), DATA2 BY (ATTRIBUTE3, .... N);

INPUT/ OUTPUT:

Experiment-09
PIG LATIN MODES, PROGRAMS
OBJECTIVE:
a. Run the Pig Latin scripts to find Word Count.
b. Run the Pig Latin scripts to find the max temperature for each and every year.

PROGRAM LOGIC:
Run the Pig Latin scripts to find Word Count.

lines = LOAD '/user/hadoop/HDFS_File.txt' AS (line:chararray);
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) as word;
grouped = GROUP words BY word;
wordcount = FOREACH grouped GENERATE group, COUNT(words);
DUMP wordcount;

Run the Pig Latin scripts to find the max temperature for each and every year.

-- max_temp.pig: Finds the maximum temperature by year
records = LOAD 'input/ncdc/micro-tab/[Link]'
    AS (year:chararray, temperature:int, quality:int);
filtered_records = FILTER records BY temperature != 9999 AND
    (quality == 0 OR quality == 1 OR quality == 4 OR quality == 5 OR quality == 9);
grouped_records = GROUP filtered_records BY year;
max_temp = FOREACH grouped_records GENERATE group,
    MAX(filtered_records.temperature);
DUMP max_temp;

INPUT/OUTPUT:
(1950,0,1)
(1950,22,1)
(1950,-11,1)
(1949,111,1)
(1949,78,1)
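The expected result of the script on this sample can be double-checked with a short Python equivalent of its FILTER, GROUP, and MAX steps (a sketch over the same five records):

```python
# max_temp_check.py - replicate the Pig script's logic on the sample records
records = [
    ("1950", 0, 1), ("1950", 22, 1), ("1950", -11, 1),
    ("1949", 111, 1), ("1949", 78, 1),
]
GOOD_QUALITY = {0, 1, 4, 5, 9}   # quality codes accepted by the FILTER

max_temp = {}
for year, temperature, quality in records:
    if temperature != 9999 and quality in GOOD_QUALITY:   # FILTER
        if year not in max_temp or temperature > max_temp[year]:
            max_temp[year] = temperature                  # GROUP + MAX
print(max_temp)   # {'1950': 22, '1949': 111}
```

The dictionary matches what `DUMP max_temp` should print for this input: 22 for 1950 and 111 for 1949.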

PRE-LAB VIVA QUESTIONS:
1. List out the benefits of Pig.
2. Classify Pig Latin commands in Pig.

Experiment-10
Install, Configure HIVE.

OBJECTIVE:
Installation of HIVE.

PROGRAM LOGIC:
Install MySQL Server

1) sudo apt-get install mysql-server
2) Configure the MySQL user name and password.
3) Create a user and grant all privileges:
   mysql -u root -proot
   CREATE USER <USER_NAME> IDENTIFIED BY <PASSWORD>;
4) Extract and configure Apache Hive:
   [Link]
5) Move Apache Hive from the local directory to the home directory.
6) Set CLASSPATH in bashrc:
   export HIVE_HOME=/home/apache-hive
   export PATH=$PATH:$HIVE_HOME/bin
7) [Link]
   <property>
     <name>[Link]</name>
     <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
   </property>
   <property>
     <name>[Link]</name>
     <value>[Link]</value>
   </property>
   <property>
     <name>[Link]</name>
     <value>hadoop</value>
   </property>
   <property>
     <name>[Link]</name>
     <value>hadoop</value>
   </property>
8) [Link]/lib directory.

Experiment-11
Basic HIVE OPERATIONS

OBJECTIVE:
Use Hive to create, alter, and drop databases, tables, views, functions, and indexes.

PROGRAM LOGIC:
SYNTAX for HIVE Database Operations

DATABASE Creation
CREATE DATABASE|SCHEMA [IF NOT EXISTS] <database name>;

Drop Database Statement
DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];

Creating and Dropping a Table in HIVE
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  [(col_name data_type [COMMENT col_comment], ...)]
  [COMMENT table_comment]
  [ROW FORMAT row_format]
  [STORED AS file_format];

Loading Data into table log_data
Syntax:
LOAD DATA LOCAL INPATH '<path>/[Link]' OVERWRITE INTO TABLE u_data;

Alter Table in HIVE
Syntax:
ALTER TABLE name RENAME TO new_name;
ALTER TABLE name ADD COLUMNS (col_spec[, col_spec ...]);
ALTER TABLE name DROP [COLUMN] column_name;
ALTER TABLE name CHANGE column_name new_name new_type;
ALTER TABLE name REPLACE COLUMNS (col_spec[, col_spec ...]);

Creating and Dropping a View
CREATE VIEW [IF NOT EXISTS] view_name [(column_name [COMMENT column_comment], ...)] [COMMENT table_comment] AS SELECT ...;

Dropping a View
Syntax:
DROP VIEW view_name;

Functions in HIVE
String functions: substr(), upper(), regexp_replace(), etc.
Mathematical functions: round(), ceil(), etc.
Date and time functions: year(), month(), day(), to_date(), etc.
Aggregate functions: sum(), min(), max(), count(), avg(), etc.

INPUT/OUTPUT:

PRE-LAB VIVA QUESTIONS:
1. How many types of joins are there in Pig Latin? Give examples.
2. Write the Hive command to create a table with four columns: first name, last name, age, and income.

LAB ASSIGNMENT:
1. Analyze stock data using Apache Hive.

POST-LAB VIVA QUESTIONS:
1. Write a shell command in Hive to list all the files in the current directory.
2. List the collection types provided by Hive, for the purpose: a start-up company wants to use Hive for storing its data.

