
Apache HTTP Server Log Analysis - Business Analytics Using R Project

This document outlines a project to analyze Apache HTTP server logs using R. It describes objectives to enhance skills in R, exploratory analysis, and visualization. Procedures include parsing sample logs into a dataframe, generating required analytics on hosts, pages, and downloads by month, and creating visualizations like time series graphs. Questions are provided to analyze results for most popular hosts, pages, downloads, and minimum/maximum sizes. The report structure is defined including an overview, code/commands, results, and summary of using R for data analytics.

Uploaded by

Owais Shaikh
Copyright
© All Rights Reserved

Apache HTTP Server Log Analysis

Business Analytics Using R Project


Business Analytics Using R Project - Apache Log Analysis
Objectives

This program enables the participants to review the learnings of the Business Analytics Using R Workshop.
The primary objective of the project is to enhance the participants' knowledge of R & develop exploratory analysis & visualization skills.
Procedure

View Apache Sample Log
  Refer apache_sample.pdf
Understand Apache Logs
  Refer apache_desc.pdf
  Source: http://httpd.apache.org/docs/2.2/logs.html
Use Dataset as given
  Refer section Apache Data Sets
Parse & Analyze
  Refer section Procedure
Analytics Requirement
  Refer section Analytics Requirement
Generate Project Report
  Refer section Project Report

Apache Data Sets

apache_http.log - small Apache log to create your prototype
usask_access_log.gz - compressed file containing "UofS_access_log"; an Apache log of approx 233 MB

Note:

"UofS_access_log" to be renamed as "apache_dataset.log"
Site: Web logs from NASA Web Site
Source: http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html
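
The decompress-and-rename step in the note could be sketched in base R as follows. This is only one possible approach: the helper name extract_log and the chunk size are assumptions made for illustration, while gzfile(), readLines() and writeLines() are standard base R functions.

```r
# Hypothetical helper: decompress a .gz log and save it under a new name.
# Reads in chunks so that a large log need not be held in memory at once.
extract_log <- function(gz_path, out_path = "apache_dataset.log",
                        chunk_lines = 100000L) {
  src <- gzfile(gz_path, open = "rt")      # text-mode read of the archive
  on.exit(close(src), add = TRUE)
  dst <- file(out_path, open = "wt")       # plain-text output file
  on.exit(close(dst), add = TRUE)

  total <- 0L
  repeat {
    chunk <- readLines(src, n = chunk_lines, warn = FALSE)
    if (length(chunk) == 0L) break         # end of input reached
    writeLines(chunk, dst)
    total <- total + length(chunk)
  }
  invisible(total)                         # number of lines written
}

# Possible call (file names taken from the data set list above):
# extract_log("usask_access_log.gz", "apache_dataset.log")
```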

Procedure

Copy log_file to a working directory of your choice
Parse log file & read into data frame
Store csv format data in a working directory
The csv file should have
  date field in yyyy-mm-dd format (time zone to be ignored)
  time field in hh:mm:ss format (time zone to be ignored)
  protocol, page visited & http-version should be separate columns
Provide analysis results as per section Analytics Requirement below
Copy results to local file system or local MySQL as per requirement.
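
A minimal sketch of the parsing step in base R, assuming every line follows the Apache Common Log Format (host, identity, user, bracketed timestamp, quoted request, status, bytes). The function name parse_apache_log, the column names and the CSV file name are illustrative assumptions, not part of the assignment. Lines that do not match the pattern are dropped, and a "-" byte count (no content sent) is stored as 0.

```r
# Hypothetical parser: character vector of log lines -> data frame.
parse_apache_log <- function(lines) {
  pattern <- paste0(
    "^(\\S+) \\S+ \\S+ ",                                   # host, ident, user
    "\\[(\\d{2}/\\w{3}/\\d{4}):(\\d{2}:\\d{2}:\\d{2}) [^\\]]*\\] ",  # date, time; zone ignored
    '"(\\S+) (\\S+) (\\S+)" ',                              # method, page, http version
    "(\\d{3}) (\\S+)$"                                      # status, bytes
  )
  proto <- data.frame(host = character(), date = character(),
                      time = character(), method = character(),
                      page = character(), http_version = character(),
                      status = character(), bytes = character(),
                      stringsAsFactors = FALSE)
  df <- utils::strcapture(pattern, lines, proto, perl = TRUE)
  df <- df[!is.na(df$host), ]              # drop lines that did not match
  rownames(df) <- NULL

  # dd/Mon/yyyy -> yyyy-mm-dd; month.abb avoids locale-dependent parsing
  month_no <- match(substr(df$date, 4, 6), month.abb)
  df$date <- sprintf("%s-%02d-%s",
                     substr(df$date, 8, 11), month_no, substr(df$date, 1, 2))

  df$status <- as.integer(df$status)
  bytes <- df$bytes
  bytes[bytes == "-"] <- "0"               # "-" means no bytes were sent
  df$bytes <- as.numeric(bytes)
  df
}

# Possible call (output file name is an assumption):
# log_df <- parse_apache_log(readLines("apache_http.log"))
# write.csv(log_df, "apache_http.csv", row.names = FALSE)
# head(log_df, 10)
```

Keeping date and time as plain text in the stated formats means the CSV matches the requirement directly; as.Date() can still be applied later when a true date type is needed.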

Analytics Requirement

Required As Data Frame


For each month, how many times each individual host has connected to our server? Store data month wise & highest count first.
For each month, how many times each individual page has been requested from our server? Store data month wise & highest count first.
For each month, how much data has been downloaded by each individual host that has connected to our server? Store data month wise & by highest count first.
How much data was sent out as each individual page was downloaded from our server? Store data month wise & by highest count first.
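
The four data frames above share one shape: group by month and by host or page, total something, then sort month wise with the highest total first. A hedged base R sketch of that pattern follows. It assumes a parsed data frame with columns date (yyyy-mm-dd text), host, page and bytes; those column names and the helper name monthly_summary are assumptions for illustration.

```r
# Hypothetical helper: per-month totals for one key column ("host" or "page").
# value = NULL counts rows (hits); otherwise the named numeric column is summed.
monthly_summary <- function(df, key, value = NULL) {
  month <- substr(df$date, 1, 7)                       # "yyyy-mm"
  amount <- if (is.null(value)) rep(1, nrow(df)) else df[[value]]

  out <- aggregate(amount,
                   by = list(month = month, key = df[[key]]),
                   FUN = sum)
  names(out) <- c("month", key, "total")

  # month ascending, then largest total first within each month
  out <- out[order(out$month, -out$total), ]
  rownames(out) <- NULL
  out
}

# Possible calls for the four requirements:
# monthly_summary(log_df, "host")            # connections per host
# monthly_summary(log_df, "page")            # requests per page
# monthly_summary(log_df, "host", "bytes")   # data downloaded per host
# monthly_summary(log_df, "page", "bytes")   # data sent per page
```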
Required As Visualization

For each data set generated above, prepare suitable visualization (giving reason why the graph is chosen). Limit data to suitable significant number if graph is looking too cluttered.
Time Series Graph for total hits per day with each month being shown as separate line.
Time Series Graph for total download size per day with each month being shown as separate line.
How would you show top 10 most popular pages per day as Time Series Graph with each month being shown as separate line? Explain how you find top 10 most popular pages.
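
One way the time series graphs above might be approached in base R is to build a day-of-month by month matrix and draw one line per column with matplot(). The sketch below assumes a parsed data frame with columns date (yyyy-mm-dd text), page and bytes; the helper names and column names are assumptions. Here "top 10" is taken to mean the ten pages with the most requests over the whole log, which is only one possible reading of the question.

```r
# Hypothetical helper: rows = day of month (1..31), columns = month ("yyyy-mm").
# Cells hold hit counts (value = NULL) or the sum of a numeric column.
# Days without any log entry stay NA and appear as gaps in the plot.
daily_by_month <- function(df, value = NULL) {
  month <- substr(df$date, 1, 7)
  day <- factor(as.integer(substr(df$date, 9, 10)), levels = 1:31)
  amount <- if (is.null(value)) rep(1, nrow(df)) else df[[value]]
  tapply(amount, list(day = day, month = month), sum)
}

# Hypothetical helper: one line per month, day of month on the x axis.
plot_daily_by_month <- function(totals, ylab = "Total hits") {
  cols <- seq_len(ncol(totals))
  matplot(as.integer(rownames(totals)), totals, type = "l", lty = 1,
          col = cols, xlab = "Day of month", ylab = ylab)
  legend("topright", legend = colnames(totals), lty = 1, col = cols)
}

# Hypothetical helper: names of the n most requested pages overall.
top_pages <- function(df, n = 10) {
  names(head(sort(table(df$page), decreasing = TRUE), n))
}

# Possible calls:
# plot_daily_by_month(daily_by_month(log_df))
# plot_daily_by_month(daily_by_month(log_df, "bytes"), ylab = "Bytes per day")
# popular <- log_df[log_df$page %in% top_pages(log_df), ]
# plot_daily_by_month(daily_by_month(popular), ylab = "Hits on top 10 pages")
```

Putting day of month on the x axis lets the months overlay each other, so differences between months can be read off at the same day position.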

Answers Required

Using the above results and also carrying out any other analysis as may be required, provide answers to the following questions.
Which host has connected the maximum number of times to our server? Give the host name & count of connections from that host.
Which page has been requested the maximum number of times from our server? Give the page name & count of the times the page was requested.
How many unique hosts have connected to our server? Give counts.
How many unique pages have been requested from our server? Give counts.
Which host has caused maximum data transfer from our server? Give host name & the data transfer for the host.
Which page has caused maximum data transfer from our server? Give page name & the data transfer for the page.
Which page has maximum download size from our server? Give page name & the size for the page.
What is the download count of the page that has maximum download size from our server? Give page name & download count.
Which page has minimum download size from our server? Give page name & the size for the page.


What is the download count of the page that has minimum download size from our server? Give page name & download count.
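
The questions above reduce to a few base R idioms: table() with which.max() for counts, tapply() with sum for data transfer, and length(unique()) for distinct values. The sketch below assumes a parsed data frame with columns host, page and bytes; the helper names are hypothetical. "Download size" of a page is taken here as the largest response size seen for that page, which is an assumption, and ties are resolved by taking the first match.

```r
# Hypothetical helper: most frequent value and how often it occurs.
top_by_count <- function(x) {
  counts <- table(x)
  i <- which.max(counts)
  list(name = names(counts)[i], count = counts[[i]])
}

# Hypothetical helper: key ("host" or "page") with the largest total bytes.
top_by_bytes <- function(df, key) {
  totals <- tapply(df$bytes, df[[key]], sum)
  i <- which.max(totals)
  list(name = names(totals)[i], bytes = totals[[i]])
}

# Hypothetical helper: page with the largest (pick = which.max) or smallest
# (pick = which.min) download size, plus how often that page was requested.
page_size_extreme <- function(df, pick = which.max) {
  size <- tapply(df$bytes, df$page, max)   # largest response seen per page
  i <- pick(size)
  page <- names(size)[i]
  list(page = page, size = size[[i]], downloads = sum(df$page == page))
}

# Hypothetical helper: number of distinct values.
n_unique <- function(x) length(unique(x))

# Possible calls:
# top_by_count(log_df$host); top_by_count(log_df$page)
# n_unique(log_df$host);     n_unique(log_df$page)
# top_by_bytes(log_df, "host"); top_by_bytes(log_df, "page")
# page_size_extreme(log_df, which.max); page_size_extreme(log_df, which.min)
```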

Project Report Using RMD

Project Overview
Commands / Code Section
Results Section
Summary - How you used R for Data Analytics

Project Overview

Brief Overview Of The Project
Learning Objective

Commands / Code Section

Should contain all R commands used to transfer logs to data-frames and sample data-frame using head(data-frame, 10)
Should contain all R commands used for visualization of data-frames and the output as desired.
Should contain all R commands used to answer the specific queries raised in problem definition along with the output as desired.

Summary

Describe your experience of using R for Data Analytics.
