Data Visualization: The Ultimate Handiwork of Data Analysis, and Deployment of a MERN App
Project Report
Done by
AKUMTOSHI AIER
Guided by
Mr. Sanjeev Kumar Jha
NIELIT Chennai, Government of India
AKUMTOSHI AIER
Place: Chennai
Date:
ACKNOWLEDGEMENT
I am immensely thankful to God for giving me the opportunity to pursue my project in this
area, which has opened my mind to a whole new world of data science. I would like to take this
moment to acknowledge all the people who have guided me in completing my work and
submitting this project thesis.
No words are sufficient to acknowledge the gratitude I owe to my project guide, Mr. Sanjeev
Kumar Jha, for inspiring and motivating me at every step of my work. His kind and
encouraging words have always brought me hope and joy whenever I was down. Above all,
he is an ideal mentor from whom I have learnt a lot.
I am also highly obliged to Mr. Martin KM, Director In-Charge, NIELIT Chennai, for all the
academic and administrative support, and for allowing me to avail the facilities of the
department to complete my work.
I earnestly wish to thank Mr. Bharath C for all the support he has lent me ever since I started
this project.
Last but not least, I would like to thank Aba and Avü for always being the greatest pillars
of support in my life.
AKUMTOSHI AIER
ABSTRACT
Data visualization dominates the professional literature. If we were to understand data
visualization only through the best-selling manuals and tools built for it, it would
seem to be governed by an almost desperate rush. On one side are techniques to make our
data visualizations as easily and quickly comprehensible as possible. The language and lessons
around this approach are dominated by a discourse of restraint: restrained color choices,
restrained decorative choices, restrained interactivity and restrained chart types. In fact, the
emergence of Artificial Intelligence has led to further growth in visualization libraries. The
development of different stacks for web development has also helped build the data
collection infrastructure; after all, without the collection of data, no processing can ever be
done. There are now many tools that help in the direct extraction of information from the
backend database system, giving a quick overview of the subject without further processing.
TABLE OF CONTENTS
1 INTRODUCTION
2 TOOLS AND PACKAGES USED
3 DATA VISUALIZATION
4 A QUICK STUDY OF THE DATASETS
5 BUILDING A DASHBOARD IN R
6 VISUALISATION WITH PYTHON AND TABLEAU PUBLIC DASHBOARD
7 MERN
8 CONCLUSION
9 REFERENCES
CHAPTER 1
INTRODUCTION
Data is beautiful once it is the best possible version of itself, on the inside and the outside.
Although one should be careful not to tamper with its core truth, it is one's duty to make it
shine. It goes beyond communicating insights; it is about exciting people and reaching an
audience that has never heard of matplotlib, seaborn, or plotly. After all, data science is an art,
and art is for everybody. Data visualization is not about the colors but rather the raw data it
helps anyone understand. In his book "Data Visualization: A Successful Design Process", Andy
Kirk defines data visualization as "a multidisciplinary recipe of art, science, math, technology,
and many other interesting ingredients." Getting the definition right is the easy part. Creating
something meaningful that inspires other people and conveys the information accurately is the
hard part.
They say that every dataset tells a story waiting to be told. The advent of Python and R
libraries has indeed made it possible to carry out exploratory data analysis in a very short time
and with minimal coding. The statistical libraries available in different environments have
made in-depth analysis of data possible with only a minimal grasp of the underlying
mathematical formulas. Indeed, the solution to any problem lies in the approach towards the
subject, and a few data frames and some analysis would very much ease the job for a coder; but the real
question is how we make it accessible to someone who is not aware of the code or who is not
well versed in the subject itself. The purpose of processing any data should be its
interpretation by anyone who approaches the subject, and there is no better trigger for the human brain
than visuals. However, with the development of many visualization libraries,
the interpretation of the data could easily lose its essence, leading to a situation where
the visualizations themselves become a biased interpretation. Hence the purpose of this project is
to explore the libraries and choose only those that lead to an understanding of the data by
any person who views it: visualization that speaks volumes about the datasets, rather than a fancy yet
incorrect representation of the data.
The development of different stacks for web development has also helped build the data
collection infrastructure. After all, without the collection of data, no processing can ever be
done. There are now many tools that help in the direct extraction of information from the backend
database system, giving a quick overview of the subject without further processing.
In this project I have chosen two data sets, one from Quandl and another from
Spotify, and have used specific libraries for a better understanding of the data. The EDA
has been done in Python. MySQL has been used to store the data and to connect with Tableau
Public dashboards. The MongoDB database has been deployed in the cloud using
MongoDB Atlas. Mongoose, an elegant MongoDB object-modelling library for Node.js, has
been used in the MERN framework. An attempt has also been made to connect MongoDB
Atlas with PySpark for faster processing and analysis of large data. In this case I have tried
to analyse which words have been used the most in my blog and to draw inferences out of it.
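A minimal sketch of that PySpark word count is given below; it assumes the MongoDB Spark Connector is available on the classpath, that the blog posts sit in a collection with a `content` field, and uses a placeholder Atlas URI:

```
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split, lower, col, desc

# Placeholder Atlas URI; real credentials and cluster name are not shown here
uri = "mongodb+srv://<user>:<password>@cluster0.mongodb.net/blog.posts"

spark = (SparkSession.builder
         .appName("blog-word-count")
         .config("spark.mongodb.input.uri", uri)
         .getOrCreate())

# Read the collection through the MongoDB Spark Connector (v3.x source name)
posts = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()

# Split each post into words and count the most frequent ones
words = posts.select(explode(split(lower(col("content")), "\\s+")).alias("word"))
words.groupBy("word").count().orderBy(desc("count")).show(20)
```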
CHAPTER 2
Tools and Packages Used
1. Python
Python is an interpreted, high-level, general-purpose programming language. Created by Guido
van Rossum and first released in 1991, Python's design philosophy emphasizes code readability
with its notable use of significant whitespace. Its language constructs and object-oriented
approach aim to help programmers write clear, logical code for small and large-scale projects.
Python is dynamically typed and garbage-collected. It supports multiple programming
paradigms, including procedural, object-oriented, and functional programming. Python is often
described as a "batteries included" language due to its comprehensive standard library.
ii. Pandas
Pandas is a Python package providing fast, flexible, and expressive data structures
designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to
be the fundamental high-level building block for doing practical, real world data analysis in
Python. Additionally, it has the broader goal of becoming the most powerful and flexible open
source data analysis / manipulation tool available in any language. It is already well on its way
towards this goal.
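As a small illustration, here is a sketch of the kind of pandas workflow used later in this report; it assumes the Spotify top-100 CSV with its artists, loudness and danceability columns:

```
import pandas as pd

# Load the CSV into a labelled DataFrame (file name is illustrative)
df = pd.read_csv("top2018.csv")

print(df.shape)        # (rows, columns)
print(df.describe())   # summary statistics for the numeric columns

# Relational-style operations: filter rows, then group and aggregate
loud_tracks = df[df["loudness"] > -5]
print(df.groupby("artists")["danceability"].mean().sort_values(ascending=False).head())
```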
iii. Plotly
plotly.py is an interactive, open-source, browser-based graphing library for
Python. Built on top of plotly.js, plotly.py is a high-level, declarative charting library. plotly.js
ships with over 30 chart types, including scientific charts, 3D graphs, statistical charts, SVG
maps, financial charts, and more. Plotly graphs can be viewed in Jupyter notebooks, standalone
HTML files, or hosted online using Chart Studio Cloud.
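A short sketch of this declarative style, using the high-level plotly.express module on the same illustrative DataFrame as above:

```
import plotly.express as px

# Interactive scatter plot; hovering shows the track name
fig = px.scatter(df, x="loudness", y="energy", hover_name="name",
                 title="Energy vs. loudness, top Spotify tracks of 2018")
fig.show()   # renders in a Jupyter notebook or a standalone browser tab
```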
iv. Matplotlib
Matplotlib is a plotting library for the Python programming language and its numerical
mathematics extension NumPy. It provides an object-oriented API for embedding plots into
applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK+. There is
also a procedural "pylab" interface based on a state machine (like OpenGL), designed to closely
resemble that of MATLAB, though its use is discouraged. SciPy makes use of Matplotlib.
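For example, a quick histogram through the pyplot interface (again assuming the illustrative Spotify DataFrame from above):

```
import matplotlib.pyplot as plt

# Distribution of track tempo across the top 100 songs
plt.hist(df["tempo"], bins=20, edgecolor="white")
plt.xlabel("Tempo (BPM)")
plt.ylabel("Number of tracks")
plt.title("Tempo distribution of the top 100 tracks")
plt.show()
```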
v. ggplot
ggplot is a plotting system for Python based on R's ggplot2 and the Grammar of Graphics. It is
built for making professional-looking plots quickly and with minimal code.
2. R and R Studio
R is a language and environment for statistical computing and graphics. It is a GNU project
which is similar to the S language and environment which was developed at Bell Laboratories
(formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be
considered as a different implementation of S. There are some important differences, but much
code written for S runs unaltered under R.
R provides a wide variety of statistical techniques (linear and nonlinear modelling, classical statistical tests,
time-series analysis, classification, clustering) and graphical techniques, and is highly extensible.
The S language is often the vehicle of choice for research in statistical methodology, and R
provides an Open Source route to participation in that activity.
RStudio is an integrated development environment for R, a programming language for
statistical computing and graphics. The RStudio IDE is developed by RStudio, Inc., a
commercial enterprise founded by JJ Allaire, creator of the programming language ColdFusion.
Flexdashboard
Easy interactive dashboards for R. Use R Markdown to publish a group of related data
visualizations as a dashboard. Support for a wide variety of components including htmlwidgets;
base, lattice, and grid graphics; tabular data; gauges and value boxes; and text annotations.
knitr
knitr is an engine for dynamic report generation with R. It is a package in the statistical
programming language R that enables integration of R code into LaTeX, LyX, HTML,
Markdown, AsciiDoc, and reStructuredText documents.
DT
The R package DT provides an R interface to the JavaScript library DataTables. R data
objects (matrices or data frames) can be displayed as tables on HTML pages, and DataTables
provides filtering, pagination, sorting, and many other features in the tables.
rpivotTable
The rpivotTable package is an R htmlwidget built around the PivotTable.js library.
PivotTable.js is a JavaScript pivot table visualization library with drag'n'drop
functionality, built on top of jQuery/jQueryUI and written in CoffeeScript (then compiled to
JavaScript) by Nicolas Kruchten at Datacratic. It is available under an MIT license.
openintro
The openintro package provides the data sets and supplemental functions that accompany the
OpenIntro statistics textbooks.
highcharter
The main features of this package are:
Chart various R objects with one function: with hchart(x) you can chart data.frames,
numeric vectors, histograms, characters, densities, factors, ts, mts, xts, stl, ohlc, acf, forecast,
mforecast, ets, igraph, dist, dendrogram and survfit classes.
Support for Highstock charts: you can create a candlestick chart in two lines of code.
Support for xts objects from the quantmod package.
Support for Highmaps charts: it is easy to create choropleths or add information in GeoJSON
format.
Themes: you can configure your chart in multiple ways; themes such as economist, financial
times, google and 538 are implemented, among others.
A lot of features and plugins: motion, draggable points, font-awesome, tooltips and
annotations.
ggvis
ggvis is a data visualization package for R which lets you:
Declaratively describe data graphics with a syntax similar in spirit to ggplot2.
Create rich interactive graphics that you can play with locally in Rstudio or in your
browser.
Leverage shiny’s infrastructure to publish interactive graphics usable from any browser
(either within your company or to the world).
The goal is to combine the best of R (e.g. every modelling function you can imagine)
and the best of the web (everyone has a web browser). Data manipulation and transformation
are done in R, and the graphics are rendered in a web browser, using Vega. For RStudio users,
ggvis graphics display in a viewer panel, which is possible because RStudio is a web browser.
dplyr
dplyr is a grammar of data manipulation, providing a consistent set of verbs that help
you solve the most common data manipulation challenges:
mutate() adds new variables that are functions of existing variables.
select() picks variables based on their names.
filter() picks cases based on their values.
plotly
Plotly's R graphing library makes interactive, publication-quality graphs. Examples of
how to make line plots, scatter plots, area charts, bar charts, error bars, box plots, histograms,
heatmaps, subplots, multiple-axes, and 3D (WebGL based) charts.
ggcorrplot
It provides a solution for reordering the correlation matrix and displays the significance
level on the correlogram. It also includes a function for computing a matrix of correlation
p-values. It is inspired by the corrplot package.
3. Project Jupyter
Project Jupyter is a non-profit, open-source project, born out of the IPython Project in
2014 as it evolved to support interactive data science and scientific computing across all
programming languages. Jupyter will always be 100% open-source software, free for all to use
and released under the liberal terms of the modified BSD license.
Jupyter is developed in the open on GitHub, through the consensus of the Jupyter community.
For more information on the governance approach, please see the project's Governance Document. All
online and in-person interactions and communications directly related to the project are covered
by the Jupyter Code of Conduct. This Code of Conduct sets expectations to enable a diverse
community of users and contributors to participate in the project with respect and safety.
4. MongoDB
MongoDB is a cross-platform, document-oriented database program. Classified as a
NoSQL database program, MongoDB uses JSON-like documents with optional schemas. MongoDB is
developed by MongoDB Inc. and licensed under the Server Side Public License (SSPL).
5. Express
Express.js, or simply Express, is a web application framework for Node.js, released as
free and open-source software under the MIT License. It is designed for building web
applications and APIs. It has been called the de facto standard server framework for Node.js.
6. React
React is a JavaScript library for building user interfaces. It is maintained by Facebook
and a community of individual developers and companies. React can be used as a base in the
development of single-page or mobile applications.
7. Node.js
Node.js is an open-source, cross-platform, JavaScript runtime environment that executes
JavaScript code outside of a browser. Node.js lets developers use JavaScript to write command
line tools and for server-side scripting—running scripts server-side to produce dynamic web
page content before the page is sent to the user's web browser. Consequently, Node.js represents
a "JavaScript everywhere" paradigm, unifying web-application development around a single
programming language rather than different languages for server-side and client-side scripts.
8. MERN
MERN Stack: MERN is a JavaScript stack that is used for easier and faster
deployment of full-stack web applications. The MERN stack comprises four technologies:
MongoDB, Express, React and Node.js. It is designed to make the development process
smoother and easier.
9. MongoDB Atlas
MongoDB Atlas is a fully-managed cloud database developed by the same people that
build MongoDB. Atlas handles all the complexity of deploying, managing, and healing your
deployments on the cloud service provider of your choice (AWS, Azure, and GCP).
10. Tableau Public
Tableau is a powerful and fast-growing data visualization tool used in the Business
Intelligence industry. It helps simplify raw data into an easily understandable
format.
Data analysis is very fast with Tableau, and the visualizations created are in the form of
dashboards and worksheets. The data products created with Tableau can be understood by
professionals at any level in an organization, and it even allows a non-technical user to create a
customized dashboard.
The best features of Tableau are:
Data blending
Real-time analysis
Collaboration of data
The great thing about Tableau is that it does not require any technical or programming
skills to operate. The tool has garnered interest among people from all sectors,
such as business, research and various industries.
11. Heroku
Heroku is a cloud platform as a service (PaaS) that lets developers build, run and deploy
applications entirely in the cloud. In this project it is used to deploy the MERN application.
CHAPTER 3
DATA VISUALIZATION
Good data visualization takes the burden of effort off the brain and puts it on the eyes. The
eight core principles that let us accomplish that are:
1. Simplify
Just like an artist can capture the essence of an emotion with just a few lines, good data
visualization captures the essence of data - without oversimplifying.
We don't want a tool that gives us 19 more options after we decide we want a column graph. We
want a tool like Tableau that knows which visualization is appropriate and then creates it.
Simple.
2. Compare
We need to be able to compare our data visualizations side by side. We can't hold the
details of our data visualizations in our memory - shift the burden of effort to our eyes.
3. Attend
The tool needs to make it easy for us to attend to the data that is really important. Our
brains are easily encouraged to pay attention to details, whether relevant or irrelevant. Stephen
Few demonstrated this convincingly with a video similar to Daniel Simons's classic gorilla and
ball-passing experiment.
4. Explore
Data visualization tools should let us just look - not only to answer a specific question,
but to explore data and discover things. Directed and exploratory analysis are equally valid, but
we need to be sure that our visualization tool makes both possible.
5. View Diversely
Different views of the same data provide different insights. It helps to be able to look at
the same data from different perspectives at the same time and see how they fit together.
6. Ask Why
More than knowing "what's happening", we need to know "why it's happening". This is
where actionable results come from.
7. Be Sceptical
We too rarely question the answers we get from our data because traditional tools have
made data analysis so hard. We accept the first answer we get simply because exploring any
further is too hard. More powerful tools like Tableau give us the luxury of asking more
questions, as fast as we can think of them.
8. Respond
Simply answering questions for yourself has limited benefit. It's the ability to share our
data that leads to global enlightenment.
"The best software for data analysis is the software you forget you're using. It's such a natural
extension of your thinking process that you can use it without thinking about the mechanics."
- Stephen Few
CHAPTER 4
A QUICK STUDY OF THE DATASETS
SPOTIFY DATA SET:
The first data set has been taken from https://www.kaggle.com/nadintamer/top-spotify-tracks-of-2018
and consists of the top 100 songs of 2018. An overview of its features is as follows.
Energy: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity
and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has
high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to
this attribute include dynamic range, perceived loudness, timbre, onset rate, and general
entropy.
Key: The key the track is in. Integers map to pitches using standard Pitch Class notation, e.g. 0
= C, 1 = C♯/D♭, 2 = D, and so on.
Loudness: The overall loudness of a track in decibels (dB). Loudness values are averaged
across the entire track and are useful for comparing the relative loudness of tracks. Loudness is
the quality of a sound that is the primary psychological correlate of physical strength (amplitude).
Values typically range between -60 and 0 dB.
Mode: Mode indicates the modality (major or minor) of a track, the type of scale from which its
melodic content is derived. Major is represented by 1 and minor by 0.
Speechiness: Speechiness detects the presence of spoken words in a track. The more
exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the
attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken
words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech,
either in sections or layered, including such cases as rap music. Values below 0.33 most likely
represent music and other non-speech-like tracks.
Acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0
represents high confidence the track is acoustic.
Instrumentalness: Predicts whether a track contains no vocals. "Ooh" and "aah" sounds are
treated as instrumental in this context. Rap or spoken word tracks are clearly "vocal". The
closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal
content. Values above 0.5 are intended to represent instrumental tracks, but confidence is
higher as the value approaches 1.0.
Liveness: Detects the presence of an audience in the recording. Higher liveness values represent
an increased probability that the track was performed live. A value above 0.8 provides strong
likelihood that the track is live.
Valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track.
Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks
with low valence sound more negative (e.g. sad, depressed, angry).
Tempo: The overall estimated tempo of a track in beats per minute (BPM). In musical
terminology, tempo is the speed or pace of a given piece and derives directly from the average
beat duration.
Time Signature: An estimated overall time signature of a track. The time signature (meter) is a
notational convention to specify how many beats are in each bar (or measure).
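A quick way to sanity-check these documented ranges on the downloaded file is a few lines of pandas; the snippet below is a sketch that assumes the column names of the Kaggle CSV:

```
import pandas as pd

df = pd.read_csv("top2018.csv")
features = ["energy", "loudness", "speechiness", "acousticness",
            "instrumentalness", "liveness", "valence", "tempo"]

# Observed minimum, maximum and mean of each audio feature
print(df[features].agg(["min", "max", "mean"]).round(3))

# Pairwise correlations between the audio features
print(df[features].corr().round(2))
```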
FINANCIAL DATASET:
A simple overview of the dataset is as follows. It consists of two CSV files.
This data set contains the name of the company, the category in which the
investment has been made, the status of operation, the country where the company is
registered, and other details that are important while investing in a company or
place.
CHAPTER 5
BUILDING A DASHBOARD IN R
The flexdashboard library of R was used to build a Spotify dashboard from the Kaggle data set.
Here I have shown the analysis using different graphing libraries. The data set was well
structured, with no missing values. The idea of this chapter is to show that, even without much
coding, a dashboard can tell us many different things about a particular data set. The data set
was imported from the initial download folder and certain changes were made to the different
columns in accordance with the requirements of my dashboard. The dashboard was created
using R with the flexdashboard package along with plotly, ggcorrplot, knitr, ggvis,
highcharter, tidyr and DT.
1. Key Indicators
2. Overall Analysis
3. Music Analysis
4. Time Analysis
5. Dataset Table
6. Pivot Table
7. Summary
Analysis of the Dashboard
The first section of the dashboard gives us the details of the scales in which all the songs
were played. A majority of them are in the C sharp range, which is quite expected as this is
considered to be the normal range. 59% of the songs are in the major scale and the rest are in
the minor scale. The time signature bar indicates that most songs fall into the 2-beats-per-bar
category.
The second section shows the artists with the most songs and also the
correlation between the different parameters given in the data set. For instance, loudness and
energy have a very high positive correlation, which is characteristic of EDM and party
mixes.
The third section gives us the scores of the songs on the basis of danceability, energy and
loudness. The fourth section tells us the duration of the top 5 songs as well as the musical
tempo bands of the 100 songs, with most songs in the Allegro and Andante categories.
The fifth section is the data table of the entire data set, which gives us the raw data, and
the sixth section is the pivot table for further data exploration.
---
output:
  flexdashboard::flex_dashboard:
    orientation: rows
    source_code: embed
---

```{r setup, include=FALSE}
library(knitr)
library(DT)
library(tidyr)
library(rpivotTable)
library(openintro)
library(highcharter)
library(ggvis)
library(dplyr)
library(plotly)
library(ggcorrplot)
```
```{r}
df <- read.csv("/home/akum/Pictures/top2018.csv")
```
```{r}
```
Key Indicators
=====================================
Column {data-width=650}
-------------------------------------
22 | P a g e
### Best Spotify Artist 2018
```{r}
valueBox(df$artists[3],
icon = "fa-user")
```
```{r}
gauge(round(6, digits = 2),
      min = 0,
      max = 100)
```
Row
-------------------------------
```{r}
c <- sum(df$key == 0)
c2 <- sum(df$key == 1)
d <- sum(df$key == 2)
d2 <- sum(df$key == 3)
e <- sum(df$key == 4)
f <- sum(df$key == 5)
f2 <- sum(df$key == 6)
g <- sum(df$key == 7)
g2 <- sum(df$key == 8)
a <- sum(df$key == 9)
a2 <- sum(df$key == 10)
b <- sum(df$key == 11)
scale <- c(c, c2, d, d2, e, f, f2, g, g2, a, a2, b)
plot_ly(
  x = c('C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B'),
  y = scale,
  type = "bar"
)
```
```{r}
m1 <- sum(df$mode == 1)   # major
m2 <- sum(df$mode == 0)   # minor
plot_ly(labels = c("Major", "Minor"),
        values = c(m1, m2),
        type = "pie")
```
```{r}
t1 <- sum(df$time_signature == 3)
t2 <- sum(df$time_signature == 4)
t3 <- sum(df$time_signature == 5)
plot_ly(
  x = c('3', '4', '5'),
  y = c(t1, t2, t3),
  type = "bar"
)
```
Overall Analysis
========================================
Row
--------------------------------
```{r}
df %>%
  group_by(artists) %>%
  summarise(freq = n()) %>%   # number of top-100 tracks per artist
  arrange(desc(freq)) %>%
  slice(1:10) %>%
  ggplot(aes(reorder(artists, +freq), freq)) +
  geom_col() +
  coord_flip() +
  labs(x = "", y = "Number of tracks")
```
```{r}
corr <- round(cor(df[, sapply(df, is.numeric)]), 1)
ggcorrplot(corr)
```
Music Analysis
========================================
### Danceability
```{r}
df %>%
  arrange(desc(danceability)) %>%
  slice(1:5) %>%
  ggplot(aes(reorder(name, danceability), danceability)) +
  geom_col() +
  coord_flip() +
  labs(x = "", y = "Score")
```
### Energy
```{r}
df %>%
  arrange(desc(energy)) %>%
  slice(1:5) %>%
  ggplot(aes(reorder(name, energy), energy)) +
  geom_col() +
  coord_flip() +
  labs(x = "", y = "Score")
```
### Loudness
```{r}
df %>%
  arrange(desc(loudness)) %>%
  slice(1:5) %>%
  ggplot(aes(reorder(name, loudness), loudness)) +
  geom_col() +
  coord_flip() +
  labs(x = "", y = "Score")
```
Dataset Table
=====================================
```{r}
datatable(df[, 2:16],
          rownames = TRUE,
          filter = "top")
```

Pivot Table
=====================================
```{r}
rpivotTable(df,                      # pivot on columns present in this data set
            aggregatorName = "Count",
            cols = "mode",
            rows = "key",
            rendererName = "Heatmap")
CHAPTER 6
VISUALISATION WITH PYTHON AND TABLEAU PUBLIC DASHBOARD
We start by importing the required libraries that will be used for the analysis and then getting
an overview of the data set. We find that there are 114,949 rows in the rounds.csv file, with
columns such as permalink, name, url, category_list, status, country_code, state_code, region,
city and founded_at. The companies.csv file has 66,368 rows. All the entries were converted
to lower case as a precaution. Further exploration was done, and finally we obtained two data
frames with the same number of entries.
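A minimal sketch of this preparation step in pandas follows; the file names, encoding and join columns are assumptions based on the description above:

```
import pandas as pd

rounds = pd.read_csv("rounds.csv", encoding="latin-1")
companies = pd.read_csv("companies.csv", encoding="latin-1")

# Convert the join keys to lower case as a precaution before merging
rounds["company_permalink"] = rounds["company_permalink"].str.lower()
companies["permalink"] = companies["permalink"].str.lower()

# Keep only rows present in both files so the two frames line up
master = pd.merge(companies, rounds,
                  left_on="permalink", right_on="company_permalink", how="inner")
print(master.shape)
```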
Missing Values Treatment
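The treatment itself can be as simple as the following sketch; the column names are placeholders for the mostly empty and essential fields identified during exploration:

```
# Count missing values per column, worst first
print(master.isnull().sum().sort_values(ascending=False))

# Drop a mostly-empty column, then rows still missing essential fields
master = master.drop(columns=["funding_round_code"], errors="ignore")
master = master.dropna(subset=["raised_amount_usd", "country_code"])
print(master.shape)
```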
Python was used for exploratory data analysis. Using plotly, matplotlib, ggplot and seaborn,
several plots of the datasets were obtained, which helped in assessing many details of the
data at hand.
The following boxplot shows the amount of money invested in different types of investment
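A sketch of how such a boxplot can be produced with seaborn is given below; the column names are assumptions consistent with the merge sketched earlier:

```
import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 5))
sns.boxplot(data=master, x="funding_round_type", y="raised_amount_usd")
plt.yscale("log")            # amounts span several orders of magnitude
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
```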
Top five English-speaking countries with the highest investments
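One way to compute such a ranking is sketched below; the list of English-speaking country codes is an illustrative assumption:

```
english_speaking = ["USA", "GBR", "IND", "CAN", "AUS", "NZL", "IRL"]

top5 = (master[master["country_code"].isin(english_speaking)]
        .groupby("country_code")["raised_amount_usd"]
        .sum()
        .sort_values(ascending=False)
        .head(5))
print(top5)
```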
The Excel file was then used to create an interactive dashboard using the Tableau Public
software. The output is shown below.
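Creating that Excel file from the cleaned frame is a one-liner in pandas (a sketch; the file name is illustrative and the openpyxl engine is assumed to be installed):

```
# Write the cleaned master frame to an Excel file for the Tableau Public dashboard
master.to_excel("master_frame.xlsx", index=False)
```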
CHAPTER 7
MERN
MERN is one of the most relevant stacks in the world, growing fast every day and acquiring
many developers around the globe in a huge community. The main thing to know
is that with the MERN stack we work entirely in JavaScript. With that said, let me explain what
each letter of the acronym means.
● MongoDB: A document-based open-source database that provides scalability
and flexibility.
● Express JS: A structured base designed to develop web applications and APIs.
● React JS: A JavaScript front-end library for building user interfaces, maintained by
Facebook.
● Node JS: A JavaScript runtime built on Chrome's V8 JS engine.
Let's create our project's main directory. This will hold the code for both the front and
back end of our app; we also initialise it as an npm package so that a package.json file is generated.
mkdir mernapp
cd mernapp
npm init -y
A package.json file will be created, and we will keep adding to it all of the packages
we use. Create a new file that will serve as the main code for the
back end and name it server.js, then type the server code into it. This back-end code is
deliberately basic; it was created so that beginners can focus on the code's intent rather
than its complexity, and can easily manipulate it afterwards once they have wrapped their
heads around it. I have put comments beside every method for ease of understanding. The
package.json file should look like this at the end.
2. Setting up MongoDB
First, head over to MongoDB Atlas and create an account there. MongoDB Atlas lets us
use a free 500 MB MongoDB database and access it remotely. It is also hosted in the
cloud; this is the current trend of our industry, and acquiring skills that enable us to use a
cloud database is a real asset nowadays. After setting up your account, log in.
Follow the steps prompted by the website to create your own cluster and
cluster/database users. Here is the checklist of steps to create your own MongoDB
database.
These are the code files that will help us connect to the server.
The Database.js file helps us connect to MongoDB through Mongoose. The following are
the schemas for the Post, Profile and User models, where all the details that a user inputs and
saves will be stored in MongoDB Atlas.
HEROKU SETUP AND DEPLOYMENT
CHAPTER 8
CONCLUSION
We find that there are many visualisation libraries available for both Python and R, which are
the most popular languages among data scientists. The choice of library depends
upon the type of data one is dealing with, the purpose of analysing the data and how the
results will be used in future. As seen, Tableau is a very powerful tool for visualising
any data without any coding, but it falls short when a desired button or function is missing,
whereas in Python and R we can bring almost any function to life, though this job can sometimes
be tedious. MongoDB is a NoSQL database that is becoming very popular these days because
of its design. It is easily integrated into any back-end engine and provides a schemaless
database management system. MERN, which is emerging as a popular stack for developing web
applications, was implemented to mimic a working site where users' data is stored in
MongoDB Atlas. Thus we find that data collection and data visualisation form some of the
most important elements of data science.
REFERENCES
1. https://medium.com/datavisualization
2. https://medium.com/swlh/how-to-create-your-first-mern-mongodb-express-js-react-js-and-node-js-stack-7e8b20463e66
3. MongoDB University
4. Stephen Few, Information Dashboard Design: Displaying Data for At-a-glance Monitoring, Analytics Press, 2013. ISBN 9781938377006.