Exploring The Intersection of Deep Learning and Big Data
Yash Adinath Patil, 23306A1046, MscIT, Vidyalankar School of Information Technology, [email protected]
Harshad Rane, 23306A1027, MscIT, Vidyalankar School of Information Technology, [email protected]
Abstract— Big Data and Deep Learning are currently among the most actively researched topics. Big Data plays an important part in many industries, where large amounts of data are analyzed to support decision making. Many corporations such as Google, Microsoft, and Amazon use big data analysis to support their various products. Deep Learning is a subfield of Machine Learning that is inspired by the working of the human brain. Deep Learning methods can identify and extract complex and non-linear patterns from large amounts of data more efficiently than traditional methods, owing to their multiple layers, which can analyze data to a greater depth. Big Data can benefit greatly from the analytic power of Deep Learning, since most of the data is in an unstructured format from which extracting information is difficult. Deep Learning also helps avoid tedious tasks like feature engineering, as deep learning methods learn the features by themselves. Deep Learning models require large amounts of training data, which is another point in favor of using deep learning on Big Data. This paper gives an overview of Big Data and why it is important, and explains why Deep Learning methods are the better choice for Big Data. Some applications are also mentioned, and comparisons between traditional and Deep Learning methods are shown to justify the use of Deep Learning in Big Data.

Keywords— Big Data, Deep Learning

I. INTRODUCTION

Big Data is one of the biggest trends in the ever-growing industry of information technology. With the advancements in communication and networking, the amount of data produced is increasing at a rapid rate. Most of this data is unstructured and thus cannot be used directly by computers for analysis, prediction, and other jobs that are important from a business perspective. This is where Big Data comes into the picture, with its capability to handle a large amount of data. With the help of Big Data, many domains have seen a lot of advancements.

While Big Data has great potential to drive further advancements in many domains, extracting useful information with Big Data tools alone becomes harder as the amount of data grows. Machine learning algorithms have shown favorable results for knowledge extraction from Big Data. They help provide predictive power to Big Data in fields like medicine, astronomy, and finance. While traditional machine learning methods have proved beneficial, they fail to meet the demand for more predictive power and greater performance. Here, a subfield of machine learning called Deep Learning proves beneficial.

Deep Learning methods have been inspired by the working of the human brain and have helped in making advancements in many fields like medicine, NLP, computer vision, and many more. Deep Learning models try to simulate how the human brain analyses, interprets, and manipulates text, images, and voice. A Deep Learning model differs from a traditional Artificial Neural Network (ANN) in that it has multiple hidden layers to process massive amounts of data. Traditional machine learning algorithms may not be as effective in extracting the complex or non-linear information that is usually encountered in Big Data, making Deep Learning the more favorable choice when dealing with Big Data.

In this paper, we briefly discuss Big Data, its 3 Vs, and its advantages. Then we discuss Deep Learning, its advantages, and its applications. After that, we discuss the application of Deep Learning on Big Data and a comparison between traditional and Deep Learning methods, along with some use cases.

II. BIG DATA

Big data is the term used to describe data that is huge in size and growing rapidly with time. This data is so large and complex that traditional data management tools are not capable of storing or processing it efficiently. Big data analytics is a growing trend and will likely remain so for many years to come, as the influx of data keeps increasing with technological advancement.

Big Data has integrated itself into our day-to-day life, with social media providing the largest share of Big Data. This data, if analyzed properly, can benefit the decision-making process.

A. The 3 Vs of Big Data

Figure 1. The 3 Vs of Big Data [5]

1) Volume
There is an exponential growth in data storage, as data is no longer just in text form; it can also be in image, video, or audio form. It is common for big enterprises to
have storage of petabytes. It is said that this storage may soon reach zettabytes with the ever-increasing speed of information generation. Big Data can handle this huge amount of data.

2) Velocity
With the advancement of communication devices, the speed at which data is generated is increasing rapidly. There was a time when data from the previous day was the latest data, but now, with social media and other platforms for sharing information, data that is 10 minutes old can be considered outdated. Big Data technologies were developed to deal with this high influx of data; to achieve near real-time processing, some external tools need to be used along with them.

3) Variety
Data can come in various forms from various sources, whereas traditional analysis systems can handle only one type of data. Data can be in textual, video, or audio form. Big Data can handle data of various forms.

B. Advantages
1) Saves Time
Earlier, analyzing a large amount of data used to take a lot of time. Big Data tools are capable of faster and more accurate analysis of data, thus saving a lot of time.
2) Better business decision
Big data can help in analyzing market trends and predicting future trends, helping the organization make better business decisions.
3) Reputation control
Big corporations can use Big Data to perform sentiment analysis to keep track of their reputation and use that analysis to improve it further.
4) Increase efficiency
Big Data tools allow us to increase efficiency by bringing all the services into one system.

III. DEEP LEARNING

Artificial Intelligence is helping us narrow the gap between humans and machines. AI has touched many fields and has enabled many advancements in them. Data analytics is one of the fields in which AI, through its sub-domain Machine Learning, has helped reduce human intervention, thus saving time. However, with more unstructured data, traditional ML algorithms are less efficient in extracting complex and non-linear insights. To overcome this, Deep Learning methods were developed.

Deep Learning methods aim to simulate human analysis of data that is not in the form that traditional ML algorithms operate on. Deep Learning methods are quite similar to ANNs but have more hidden layers, which help them learn more features and thus extract more complex patterns from the data.

While Deep Learning methods have been around for a long time, they were not as feasible and efficient as they are today due to limitations of hardware and training data sets. With technological advancements like Graphics Processing Units, which helped overcome the hardware limitation, Deep Learning methods have gained a lot of clout. The real trend of Deep Learning methods started when a Deep Learning method won the ImageNet competition in 2012, defeating the other methods by quite a big margin.

Using Deep Learning, we can automate complex feature extraction from unknown data. Deep learning methods give more accurate results for classification problems.

A. Advantages of Deep Learning methods
1) Good with unstructured data
Deep learning methods can extract complex and non-linear patterns from unstructured data where traditional Machine Learning methods fail to do so.
2) Less requirement of labeled data
Deep Learning methods are better at unsupervised learning, so they depend less on labeled data. Traditional methods generally do not achieve results as good as those of deep learning methods in this setting.
3) Elimination of feature engineering
Traditional methods require users to engineer new features and retrain the model to improve accuracy. This is not the case with deep learning models, as they are capable of learning features by themselves.
4) More robust
Today, data can be in any form. Traditional methods can work only on the type of data they are trained on, whereas deep learning methods work even with variations in the data.

B. Applications
1) Computer Vision
Convolutional Neural Networks (CNNs) have brought a massive improvement to the domain of computer vision. They can help in various computer vision tasks like object detection, human-object interaction detection, image classification, etc. They can also help in medical imaging tasks like cancer detection.
2) Industrial Automation
Deep Learning can be used to automate the production line, reducing human effort as well as saving time and money. It can also be used to ensure worker safety by automatically detecting if a worker is too close to unsafe machinery.
3) Information retrieval
Deep Learning methods can be used in search engines to provide information that is more relevant to the search query.
4) Natural Language Processing
NLP requires a lot of training to get the desired output with traditional methods. Deep Learning methods save the time that would be spent on feature engineering while giving more accurate results compared to traditional methods.
5) Autonomous Vehicles
Many automobile companies have employed Deep Learning methods to build self-driving cars.

IV. DEEP LEARNING IN BIG DATA

Big Data deals with a large amount of unstructured and unlabeled data, which can be in any form and is increasing at an alarming rate. To make any sense of this data, we need to perform an analysis. Machine learning can
be used to extract useful information from it, but as the data becomes more and more unstructured, traditional methods fail to function as efficiently and accurately. If any new feature is introduced, a traditional method will fail to consider that feature.

The biggest strength of Deep Learning is its ability to extract complex and non-linear patterns from a large amount of data without human intervention, which makes it a great tool for Big Data. So, if a new type of data is encountered, the Deep Learning model will attempt to learn it without human intervention.

A. Application of Deep Learning in Big Data
1) Natural Language Processing
Most of the data generated in Big Data is in an unstructured format, and extracting information from this type of data is not easy. While traditional NLP models have been useful, Deep Learning models are trained to look for deeper patterns that could otherwise be found only by experts. This analysis is more accurate than the traditional method, as Deep Learning considers not only linear but also non-linear features.

2) Semantic indexing
Information retrieval is one of the important tasks of Big Data Analytics. Efficient storage and retrieval of data are becoming a problem because a huge amount of data in text, image, video, and audio formats is being collected and made available in various fields. Hence, traditional strategies that were used for information storage and retrieval are becoming less efficient.
Semantic indexing can be employed to improve knowledge discovery and comprehension, thus making retrieval more efficient. A deep learning method is used to obtain a high-level abstraction of the data, which can be used to generate a semantic tag for indexing.

3) Discriminative task
To perform discriminative tasks in Big Data, Deep Learning can be used to extract non-linear features from the data; these non-linear features are then used to perform the discriminative task with a linear model, making the process efficient. Using Deep Learning in this process adds the benefit of including non-linear features of the data in the analysis, which improves accuracy.

4) Semantic image and video tagging
Deep Learning techniques can be useful in semantic tagging. Deep Learning methods can provide segmentation and annotation of complex image scenes. They can also be used to tag videos with specific action scenes, which can facilitate easy storage and retrieval of images and video later for tasks like searching and recommendation.

beneficial for various image processing tasks. Many big companies that make use of big data have employed Deep Learning for various image-related tasks. Google makes use of deep learning to recommend videos to its users based on the videos they have watched. Face recognition is also a big application of deep learning in Computer Vision, with Facebook’s DeepFace and Google’s FaceNet being the most popular examples. Object detection, which refers to detecting and classifying objects in an image, uses the Convolutional Neural Network, which is a popular deep learning method. Object detection can be used to group images with similar objects in Big Data.

C. Finance
Big Data and Deep Learning can be used for financial tasks like stock prediction using market trends. In this task, the model uses news, old stock records, and market behavior to predict future stock prices. Deep learning can also be used in banks for fraud detection in credit card and bank transactions. It can be used for customer segmentation, which a company can use to plan a better advertisement strategy. Many international banks use Deep Learning methods to detect money laundering.

D. Automobile Industry
This industry collects a lot of data to help companies plan for capacity, revenue, pricing, demand, and customer service using predictive analysis. Deep Learning can help in this predictive analysis. It can be used to predict user satisfaction. Deep learning has also helped in building autonomous vehicles, which need to analyze a lot of rules to determine the action of the vehicle.

E. Marketing and Advertising
Marketing and advertising aim to communicate with, educate, and convince consumers to buy a specific service or product. Deep Learning can help customers better search for products they might need. It can also be used in a recommendation engine to suggest products that users are more likely to buy, and to target customers with personalized ads for those products. Netflix and Amazon are some of the popular companies that make use of deep learning for recommendation.

F. Medical
Medical institutions handle a lot of data daily from pathological reports, drug reports, diagnosis reports, etc. The selection of a proper diagnosis procedure can be very difficult. Deep Learning techniques can help immensely in this endeavor, as they can support a more accurate diagnosis of the patient using the medical report data. Google’s DeepMind and IBM’s Watson are actively working in the medical domain and producing excellent results.
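The discriminative-task recipe described in Section IV-A (item 3), non-linear feature extraction followed by a linear model, can be sketched on a toy problem. The sketch below is only an illustration under stated assumptions: the XOR data set, the hand-set two-unit ReLU layer in `hidden_features` (standing in for features that a trained deep network would learn), and the perceptron used as the linear model are all hypothetical choices that do not come from the paper.

```python
"""Non-linear features followed by a linear model, shown on the XOR toy problem."""


def relu(z):
    return z if z > 0.0 else 0.0


def hidden_features(x1, x2):
    # Assumption: hand-set hidden layer standing in for features that a
    # trained deep network would learn from data.
    return [relu(x1 + x2), relu(x1 + x2 - 1.0)]


def train_perceptron(samples, labels, epochs=50):
    """Fit a linear threshold model with the classic perceptron update rule."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in zip(samples, labels):
            score = sum(w * f for w, f in zip(weights, features)) + bias
            predicted = 1 if score > 0.0 else 0
            error = label - predicted
            if error != 0:
                weights = [w + error * f for w, f in zip(weights, features)]
                bias += error
                mistakes += 1
        if mistakes == 0:  # every training sample is classified correctly
            break
    return weights, bias


def accuracy(weights, bias, samples, labels):
    correct = 0
    for features, label in zip(samples, labels):
        score = sum(w * f for w, f in zip(weights, features)) + bias
        correct += int((1 if score > 0.0 else 0) == label)
    return correct / len(labels)


RAW_INPUTS = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
XOR_LABELS = [0, 1, 1, 0]
DEEP_FEATURES = [hidden_features(x1, x2) for x1, x2 in RAW_INPUTS]

# The same linear learner is trained twice: on raw inputs and on the features.
raw_model = train_perceptron(RAW_INPUTS, XOR_LABELS)
feature_model = train_perceptron(DEEP_FEATURES, XOR_LABELS)
raw_accuracy = accuracy(*raw_model, RAW_INPUTS, XOR_LABELS)
feature_accuracy = accuracy(*feature_model, DEEP_FEATURES, XOR_LABELS)

print("linear model on raw inputs:", raw_accuracy)
print("linear model on non-linear features:", feature_accuracy)
```

No linear model can separate XOR on the raw inputs, so the first model cannot classify all four points correctly, while the same learner separates the classes once it operates on the non-linear features. In a real Big Data setting the feature extractor would be a trained deep network rather than fixed weights.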
while enabling collaborative learning across multiple devices.

Overall, by incorporating federated learning into the meta-learning framework for big data analysis, the paper aims to achieve better privacy protection, scalability, performance, and resource efficiency in analyzing large and sensitive datasets.

V. USE CASES

The following is a real-life example of federated meta-learning.

Imagine a scenario where multiple hospitals want to collaborate and improve the accuracy of their disease prediction models without sharing patient data. Each hospital has its own dataset consisting of patient records, including medical history, symptoms, and test results. However, due to privacy concerns and legal restrictions, they cannot directly share this sensitive patient information with each other or a central server.

In this case, federated meta-learning can be applied. Each hospital can maintain its data locally and participate in a federated learning setup. The hospitals collaborate by sharing only the model updates rather than the raw data. The process would work as follows:

Initialization: Initially, each hospital trains its own local machine learning model using its local data. This serves as the starting point.
Collaboration: The hospitals then send the model updates (weights and gradients) to a central coordinating server.
Aggregation: The coordinating server collects the model updates from each hospital and aggregates them using a meta-learning algorithm. This algorithm takes into account the different models' performances and characteristics to generate a global model update.
Distribution: The updated global model is sent back to each hospital, and they incorporate the global model update into their local models.
Iteration: The process of collaboration, aggregation, and distribution is repeated for multiple iterations, allowing the models to learn from the collective knowledge of all the hospitals while preserving data privacy.

Through this federated meta-learning approach, hospitals can improve their disease prediction models by leveraging the collective knowledge from diverse datasets while ensuring patient privacy. Each hospital benefits from the experiences and insights gained from other hospitals' data without directly sharing sensitive information. This collaborative process can lead to enhanced accuracy and robustness of disease prediction models across multiple healthcare institutions.

VI. CONCLUSION

Big Data has been successful in managing and analyzing huge amounts of data. However, as data increases at a rapid rate, traditional analysis methods become less useful, and using deep learning for analysis in big data can help with this. Deep Learning methods extract complex and non-linear features of data, which helps in increasing analysis accuracy. From the comparisons collected, it can be concluded that Deep Learning methods offer greater accuracy than the traditional methods, making them more appropriate for use in Big Data. The application of Deep Learning in Big Data is very limited at the moment, but as it shows great potential, more research needs to be conducted to include it in more Big Data processing tasks.

VII. ACKNOWLEDGMENT

We would like to express our special thanks to Prof. Dr. Ujwala Madhav Sav, the H.O.D. of our Information Technology (MscIT) department, who gave us the opportunity to write this research paper, through which we learned new concepts and their applications.

Finally, we would like to express our special thanks to our Principal, who gave us the opportunity and facilities to conduct this research.

REFERENCES
[1] Ahmad, Jamil & Farman, Haleem & Jan, Zahoor. (2019). Deep Learning Methods and Applications. 10.1007/978-981-13-3459-7_3.
[2] “Deep learning definition, algorithms, models, applications & advantages” by Hebba Soffar, published April 28, 2019, last updated August 29, 2019, https://www.online-sciences.com/robotics/deep-learning-definition-algorithms-models-applications-advantages/
[3] Yu, B., Li, S., et al.: Deep learning: a key of stepping into the era of big data. J. Eng. Stud. 20–45 (2014).
[4] “3 applications of Deep Learning in Big Data analytics” by Naveen Joshi, last updated Thursday, 23 March 2017, https://www.allerin.com/blog/3-applications-of-deep-learning-in-big-data-analytics.
[5] “BIG DATA: THE 3 VS EXPLAINED” by bigdataldn, https://bigdataldn.com/intelligence/big-data-the-3-vs-explained/.
[6] Socher, Richard & Lin, Cliff & Ng, Andrew & Manning, Christopher. (2011). Parsing Natural Scenes and Natural Language with Recursive Neural Networks. Proceedings of the 28th International Conference on Machine Learning, ICML 2011. 129-136.
[7] Yan, Yan & xu, Yin & Zhang, Bo-Wen & Yang, Chun & Hao, Hong-Wei. (2016). Semantic indexing with deep learning: a case study. Big Data Analytics. 1. 7. 10.1186/s41044-016-0007-z.
[8] Li, Jun & Lin, Daoyu & Wang, Yang & Xu, Guangluan & Zhang, Yunyan & Ding, Chibiao & Zhou, Yanhai. (2020). Deep Discriminative Representation Learning with Attention Map for Scene Classification. Remote Sensing. 12. 1366. 10.3390/rs12091366.
[9] "Overview and benchmark of traditional and deep learning models in text classification" by Ahmed Besbes, https://www.kdnuggets.com/2018/07/overview-benchmark-deep-learning-models-text-classification.html
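As a supplementary illustration, the federated procedure from the hospital use case in Section V (local training, sending updates, aggregation, distribution, iteration) can be sketched in a few lines. This is a minimal sketch under loudly stated assumptions, not the method of any cited work: the meta-learning aggregation is replaced by plain sample-weighted averaging of model parameters (in the style of federated averaging), the hospitals send weights rather than gradients, all rounds start from a shared zero-initialized model, and the "patient records" are synthetic points on the line y = 2x + 1. All function and variable names are hypothetical.

```python
"""Toy federated training loop: raw data never leaves a hospital."""


def local_train(weight, bias, xs, ys, learning_rate=0.1, steps=5):
    """Run a few gradient-descent steps of y = weight * x + bias on local data."""
    n = len(xs)
    for _ in range(steps):
        errors = [weight * x + bias - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


def aggregate(local_models, sample_counts):
    """Assumption: sample-weighted averaging stands in for the meta-learning step."""
    total = sum(sample_counts)
    weight = sum(w * n for (w, _), n in zip(local_models, sample_counts)) / total
    bias = sum(b * n for (_, b), n in zip(local_models, sample_counts)) / total
    return weight, bias


def run_federated(hospital_data, rounds=100):
    global_weight, global_bias = 0.0, 0.0  # shared starting model (assumption)
    counts = [len(xs) for xs, _ in hospital_data]
    for _ in range(rounds):
        # Distribution and local training: each hospital starts from the global model.
        local_models = [
            local_train(global_weight, global_bias, xs, ys)
            for xs, ys in hospital_data
        ]
        # Collaboration and aggregation: only model parameters reach the server.
        global_weight, global_bias = aggregate(local_models, counts)
    return global_weight, global_bias


def make_records(xs):
    # Synthetic stand-in for private records: targets follow y = 2x + 1.
    return xs, [2.0 * x + 1.0 for x in xs]


HOSPITAL_DATA = [
    make_records([0.0, 0.5, 1.0]),
    make_records([0.2, 1.0, 1.5, 2.0]),
    make_records([0.3, 1.2]),
]

global_weight, global_bias = run_federated(HOSPITAL_DATA)
print("global model:", global_weight, global_bias)
```

Because every synthetic data set lies on the same line, the shared model should approach a weight of 2 and a bias of 1 even though no hospital sees another hospital's records. A performance-aware aggregation, as described in the use case, would replace the `aggregate` function.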