
Established as per the Section 2(f) of the UGC Act, 1956

Approved by AICTE, COA and BCI, New Delhi

UNIT-1
Big Data Science and Machine
Intelligence
School of Computer Science and Engineering

SethuMadhavi.R
[email protected]

AY: 2023-2024
INTRODUCTION TO BIG DATA

➢ What is Data?
The quantities, characters, or symbols on which operations are performed by a computer, which may be
stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical
recording media.

➢ What is Big Data?


• Big Data is also data, but of huge size. Big Data is a term used to describe a collection of data that is huge in volume and yet growing exponentially with time. In short, such data is so large and complex that none of the traditional data management tools can store or process it efficiently.

• "Extremely large data sets that may be analyzed computationally to reveal patterns, trends and associations, especially relating to human behavior and interactions, are known as Big Data."
EXAMPLES OF BIG DATA

1. Following are some examples of Big Data:

❑ The New York Stock Exchange generates about one terabyte of new trade data per day.
❑ Social Media

1. Statistics show that 500+ terabytes of new data get ingested into the databases of the social media site Facebook every day. This data is mainly generated from photo and video uploads, message exchanges, posting comments, etc.
TWITTER
TABULAR REPRESENTATION OF VARIOUS
MEMORY SIZES
TYPES OF DIGITAL DATA

1.Structured

2.Unstructured

3.Semi-structured
STRUCTURED

Structured
1. Any data that can be stored, accessed and processed in the form of a fixed format is termed 'structured' data.
2. Over time, talent in computer science has achieved great success in developing techniques for working with such data (where the format is well known in advance) and deriving value out of it.
3. However, nowadays we foresee issues when the size of such data grows to a huge extent, with typical sizes in the range of multiple zettabytes.
EXAMPLES OF STRUCTURED
DATA

1. An 'Employee' table in a database is an example of Structured Data
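As a minimal sketch of what "fixed format" means (using Python's built-in sqlite3 module; the schema and the sample row are illustrative, not taken from the original table):

import sqlite3

# An in-memory database with a fixed-schema 'Employee' table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        gender TEXT,
        dept   TEXT,
        salary REAL
    )
""")
# Every row must conform to the schema; this is what makes the data 'structured'.
conn.execute("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
             (2365, "Rajesh Kulkarni", "Male", "Finance", 650000.0))
for row in conn.execute("SELECT name, dept FROM Employee"):
    print(row)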


UNSTRUCTURED

1. Unstructured
▪ Any data with an unknown form or structure is classified as unstructured data.
▪ In addition to its huge size, unstructured data poses multiple challenges in terms of processing it to derive value from it.
▪ A typical example of unstructured data is a heterogeneous data source containing a combination of simple text files, images, videos, etc.
▪ Nowadays, organizations have a wealth of data available to them but, unfortunately, they don't know how to derive value out of it, since this data is in its raw form or unstructured format.
EXAMPLES OF UN-STRUCTURED DATA

1. The output returned by 'Google Search'


SEMI-STRUCTURED

❑ Semi-structured
▪ Semi-structured data can contain both forms of data.
▪ We can see semi-structured data as structured in form, but it is actually not defined with, for example, a table definition as in a relational DBMS.
▪ An example of semi-structured data is data represented in an XML file.
EXAMPLES OF SEMI-STRUCTURED DATA

1. Personal data stored in an XML file-


<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>

<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>

<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>

<rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>

<rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
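Such records can be processed programmatically. A minimal sketch using Python's standard xml.etree.ElementTree module (the records are wrapped here in a root element, which valid XML requires):

import xml.etree.ElementTree as ET

# The <rec> records must be wrapped in a single root element to form valid XML.
doc = """<people>
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
</people>"""

root = ET.fromstring(doc)
for rec in root.findall("rec"):
    # Tags make the data self-describing, even without a fixed schema.
    print(rec.findtext("name"), rec.findtext("sex"), rec.findtext("age"))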
BIG DATA ANALYTICS
➢Big Data Analytics:
▪ Big Data analytics is the process of collecting, organizing and analyzing
large sets of data (called Big Data) to discover patterns and other useful
information.
▪ Big Data analytics can help organizations to better understand the
information contained within the data and will also help identify the data
that is most important to the business and future business decisions.
Analysts working with Big Data typically want the knowledge that comes
from analyzing the data.
THE CHALLENGES:
▪ For most organizations, Big Data analysis is a challenge. Consider the sheer volume of data and the different formats of the data (both structured and unstructured) that is collected across the entire organization, and the many different ways different types of data can be combined, contrasted and analyzed to find patterns and other useful business information.
▪ The first challenge is in breaking down data silos to access all data an
organization stores in different places and often in different systems.
▪ A second challenge is in creating platforms that can pull in unstructured
data as easily as structured data.
▪ This massive volume of data is typically so large that it's difficult to process
using traditional database and software methods.
APPLICATION OF BIG DATA
HERE IS THE LIST OF TOP BIG DATA APPLICATIONS IN
TODAY’S WORLD:
• Big Data in Healthcare

• Big Data in Education

• Big Data in E-commerce

• Big Data in Media and Entertainment

• Big Data in Finance

• Big Data in Travel Industry

• Big Data in Telecom

• Big Data in Automobile


▪ For retailers, even a minute detail about any customer has now become significant. They are now closer to their customers than they have ever been. This empowers them to provide customers with more personalized services and to predict their demands in advance.
▪ This helps them build a loyal customer base. Some of the biggest names in the retail world, like Walmart, Sears Holdings, Costco and Walgreens, now have Big Data as an integral part of their organizations.
▪ Big Data also helps to speed up the treatment process in healthcare.
▪ Smart wearables have gradually gained popularity and are the latest trend among people of all age groups. They generate massive amounts of real-time data in the form of alerts, which helps save people's lives.
3. Big Data in Education
▪ When you ask people about the use of the data that an educational institute gathers, the majority will give the same answer: that the institute or the student might need it for future reference.
▪ Perhaps you had the same perception about this data. But in fact, this data holds enormous importance. Big Data is key to shaping people's futures and has the power to transform the education system for the better.
▪ Some of the top universities are using Big Data as a tool to renovate their academic curricula. Universities can even track student dropout rates and take the required measures to reduce those rates as much as possible.
4. Big Data in E-commerce
▪ One of the greatest revolutions this generation has seen is that of E-commerce. It is now part and parcel of our routine life. Whenever we need to buy something, the first thought that comes to mind is E-commerce. And, no surprise, Big Data has been the face of it.
▪ That some of the biggest E-commerce companies of the world, like Amazon, Flipkart and Alibaba, are now bound to Big Data and analytics is itself evidence of the level of popularity Big Data has gained in recent times.
▪ Companies make suggestions to customers accordingly. Customers now experience more personalized services than they have ever had. Big Data has completely redefined people's online shopping experience.
5. Big Data in Media and Entertainment
▪ The Media and Entertainment industry is all about art, and employing Big Data in it is a sheer piece of art. Art and science are often considered two completely contrasting domains, but when employed together they make a formidable duo, and Big Data's endeavors in the media industry are a perfect example of it.
▪ Viewers these days want content matched to their choices, content that is relatively new compared to what they saw the previous time. Earlier, companies broadcast ads randomly, without any kind of analysis.
▪ Customers are now the real heroes of the Media and Entertainment industry, courtesy of Big Data and analytics.
6. Big Data in Finance
▪ The functioning of any financial organization depends heavily on its data, and safeguarding that data is one of the toughest challenges any financial firm faces. Data has been the second most important commodity for them after money.
▪ Digital banking and payments are two of the most trending buzzwords around, and Big Data has been at the heart of them. Big Data drives key areas of financial firms such as fraud detection, risk analysis, algorithmic trading and customer contentment.
▪ This has brought much-needed fluency to their systems. They are now empowered to focus more on providing better services to their customers rather than on security issues. Big Data has enhanced the financial system with answers to its hardest challenges.
7. Big Data in Travel Industry
▪ While Big Data has spread like wildfire and various industries have profited from it, the travel industry was a bit late to realize its worth. Better late than never, though. A stress-free traveling experience is still a daydream for many.
▪ Big Data's arrival is like a ray of hope that will remove the hindrances to a smooth traveling experience.
▪ From providing travelers with the best offers to making suggestions in real time, Big Data is certainly a perfect guide for any traveler. Big Data is gradually taking the window seat in the travel industry.
8. Big Data in Telecom
▪ The telecom industry is the soul of every digital revolution that takes place around the world. The ever-increasing popularity of smartphones has flooded the telecom industry with massive amounts of data.
▪ This data is like a goldmine; telecom companies just need to know how to dig it properly. Through Big Data and analytics, companies are able to provide customers with smooth connectivity, eradicating the network barriers that customers have to deal with.
▪ With the help of Big Data and analytics, companies can track the areas with the lowest as well as the highest network traffic and act to ensure hassle-free network connectivity.
▪ As in other industries, Big Data has helped the telecom industry understand its customers pretty well.
▪ Telecom companies now provide customers with offers that are as customized as possible.
▪ Big Data has been behind the data revolution we are currently experiencing.
Enabling Technologies for Big Data
Computing
Data Science and Related
Disciplines
THE EVOLUTION OF BIG DATA
DATA SCIENCE AND RELATED DISCIPLINES

• Big data possesses three important characteristics:

➢ Volume - data in large volume,
➢ Velocity - demanding high velocity to process it,
➢ Variety - many varieties of data types.

• There are two additional characteristics:

➢ Veracity - refers to the difficulty of tracing or predicting the data,
➢ Value - which can vary drastically if the data are handled differently.
DATA SCIENCE AND RELATED DISCIPLINES
1. In 1968, data science was also called datalogy, the science of dealing with data.
2. In 1997, data science was considered equivalent to statistics, such as Bayesian clustering and categorizing of data.
3. KDD (knowledge discovery in databases) has been popular since 2001; it has two parts:
1. Data mining,
2. Knowledge discovery
4. Big data - the extraction of actionable knowledge directly from data through a process of discovery, hypothesis formation and analytical hypothesis analysis.
THE EVOLUTION OF DATA SCIENCE
WHAT IS DATA SCIENCE ?
1. Data Science is the extraction of actionable knowledge directly from data through a process of discovery, hypothesis formation and analytical hypothesis analysis.

2. A Data Scientist is a practitioner who has sufficient knowledge of the overlapping regimes of expertise in business needs, domain knowledge, analytical skills and programming expertise to manage the end-to-end scientific method process through each stage in the big data lifecycle.

3. Big Data refers to digital data volume, velocity and/or variety whose management requires scalability across coupled horizontal resources.
FUNCTIONAL COMPONENTS OF DATA SCIENCE SUPPORTED
BY SOME SOFTWARE LIBRARIES ON THE CLOUD IN 2016

[Figure: data science shown at the intersection of three regimes: domain expertise (e.g. medical, engineering and science), mathematics and statistics (linear algebra, statistics, algorithms) and programming skills (Hadoop, Spark, distributed computing). Supporting functional components include data visualization, data mining, machine learning, deep learning (neural networks), analytics models, natural language processing, and social network and graph analysis.]
DATA SCIENCE
Whenever two of these areas overlap, they generate three important specialized fields of interest.
1. The modeling field is formed by intersecting domain expertise with mathematical statistics.
2. The data analytics field results from the intersection of domain expertise and programming skills.
3. The field of algorithms is the intersection of programming skills and mathematical statistics.
➢ Summarized below are some open challenges in big data research, development and
applications.

• Structured versus unstructured data with effective indexing;

• Identification, de-identification and re-identification;

• Ontologies and semantics of big data;

• Data introspection and reduction techniques;

• Design, construction, operation and description;

• Data integration and software interoperability;

• Immutability and immortality;

• Data measurement methods;

• Data range, denominators, trending and estimation.


Emerging Technologies in the Next Decade
GARTNER’S 2016 HYPE CYCLE OF EMERGING NEW
TECHNOLOGIES

1. Gartner Research is an authoritative source of new technologies.


2. They identify the hottest emerging new technologies in hype cycles every
year.
3. An emerging technology may take 2 to 10 years to mature and reach its plateau of productivity.
4. The top 12 technologies include cognitive expert advisors, machine
learning, software defined security, connected home, autonomous
vehicles, blockchain, nanotube electronics, smart robots, micro data
centers, gesture control devices, IoT platforms, and drones (commercial
UAVs).
GARTNER’S 2016 HYPE CYCLE OF EMERGING
NEW TECHNOLOGIES
GARTNER’S 2022 EMERGING NEW TECHNOLOGIES

• Expand immersive experiences
• Accelerate artificial intelligence (AI) automation
• Optimize technologist delivery
FROM HPC SYSTEMS AND CLUSTERS TO GRIDS, P2P
NETWORKS, CLOUDS, AND THE INTERNET OF THINGS

1. The general computing trend is to leverage more and more on shared web
resources over the Internet.
2. The evolution is from two tracks of system development:
HPC versus HTC systems.
3. On the HPC side, supercomputers are gradually replaced by clusters of
cooperative computers out of a desire to share computing resources.
4. The cluster is often a collection of homogeneous computer nodes that are
physically connected in close range to each other.
HTC
1. On the HTC side, Peer-to-Peer (P2P) networks are formed for distributed
file sharing and content delivery applications.
2. P2P networks, cloud computing and web service platforms all place more emphasis on HTC than on HPC applications.
3. In the big data era, we are facing a data deluge problem. Data comes from
IoT sensors, lab experiments, simulations, society archives and the web in
all scales and formats.
4. The Internet and WWW are used by billions of people every day. As a
result, large data centers or clouds must be designed to provide not only
big storage but also distributed computing power to satisfy the requests
of a large number of users simultaneously.
HPC: High-Performance Computing
HTC: High-Throughput Computing
P2P: Peer-to-Peer
MPP: Massively Parallel Processors
RFID: Radio Frequency Identification
CONVERGENCE OF TECHNOLOGIES

Cloud computing is enabled by the convergence of the four technologies.


1. Hardware virtualization and multicore chips make it possible to have
dynamic configurations in clouds.
2. Utility and grid computing technologies lay the necessary foundation of
computing clouds.
3. Recent advances in service-oriented architecture (SOA), Web 2.0 and mashups of platforms are pushing the cloud another step forward.
4. Autonomic computing and automated datacenter operations have enabled
cloud computing.
CONVERGENCE OF TECHNOLOGIES

[Figure: cloud computing at the convergence of four technology tracks: hardware (hardware virtualization, multi-core chips), distributed computing (utility and grid computing), Internet technology (SOA, Web 2.0, services) and systems management (autonomic computing, datacenter automation).]
UTILITY COMPUTING
1. Utility computing is based on a business model by which customers receive computing resources from cloud or IoT service providers.
2. This poses technological challenges spanning almost all aspects of computer science and engineering.
3. For example, users may demand new network-efficient processors,
scalable memory and storage schemes, distributed OS, middleware for
machine virtualization, new programming models, effective resource
management and application program development.
4. These hardware and software advances are necessary to facilitate mobile
cloud computing in various IoT application domains.
CLOUD COMPUTING VERSUS ON-PREMISE
COMPUTING
1. On-premise computing differs from cloud computing mainly in resource control and infrastructure management.
2. In Table, we compare three cloud service models with the on-premise
computing paradigm.
3. We consider hardware and software resources in five types:
storage, servers, virtual machines, networking and application software.
4. In the case of on-premise computing at local hosts, all resources must be
acquired by the users except networking, which is shared between users and
the provider.
E.g., MS Office, Adobe.
CLOUD COMPUTING VERSUS ON-PREMISE
COMPUTING
TOWARDS A BIG DATA INDUSTRY
1. From 1960 to 1990, most data blocks were measured in MB, GB and TB.
2. Datacenters became widely used from 1980 to 2010, with datasets easily ranging from TB to PB or even EB.
3. After 2010, big data was introduced.
4. To process big data in the future, we expect to move from EB to ZB or YB. The market size of the big data industry reached $34 billion in 2013.
Interactive SMACT Technologies
INTERACTIVE SMACT
TECHNOLOGIES

1. Almost all applications demand computing economics, web-scale data collection, system reliability and scalable performance, as in the banking and finance industries.
2. In recent years, five cutting-edge information technologies, namely Social, Mobile, Analytics, Cloud and IoT, have become more in demand; they are known as the SMACT technologies.
INTERACTIVE SMACT
TECHNOLOGIES
THE INTERNET OF THINGS

1. The IoT refers to the networked interconnection of everyday objects,


tools, devices or computers.
2. The things (objects) of our daily life can be large or small.
3. The idea is to tag every object using radio-frequency identification (RFID)
or related sensor or electronic technologies like GPS (global positioning
system).
4. It is estimated that an average person is surrounded by 1000 to 5000
objects on a daily basis.
THE INTERNET OF THINGS

1. The term Internet of Things (IoT) is a physical concept. The size of the IoT
can be large or small, covering local regions or a wide range of physical
spaces.
2. IoTs are built in the physical world, even though they are logically
addressable in cyberspace.
3. The aim is to connect anything, at any time and any place, at low cost.
4. The dynamic connections will grow exponentially into a new universal
network of networks, called IoT. The IoT is strongly tied to specific
application domains.
INTERACTIONS AMONG SMACT SUBSYSTEMS

1. Multiple cloud platforms work closely with many mobile networks to


provide the service core interactively.
2. The IoT networks connect any objects including sensors, computers,
humans and any IP-identifiable objects on the Earth.
3. The social networks, such as Facebook and Twitter, and big data analytics
systems are built within the Internet.
4. All social, analytics and IoT networks are connected to the clouds via the
Internet and mobile networks.
INTERACTIONS AMONG SMACT SUBSYSTEMS

Interactive actions are described for following five purpose:


i) data signal sensing is tied to the interactions among IoT and social
networks with the cloud platforms;
ii) data mining involves the use of cloud power for effective use of captured
data;
iii) aggregation of data takes place between the mobile system;
iv) IoT domains; and
v) the processing clouds. Machine learning forms the basis for big data
analytics.
INTERACTIONS AMONG TECHNOLOGIES
Large amounts of sensor data or digital signals are generated by mobile systems, social networks and various IoT domains.
1. Data Mining: Data mining involves the discovery, collection, aggregation, transformation, matching and processing of large datasets. The ultimate purpose is knowledge discovery from the data. Numerical, textual, pattern, image and video data can all be mined.
2. Data Aggregation and Integration: This refers to data preprocessing to improve data quality. Important operations include data cleaning, removing redundancy, checking relevance, data reduction, transformation and discretization, etc., as in the sketch below.
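A minimal sketch of these aggregation and cleaning operations, assuming the pandas library (the sample records are illustrative):

import pandas as pd

# Illustrative raw sensor records containing redundancy and missing values.
raw = pd.DataFrame({
    "sensor_id": [1, 1, 2, 3, 3],
    "reading":   [20.5, 20.5, None, 19.8, 19.8],
})

clean = raw.drop_duplicates()              # remove redundancy
clean = clean.dropna(subset=["reading"])   # cleaning: drop incomplete rows
# Discretization: map continuous readings into coarse bins.
clean["level"] = pd.cut(clean["reading"], bins=[0, 20, 40], labels=["low", "high"])
print(clean)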
INTERACTIONS AMONG TECHNOLOGIES

3. Machine Learning and Big Data Analytics: This is the foundation for using the cloud's computing power to analyze large datasets scientifically or statistically. Special computer programs are written to automatically learn to recognize complex patterns and make intelligent decisions based on the data.
TECHNOLOGY FUSION TO MEET THE FUTURE
DEMAND
1. The joint use of clouds, IoT, mobile devices and social networks is crucial
to capture big data from all sources.
2. This integrated system enables fast, efficient and intelligent interactions
among humans, machines and any objects surrounding us.
3. The combined use of two or more technologies may demand additional
efforts to integrate them for the common purpose.
4. All five SMACT technologies are deployed within the mobile Internet.
5. Social networks and big data analysis subsystems are built in the Internet
with fast database search and mobile access facilities.
6. High storage and processing power are provided by domain-specific
cloud services on dedicated platforms.
Social-Media, Mobile Networks and
Cloud Computing
Social Networks and Web Service
Sites
SOCIAL NETWORKS AND WEB SERVICE SITES

1. Most social networks provide human services such as friendship connections, personal profiling, professional services and entertainment.
2. Users can create a user profile, add other users as "friends", exchange messages, and post status updates and photos.
3. Users may join common-interest user groups, organized by workplace, school or college, or other characteristics, and categorize their friends into lists such as "People From Work" or "Close Friends".
SOCIAL NETWORKS AND WEB
SERVICE SITES

[Figure: major social networks and web service sites with their monthly active user counts: 2.9 billion, 875 million, 574.4 million and 365 million monthly users.]
EXAMPLE- FACEBOOK PLATFORM ARCHITECTURE AND
SOCIAL SERVICES PROVIDED

1. Facebook keeps huge personal profiles, tags and relationships as social graphs.
2. Most users are in the US, Brazil, India, Indonesia, etc. The social graphs are shared by various social groups on the site.
3. The website has attracted over 3 million active advertisers, with $12.5 billion revenue reported in 2014.
4. The Facebook platform is built with a collection of huge datacenters with very large storage capacity, intelligent file systems and searching capabilities.
5. The website must resolve traffic jams and collisions among all its users.
FACEBOOK INFRASTRUCTURE
EXAMPLE- FACEBOOK PLATFORM ARCHITECTURE AND SOCIAL
SERVICES PROVIDED

1. The platform is formed from a huge cluster of servers.
2. Requests enter the Facebook servers from the top as pages, sites and networks.
3. The social engine is the core of the application server.
4. The social engine handles ID, security, rendering and Facebook integration operations.
5. Facebook has acquired the Instagram, WhatsApp, Oculus VR and PrivateCore applications. The social engine executes all user applications.
EXAMPLE- FACEBOOK PLATFORM ARCHITECTURE AND
SOCIAL SERVICES PROVIDED

1. The service functionalities of Facebook include six essential items


EXAMPLE- FACEBOOK PLATFORM ARCHITECTURE AND
SOCIAL SERVICES PROVIDED
1. Facebook provides blogging, chat, gifts, marketplace, voice/video calls,
etc.
2. There is a community engine that provides networking services to users.
3. Most Facebook applications are helping users to achieve their social
goals.
Mobile Cellular Core Networks
MOBILE CELLULAR CORE NETWORKS
1. A cellular network or mobile network is a wireless network distributed over land areas.
2. The evolution of wireless access technologies has just entered the fourth generation:
1. The first generation (1G) - basic mobile voice communication.
2. The second generation (2G) - introduced improved capacity and coverage.
3. The third generation (3G) - data at higher speeds for a truly "mobile broadband" experience.
4. The fourth generation (4G) - provides access to a wide range of telecommunication services.
1. The first generation (1G) - analog signals, introduced in the US in the 1980s; voice calls only
• basic mobile voice communication
• speeds up to 2.4 kbps
• poor voice quality
• no data security
2. The second generation (2G) - introduced improved capacity and coverage; text and multimedia messages
• used digital signals for the first time; launched in Finland
• speeds up to 64 kbps
• 2G used the GPRS communication protocol, giving 2.5G (internet access)
• 2.75G - EDGE (E)
3. The third generation (3G) - data at higher speeds for a truly "mobile broadband" experience
• started in the early 2000s
• data speeds up to 2 Mbps (H)
• high-speed web browsing
• multimedia: 3D gaming, video calls, Gmail
• mobile phones were very expensive
• high infrastructure costs (mobile towers)
• trained personnel required for the infrastructure
• 3.5G grouped together mobile (wireless) technology and data (wired) technology (H+): broadband, modems
4. The fourth generation (4G) - introduced in 2011
• speeds up to 1 Gbps (VoLTE, LTE)
• mobile web access
• high-definition mobile TV
• cloud computing
5. The fifth generation (5G) - 2022
MOBILE CELLULAR CORE NETWORKS
➢ 5G enables higher capacity, higher data rate, more connectivity, higher reliability, lower latency, larger versatility and application-domain specific topologies. Targets shown include a throughput of 100 Gbps, a user throughput of 100 Mbps, a latency of 1 ms, mobility of 120 km/h and a peak data rate of 10 Gbps.
➢ Splitting the control and data planes is an interesting paradigm for 5G, together with massive multi-input multi-output (MIMO), advanced antenna systems, software-defined networking (SDN), Network Functions Virtualization (NFV), Internet of Things (IoT) and cloud computing.
[Figure: evolution of cellular mobile networks and wireless communications from 1990 to 2020, from 1G voice through 2G SMS and 3G multimedia to 4G/5G, with access technologies FDMA, TDMA, CDMA and OFDM, plus MIMO and NFV in 5G.]
Mobile Devices and Internet Edge
Networks
MOBILE DEVICES AND INTERNET
EDGE NETWORKS

1. Mobile devices appear as smartphones, tablet computers, wearable gear and industrial tools.
1. The 1G devices - analog phones for voice communication only.
2. The 2G mobile phones - for both voice and data communications.
3. The 3G phones - designed to have 2 Mbps speed for multimedia communications through the cellular system.
4. The 4G LTE (Long-Term Evolution) phones - targeted to achieve a download speed of 100 Mbps, an upload speed of 50 Mbps and a static speed of 1 Gbps.
MOBILE CORE NETWORKS

1. Milestone mobile core networks for cellular telecommunication.


MOBILE INTERNET EDGE
NETWORKS

1. RANs (Radio Access Networks) are used to access the mobile core networks, which are connected to the Internet backbone and many intranets through mobile Internet edge networks.
2. Such an Internet access infrastructure is also known as the wireless Internet or mobile Internet.
3. There are several classes of RANs, known as WiFi, Bluetooth, WiMax and Zigbee networks.
4. There are several short-range wireless networks, such as the wireless local-area network (WLAN), wireless home-area network (WHAN), personal-area network (PAN) and body-area network (BAN), etc.
THE INTERACTIONS OF VARIOUS RADIO-ACCESS NETWORKS
(RANS) WITH THE UNIFIED ALL-IP BASED MOBILE CORE
NETWORK, INTRANETS AND THE INTERNET.
BLUETOOTH DEVICES AND NETWORKS
1. Bluetooth is a short-range radio technology that operates in the 2.45 GHz industrial-scientific-medical (ISM) band.
2. It transmits omni-directional (360°) signals carrying data or voice, with no line-of-sight requirement.
3. It supports up to 8 devices (1 master and 7 slaves) in a PAN called a Piconet.
4. Bluetooth devices have low cost and low power requirements.
5. The device offers a data rate of 1 Mbps in ad hoc networking, with a range of 10 cm to 10 meters.
6. It supports voice or data communication between phones, computers and other wearable devices.
WIFI NETWORKS
1. The access point broadcasts its signal in a radius of less than 300 ft.
2. The closer a device is to the access point, the faster the data rate experienced.
3. The maximum speed is only possible within 50-175 ft. The peak data rates of WiFi networks have improved from less than 11 Mbps to 300 Mbps.
4. The network uses OFDM (orthogonal frequency-division multiplexing) modulation technology, with multiple-input multiple-output (MIMO) radio and antennas, to achieve its high speed.
5. WiFi enables the fastest WLAN in a mesh of access points or wireless routers.
Mobile Cloud Computing
Infrastructure
MOBILE CLOUD COMPUTING INFRASTRUCTURE
1. Mobile cloud computing is a model for elastic augmentation of mobile device capabilities via wireless access to cloud storage and computing resources.
2. This is further enhanced by context-aware dynamic adaptation to changes in the operating environment.
3. With the support of mobile cloud computing (MCC), a mobile user has a new cloud option for executing applications.
4. The user attempts to offload computation through WiFi, a cellular network or a satellite to distant clouds.
5. The cellphone itself cannot feasibly finish some compute-intensive tasks. Instead, the data related to the computation task is offloaded to the remote cloud.
THE ARCHITECTURE OF A MOBILE CLOUD COMPUTING
ENVIRONMENT.
BIG DATA ACQUISITION AND ANALYTICS EVOLUTION

1. Big data analytics is the process of examining large amounts of data of a variety of types (big data) to uncover hidden patterns, unknown correlations and other useful information.
2. This information can provide competitive advantages over rival organizations and result in higher business intelligence or scientific discovery, such as more effective marketing, increased revenue, etc.
3. The primary goal of big data analytics is to help companies make better business decisions by enabling data scientists and other users to analyze huge volumes of transaction data that may be left untapped by conventional business intelligence (BI) programs.
BIG DATA VALUE CHAIN EXTRACTED FROM MASSIVE DATA

1. Data science, data mining, data analytics and knowledge discovery are
closely related terms
2. These big data components form a big data value chain built up of
statistics, machine learning, biology and kernel methods.
3. Statistics cover both linear and logistic regression.
4. Decision trees are typical machine learning tools.
5. Biology refers to artificial neural networks, genetic algorithms and swarm
intelligence. Finally, the kernel method includes the use of support vector
machines.
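As a hedged sketch of three of these value-chain building blocks (logistic regression for statistics, a decision tree for machine learning and an SVM for the kernel method), using scikit-learn on a small stand-in dataset rather than real big data:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # small stand-in dataset, not big data
models = {
    "logistic regression (statistics)": LogisticRegression(max_iter=1000),
    "decision tree (machine learning)": DecisionTreeClassifier(),
    "SVM (kernel method)": SVC(kernel="rbf"),
}
for name, model in models.items():
    # 5-fold cross-validation estimates each model's accuracy.
    print(name, cross_val_score(model, X, y, cv=5).mean())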
1. Compared with traditional datasets, big data generally includes masses of unstructured data that need more real-time analysis.
2. In addition, big data also brings new opportunities for discovering new values, helps us gain an in-depth understanding of hidden values, and incurs new challenges, for example, how to effectively organize and manage such data.
3. At present, big data has attracted considerable interest from industry, academia and government agencies.
4. The rapid growth of big data mainly comes from people's daily lives, especially as related to the Internet, Web and cloud services.
1. Big data will have a huge and increasing potential in creating values for
businesses and consumers.
2. The most critical aspect of big data analytics is big data value.
We divide the value chain of big data into four phases:
Data generation,
Data acquisition,
Data storage and
Data analysis.
1. If we take data as a raw material, data generation and data acquisition are exploitation processes, while data storage uses clouds or data centers.
2. Data analysis is a production process that utilizes the raw material to create new value.
3. The rapid growth of cloud computing and IoT also triggers the sharp growth of data. Cloud computing provides safeguarding, access sites and channels for data assets.
4. In the paradigm of IoT, sensors worldwide collect and transmit data to be stored and processed in the cloud.
BIG DATA GENERATION

1. The major data types include Internet data, sensory data, etc.
2. This is the first step of big data. Taking Internet data as an example, huge amounts of data in terms of search entries, Internet forum posts, chat records and microblog messages are generated.
3. These data are closely related to people's daily lives and share the features of high value and low density. Such Internet data may be valueless individually but, through the exploitation of accumulated big data, useful information such as the habits and hobbies of users can be identified, and it is even possible to forecast users' behaviors and emotional moods.
DATA QUALITY CONTROL, REPRESENTATION AND DATABASE
MODEL

The quality control of big data involves a circular cycle of four stages:
i) we must identify the important data quality attributes;
ii) assessing the data relies on the ability to measure the data quality level;
iii) then we must be able to analyze the data quality problems and their major causes; and finally
iv) we need to improve the data quality by suggesting concrete actions to take, as sketched below.
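A minimal sketch of stage ii), measuring quality levels with pandas (the dataset and the validity rule are illustrative assumptions):

import pandas as pd

# Illustrative dataset with quality problems: a missing name and a negative age.
df = pd.DataFrame({"name": ["A", "B", "C", None],
                   "age":  [29, 41, None, -5]})

completeness = 1 - df.isna().mean()          # fraction of non-missing values per column
validity = (df["age"].dropna() >= 0).mean()  # fraction of ages in a valid range
print("completeness:\n", completeness)
print("validity of 'age':", validity)
# Stages iii) and iv): analyze the causes (missing and negative entries)
# and correct them, e.g. by fixing the data entry procedure.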
ATTRIBUTES FOR DATA QUALITY CONTROL, REPRESENTATION
AND DATABASE OPERATIONS.
BIG DATA ACQUISITION AND PRE-PROCESSING

Loading is the most complex procedure among the three ETL steps (extract, transform, load); it includes operations such as transformation, copying, cleaning and standardization.
Data integration methods are accompanied by stream processing engines and search engines:
1) Data Selection: Select a target dataset or subset of data samples on which the discovery is to be performed.
2) Data Transformation: Simplify the datasets by removing unwanted variables. Then analyze useful features that can be used to represent the data, depending on the goal or task.
3) Data Mining: Search for patterns of interest in a particular representational form, or a set of such representations, such as classification rules or trees, regression, clustering, and so forth.
4) Evaluation and Knowledge Representation: Evaluate the knowledge patterns, and utilize visualization techniques to present the knowledge vividly.
BIG DATA ACQUISITION

1. As the second phase, data acquisition also includes data collection, data
transmission and data pre-processing.
2. During big data acquisition, once we collect the raw data, we utilize an
efficient transmission mechanism to send it to a proper storage
management system to support different analytical applications.
3. The collected datasets may sometimes include much redundant or
useless data, which unnecessarily increases storage space and affects
the subsequent data analysis.
SOME BIG DATA ACQUISITION SOURCES AND MAJOR
PREPROCESSING OPERATIONS
LOG FILES

1. Log files are record files automatically generated by the data source system, so as to record activities in designated file formats for subsequent analysis.
2. Log files are used in nearly all digital devices. For example, web servers record in log files the number of clicks, click rates, visits and other property records of web users.
To capture the activities of users at websites, web servers mainly use the following three log file formats:
public log file format (NCSA)
expanded log format (W3C)
IIS log format (Microsoft).
All three types of log files are in ASCII text format.
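Because they are plain ASCII, such logs can be parsed directly. A minimal sketch of parsing one NCSA common-log-format line with Python's standard re module (the log line itself is an illustrative example):

import re

# One illustrative line in the NCSA common log format:
# host ident authuser [date] "request" status bytes
line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'

pattern = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)
m = pattern.match(line)
if m:
    # Each named field can now feed click, visit and status-code statistics.
    print(m.group("host"), m.group("request"), m.group("status"))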
SENSORS

Sensors are commonly used in daily life to measure physical quantities and transform them into readable digital signals for subsequent processing (and storage).
Sensory data may be classified as sound wave, voice, vibration, automobile, chemical, current, weather, pressure, temperature, etc.
Sensed information is transferred to a data collection point through wired or wireless networks, for applications that may be easily deployed and managed, for example a video surveillance system.
METHODS FOR ACQUIRING NETWORK DATA

1. Network data acquisition is accomplished using a combination of a web crawler, a word segmentation system, a task system and an index system, etc.
2. A web crawler is a program used by search engines for downloading and storing web pages.
3. Generally speaking, a web crawler starts from the uniform resource locator (URL) of an initial web page to access other linked web pages, during which it stores and sequences all the retrieved URLs.
4. The web crawler acquires a URL in order of precedence through a URL queue, downloads the web page, identifies all URLs in the downloaded page, and extracts the new URLs to be put in the queue. A minimal sketch of this loop follows.
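A minimal crawler sketch using only the Python standard library (urllib and html.parser); a production crawler would also honor robots.txt, rate limits and politeness policies:

from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkParser(HTMLParser):
    # Collect href attributes from anchor tags.
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += [v for k, v in attrs if k == "href" and v]

def crawl(seed, limit=10):
    queue, seen = deque([seed]), {seed}
    while queue and len(seen) <= limit:
        url = queue.popleft()                  # take the next URL from the queue
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "ignore")
        except OSError:
            continue
        parser = LinkParser()
        parser.feed(html)                      # identify all URLs in the page
        for link in parser.links:
            new = urljoin(url, link)           # resolve to absolute URLs
            if new.startswith("http") and new not in seen:
                seen.add(new)
                queue.append(new)              # enqueue newly extracted URLs
    return seen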
BIG DATA STORAGE

1. Big data storage refers to the storage and management of large-scale datasets while achieving reliability and availability of data access.
2. The explosive growth of data places stricter requirements on data storage and management.
3. The storage infrastructure needs to provide an information storage service with reliable storage space, and it must provide a powerful access interface for the query and analysis of large amounts of data.
Existing storage mechanisms for big data may be classified into three bottom-up levels:
file systems
databases
programming models.
File systems are the foundation of the applications at upper levels.
Google’s GFS is an expandable distributed file system to support large-scale,
distributed, data-intensive applications.
GFS uses cheap commodity servers to achieve fault tolerance and provides
customers with high performance services.
GFS supports large-scale file applications with more frequent reading than writing. However, GFS also has some limitations, such as a single point of failure and poor performance for small files. Such limitations have been overcome by Colossus, the successor of GFS.
Other companies and researchers also have their own solutions to meet the different demands of big data storage.
For example, HDFS and Kosmosfs are derivatives of the open-source code of GFS.
Microsoft developed Cosmos to support its search and advertisement business.
Facebook utilizes Haystack to store its large number of small-sized photos.
Taobao also developed TFS and FastDFS.
DATA CLEANING

Data cleaning cleanses and preprocesses data by deciding on strategies to handle missing fields and to alter the data as per the requirements.
Data cleaning is a process to identify inaccurate, incomplete or unreasonable data, and then to modify or delete such data to improve data quality.
Generally, data cleaning includes five complementary procedures: defining and determining error types, searching and identifying errors, correcting errors, documenting error examples and error types, and modifying data entry procedures to reduce future errors.
During cleaning, data formats, completeness, rationality and restrictions should be inspected.
Data cleaning is of vital importance for keeping data consistent; it is widely applied in many fields, such as banking, insurance, the retail industry, telecommunications and traffic control.
In e-commerce, most data is collected electronically and may have serious data quality problems. Classic data quality problems mainly come from software defects, customization errors or system misconfiguration. Some address data cleaning in e-commerce by using crawlers and regularly re-copying customer and account information.
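A minimal sketch of the five cleaning procedures applied to illustrative customer records, assuming pandas (the data and the email rule are made up for illustration):

import pandas as pd

customers = pd.DataFrame({
    "name":  ["Ann", "Ann", "Bob", None],
    "email": ["a@x.com", "a@x.com", "bob@x", "c@x.com"],
})

# 1-2) Define error types and search for them: duplicates, missing and malformed fields.
duplicates = customers.duplicated()
missing = customers["name"].isna()
bad_email = ~customers["email"].str.contains(r"^[^@]+@[^@]+\.[^@]+$", na=False)

# 3) Correct errors: here we simply drop the offending rows.
cleaned = customers[~(duplicates | missing | bad_email)]

# 4-5) Document error counts to guide changes to the data entry procedure.
print("duplicates:", duplicates.sum(), "missing:", missing.sum(), "bad emails:", bad_email.sum())
print(cleaned)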
RADIO FREQUENCY IDENTIFICATION

RFID is widely used in many applications, for example inventory management and target tracking.
Original RFID data features low quality; it includes a lot of abnormal data, limited by the physical design and affected by environmental noise.
A probabilistic model was developed to cope with data loss in mobile environments.
We could build a system to automatically correct errors in input data by defining global integrity constraints.
DATA INTEGRATION

Data integration is the cornerstone of modern commercial informatics; it involves the combination of data from different sources and provides users with a uniform view of the data.
This is a mature research field for traditional databases. Historically, two methods have been widely recognized: the data warehouse and data federation.
Data warehousing includes a process named ETL (Extract, Transform and Load). Extraction involves connecting source systems and selecting, collecting, analyzing and processing the necessary data. Transformation is the execution of a series of rules to transform the extracted data into standard formats. Loading means importing the extracted and transformed data into the target storage infrastructure, as in the sketch below.
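A minimal ETL sketch in plain Python (the source file name "sales.csv" and the record layout are hypothetical):

def extract(path):
    # Extraction: connect to the source and collect the necessary data.
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n").split(",")

def transform(rows):
    # Transformation: apply rules to bring records into a standard format.
    for name, amount in rows:
        yield {"name": name.strip().title(), "amount": float(amount)}

def load(records, warehouse):
    # Loading: import the transformed records into the target storage.
    warehouse.extend(records)

warehouse = []
# load(transform(extract("sales.csv")), warehouse)  # hypothetical source file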
EVOLVING DATA ANALYTICS OVER THE CLOUDS
Big data analytics is the process of examining large amounts of data of a variety of types (big data) to uncover hidden patterns, unknown correlations and other useful information.
Such information can provide competitive advantages over rival organizations and result in higher business intelligence or scientific discovery, such as more effective marketing, increased revenue, etc.
Big data sources include web server logs and Internet clickstream data, social media activity reports, mobile-phone call records and information captured by sensors or IoT devices, all of which must be protected.
Big data analytics can be done with the software tools commonly used as part of advanced analytics disciplines such as predictive analytics and data mining.
THE EVOLUTION FROM BASIC ANALYSIS OF SMALL DATA (MB TO GB) IN THE PAST TO
SOPHISTICATED CLOUD ANALYTICS OVER TODAY’S BIG DATASETS (TB∼PB).
The performance space is divided into four subspaces:
1) The basic analysis of small data relies on historical observations to help
avoid past mistakes and duplicate past successes.
2) The advanced analytics system on small data is improved from the basic
capability to use advanced techniques to analyze the impact of future
scenarios.
3) As we move to cloud computing, most existing clouds provide a better
coordinated analytics workflow in a streamlined and automated fashion, but
still lack predictive or real-time capabilities.
4) For an ideal cloud analytics system, we expect to handle scalable big data
in streaming mode with real-time predictive capabilities.
LAYERED DEVELOPMENT OF CLOUD PLATFORM FOR BIG DATA
PROCESSING AND ANALYTICS APPLICATIONS.
At the bottom layer sits cloud infrastructure management control, which handles resource provisioning, deployment of agreed resources, monitoring of overall system performance and arranging the workflow in the cloud.
All big data elements collected from all sources form the data lake.
Data may be structured or unstructured, or come and go in streaming mode.
This lake stores not only raw data but also the metadata for data management.
At the middle layer, we need to provide views and indexes to visualize and access data smoothly.
This may include geographic data, language translation mechanisms, entity relationships, graph analysis and streaming indexes, etc.
At the next higher level, we have the cloud processing engine, which includes data mining, discovery and analytics mechanisms to perform machine learning, alerting and data stream processing operations.
At the top level, we report or display the analytics results.
This includes visualization support for reporting, with dashboards and query interfaces. The display may take the form of histograms, bar graphs, charts, video, etc.
MACHINE INTELLIGENCE AND BIG DATA APPLICATIONS

Linking machine intelligence to big data applications:
Machine intelligence is attributed to smart clouds with applied IoT sensing and data analytics capabilities.
Data Mining and Machine Learning:
We classify data mining into three categories:
association analysis,
classification, and
cluster analysis.
Machine learning techniques are divided into three categories:
supervised learning,
unsupervised learning, and
other learning methods, including
reinforcement learning,
active learning,
transfer learning and deep learning, etc.
DATA MINING VERSUS MACHINE LEARNING

Data mining and machine learning are closely related to each other. Data
mining is the computational process of discovering patterns in large datasets
involving methods at the intersection of artificial intelligence, machine
learning, statistics and database systems.
The overall goal of the data-mining process is to extract information from a
dataset and transform it into an understandable structure for further use.
Aside from the raw analysis step, it involves database and data management
aspects, data pre-processing, model and inference considerations,
interestingness metrics, complexity considerations, postprocessing of
discovered structures, visualization and online updating.
Machine learning explores the construction and study of algorithms that can learn from and make predictions on data. Such algorithms operate by building a model from example inputs in order to make data-driven predictions or decisions, rather than following strictly static program instructions.
Machine learning is closer to applications and the end user.
It focuses on prediction, based on known properties learned from the training data.
We divide machine learning techniques into three categories: i) supervised learning, such as regression models, decision trees, etc.;
ii) unsupervised learning, which includes clustering, anomaly detection, etc.;
iii) other learning, such as reinforcement learning, transfer learning, active learning and deep learning, etc.
THE RELATIONSHIP OF DATA MINING AND MACHINE
LEARNING
DATA MINING TECHNIQUES ARE CLASSIFIED
INTO THREE CATEGORIES

i) association analysis includes the Apriori algorithm and the FP-growth algorithm;

ii) classification algorithms include decision trees, support vector machines (SVM), k-nearest-neighbor, Naïve Bayes, Bayesian belief networks and artificial neural networks (ANN), etc.;

iii) clustering algorithms include K-means and density-based spatial clustering of applications with noise (DBSCAN); both are sketched below.
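A hedged sketch of the two clustering algorithms named above, using scikit-learn on a few illustrative 2-D points (the data and parameters are chosen only for demonstration):

import numpy as np
from sklearn.cluster import DBSCAN, KMeans

# Illustrative 2-D points forming two loose groups.
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

kmeans = KMeans(n_clusters=2, n_init=10).fit(X)
dbscan = DBSCAN(eps=3, min_samples=2).fit(X)  # density-based clustering
print("K-means labels:", kmeans.labels_)
print("DBSCAN labels: ", dbscan.labels_)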
Many big data application solutions claim they can improve data processing and analysis capacities in all aspects, but there exists no unified evaluation standard or benchmark to balance the computing efficiency of big data with rigorous mathematical methods.

Performance can only be evaluated on an implemented and deployed system, which makes it impossible to horizontally compare the advantages and disadvantages of different solutions.
The efficiencies before and after the use of big data are also hard to compare. In addition, since data quality is an important basis of data preprocessing, simplification and screening, effectively evaluating data quality is another urgent problem.
The emergence of big data has triggered developments in algorithm design, which has transformed from a computing-intensive approach into a data-intensive approach.

Data transfer has been a main bottleneck of big data computing.
Therefore, many new computing models tailored for big data have emerged, and more such models are on the horizon.

Machine intelligence is critical for solving the challenging issues in big data applications. Machine intelligence is obtained through machine learning.
Supervised Machine Learning includes the following categories:
a) Regression models, decision trees, SVM;
b) Bayesian classifiers, the Hidden Markov Model.

Unsupervised Machine Learning:
a) Dimension reduction: principal component analysis (PCA);
b) Clustering: finding a partition of the observed data in the absence of explicit labels indicating a desired partition. Both kinds are sketched below.
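A hedged sketch contrasting the two kinds with scikit-learn on a small stand-in dataset (a decision tree as the supervised example, PCA as the unsupervised one):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Supervised: the decision tree learns from the labeled examples (X, y).
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print("training accuracy:", tree.score(X, y))

# Unsupervised: PCA reduces dimensionality without ever seeing the labels y.
X2 = PCA(n_components=2).fit_transform(X)
print("reduced shape:", X2.shape)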
MACHINE LEARNING ALGORITHMS
Other Machine Learning techniques:
a) Reinforcement Learning: Markov decision processes (MDPs) provide a mathematical framework for modeling decision making in situations where the outcomes are partly random and partly under the control of a decision maker.

b) Transfer Learning: Through transfer learning, time-consuming and labor-intensive processing costs can be reduced extensively. After a certain time of labeling and validation through transfer learning, the training sets are established. Among the various key big data technologies, machine intelligence is the key component.
BIG DATA APPLICATIONS – AN OVERVIEW- COMMERCIAL
APPLICATIONS
Application categories of big data: from TBs to PBs (NIST 2013).
Abundant product and customer information, including clickstream data logs and user behavior, etc., can be acquired from websites.

Product layout optimization, customer trade analysis, product suggestions and market structure analysis can be conducted with text analysis and website mining technologies.

The quantity of mobile phones and tablet PCs first surpassed that of laptops and PCs in 2011. Mobile phones and the Internet of Things based on sensors are opening a new generation of innovative applications, calling for larger capacity to support location sensing, people-oriented and context-aware operation.
NETWORK APPLICATIONS
The early Internet mainly provided email and webpage services.
Text analysis, data mining and webpage analysis technologies have been
applied to the mining of email content and building search engines.
Nowadays, most applications are web-based, regardless of their application
field and design goals.
Network data accounts for a major percentage of the global data volume.

Web has become the common platform for interconnected pages, full of
various kinds of data, such as text, images, videos, pictures and interactive
content, etc.

Advanced technologies are in great demand for semi-structured or unstructured data.
For example, image analysis technology may extract useful information from pictures, for example by face recognition.

Multimedia analysis technologies are applied in automated video surveillance systems for business, law enforcement and military applications.

Online social media applications, such as Internet forums, online communities, blogs, social networking services and social multimedia websites, etc., provide users with great opportunities to create, upload and share content.

Different user groups may search for daily news and publish their opinions with timely feedback.
BIG DATA IN SCIENTIFIC APPLICATIONS

Scientific research in many fields is acquiring massive data with high-throughput sensors and instruments, in fields such as astrophysics, oceanology, genomics and environmental research.
The US National Science Foundation (NSF) has announced the BIGDATA Research Initiative to promote research efforts to extract knowledge and insights from large and complex collections of digital data.
For example, in biology, iPlant applies network infrastructure, physical computing resources, a coordination environment, virtual machine resources, and interoperable analysis software and data services to assist researchers, educators and students in enriching all plant sciences.
BIG DATA IN SCIENTIFIC APPLICATIONS

iPlant datasets have high variety in form, including specification or reference data, experimental data, analog or model data, observation data and other derived data. Big data has been applied in the analysis of structured data, text data, website data, multimedia data, network data and mobile data.
APPLICATION OF BIG DATA IN ENTERPRISES
The application of big data in business enterprises can enhance their production efficiency and competitiveness in many aspects.
In marketing, with correlation analysis of big data, business enterprises can accurately predict the behavior of consumers.
In sales planning, after comparison of massive data, business enterprises can optimize their commodity prices.
In operations, such enterprises can improve their operational efficiency and satisfaction, optimize the input of the labor force, accurately forecast personnel allocation requirements, avoid excess production capacity and reduce labor costs. In the supply chain, using big data, business enterprises may conduct inventory optimization, logistics optimization and supplier coordination, etc., to mitigate the gap between supply and demand, control budgets and improve services.
BANKING USE OF BIG DATA IN FINANCING AND
E-COMMERCE APPLICATIONS

In the finance community, the application of big data has grown rapidly in recent years. For example, China Merchants Bank utilizes data analysis to recognize that activities such as "multi-time score accumulation" and "score exchange in shops" are effective for attracting quality customers.
By building a customer loss early warning model, the bank can sell high-yield
financial products to the top 20% customers in loss ratio so as to retain
them. As a result, the loss ratios of customers with Gold Cards and Sunflower
Cards have been reduced by 15% and 7%, respectively
EX-ALIBABA
Alibaba's credit loan service automatically analyzes and judges whether to provide loans to business enterprises based on acquired enterprise transaction data, by virtue of big data technologies; no manual intervention occurs in the entire process.
It is disclosed that, so far, Alibaba has lent more than RMB 30 billion Yuan, with a bad-loan rate of only about 0.3%, which is a great deal lower than those of other commercial banks.
HEALTHCARE AND MEDICAL APPLICATIONS
The healthcare industry is growing rapidly, and medical data is continuously and rapidly growing complex data, containing abundant and varied information values.
Big data has unlimited potential for effectively storing, processing, querying and analyzing medical data.
The application of medical big data will profoundly influence human health.
The IoT is revolutionizing the healthcare industry.
Sensors collect patient data, then microcontrollers process, analyze and communicate the data over the wireless Internet. Microprocessors enable rich graphical user interfaces. Healthcare clouds and gateways help analyze the data with statistical accuracy.
COLLECTIVE INTELLIGENCE
With the rapid development of wireless communication and sensor technologies, mobile phones and tablet computers have integrated more and more sensors, with increasingly strong computing and sensing capacities.
As a result, crowd sensing is taking center stage in mobile computing.
In crowd sensing, a large number of general users utilize mobile devices as basic sensing units and coordinate with mobile networks for the distribution of sensing tasks and the collection and utilization of sensed data.
The goal is to complete large-scale and complex social sensing tasks. In crowd sensing, participants who complete complex sensing tasks do not need to have professional skills.
Crowd sensing modes represented by crowdsourcing have been successfully applied to geotagged photography, positioning and navigation, urban road traffic sensing, market forecasting, opinion mining and other labor-intensive applications.
Crowdsourcing, a new approach to problem solving, takes a large number of general users as its foundation and distributes tasks in a free and voluntary way.
Crowdsourcing can be useful for labor-intensive applications, such as picture marking, language translation and speech recognition.
The main idea of crowdsourcing is to distribute tasks to general users so as to complete tasks that individual users could not, or would not anticipate being able to, complete.
In the big data era, Spatial Crowdsourcing is a hot topic.
The operational framework of Spatial Crowdsourcing is as follows.
A user requests a service and resources related to a specified location.
Then mobile users who are willing to participate in the task move to the specified location to acquire the related data (i.e. video, audio or pictures).
Finally, the acquired data is sent to the service requester. With the rapid growth of mobile devices and the increasingly complex functions provided by such devices, it is forecast that Spatial Crowdsourcing will become more prevalent than traditional crowdsourcing platforms, for example Amazon Mechanical Turk and Crowdflower.
COGNITIVE COMPUTING – AN INTRODUCTION
The term cognitive computing is derived from cognitive science and artificial intelligence. For years, we have wanted to build a "computer" that can compute as well as learn by training, to achieve some human-like senses or intelligence.
Such a machine has been called a "brain-inspired computer" or a "neural computer". It will be built with special hardware and/or software that can mimic basic human brain functions, such as handling fuzzy information and giving affective, dynamic and instant responses.
It can handle some ambiguity and uncertainty beyond traditional computers.
We want a cognitive machine that can model the human brain with the cognitive power to learn, memorize, reason and respond to external stimuli, autonomously and tirelessly. This field has also been called "neuroinformatics".
Cognitive computing hardware and applications could become more affective and influential through design choices that make a new class of problems computable.
Such a system offers a synthesis not just of information sources but of influences, contexts and insights.
SYSTEM FEATURES OF COGNITIVE COMPUTING
A cognitive system redefines the relationship between humans and their pervasive digital environment.
It may play the role of assistant or coach for the user, and it may act virtually autonomously in many situations.
The computing results of a cognitive system can be suggestive, prescriptive or instructive in nature.
LISTED BELOW ARE SOME CHARACTERISTICS OF COGNITIVE
COMPUTING SYSTEMS:
Adaptive in learning: They may learn as information changes, and as goals
and requirements evolve. They may resolve ambiguity and tolerate
unpredictability. They may be engineered to feed on dynamic data in real
time, or near real time.
Interactive with users: Users can define their needs as a trainer of the
cognitive system. They may also interact with other processors, devices and
cloud services, as well as with people
Iterative and stateful: They may redefine a problem by asking questions or
finding additional source input if a problem statement is ambiguous or
incomplete. They may “remember” previous interactions iteratively.
Contextual in information discovery: They may understand, identify and
extract contextual elements such as meaning, syntax, time, location,
appropriate domain, regulations, user’s profile, process, task and goal. They
may draw on multiple sources of information, including both structured and
unstructured digital information, as well as sensory inputs such as visual,
gestural, auditory or sensor provided.
DIFFERENCES WITH CURRENT COMPUTERS
Cognitive systems differ from current computing applications in that they
move beyond tabulating and calculating based on preconfigured rules and
programs. Although they are capable of basic computing, they can also infer
and even reason based on broad objectives.
Cognitive computing systems can be extended to integrate or leverage
existing information systems and add domain or task-specific interfaces and
tools. Cognitive systems leverage today’s IT resources and coexist with
legacy systems into the future. The ultimate goal is to bring computing even
closer to human thinking and become a fundamental partnership in human
endeavour.
RELATED FIELDS TO NEUROINFORMATICS AND COGNITIVE
COMPUTING.
Cognitive science is interdisciplinary in nature. It covers the areas of psychology, artificial intelligence, neuroscience and linguistics, etc.
It spans many levels of analysis, from low-level machine learning and decision mechanisms to high-level neural circuitry, to build brain-modeled computers.
APPLICATIONS OF COGNITIVE MACHINE LEARNING
Cognitive computing platforms have emerged and become commercially
available, and evidence of real-world applications is starting to surface.
Organizations have adopted and used these cognitive computing platforms
for the purpose of developing applications to address specific use cases,
with each application utilizing some combination of available functionality.
Examples of such real-world cases include:
i) speech understanding;
ii) sentiment analysis;
iii) face recognition;
iv) election insights;
v) autonomous driving;
vi) deep learning applications.
Many more examples are available in cognitive computing services. These
demystify the possibilities into real-world applications.
MACHINE AND DEEP LEARNING APPLICATIONS CLASSIFIED IN
16 CATEGORIES.
Among these big data applications:
a) object recognition;
b) video interpretation;
c) image retrieval
are related to machine vision applications.
1. Text and document tasks include:
a) fact extraction; b) machine translation; and c) text comprehension.
2. On the audio and emotion detection side, we have:
a) speech recognition; b) natural language processing; and c) sentiment analysis tasks.
3. In medical or healthcare applications, we have:
a) cancer detection; b) drug discovery; c) toxicology and radiology; and d) bioinformatics.
Additional information on cognitive machine learning applications can be found on the YouTube website: www.youtube.com/playlist?list=PLjJh1vlSEYgvGod9wWiydumYl8hOXixNu
In business and financial applications, we have (n) digital advertising, (o) fraud detection and (p) sell and buy prediction in market analysis. Many of these cognitive tasks are awaiting automation.
