Terminologies Used in Big Data Environments
As-a-service infrastructure
Data-as-a-service, software-as-a-service, platform-as-a-service – all refer to the idea that rather than
selling data, licences to use data, or platforms for running Big Data technology as one-off products, these
can be provided “as a service”. This reduces the upfront capital investment customers must make before
putting their data, or platforms, to work for them, since the provider bears the costs of setting up and
hosting the infrastructure. For customers, as-a-service infrastructure can greatly reduce the initial cost
and setup time of getting Big Data initiatives up and running.
Data science
Data science is the professional field that deals with turning data into value such as new insights or
predictive models. It brings together expertise from fields including statistics, mathematics, computer
science and communication, as well as domain expertise such as business knowledge. Data scientist has
recently been voted the No. 1 job in the U.S., based on current demand, salary and career
opportunities.
Data mining
Data mining is the process of discovering insights from data. With Big Data, because the datasets
involved are so large, this is generally done computationally and in an automated way, using techniques
such as decision trees, clustering analysis and, most recently, machine learning. It can be thought of as
using the brute mathematical power of computers to spot patterns in data that would not be visible to the
human eye due to the complexity of the dataset.
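As a rough illustration of the clustering analysis mentioned above, the sketch below uses the scikit-learn library to group a handful of invented customer records into two clusters automatically; the feature names and figures are hypothetical, not taken from the text.

```python
# A minimal clustering sketch with scikit-learn; the customer data is invented.
from sklearn.cluster import KMeans
import numpy as np

# Hypothetical data: [annual spend, visits per month] for six customers
customers = np.array([
    [200,  2], [220,  3], [250,  2],   # low-spend, infrequent visitors
    [900, 12], [950, 14], [880, 11],   # high-spend, frequent visitors
])

# Ask the algorithm to find two groups in the data, with no labels supplied
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(customers)

print(labels)                  # e.g. [0 0 0 1 1 1] -- two groups found automatically
print(model.cluster_centers_)  # the "typical" customer in each group
```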
Hadoop
Hadoop is a framework for Big Data computing which has been released as open source software, and so
can be used freely by anyone. It consists of a number of modules, each tailored to a different vital step of
the Big Data process – from file storage (the Hadoop Distributed File System – HDFS) to databases
(HBase) to carrying out data operations (Hadoop MapReduce – see below). It has become so popular,
thanks to its power and flexibility, that it has developed its own industry of vendors (selling tailored
distributions), support service providers and consultants.
Predictive modeling
At its simplest, this is predicting what will happen next based on data about what has happened
previously. In the Big Data age, because there is more data around than ever before, predictions are
becoming more and more accurate. Predictive modelling is a core component of most Big Data initiatives,
which are formulated to help us choose the course of action which will lead to the most desirable
outcome. The speed of modern computers and the volume of data available mean that predictions can be
based on a huge and ever-increasing number of variables, each assessed for the probability that it will
lead to success.
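A minimal sketch of the idea, using scikit-learn's logistic regression: the model learns from data about what has happened previously, then estimates the probability of an outcome for a new case. The features, figures and outcome here are purely illustrative.

```python
# A toy predictive-modelling sketch; the historical data is invented.
from sklearn.linear_model import LogisticRegression

# Hypothetical history: [customer age, visits last month] and whether
# each customer renewed their subscription (1 = yes, 0 = no)
X_history = [[25, 1], [34, 6], [45, 2], [52, 8], [23, 0], [40, 7]]
y_history = [0, 1, 0, 1, 0, 1]

model = LogisticRegression()
model.fit(X_history, y_history)           # learn from what happened previously

new_customer = [[30, 5]]
print(model.predict(new_customer))        # predicted outcome, e.g. [1]
print(model.predict_proba(new_customer))  # probability of each outcome
```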
MapReduce
MapReduce is a computing procedure for working with large datasets, devised because of the difficulty of
reading and analysing really Big Data using conventional computing methodologies. As its name
suggests, it consists of two procedures – mapping (sorting information into the format needed for
analysis – e.g. sorting a list of people according to their age) and reducing (performing an operation on
that information, such as checking the age of everyone in the dataset to see who is over 21).
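The toy sketch below mimics those two steps in plain Python on a single machine, mirroring the age example above; a real MapReduce job would distribute the same pattern across a cluster, and the names and ages here are invented.

```python
# A single-machine illustration of the map and reduce steps (not a
# distributed implementation).
from functools import reduce

people = [("Ana", 34), ("Ben", 19), ("Cara", 27), ("Dev", 17)]

# Map: emit a (key, value) pair for each record -- here, 1 if over 21, else 0
mapped = list(map(lambda person: ("over_21", 1 if person[1] > 21 else 0), people))

# Reduce: combine all the values into one result -- the count of people over 21
over_21_count = reduce(lambda total, pair: total + pair[1], mapped, 0)

print(over_21_count)  # 2
```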
NoSQL
NoSQL refers to database designs that can hold more than just data arranged neatly into tables, rows
and columns, as is the case in a conventional relational database. This database format has proven
very popular in Big Data applications because Big Data is often messy, unstructured and does not easily
fit into traditional database frameworks.
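A small sketch of what such document-style records can look like: each record below has a different shape, which a fixed table of rows and columns cannot easily accommodate. The records are invented, and a document store such as MongoDB or CouchDB would accept each one as-is.

```python
# Invented document-style (NoSQL) records expressed as JSON-like dictionaries.
import json

records = [
    {"name": "Ana", "email": "ana@example.com"},
    {"name": "Ben", "purchases": [{"item": "laptop", "price": 899.0}],
     "support_tickets": 2},
    {"name": "Cara", "social": {"twitter": "@cara"}, "notes": "prefers phone contact"},
]

# No requirement that every record share the same columns
print(json.dumps(records[1], indent=2))
```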
Python
Python is a programming language which has become very popular in the Big Data space due to its ability
to work very well with large, unstructured datasets (see Part II for the difference between structured and
unstructured data). It is considered to be easier to learn for a data science beginner than other languages
such as R (see also Part II) and more flexible.
R Programming
R is another programming language commonly used in Big Data, and can be thought of as more
specialised than Python, being geared towards statistics. Its strength lies in its powerful handling of
structured data. Like Python, it has an active community of users who are constantly expanding and
adding to its capabilities by creating new libraries and extensions.
Recommendation engine
A recommendation engine is a system which analyses data about what a user has done or bought
previously (and what similar users have done) in order to suggest products, services or content they are
likely to want next. Familiar examples include the suggestions made by online retailers and streaming
services.
Real-time
Real-time means “as it happens” and in Big Data refers to a system or process which is able to give data-
driven insights based on what is happening at the present moment. Recent years have seen a large push
for the development of systems capable of processing and offering insights in real-time (or near-real-
time), and advances in computing power as well as development of techniques such as machine learning
have made it a reality in many applications today.
Reporting
The crucial “last step” of many Big Data initiatives involves getting the right information to the people
who need it to make decisions, at the right time. When this step is automated, analytics is applied to the
insights themselves to ensure that they are communicated in a way that they will be understood and easy
to act on. This will usually involve creating multiple reports based on the same data or insights but each
intended for a different audience (for example, in-depth technical analysis for engineers, and an overview
of the impact on the bottom line for c-level executives).
Spark
Spark is another open source framework like Hadoop, but more recently developed and better suited to
handling cutting-edge Big Data tasks involving real-time analytics and machine learning. Unlike Hadoop
it does not include its own filesystem, though it is designed to work with Hadoop’s HDFS or a number of
other options. However, for certain data-related processes it is able to calculate at over 100 times the
speed of Hadoop MapReduce, thanks to its in-memory processing capability. This means it is becoming an
increasingly popular choice for projects involving deep learning, neural networks and other compute-
intensive tasks.
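A minimal sketch of what a Spark job can look like from Python, assuming a local Spark installation and the pyspark package; the CSV file and its column names are invented for the example. It reads the data into a distributed, in-memory DataFrame and runs a simple aggregation.

```python
# A small PySpark sketch; "sensor_readings.csv" is a hypothetical file
# with columns machine_id and temperature.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("example").getOrCreate()

df = spark.read.csv("sensor_readings.csv", header=True, inferSchema=True)

# Because the data is held in memory across the cluster, repeated
# operations like this avoid re-reading from disk each time.
summary = df.groupBy("machine_id").agg(F.avg("temperature").alias("avg_temp"))
summary.show()

spark.stop()
```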
Structured Data
Structured data is simply data that can be arranged neatly into charts and tables consisting of rows,
columns or multi-dimensional matrices. This is traditionally the way that computers have stored data,
and information in this format can easily and simply be processed and mined for insights. Data gathered
from machines is often a good example of structured data: various data points – speed, temperature,
rate of failure, RPM etc. – can be neatly recorded and tabulated for analysis.
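As a small illustration, the sketch below tabulates a few invented machine readings with pandas and summarises them directly – exactly the kind of neat, columnar analysis that structured data makes easy.

```python
# A small pandas sketch of structured machine data; the readings are invented.
import pandas as pd

readings = pd.DataFrame({
    "machine_id":  ["A1", "A1", "B2", "B2"],
    "speed_rpm":   [1500, 1520, 980, 1010],
    "temperature": [71.2, 73.5, 65.0, 66.4],
})

# Neatly tabulated data can be summarised in one line
print(readings.groupby("machine_id")["temperature"].mean())
```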
Unstructured Data
Unstructured data is any data which cannot easily be put into conventional charts and tables. This can
include video data, pictures, recorded sounds, text written in human languages and a great deal more. This
data has traditionally been far harder to draw insight from using computers which were generally
designed to read and analyze structured information. However, since it has become apparent that a huge
amount of value can be locked away in this unstructured data, great efforts have been made to create
applications which are capable of understanding unstructured data – for example visual recognition and
natural language processing.
Visualization
Humans find it very hard to understand and draw insights from large amounts of text or numerical data –
we can do it, but it takes time, and our concentration and attention are limited. For this reason, effort has
been made to develop computer applications capable of rendering information in a visual form – charts
and graphics which highlight the most important insights which have resulted from our Big Data projects.
A subfield of reporting (see above), visualization is now often an automated process, with visualizations
customized by algorithm to be understandable to the people who need to act or take decisions based on
them.
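As a simple illustration, the sketch below uses the matplotlib library to turn a short series of invented monthly failure rates into a bar chart, which is far quicker to absorb than the raw numbers.

```python
# A minimal visualization sketch; the monthly figures are invented.
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
failure_rate = [0.042, 0.038, 0.051, 0.029]

plt.bar(months, failure_rate)
plt.title("Machine failure rate by month")
plt.ylabel("Failure rate")
plt.savefig("failure_rate.png")  # or plt.show() for interactive use
```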