29/08/2019 Data Science Terminology Flashcards | Quizlet
Data Science Terminology
Created by
obscrivn TEACHER
Terms in this set (88)
Data Analyst Collects data, visualizes data with various
tools and looks for patterns and insights.
Knows basic statistics and has
business/domain knowledge
Data Engineer Develops and manages infrastructure
that deals with big data. A specialist in
Data Wrangling. Well versed with tools
https://quizlet.com/355671477/data-science-terminology-flash-cards/ 1/15
such as Hadoop, NoSQL and
MapReduce. Sets up data pipelines
Data Wrangling Transform raw data into a form suitable
for analysis. For example, combining
multiple datasets, removing
inconsistencies, converting into a specific
format
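A minimal sketch of the wrangling steps named above (combining datasets, removing inconsistencies, converting into a specific format), using only Python's standard library. The records, field names and the `wrangle` and `parse_date` helpers are hypothetical, invented for illustration.

```python
from datetime import datetime

# Two hypothetical raw datasets with inconsistent casing, whitespace and date formats.
sales_a = [{"customer": " Alice ", "date": "2019-08-29", "amount": "10.50"}]
sales_b = [{"customer": "BOB", "date": "29/08/2019", "amount": "7"}]


def parse_date(text):
    """Try each known date format and return an ISO 8601 date string."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def wrangle(*datasets):
    """Combine datasets and convert every record to one consistent format."""
    combined = []
    for dataset in datasets:
        for row in dataset:
            combined.append({
                "customer": row["customer"].strip().title(),
                "date": parse_date(row["date"]),
                "amount": float(row["amount"]),
            })
    return combined


tidy = wrangle(sales_a, sales_b)
```

After this step every record has the same keys, a trimmed name, an ISO date and a numeric amount, which is the "form suitable for analysis" the card refers to.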
Data Cleansing Raw data with missing values, bad
delimiters or inconsistent records is
repaired to syntactic and semantic
correctness
EDA Exploratory Data Analysis - a first step in
exploring data without statistical
modeling and inference
Aggregation the process through which data is
searched, gathered and presented.
Algorithm a mathematical process that can perform
a specific analysis or transformation on a
piece of data.
Analytics the discovery and communication of
insights derived from data, or the use of
software-based algorithms and statistics
to derive meaning from data.
Analytics Platform software and/or hardware that provide
the tools and computational power
needed to build and perform many
different analytical queries.
Anomaly Detection the systematic search for data items in a
dataset that deviate from a projected
pattern or expected behavior.
Artificial Intelligence (A.I.) the field of computer science related to
the development of machines and
software that are capable of perceiving
their environment and taking appropriate
action when required (in real-time), even
learning from those actions.
Behavioral Analytics investigates patterns of human behavior in
the data.
Big Data data sets with sizes beyond the ability of
commonly used software tools to
capture, curate, manage and process
Business Intelligence the theories, methodologies and
processes to make data understandable
and more actionable.
Byte (B) a unit of digital information, most
commonly eight bits. A sequence
of bits that represents a character.
Central Processing Unit (CPU) the brains of an information processing
system; the processing component that
controls the interpretation and execution
of instructions in a computer.
Classification Analysis a systematic process for obtaining
important and relevant information about
data using classification algorithms.
Cloud a broad term that refers to any Internet-
based application or service that is
hosted remotely.
Cloud Computing a computing system whose processing is
distributed over a network that uses
server farms to store data in a distant
location (see also, data centers).
Clustering Analysis the process of identifying objects that
are similar to each other and grouping
them in order to understand the
differences and the similarities within the
data.
Comparative Analysis a process that ensures a step-by-step
procedure of comparisons and
calculations to detect patterns within
very large data sets.
Correlation Analysis a statistical technique for determining a
relationship between variables and
whether that relationship is negative or
positive.
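One common way to quantify such a relationship is the Pearson correlation coefficient, which is positive when two variables rise together and negative when one falls as the other rises. The sketch below is an illustration only: the card does not name a specific coefficient, and the `pearson_r` helper and sample data are assumptions.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


hours = [1, 2, 3, 4, 5]
score_up = [2, 4, 6, 8, 10]    # rises with hours: positive relationship
score_down = [10, 8, 6, 4, 2]  # falls with hours: negative relationship

r_pos = pearson_r(hours, score_up)
r_neg = pearson_r(hours, score_down)
```

The sign of the result gives the direction of the relationship; the magnitude (between 0 and 1) gives the strength of the linear association.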
Customer Relationship Management (CRM) managing a company's interactions with
customers, including sales and related business processes.
Dashboard a graphical representation of the
analyses performed by algorithms,
usually in the form of plots and gauges.
Data a quantitative or qualitative value.
Data Access the act or method of viewing or
retrieving stored data.
Data Aggregation Tools methods for transforming scattered data
from numerous sources into a new, single
source.
Data Analytics the application of software to derive
information or meaning from data. The
end result might be a report, an
indication of status or an action taken
automatically based on the information
received.
Data Analyst someone who analyzes, models,
cleanses, and/or processes data.
Database a digital collection of data and the
structure in which the data is organized
(structured).
Database Management System (DBMS) integrated software for collecting,
storing and providing access to data,
practical to use even by non-specialists.
Data Cleansing the process of reviewing and revising
data in order to delete duplicates,
correct errors and provide consistency.
Data Mining the process of finding certain patterns or
information from data sets in an
automated way. This is one popular way
to perform data exploration.
Data Modeling development of a graphic representation
defining the structure of data for the
purpose of communicating the data
needed for business processes between
functional and technical people or for
communicating a plan to develop how
data is stored and accessed among
application development team members.
Data Science a recent term that has multiple definitions
but is generally accepted as a discipline
that incorporates statistics, data
visualization, computer programming,
data mining, machine learning and
database engineering to solve complex
problems.
Discriminant Analysis a statistical analysis that takes advantage
of known groups or clusters in data to
derive the classification rule. It involves
cataloguing the data as well as
distributing it into groups, classes or
categories.
Event Analytics a process that shows the series of steps
that led to an action.
Exploratory Analysis finding patterns within data without
standard procedures or methods. It is a
means of discovering the data and
finding the data set's main characteristics.
It constitutes an important part of the
data science process.
Extract, Transform and Load (ETL) a process for populating a database or
data warehouse: data is extracted from its sources,
transformed into the target format and loaded.
Hypertext a technology that links text in one part of
a document with related text in another
part of the document or in other
documents. A user can quickly find the
related text by clicking on the
appropriate keyword, key phrase, icon or
button.
Hypertext Transfer Protocol (HTTP) the protocol used on the World Wide
Web that permits Web clients (Web
browsers) to communicate with Web
servers. This protocol allows
programmers to embed hyperlinks in
Web documents using hypertext markup
language (HTML).
Internet of Things (IoT) ordinary devices that are connected to
the Internet at any time and anywhere via
sensors. IoT is expected to contribute
substantially to the growth of big data.
Machine Learning (ML) the field of computer science related to
the development and use of algorithms
to enable machines to learn from what
they are doing and become better over
time.
Natural Language Processing (NLP) a field of computer science involved with
interactions between computers and
human languages.
Online Analytical Processing (OLAP) the process of analyzing
multidimensional data using three
operations: consolidation (the
aggregation of available data), drill-down
(the ability for users to see the
underlying details) and slice and dice
(the ability for users to select subsets and
view them from different perspectives).
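The three OLAP operations can be sketched on a tiny in-memory fact table. This is an illustration in plain Python, not how an OLAP engine is implemented; the `facts` rows and the `consolidate` and `slice_rows` helpers are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact table: one row per (year, region, product) with a sales measure.
facts = [
    {"year": 2018, "region": "EU", "product": "A", "sales": 100},
    {"year": 2018, "region": "US", "product": "A", "sales": 150},
    {"year": 2019, "region": "EU", "product": "A", "sales": 120},
    {"year": 2019, "region": "EU", "product": "B", "sales": 80},
    {"year": 2019, "region": "US", "product": "B", "sales": 60},
]


def consolidate(rows, *dims):
    """Consolidation (roll-up): total sales grouped by the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["sales"]
    return dict(totals)


def slice_rows(rows, **conditions):
    """Slice and dice: keep only the rows matching every dimension value given."""
    return [r for r in rows if all(r[k] == v for k, v in conditions.items())]


by_year = consolidate(facts, "year")                    # coarse, consolidated view
by_year_region = consolidate(facts, "year", "region")   # drill-down: more detail
eu_2019 = slice_rows(facts, year=2019, region="EU")     # a selected subset
```

Drill-down here is simply consolidation over more dimensions, which exposes the detail underneath each yearly total.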
Ontology a representation of knowledge as a set
of concepts within a domain and the
relationships between those concepts.
Very useful when designing a database.
Outlier an object that deviates significantly from
the general average within a dataset or a
combination of data. It is numerically
distant from the rest of the data and
therefore indicates that something is
going on that requires additional
analysis.
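"Numerically distant from the rest of the data" can be made concrete with a z-score: how many standard deviations a value lies from the mean. The sketch below is one possible rule, not the only one; the threshold of 2.0 and the `find_outliers` helper are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def find_outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
outliers = find_outliers(readings)
```

Note that an extreme value inflates both the mean and the standard deviation, so on small data sets a z-score rule can miss outliers; rules based on quartiles are a common alternative.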
Parallel Data Analysis breaking up an analytical problem into
smaller components and running
algorithms on each of those components
at the same time.
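A minimal sketch of that idea: split the data into chunks, run the same computation on each chunk concurrently, then combine the partial results. The chunk size, worker count and helper names are assumptions. A thread pool is used here to keep the example simple; for CPU-bound work in standard CPython a process pool would usually be the better fit.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(values, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def sum_of_squares(chunk):
    """The per-component analysis: here, a simple sum of squares."""
    return sum(v * v for v in chunk)


def parallel_sum_of_squares(values, chunk_size=250, workers=4):
    """Run the analysis on each chunk concurrently, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, chunked(values, chunk_size)))
    return sum(partials)


data = list(range(1000))
total = parallel_sum_of_squares(data)
```

This only works because a sum of squares can be computed per chunk and then added up; analyses whose components depend on each other need a different decomposition.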
Predictive Analysis (Predictive Analytics) often regarded as one of the most
valuable analyses within big data,
as it helps predict what someone is likely
to buy, visit or do as well as how
someone will behave in the (near) future.
It uses a variety of different data sets
such as historical, transactional, social, or
customer profile data to identify risks
and opportunities.
Predictive Modeling the process of developing a model to
predict a trend or outcome.
R an open-source programming language
and software environment for statistical
computing and graphics. The R language
is widely used among statisticians and
data miners for developing statistical
software and data analysis. R's popularity
has increased substantially in recent
years.
Regression Analysis a statistical technique for modeling how a
response variable depends on one or more
explanatory variables. The model treats the
dependency as one-way, from the explanatory
variables to the response, but a fitted
relationship does not by itself establish causation.
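The simplest case, one explanatory variable and a straight-line relationship, can be sketched with ordinary least squares. The card does not prescribe a method, so the choice of least squares, the `fit_line` helper and the sample data are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the ratio of co-variation to the variation in x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


ad_spend = [1.0, 2.0, 3.0, 4.0]
revenue = [3.0, 5.0, 7.0, 9.0]  # constructed to lie exactly on y = 2x + 1

slope, intercept = fit_line(ad_spend, revenue)
predicted = intercept + slope * 5.0  # predicted response for a new x value
```

The fitted line describes how the response changes with the explanatory variable in this data; it does not show that one causes the other.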
Risk Analysis the application of statistical methods on
one or more datasets to determine the
likely risk of a project, action or decision.
Root-Cause Analysis the process of determining the main
cause of an event or problem.
Semi-Structured Data a form of structured data that does not
conform to a formal structure the way
structured data does. It contains tags or
other markers to enforce a hierarchy of
records. Commonly found in JSON objects.
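A short JSON example makes the idea concrete: keys act as the tags, nesting gives the hierarchy, and records are not forced to share the same fields. The customer data below is invented for illustration.

```python
import json

# Hypothetical semi-structured input: the second record has no "email" field.
raw = """
[
  {"name": "Alice", "email": "alice@example.com",
   "orders": [{"id": 1, "total": 20.0}, {"id": 2, "total": 5.5}]},
  {"name": "Bob", "orders": []}
]
"""

customers = json.loads(raw)

# Flatten each nested record into a fixed set of fields (a structured view).
summary = [
    {
        "name": c["name"],
        "email": c.get("email"),  # None when the field is absent
        "order_total": sum(o["total"] for o in c.get("orders", [])),
    }
    for c in customers
]
```

The flattening step is where semi-structured data typically becomes structured: every output row has the same three fields, whatever the input record contained.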
Signal Analysis the analysis of measurements of time
varying or spatially varying physical
quantities to analyze the performance of
a product.
Structured Data data that is identifiable because it is
organized in a structure such as rows and
columns. The data resides in fixed fields
within a record or file, or the data is
tagged correctly and can be accurately
identified.
Structured Query Language (SQL) a language for querying and managing
data in a relational database. SQL was
designed for relational data, although many
big data tools offer SQL-like query interfaces.
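A small illustration of retrieving data from a relational database with SQL, using Python's built-in `sqlite3` module and an in-memory database. The `orders` table and its contents are hypothetical.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 5.0), ("alice", 7.5)],
)

# Retrieve the total amount per customer, sorted by customer name.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()
```

The query states what result is wanted (totals per customer) and leaves the how to the database engine, which is the defining trait of SQL.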
Text Analytics the application of statistical, linguistic
and machine learning techniques on
text-based sources to derive meaning or
insight.
Thread a series of posted messages that
represents an ongoing discussion of a
specific topic in a bulletin board system,
a newsgroup or a Web site.
Time Series Analysis the process of analyzing
data obtained through repeated
measurements over time. The data has to
be well-defined and measured at
successive points in time spaced at
identical time intervals.
Topological Data Analysis focusing on the shape of complex data
and identifying clusters and any
statistical significance that is present
within that data.
Unstructured Data data that is text heavy, in general, but
may also contain dates, numbers and
facts.
Variable A characteristic or quantity of interest
that can take on different values
Observation A set of values corresponding to a set of
variables
Variation Differences in values of a variable over
observations
Random variable A quantity whose values are not known
with certainty
Population The set of all elements of interest in a
particular study
Sample A subset of the population
Random sampling The act of collecting a sample that
ensures that (1) each element selected
comes from the same population and (2)
each element is selected independently
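A rough sketch of drawing a random sample with Python's `random` module. The population of numbered elements and the fixed seed are assumptions made so the example is repeatable. Drawing with replacement matches the "selected independently" condition most directly; drawing without replacement is shown for comparison.

```python
import random

population = list(range(1, 101))  # hypothetical population of 100 element IDs
rng = random.Random(42)           # fixed seed only so the sketch is repeatable

# Independent draws: each pick ignores earlier picks, so repeats are possible.
with_replacement = rng.choices(population, k=10)

# Distinct elements: each pick removes that element from later picks,
# so the draws are not strictly independent.
without_replacement = rng.sample(population, k=10)
```

Both samples come from the same population, which satisfies the first condition on the card; only the first satisfies independence in the strict sense.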
Quantitative data Data where numerical values are used to
indicate magnitude, such as how many or
how much. Arithmetic operations such as
addition, subtraction, and multiplication
can be performed on quantitative data
Categorical data Data where categories of like items are
identified by labels or names. Arithmetic
operations cannot be performed on
categorical data
Frequency distribution A tabular summary of data showing the
number (frequency) of data values in
each of several non-overlapping bins
Relative frequency distribution A tabular summary of data showing the
fraction or proportion of data values in
each of several non-overlapping bins
Percent frequency distribution A tabular summary of data showing the
percentage of data values in each of
several non-overlapping bins
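The three frequency tables differ only in how each bin's count is expressed: as a count, as a fraction of the total, or as a percentage. A minimal sketch, assuming equal-width bins and a hypothetical `frequency_tables` helper:

```python
from collections import Counter


def frequency_tables(values, bin_width):
    """Frequency, relative frequency and percent frequency per non-overlapping bin."""
    # Each value falls in exactly one bin [start, start + bin_width).
    counts = Counter((v // bin_width) * bin_width for v in values)
    n = len(values)
    table = []
    for start in sorted(counts):
        freq = counts[start]
        table.append({
            "bin": (start, start + bin_width),
            "frequency": freq,            # number of values in the bin
            "relative": freq / n,         # fraction of all values
            "percent": 100 * freq / n,    # percentage of all values
        })
    return table


ages = [21, 25, 29, 31, 34, 35, 38, 42, 47, 55]
table = frequency_tables(ages, bin_width=10)
```

Bins with no values are simply omitted in this sketch. The relative frequencies sum to 1 and the percent frequencies to 100, which is a quick check on any such table.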
Skewness A measure of the lack of symmetry in a
distribution
Cumulative frequency distribution A tabular summary of quantitative data
showing the number of data values that
are less than or equal to the upper class
limit of each bin
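A cumulative frequency table can be sketched by counting, for each upper class limit, how many values are less than or equal to it. The data, the class limits and the `cumulative_frequency` helper are hypothetical.

```python
from bisect import bisect_right


def cumulative_frequency(values, upper_limits):
    """Pair each upper class limit with the count of values <= that limit."""
    ordered = sorted(values)
    # bisect_right returns the number of sorted items that are <= limit.
    return [(limit, bisect_right(ordered, limit)) for limit in upper_limits]


ages = [21, 25, 29, 31, 34, 35, 38, 42, 47, 55]
cumulative = cumulative_frequency(ages, upper_limits=[29, 39, 49, 59])
```

The counts never decrease from one bin to the next, and the last one equals the number of observations when the final limit covers the largest value.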
Mean (Arithmetic Mean) A measure of central location computed
by summing the data values and dividing
by the number of observations
Median A measure of central location provided
by the value in the middle when the data
are arranged in ascending order
Mode A measure of location, defined as the
value that occurs with greatest
frequency
Geometric Mean A measure of location that is calculated
by finding the nth root of the product of
n values
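The four measures of location above are all available in Python's standard `statistics` module. The sample values are chosen for illustration.

```python
from statistics import geometric_mean, mean, median, mode

values = [2, 4, 4, 8]

avg = mean(values)            # sum of the values divided by their count
mid = median(values)          # middle value of the sorted data
most_common = mode(values)    # value that occurs most often
geo = geometric_mean(values)  # 4th root of 2 * 4 * 4 * 8
```

With an even number of values, as here, `median` averages the two middle values. The geometric mean is defined only for positive values and is never larger than the arithmetic mean.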
Range A measure of variability, defined to be
the largest value minus the smallest value
Variance A measure of variability based on the
squared deviations of the data values
about the mean
Standard deviation A measure of variability computed by
taking the positive square root of the
variance
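The three measures of variability can be computed directly from their definitions. The cards do not say whether the variance divides by n or by n - 1, so the sketch below supports both as an explicit assumption; the `variability` helper and the data are hypothetical.

```python
from math import sqrt


def variability(values, sample=True):
    """Range, variance and standard deviation of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    # Assumption: divide by n - 1 for a sample, by n for a whole population.
    variance = squared_deviations / (n - 1 if sample else n)
    return {
        "range": max(values) - min(values),
        "variance": variance,
        "std_dev": sqrt(variance),  # positive square root of the variance
    }


values = [2, 4, 4, 4, 5, 5, 7, 9]
as_population = variability(values, sample=False)
as_sample = variability(values, sample=True)
```

The standard deviation is in the same units as the data, which is why it is usually easier to interpret than the variance.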
Percentile A value such that approximately p
percent of the observations have values
less than the pth percentile; hence,
approximately (100 - p) percent of the
observations have values greater than
the pth percentile. The 50th percentile is
the median
Quartile The 25th, 50th, 75th percentiles, referred
to as the first quartile, the second
quartile (median), and third quartile,
respectively. The quartiles can be used to
divide a data set into four parts, with
each part containing approximately 25
percent of the data
Interquartile range The difference between the third and
first quartiles
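Percentiles, quartiles and the interquartile range can be sketched with `statistics.quantiles` from the standard library. Several interpolation conventions exist and give slightly different cut points on small data sets; the `"inclusive"` method used here is an assumption, as is the sample data.

```python
from statistics import median, quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 returns the three cut points that divide the data into four parts.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range: third quartile minus first quartile

# n=100 returns 99 cut points; index 49 is the 50th percentile.
percentiles = quantiles(data, n=100, method="inclusive")
p50 = percentiles[49]

middle = median(data)  # should agree with the second quartile
```

The second quartile, the 50th percentile and the median are the same value, which is a useful consistency check.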
Covariance A measure of linear association between
two variables. Positive values indicate a
positive relationship; negative values
indicate a negative relationship
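Covariance can be computed as the average product of paired deviations from the two means. The sketch below uses the sample form (dividing by n - 1), which is an assumption since the card does not specify; the `sample_covariance` helper and the data are hypothetical.

```python
def sample_covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Products are positive when both values sit on the same side of their means.
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


hours = [1, 2, 3, 4, 5]
score_up = [2, 4, 6, 8, 10]    # moves with hours
score_down = [10, 8, 6, 4, 2]  # moves against hours

cov_pos = sample_covariance(hours, score_up)
cov_neg = sample_covariance(hours, score_down)
```

Only the sign is easy to interpret: the magnitude depends on the units of both variables, which is why correlation is often preferred for comparing strength.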