Digital Fluency Notes

The document discusses digital fluency and provides information about data science, big data analytics, and related topics. It defines data science and big data, discusses their importance and applications, describes the data analysis process, and lists sources and tools for big data.

Uploaded by

anushatanga7
Copyright
© All Rights Reserved
Digital fluency

2. DATABASE MANAGEMENT FOR DATA SCIENCE, BIG DATA ANALYTICS

Q 1 What is data science?

• Data science is an interdisciplinary field that uses scientific methods, processes,
  algorithms and systems to extract knowledge and insights from noisy, structured and
  unstructured data, and to apply that knowledge and those actionable insights across a
  broad range of application domains

Q 2 What is the need for Data Science? / What is the importance of Data Science?

• Data science is the ability to process and interpret data. This enables companies to
  make informed decisions about growth, optimization, and performance
• Data Science enables companies to efficiently understand very large volumes of data from
  multiple sources and derive valuable insights to make smarter data-driven decisions
• Data Science enables enterprises to measure, track, and record performance metrics,
  which facilitates better enterprise-wide decision making

Q 3 What is Data Science useful for? / What are the applications of Data Science?

The uses / applications of Data Science are:

1 Banking
2 Finance
3 Manufacturing
4 Transport
5 Healthcare
6 E-commerce
1 Banking
• With Data Science, banks can manage their resources efficiently and make smarter
  decisions through fraud detection and management of customer data
• Banks assess customer lifetime value, which allows them to monitor the number of
  customers that they have
• Data science gives banks the ability to do risk modeling, through which they can assess
  their overall performance
2 Finance
• Data Science has played a key role in automating various financial tasks
• The finance industry uses data science for risk analytics in order to make strategic
  decisions for the company
• Financial institutions use machine learning for predictive analytics
• It allows companies to predict customer lifetime value and stock market moves
3 Manufacturing
• Data Science is used in manufacturing industries for optimizing production, reducing
  costs and boosting profits
• Data Science helps companies to predict potential problems, monitor systems and analyze
  the continuous stream of data
4 Transport
• In the transportation sector, Data Science helps in making safer driving environments
  for drivers
• It plays a key role in optimizing vehicle performance and adding greater autonomy for
  drivers
 One can create better logistical routes with the help of data science
5 Healthcare
• The areas of healthcare that make use of data science are:
  o Medical Image Analysis
  o Genetics and Genomics
  o Drug Discovery
  o Predictive Modeling for Diagnosis
  o Health bots or virtual assistants
6 E-commerce
• Data science is heavily used for identifying a potential customer base
• Predictive analytics is used for forecasting goods and services
• Data Science is used for identifying styles of popular products and predicting their
  trends
• With data science, companies are optimizing their pricing structures for their consumers

Q 4 What is big data?

• Big data refers to data sets that are too large and complex for traditional data
  processing and data management applications
• Big data is a collection of data that is huge in volume, yet growing exponentially with
  time
• Big data is a field that deals with data sets that are too large or complex to be dealt
  with by traditional data-processing application software
Q 5 What is big data analytics?
• The process of analyzing large volumes of diverse data sets using advanced analytic
  techniques is called big data analytics
• Big data analytics is a process used to extract meaningful insights, such as hidden
  patterns, unknown correlations, market trends, and customer preferences
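
To make "unknown correlations" concrete, here is a minimal Python sketch that measures how strongly two variables move together using Pearson's correlation coefficient. The data (daily ad spend and daily orders) is invented for illustration and is far smaller than real big data; the function name `pearson` is also just an illustrative choice.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (unnormalised; the 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: daily ad spend and daily orders for one product.
ad_spend = [10, 20, 30, 40, 50]
orders = [12, 25, 29, 41, 52]

r = pearson(ad_spend, orders)
print(round(r, 3))  # a value close to +1 means the two rise together
```

A value near +1 or -1 suggests a strong linear relationship worth investigating; a value near 0 suggests none. Correlation alone does not show that one variable causes the other.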

Q 6 What are the benefits and advantages of big data analytics?

1. Risk Management
• Big data analytics is used in banking companies to identify fraudulent activities and
  discrepancies

2. Product Development and Innovations
• Big data analytics is used in jet engine manufacturing to analyze the efficiency of the
  engine design

3. Quicker and better decision making within organizations
• Big data analytics is used in organizations to make strategic decisions
• Organizations analyze several factors such as population, demographics, accessibility
  of the location, etc.

4. Improve customer experience
• Big data analysis is used by airlines to improve customer experience
• Airlines monitor tweets to find out about their customers' experience regarding their
  journeys, delays, and so on

5. Complex Supplier Networks
• Through big data, companies provide supplier networks called B2B communities
• Big data analytics allows suppliers to escape the constraints they encounter

6. Focused and Targeted Campaigns
• Big data analytics helps businesses carry out a sophisticated analysis of customer
  trends
Q 7 What is data analysis?
• Data Analysis is a process of collecting, transforming, cleaning, and modeling data with
  the goal of discovering the required information

Q 8 Explain the data analytics process / process of data analytics

The Data Analysis Process consists of the following phases / steps:

1 Data Requirements Specification
2 Data Collection
3 Data Processing
4 Data Cleaning
5 Data Analysis
6 Communication

1. Data Requirements Specification
• The data required for analysis is based on a question or an experiment
• Based on the requirements of those directing the analysis, the data necessary as inputs
  to the analysis is identified

2. Data Collection
• Data Collection is the process of gathering information on the targeted variables
  identified as data requirements
• Data is collected from various sources, ranging from organizational databases to the
  information in web pages

3. Data Processing
• The data that is collected must be processed or organized for analysis
• This includes structuring the data as required for the relevant analysis tools

4. Data Cleaning
• Collected data may be incomplete, duplicated, or contain errors. Data Cleaning is the
  process of preventing and correcting these errors
• There are several types of Data Cleaning that depend on the type of data

5. Data Analysis
• Data that is processed, organized and cleaned is ready for analysis
• Various data analysis techniques are used to understand, interpret, and derive
  conclusions based on the requirements
• Data Visualization is used to examine the data in graphical format to obtain additional
  insight regarding the messages within the data

6. Communication
• The results of the data analysis are reported in the format required by the users, to
  support their decisions and further action
• Data analysts use data visualization techniques that help in communicating the message
  clearly and efficiently to the users
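
The six phases above can be sketched as a tiny Python pipeline. This is only an illustration using the standard library: the question, the city names, the CSV text, and every function name (`collect`, `process`, `clean`, `analyse`, `report`) are assumptions made up for the example, not part of the notes.

```python
import csv
import io
from statistics import mean

# 1. Data requirements: the question is "What is the average order value per city?"
#    so the required variables are city and order_value.

# 2. Data collection: a small hypothetical CSV string stands in for a real source
#    such as an organizational database or a web page.
RAW = """city,order_value
Mysuru,250
Bengaluru,400
bengaluru ,350
Mysuru,
Hubballi,abc
Bengaluru,450
"""

def collect(raw):
    return list(csv.DictReader(io.StringIO(raw)))

# 3. Data processing: structure each row as a (city, value-as-text) pair.
def process(rows):
    return [(row["city"], row["order_value"]) for row in rows]

# 4. Data cleaning: normalise city names, drop missing or non-numeric values.
def clean(pairs):
    cleaned = []
    for city, value in pairs:
        try:
            cleaned.append((city.strip().title(), float(value)))
        except (TypeError, ValueError):
            continue  # skip rows whose value cannot be read as a number
    return cleaned

# 5. Data analysis: average order value per city.
def analyse(cleaned):
    by_city = {}
    for city, value in cleaned:
        by_city.setdefault(city, []).append(value)
    return {city: mean(values) for city, values in by_city.items()}

# 6. Communication: report the result in a readable format.
def report(result):
    return "\n".join(f"{city}: {avg:.2f}" for city, avg in sorted(result.items()))

result = analyse(clean(process(collect(RAW))))
print(report(result))
```

In a real project the report step would usually be a chart or dashboard rather than plain text, and each phase would be far more involved.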

Q 9 What are the sources of big data?

Sources of big data are:
1. Social media
2. Cloud
3. Web
4. IoT
5. Databases
6. Telematics
7. Business transactions
8. Electronic Files
9. Social networks
10. Sensors
1. Social media
• Media is the most popular source of big data, as it provides valuable insights on
  consumer preferences and changing trends
• Media includes social media and interactive platforms like Google, Facebook, Twitter,
  YouTube and Instagram, as well as generic media like images, videos and audio, all of
  which create data

2. Cloud
• Cloud storage accommodates structured and unstructured data and provides businesses with
  real-time information and on-demand insights
• The cloud makes an efficient and economical big data source

3. Web
• The public web constitutes big data that is widespread and easily accessible
• Data on the Web or 'Internet' is commonly available to individuals and companies
• Web services such as Wikipedia provide free and quick informational insights to everyone

4. IoT
• Data created by IoT devices constitutes a valuable source of big data
• This data is usually generated by sensors that are connected to electronic devices
• With IoT, data can now be sourced from medical devices, vehicular processes, etc.

5. Databases
• Businesses use databases to acquire relevant big data
• Popular databases used as big data sources include MS Access, DB2, Oracle, SQL, and
  Amazon Simple, etc.
6. Telematics
• GPS in a vehicle helps in monitoring the movement of the vehicle, in order to shorten
  the path to a destination and cut fuel and time consumption
• This system creates huge amounts of data on vehicle position and movement
7. Business transactions
• Data produced as a result of business activities and recorded in databases is a big data
  source
• In e-commerce transactions, banking, and the stock market, lots of records are stored,
  and they are sources of big data
• Payments through credit cards and debit cards are a big data source
8. Electronic Files
• Documents stored as electronic files, like internet pages, videos, audio, PDF files,
  etc., are a big data source
9. Social networks
• Data produced by human interactions through a network like the internet is a big data
  source
• The most common is the data produced in social networks
10. Sensors
• Sensors placed in various places in a city gather data on temperature, humidity, etc.
• A camera placed beside a road gathers information about traffic conditions, which
  creates data
• Security cameras placed in sensitive areas like airports, railway stations and shopping
  malls create a lot of data
Q 10 What are the tools and technologies used in big data?
Big data tools and technologies are:
1. Apache Storm 2. MongoDB 3. Cassandra 4. Cloudera 5. OpenRefine 6. Apache Spark
7. Apache Hive 8. Apache Mahout 9. Apache Pig 10. Apache Thrift 11. Apache Zookeeper
12. NoSQL 13. Flink 14. Kafka 15. Tableau

Q 11 Explain the uses of a recommendation-based system (RBS)

Uses of a recommendation-based system (RBS)
• Recommendation Systems (RS) have been widely used in many Internet activities, and their
  importance is increasing due to the "Information Overload" problem arising from the
  Internet
• An RBS makes it possible to understand a person's taste and automatically find new,
  desirable content for them, based on the pattern between their likes and their ratings
  of different items
• It can help the user to find the right product
• It can increase user engagement
• It helps item providers to deliver items to the right user
• It helps to make content more personalized
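
As a rough sketch of how a recommendation system can predict a rating from "the pattern between likes and ratings", the Python example below uses one simple technique, user-based collaborative filtering with cosine similarity. This is an assumed textbook approach, not necessarily what any real service uses, and the users, items, and ratings are invented.

```python
from math import sqrt

# Hypothetical ratings (1 to 5) given by users to items.
ratings = {
    "asha":  {"book": 5, "laptop": 3, "phone": 4},
    "ravi":  {"book": 5, "laptop": 3, "phone": 4, "headphones": 5},
    "meena": {"book": 1, "laptop": 5, "headphones": 2},
}

def similarity(a, b):
    """Cosine similarity over the items that both users have rated."""
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    dot = sum(ratings[a][i] * ratings[b][i] for i in common)
    norm_a = sqrt(sum(ratings[a][i] ** 2 for i in common))
    norm_b = sqrt(sum(ratings[b][i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    weighted, total = 0.0, 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        weight = similarity(user, other)
        weighted += weight * their_ratings[item]
        total += weight
    return weighted / total if total else None

# Asha has not rated headphones; estimate the rating she would give.
print(predict("asha", "headphones"))
```

Because Asha's past ratings match Ravi's much more closely than Meena's, the prediction lands nearer to Ravi's rating of 5 than to Meena's rating of 2.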
Q 12 How does Amazon use Data Science?
UTUF
1) Uses a recommendation-based system (RBS)
• Through this technology, Amazon gathers data from its customers
• An RBS seeks to predict the “rating” or “preference” a user would give to an item
• Data science helps Amazon understand customers' needs and suggest similar products,
  instead of the customers having to search for them

2) Tracks the user to understand the mindset
• It keeps track of almost everything: your needs, what you have searched for, what you
  will need in future, and your personal details
• It also keeps a check on feedback habits and studies them as well

3) Understands the technicalities (habits)
• Amazon tries to understand users' habits and the time each user devotes to each platform
  for browsing

4) Faster process of shipping
• Amazon has made the process of shipping a lot easier
• With the help of big data analytics insights, it has reached a position where it can
  predict who will order what and when. This has improved the experience of online
  shopping
Q 13 Database Management for Data Science
Data
• Data are a set of values of qualitative or quantitative variables about one or more
  persons or objects
Database
• A database is an organized collection of structured information, or data, which is
  stored in a computer system
• A database is defined as a structured set of data held in a computer's memory or on the
  cloud that is accessible in various ways
• Example: a student database in a college, a company database
Database Management Systems (DBMS)
• A database management system is software which is used to manage the database
• DBMS refers to the technology solution used to optimize and manage the storage and
  retrieval of data from databases
• A DBMS provides an interface to perform various operations like database creation,
  storing data in it, updating data, and creating a table in the database
Types of database
1. Relational database
2. Centralized database
3. Distributed database
4. NoSQL database
5. Cloud database
6. Object-oriented database
7. Hierarchical database
8. Network database
1. Relational database
• A relational database is based on the relational data model, which stores data in the
  form of rows (tuples) and columns (attributes), which together form a table (relation)
• A relational database uses SQL for storing, manipulating, as well as maintaining the
  data
• Examples of relational databases are: MySQL, Microsoft SQL Server, Oracle, DB2,
  PostgreSQL, etc.
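
To show rows, columns, and SQL in action, here is a minimal sketch using SQLite through Python's built-in `sqlite3` module. SQLite is not in the list above; it is chosen here only because it needs no setup. The `student` table and its contents are hypothetical.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The table is the relation; each column is an attribute.
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, course TEXT)"
)

# Each inserted row is a tuple of the relation.
conn.executemany(
    "INSERT INTO student (roll_no, name, course) VALUES (?, ?, ?)",
    [(1, "Asha", "BCom"), (2, "Ravi", "BSc"), (3, "Meena", "BCom")],
)

# Manipulating data: change one student's course.
conn.execute("UPDATE student SET course = ? WHERE roll_no = ?", ("BCA", 2))

# Retrieving data: all students enrolled in BCom.
rows = conn.execute(
    "SELECT roll_no, name FROM student WHERE course = ? ORDER BY roll_no",
    ("BCom",),
).fetchall()
print(rows)

conn.close()
```

The same SQL statements would work, with minor dialect differences, on the server-based systems named above.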
2. Centralized database
• It is the type of database that stores data in a centralized database system
• It helps users to access the stored data from different locations through several
  applications
• Example: central library database in a college
3. Distributed database
• It is the type of database in which the data is distributed among different database
  systems of an organization
• These database systems are connected via communication links
• Example: Apache Cassandra, HBase, Ignite, etc.
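
The notes do not say how data is spread across the systems. One common approach is hash partitioning, sketched below in Python under that assumption: a hash of each record's key decides which node stores it. The node names, the in-memory `stores` dictionaries, and the `put`/`get` helpers are all hypothetical stand-ins for real networked database systems.

```python
import hashlib

# Hypothetical database systems (nodes) in the organization.
NODES = ["node-a", "node-b", "node-c"]

# Each node's storage is simulated with a plain dictionary.
stores = {node: {} for node in NODES}

def node_for(key):
    """Pick a node deterministically from a hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

def put(key, value):
    stores[node_for(key)][key] = value

def get(key):
    return stores[node_for(key)].get(key)

for roll_no, name in [(1, "Asha"), (2, "Ravi"), (3, "Meena")]:
    put(f"student:{roll_no}", name)

print(get("student:2"))
print({node: sorted(data) for node, data in stores.items()})
```

Every record lives on exactly one node, and any client can work out which node to ask without a central index. Real systems add replication and rebalancing on top of this idea.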
4. NoSQL database
• NoSQL stands for Non-SQL / Not Only SQL; it is a type of database that is used for
  storing a wide range of data sets
• It stores data not only in tabular form but in several different ways
• Example: MongoDB, JanusGraph
• It is divided into four types:
1. Key-value storage
2. Document-oriented database
3. Graph databases
4. Wide-column stores
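
The four types differ mainly in the shape of the data they hold. The Python sketch below imitates each shape with ordinary dictionaries and lists; it is only an illustration of the data models, not how any particular NoSQL product stores data, and all keys and values are invented.

```python
# 1. Key-value storage: an opaque value looked up by a unique key.
key_value = {"session:42": "user=asha;cart=3"}

# 2. Document-oriented: self-describing records whose fields may differ.
documents = [
    {"_id": 1, "name": "Asha", "orders": [{"item": "book", "qty": 2}]},
    {"_id": 2, "name": "Ravi"},
]

# 3. Graph: nodes plus labelled edges between them.
graph = {
    "nodes": {"asha": {"type": "person"}, "ravi": {"type": "person"}},
    "edges": [("asha", "FOLLOWS", "ravi")],
}

# 4. Wide-column: row key -> column family -> column -> value;
#    different rows may have different columns.
wide_column = {
    "user:asha": {"profile": {"city": "Mysuru"}, "activity": {"logins": 12}},
    "user:ravi": {"profile": {"city": "Hubballi", "language": "Kannada"}},
}

# One typical lookup for each model.
session = key_value["session:42"]
customers_with_orders = [d["name"] for d in documents if "orders" in d]
followed_by_asha = [
    dst for src, rel, dst in graph["edges"] if src == "asha" and rel == "FOLLOWS"
]
ravi_city = wide_column["user:ravi"]["profile"]["city"]

print(session, customers_with_orders, followed_by_asha, ravi_city)
```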
5. Cloud database
• It is a type of database in which the data is stored in a virtual environment and which
  runs on a cloud computing platform
• It provides users with various cloud computing services (SaaS, PaaS, IaaS, etc.) for
  accessing the database
• Example: phoenixNAP, Google Cloud SQL, Microsoft Azure
6. Object-oriented database
• It is a type of database that uses the object-based data model approach for storing data
  in the database system
7. Hierarchical database
• It is a type of database that stores data in the form of parent-children relationship
  nodes
• It organizes data in a tree-like structure
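
A tree-like, parent-children structure can be sketched in Python with nested dictionaries. The college, departments, and courses below are hypothetical; the point is that every record except the root has exactly one parent, so any record is reached by a single path from the root.

```python
# Each node has a name and a list of children; only the root has no parent.
college = {
    "name": "College",
    "children": [
        {"name": "Commerce", "children": [
            {"name": "BCom", "children": []},
            {"name": "BBA", "children": []},
        ]},
        {"name": "Science", "children": [
            {"name": "BSc", "children": []},
        ]},
    ],
}

def find_path(node, target, path=()):
    """Return the path of names from the root to target, or None if absent."""
    path = path + (node["name"],)
    if node["name"] == target:
        return path
    for child in node["children"]:
        found = find_path(child, target, path)
        if found:
            return found
    return None

print(" > ".join(find_path(college, "BBA")))
```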
8. Network database
• It is a type of database that follows the network data model
• The representation of data is in the form of nodes connected via links between them
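
In contrast to a tree, the network model lets a node be linked to many other nodes. A minimal Python sketch, with invented students, a teacher, and subjects as the nodes:

```python
from collections import defaultdict

# Each node maps to the set of nodes it is linked to.
links = defaultdict(set)

def connect(a, b):
    """Create a two-way link between nodes a and b."""
    links[a].add(b)
    links[b].add(a)

connect("Asha", "Statistics")
connect("Asha", "Accounting")
connect("Ravi", "Statistics")
connect("Prof. Rao", "Statistics")

def linked_to(node):
    """All nodes directly linked to the given node, in sorted order."""
    return sorted(links.get(node, set()))

# "Statistics" is linked to several nodes at once, which a strict
# parent-children tree would not allow.
print(linked_to("Statistics"))
print(linked_to("Asha"))
```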
Q 14 Why do we use databases? / Advantages of a database (DBMS)
• The cost of data entry, update, read and delete operations is reduced
• Data redundancy is reduced
• Data sharing is made easy
• Data inconsistency is reduced
• Decision making with data is improved
• Large amounts of data can be managed
• Data is accurate
• It is easy to search the data
• It is easy to update the data
• Data security is improved
• Data integration is better
• Data independence is greater
