Data science

1. Artificial Intelligence is a technology which depends
entirely on data. It is the data fed into the machine
that makes it intelligent.
2. Depending upon the type of data, AI can be classified into
three broad domains:
DATA SCIENCE: working around numeric and alpha-
numeric data
NATURAL LANGUAGE PROCESSING: working around
textual and speech-based data
COMPUTER VISION: working around image and visual data
3. Each domain has its own type of data which gets fed into
the machine and hence has its own way of working
around it.
4. Data Science is a concept that unifies statistics, data
analysis, machine learning and their related methods in
order to understand and analyse actual phenomena with
data.
5. It employs techniques and theories drawn from many
fields within the context of Mathematics, Statistics,
Computer Science, and Information Science.
6. Applications of Data Science:
Data Science is not a new field. It majorly works
around analysing data, and when it comes to AI, this
analysis helps in making the machine intelligent
enough to perform tasks by itself.
There exist various applications of Data Science in
today’s world. Some of them are:
7. Fraud and Risk Detection: The earliest applications of
data science were in finance. Companies were fed up with
bad debts and losses every year.
However, they had a lot of data which used to get
collected during the initial paperwork while sanctioning
loans.
They decided to bring in data scientists in order to
rescue them from losses.
Over the years, banking companies learned to divide and
conquer data via customer profiling, past expenditures,
and other essential variables to analyse the probabilities
of risk and default.
Moreover, it also helped them to push their banking
products based on customer’s purchasing power.
8. Genetics & Genomics: Data Science applications also
enable an advanced level of treatment personalization
through research in genetics and genomics.
The goal is to understand the impact of the DNA on our
health and find individual biological connections
between genetics, diseases, and drug response.
Data science techniques allow integration of different
kinds of data with genomic data in disease research,
which provides a deeper understanding of genetic issues
in reactions to particular drugs and diseases.
As soon as we acquire reliable personal genome data,
we will achieve a deeper understanding of the human
DNA.
The advanced genetic risk prediction will be a major
step towards more individual care.
9. Internet Search: When we talk about search engines,
we think ‘Google’. But there are many other search
engines like Yahoo, Bing, Ask, AOL, and so on.
All these search engines (including Google) make use of
data science algorithms to deliver the best result for our
searched query in a fraction of a second.
Considering the fact that Google processes more than 20
petabytes of data every day, had there been no data
science, Google wouldn’t have been the ‘Google’ we
know today.
10. Targeted Advertising: If you thought Search would
have been the biggest of all data science applications,
here is a challenger – the entire digital marketing
spectrum.
Starting from the display banners on various websites to
the digital billboards at the airports, almost all of them
are decided by using data science algorithms.
This is the reason why digital ads have been able to get a
much higher CTR (Click-Through Rate) than traditional
advertisements.
They can be targeted based on a user’s past behaviour.
11. Website Recommendations: Aren’t we all used to
the suggestions about similar products on Amazon?
They not only help us find relevant products from
billions of products available with them but also add a
lot to the user experience.
A lot of companies have fervently used this engine to
promote their products in accordance with the user’s
interest and relevance of information.
Internet giants like Amazon, Twitter, Google Play, Netflix,
LinkedIn, IMDB and many more use this system to
improve the user experience.
The recommendations are made based on previous
search results for a user.
12. Airline Route Planning: The airline industry across
the world is known to bear heavy losses.
Except for a few airline service providers, companies are
struggling to maintain their occupancy ratio and
operating profits.
With the steep rise in air-fuel prices and the need to offer
heavy discounts to customers, the situation has become
worse.
It wasn’t long before airline companies started using
Data Science to identify the strategic areas of
improvements.
Now, while using Data Science, the airline companies
can:
• Predict flight delay
• Decide which class of airplanes to buy
• Whether to directly land at the destination or take a
halt in between (For example, A flight can have a direct
route from New Delhi to New York. Alternatively, it can
also choose to halt in any country.)
• Effectively drive customer loyalty programs
13. Data Science is a combination of Python and
mathematical concepts like statistics, data analysis,
probability, etc. Concepts of Data Science can be used in
developing applications around AI as it gives a strong
base for data analysis in Python.
14. Data Collection: Data collection is nothing new in
our lives. It has been a part of our society for ages.
Even when people did not have fair knowledge of
calculations, records were still maintained in some way
or the other to keep an account of relevant things.
Data collection is an exercise which does not require
even a tiny bit of technological knowledge.
But when it comes to analysing the data, it becomes a
tedious process for humans as it is all about numbers
and alpha-numerical data.
15. That is where Data Science comes into the picture.
It not only gives us a clearer idea of the dataset,
but also adds value to it by providing deeper and clearer
analyses of it.
As AI gets incorporated in the process, predictions and
suggestions by the machine become possible on the
same data.
16. For the data domain-based projects, majorly the
type of data used is in numerical or alpha-numerical
format and such datasets are curated in the form of
tables.
17. Such databases are very commonly found in any
institution for record maintenance and other purposes.
Some examples of datasets which you must already be
aware of are:
 Banks - Databases of loans issued, account holders,
locker owners, employee registrations, bank
visitors, etc.
 ATM machines - Usage details per day, cash
denomination transaction details, visitor details,
etc.
 Movie theatres - Movie details, tickets sold offline,
tickets sold online, refreshment purchases, etc.
18. Sources of Data: There exist various sources of data
from where we can collect any type of data required, and
the data collection process can be categorised in two
ways: Offline and Online.
19. While accessing data from any of the data sources,
the following points should be kept in mind:
1. Only data which is available for public usage should
be taken up.
2. Personal datasets should only be used with the
consent of the owner.
3. One should never breach someone’s privacy to collect
data.
4. Data should only be taken from reliable sources, as
data collected from random sources can be wrong or
unusable.
5. Reliable sources of data ensure the authenticity of the
data, which helps in proper training of the AI model.

20. Types of Data: For Data Science, usually the data is
collected in the form of tables. These tabular datasets
can be stored in different formats.
21. Some of the commonly used formats are:
1. CSV: CSV stands for Comma Separated Values.
It is a simple file format used to store tabular data.
Each line of this file is a data record, and each record
consists of one or more fields which are separated by
commas.
Since the values of records are separated by
commas, these files are known as CSV files.
2. Spreadsheet: A Spreadsheet is a piece of paper or a
computer program which is used for accounting and
recording data using rows and columns into which
information can be entered.
Microsoft Excel is a program which helps in creating
spreadsheets.
3. SQL: SQL stands for Structured Query Language.
It is a domain-specific language used in programming
and is designed for managing data held in different
kinds of DBMS (Database Management Systems).
It is particularly useful in handling structured data.
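As a rough sketch of the first and last formats, the snippet below parses a small CSV and then manages the same records through SQLite, a lightweight DBMS bundled with Python. The dataset, column names, and table name are hypothetical, chosen only for illustration.

```python
import csv
import io
import sqlite3

# A hypothetical CSV dataset: each line is a record,
# and the fields of a record are separated by commas
data = "name,marks\nAsha,91\nRavi,84\nMeena,77\n"
rows = list(csv.reader(io.StringIO(data)))
header, records = rows[0], rows[1:]
print(header)    # ['name', 'marks']

# The same tabular data held in a DBMS and managed via SQL
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE students (name TEXT, marks INTEGER)")
cur.executemany("INSERT INTO students VALUES (?, ?)",
                [(name, int(marks)) for name, marks in records])

# A query over the structured data: names of students scoring above 80
cur.execute("SELECT name FROM students WHERE marks > 80")
result = cur.fetchall()
print(result)    # [('Asha',), ('Ravi',)]
conn.close()
```

Notice that the CSV file only stores the values, while SQL lets us ask questions of the data without rewriting it.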

22. Data Access: After collecting the data, to be able to
use it for programming purposes, we should know how
to access the same in a Python code.
To make our lives easier, there exist various Python
packages which help us in accessing structured data (in
tabular form) inside the code.
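As a minimal sketch of such access, Pandas can read a tabular dataset straight into the code; the data and column names here are made up for illustration (in practice they would come from a real file).

```python
import io
import pandas as pd

# A small tabular dataset; in practice this would be a CSV file on disk
data = io.StringIO("name,marks\nAsha,91\nRavi,84\n")

# pandas reads the structured (tabular) data into a DataFrame
df = pd.read_csv(data)
print(df.shape)           # (2, 2): 2 records, 2 fields
print(df["marks"].max())  # 91
```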
# NumPy - which stands for Numerical Python, is the
fundamental package for mathematical and logical
operations on arrays in Python. It is a commonly used
package when it comes to working with numbers.
NumPy offers a wide range of arithmetic operations
on numbers, giving us an easier approach to working
with them. NumPy also works with arrays, which are
homogeneous collections of data.
# An array - is nothing but a set of multiple values which
are of the same datatype. They can be numbers, characters,
booleans, etc., but an array can hold only one datatype
at a time.
In NumPy, the arrays used are known as ND-arrays (N-
Dimensional Arrays) as NumPy comes with a feature of
creating n-dimensional arrays in Python.
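A small sketch of these ideas, with made-up numbers: element-wise arithmetic on a 1-D array, and the shape of a 2-D ND-array.

```python
import numpy as np

# A 1-dimensional NumPy array: a homogeneous collection of numbers
marks = np.array([91, 84, 77, 69])

# Arithmetic operations apply element-wise to the whole array
print(marks + 5)      # [96 89 82 74]
print(marks.mean())   # 80.25

# ND-arrays: here a 2-dimensional array with 2 rows and 2 columns
table = np.array([[1, 2], [3, 4]])
print(table.shape)    # (2, 2)
```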
# Pandas - is a software library written for the Python
programming language for data manipulation and
analysis. In particular, it offers data structures and
operations for manipulating numerical tables and time
series. The name is derived from the term "panel data",
an econometrics term for data sets that include
observations over multiple time periods for the same
individuals.
Pandas is well suited for many different kinds of data:
• Tabular data with heterogeneously-typed columns, as
in an SQL table or Excel spreadsheet
• Ordered and unordered (not necessarily fixed-
frequency) time series data.
• Arbitrary matrix data (homogeneously typed or
heterogeneous) with row and column labels
• Any other form of observational / statistical data sets.
The data actually need not be labelled at all to be placed
into a Pandas data structure.
23. The two primary data structures of Pandas,
Series (1-dimensional) and DataFrame (2-dimensional),
handle the vast majority of typical use cases in finance,
statistics, social science, and many areas of engineering.
Pandas is built on top of NumPy and is intended to
integrate well within a scientific computing environment
with many other 3rd party libraries.
24. A few of the things that Pandas does well:
• Easy handling of missing data (represented as NaN) in
floating point as well as non-floating point data
• Size mutability: columns can be inserted and deleted
from DataFrame and higher dimensional objects
• Automatic and explicit data alignment: objects can be
explicitly aligned to a set of labels, or the user can simply
ignore the labels and let Series, DataFrame, etc.
automatically align the data for you in computations
• Intelligent label-based slicing, fancy indexing, and
subsetting of large data sets
• Intuitive merging and joining data sets
• Flexible reshaping and pivoting of data sets
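The two data structures and the NaN handling described above can be sketched as follows; the names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Series: a 1-dimensional labelled array
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s["b"])               # 20

# DataFrame: a 2-dimensional table; columns may have different types
df = pd.DataFrame({"name": ["Asha", "Ravi", "Meena"],
                   "marks": [91, np.nan, 77]})

# Missing data is represented as NaN and handled gracefully:
# statistics skip it, and dropna() removes incomplete records
print(df["marks"].mean())   # 84.0 (the NaN is skipped)
print(len(df.dropna()))     # 2 rows without missing values
```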
25. Matplotlib - is an amazing visualization library in
Python for 2D plots of arrays.
Matplotlib is a multiplatform data visualization library
built on NumPy arrays.
One of the greatest benefits of visualization is that it
gives us visual access to huge amounts of data in easily
digestible form.
Matplotlib comes with a wide variety of plots. Plots
help us understand trends and patterns, and to find
correlations.
They are typically instruments for reasoning about
quantitative information.
Some types of graphs that we can make with this
package include line plots, bar charts, scatter plots,
histograms and pie charts. Not just plotting, you can
also modify your plots the way you wish.
You can stylise them and make them more descriptive
and communicable.
These packages help us in accessing the datasets we
have and also in exploring them to develop a better
understanding of them.
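A minimal sketch of a styled Matplotlib plot built on NumPy arrays; the data and labels are made up for illustration, and the Agg backend is used so no display window is needed.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file, without a display window
import matplotlib.pyplot as plt
import numpy as np

# Data as NumPy arrays
x = np.arange(1, 6)   # 1, 2, 3, 4, 5
y = x ** 2            # 1, 4, 9, 16, 25

# A line plot, stylised with markers, labels, a title and a legend
fig, ax = plt.subplots()
ax.plot(x, y, marker="o", label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A simple Matplotlib line plot")
ax.legend()
fig.savefig("plot.png")
```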
