Data science
1. Artificial Intelligence is a technology that depends completely on data. It is the data fed into the machine that makes it intelligent.
2. Depending on the type of data, AI can be classified into three broad domains:
• DATA SCIENCE: works with numeric and alphanumeric data
• COMPUTER VISION: works with image and visual data
• NATURAL LANGUAGE PROCESSING: works with textual and speech-based data
3. Each domain has its own type of data that is fed into the machine, and hence its own way of working with it.
4. Data Science is a concept that unifies statistics, data analysis, machine learning and their related methods in order to understand and analyse actual phenomena with data.
5. It employs techniques and theories drawn from many fields within the context of Mathematics, Statistics, Computer Science, and Information Science.
6. Applications of Data Science: Data Science is not a new field. It mainly revolves around analysing data, and when it comes to AI, that analysis helps make the machine intelligent enough to perform tasks by itself. Data Science has various applications in today’s world. Some of them are:
7. Fraud and Risk Detection: The earliest applications of data science were in finance. Companies were fed up with bad debts and losses every year. However, they had a lot of data that was collected during the initial paperwork while sanctioning loans. They decided to bring in data scientists to rescue them from losses. Over the years, banking companies learned to divide and conquer data via customer profiling, past expenditures, and other essential variables to analyse the probabilities of risk and default. It also helped them push their banking products based on a customer’s purchasing power.
8. Genetics & Genomics: Data Science applications also enable an advanced level of treatment personalization through research in genetics and genomics.
The goal is to understand the impact of DNA on our health and to find individual biological connections between genetics, diseases, and drug response. Data science techniques allow integration of different kinds of data with genomic data in disease research, which provides a deeper understanding of genetic issues in reactions to particular drugs and diseases. As soon as we acquire reliable personal genome data, we will achieve a deeper understanding of human DNA. Advanced genetic risk prediction will be a major step towards more individual care.
9. Internet Search: When we talk about search engines, we think ‘Google’. But there are many other search engines like Yahoo, Bing, Ask, AOL, and so on. All these search engines (including Google) use data science algorithms to deliver the best result for our search query in a fraction of a second. Considering that Google processes more than 20 petabytes of data every day, had there been no data science, Google wouldn’t have been the ‘Google’ we know today.
10. Targeted Advertising: If you thought Search was the biggest of all data science applications, here is a challenger: the entire digital marketing spectrum. From the display banners on various websites to the digital billboards at airports, almost all of them are decided using data science algorithms. This is why digital ads have been able to get a much higher CTR (Click-Through Rate) than traditional advertisements. They can be targeted based on a user’s past behaviour.
11. Website Recommendations: Aren’t we all used to the suggestions about similar products on Amazon? They not only help us find relevant products among the billions available, but also add a lot to the user experience. Many companies have enthusiastically used this engine to promote their products in accordance with the user’s interests and the relevance of information.
Internet giants like Amazon, Twitter, Google Play, Netflix, LinkedIn, IMDB and many more use this system to improve the user experience. The recommendations are based on a user’s previous search results.
12. Airline Route Planning: The airline industry across the world is known to bear heavy losses. Except for a few airline service providers, companies are struggling to maintain their occupancy ratio and operating profits. With the steep rise in air-fuel prices and the need to offer heavy discounts to customers, the situation has got worse. It wasn’t long before airline companies started using Data Science to identify strategic areas for improvement. Now, using Data Science, airline companies can:
• Predict flight delays
• Decide which class of airplanes to buy
• Decide whether to land directly at the destination or take a halt in between (for example, a flight can have a direct route from New Delhi to New York; alternatively, it can halt in another country)
• Effectively drive customer loyalty programs
13. Data Science is a combination of Python and mathematical concepts like Statistics, Data Analysis, Probability, etc. Concepts of Data Science can be used in developing applications around AI, as they give a strong base for data analysis in Python.
14. Data Collection: Data collection is nothing new in our lives. It has been part of our society for ages. Even when people did not have fair knowledge of calculations, records were still maintained in some way or other to keep an account of relevant things. Data collection is an exercise which does not require even a tiny bit of technological knowledge. But when it comes to analysing the data, it becomes a tedious process for humans, as it is all about numbers and alphanumeric data.
15. That is where Data Science comes into the picture. It not only gives us a clearer idea of the dataset, but also adds value to it by providing deeper and clearer analyses.
As AI gets incorporated into the process, predictions and suggestions by the machine also become possible.
16. For data domain-based projects, the data used is mostly in numerical or alphanumeric format, and such datasets are curated in the form of tables.
17. Such databases are very commonly found in any institution for record maintenance and other purposes. Some examples of datasets which you must already be aware of are:
• Banks - Databases of loans issued, account holders, locker owners, employee registrations, bank visitors, etc.
• ATMs - Usage details per day, cash denominations transaction details, visitor details, etc.
• Movie theatres - Movie details, tickets sold offline, tickets sold online, refreshment purchases, etc.
18. Sources of Data: There are various sources from which we can collect any type of data required, and the data collection process can be categorised in two ways: Offline and Online.
19. While accessing data from any data source, the following points should be kept in mind:
   1. Only data which is available for public usage should be taken up.
   2. Personal datasets should only be used with the consent of the owner.
   3. One should never breach someone’s privacy to collect data.
   4. Data should only be taken from reliable sources, as data collected from random sources can be wrong or unusable.
   5. Reliable sources of data ensure the authenticity of the data, which helps in proper training of the AI model.
20. Types of Data: For Data Science, the data is usually collected in the form of tables. These tabular datasets can be stored in different formats.
21. Some of the commonly used formats are:
   1. CSV: CSV stands for comma-separated values. It is a simple file format used to store tabular data. Each line of the file is a data record, and each record consists of one or more fields separated by commas. Because the values are separated by commas, these files are known as CSV files.
   2. Spreadsheet: A spreadsheet is a piece of paper or a computer program used for accounting and recording data in rows and columns into which information can be entered. Microsoft Excel is a program which helps in creating spreadsheets.
   3. SQL: SQL, or Structured Query Language, is a domain-specific programming language designed for managing data held in different kinds of DBMS (Database Management Systems). It is particularly useful in handling structured data.
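To make the CSV and SQL formats concrete, here is a minimal sketch using only Python's standard library. The movie data, the table name `movies` and the helper names `read_csv_records` and `load_into_sql` are made up for illustration; SQLite is used only because it ships with Python, not because the notes prescribe any particular DBMS.

```python
import csv
import io
import sqlite3

# Hypothetical sample data: a tiny CSV "file" held in memory for illustration.
# The first line is the header; every other line is one data record.
CSV_TEXT = """name,tickets_sold,price
Movie A,120,150.0
Movie B,80,200.0
Movie C,45,180.0
"""


def read_csv_records(text):
    """Parse CSV text into a list of dicts, one dict per data record."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)


def load_into_sql(records):
    """Store the records in an in-memory SQLite table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE movies (name TEXT, tickets_sold INTEGER, price REAL)"
    )
    conn.executemany(
        "INSERT INTO movies VALUES (?, ?, ?)",
        [(r["name"], int(r["tickets_sold"]), float(r["price"])) for r in records],
    )
    return conn


records = read_csv_records(CSV_TEXT)
conn = load_into_sql(records)

# A SQL query over the structured data: revenue per movie, highest first.
rows = conn.execute(
    "SELECT name, tickets_sold * price AS revenue "
    "FROM movies ORDER BY revenue DESC"
).fetchall()

for name, revenue in rows:
    print(name, revenue)
```

Note that the CSV reader returns every field as text, which is why the sketch converts the numeric fields before inserting them into the table.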
22. Data Access: After collecting the data, to be able to use it for programming purposes, we should know how to access it in Python code. To make our lives easier, there are various Python packages which help us access structured data (in tabular form) inside the code.
# NumPy - which stands for Numerical Python, is the fundamental package for mathematical and logical operations on arrays in Python. It is commonly used when working with numbers. NumPy provides a wide range of arithmetic operations, giving us an easier approach to working with numbers. NumPy also works with arrays, which are homogeneous collections of data.
# An array - is a set of multiple values of the same datatype. The values can be numbers, characters, booleans, etc., but a single array can hold only one datatype. In NumPy, the arrays used are known as ND-arrays (N-Dimensional Arrays), as NumPy can create n-dimensional arrays in Python.
# Pandas - is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. The name is derived from the term "panel data", an econometrics term for data sets that include observations over multiple time periods for the same individuals. Pandas is well suited for many different kinds of data:
• Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
• Ordered and unordered (not necessarily fixed-frequency) time series data
• Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
• Any other form of observational / statistical data sets. The data need not be labelled at all to be placed into a Pandas data structure.
23.
The two primary data structures of Pandas, Series (1-dimensional) and DataFrame (2-dimensional), handle the vast majority of typical use cases in finance, statistics, social science, and many areas of engineering. Pandas is built on top of NumPy and is intended to integrate well within a scientific computing environment with many other third-party libraries.
24. Here are a few of the things that pandas does well:
• Easy handling of missing data (represented as NaN) in floating point as well as non-floating point data
• Size mutability: columns can be inserted and deleted from DataFrame and higher dimensional objects
• Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let Series, DataFrame, etc. automatically align the data in computations
• Intelligent label-based slicing, fancy indexing, and subsetting of large data sets
• Intuitive merging and joining of data sets
• Flexible reshaping and pivoting of data sets
25. Matplotlib - is a visualization library in Python for 2D plots of arrays. It is a multiplatform data visualization library built on NumPy arrays. One of the greatest benefits of visualization is that it gives us visual access to huge amounts of data in an easily digestible form. Matplotlib comes with a wide variety of plots. Plots help us understand trends and patterns and make correlations. They are typically instruments for reasoning about quantitative information. Beyond plotting, you can also modify your plots the way you wish: you can stylise them and make them more descriptive and communicable.
These packages help us in accessing the datasets we have and also in exploring them to develop a better understanding of them.
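The three packages described above can be tried together in a short sketch. It assumes NumPy, pandas and Matplotlib are installed; the movie names, column names and values are invented for illustration, and the chart is written to an in-memory PNG only so that the example runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# NumPy: a homogeneous n-dimensional array (here 2-D, all integers).
scores = np.array([[70, 85, 90], [60, 75, 80]])
print(scores.ndim, scores.shape)
row_means = scores.mean(axis=1)  # arithmetic mean of each row

# Pandas Series (1-dimensional) and DataFrame (2-dimensional).
# The labels and values below are made up for illustration.
tickets = pd.Series([120, 80, 45], index=["Movie A", "Movie B", "Movie C"])
df = pd.DataFrame(
    {
        "movie": ["Movie A", "Movie B", "Movie C"],
        "tickets_sold": [120, 80, 45],
        "rating": [4.5, np.nan, 3.8],  # one missing value, stored as NaN
    }
)

# Missing data handling: count the NaNs, then fill them with the column mean.
missing_count = int(df["rating"].isna().sum())
df["rating_filled"] = df["rating"].fillna(df["rating"].mean())

# Size mutability: insert a new column, then delete an existing one.
df["popular"] = df["tickets_sold"] > 50
df = df.drop(columns=["rating"])

# Label-based slicing on the Series (both end labels are included).
top_two = tickets.loc["Movie A":"Movie B"]

# Matplotlib: a simple bar chart of the DataFrame, saved to an in-memory PNG.
fig, ax = plt.subplots()
ax.bar(df["movie"], df["tickets_sold"])
ax.set_xlabel("Movie")
ax.set_ylabel("Tickets sold")
ax.set_title("Tickets sold per movie")
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

In an interactive session you would typically call `plt.show()` instead of saving to a buffer; the labels and title set on the axes are a small example of making a plot more descriptive.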