Enterprise Data Science: Smarter Decisions with Big Data
By Vidhur Gupta
About this ebook
Enterprise Data Science: Smarter Decisions with Big Data offers a comprehensive guide to leveraging data science for actionable insights in enterprises. We explore the core principles and contemporary approaches to handling large volumes of data, emphasizing the entire data lifecycle. The book compares data science to business intelligence, highlighting their different methodologies and applications.
We delve into the emerging trends in data science, showcasing how various organizations are adapting to these technologies. Topics include the integration of artificial intelligence, practical implementation of data science, and the use of modern tools like the Hadoop system. Each chapter is thoroughly revised and updated, featuring eye-catching diagrams, charts, and tables for better understanding.
Designed for accessibility, this book caters to both beginners and experienced data scientists, providing a user-friendly layout and practical insights into the evolving field of data science.
Book preview
Enterprise Data Science
Smarter Decisions with Big Data
Vidhur Gupta
ISBN - 9789361527357
COPYRIGHT © 2025 by Educohack Press. All rights reserved.
This work is protected by copyright, and all rights are reserved by the Publisher. This includes, but is not limited to, the rights to translate, reprint, reproduce, broadcast, electronically store or retrieve, and adapt the work using any methodology, whether currently known or developed in the future.
The use of general descriptive names, registered names, trademarks, service marks, or similar designations in this publication does not imply that such terms are exempt from applicable protective laws and regulations or that they are available for unrestricted use.
The Publisher, authors, and editors have taken great care to ensure the accuracy and reliability of the information presented in this publication at the time of its release. However, no explicit or implied guarantees are provided regarding the accuracy, completeness, or suitability of the content for any particular purpose.
If you identify any errors or omissions, please notify us promptly at "[email protected] & [email protected]". We deeply value your feedback and will take appropriate corrective actions.
The Publisher remains neutral concerning jurisdictional claims in published maps and institutional affiliations.
Published by Educohack Press, House No. 537, Delhi- 110042, INDIA
Email: [email protected] & [email protected]
Cover design by Team EDUCOHACK
Preface
This book is written to keep pace with newer developments in data science for enterprises and to cater to the contemporary needs of users. With the advent of the Internet, and later mobile devices and IoT, it became possible for private companies to truly use data at scale, building massive stores of consumer data based on the growing number of touchpoints they share with their customers. The world is firmly in the age of big data. As a result, enterprises are scrambling to integrate advanced analytics capabilities, such as artificial intelligence and machine learning, to best leverage their data.
The need to draw out insights to improve business performance in the marketplace is nothing less than mandatory. As a result, recent data management concepts such as the data lake have emerged to help enterprises store and manage data. In many ways, the data lake was a stark contrast to its forerunner, the enterprise data warehouse. Typically, the EDW accepted data that had already been deemed useful, and its content was organized in a highly systematic way. When misused, a data lake serves as nothing more than a hoarding ground for terabytes and petabytes of unstructured and unprocessed data. Much of it is never to be used. However, a data lake can be meaningfully leveraged to benefit advanced analytics and machine learning models.
Analysis reveals that the high failure rate of data lakes and big data initiatives is attributable not to the technology itself but to how technologists have applied it. For example, it often happens that a department within an organization needs a repository for its data, but its requirements are not satisfied by previous data storage efforts. So instead of attempting to reform or update older data warehouses or lakes, the department creates a new data store. The result is an assortment of data storage solutions that don't always play well together, which leads to lost opportunities for data analysis.
New technologies can provide many tangible benefits, but those benefits cannot be realized unless the technologies are deployed and managed with care. Unlike designing a building in traditional architecture, information architecture is not a set-it-and-forget-it prospect. While an organization can control how data is ingested, it can't always control how the data it needs changes over time. Organizations tend to be fragile in that they can break when circumstances change. Only flexible, adaptive information architectures can adjust to new environmental conditions. Designing and deploying solutions against a moving target is difficult, but the challenge is not impossible.
The glib assertion that garbage in equals garbage out is treated as passé by many IT professionals. In truth, garbage data has plagued analytics and decision-making for decades, and mismanaged data and inconsistent representations will remain a red flag for each AI project you undertake. The level of data quality demanded by machine learning and deep learning can be significant. Like a coin with two sides, low data quality can have two separate and equally devastating impacts. On the one hand, low-quality historical data can distort the training of a predictive model. On the other, low-quality new data can distort the model's output and negatively impact decision-making. As a sharable resource, data is exposed across your organization through layers of services that can behave like a virus when the level of data quality is poor, affecting all those who touch the data. Therefore, information architecture for artificial intelligence must mitigate traditional issues associated with data quality, foster data movement, and, when necessary, provide isolation.
The purpose of this book is to provide you with an understanding of how the enterprise must approach the work of building an information architecture to make way for successful, sustainable, and scalable AI deployments. The book includes a structured framework and advice that is practical and actionable toward implementing an information architecture that's equipped to capitalize on the benefits of AI technologies.
Key Features of This Book are as Follows:
● Thorough Updating: All chapters and topics have been thoroughly revised and updated, and most of the newer information has been woven into the existing text. In doing so, the established style of the book has not been changed: it remains simple, easy to understand, and faithful to the subject matter, with an emphasis on clarity and accuracy.
● More and new figures/tables: This book contains several newer figures and tables. All figures, with proper illustrations, have been placed alongside the corresponding discussion, enhancing the understanding of the subject for beginners in data science.
● Summary and Inquiries: A short summary has been placed at the end of every chapter, followed by inquiries (review questions) for self-assessment. Together they let the student revise the entire subject quickly without searching through the chapters, making the book truly user-friendly.
What You'll Learn
We'll begin in Chapter 1, Data Science, with a discussion of data science and an illustration of various algorithms. Chapter 2, Stepping into AI, continues with a discussion of the AI Ladder, an illustrative device developed by IBM to demonstrate the steps, or rungs, an organization must climb to realize sustainable benefits from AI. From there, Chapter 3, Forming Organizations Using AI, and Chapter 4, Working with Data and AI, cover an array of considerations that data scientists and IT leaders must be aware of as they make their way up the ladder. In Chapter 5, Smarter Learning Software, and Chapter 6, Looking Forward to Analytics, we'll explore some recent history: data warehouses and how they've given way to data lakes. Next, we'll discuss how data lakes must be designed in terms of topography and topology. This will flow into a deeper dive into data ingestion, governance, storage, processing, access, management, and monitoring.
In Chapter 7, Optimizing Disciplines on AI Ladder, we'll discuss how DevOps, DataOps, and MLOps can enable an organization to better use its data in real time. In Chapter 8, Value Edition and Maximizing the use of data, we'll delve into the elements of data governance and integrated data management. We'll cover the data value chain and the need for data to be accessible and discoverable so that the data scientist can determine its value. Chapter 9, Statistical analysis for valuing data, introduces different approaches to data access, as different roles within the organization will need to interact with data in different ways. The chapter also furthers the discussion of data valuation, explaining how statistics can assist in ranking the value of data.
In Chapter 11, Extend the value through data AI, we'll discuss some things that can go wrong in information architecture and the importance of data literacy across the organization in preventing such issues. Chapter 12, An IA for AI, will bring everything together with a detailed overview of developing an information architecture for artificial intelligence (IA for AI). This chapter provides practical, actionable steps to bring the preceding theoretical backdrop to bear on real-world information architecture development. Finally, Chapter 13, Modernization in data science, presents case studies and practical industrial applications.
Content
01. Data Science
Abstract
1.1 Analyzing the Data Science
1.2 Lifecycle of Data Science
1.3 Tools For Data Science
1.4 Types Of Data Science Work
1.5 Components of Data Science
1.6 Machine Learning in Data Science
1.7 Data science and IBM Cloud
1.8 Application of Data Science
1.9 Summary
1.10 Inquiries
02. Stepping into AI
Abstract
2.1 Building base data for AI
2.3 Choosing the Ladder rung by rung
2.4 Adapting to Retain Organizational
2.5 Data-Based in Modern Business
2.6 Developing AI-centric organization
2.7 Summary
2.8 Inquiries
03. Data Science Organization Using AI
Abstract
3.1 Artificial Intelligence cooperating with
3.2 Decision making in AI
3.3 Standardizing data and data science
3.4 Data science for the enterprise
3.5 Facilitating data in a reaction time
3.6 Summary
3.7 Inquiries
04. Working With Data And AI
Abstract
4.1 User-friendly data
4.2 Data Governance
4.3 Encapsulation Knowledge
4.4 Summary
4.5 Inquiries
05. Smarter Learning Software
Abstract
5.1 Preaching big data imaginary
5.2 Powerful data and algorithms
5.3 New normal is big data
5.4 Data Management for AI
5.5 Summary
5.6 Inquiries
06. Looking Forward to Analytics
Abstract
6.1 Need for Organization
6.1.2 The raw zone
6.2 Data Topologies
6.3 Exploring Various Zones
6.4 Summary
6.5 Inquiries
07. Optimizing Disciplines on AI Ladder
Abstract
7.1 Operational AI
7.2 Time Passage
7.3 Create
7.4 Execute
7.5 Operating the work
7.6 Business-driven tools for Software
7.7 Summary
7.8 Inquiries
08. Value Edition and Maximizing the Use of Data
Abstract
8.1 Marching Towards Value Chain
8.2 Curation
8.3 Socializing the Data
8.4 Integrated Data Management
8.5 Multi-Tenacy
8.6 Summary
8.7 Inquiries
09. Statistical Analysis For Valuing Data
Abstract
9.1 Data Management Through Asset
9.2 Inexact Science
9.3 Data Inequality Among Users
9.4 Accessing the Data in Control
9.5 Bottom-Up Approach
9.6 Various Industries use Data and AI
9.7 Benefits from Statistics
9.9 Summary
9.10 Inquiries
10. Long Term Availability
Abstract
10.1 Avoid Hard Coding
10.2 Overloading
10.3 Locked In
10.4 Ownership and Decomposition
10.5 Avoiding Changing in Design
10.6 Summary
10.7 Inquiries
11. Extending Value Data Through AI
11.1 Emphasizing the AI
11.2 Polyglot Persistence
11.3 Profit in Data Literacy
11.4 Skill Sets
11.5 Pursuing AI
11.6 Creating Metadata
11.7 Right Movement to Data
11.8 Summary
11.9 Inquiries
12. An IA for AI
Abstract
12.1 Development Effort for AI
12.2 Machine Learning Model
12.3 Data Drift
12.4 Essential elements
12.6 Intersections
12.7 Interoperability Across Element
12.8 Driving Action
12.9 Keep It Simple
12.10 Organizing Data zones
12.11 Possibilities of Open Platforms
12.12 Summary
12.13 Inquiries
13. Data Governance for Creating Trust in Data Science Decision Outcomes
Abstract
13.1 Transformation of business
13.2 Data Science Decision-Making Outcomes
13.2 The Role of Data Governance with Regards to Data Science as a Product of Human Agency
13.3 The Role of Data Governance with Regards
13.4 The Role of Data Governance with Regards
13.5 The Role of Data Governance with Regards
13.7 Summary
13.8 Inquiries
14. Big Data Analytics Creates Business Value in Smart Manufacturing
Abstract
14.1 Cyber-Physical System
14.2 Big Data Analytics in Smart Manufacturing
14.3 Business value of IT frameworks
14.4 Science and Technology in Industry
14.5 Computing Devices and Internet
14.6 Context-aware Mobile computing
14.8 Mobile systems and services
14.7 Summary
14.8 Inquiries
15. Modernization of Data Science in AI
Abstract
15.1 Case Study
15.2 Biomedical Engineer’s Station
15.3 AI in Media Platforms
15.4 IBM Commercial Process
15.5 Hadoop Ecosystem
15.6 Image and Speech Recognition
15.7 Investing and Financing
15.8 Manufacturers using IoT
15.9 Telephonic Communication
15.10 Summary
15.11 Inquiries
Glossary
Index
Chapter 1. Data Science
Abstract
Data science is a multidisciplinary approach to extracting actionable insights from the large and increasing volumes of data collected and created by today’s organizations. Data preparation can involve cleansing, aggregating, and manipulating data so that it is ready for specific types of processing. Analysis, in turn, requires the development and use of algorithms, analytics, and AI models. Thus, data science encompasses preparing data for analysis and processing, performing advanced data analysis, and presenting the results to reveal patterns and enable stakeholders to draw informed conclusions.
1.1 Analyzing the Data Science
Data science is driven by software that combs through data to find patterns and transforms those patterns into predictions that support business decision-making. The accuracy of these predictions must be validated through scientifically designed tests and experiments. The results should then be shared through skillful use of data visualization tools that make it possible for anyone to see the patterns and understand the trends. Data science therefore requires both computer science and pure science skills. A data scientist must know mathematical modeling, statistics, and the scientific method.
1.2 Lifecycle of Data Science
The data science lifecycle, also called the data science pipeline, includes anywhere from five to sixteen (depending on whom you ask) overlapping, continuing processes. The processes common to just about everyone’s definition of the lifecycle run from the first step of obtaining data through to analysis and presentation of results. The five stages described below are central to the data science lifecycle.
Fig 1.1: Lifecycle of data science
1.2.1 Gathering Data
Gathering information from data sources requires certain skills, including technical skill in different programming languages. Social media sites such as Facebook and Twitter let users access data by connecting to their web servers. The most convenient way of gathering data, however, is to read it from files. Datasets from Kaggle, or other existing information stored in Tab Separated Values (TSV) or Comma Separated Values (CSV) format, can be downloaded as files. Since these are flat text files, a parser for the specific format is needed to read them.
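As a minimal sketch of the parsing step described above, the following Python example reads CSV and TSV text with the standard-library `csv` module. The sample data, column names, and the `read_delimited` helper are hypothetical and stand in for a downloaded file; they are not taken from any particular dataset.

```python
import csv
import io

# Hypothetical sample data standing in for downloaded CSV and TSV files.
CSV_TEXT = "customer_id,country,spend\n1,IN,120.5\n2,US,80.0\n"
TSV_TEXT = "customer_id\tcountry\tspend\n3\tDE\t42.0\n"

def read_delimited(text, delimiter):
    """Parse delimited text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)

csv_rows = read_delimited(CSV_TEXT, ",")    # comma-separated values
tsv_rows = read_delimited(TSV_TEXT, "\t")   # tab-separated values
all_rows = csv_rows + tsv_rows

print(len(all_rows), all_rows[0]["country"])
```

For a file on disk, the same reader can be given a file object opened with `open(path, newline="")` instead of the in-memory string. Note that every parsed value is still a string at this point; converting types belongs to the cleaning stage.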
1.2.2 Cleaning Data
The next step is to clean the data, which refers to scrubbing and filtering it. This procedure often requires converting the data into a different format, which is necessary for processing and analyzing the information. If the files are web-locked, then the lines of these files also need to be filtered. Cleaning data also involves extracting and replacing values. Where values are missing, the replacement must be done carefully, since missing entries can look like non-values. Additionally, columns may be split, merged, or removed.
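The cleaning operations mentioned above (replacing missing values and splitting columns) can be sketched as follows. This is an illustration under stated assumptions: the records, the set of markers treated as missing, and the choice of the median as the replacement value are all hypothetical.

```python
import statistics

# Hypothetical raw records; missing values appear as "" or "NA".
raw_rows = [
    {"name": "Asha Rao", "age": "34", "city": "Delhi"},
    {"name": "Ben Lee", "age": "NA", "city": ""},
    {"name": "Cara Diaz", "age": "28", "city": "Pune"},
]
MISSING = {"", "NA", None}  # assumed markers for a missing value

def clean(rows):
    """Fill missing values and split the name column into two columns."""
    known_ages = [int(r["age"]) for r in rows if r["age"] not in MISSING]
    fill_age = statistics.median(known_ages)  # assumed replacement strategy
    cleaned = []
    for r in rows:
        first, last = r["name"].split(" ", 1)  # split one column into two
        cleaned.append({
            "first_name": first,
            "last_name": last,
            "age": int(r["age"]) if r["age"] not in MISSING else fill_age,
            "city": r["city"] if r["city"] not in MISSING else "unknown",
        })
    return cleaned

cleaned_rows = clean(raw_rows)
print(cleaned_rows[1])
```

Whether the median, the mean, or a separate "unknown" category is the right replacement depends on the data and on the model that will consume it.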
1.2.3 Exploring data
Data has to be examined before it is ready for use. In business settings, the data scientist has to transform the available data into a form that is useful to the organization, and the first step is to explore it. Different kinds of data require different inspection: nominal and ordinal data, and numerical and categorical data.
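A small sketch of how the kinds of data named above might be inspected differently: counts for nominal data, a rank-based summary for ordinal data, and range and mean for numerical data. The column names, values, and the ranking of the ordinal levels are hypothetical.

```python
from collections import Counter
import statistics

# Hypothetical columns of three different kinds.
segment = ["retail", "retail", "wholesale", "retail"]   # nominal (categorical)
satisfaction = ["low", "high", "medium", "high"]        # ordinal (categorical)
order_value = [120.0, 80.0, 310.0, 95.0]                # numerical

ORDER = ["low", "medium", "high"]  # assumed ranking of the ordinal levels

# Nominal data: only frequency counts are meaningful.
segment_counts = Counter(segment)

# Ordinal data: levels can be ranked, so a median level is meaningful.
# With an even number of values this takes the upper of the two middle ranks.
ranks = sorted(ORDER.index(level) for level in satisfaction)
median_satisfaction = ORDER[ranks[len(ranks) // 2]]

# Numerical data: arithmetic summaries are meaningful.
value_summary = {
    "min": min(order_value),
    "max": max(order_value),
    "mean": statistics.mean(order_value),
}

print(segment_counts, median_satisfaction, value_summary)
```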
1.2.4 Modeling Data
Modeling deals with a few distinct tasks. Models can be trained to differentiate items via classification, for example sorting received emails into ‘Primary’ and ‘Promotion’ through logistic regression. Forecasting is possible through the use of linear regression. Data can also be grouped in order to understand the logic behind the groups. For instance, e-commerce customers can be grouped to understand their behavior on a particular e-commerce site. This is made possible by clustering algorithms such as hierarchical clustering or K-Means.
In short, classification and regression are the two main devices for identifying categories and forecasting values, while clustering is used to form groups.
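To make the clustering task concrete, here is a minimal pure-Python sketch of K-Means applied to the e-commerce grouping example. The customer data and the starting centroids are hypothetical, and the fixed iteration count is a simplification; production code would typically use a library implementation with a convergence check.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain K-Means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]  # keep an empty cluster's centroid
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (visits per month, average basket value)
customers = [(1, 20), (2, 25), (1, 22), (9, 200), (10, 210), (11, 190)]
centroids, clusters = kmeans(customers, centroids=[(1, 20), (9, 200)])

print(centroids)
print(clusters)
```

With these well-separated points, the occasional low-spend visitors and the frequent high-spend visitors end up in separate clusters.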
1.2.5 Interpreting Data
Interpreting data is the last phase of the data science lifecycle and one of its most important steps. The ability to generalize is the crux of the power of any predictive model: a model's value depends on its capacity to perform well on future data, which is unseen and uncertain.
1.3 Tools For Data Science
To create a model, data scientists must be able to create, build and run code. The most popular programming languages among data scientists are open source tools that include or support prebuilt statistical, machine learning, and graphic capabilities.
1.3.1 R
An open-source programming language and environment for statistical computing and graphics, R is one of the most popular programming languages among data scientists. R provides a wide variety of libraries and tools for cleansing and prepping data, creating visualizations, and training and evaluating machine learning and deep learning algorithms. It’s also widely used among data science scholars and researchers.
1.3.2 Python
Python is a general-purpose, object-oriented, high-level programming language that emphasizes code readability through its distinctive, generous use of whitespace. Several Python libraries support data science tasks, including NumPy for handling large multidimensional arrays, Pandas for data manipulation and analysis, and Matplotlib for building data visualizations.
1.4 Types Of Data Science Work
Data science is creating paths to many job roles. Because demand for data science is so high, these roles have become differentiated by function.
1.4.1 Data Analyst
A data analyst mines large amounts of data, looks for patterns, models the data, and checks relationships and trends. The analyst then produces visualizations and reports that support problem-solving and decision-making.
To become an analyst, one has to know mathematics, business modeling, and the basics of statistics. In addition, one should be familiar with the concepts and tools of programming languages.
1.4.2 Data Engineer
A data engineer is generally an IT worker whose primary job is to prepare data for different analytical and operational users. Data engineers build pipelines that connect different source systems. The amount of data an engineer works with varies with the organization, particularly with its size. The data engineer's work is to provide transparent relationships between data sources, enabling the business to make trustworthy decisions.
The bigger the company, the more complex the analytics architecture it requires.
1.4.3 Data Scientist
Data scientists are specialists who build models to make predictions and answer key business questions, applying statistics and building machine learning models. Data scientists have more depth and expertise in these skills and will also train and optimize machine learning models. Thus, they tackle problems with extensive knowledge and experience of advanced statistics and algorithms.
1.5 Components of Data Science
Fig 1.2: Data science components
1.5.1 Statistics
Statistics is an essential component of data science. It is the method of collecting and analyzing numerical data in large amounts to gain useful and meaningful insight. There are two main categories of statistics:
Descriptive statistics:
Descriptive statistics helps to organize data and focuses only on the characteristics of the data at hand, providing parameters that summarize it.
Inferential statistics:
Inferential statistics generalizes from a sample to a larger data set and applies probability before drawing a conclusion. It allows the parameters of the population to be inferred from sample statistics and a model to be built on them.
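The difference between the two categories can be sketched with Python's standard library. The sample values are hypothetical, and the interval uses a normal approximation (z = 1.96) as a simplifying assumption; for a sample this small, a t-based interval would usually be more appropriate.

```python
import math
import statistics

# Hypothetical sample of daily order counts drawn from a larger population.
sample = [12, 15, 11, 14, 13, 16, 10, 13]

# Descriptive statistics: summarize the sample itself.
mean = statistics.mean(sample)
stdev = statistics.stdev(sample)  # sample standard deviation

# Inferential statistics: estimate the population mean from the sample.
# Assumption: normal approximation with z = 1.96 for a ~95% interval.
standard_error = stdev / math.sqrt(len(sample))
ci_low = mean - 1.96 * standard_error
ci_high = mean + 1.96 * standard_error

print(f"mean={mean}, stdev={stdev:.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```

The mean and standard deviation describe only the eight observed values, while the interval is a probabilistic statement about the population they were drawn from.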
1.5.2 Visualization
Visualization means representing data in visual forms such as maps and graphs so that people can understand it easily. It makes a vast amount of data easy to access. The main goal of data visualization is to make it easier to identify patterns, trends, and outliers in large data sets. Its main benefit is that viewers can understand the information quickly, which improves insight and speeds up decision-making.
Visualization also raises understanding to the next level and helps stabilize performance. It allows information to be distributed easily, which increases the opportunity to share insights with everyone. Finally, it helps people find information quickly and achieve results with higher speed and fewer mistakes.
1.5.3 Data Engineering
Data engineering involves acquiring, storing, retrieving, and transforming data. The key to understanding the data depends on the engineering part. Engineers design and build things, and data engineers design and build pipelines that transform and transport data into a format in which it reaches data scientists or other end users in a highly usable state. These pipelines must take data from many different sources and collect it in a single warehouse that represents the data uniformly as a single source of truth.
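As a rough sketch of the pipeline idea described above, the following example takes records from two source systems that use different field names and units, transforms them into one uniform shape, and merges them into a single store. The source systems, field names, and helper functions are all hypothetical.

```python
# Hypothetical source systems exposing related facts in different shapes.
crm_records = [{"CustomerID": "7", "Email": "A@Example.com"}]
billing_records = [
    {"cust": 7, "total_cents": 12550},
    {"cust": 8, "total_cents": 900},
]

def from_crm(rec):
    """Transform a CRM record into the uniform warehouse schema."""
    return {"customer_id": int(rec["CustomerID"]), "email": rec["Email"].lower()}

def from_billing(rec):
    """Transform a billing record into the uniform warehouse schema."""
    return {"customer_id": rec["cust"], "total_spend": rec["total_cents"] / 100}

def build_warehouse(crm, billing):
    """Merge both sources into one record per customer."""
    warehouse = {}
    for rec in map(from_crm, crm):
        warehouse.setdefault(rec["customer_id"], {}).update(rec)
    for rec in map(from_billing, billing):
        warehouse.setdefault(rec["customer_id"], {}).update(rec)
    return warehouse

warehouse = build_warehouse(crm_records, billing_records)
print(warehouse)
```

Keeping each source's transformation in its own function means a change in one upstream system touches only one part of the pipeline.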
1.5.4 Advanced Computing
Advanced computing has many functions. It involves designing, writing, debugging, and maintaining the source code of computer programs. In addition, advanced computing capabilities are used to handle a growing range of challenging science and engineering problems, many of which are compute- and data-intensive.
Fig 1.3: Data science designing
1.6 Machine Learning in Data Science
To become a data scientist, one should also be familiar with machine learning and its algorithms, since many machine learning algorithms are widely used in data science. The following are some of the machine learning algorithms used in data science:
•Regression
•Decision tree
•Clustering
•Principal component analysis
•Support vector machines
•Naive Bayes
•Artificial neural network
•Apriori
1.6.1 Linear Regression Algorithm
Linear regression is a popular machine learning technique. It is a regression-based method that models a target value as a function of independent variables, and it is mostly used in forecasting and prediction. It is called linear regression because it models a linear relationship between the input and output variables.
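A minimal sketch of simple linear regression with one input variable, fitted by ordinary least squares, is shown below. The data (advertising spend against units sold) and the `fit_line` helper are hypothetical and chosen so that the relationship is exactly linear.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept.

    slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
    intercept = mean_y - slope * mean_x
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend (input) vs. units sold (output).
ad_spend = [1, 2, 3, 4, 5]
units = [3, 5, 7, 9, 11]

slope, intercept = fit_line(ad_spend, units)
forecast = slope * 6 + intercept  # predict units sold at a spend of 6

print(slope, intercept, forecast)
```

Real data rarely falls exactly on a line, so in practice the fitted line is the one that minimizes the sum of squared differences between observed and predicted values.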