Python Data Analytics
With Pandas, NumPy, and Matplotlib
Third Edition
Fabio Nelli
Rome, Italy
This book is dedicated to all those who are constantly looking for awareness
Table of Contents

Chapter 1: An Introduction to Data Analysis ... 1
    Data Analysis ... 1
    Knowledge Domains of the Data Analyst ... 2
        Computer Science ... 2
        Mathematics and Statistics ... 3
        Machine Learning and Artificial Intelligence ... 3
        Professional Fields of Application ... 3

Chapter 2: Introduction to the Python World
    SciPy ... 42
        NumPy ... 42
        Pandas ... 43
        matplotlib ... 43
    Conclusions ... 43

Chapter 3: The NumPy Library ... 45
    NumPy: A Little History ... 45
    The NumPy Installation ... 46
    ndarray: The Heart of the Library ... 47
        Create an Array ... 48
        Types of Data ... 49
        The dtype Option ... 50
        Intrinsic Creation of an Array ... 50
    Basic Operations ... 51
        Arithmetic Operators ... 52
        The Matrix Product ... 53
    General Concepts ... 64
        Copies or Views of Objects ... 64
        Vectorization ... 65
        Broadcasting ... 66
    Structured Arrays ... 68
    Reading and Writing Array Data on Files ... 70
        Loading and Saving Data in Binary Files ... 70
        Reading Files with Tabular Data ... 70
    Conclusions ... 72

Chapter 4: The pandas Library—An Introduction ... 73
    pandas: The Python Data Analysis Library ... 73
    Installation of pandas ... 74
        Installation from Anaconda ... 74
        Installation from PyPI ... 78
    The Series ... 80
    The Dataframe ... 87
    The Index Objects ... 94
    Conclusions ... 114

Chapter 5: pandas: Reading and Writing Data ... 115
    I/O API Tools ... 115
    CSV and Textual Files ... 116
    Reading Data in CSV or Text Files ... 116
        Using Regexp to Parse TXT Files ... 119
        Reading TXT Files Into Parts ... 121
        Writing Data in CSV ... 121

    Concatenating ... 154
        Combining ... 156
        Pivoting ... 157
        Removing ... 160
    Permutation ... 169
        Random Sampling ... 170

    pyplot ... 189
        The Plotting Window ... 189
    Histograms ... 218
    Bar Charts ... 219
        Horizontal Bar Charts ... 222
        Multiseries Bar Charts ... 223
        Multiseries Bar Charts with a pandas Dataframe ... 225
        Multiseries Stacked Bar Charts ... 227
        Stacked Bar Charts with a pandas Dataframe ... 229
        Other Bar Chart Representations ... 230

Chapter 8: Machine Learning with scikit-learn ... 259
    The scikit-learn Library ... 259
    Machine Learning ... 259
        Supervised and Unsupervised Learning ... 259
        Training Set and Testing Set ... 260
    Conclusions ... 287

Chapter 9: Deep Learning with TensorFlow ... 289
    Artificial Intelligence, Machine Learning, and Deep Learning ... 289
        Artificial Intelligence ... 289
        Machine Learning Is a Branch of Artificial Intelligence ... 290
        Deep Learning Is a Branch of Machine Learning ... 290
        The Relationship Between Artificial Intelligence, Machine Learning, and Deep Learning ... 290
    TensorFlow ... 298
        TensorFlow: Google’s Framework ... 298
        TensorFlow: Data Flow Graph ... 298
    Conclusions ... 321

Chapter 10: An Example—Meteorological Data ... 323
    A Hypothesis to Be Tested: The Influence of the Proximity of the Sea ... 323
        The System in the Study: The Adriatic Sea and the Po Valley ... 323
    Conclusions ... 348

Chapter 11: Embedding the JavaScript D3 Library in the IPython Notebook ... 349
    The Open Data Source for Demographics ... 349
    The JavaScript D3 Library ... 352
    Drawing a Clustered Bar Chart ... 355
    The Choropleth Maps ... 358
    The Choropleth Map of the U.S. Population in 2022 ... 362
    Conclusions ... 366

Chapter 12: Recognizing Handwritten Digits ... 367
    Handwriting Recognition ... 367
    Recognizing Handwritten Digits with scikit-learn ... 367
    The Digits Dataset ... 368
    Learning and Predicting ... 370
    Recognizing Handwritten Digits with TensorFlow ... 372
    Learning and Predicting with an SLP ... 376
    Learning and Predicting with an MLP ... 381
    Conclusions ... 384

Chapter 13: Textual Data Analysis with NLTK ... 385
    Text Analysis Techniques ... 385
        The Natural Language Toolkit (NLTK) ... 386
        Import the NLTK Library and the NLTK Downloader Tool ... 386
        Search for a Word with NLTK ... 389
        Analyze the Frequency of Words ... 390
        Select Words from Text ... 392
        Bigrams and Collocations ... 393
        Preprocessing Steps ... 394
    Conclusions ... 401

Chapter 14: Image Analysis and Computer Vision with OpenCV ... 403
    Image Analysis and Computer Vision ... 403
    OpenCV and Python ... 404
    OpenCV and Deep Learning ... 404
    Installing OpenCV ... 404
    First Approaches to Image Processing and Analysis ... 404
        Before Starting ... 404
        Load and Display an Image ... 405
        Work with Images ... 406
        Save the New Image ... 407
        Elementary Operations on Images ... 407
        Image Blending ... 411

Appendix B: Open Data Sources ... 435
Index ... 437
About the Author
Fabio Nelli is a data scientist and Python consultant who designs and develops Python applications for
data analysis and visualization. He also has experience in the scientific world, having performed various
data analysis roles in pharmaceutical chemistry for private research companies and universities. He has
been a computer consultant for many years at IBM, EDS, and Hewlett-Packard, along with several banks
and insurance companies. He holds a master’s degree in organic chemistry and a bachelor’s degree in
information technologies and automation systems, with many years of experience in life sciences (as a tech
specialist at Beckman Coulter, Tecan, and SCIEX).
For further info and other examples, visit his page at www.meccanismocomplesso.org and the GitHub
page at https://github.com/meccanismocomplesso.
Preface
About five years have passed since the last edition of this book. In drafting this third edition, I made some
necessary changes, both to the text and to the code. First, all the Python code has been ported to 3.8 and
greater, and all references to Python 2.x versions have been dropped. Some chapters required a total
rewrite because the content was no longer compatible. I'm referring to TensorFlow 2.x which, compared
to TensorFlow 1.x (covered in the previous edition), has completely revamped its entire reference system.
In five years, the deep learning modules and code developed with version 1.x have proven completely
unusable. Keras and all its modules have been incorporated into the TensorFlow library, replacing all the
classes, functions, and modules that performed similar functions. The construction of neural network
models, their learning phases, and the functions they use have all completely changed. In this edition,
therefore, you have the opportunity to learn the methods of TensorFlow 2.x and to acquire familiarity with
the concepts and new paradigms of the new version.
Regarding data visualization, I decided to add information about the Seaborn library to the matplotlib
chapter. Seaborn, although still in version 0.x, is proving to be a very useful matplotlib extension for data
analysis, thanks to its statistical display of plots and its compatibility with pandas dataframes. I hope that,
with this completely updated third edition, I can further entice you to study and deepen your data analysis
with Python. This book will be a valuable learning tool for you now, and serve as a dependable reference in
the future.
—Fabio Nelli
CHAPTER 1

An Introduction to Data Analysis
In this chapter, you’ll take your first steps in the world of data analysis, learning in detail the concepts and
processes that make up this discipline. The concepts discussed here are helpful background for the following
chapters, where they are applied in the form of Python code, through the use of several specialized libraries
that are introduced later in the book.
Data Analysis
In a world increasingly centralized around information technology, huge amounts of data are produced
and stored each day. Often these data come from automatic detection systems, sensors, and scientific
instrumentation, or you produce them daily and unconsciously every time you make a withdrawal from the
bank or purchase something, when you write on a blog, or even when you post on social networks.
But what are data? Data, in themselves, are not information, at least not in terms of their form. In a
formless stream of bytes, at first glance it is difficult to understand their essence unless they are strictly
numbers, words, or times. Information is actually the result of processing that, taking a certain dataset into
account, extracts conclusions that can be used in various ways. This process of extracting information
from raw data is called data analysis.
The purpose of data analysis is to extract information that is not easily deducible but, when understood,
enables you to carry out studies on the mechanisms of the systems that produced the data. This in turn
allows you to forecast possible responses of these systems and their evolution in time.
Starting from a simple, methodical approach to data, data analysis has become a true
discipline, leading to the development of real methodologies that generate models. The model is in fact
a translation of the system into a mathematical form. Once there is a mathematical or logical form that can
describe system responses under different levels of precision, you can predict its development or response
to certain inputs. Thus, the aim of data analysis is not the model itself, but the quality of its predictive power.
The predictive power of a model depends not only on the quality of the modeling techniques but also
on the ability to choose a good dataset upon which to build the entire analysis process. So the search for
data, their extraction, and their subsequent preparation, while representing preliminary activities of an
analysis, also belong to data analysis itself, because of their importance in the success of the results.
So far I have spoken of data, their handling, and their processing through calculation procedures. In
parallel to all the stages of data analysis processing, various methods of data visualization have also been
developed. In fact, to understand the data, both individually and in terms of the role they play in the dataset,
there is no better system than to develop the techniques of graphical representation. These techniques are
capable of transforming information, sometimes implicitly hidden, into figures, which help you more easily
understand the meaning of the data. Over the years, many display modes have been developed for the
different kinds of data to be displayed; these representations are called charts.
At the end of the data analysis process, you have a model and a set of graphical displays and you can
predict the responses of the system under study; after that, you move to the test phase. The model is tested
using another set of data for which you know the system response, but which were not used to build the
predictive model. Depending on the ability of the model to replicate real, observed responses, you obtain an
error estimate and knowledge of the validity of the model and its operating limits.
These results can be compared to any other models to understand if the newly created one is
more efficient than the existing ones. Once you have assessed that, you can move to the last phase of
data analysis—deployment. This phase consists of implementing the results produced by the analysis,
namely, implementing the decisions to be made based on the predictions generated by the model and its
associated risks.
Data analysis is well suited to many professional activities. So, knowledge of it and how it can be put
into practice is relevant. It allows you to test hypotheses and understand the systems you’ve analyzed
more deeply.
Computer Science
Knowledge of computer science is a basic requirement for any data analyst. In fact, only when you have
good knowledge of and experience in computer science can you efficiently manage the necessary tools for
data analysis. In fact, every step concerning data analysis involves using calculation software (such as IDL,
MATLAB, etc.) and programming languages (such as C++, Java, and Python).
The large amount of data available today, thanks to information technology, requires specific skills in
order to be managed as efficiently as possible. Indeed, data research and extraction require knowledge of
the various formats in which data are structured and stored, in files or database tables.
XML, JSON, or simply XLS or CSV files are now the common formats for storing and collecting data, and
many applications allow you to read and manage the data stored in them. When it comes to extracting data
contained in a database, things are not so immediate: you need to know the SQL query language or use
software specially developed for the extraction of data from a given database.
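To give a concrete idea, the following minimal sketch (with hypothetical file and table names, and assuming pandas plus an Excel reader such as openpyxl are installed) shows how each of these formats can be read with a single call in Python:

import sqlite3
import pandas as pd

# Hypothetical file names, used only for illustration
df_csv = pd.read_csv("measurements.csv")    # CSV text data
df_json = pd.read_json("records.json")      # JSON records
df_xls = pd.read_excel("report.xlsx")       # XLS/XLSX spreadsheets

# Extracting from a database requires an SQL query instead
conn = sqlite3.connect("experiments.db")
df_sql = pd.read_sql_query("SELECT * FROM results", conn)
conn.close()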
Moreover, for some specific types of data research, the data are not available in an explicit format, but
are present in text files (documents and log files) or web pages, or shown as charts, measures, number of
visitors, or HTML tables. This requires specific technical expertise to parse and extract these data
(a process called web scraping).
Knowledge of information technology is necessary for using the various tools made available by
contemporary computer science, such as applications and programming languages. These tools, in turn, are
needed to perform data analysis and data visualization.
The purpose of this book is to provide all the necessary knowledge, as far as possible, regarding the
development of methodologies for data analysis. The book uses the Python programming language and
specialized libraries that contribute to the performance of the data analysis steps, from data research to data
mining, to publishing the results of the predictive model.
Types of Data
Data can be divided into two distinct categories:
• Categorical (nominal and ordinal)
• Numerical (discrete and continuous)
Categorical data are values or observations that can be divided into groups or categories. There are two
types of categorical values: nominal and ordinal. A nominal variable has no intrinsic order among its
categories. An ordinal variable, instead, has a predetermined order.
Numerical data are values or observations that come from measurements. There are two types of
numerical values: discrete and continuous numbers. Discrete values can be counted and are distinct and
separated from each other. Continuous values, on the other hand, are values produced by measurements or
observations that assume any value within a defined range.
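As a minimal sketch of how these four categories look in practice with pandas (the library introduced in Chapter 4), using invented example values:

import pandas as pd

colors = pd.Series(["red", "green", "red"], dtype="category")  # nominal: no order
sizes = pd.Categorical(["small", "large", "medium"],
                       categories=["small", "medium", "large"],
                       ordered=True)                           # ordinal: predetermined order
counts = pd.Series([3, 7, 2])                                  # numerical, discrete
weights = pd.Series([1.52, 0.88, 2.31])                        # numerical, continuous
print(colors.dtype, sizes.min(), counts.dtype, weights.dtype)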
The Data Analysis Process

Data analysis can be described as a process consisting of several steps:
• Problem definition
• Data extraction
• Data preparation
• Data exploration/visualization
• Predictive modeling
• Model validation/testing
• Visualization and interpretation of results
• Deployment of the solution (implementation of the solution in the real world)
Figure 1-1 shows a schematic representation of all the processes involved in data analysis.
Problem Definition
The process of data analysis actually begins long before the collection of raw data. In fact, data analysis
always starts with a problem to be solved, which needs to be defined.
The problem is defined only after you have focused on the system you want to study; this may be a
mechanism, an application, or a process in general. Generally, this study aims to better understand how the
system works, and in particular it is designed to understand the principles of its behavior, in order to
make predictions or choices (defined as informed choices).
The definition step and the corresponding documentation (deliverables) of the scientific or business
problem are both very important in order to focus the entire analysis strictly on getting results. In fact, a
comprehensive or exhaustive study of the system is sometimes complex and you do not always have enough
information to start with. So the definition of the problem and especially its planning can determine the
guidelines for the whole project.
Once the problem has been defined and documented, you can move to the project planning stage of
data analysis. Planning is needed to understand which professionals and resources are necessary to meet
the requirements to carry out the project as efficiently as possible. You consider the issues involving the
resolution of the problem. You look for specialists in various areas of interest and install the software needed
to perform data analysis.
Also during the planning phase, you choose an effective team. Generally, these teams should be
cross-disciplinary in order to solve the problem by looking at the data from different perspectives. So, building a
good team is certainly one of the key factors leading to success in data analysis.
Data Extraction
Once the problem has been defined, the first step is to obtain the data in order to perform the analysis.
The data must be chosen with the basic purpose of building the predictive model, and so data selection is
crucial for the success of the analysis as well. The sample data collected must reflect as much as possible
the real world, that is, how the system responds to stimuli from the real world. For example, if you’re using
huge datasets of raw data and they are not collected competently, these may portray false or unbalanced
situations.
Thus, poor choice of data, or even performing analysis on a dataset that’s not perfectly representative of
the system, will lead to models that will move away from the system under study.
The search and retrieval of data often require a form of intuition that goes beyond mere technical
research and data extraction. This process also requires a careful understanding of the nature and form of
the data, which only good experience and knowledge in the problem’s application field can provide.
Regardless of the quality and quantity of the data needed, another issue is choosing the best data sources.
If the study environment is a laboratory (technical or scientific) and the data generated are
experimental, then the data source is easily identifiable, and the only problems will concern the
experimental setup.
But not every field of application allows data to be gathered in a strictly experimental
way. Many fields require searching for data from the surrounding
world, often relying on external experimental data, or even more often collecting them through interviews
or surveys. So in these cases, finding a good data source that is able to provide all the information you need
for data analysis can be quite challenging. Often it is necessary to retrieve data from multiple data sources to
supplement any shortcomings, to identify any discrepancies, and to make the dataset as general as possible.
When you want to get the data, a good place to start is the web. But most of the data on the web can be
difficult to capture; in fact, not all data are available in a file or database, but might be content that is inside
HTML pages in many different formats. To this end, a methodology called web scraping allows the collection
of data through the recognition of specific occurrences of HTML tags within web pages. There is software
specifically designed for this purpose: once an occurrence is found, it extracts the desired data. When the
search is complete, you will get a list of data ready to be subjected to data analysis.
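As a minimal sketch of the idea (the URL here is hypothetical, and pandas with an HTML parser such as lxml must be installed), a table published inside a web page can be collected like this:

import pandas as pd

# read_html scans the page for <table> tags and returns
# a list of dataframes, one per table found
tables = pd.read_html("https://example.com/statistics.html")
df = tables[0]  # the first table on the page, ready for analysis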
Data Preparation
Among all the steps involved in data analysis, data preparation, although seemingly less problematic, in
fact requires more resources and more time to be completed. Data are often collected from different data
sources, each of which has data in it with a different representation and format. So, all of these data have to
be prepared for the process of data analysis.
The preparation of the data is concerned with obtaining, cleaning, normalizing, and transforming
data into an optimized dataset, that is, in a prepared format that’s normally tabular and is suitable for the
methods of analysis that have been scheduled during the design phase.
Many potential problems can arise, including invalid, ambiguous, or missing values, replicated fields,
and out-of-range data.
Data Exploration/Visualization
Exploring the data involves essentially searching the data in a graphical or statistical presentation in order
to find patterns, connections, and relationships. Data visualization is the best tool to highlight possible
patterns.
In recent years, data visualization has been developed to such an extent that it has become a real
discipline in itself. In fact, numerous technologies are utilized exclusively to display data, and many display
types are applied to extract the best possible information from a dataset.
Data exploration consists of a preliminary examination of the data, which is important for
understanding the type of information that has been collected and what it means. In combination with the
information acquired during the problem definition, this categorization determines which method of data
analysis is most suitable for arriving at a model definition.
Generally, this phase, in addition to a detailed study of charts through the visualization data, may
consist of one or more of the following activities:
• Summarizing data
• Grouping data
• Exploring the relationship between the various attributes
• Identifying patterns and trends
Generally, data analysis requires summarizing statements regarding the data to be studied.
Summarization is a process by which data are reduced for interpretation without sacrificing important
information.
Clustering is a method of data analysis that is used to find groups united by common attributes (also
called grouping).
Another important step of the analysis focuses on the identification of relationships, trends, and
anomalies in the data. In order to find this kind of information, you often have to resort to additional tools
and perform another round of data analysis, this time on the data visualization itself.
Other methods of data mining, such as decision trees and association rules, automatically extract
important facts or rules from the data. These approaches can be used in parallel with data visualization to
uncover relationships between the data.
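A minimal sketch with pandas shows how little code these activities can require, using a small invented dataset:

import pandas as pd

df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "x": [1.0, 2.5, 3.1, 4.8],
    "y": [2.1, 4.9, 6.3, 9.7],
})

print(df.describe())                  # summarizing data
print(df.groupby("group").mean())     # grouping data
print(df.corr(numeric_only=True))     # relationships between attributes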
Predictive Modeling
Predictive modeling is a process used in data analysis to create or choose a suitable statistical model to
predict the probability of a result.
After exploring the data, you have all the information needed to develop the mathematical model that
encodes the relationship between the data. These models are useful for understanding the system under
study, and in a specific way they are used for two main purposes. The first is to make predictions about the
data values produced by the system; in this case, you will be dealing with regression models if the result is
numeric or with classification models if the result is categorical. The second purpose is to classify new data
products, and in this case, you will be using classification models if the results are identified by classes or
clustering models if the results could be identified by segmentation. In fact, it is possible to divide the models
according to the type of result they produce:
• Classification models: If the result obtained by the model type is categorical.
• Regression models: If the result obtained by the model type is numeric.
• Clustering models: If the result obtained by the model type is a segmentation.
Simple methods to generate these models include techniques such as linear regression, logistic
regression, classification and regression trees, and k-nearest neighbors. But the methods of analysis are
numerous, and each has specific characteristics that make it excellent for some types of data and analysis.
Each of these methods will produce a specific model, so the choice of method depends on the nature of the
model to be produced. Some of these models will provide values corresponding to the real system and, thanks
to their structure, will explain some characteristics of the system under study in a simple and clear way. Other
models will continue to give good predictions, but their structure will be no more than a “black box” with
limited ability to explain characteristics of the system.
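To make the regression case concrete, here is a minimal sketch using scikit-learn (the library covered in Chapter 8) on invented data:

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data following an approximately linear relationship
X = np.array([[1.0], [2.0], [3.0], [4.0]])   # input values
y = np.array([2.1, 3.9, 6.2, 8.1])           # observed responses

model = LinearRegression().fit(X, y)
print(model.predict([[5.0]]))                # predicted response for a new input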
Model Validation
Validation of the model, that is, the test phase, is an important step that allows you to validate the model
built on the basis of the starting data. It is important because it allows you to assess the validity of the data
produced by the model by comparing them directly with the actual system, this time using data other than
the starting set on which the entire analysis was established.
Generally, you refer to the data as the training set when you are using them to build the model, and as
the validation set when you are using them to validate the model.
Thus, by comparing the data produced by the model with those produced by the system, you can
evaluate the error, and using different test datasets, you can estimate the limits of validity of the generated
model. In fact the correctly predicted values could be valid only within a certain range, or they could have
different levels of matching depending on the range of values taken into account.
This process allows you not only to numerically evaluate the effectiveness of the model but also to
compare it with any other existing models. There are several techniques in this regard; the most famous is
cross-validation. This technique is based on the division of the training set into different parts. Each of
these parts, in turn, is used as the validation set, while the remaining parts form the training set. In this
iterative manner, you obtain an increasingly refined model.
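A minimal sketch of cross-validation with scikit-learn, again on invented data, where each of five parts takes its turn as the validation set:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + rng.normal(0.0, 0.5, size=10)   # noisy linear responses

# cv=5 splits the data into five parts; each part is used once
# as the validation set while the others form the training set
scores = cross_val_score(LinearRegression(), X, y, cv=5)
print(scores.mean(), scores.std())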
Deployment
This is the final step of the analysis process, which aims to present the results, that is, the conclusions of the
analysis. In the deployment process of the business environment, the analysis is translated into a benefit
for the client who has commissioned it. In technical or scientific environments, it is translated into design
solutions or scientific publications. That is, the deployment basically consists of putting into practice the
results obtained from the data analysis.
There are several ways to deploy the results of data analysis or data mining. Normally, a data analyst’s
deployment consists of writing a report for management or for the customer who requested the analysis.
This document conceptually describes the results obtained from the analysis of data. The report should
be directed to the managers, who are then able to make decisions. Then, they will put into practice the
conclusions of the analysis.
In the documentation supplied by the analyst, each of these four topics is discussed in detail:
• Analysis results
• Decision deployment
• Risk analysis
• Measuring the business impact
When the results of the project include the generation of predictive models, these models can be
deployed as stand-alone applications or can be integrated into other software.
Open Data
In support of the growing demand for data, a huge number of data sources are now available on the Internet.
These data sources freely provide information to anyone in need, and they are called open data.
Here is a list of some open data available online covering different topics. You can find a more complete
list and details of the open data available online in Appendix B.
• Kaggle (www.kaggle.com/datasets) is a huge community of apprentices and expert
data scientists who provide a vast amount of datasets and code that they use for
their analyses. The extensive documentation and the introduction to every aspect
of machine learning are also excellent. They also hold interesting competitions
organized around the resolution of various problems.
• DataHub (datahub.io/search) is a community that makes a huge amount of
datasets freely available, along with tools for their command-line management. The
dataset topics cover various fields, ranging from the financial market, to population
statistics, to the prices of cryptocurrencies.
• NASA Earth Observations (https://neo.gsfc.nasa.gov/dataset_index.php/)
provides a wide range of datasets that contain data collected from global climate and
environmental observations.
• World Health Organization (www.who.int/data/collections) manages and
maintains a wide range of data collections related to global health and well-being.
• World Bank Open Data (https://data.worldbank.org/) provides a listing of
available World Bank datasets covering financial and banking data, development
indicators, and information on the World Bank’s lending projects from 1947 to the
present.
• Data.gov (https://data.gov) is intended to collect and provide access to the
U.S. government’s Open Data, a broad range of government information collected at
different levels (federal, state, local, and tribal).
• European Union Open Data Portal (https://data.europa.eu/en) collects and
makes publicly available a wide range of datasets concerning the public sector of the
European member states.
• Healthdata.gov (www.healthdata.gov/) provides data about health and health care
for doctors and researchers so they can carry out clinical studies and solve problems
regarding diseases, virus spread, and health practices, as well as improve the level of
global health.
• Google Trends Datastore (https://googletrends.github.io/data/) collects and
makes available, divided by topic, the data collected by the famous and very useful
Google Trends service, which you can use to carry out your own analyses.
Finally, Google recently made available a search page dedicated to datasets,
where you can search for a topic and obtain a series of datasets (or even data
sources) that correspond as much as possible to what you are looking for. For
example, in Figure 1-3, you can see how, when researching the price of houses, a
series of datasets or data sources are suggested in real time.
Figure 1-3. Example of a search for a dataset regarding the prices of houses on Google Dataset Search
As an idea of the open data sources available online, you can look at the LOD cloud diagram
(http://cas.lod-cloud.net), which displays the connections of the data link among several open data sources currently
available on the network (see Figure 1-4). The diagram contains a series of circular elements corresponding
to the available data sources; their color corresponds to a specific topic of the data provided. The legend
indicates the topic-color correspondence. When you click an element on the diagram, you see a page
containing all the information about the selected data source and how to access it.
Figure 1-4. Linked open data cloud diagram 2023, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch,
and Richard Cyganiak. http://cas.lod-cloud.net [CC-BY license]
Compared to other programming languages generally used for data analysis, such as R and MATLAB,
Python not only provides a platform for processing data, but it also has features that make it unique
compared to other languages and specialized applications. The development of an ever-increasing number
of support libraries, the implementation of algorithms of more innovative methodologies, and the ability to
interface with other programming languages (C and Fortran) all make Python unique among its kind.
Furthermore, Python is not only specialized for data analysis, but it also has many other applications,
such as generic programming, scripting, interfacing to databases, and more recently web development,
thanks to web frameworks like Django. So it is possible to develop data analysis projects that are compatible
with a web server and can be integrated on the web.
For those who want to perform data analysis, Python, with all its packages, is considered the best choice
for the foreseeable future.
Conclusions
In this chapter, you learned what data analysis is and, more specifically, the various processes that comprise
it. Also, you have begun to see the role that data play in building a prediction model and how their careful
selection is the basis of an accurate data analysis.
In the next chapter, you move on to Python itself and the tools it provides for performing data analysis.
CHAPTER 2

Introduction to the Python World
The Python language, and the world around it, is made up of interpreters, tools, editors, libraries, notebooks,
and so on. This Python world has expanded greatly in recent years, taking on forms that
developers who approach it for the first time can sometimes find complicated and somewhat misleading.
Thus, if you are approaching Python for the first time, you might feel lost among so many choices, especially
about where to start.
This chapter gives you an overview of the entire Python world. You’ll first gain an introduction to the
Python language and its unique characteristics. You’ll learn where to start, what an interpreter is, and how to
begin writing your first lines of code in Python before being presented with some new and more advanced
forms of interactive writing with respect to shells, such as IPython and the IPython Notebook.
Python is an object-oriented programming language. In fact, it allows you to specify classes of objects
and implement their inheritance. But unlike C++ and Java, there are no constructors or destructors. Python
also allows you to implement specific constructs in your code to manage exceptions. However, the structure
of the language is so flexible that it allows you to program with alternative approaches with respect to the
object-oriented one. For example, you can use functional or vectorial approaches.
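A few lines are enough to illustrate these traits, that is, classes with inheritance, initialization without C++-style constructors and destructors, and the constructs for managing exceptions:

class Shape:
    def __init__(self, name):     # initializer; no destructor needs to be written
        self.name = name

class Circle(Shape):              # inheritance is declared in parentheses
    def __init__(self, radius):
        super().__init__("circle")
        self.radius = radius

try:
    c = Circle(-1.0)
    if c.radius < 0:
        raise ValueError("radius must be non-negative")
except ValueError as err:         # specific construct to manage exceptions
    print(err)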
Python is an interactive programming language. Because Python uses an interpreter to execute code, the
language can take on very different aspects depending on the context in which it is used. In fact, you can
write long programs, similar to what you might do in languages like C++ or Java, and then launch them, or
you can enter one command at a time, executing it and immediately getting the results. Then, depending on
the results, you can decide what command to run next. This highly interactive way to execute code makes
the Python computing environment similar to MATLAB. This feature of Python is one reason it's popular
with the scientific community.
Python is a programming language that can be interfaced. In fact, it can be interfaced with code written
in other programming languages such as C/C++ and FORTRAN. This too was a winning choice, because
thanks to this aspect, Python can compensate for what is perhaps its only weak point: execution speed. The
nature of Python, as a highly dynamic programming language, can sometimes lead to execution of programs
up to 100 times slower than the corresponding static programs compiled with other languages. The solution
to this kind of performance problem is to interface Python to the compiled code of other languages, using it
as if it were Python's own.
Python is an open-source programming language. CPython, which is the reference implementation
of the Python language, is completely free and open source. Additionally, every module and library in the
ecosystem is open source, with code available online. Every month, an extensive developer community
releases improvements that make this language and all its libraries ever richer and more efficient. CPython is
managed by the nonprofit Python Software Foundation, which was created in 2001 and has given itself the
task of promoting, protecting, and advancing the Python programming language.
Finally, Python is a simple language to use and learn. This aspect is perhaps the most important,
because it is the most direct aspect that a developer, even a novice, faces. The high intuitiveness and ease of
reading of Python code often leads to “sympathy” for this programming language, and consequently most
newcomers to programming choose to use it. However, its simplicity does not mean narrowness, since
Python is a language that is spreading in every field of computing. Furthermore, Python is doing all of this
very simply, in comparison to existing programming languages such as C++, Java, and FORTRAN, which by
their nature are very complex.
Lexing, or tokenization, is the initial phase in which the Python (human-readable) code is converted
into a sequence of logical entities, the so-called lexical tokens (see Figure 2-1).
Parsing is the next stage in which the syntax and grammar of the lexical tokens are checked by a parser,
which produces an abstract syntax tree (AST) as a result.
Compiling is the phase in which the compiler reads the AST and, based on this information, generates
the Python bytecode (.pyc or .pyo files), which contains very basic execution instructions. Although this
is a compilation phase, the generated bytecode is still platform-independent, which is very similar to what
happens in the Java language.
The last phase is interpreting, in which the generated bytecode is executed by a Python virtual
machine (PVM).
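You can observe the first and third of these phases directly from the standard library; here is a small sketch using the tokenize and dis modules:

import dis
import io
import tokenize

# Lexing: break a line of source code into lexical tokens
for tok in tokenize.generate_tokens(io.StringIO("a + b").readline):
    print(tok.type, tok.string)

# Compiling: show the bytecode generated for a small function
def add(a, b):
    return a + b

dis.dis(add)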
CPython
The standard Python interpreter is CPython, and it was written in C. This made it possible to use C-based
libraries from Python. CPython is available on a variety of platforms, including ARM, iOS, and RISC.
However, it has been optimized for portability and other requirements rather than for speed.
Cython
The strongly intrinsic nature of C in the CPython interpreter has been taken further with the Cython project.
This project is based on creating a compiler that translates Python code into C. This code is then executed
within a Cython environment at runtime. This type of compilation system makes it possible to introduce C
semantics into the Python code to make it even more efficient. This system has led to the merging of two
programming worlds, with the birth of Cython, which can be considered a new programming language.
You can find documentation about it online; I advise you to visit cython.readthedocs.io/en/latest/.
Pyston
Pyston (www.pyston.org/) is a fork of the CPython interpreter that implements performance optimization.
This project arises precisely from the need to obtain an interpreter that can replace CPython over time to
remedy its poor performance in terms of execution speed. Recent results seem to confirm these predictions,
reporting a 30 percent improvement in performance in the case of large, real-world applications.
Unfortunately, due to the lack of compatible binary packages, Pyston packages have to be rebuilt during the
download phase.
Jython
In parallel to Cython, there is a version built and compiled in Java, called Jython. It was created by Jim
Hugunin in 1997 (www.jython.org/). Jython is an implementation of the Python programming language in
Java; it is further characterized by using Java classes instead of Python modules to implement extensions and
packages of Python.
IronPython
Even the .NET framework offers the possibility of being able to execute Python code inside it. For this
purpose, you can use the IronPython interpreter (https://ironpython.net/). This interpreter allows .NET
developers to develop Python programs on the Visual Studio platform, integrating perfectly with the other
development tools of the .NET platform.
Initially built by Jim Hugunin in 2006 with the release of version 1.0, the project was later supported by a
small team at Microsoft until version 2.7 in 2010. Since then, numerous other versions have been released,
up to the current 3.4, all carried forward by a group of volunteers on Microsoft's CodePlex repository.
PyPy
The PyPy interpreter is a JIT (just-in-time) compiler, and it converts the Python code directly to machine
code at runtime. This choice was made to speed up the execution of Python. However, this choice has led to
the use of a smaller subset of Python commands, defined as RPython. For more information on this, consult
the official website at www.pypy.org/.
RustPython
As the name suggests, RustPython (rustpython.github.io/) is a Python interpreter written in Rust. This
programming language is quite new but it is gaining popularity. RustPython is an interpreter like CPython
but can also be used as a JIT compiler. It also allows you to run Python code embedded in Rust programs
and compile the code into WebAssembly, so you can run Python code directly from web browsers.
Installing Python
In order to develop programs in Python, you have to install it on your operating system. Linux distributions
and macOS X machines should have a preinstalled version of Python. If not, or if you want to replace that
version with another, you can easily install it. The process for installing Python differs from operating system
to operating system. However, it is a rather simple operation.
On Debian-Ubuntu Linux systems, the first thing to do is to check whether Python is already installed
on your system and what version is currently in use.
Open a terminal (by pressing ALT+CTRL+T) and enter the following command:
python3 --version
If you get the version number as output, then Python is already present on the Ubuntu system. If you get
an error message, Python hasn’t been installed yet.
In this last case, you can install Python through the distribution's package manager:
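sudo apt install python3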
If, on the other hand, the current version is old, you can update it with the latest version of your Linux
distribution by entering the following command:
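sudo apt update
sudo apt install --only-upgrade python3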
Finally, if you want to install a specific version on your system, you have to indicate it explicitly
(here Python 3.11, purely as an example):
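sudo apt install python3.11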
On Red Hat and CentOS Linux systems working with rpm packages, run this command instead:
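sudo yum install python3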
If you are running Windows or macOS X, you can go to the official Python site (www.python.org) and
download the version you prefer. The packages in this case are installed automatically.
However, today there are distributions that provide a number of tools that make the management and
installation of Python, all libraries, and associated applications easier. I strongly recommend you choose one
of the distributions available online.
Python Distributions
Due to the success of the Python programming language, many Python tools have been developed to meet
various functionalities over the years. There are so many that it’s virtually impossible to manage all of them
manually.
In this regard, many Python distributions efficiently manage hundreds of Python packages. In fact,
instead of individually downloading the interpreter, which includes only the standard libraries, and then
needing to individually install all the additional libraries, it is much easier to install a Python distribution.
At the heart of these distributions are the package managers, which are nothing more than applications
that automatically manage, install, upgrade, configure, and remove Python packages that are part of the
distribution.
Their functionality is very useful, since the user simply makes a request regarding a particular package
(an installation, for example). The package manager, usually via the Internet, then performs the operation,
analyzing the necessary version and all dependencies on other packages, and downloading anything that is
not already present.
Anaconda
Anaconda is a free distribution of Python packages released by Anaconda, Inc., formerly Continuum Analytics (www.anaconda.com).
This distribution supports Linux, Windows, and macOS X operating systems. Anaconda, in addition to
providing the latest packages released in the Python world, comes bundled with most of the tools you need
to set up a Python development environment.
Indeed, when you install the Anaconda distribution on your system, you can use many tools and
applications described in this chapter, without worrying about having to install and manage them
separately. The basic distribution includes Spyder, an IDE used to develop complex Python programs; Jupyter Notebook, a wonderful tool for working interactively with Python in a graphical and orderly way; and Anaconda Navigator, a graphical panel for managing packages and virtual environments.
The entire Anaconda distribution is managed by an application called conda. This is the distribution's package manager and environment manager, and it handles all of the packages and their versions.
One of the most interesting aspects of this distribution is the ability to manage multiple development environments, each with its own version of Python. With Anaconda, you can work independently with different Python versions by creating several virtual environments.
You can create, for instance, an environment based on Python 3.11 even if the current Python version is still
3.10 in your system. To do this, you write the following command via the console:
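conda create -n py311 python=3.11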
This will generate a new Anaconda virtual environment with all the packages related to the Python
3.11 version. This installation will not affect the Python version installed on your system and won’t generate
any conflicts. When you no longer need the new virtual environment, you can simply uninstall it, leaving
the Python system installed on your operating system completely unchanged. Once it’s installed, you can
activate the new environment by entering the following command:
activate py311
C:\Users\Fabio>activate py311
(py311) C:\Users\Fabio>
You can create environments for as many Python versions as you want; you need only change the parameter passed with the python option in the conda create command. When you want to return to working with the original Python version, use the following command (on Windows, deactivate alone, as in the transcript below; recent versions of conda use conda deactivate on all platforms):
source deactivate
(py311) C:\Users\Fabio>deactivate
Deactivating environment "py311"...
C:\Users\Fabio>
Anaconda Navigator
Although the conda command lies at the base of the Anaconda distribution, managing its packages and virtual environments, working through the command console is not always practical and efficient. As you will see in the following chapters of the book, Anaconda provides a graphical tool called
Anaconda Navigator, which allows you to manage the virtual environments and related packages in a
graphical and very simplified way (see Figure 2-2).
From the Environments panel, you can also create new virtual environments, selecting the base Python version. Similarly, existing virtual environments can be deleted, cloned, backed up, or imported using the menu shown in Figure 2-4.
Figure 2-4. Button menu for managing virtual environments in Anaconda Navigator
But that is not all. Anaconda Navigator is not only useful for managing Python applications, virtual environments, and packages. Its third panel, called Learning (see Figure 2-5), provides links to the main sites of many useful Python libraries (including those covered in this book). By clicking one of these links, you can access a wealth of documentation, which is always useful to have on hand if you program in Python on a daily basis.
The next panel, called Community, is laid out identically, but its links point to forums of the main Python development and data analytics communities.
The Anaconda platform, with its multiple applications and Anaconda Navigator, gives developers a simple, organized work environment that leaves them well prepared for developing Python code. It is no coincidence that this platform has become almost a de facto standard in the field.
Using Python
Python is rich, yet simple and very flexible. It allows you to expand your development activities into many areas of work (data analysis, scientific computing, graphical interfaces, and so on). Precisely for this reason, Python can be used in many different contexts, often according to the taste and ability of the developer. This section presents the various approaches to using Python adopted in the course of this book; each chapter uses the approach best suited to the topic at hand.
Python Shell
The easiest way to approach the Python world is to open a session in the Python shell, a command line running the interpreter in a terminal. You can enter one command at a time and test its operation immediately. This mode makes clear the interpreted nature of Python: the interpreter reads one command at a time, keeping the state of the variables specified in the previous lines, a behavior similar to that of MATLAB and other calculation software.
This approach is helpful when you approach Python for the first time. You can test commands one at a time without having to write, edit, and run an entire program, which could be composed of many lines of code. This mode is also good for testing and debugging Python code one line at a time, or simply for making calculations. To start a session, simply type this on the command line:
C:\Users\nelli>python
Python 3.10 | packaged by Anaconda, Inc. | (main, Mar 1 2023, 18:18:21) [MSC v.1916 64 bit
(AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
The Python shell is now active and the interpreter is ready to receive commands in Python. Start by entering the simplest of commands, a classic for getting started with programming:
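>>> print("Hello World")
Hello World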
If you have the Anaconda platform available on your system, you can open a Python shell related to a
specific virtual environment you want to work on. In this case, from Anaconda Navigator, in the Home panel,
activate the virtual environment from the drop-down menu and click the Launch button of the CMD.exe
Prompt application, as shown in Figure 2-6.
A command console will open with the name of the active virtual environment prefixed in brackets in
the prompt. From there, you can run the python command to activate the Python shell.
(Edition3) C:\Users\nelli>python
Python 3.11.0 | packaged by Anaconda, Inc. | (main, Mar 1 2023, 18:18:21) [MSC v.1916 64
bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
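You can also collect your commands in a file and run them as a program. A minimal MyFirstProgram.py consistent with the behavior described below might contain the following (the exact message strings are illustrative):

name = input("What is your name? ")
print("Hello " + name + ", nice to meet you!")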
Now you’ve written your first program in Python, and you can run it directly from the command line by
calling the python command and then the name of the file containing the program code.
python MyFirstProgram.py
When it runs, the program asks for your name. Once you enter it, the program says hello.
Make Calculations
You have already seen that the print() function is useful for printing almost anything. But Python is more than a printing tool; it is also a great calculator. Start a session on the Python shell and perform these mathematical operations:
>>> 1 + 2
3
>>> (1.045 * 3)/4
0.78375
>>> 4 ** 2
16
>>> ((4 + 5j) * (2 + 3j))
(-7+22j)
>>> 4 < (2*3)
True
Python can handle many types of data in its calculations, including complex numbers and conditions with Boolean values. As you can see from these calculations, the Python interpreter returns the result directly, without the need to use the print() function. The same applies to values contained in variables: it is enough to enter the variable's name to see its contents.
>>> a = 12 * 3.4
>>> a
40.8
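To go beyond basic arithmetic, you can import the math library from the Python standard library:

>>> import math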
In this way, all the functions contained in the math package become available in your Python session, and you can call them directly. You have thus extended the standard set of functions available when you start a Python session. These functions are called with the following expression:
library_name.function_name()
For example, you can now calculate the sine of the value contained in the variable a.
>>> math.sin(a)
0.040693257349864856
As you can see, the function is called together with the name of the library. Sometimes you might find the following expression used to declare an import:
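>>> from math import *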
Even though this works properly, it should be avoided as a matter of good practice. Written this way, the import brings in all the functions without explicitly tying them to the library they belong to.
>>> sin(a)
0.040693257349864856
This form of import can lead to serious errors, especially when many libraries are imported. It is not unlikely that different libraries contain functions with the same name, and importing everything from each of them overrides any functions of the same name that were imported earlier. The program could therefore generate numerous errors or, worse, behave abnormally.
In practice, this way of importing is used only for a limited number of functions, that is, functions strictly necessary to the program, thereby avoiding the import of an entire library when it is completely unnecessary.
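For example, to make only the sin() function available without the math prefix, you can write:

>>> from math import sin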
Data Structure
You saw in the previous examples how to use simple variables containing a single value. Python also provides a number of extremely useful data structures, which can contain lots of data simultaneously and sometimes even data of different types. The available data structures are defined differently depending on how their data are organized internally:
• List
• Set
• String
• Tuple
• Dictionary
• Deque
• Heap
This is only a small part of all the data structures that can be used in Python. Among them, the most commonly used are dictionaries and lists.
The dictionary type, also referred to as a dict, is a data structure in which each value is associated with a particular label, called a key. The data collected in a dictionary have no internal order; they are only definitions of key/value pairs.
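For example, a dictionary consistent with the access shown next can be defined as follows (only the "name" entry is fixed by that example; the other pairs are illustrative):

>>> dict = {"name": "William", "age": 25, "city": "London"}

Note that dict is used as a variable name here only to match the examples that follow; in real code it is best avoided, because it shadows the built-in dict type.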
If you want to access a specific value within the dictionary, you have to indicate the name of the associated key:
>>> dict["name"]
'William'
If you want to iterate over the key/value pairs in a dictionary, you use the for-in construct together with the items() function, as shown below.
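A minimal sketch, using the dictionary defined earlier:

>>> for key, value in dict.items():
...     print(key, value)
...
name William
age 25
city London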
The list type is a data structure that contains a number of objects in a precise order, forming a sequence to which elements can be added and removed. Each item is marked with a number corresponding to its position in the sequence, called the index.
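For example, the list used in the following examples can be defined as:

>>> list = [1, 2, 3, 4]

(As with dict earlier, list is used as a variable name only to match the examples; it shadows the built-in list type.)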
If you want to access an individual element, it is sufficient to specify its index in square brackets (the first item in the list has index 0). To take out a portion of the list (a slice), it is sufficient to specify the range with two indices i and j: the element at index i is included, while the element at index j is excluded.
>>> list[2]
3
>>> list[1:3]
[2, 3]
Negative indices count from the end of the list: -1 refers to the last item, -2 to the one before it, and so on.
>>> list[-1]
4
To scan the elements of a list, you can use the for-in construct:
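Continuing with the same list:

>>> for item in list:
...     print(item)
...
1
2
3
4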