Lebesgue's Theory of Integration: Rahul Jain
Lebesgue’s Theory of Integration
The Untouched Classic
Rahul Jain
Indian Police Service
Indian Institute of Science (IISc)
Bengaluru, Karnataka, India
Tata Institute of Fundamental Research (TIFR)
Bengaluru, Karnataka, India
Translation from the French language edition: “Leçons sur l’intégration et la recherche des fonctions primitives,
professées au Collège de France par Henri Lebesgue” by Rahul Jain, © Gauthier-Villars 1904. Published by
Gauthier-Villars. All Rights Reserved.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte
Ltd. 2025
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole
or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage
and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or
hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does
not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective
laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give
a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that
may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
In this work, I have compiled the lessons I taught at the Collège de France, during the
academic year 1902–1903, as the person in charge of the course founded by the Peccot family.
The 20 lessons that comprise this course have been dedicated to the study of the devel-
opment of the concept of integral. A complete historical account of the development could
not be given in 20 lessons. Therefore, I had to leave out many important results. I ini-
tially focused on the integration of real functions of a single real variable. The reader
may investigate if the indicated results lend themselves easily to generalisations. More-
over, from among the number of definitions that have been successively proposed for the
integration of real functions of one real variable, I have retained only those, that are, in
my opinion, indispensable for understanding the various transformations that the problem
of integration has undergone, and enable the reader to grasp the relationship between the
seemingly simple notion of area, and certain very complicated aspects of the analytic
definitions of the integral.
Indeed, one may wonder whether there is any interest in dealing with such complications,
or whether it is better to focus on the study of functions that require only simple definitions. This
approach has its advantages, especially in an elementary course. But, as one advances,
if the focus is always on the considerations of “nice” functions, one would necessarily
forego solving a good number of problems with simple statements which have been posed
for a long time. It is for the purpose of solving these problems, and not for the love of
complications, that I have introduced in this book a more general definition of integral
that includes the Riemann integral as a particular case.
Those who read me carefully, while perhaps regretting that things are not very simple,
will grant me, I think, that this definition is necessary and natural. I dare to say that it is, in
a certain sense, simpler and easier to grasp than Riemann’s. Only, the previously acquired
mental makeup would make it appear complicated. It is simpler because it highlights
the most important properties of integrals, while Riemann’s definition highlights only the
method of calculation. That is why it is almost always as easy, sometimes even easier,
with the help of a general definition of integration, to prove a property for all functions
to which this definition applies—that is, for all summable functions—than to prove it for
only integrable functions, based on Riemann’s definition. Even if one is interested only in
the results related to the simple functions, it is still useful to know the notion of summable
functions because it suggests a rapid procedure of proof.
As an application of the definition of the integral, I studied the determination of prim-
itive functions and rectification of curves. To these two applications I would have liked to
add another very important one: the study of trigonometric expansion of functions; but,
in my course, I could give this subject only such an incomplete description that I deem it
unfit to reproduce it here.
Following the example given by M. Borel, I have written these lessons without assum-
ing that the reader has any knowledge beyond what is taught in undergraduate courses of
all faculties. I could even say that I do not assume anything more than a knowledge of
the definition and the most elementary properties of the integrals of continuous functions.
But while it is not essential to know a lot of things before reading these lessons, it is
necessary to have a certain bent of mind, and it is also useful to be interested in certain
questions in the theory of functions. A perfectly prepared reader would be one who has
already read the Introduction à l’étude des fonctions d’une variable réelle of M. Jules
Tannery (see [25]) and the Leçons sur la théorie des fonctions of M. Émile Borel (see [5]).
If one compares this book with the few pages that we dedicate to the integration and
to the search for primitive functions, one would undoubtedly find it a bit long. However, I
hope that all those who have written on the theory of functions and who are familiar with
the difficulties in being both rigorous and short, would not be too surprised at its length.
Maybe they would forgive me for having been, in their opinion, sometimes too verbose,
sometimes too concise.
As for the writing, I have taken recourse mainly to the original memoirs; however I
must mention, in addition to the previously cited works, the Fondamenti per la teorica
delle funzioni di variabili reali, (See [8]) of M. Ulisse Dini, and the Cours d’Analyse de
l’École Polytechnique ([12] and [13]) of M. Camille Jordan. Finally, I have to thank M.
Borel for the pieces of advice he gave me during the course of correcting the proofs.
Preface to the Second Edition
Two years ago, when the house of Gauthier–Villars informed me that the first edition of
these lessons was out of print, I was very perplexed. How could this book retain its character
as a review of the main concepts of integration and of the results acquired in the search
for primitive functions, without incorporating the numerous works published on these
subjects over 23 years?
I had to choose. I resolutely discarded anything that would not directly contribute to
“making it more understandable”. For example, if I discussed term–by–term integration
of a series, it was because the possibility of this operation originates directly from those
properties which characterise the integration and shed light on them. When one consid-
ers integration as the sum of an infinite number of indivisibles, one uses an extension
of the notion of sum which, in some respect, is comparable to the one that gives the
sum of a series. These two extensions are intimately linked. But I did not discuss the
method of integration by parts and by substitutions, the second mean value theorem, the
inequality of Schwarz, and its generalisations, which are nevertheless indispensable for
the mathematical use of integration.
Generalisation is one of the best ways of “making people understand” mathematics.
In a particular case, one gets stuck in observing facts which are specific
to that case, whereas in the general case there is nothing to observe other than the
facts on which one must reason. This approach leads to the same result as axiomatic
definitions, albeit in a logically less precise but much more lively and suggestive manner!
However, I have not treated the functions of several variables, because the reader of
this collection could refer to an excellent book by M. de la Vallée Poussin. Besides,
a generalisation of much broader scope presented itself to me: the integral of Stieltjes.
Indeed, it is almost a misconception to treat the integral of Stieltjes by restricting it
to the functions of a single variable. Yet I thought I could do so. I contented myself with
indicating the general physical meaning of the integral of Stieltjes.
When I shifted from the point of view of the quadrature and took up the perspective of
primitive functions, I no longer had a choice. I had to talk about the work of M. Denjoy,
which is fundamental and decisive enough, to have my undivided focus. While following
the ideas of M. Denjoy, I have deviated conspicuously from his presentation in form and
sometimes in substance. By doing so, I believe, that I have made this beautiful theory of
functions of one real variable more accessible and contributed to its better comprehension.
The totalisation of M. Denjoy essentially uses the transfinite induction. Therefore, I
had to use the transfinite induction more deliberately than I had done in the first edition.
Although this first edition has appeared, to some, audacious and purposely filled with
outrageous novelties, it was the work of a timid man who, out of the seven chapters he had
written, had devoted six to the exposition of the previous research before addressing the
work that was considered revolutionary. If he did so, it was not out of propagandist skill
seeking to recruit followers for revolution, but rather to reassure himself. He believed
in fact, and still believes, that to do useful work it is necessary to walk on one of the
paths opened by previous works. Otherwise, the risk is too great, of creating a science
unrelated to the rest of mathematics. Therefore, he endeavoured to extract the ideas that had,
consciously or unconsciously, guided mathematicians in the study of integration (their
ideals in this field, in the words of the late P. Boutroux) and to show that his personal
ideas were closely connected with those of his predecessors.
It is with the same timidity that previously I discussed the transfinite numbers. I was
able to proceed by allusions and affirmations because I was only using transformations
of simply infinite series into more complex series provided by the method of chains of
intervals. But, for M. Denjoy’s totalisation, I had to develop the “Note” that I had devoted
to transfinite numbers.
From this “Note” it follows, in particular, that I could have avoided the use of chains
of intervals and, consequently, no longer relied on the transfinite numbers in many places
of this book. I thought there would be drawbacks and some hypocrisy in doing so. Let
me explain by analogy. The infinitesimals were once obscure entities that appeared in
imprecise and inaccurate statements. Everything became clear, thanks to the notion of
limit. We can, therefore, do away with the notion of infinitesimals. But on the other hand,
there is no longer any obscurity in using them. And wouldn’t it be somewhat hypocritical
to discourage others from using the suggestive and convenient language of infinitesimals
while continuing to use it oneself for reasoning? The chains of intervals are used quite
naturally. The transfinite numbers are an excellent mathematical tool. It is advisable to
get accustomed to using them.
To better understand the totalisation of M. Denjoy, I generalised it in the same manner as
Stieltjes generalised ordinary integration. This led to problems whose solutions
are still awaited.
But what is the purpose of these studies? They would have been very useful even
if their only effect had been to fix our attention on integration and differentiation
sufficiently for us to recognise this: integration is always an operation similar to the one
required to calculate the amount of heat necessary to raise the temperature of a body by 1
degree, as a function of the masses of its parts and their specific heats; differentiation is
the inverse operation. These operations connect two quantities attached to these bodies
and a function attached to the points of these bodies.
“How, one could say, you did not know that?” Do not expect to get my confession so
easily: “I knew it, I knew it very well.” However, if I had known it in 1903 as perfectly
as I do now, I would not have omitted to discuss the complete Stieltjes integration in the
first edition of this book. And we must believe that this omission did not generally seem
very serious because none of those who did me the honour of reviewing my book pointed
it out.
I said integral of Stieltjes; shouldn’t I have said integral of Cauchy? Cauchy, in fact,
very clearly talked about the importance and the physical meaning of the new integration
taken in all its generality, whereas Stieltjes primarily defined the new operation logically,
but only in the case of one variable. I did not think it necessary to change the adopted
terminology. If I had done so, would it have been necessary to take the name of the first
inventor currently known or the name of the one who gave the integral its widest definition
currently known? In any case, the attribution would have been inaccurate and unjust; it is
better to stick to the established inaccuracies.
This provides me with an opportunity to apologise for the omissions and commissions.
I only wanted to give some starting points for bibliographical research. I did not try to
summarise the history of the intensive developments of the notion of integration during the
past 20 years. Nor do I claim to succeed in listing everything I have borrowed from various
recent works on the theory of functions of a real variable; they have all been consistently
helpful to me.
I thank M. Vasilesco who kindly helped me in the correction of the proofs.
Preface by the Translator
This book presents an English translation of Lebesgue’s renowned work in French, Leçons
sur l’intégration et la recherche des fonctions primitives, second edition, published in 1928
by Gauthier–Villars [19]. The original work in French was an outcome of a course
conducted by Mr. Henri Lebesgue at the Collège de France during the academic year
1902–1903. He was in charge of the course, which was sponsored by the Peccot family.
The ideas he presented across those 20 lessons are compiled in this book. The focus of
the course lay in the development of the concept of integration. However, rather than a
comprehensive historical summary, the twenty lessons offered only a selective account. The
results selected concern the integration of functions of one real variable, particularly
those results which generalise easily. This translation is supplemented with
a comprehensive historical summary, which may be essential for a reader of the 21st
century.
Around the year 1900, the idea of generalisation held the utmost
significance. Lebesgue explicitly stated a preference for following established paths rather
than forging entirely new mathematical avenues. However, what started as a humble
beginning ultimately culminated in the pioneering of a new branch of mathematics.
He commenced by revisiting the “numbers of analysis” such as the
definite integral, lengths of curves, areas of domains, etc., previously addressed in Camille
Jordan’s work.
While Lebesgue did not aim to conduct a comprehensive historical survey of the devel-
opment of functions, their integration, Fourier series convergence, etc., the translator
endeavours to provide a historical perspective within this book to ensure its self–con-
tained nature. This historical context and analysis, pertinent to the subsequent chapters,
are detailed in the first chapter of this book.
It is important to take note of the mathematical backdrop against which Lebesgue laid
the foundations of his seminal work. The developments commenced with the definition of
the term “function” and gradually progressed to the integration of functions. Numerous
mathematicians, including Euler, Descartes, Lagrange, Cauchy, D’Alembert, Bernoulli,
Fourier, Darboux, Cantor, Dirichlet, Hankel, Riemann, Jordan, Borel, Lebesgue, Baire,
Stieltjes, Denjoy, and others, contributed to these advancements. Initially, the term “func-
tion” was narrowly defined. It was perceived as a representation achievable through an
equation with various terms linked by algebraic signs. Furthermore, these terms comprised
distinct functions like radicals, exponents, trigonometric expressions, and so forth.
The primary focus was on determining which class of functions could be integrated and
understanding the integration process. During Newton’s era, integration was viewed as the
inverse of differentiation, while in Cauchy’s era, it evolved into a process involving a limit
of sums. It was first established that continuous functions were integrable. However, the
scope of integrable functions expanded over time, encompassing functions with a finite
number of maxima, minima, and a finite number of discontinuity points. Subsequently,
Riemann’s integration emerged, encompassing functions with a countably infinite number
of discontinuities distributed in a specific manner within a function. Although the set
of discontinuities was extensive (comprising a countably infinite number), it remained
“reducible” or of “measure zero”. This signified that the total length of the intervals
enclosing the discontinuity points could be made arbitrarily small.
These discussions culminated in the measure–theoretic estimation of the size of a
set, leading to the development of integration within the realm of measure theory. This
introduced the concepts of measurable and summable functions, signifying a departure
from geometric quadrature and the Cauchy–Darboux type of sums and limit processes.
Although the concept of limit of sums persisted in measure–theoretic integration, it
differed from the Cauchy–Darboux sums, which were termed “integral by excess and
defect”.
It was D’Alembert and then Bernoulli who, while working on the wave equation,
introduced the concept of representing a function through the summation of an infinite
trigonometric series. Bernoulli’s work was significantly supported by Fourier’s theory
presented in 1822 (see [9]).
These advancements led to the idea of representing a function via a trigonometric
series, further advancing the abstraction of the concept of a function. This development
brought forth various questions concerning the convergence of such trigonometric
series, now known as Fourier series. Different types of convergence emerged, including
pointwise convergence, uniform convergence, absolute convergence, almost everywhere
convergence, and the convergence of Fourier series of square–integrable functions (Lennart
Carleson, 1966), among others. Accompanying this was the question of the term–by–term
integrability of series, which revolved around whether one could interchange the integration
and summation signs and, if so, under what conditions. This query was profound because
properties such as continuity or integrability held by the terms of a series did not
automatically transfer to the sum function; special conditions were necessary. It was
demonstrated that even when the terms of a series were Riemann integrable, the sum function
need not be. To establish the integrability of the sum function, prior assumptions were needed
regarding the sum function (Osgood and Arzelà). However, Lebesgue’s definition of
integration overcame this challenge. The limit of a convergent, uniformly bounded sequence
of Lebesgue integrable (summable) functions is again a bounded Lebesgue integrable
(summable) function. This way, Lebesgue expanded the definition of integrable functions,
broadening the class beyond what Riemann had established.
It was observed that certain functions were differentiable and yielded bounded
derivative functions upon differentiation. However, peculiarly, these derivative func-
tions were not integrable according to Riemann’s definition. This problem was resolved
through Lebesgue’s integration definition, ensuring that every bounded derivative became
integrable. Consequently, this solution addressed the fundamental theorem of integral
calculus. Thus, Lebesgue restored the integration process as the reverse of differentiation.
The development of the integration theory is intricately linked to the theory of measure
and emerged out of sheer necessity. Camille Jordan undertook the task of defining a
set’s extent, while Borel further elaborated on the concept of measure, later expanded
by Lebesgue. As a precursor to measuring a set (an estimate of its size), set–theoretic
approximations were made regarding the sizes of sets, including nowhere dense sets and
everywhere dense sets within an interval. This estimation aimed to assess the size
of the set of a function’s singularities.
Lebesgue delved into the rectification of curves. An integral formula was devised
to calculate the arc length of curves characterised by continuously varying tangents, an
approach known as the analytic method. Another method, without presumptions about
the curve’s tangents, emerged to measure the arc length. This method approximated
curve lengths by inscribing polygons in the curve and passing to the limit as the polygons
approached the curve. This process yielded valuable deductions, such as the
approximation of continuous functions with polynomials and the concept of functions
with bounded variation. Functions of bounded variation proved instrumental in solving
the fundamental theorem of integral calculus and determining the primitive function of
a bounded derivative function. It emerged as a necessary and sufficient condition for the
existence of a primitive function derived from a bounded derivative function. Additionally,
the concept of functions with bounded variation had implications in the Stieltjes integral,
where the integrator function was considered a function of bounded variation.
All in all, this book is a valuable addition to the study of mathematics, as it presents
the original work of the creator of the theory of integration and is now made accessible to
English–speaking students worldwide. The theory of integration was in its infancy during
the time when this work was written, and the author has provided references to the original
work of several eminent mathematicians of that era. The relative simplicity of the theory
at that time makes it easier to understand its intricacies without being overwhelmed by a
multitude of tools. The process of building a theory from the ground up can be likened
to constructing an intricate building. This book provides insight into the floor plan, the
placement of different pillars, the location of stairs, the depth of the foundation, and
the overall layout of the structure. This gives readers a glimpse into the mind of the
author and how great minds worked during that era. The author’s work strikes a delicate
balance between straightforward ideas and intricate, sophisticated, abstract, esoteric, and
specialised concepts.
Moreover, this translation preserves the philosophical and intuitive style of Lebesgue’s
original writing, which reflects the cultural and intellectual milieu of early 20th-century
mathematics. While the verbose and detailed explanations may feel unfamiliar to modern
readers accustomed to concise and formal texts, the historical and pedagogical value of
this approach is undeniable. With time, readers will grow accustomed to this style and
come to appreciate its depth and clarity, offering a unique window into the evolution of
mathematical thought.
This book will be useful for students at all levels, including those taking their first
course in analysis, advanced students, book collectors, professors, and general book lovers
who appreciate the aesthetic value of great mathematical works. In conclusion, this book
is a valuable resource for anyone interested in the theory of integration, its historical
development, and the mind of its creator. The journey through the book promises to be
rewarding, and I hope it will inspire and develop interest in the subject among readers.
Contents
1 Historical Perspective
1.1 Introduction
1.2 Development of Concept of Function
1.2.1 Euler’s Concept of Function
1.2.2 Work of D’Alembert and Solution of Wave Equation
1.2.3 Advent of Daniel Bernoulli
1.2.4 Fourier’s Ideas in His Analytic Theory of Heat
1.2.5 Work of Dirichlet
1.3 Riemann’s Ideas of Integration
1.3.1 Hankel’s Work: Oscillation of f, Measure Theoretic and Topological Size of Sets
1.3.2 Darboux Upper and Lower Sums
1.3.3 Ulisse Dini and Others: Counterexample
1.3.4 Criticism of Riemann’s Definition of Integration
1.4 Work of Camille Jordan
1.4.1 Notion of Extent of a Set
1.4.2 Integrable and Bounded Functions
1.5 Work of Borel
1.5.1 Synthesis of Borel Sets: Two Examples
1.5.2 Definition of Measurable Set
1.6 Work of Lebesgue
1.6.1 Problem of Measure
1.6.2 Exterior and Interior Measure
1.6.3 Measurable Sets
1.6.4 Borel Measurable Sets: A Procedure
1.6.5 Relation Between Jordan and Lebesgue Measurable Sets
1.6.6 Relation Between Borel Measurable Set and Jordan Measurable Set
1 Historical Perspective
R. Jain, Lebesgue’s Theory of Integration, https://doi.org/10.1007/978-981-96-1169-0_1
1.1 Introduction
In the social and cultural landscape of mathematics in which Lebesgue’s¹ personality came
to prominence on the world stage, the following mathematical developments were present:
¹ Lebesgue: born in Beauvais, France, on 28 June 1875; died in Paris, France, on 26 July 1941.
1. Riemann’s theory marked the pinnacle of generality in the realm of integration. His
integrability criterion was the weakest under which the traditional definition, rooted in
Cauchy sums, maintained its validity; it was so weak that the concept of integration could
be extended to functions with a dense set of points of discontinuity, functions whose
existence most mathematicians had not even considered. Therefore, as long as Cauchy’s sums
remained the only approach to defining the integral, further generalisation of the integral
was not conceivable. In this context, measure-theoretic concepts emerged as significant
players, providing a novel foundation for defining Riemann’s integral.
2. Fourier’s Problem of 1822: Using a trigonometric series, Fourier sought to represent an
arbitrary function. He had an insight that the coefficients of the series could be determined
through term-by-term integration of the series against the $\cos kx$ and $\sin lx$ functions,
using the orthogonality of trigonometric functions. His problem can be summarised as
follows:
M. If a trigonometric series can represent a bounded function $f$, is that series the Fourier
series of $f$?
A related problem is:
N. When is the term-by-term integration of an infinite series of functions permissible?
In other words, when is it true that
$$\int_\alpha^\beta \sum_{n=1}^{\infty} f_n(x)\,dx = \sum_{n=1}^{\infty} \int_\alpha^\beta f_n(x)\,dx\,?$$
Fourier had assumed that the answer to N was Always and had used N to prove that the
answer to M was Yes. By the end of the 19th century, it was recognised that N was not
always true even for uniformly bounded series, precisely because
$f(x) = \sum_{n=1}^{\infty} f_n(x)$ need not be Riemann integrable; moreover, the positive
results that were obtained required extremely tedious efforts.
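The failure of N for uniformly bounded series can be made concrete with a standard example. It is not taken from the text above and is offered only as an illustrative sketch. It assumes an enumeration $r_1, r_2, \dots$ of the rationals in $[0, 1]$ without repetition; the symbols $r_n$, $\mathbf{1}$ (indicator function) and $\mu$ (Lebesgue measure) are notation introduced here.

```latex
% Assumption: (r_n) enumerates the rationals of [0,1], each exactly once.
% Each term is the indicator of a single point, hence Riemann integrable:
\[
  f_n(x) = \mathbf{1}_{\{r_n\}}(x) =
  \begin{cases} 1, & x = r_n,\\ 0, & x \neq r_n, \end{cases}
  \qquad
  \int_0^1 f_n(x)\,dx = 0 .
\]
% The partial sums only take the values 0 and 1, so the series is uniformly bounded:
\[
  0 \le \sum_{n=1}^{N} f_n(x) \le 1
  \qquad \text{for all } N \ge 1,\ x \in [0,1].
\]
% The sum is the indicator of the rationals (the Dirichlet function):
\[
  f(x) = \sum_{n=1}^{\infty} f_n(x) = \mathbf{1}_{\mathbb{Q}\cap[0,1]}(x).
\]
% Every subinterval contains rational and irrational points, so for every
% partition the upper Darboux sum is 1 and the lower Darboux sum is 0:
% f is not Riemann integrable.
% In Lebesgue's sense, \mu(\mathbb{Q}\cap[0,1]) = 0, and both sides of N agree:
\[
  \int_0^1 f\,d\mu = 0 = \sum_{n=1}^{\infty} \int_0^1 f_n\,d\mu .
\]
```

Each term is Riemann integrable, yet the sum is not, so the left-hand side of N has no meaning in Riemann's sense; in Lebesgue's sense both sides exist and equal 0.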
In 1829, Dirichlet was the first to give a correct proof that for certain functions, the
associated Fourier series converges to the function. He had to assume the function to be
continuous and add the more stringent condition that the function had only a finite number
of maxima and minima in the interval of periodicity.
Riemann took up this question further in 1854. He carefully analysed the results of Dirichlet.
He suggested that different types of convergence may be taken into consideration. These
developments paved the way for Lebesgue’s elegant proof that N is true for
any uniformly bounded series of Lebesgue integrable functions $f_n(x)$. By applying this
result to M, Lebesgue could affirm Fourier’s belief that the answer is Yes. Before the
introduction of Lebesgue integration, when dealing with series of Riemann integrable
functions, it was necessary to assume the uniform convergence of the series. In 1870,
Eduard Heine stated the following.
Up to the present time, it was believed that the integral of a convergent series, the terms of
which remain finite between finite limits of integration, would be the sum of the integrals of
the individual terms. M. Weierstrass observed first that the proof of this theorem demands
that the series in the limits of integration not only converges but also converges uniformly.
So much so, even Cauchy fell into the error of assuming that the sum of a convergent series
of continuous functions shared the common properties of its terms and, accordingly, was
continuous and integrable (if the term functions were integrable).
3. Fundamental Theorem of Integral Calculus: the assertion
$$\int_\alpha^\beta f'(x)\,dx = f(\beta) - f(\alpha)$$
is called the fundamental theorem of integral calculus. With Riemann’s definition of the
integral, it was known that there exist non-integrable derivative functions. Thus, as defined
by Riemann, integration does not always satisfy the fundamental theorem of integral
calculus. Ulisse Dini and Vito Volterra showed that there exist functions with bounded but
non-integrable derivatives. Volterra constructed an example of such a function in Giornale di
Battaglini, 1881:
1.1 Introduction 3
Example (Volterra)2: Let $E$ be a perfect non-dense set which is not an integrable group.3
Let $(a, b)$ be an interval contiguous to $E$, and let us consider the function
$$\phi(x, a) = (x - a)^2 \sin\frac{1}{x - a};$$
its derivative vanishes an infinite number of times between $a$ and $b$. Let $a + c$ be the
greatest value of $x$, $x \le \frac{a+b}{2}$, at which $\phi'$ vanishes. With this in mind, let us define a
function $F(x)$ by the following conditions: it is zero at the points of $E$. In every interval
$(a, b)$ contiguous to $E$, it is equal to $\phi(x, a)$ from $a$ to $a + c$; from $a + c$ to $b - c$, the
function $F$ is constant and equal to $\phi(a + c, a)$; from $b - c$ to $b$, $F$ is equal to $-\phi(x, b)$.
This function $F(x)$ is continuous, it has a derivative $F'(x)$, and the derivative $F'$ is
bounded. However, in the sense of Riemann, $F'$ is not integrable, because at every point
of $E$, the maximum of $F'$ is $+1$ and its minimum is $-1$, since this is the case at the point
$x = a$ for the function $\phi'(x, a)$; moreover, by hypothesis, $E$ is not an integrable group.
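The claim about the values $+1$ and $-1$ can be checked directly on the building block of the example. Writing $t = x - a$, the derivative of $(x-a)^2 \sin\frac{1}{x-a}$ is $2t\sin\frac{1}{t} - \cos\frac{1}{t}$ for $t \neq 0$ and $0$ at $t = 0$; at the points $t = 1/(k\pi)$ the sine term vanishes and the derivative equals $-(-1)^k$, so it takes the values $+1$ and $-1$ arbitrarily close to $a$. The sketch below is only an illustration; the helper `sample_near` is a hypothetical name, and $a = 0$ is used to avoid floating-point cancellation.

```python
import math

def phi(x, a):
    """phi(x, a) = (x - a)^2 sin(1/(x - a)) for x != a, extended by 0 at x = a."""
    t = x - a
    return 0.0 if t == 0.0 else t * t * math.sin(1.0 / t)

def phi_prime(x, a):
    """Derivative of phi in x: 2 t sin(1/t) - cos(1/t) with t = x - a, and 0 at t = 0."""
    t = x - a
    if t == 0.0:
        return 0.0
    return 2.0 * t * math.sin(1.0 / t) - math.cos(1.0 / t)

def sample_near(a, k):
    """Value of phi' at x = a + 1/(k*pi), where sin(1/t) = 0 and cos(1/t) = (-1)^k."""
    return phi_prime(a + 1.0 / (k * math.pi), a)
```

Evaluating `sample_near(0.0, k)` for large even and odd `k` gives values close to $-1$ and $+1$ respectively, although the sample points tend to $a$.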
them. He then established that for any given $\varepsilon$, the polygons $S$ and $S'$ are such that each
point in the annular space that separates them is at a distance less than $\varepsilon$ from $C$. In
this way, the length of the curve $C$ was approximated by the perimeter of the polygons, and
the limit was taken as described by the following procedure. Jordan considered the curve $C$
defined by parametric equations
Then he divided the interval $[t_0, T]$ by points $t_0 < t_1 < t_2 < \cdots < t_n < T$ which
corresponded to the points $(x_0, y_0), (x_1, y_1), \ldots, (x_n, y_n), (X, Y)$ on the curve $C$. The
perimeter of the polygon inscribed in the curve, of which the above points are vertices, is given
by
$$\sum \sqrt{(x_{k+1} - x_k)^2 + (y_{k+1} - y_k)^2}.$$
Jordan asserted that if this sum tended towards a definite and constant limit when the
interval $T - t_0$ decreased indefinitely, then this limit would represent the length of the
curve’s arc corresponding to this interval.
Similarly, if the above curve has continuously varying tangents, Jordan gives a method
of finding the length of the curve using the differential arc length of the curve, which is
$$ds = \sqrt{x'^2 + y'^2}\,dt;$$
and the length of the curve in the interval $[t_0, T]$ will be found by integrating the
differential arc length between the two limits
$$s = \int_{t_0}^{T} \sqrt{x'^2 + y'^2}\,dt.$$
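The polygonal approximation described above can be tried numerically. The sketch below computes the perimeter of the inscribed polygon for a parametrised curve; the equal spacing of the division points and the name `polygon_length` are assumptions made for illustration only, since Jordan’s definition allows arbitrary subdivisions.

```python
import math

def polygon_length(x, y, t0, T, n):
    """Perimeter of the polygon inscribed in t -> (x(t), y(t)), using n equal steps on [t0, T]."""
    ts = [t0 + (T - t0) * k / n for k in range(n + 1)]
    pts = [(x(t), y(t)) for t in ts]
    return sum(math.hypot(q[0] - p[0], q[1] - p[1]) for p, q in zip(pts, pts[1:]))
```

For the unit circle, `polygon_length(math.cos, math.sin, 0.0, 2 * math.pi, n)` stays below $2\pi$ and approaches it as `n` grows, in agreement with the integral $\int_0^{2\pi} \sqrt{\sin^2 t + \cos^2 t}\,dt = 2\pi$.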
5. Area of domains. Jordan considered plane domains bounded by one or several curves
with continuous derivatives except, possibly, at a finite number of points. He defined the
curves parametrically as above.
extended to the interior of this domain. He also gave a procedure to reduce this double
integral to a single integral using the formula $\int_{x_0}^{X} y\,dx$, which yields the integral in
parametric form as
$$\int_{t_0}^{T} \phi(t)\, f'(t)\,dt.$$
Jordan discussed in his Cours D’Analyse [2], p. 132 and the following pages, the calculation
of the area of more general domains which were bounded by several rectifiable lines,
each of which was decomposable into several partial arcs, such that in each of them $x$
varies in the same direction
$$\int_{x_i}^{x_{i+1}} dx\,(-y_1 + y_2 - y_3 + \cdots),$$
where $(x_i, x_{i+1})$ is the interval where these partial arcs are taken. He further supposed
that if along each of the arcs $A_1, A_2, \ldots, A_n$ there existed tangents whose direction
varied continuously with the arc $s$ then the integral depicting the area of the domain
would take the form
$$\int y \cos\theta\,ds,$$
1.2 Development of Concept of Function 5
where $\theta$ is the angle between the $y$-axes and tangent to the arc. The above analysis was
done for the curves without multiple points. After the advent of the measure theory of
Borel (measurable Borel sets) and the subsequent work of Lebesgue, the area of the plane
domains was defined as measure of the set of points constituting the plane domains.
Although the concept of a function did not originate with Euler, his profound insights
elevated its conceptual groundwork remarkably, transforming calculus into a
formal theory of functions. In his seminal work, Introductio in analysin infinitorum,
published in 1748, Euler defined a function of a variable quantity as an analytical expression
constituted by the variable itself and constants. This pioneering definition hinged upon the
notion of an analytical expression, a concept Euler deemed fundamental to all
known functions.
However, Euler’s significant contribution went beyond a mere definition; he began a shift
in perspective that eventually laid the foundation for the modern understanding of functions.
Notably, his ground-breaking work in 1734 on partial differential equations marked the
inception of arbitrary functions within integral solutions. This innovation didn’t come with-
out debate. When challenged by Jean D’Alembert, who argued that these arbitrary functions
required representation by a single algebraic or transcendental equation for proper mathe-
matical analysis, Euler countered. He asserted that the curves depicted by these arbitrary
functions need not adhere to any predetermined law. Instead, they could possess irregular-
ities and discontinuities, potentially composed of various curve fragments or traced freely
across the plane.
Euler’s terminology and conception of discontinuity were distinctive. He perceived a dis-
ruption in the analytical form of the functional relationship, allowing for functions that might
be continuous in the modern sense yet discontinuous within Euler’s framework. However,
the notion that arbitrary functions could exhibit modern-style discontinuity at multiple points
within a finite interval did not garner serious consideration during Euler’s time. Instead, the
focus lay primarily on the absence of unique determination by a single equation for arbitrary
functions, rather than their properties as mappings $x \mapsto f(x)$ between real numbers.
Euler’s visionary perspective on functions laid the groundwork for a profound shift in
mathematical thinking, setting the stage for the evolution of function theory as we understand
it today.
This debate on the structure and definition of function further manifested itself in the
solution of the wave equation, which D’Alembert proposed.
6 1 Historical Perspective
where the functions $f$ and $g$ possess sufficient smoothness, as interpreted in the modern
terminology. However, it became evident that Euler did not consider this solution the most
general solution of the wave equation. Euler argued that the string’s position at any time
(i.e., the progression of waves within the string) relied on the initial position (at $t = 0$) and
the initial velocity (at $t = 0$). This initial shape of the string could take any form.
Euler’s insight into the wave motion within the string highlighted what we now refer to
as “initial data” or conditions concerning the position and velocity of the string. The initial
shape of the string might manifest as a continuous curve, multiple continuous curves linked
end-to-end, or even exhibit discontinuities at the ends. According to Euler, this array of
possibilities couldn’t be encompassed within just two functions $f$ and $g$. He asserted that
these functions alone could not accommodate the discontinuity in the ‘analytic expression’.
Hence to Euler, the solution of D’Alembert was a particular one. But D’Alembert contended
that the functions should be continuous, meaning that a single equation for the function
should suffice.
This raised a crucial question: could $f$ and $g$ from D’Alembert’s solution be represented in
trigonometric form, $\sum_{1}^{\infty} a_n \sin\frac{n\pi x}{l}$? In other words, does an arbitrary function allow such a
trigonometric expansion? Is this true for any function?
Surprisingly, it was observed that even non-periodic functions $f$ and $g$ could be expanded
into periodic trigonometric functions.
In this context, Euler found Bernoulli’s solution more particular and of limited utility
than D’Alembert’s. Thus, D’Alembert and Lagrange joined Euler in contesting Bernoulli’s
solution. However, this controversy persisted without a definitive resolution until 1807. It
was not until a quarter-century after Bernoulli’s passing that Fourier introduced some of
these concepts in his early communications to the French Academy, later expounded upon
in his “Analytic Theory of Heat” in 1822.
Fourier did not seek a rigorous proof but offered concrete examples that validated the potency
of his assertion. The precise constraints needed to make this claim entirely accurate remained,
and to some extent, still remain for subsequent mathematicians to determine.
Fourier’s findings not only upheld Daniel Bernoulli’s claim regarding his series but also
surpassed it by revealing the series’ true potential. It swiftly dismantled Euler’s and his
contemporaries’ belief that a mathematical function could be continuously extended in only
one way beyond its defined interval. Furthermore, Fourier’s examples indicated that the
curve represented by his series could comprise disconnected segments—pieces devoid of
logical or definitional linkage, not even connected sequentially at their ends.
Fourier’s assertion made it evident that the power of representation through analytic
expression was at least as formidable as the capacity for geometric visualisation.
Once it was understood that mathematical expression could adapt to vastly different
and unrelated requirements, it became apparent that there was no logical endpoint until the
definition we now accept for a function of a real variable, often referred to as the Dirichlet
definition of a function. In this definition, if every value of $x$ in an interval corresponds to a
definite value of $y$, regardless of how fixed or determined, then $y$ is termed a function of $x$.
For example, $y = 1$ at all rational points in any interval, while being $0$ at irrational points.
When it was established that an arbitrary function can be represented by a Fourier series
of the form (1.5), the question arose of how to find the coefficients $a_n$ and $b_n$ of the series for
a given function. Fourier evaluated these constants using Euler’s method (Fourier himself
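admitted this!) and the orthogonality of trigonometric functions. He used the following
integrals.
$$\begin{cases} a_n = \dfrac{1}{\pi} \displaystyle\int_{-\pi}^{+\pi} f(x) \cos nx \,dx, \\[2ex] b_n = \dfrac{1}{\pi} \displaystyle\int_{-\pi}^{+\pi} f(x) \sin nx \,dx. \end{cases} \tag{1.6}$$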
Once a method of determining the constants $a_n$ and $b_n$ was found, the next question that arose
was the convergence of the Fourier series (1.5). Dirichlet later completed this task: in his
remarkable Memoir [3], known for its clarity and rigour, he supplied the requisite sufficient
conditions for the convergence of the Fourier series to the given function, and hence
established the truth of the theorem that hitherto remained to be ascertained.
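The integrals (1.6) are easy to approximate numerically. The following is a minimal sketch, assuming a midpoint rule on $[-\pi, +\pi]$; the function name and the number of steps are illustrative choices, not part of the source.

```python
import math

def fourier_coefficients(f, n, steps=20_000):
    """Approximate a_n and b_n of (1.6) with the midpoint rule on [-pi, pi]."""
    h = 2.0 * math.pi / steps
    a_n = b_n = 0.0
    for k in range(steps):
        x = -math.pi + (k + 0.5) * h
        a_n += f(x) * math.cos(n * x)
        b_n += f(x) * math.sin(n * x)
    return a_n * h / math.pi, b_n * h / math.pi
```

For $f(x) = x$, integration by parts gives $a_n = 0$ and $b_n = 2(-1)^{n+1}/n$, so `fourier_coefficients(lambda x: x, 1)` should return values close to $(0, 2)$.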
Fourier analysis elucidated the following:
1. Earlier, it was thought that with a given mathematical expression, a curve defined on
an interval can be extended in a unique manner beyond the interval of definition. If
it were done in another manner, it would disobey the mathematical expression which
defined it in the interval of definition. But now, with the advent of Fourier analysis, the
given curve could be extended beyond the interval of definition in any manner. There
would be no violation of the given mathematical expression. Indeed, now there existed
no definite mathematical expression, since all such arbitrary mathematical expressions
were admissible which had Fourier expansion, and their coefficients could be found by
the integration method.
2. The arbitrary curve representing the initial shape of the string could consist of sepa-
rate pieces of any kind, which were independent of each other. They may not even be
connected to each other at the end points. Such was the generality provided by Fourier
analysis of the initial shape of the string. Thus, the arbitrariness of the function was
justified. In other words, Fourier analysis widened the class of functions which could
be admitted as the initial shape of the string and the solution of the wave equation. This
accomplished the dual task of defining a function and widening the class of functions
admissible as the solution of the wave equation.
3. This analysis freed the functions from the fetters of analytic expressions using variables
and constants and connecting symbols such as addition, subtraction, division, roots,
exponents, etc. In this whole analysis one cannot fail to notice that the widening of the
admissible class of functions is done through the integration available at that time. Indeed,
such analysis pointed out that all those functions that are integrable over the interval of
definition of the shape of the string, against the trigonometric functions such as $\sin$ and $\cos$,
and yield a finite result, are admissible.
Fourier’s announcements opened up the field of research in this direction. Many
mathematicians of the time became engaged in research on the convergence of the Fourier series.
Particularly notable is the work of Dirichlet in his 1829 paper. To begin with, Dirichlet, in
his analysis, admitted functions which were bounded, piecewise monotone, and piecewise
continuous. He further loosened the conditions and allowed the function to have infinitely many
discontinuities even in a finite interval, provided that for any $a, b$ in $[-\pi, +\pi]$ there exist some
$r, s$ between them such that the function is continuous in the interval $(r, s)$.
4
It is necessary that then the function $\phi(x)$ be such that, if one denotes by $a$ and $b$ any two
quantities between $-\pi$ and $+\pi$, it is always possible to place between $a$ and $b$ other quantities
$r$ and $s$ close enough together so that the function remains continuous in the interval from $r$ to
$s$.
Then, for such functions, the Fourier series would converge. It was here that he constructed
his famous example of a very discontinuous function that could not be integrated in the Riemann
sense. It was constructed out of the sheer necessity of being able to integrate the function in
the series.
$$f(x) = \begin{cases} c & \text{if } x \in \mathbb{Q}, \\ d & \text{if } x \in \mathbb{R} \setminus \mathbb{Q}. \end{cases} \tag{1.7}$$
Dirichlet states that such functions cannot be substituted into the series because the terms
which are definite integrals would lose all significance.5
Easily understood is the necessity of this restriction when considering that the different terms
in the series are definite integrals, reverting to the fundamental notion of integrals. Then, it
becomes clear that the integral of a function only holds meaning if the function satisfies the
previously stated condition. An example of a function not meeting this condition would be if
we supposed $\phi$ to be equal to a determined constant $c$ when the variable $x$ assumes a rational
value, and equal to another constant $d$ when the variable is irrational. The function defined in
this manner has finite and determined values for any $x$; however, it cannot be substituted into
the series because the different integrals involved in this series would lose all meaning in this
scenario. The restriction I’ve specified, along with the requirement not to become infinite, are
the only ones that apply to the function $\phi$. Any cases not excluded by these restrictions can be
reduced to those we have considered earlier.
Therefore, Dirichlet continued his work in this direction and tried to find out the extent to
which discontinuities may be allowed in the function so that its integrability would not be
compromised.
4 See [3].
5 See [3].
$$S = \sum_{i=1}^{n} \delta_i\, f(x_{i-1} + \delta_i \varepsilon_i)$$
where $\delta_i = x_i - x_{i-1}$ $(i = 1, \ldots, n)$, and he called the $\varepsilon_i$ “positive true rationals”. These
sums depended on $\varepsilon_i$ and $\delta_i$. Riemann defined $\int_a^b f(x)\,dx$ as the fixed limit $A$ towards which
these sums converge when the $\delta_i$ converge towards zero, however the $\delta_i$ and $\varepsilon_i$ are chosen.
Here, Riemann diverged from Cauchy in that, while Cauchy considered only continuous
functions, Riemann introduced integrable functions. He stated that if the above sums failed to converge
towards a unique limit, the expression $\int_a^b f(x)\,dx$ was meaningless. For this reason, Riemann
limited himself to bounded functions, sparking the inquiry into which functions are
integrable, i.e., for which bounded functions do the sums uniquely converge.
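Riemann’s sum can be written out directly. The sketch below evaluates $\sum_i \delta_i f(x_{i-1} + \delta_i \varepsilon_i)$ for a given partition and a given choice of the $\varepsilon_i$; the function name and the restriction of each $\varepsilon_i$ to $[0, 1]$ are assumptions made for illustration.

```python
def riemann_sum(f, points, eps):
    """Sum of delta_i * f(x_{i-1} + delta_i * eps_i) for a partition `points`
    and one tag parameter eps_i in [0, 1] per sub-interval."""
    if len(eps) != len(points) - 1:
        raise ValueError("need one eps_i per sub-interval")
    total = 0.0
    for x_prev, x_next, e in zip(points, points[1:], eps):
        delta = x_next - x_prev
        total += delta * f(x_prev + delta * e)
    return total
```

For $f(x) = x^2$ on $[0, 1]$ with $n$ equal sub-intervals, taking all $\varepsilon_i = 0$ and all $\varepsilon_i = 1$ gives two sums that bracket $1/3$ and differ by about $1/n$, which illustrates why the limit does not depend on the choice of the $\varepsilon_i$ for such a function.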
Riemann answered this question in terms that may be broadly summarised as follows:
Those functions are integrable for which the points at which the function’s oscillation is
greater than a given $\varepsilon > 0$ can be enclosed in intervals whose total length can be made as
small as desired. This statement provided a sufficient condition of integrability, particularly
for functions with poles.
In his theory, the integrable functions included functions whose points of discontinuity
were dense in every interval, irrespective of its size, though current understanding
suggests these functions are not entirely discontinuous. Riemann offered an example
of an integrable function defined by a convergent series, having discontinuities for specific
rational $x$ values.
$$1 + \frac{(x)}{1^2} + \frac{(2x)}{2^2} + \frac{(3x)}{3^2} + \cdots$$
where $(nx)$ denotes the positive or negative difference between $nx$ and the nearest integer,
unless $nx$ falls halfway between two consecutive integers. However, when $nx$ falls midway
1.3 Riemann’s Ideas of Integration 11
between two consecutive integers, $(nx)$ is set to $0$. The series sum exhibits discontinuity for
every rational $x$ of the form $\frac{p}{2n}$ where $p$ is an odd integer coprime to $n$.
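The jumps of this series can be observed numerically. At $x = \frac{1}{2}$ every term with odd $n$ jumps by $1/n^2$, so the total jump is $\sum_{n \text{ odd}} 1/n^2 = \pi^2/8$; this value is a standard fact and is not stated in the source. The sketch below uses a finite partial sum and a small offset `h`, so it gives only an estimate, and all function names are illustrative.

```python
import math

def signed_distance(y):
    """(y): y minus the nearest integer, set to 0 when y lies halfway between two integers."""
    d = y - math.floor(y + 0.5)
    return 0.0 if d == -0.5 else d

def riemann_partial_sum(x, terms=2000):
    """Partial sum 1 + sum_{n=1}^{terms} (n x) / n^2 of the series."""
    return 1.0 + sum(signed_distance(n * x) / (n * n) for n in range(1, terms + 1))

def jump_estimate(x, h=1e-7, terms=2000):
    """Value just left of x minus value just right of x (finite-h, finite-sum estimate)."""
    return riemann_partial_sum(x - h, terms) - riemann_partial_sum(x + h, terms)
```

`jump_estimate(0.5)` is close to $\pi^2/8$, while `jump_estimate(1/3)` is close to $0$, since $\frac{1}{3}$ is not of the form $\frac{p}{2n}$ with $p$ odd.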
Examples like this, and other integrable functions exhibiting an infinite number of max-
ima and minima, non-representable by a Fourier series, were thought-provoking. These
inquiries deeply impacted Hermann Hankel, leading to his significant memoir “Über die
unendlich oft oszillirenden und unstetigen Functionen” (On Infinitely Oscillating and
Discontinuous Functions). Hankel introduced the principle of “Condensation of Singularities,”
a contribution crucial to the independent theory of functions of a real variable. This distinc-
tion equally credits Riemann, historically one of the initiators of this theory, through his
work on integrating discontinuous functions in papers related to “the representability of a
function by Fourier series.”
Riemann’s example remains notable for mathematically formulating a discontinuous
function that cannot be represented graphically.
Hankel also introduced the concept of a jump set of a function, defined as:
$$S_\sigma(f) := \{x : J_f(x) > \sigma\},$$
where $J_f(x)$ represents the jump of $f$ at $x$, which is greater than some $\sigma > 0$.
Relationship Between Measure-Theoretic and Topological Sizes
Hankel aimed to establish a direct relationship between the measure-theoretic size and the
topological size of a set. He posited that “topologically small sets”, which are nowhere dense,
should correspond to sets with “measure zero”. Conversely, sets with measure zero should
be considered topologically small. Hankel provided the following definitions in 1870:
1. Dense Set: An ensemble (“schaar”) “fills up an interval”, “if in the distance there is no
interval however small not containing at least one point of the ensemble”.
2. Nowhere Dense Set: A set lies “not filled up” but “scattered” on the interval, “if between
any two points, however close, of the distance there is always an interval containing no
point of that ensemble”.
Hankel used the above definitions to characterise the sets $S_\sigma(f)$, $\sigma > 0$. He arrived at the
following statements, one of which was later proven incorrect:
1. If $S_\sigma(f)$ is nowhere dense in an interval $(a, b)$, then the total length of the partial intervals
$I$ with $\omega_f(I) > 2\sigma$ can be assumed to be arbitrarily small.
2. Conversely, if the total length of the intervals $I$ with $\omega_f(I) > \sigma$ can be assumed to be
arbitrarily small, then $S_\sigma(f)$ is nowhere dense in $(a, b)$.
The first statement is, in fact, false. If it were true, then by Riemann’s criterion, any function $f$
whose jump sets $S_\sigma(f)$ are nowhere dense would be integrable.
However, in 1875 H.J.S. Smith, an English mathematician, found functions with nowhere
dense sets of discontinuities that were not integrable, thereby providing a counterexample.
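Why a nowhere dense set need not be small in length can be seen from the set now usually called the Smith-Volterra-Cantor set; the construction sketched here is the standard modern one and may differ in detail from Smith’s 1875 construction. Starting from $[0, 1]$, stage $n$ removes $2^{n-1}$ open middle intervals, each of length $4^{-n}$. The removed lengths add up to $\frac{1}{2}$, so the closed, nowhere dense remainder cannot be enclosed in intervals of total length less than $\frac{1}{2}$. The function name below is illustrative.

```python
from fractions import Fraction

def removed_length(steps):
    """Total length removed from [0, 1] after `steps` stages: stage n removes
    2^(n-1) open middle intervals, each of length 4^(-n)."""
    return sum(Fraction(2 ** (n - 1), 4 ** n) for n in range(1, steps + 1))
```

After $N$ stages the removed length is exactly $\frac{1}{2} - 2^{-(N+1)}$, which the exact rational arithmetic of `Fraction` confirms.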
Darboux, in his paper,6 commented on the work of Hankel. He stated:
· · ·
One of them, Mr. Hankel, published in 1870 the results of his studies on Riemann’s Memoir.
Unfortunately, the conclusions of his work are not beyond reproach, and a distinguished
geometer, Mr. Gilbert, raised objections to Hankel’s demonstrations that could also be directed
towards the results. Hankel’s principle of the condensation of singularities is stated in too
absolute a manner, and, regrettably, death did not allow this excellent geometer the time to
reconsider the propositions he had given and to limit them, making them immune to any
objection. I have taken it upon myself to revisit some of these propositions, and I believe I have
rendered them immune to all criticism in the form I have presented them . . .
This discrepancy led to the development of the theory of content because the “topological
smallness” of sets was insufficient to determine the integrability, or lack thereof, of functions.
6 See [4].
7 See [4].
Darboux introduced the notions of the upper bound $M$, the lower bound $m$, and the
oscillation $\Delta$ of a function $f$ over an interval $I$. Here, the oscillation $\Delta$ is defined as
$\Delta := M - m$. He partitioned the interval $[a, b]$ into sub-intervals determined by points:
$$\begin{aligned} M &:= M_1 \delta_1 + \cdots + M_n \delta_n \\ m &:= m_1 \delta_1 + \cdots + m_n \delta_n \\ \Delta &:= \Delta_1 \delta_1 + \cdots + \Delta_n \delta_n \end{aligned}$$
His significant achievement was in demonstrating that $M$, $m$, and $\Delta$ converge toward definite
limits as $n$ approaches infinity. These limits depend solely on $a$, $b$, and $f$. These limits are
expressed as:
$$\underline{\int_a^b} f(x)\,dx := \lim_{n\to\infty} \sum_{i=1}^{n} m_i \delta_i, \qquad \overline{\int_a^b} f(x)\,dx := \lim_{n\to\infty} \sum_{i=1}^{n} M_i \delta_i.$$
Darboux required that, for a function $f$ to be integrable, the oscillation sum tends towards zero:
$$\lim_{n\to\infty} \Delta = 0,$$
implying:
$$\overline{\int_a^b} f(x)\,dx - \underline{\int_a^b} f(x)\,dx = 0, \qquad \overline{\int_a^b} f(x)\,dx = \underline{\int_a^b} f(x)\,dx.$$
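The upper and lower sums are easy to compute when the bounds on each sub-interval are known. The sketch below treats only the special case of a non-decreasing function, for which the lower bound on each sub-interval is attained at its left end and the upper bound at its right end; this restriction, the equal sub-intervals, and the function name are assumptions made for illustration.

```python
def darboux_sums_increasing(f, a, b, n):
    """Lower and upper Darboux sums of a non-decreasing f on [a, b] with n equal
    sub-intervals; for such f, m_i = f(x_{i-1}) and M_i = f(x_i)."""
    delta = (b - a) / n
    xs = [a + i * delta for i in range(n + 1)]
    lower = sum(f(x) for x in xs[:-1]) * delta
    upper = sum(f(x) for x in xs[1:]) * delta
    return lower, upper
```

For $f(x) = x^2$ on $[0, 1]$, the two sums bracket $\frac{1}{3}$ and their difference equals $(f(1) - f(0))/n = 1/n$, which tends to zero as the partition is refined.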
According to this definition, every continuous function is integrable. If a given function $F(x)$
on an interval $[a, b]$ is differentiable, with a bounded and integrable derivative function $f$
(i.e., $F'(x) = f(x)$ on $[a, b]$), then $f$ satisfies the fundamental theorem of calculus:
$$F(x) - F(a) = \int_a^x f(y)\,dy \quad \forall x \in [a, b].$$
Thus, Darboux relaxed the conditions on $f$. He assumed $f$ to be integrable, while Cauchy
had to impose more stringent conditions on $f$, such as $f$ being continuous, for the
Fundamental Theorem of Integral Calculus to hold.
Consequences of Darboux’s Definition
Various consequences can be deduced from the preceding definition of the integral.8
1. The integral of the function $f(x)$ will continue to exist and will not change in value if
the values of $f(x)$ are altered for a finite number of $x$-values.
2. The integral
$$\int_a^x f(x)\,dx$$
is always a continuous function of $x$.
3. Let
$$F(x) = \int_a^x f(y)\,dy,$$
and suppose that $f(x)$ is continuous at $x = x_0$. Then, the function $f(x)$ will be the
derivative of $F(x)$ for all values of $x$ for which $f(x)$ is continuous.
4. Fundamental Theorem of Calculus, as shown above.
5. Let $\phi(u)$ be a function of $u$ such that
$$\frac{\phi(u) - \phi(v)}{u - v}$$
lies between the two finite limits $\alpha$ and $\beta$ when $u$ and $v$ take all the values between $A$
and $B$.
If the values of the function $f(x)$ always lie between $A$ and $B$, the oscillations
of the function $\phi(f(x))$ will clearly be of the same order as those of $f(x)$ in every interval. Consequently,
the two functions $f(x)$ and $\phi[f(x)]$ belong to the same class. For example, if a function
$f(x)$ is integrable, then the functions
$$[f(x)]^2, \quad [f(x)]^n, \quad \sqrt{1 + f^2(x)}, \ \ldots$$
are also integrable.
highly discontinuous functions were constructed to demonstrate that even these functions
are integrable according to Darboux’s definition of integration.
He started with the discontinuous functions $f$ such that $f(x + h)$ and $f(x - h)$ have
limits, when $h$ approaches zero through positive values, that are different from $f(x)$. For
instance, the function $E(x)$, which represents the greatest integer less than or equal to $x$,
has the property that for any integer $n$, we have
$$E(n - 0) = n - 1, \qquad E(n + 0) = E(n) = n.$$
Similarly, $(x)$ is the function that represents the difference between $x$ and the nearest
integer. This function is indeterminate for $x = \text{integer} + \frac{1}{2}$; thus, he defined
$$(x) = 0 \quad \text{for } x = \text{integer} + \frac{1}{2}.$$
are functions $\phi_n(x)$ for which $\phi_n(x + 0)$ or $\phi_n(x - 0)$ exist, then the series possesses the
same property, and we have
$$f(x + 0) = f(x) = \phi_1(x + 0) + \cdots + \phi_n(x + 0) + \cdots$$
$$f(x - 0) = f(x) = \phi_1(x - 0) + \cdots + \phi_n(x - 0) + \cdots$$
It shows that a uniformly convergent series in a given interval can, in some sense, be
treated like sums composed of a finite number of terms.
The preceding theorem allows us to define discontinuous functions within any interval.
Consider the series
$$f(x) = \sum_{n=1}^{\infty} a_n \frac{E(nx)}{nx}.$$
If the series of constants
$$A = a_1 + a_2 + \cdots$$
is absolutely convergent, the series $f(x)$ will have its terms bounded by those of the series
$A$, and its remainder, for any $x$, will be less than that of $A$. Hence, it will be uniformly
convergent.
Ulisse Dini observed that, with Riemann’s definition, there may exist
functions whose derivatives are not integrable. Specifically, it was proposed that if a
non-constant function $F$ defined on an interval $[a, b]$ has a derivative $f$ (where $F' = f$) that
vanishes on a dense set in $[a, b]$, then $f$ cannot be integrable. According to Riemann’s
definition, if $f$ were integrable, it would satisfy the Fundamental Theorem of Calculus:
$$F(x) - F(a) = \int_a^x f(y)\,dy.$$
However, since $f$ vanishes on a dense set in $[a, b]$, the points $\xi_i$ in Riemann’s sums can be
chosen in that set, so the sums would yield:
$$\int_a^x f(x)\,dx = \lim_{n\to\infty} \sum_{i=1}^{n} f(\xi_i)\delta_i = 0.$$
Hence $F$ would be constant, contrary to the hypothesis.
Mr. Jordan, in the second edition of his Cours D’Analyse [2], conducted an in-depth study
of these quantities. However, it seemed useful for me to revisit this study, and here is why. It
is known that there exist non-integrable derivative functions when one adopts, as M. Jordan
does, the definition of integral given by Riemann. Thus, integration, as defined by Riemann,
does not always solve the fundamental problem of integral calculus:
Find a function given its derivative.
This led M. Lebesgue to propose a definition of integration that would allow the integration
of any bounded derivative. The integration of a bounded derivative would yield its primitive
function, considered as a function of the upper limit of integration.
2. Term-by-term integrability of series: Another issue arising from Riemann’s definition
of integration was the inability to interchange the limit process and integration. Many
mathematicians of the time overlooked this problem, considering the interchangeability of
integration and the limit process as inherent. Thus, the term-by-term integration of a series,
$$\sum_{n=1}^{\infty} \int_a^b u_n(x)\,dx = \int_a^b \sum_{n=1}^{\infty} u_n(x)\,dx$$
$$\int_A f(x, y)\,da = \sum f(x_i, y_i)\,a(R_{ij}) \quad \text{where } (x_i, y_i) \in R_{ij},$$
Jordan introduced the concept of a Jordan ‘measurable’ set in 1892. He took motivation from
Cantor’s work in set theory and provided a more general theory than Riemann’s integration.
Jordan developed the idea of inner and outer extent as a generalisation of the concept of
upper and lower integrals of Riemann.
As discussed above, the theory of Riemann integration was not wholly satisfactory in
higher dimensions. The main problem arose in the fact that the sum of the areas of rectangles,
in a partition, that met the boundary of an arbitrary curve (or set) could not be made as small
as desired. Even in the one-dimensional case, it was shown that there were nowhere dense
sets that could not be enclosed in intervals of arbitrarily small lengths (see Example 1 of
Sect. 1.5.1). It was further shown with the help of Peano’s space-filling curves.
Camille Jordan explored the relationship between the ‘measure’ of sets and the inte-
grability of functions over those sets. He focused on domains where functions of multiple
variables could be integrated. For this analysis, Jordan partitioned the domain $A$ and
considered two types of rectangles: those that did not contain any points from the complement
$A^C$ of the domain and those that contained at least one point of the domain.
Using this approach, Jordan defined the inner and outer content of the domain. He pro-
posed that a domain is measurable if the inner and outer contents converge to the same value
as the size of the rectangles approaches zero through finer and finer partitions. Having estab-
lished the concept of domain ‘measure’, Jordan utilised Riemann-Darboux sums to define
the integral of a function, similar to the one-dimensional case. The only difference is that,
in the multi-dimensional case, the measure of the domain replaced the length of intervals
used in the one-dimensional case. With this background, it is appropriate to give the:-
1.4 Work of Camille Jordan 19
Jordan considered the sets of dimensions $1$, $2$, and $3$ and called their extents the length,
area and volume.
For definiteness, he considered the sets of two dimensions, $E$, and took its points with
coordinates $(u, v)$. He represented the points geometrically on a plane. Then, he decomposed
the plane into a grid of squares of side length $r$.
He considered two types of domains
1. $S$, which is formed of the squares that were completely inside the set $E$, and
2. $S + S'$, which is formed of squares completely inside the set $E$ or meeting the boundary
of $E$.
He denoted the areas of the two types of domains by $S$ and $S + S'$ as well. Then,
he varied the grid of squares by varying the edge length of the squares, letting $r \to 0$,
and showed that the respective areas $S$ and $S + S'$ tended towards fixed limits.
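The grid construction can be carried out numerically for a simple set. The sketch below takes $E$ to be the closed unit disk, which is an illustrative choice and not an example from the source, and computes the area of the squares lying entirely inside $E$ and the area of the squares that lie inside $E$ or meet it, for a grid of side $r = 1/n$. The function name is hypothetical.

```python
def disk_inner_outer_area(n):
    """Area of the grid squares lying inside the closed unit disk, and area of the
    grid squares meeting it, for a grid of squares of side r = 1/n."""
    r = 1.0 / n
    inner = outer = 0
    for i in range(-n - 1, n + 1):
        x0, x1 = i * r, (i + 1) * r
        far_x = max(abs(x0), abs(x1))          # |x| of the farthest point of the square
        near_x = 0.0 if x0 <= 0.0 <= x1 else min(abs(x0), abs(x1))
        for j in range(-n - 1, n + 1):
            y0, y1 = j * r, (j + 1) * r
            far_y = max(abs(y0), abs(y1))
            near_y = 0.0 if y0 <= 0.0 <= y1 else min(abs(y0), abs(y1))
            if far_x * far_x + far_y * far_y <= 1.0:    # farthest corner inside the disk
                inner += 1
            if near_x * near_x + near_y * near_y <= 1.0:  # nearest point inside the disk
                outer += 1
    return inner * r * r, outer * r * r
```

The two areas bracket $\pi$, and the gap between them, which is the area of the squares meeting the boundary, shrinks as `n` grows.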
First, he considered the domain $S$ and kept $r \le \rho$ (a fixed number), demonstrating that
the area $S \le A$. Similarly, for the domains $S + S'$, he showed that when $r \le \rho$, the area
$S + S' \ge a$. Thus, he concluded that as $r \to 0$, the area $S$ increases to a definite limit $A$, and
the area $S + S'$ decreases to a definite limit $a$. He referred to $A$ as the interior area of $E$ and
$a$ as the exterior area of $E$. He called a set quarrable when the area of the boundary
$$S \ge S_1 + S_2 + \cdots; \qquad S + S' \le S_1 + S_1' + S_2 + S_2' + \cdots$$
$$A \ge A_1 + A_2 + \cdots; \qquad a \le a_1 + a_2 + \cdots$$
$$A = A_1 + A_2 + \cdots; \qquad a = a_1 + a_2 + \cdots$$
After these analytic considerations, Jordan considered the decomposition of planes into
quarrable regions on exactly the same lines. These considerations were geometric in nature
and led him to define the interior and exterior extent of the sets of different dimensions. He
called a set to be measurable if the above two extents coincide.
Having defined the measure of a set, Jordan cleared the field for the integration of func-
tions. For further analysis of integration, he defined:-
Jordan started by defining the variable quantities $x, y, \ldots$ as those quantities that are not
linked in any way and can take any value that can be taken when values of other variables
are fixed. A function $f$ is defined as a dependent variable on the values of $x, y, \ldots$. To each
value of $x, y, \ldots$ in a set $E$, there corresponds a value of $f$. Having defined the functions in
this way, putting some restrictive conditions on them was pertinent. Therefore, the bounded
functions were defined.
Bounded functions: A function $f(x, y, \ldots)$ is considered bounded in a set $E$ if the set
of values it takes for all points $(x, y, \ldots)$ in $E$ is bounded.
He stated that the sum, the difference, and the product of bounded functions are bounded,
and if a function is bounded from below, then its reciprocal is also bounded. Jordan consid-
ered the bounded functions for the purpose of integration and began this analysis by defining
the:-
Darboux Type Sums
He took a bounded function $f(x, y, \ldots)$ on a measurable domain $E$, whose measure is
also denoted by $E$. He decomposed $E$ into measurable elementary domains $e_1, e_2, \ldots$, and
considered the maximum value $M$ and the minimum value $m$ of $f$ on $E$, as well as on each of
these measurable elementary domains ($M_k$ and $m_k$, respectively). The Darboux-type sums
were then formed:
$$S = \sum_k M_k e_k; \qquad s = \sum_k m_k e_k.$$
Using Darboux’s theorem, Jordan showed that the sums $S$ and $s$ each tended towards a
definite limit, independent of the way in which the domain $E$ was partitioned, as the
diameter of the elementary domains tended towards zero.
Jordan showed that the sums $S$ are bounded from below. The lower bound of these sums $S$
was called the integral by excess. Similarly, the sums $s$ were shown to be bounded from above,
and their upper bound was called the integral by defect. Jordan was able to calculate the number
of measurable elementary domains into which the domain $E$ was to be divided, depending on
the diameter of the domain and on how close the sums $S$ ($s$) should be to their lower bound
(upper bound), i.e., to the integral by excess (integral by defect).
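As an illustration only (it is not taken from Jordan), the following Python sketch forms the two sums $S$ and $s$ for the one-variable function $f(x) = x^2$ on $(0, 1)$ cut into $n$ equal pieces. The name `darboux_sums`, the choice of $f$, and the equal subdivision are assumptions made for the example. Because this $f$ is increasing, its maximum and minimum on each piece are taken at the endpoints.

```python
from fractions import Fraction

def darboux_sums(f, a, b, n):
    """Upper and lower Darboux-type sums of an increasing f on (a, b), n equal pieces."""
    width = Fraction(b - a, n)
    upper = lower = Fraction(0)
    for k in range(n):
        left = a + k * width
        right = left + width
        # For an increasing function, m_k = f(left) and M_k = f(right).
        lower += f(left) * width
        upper += f(right) * width
    return upper, lower

def square(x):
    return x * x

for n in (10, 100, 1000):
    S, s = darboux_sums(square, 0, 1, n)
    print(n, float(S), float(s), float(S - s))
```

For this example $S - s = 1/n$, so the two sums close in on a common value as the pieces shrink, which is the situation in which the integral by excess and the integral by defect coincide.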
Jordan further noted that any measurable domain $E$ can be decomposed into several
measurable domains $E_1, E_2, \ldots$, which can be further divided into infinitely small elements
$e_1, e_2, \ldots$, allowing for the formation of individual sums $\sum M_k e_k$ for each constituent
domain. The corresponding sum for the entire domain $E$ is obtained simply by adding these
partial sums. Taking the limit of these sums, it was found that the integral by excess of a function $f$ over
1.5 Work of Borel 21
the domain $E$ equals the sum of the similar integrals for the individual constituent domains
$E_1, E_2, \ldots$. This result holds analogously for the integral by defect.
Furthermore, he constructed an increasing sequence of measurable domains $E_1, \ldots, E_n, \ldots$
such that each domain is interior both to the following domains in the sequence
and to the original domain $E$. Additionally, the extents of these domains converge towards that
of $E$ itself. The integral of $f$ over the domain $E$ (either by excess or by defect) is then equal to the
limit of the integrals of $f$ over the domains $E_n$ as $n$ approaches infinity. This is because
the difference between the two integrals is the integral taken over the difference
of their domains, $E - E_n$. The absolute value of this difference is bounded by $L \cdot (E - E_n)$,
where $L = \max\{|M|, |m|\}$ and $E - E_n$ also denotes the extent of that domain, which approaches
zero as $n$ approaches infinity.
Integrable Functions
Finally, an integrable function $f(x, y, \ldots)$ in the domain $E$ was defined as one for which
the two integrals by excess and by defect
$$T = \lim \sum M_k e_k; \qquad t = \lim \sum m_k e_k,$$
coincide, i.e., for which $T = t$. This common limit was called the integral of the function $f$ in
the domain $E$. It was represented by the notation
$$I = \int_E f(x, y, \ldots)\, de.$$
From the above analysis, it was clear that the value of the integral was independent of the
variables $x, y, \ldots$ and depended only on the nature of $f$ and on the domain $E$.
In the above analysis, Jordan’s approach was to enclose a set with finitely many rectangles.
However, these finitely many enclosing rectangles were not sufficient for further analysis.
Borel’s approach to defining a ‘measure’ differed from Jordan’s. He stated
some properties (see Sect. 1.5.2) that a measure should have, which are discussed in the
following section.
Borel designed a remarkably simple construction for measurable sets, ensuring complete
compatibility with the length of intervals. He crafted the definition so that set-theoretic
operations such as addition and subtraction correspond to their algebraic counterparts on
the measures of the sets. Similarly, inclusions such as $A \subseteq B$ or $B \subseteq A$, where $A$ and $B$
are Borel measurable sets, are reflected in the corresponding relations (less than or equal to,
greater than or equal to) between their measures. This elegant construction laid the
foundation for further investigations into the interplay between set theory and
measure theory.
Borel constructed a new type of set by removing countably many intervals from $(0, 1)$.
Specifically, he chose to remove intervals centered at the rational points within the interval
$(0, 1)$. His primary interest lay in the study of uncountable perfect sets, which he found
simpler to study than non-perfect sets. This focus was rooted in the approximation of
algebraic and irrational numbers, which naturally led Borel to the consideration of uncountable
perfect sets.
In the third chapter of his book [6], Borel defined derived sets, closed sets, perfect sets,
and dense sets from a topological perspective. He provided insights into the size of sets
(whether they are large or small, countable or uncountable) through this viewpoint. A brief
outline of Borel’s procedure for synthesizing such sets is given as follows:
Example 1. A Perfect, Uncountable, and Non-Dense Set
1. Construction of set $A$. Borel considered the rational numbers between $0$ and $1$, and
associated with each of them, $\frac{p}{q}$, the interval
$$\left(\frac{p}{q} - \frac{1}{q^3},\ \frac{p}{q} + \frac{1}{q^3}\right) \qquad (1)$$
In this way, he obtained a countable infinity of intervals situated on the line. Then, he
considered the set $A$ of those points of $(0, 1)$ which did not belong to any of these
intervals. He noted that each endpoint of any interval, having a rational coordinate, is the
midpoint of another interval; hence, it was not necessary to specify whether the endpoints
of an interval are considered part of it or not. In this way, he constructed the set $A$.
2. Set $A$ is Non-Empty. Borel found a close connection between the consideration of the set
$A$ and the question of the approximation of irrationals. To him it was clear that $A$ was
composed of the irrational numbers $\xi$ having the property that, for any $\frac{p}{q}$,
$$\left|\xi - \frac{p}{q}\right| > \frac{1}{q^3}.$$
The existence of such $\xi$ was assured by the theory of continued fractions; an example is
the number $\frac{\sqrt{2}}{2}$. But this theory revealed virtually nothing about the set $A$. Moreover,
it relied on specific properties of rational numbers and was not easily extended to the
approximation of irrationals by algebraic numbers of a determined class.
3. $A$ is Perfect and Non-Dense. Borel first demonstrated that the set $A$ was perfect. He
argued that if a point $a$ of the derived set $A'$ did not belong to $A$, then $a$ would have to be
interior to one of the intervals $(1)$ (without coinciding with its endpoints). This was absurd since
these intervals did not contain any point of $A$. Thus, every point of $A'$ belonged to $A$.
Borel further asserted that $A$ was not dense in any interval. If $A$ were dense in an interval,
it would include all points of that interval. This was also absurd, since the points $\frac{p}{q}$ did
not belong to $A$ and there were some in every interval.
Thus, Borel provided an example of a perfect set $A$ that was not dense in any interval.
Borel felt it important to delve deeper into the nature of such a set and, first of all, to
demonstrate that it was uncountable without resorting to the theory of continued fractions.
4. $A$ is Uncountable. Borel considered the interval $(0, 1)$, of length one. Then, he
considered an interval $(a_1, b_1)$ of length $\alpha_1$ within the interval $(0, 1)$. It was clear that the
set $A$ of points in $(0, 1)$ that did not belong to $(a_1, b_1)$ was composed of all the points in
certain intervals whose total length was $1 - \alpha_1$. Assuming, for the sake of clarity, that
$0 < a_1 < b_1 < 1$, these intervals were $(0, a_1)$ and $(b_1, 1)$.
Next, he considered another interval, $(a_2, b_2)$ of length $\alpha_2$, within the interval $(0, 1)$,
which had no common points with the interval $(a_1, b_1)$. It was clear that the set $A$ of
points in the interval $(0, 1)$ that did not belong to either $(a_1, b_1)$ or $(a_2, b_2)$ was composed
of all the points in a finite number of intervals, with a total length of $1 - \alpha_1 - \alpha_2$.
More generally, if a finite number of disjoint intervals $(a_1, b_1), (a_2, b_2), \ldots, (a_n, b_n)$, with
respective lengths $\alpha_1, \alpha_2, \ldots, \alpha_n$, were removed from the interval $(0, 1)$, the set $A$ of
remaining points would consist of all the points in a finite number of intervals, with a
total length of
$$1 - (\alpha_1 + \alpha_2 + \cdots + \alpha_n).$$
Even if the intervals $(a_1, b_1), (a_2, b_2), \ldots, (a_n, b_n)$ were not disjoint, the conclusion
remained the same, except that the sum of the lengths of the intervals whose points form
$A$ would be greater than or equal to $1 - (\alpha_1 + \alpha_2 + \cdots + \alpha_n)$. In any case, if it was
assumed that
$$\alpha_1 + \alpha_2 + \cdots + \alpha_n < 1,$$
this sum was certainly not zero, and consequently, the set $A$ had the power of the
continuum.
Next, Borel dealt with the case where countably many intervals $(a_n, b_n)$ were removed
from $(0, 1)$. With each integer $n$, he associated an interval $(a_n, b_n)$ of length $\alpha_n$. This
meant he removed the points of a countable infinity of intervals from the interval $(0, 1)$.
Additionally, he assumed that the sum of the series of positive terms
$$s = \alpha_1 + \alpha_2 + \cdots + \alpha_n + \cdots$$
satisfied
$$s < 1.$$
Then, he asked what could be said about the set $A$ consisting of the points in the interval
$(0, 1)$ that did not belong to any of the intervals $(a_n, b_n)$. He noted that this set might
not be dense in any sub-interval of $(0, 1)$. It was also evident that it contained all points
in the sub-intervals where it was dense. However, a preliminary question arose: did such
a set $A$ exist at all? Could it be deduced from the inequality
$$s < 1$$
that there were points in $(0, 1)$ not belonging to any of the intervals $(a_n, b_n)$? Although
this seemed almost obvious, Borel thought it useful to prove it rigorously, as doing so
provided several important insights.
The first of these remarks was the following: it was to be demonstrated that there exist
points not interior to any of the intervals. He considered as belonging to an interval only
the interior points, excluding the endpoints. He enlarged each interval $(a_n, b_n)$, at each
endpoint, by a fraction $\varepsilon$ of its length; i.e., he took $b_n b'_n = a'_n a_n = \varepsilon\, a_n b_n$, the points
$a'_n$ and $b'_n$ being outside the interval $(a_n, b_n)$.
The interval $(a'_n, b'_n)$ had a length of $(1 + 2\varepsilon)\, a_n b_n$; therefore, the sum of the lengths of
all the intervals $(a'_1, b'_1), (a'_2, b'_2), \ldots$ is $s' = (1 + 2\varepsilon)s$; but, if
$$s < 1,$$
then $\varepsilon$ can be chosen small enough that
$$(1 + 2\varepsilon)s < 1.$$
Now, the hypothesis that every point of the interval $(0, 1)$ was included in one of the
intervals $(a_n, b_n)$ (without excluding the endpoints) would have led to the result that
every point was included in one of the intervals $(a'_n, b'_n)$ (excluding the endpoints). Borel
showed that this hypothesis was incompatible with the inequality
$$s' < 1.$$
He showed this by using the following result, in which it was expressly understood that
the words “interior to an interval” excluded the endpoints.
Theorem 1.2 If on a bounded line segment, there is a countable infinity of partial intervals
such that every point on the line is interior to at least one of the intervals, then there exists a
finite number of intervals chosen from the given intervals that have the same property [every
point on the line is interior to at least one of them].
From this theorem, he concluded that if a finite number of intervals were such that all
the points of a segment were interior to them, the sum of the lengths of these intervals was
greater than the length of the segment. This was not possible if the finite intervals were
chosen from an infinite number of intervals whose total sum was less than the length of the
segment.
It was thus affirmed that the hypothesis was incompatible with the inequalities
$$s < 1; \qquad s' < 1.$$
Consequently, there would have existed points not belonging to the intervals $(a_n, b_n)$ and
certainly not coinciding with any of the points $\alpha_1, \ldots, \alpha_n, \ldots$. It was, therefore, proved that
the set $A$ was uncountable.
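The finite step of this argument is easy to check numerically. The following Python sketch is only an illustration: the helper name `remaining_length` and the sample intervals are hypothetical, not Borel's. It removes finitely many open intervals from $(0, 1)$ and measures what is left, counting overlapping parts only once.

```python
from fractions import Fraction

def remaining_length(intervals):
    """Total length of (0, 1) left after removing the given open intervals."""
    covered = Fraction(0)
    reach = Fraction(0)  # right end of the part already accounted for
    for a, b in sorted(intervals):
        # Clip to (0, 1) and skip whatever earlier intervals already cover.
        a, b = max(a, reach), min(b, Fraction(1))
        if b > a:
            covered += b - a
            reach = b
    return 1 - covered

F = Fraction
disjoint = [(F(1, 10), F(3, 10)), (F(1, 2), F(7, 10))]
overlapping = [(F(1, 10), F(1, 2)), (F(3, 10), F(7, 10))]
for sample in (disjoint, overlapping):
    alpha_sum = sum(b - a for a, b in sample)
    print(remaining_length(sample), ">=", 1 - alpha_sum)
```

For the disjoint sample the remaining length equals $1 - (\alpha_1 + \alpha_2)$ exactly; for the overlapping sample it is strictly larger, as the text states.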
To go back to the case of the intervals $(1)$ considered earlier, one needed to substitute the intervals
$\left(\frac{p}{q} - \frac{1}{q^3},\ \frac{p}{q} + \frac{1}{q^3}\right)$ in place of the intervals $(a_n, b_n)$, and the proof followed.
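Membership in the set $A$ built from the intervals $(1)$ can be tested for a given number, at least up to a chosen denominator. The Python sketch below is a hedged illustration rather than a proof: the function name `avoids_all_intervals` and the cut-off `max_q` are assumptions, and the test uses floating-point arithmetic, which is adequate here only because the margins involved are far larger than rounding error.

```python
from math import sqrt

def avoids_all_intervals(xi, max_q):
    """True if |xi - p/q| > 1/q**3 for every p/q in (0, 1) with 2 <= q <= max_q."""
    for q in range(2, max_q + 1):
        for p in range(1, q):
            if abs(xi - p / q) <= 1 / q**3:
                return False
    return True

print(avoids_all_intervals(sqrt(2) / 2, 500))
print(avoids_all_intervals(0.5, 10))
```

For $\xi = \frac{\sqrt{2}}{2}$ the check passes for every denominator tried, in line with the example quoted from the theory of continued fractions. A rational number such as $\frac{1}{2}$ fails at once, since it is the centre of one of the removed intervals.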
Their total extent, since there are $q - 1$ such intervals corresponding to the same denominator
$q$, is given by:
$$\sum_{q=1}^{\infty} 2\varepsilon\, \frac{q - 1}{q^3} = 2\varepsilon \sum_{q=1}^{\infty} \left(\frac{1}{q^2} - \frac{1}{q^3}\right) = \varepsilon M$$
with $M$ being an easy-to-calculate number less than one. Moreover, if their common
parts are not counted multiple times, we still have a countable infinity of intervals, whose
total extent is less than $\varepsilon M$.
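The claim that $M$ is less than one can be checked with a short computation. In the Python sketch below, the name `partial_M` and the cut-off $Q = 100000$ are assumptions for illustration. The partial sums of the series increase with $Q$, and the neglected tail is below $2/Q$ because $\sum_{q > Q} 1/q^2 < 1/Q$, so the partial sum plus $2/Q$ bounds $M$ from above.

```python
def partial_M(max_q):
    """Partial sum of 2 * (q - 1) / q**3 for q = 1, ..., max_q."""
    return sum(2 * (q - 1) / q**3 for q in range(1, max_q + 1))

Q = 100_000
lower = partial_M(Q)   # partial sums increase with Q, so this is below M
upper = lower + 2 / Q  # the tail beyond Q is smaller than 2 / Q
print(lower, upper)
```

Both bounds lie a little below $0.886$, comfortably less than one.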
Borel took $\varepsilon = \frac{1}{n}$, where $n$ is an integer. He denoted by $E_n$ the set of all points inside the
corresponding intervals, and considered the sequence of sets:
$$E_1, E_2, \ldots, E_n, \ldots \qquad (E)$$
It was clear that each set $E_n$ contained all the points included in the subsequent sets. The
set $E_n$ was composed of all points inside intervals whose total length was less than $\frac{M}{n}$.
Next, Borel focused on the set $E$ formed by the points that belong to all the $E_n$. This set
$E$ certainly included all the points $\frac{p}{q}$, but it was not obvious a priori that it contained other
points. This set $E$ also had the remarkable property that its points could be enclosed in a
series of intervals whose total length is as small as desired, for all points of $E$ were points
of $E_n$ and, therefore, were included in intervals of a total length less than $\frac{M}{n}$. Borel showed
that $E$ had the cardinality of the continuum.
It sufficed to show that the set of transcendental numbers$^{10}$ $\xi$ belonged to the set $E$; i.e.,
to show that each of these numbers $\xi$ belonged to each of the sets $E_n$. Each of these numbers
$\xi$ was such that there were infinitely many values of $p$ and $q$ satisfying the inequality
$$\left|\xi - \frac{p}{q}\right| < \frac{1}{q^4}.$$
Since there were infinitely many such values of $q$, it could be assumed that $q > n$ and, thus,
$$\left|\xi - \frac{p}{q}\right| < \frac{1}{n q^3} = \frac{\varepsilon}{q^3},$$
so that $\xi$ belonged to $E_n$.
Borel’s analysis motivated him to give a new definition of measurable sets and of the measure
of sets. In Chap. 3, p. 47 of his book [6], Borel states:
Here are some new definitions: If a set $E$ has measure $s$, and contains all the points of a set
$E'$ whose measure is $s'$, then the set $E - E'$, formed by the points of $E$ that do not belong to
$E'$, will be said to have measure $s - s'$. Furthermore, if a set is the sum of a countable infinity
of sets with no common part, its measure will be the sum of the measures of its parts. Finally,
if the sets $E$ and $E'$ have measures $s$ and $s'$, respectively, and $E$ contains all the points of $E'$,
then the set $E - E'$ will have measure $s - s'$. In other words, the first definition states that the
difference of two sets with measures $s$ and $s'$ is the set of points that belong to the first set but
not to the second set, and has measure $s - s'$. The second definition states that the measure
of a countable sum of sets with no common part is the sum of the measures of the individual
sets. The third definition states that if two sets have measures $s$ and $s'$, and the first set contains
all the points of the second set, then the difference of the two sets has measure $s - s'$. These
definitions are used in the field of measure theory, which is a branch of mathematics that deals
with the size of sets.
Borel always restricted himself to the interval $(0, 1)$ and worked with the points of this
interval. He constructed special sets by adding and subtracting intervals within
this interval, arriving at a set, say $A$. This process of set-theoretic manipulation
continued, adding and subtracting intervals to and from $A$, and even considering the addition
and subtraction of countably infinitely many such sets $A_1, A_2, \ldots$. The sets that
emerged from this process were called Borel sets.
Borel only defined measure on these particular sets, calling them measurable. He admitted
that defining measure on other sets, those not born from his established process, was possible,
but he deemed it unworthy of his pursuit. Such sets, he argued, lacked the fundamental
property he deemed essential: a clear correspondence between set-theoretic operations like
addition, subtraction, and subset or super-set relations and their corresponding algebraic
counterparts (addition, subtraction, and less than or greater than relations).
This meticulous approach had significant consequences. Important sets, such as all
bounded perfect sets, were demonstrably measurable. Even a general method for constructing
such sets was unveiled, involving the subtraction of intervals from the domain of interest,
which is the interval $(0, 1)$.
In essence, Borel meticulously crafted a measurable world within the confines of his
chosen domain, laying the foundation for further explorations in the realm of measure
theory. His dedication to rigour and precision established a cornerstone upon which future
mathematicians could build.
The need to attribute numerical values to sets emerged from studies in function theory. This
endeavour was equivalent to assigning a measure to these sets. However, the measure could
not be defined for every set; Lebesgue’s approach to defining measure encompassed sets that
possessed characteristics akin to segment lengths or polygonal areas. In collaboration with
Borel, Lebesgue devised a definition of measure of a set based on its fundamental properties.
This definition of measure extended to spaces of multiple dimensions.
From the idea of measuring sets composed of points on a plane, he derived the notion
of area for planar domains. Extending this concept to points in ordinary space led to the
development of the idea of volume. Consequently, the integral of a continuous function was
conceived as the area of a planar domain. This approach expanded to define the integral
of discontinuous bounded functions as the measure of a specific set of points. Thus, a
geometric interpretation of the integral emerged. Simultaneously, an analytical definition of
the integral was introduced, wherein it represented the limit of a sequence of sums akin to
those in Riemann’s definition. Lebesgue referred to the functions to which the geometric
definition applied as summable.
The class of Lebesgue’s summable functions was very large. In his thesis [5], Lebesgue
states that:
I am unaware of any function that is not summable; I do not know if such functions exist. All
functions that can be defined using arithmetic operations and the limit transitions are summable.
All functions integrable in the sense of Riemann are summable, and both definitions of the
integral yield the same value. Any bounded derivative function is summable.
Lebesgue’s initial definition described a bounded set as one where the distance between
any two points remained bounded. In these bounded sets, he established operations like set
addition and subtraction. However, he proposed that these set-theoretic operations should be
performed only on a finite or countably infinite number of sets. With each bounded set, he
associated a non-negative number termed its measure, subject to the following conditions:
The problem of assigning a measure to a set, as defined above, became known as the
problem of measure. Lebesgue aimed to address this problem exclusively for measurable
sets, which shared properties akin to segments or areas. This problem had different
solutions, depending upon the type of point set under consideration, whether on a straight
line or in a plane, for instance. Hence, the measures were called linear measure or surface
measure, accordingly. Lebesgue was free to attribute unit measure (measure $= 1$) to any
chosen set of non-zero measure. His rationale was that if one measure system transforms
into another by multiplication with the same number, these systems are essentially
identical. Now, we shall briefly see how Lebesgue constructed the measurable sets, which
had properties akin to segment lengths, polygon areas, etc.
Lebesgue began by examining a set comprising only a single point. He deduced that such
a set possesses measure zero. This conclusion arose from the notion that a bounded set,
encompassing an infinite number of points, should have a finite measure. Furthermore, he
asserted that the set of points on a segment $AB$ should yield the same measure, irrespective
of whether $A$ and $B$ belong to the set. It was crucial to establish that $AB$ could not have
measure zero; otherwise, every bounded set would have measure zero. Consequently, the
finite non-zero measure of the segment $AB$ remained constant, irrespective of whether the points
$A$ and $B$ were part of the segment. Thus, it was demonstrated that the measure of a set
comprising a single point must indeed be zero, in line with the inherent ‘zero-length’
character of a point.
Then, Lebesgue assigned the length $1$ to the segment $AB$ and chose it to be the unit of length.
Having done this, he assigned a number (its length) to each segment $CD$. He called this
number the measure of the set of points of $CD$. If the length of $CD$ is $l = \frac{r}{s}$ ($r, s \in \mathbb{N}$), then
1.6 Work of Lebesgue 29
the length of $CD$ can be thought of as obtained by joining $r$ copies of $AB$ end to end and
cutting the resulting segment into $s$ equal pieces. The length of one such piece is the length of $CD$,
and the measure of the set of points of $CD$ is the measure of the set of points of one such piece.
Similarly, if $l$ is irrational, then to every number $k < l$ corresponds a segment of length $k$
contained in $CD$, and to every number $k > l$ corresponds a segment of length $k$ containing
$CD$.
To ensure the fulfilment of the third condition of the measure problem, it is necessary
that the total length of a segment composed of a finite or countably infinite series of disjoint
segments be equal to the sum of the individual lengths of those segments. This is
obvious for a finite number of segments, and it remains true even
when the segments are countably infinite in number (refer to M. Borel’s book [6]). With the
definition of the length of segments and an illustrative geometric method for the construction
of measurable sets in hand, Lebesgue moved from the geometric approach to an analytical approach
for determining the measure of a set, which begins with the exterior and interior measures.
Lebesgue considered a set $E$ and enclosed its points in a finite or countably infinite number
of intervals. There exist infinitely many such systems of intervals enclosing the points
of $E$. The sum of the intervals of one such system is called $E_1$. The set $E_1$ contains the set $E$
and therefore, it is reasonable to require that $m(E) \le m(E_1)$, where $m(E)$ and $m(E_1)$ are
the measures of $E$ and $E_1$, respectively. In other words, $m(E)$ is at most equal to the sum of the
lengths of the intervals considered. The greatest lower bound of this sum, taken over all such
enclosing systems, was called the exterior measure, $m_e(E)$, by Lebesgue.
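To make the idea of enclosing intervals concrete, here is a small Python sketch. It is an illustration that does not come from the source: the function name `cover_rationals`, the enumeration of the rationals of $(0, 1)$ by increasing denominator, and the geometric choice of lengths are all assumptions. It encloses the first `count` rationals in intervals whose total length stays below a prescribed `eps`.

```python
from fractions import Fraction

def cover_rationals(eps, count):
    """Enclose the first `count` rationals of (0, 1) in intervals of total length < eps."""
    cover, seen, q = [], set(), 2
    while len(cover) < count:
        for p in range(1, q):
            x = Fraction(p, q)
            if x not in seen and len(cover) < count:
                seen.add(x)
                # The n-th interval (n = 1, 2, ...) is given length eps / 2**n.
                half = eps / 2 ** (len(cover) + 2)
                cover.append((x - half, x + half))
        q += 1
    return cover

eps = Fraction(1, 100)
cover = cover_rationals(eps, 50)
total = sum(b - a for a, b in cover)
print(float(total), float(eps))
```

Since the lengths $\varepsilon/2, \varepsilon/4, \ldots$ add up to less than $\varepsilon$ however many rationals are enclosed, and $\varepsilon$ is arbitrary, the lower bound of such sums for the set of rationals of $(0, 1)$ is zero.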
To define the interior measure of the set $E$, denoted $m_i(E)$, Lebesgue assumed that all the
points of the set $E$ belong to the segment $AB$. The complement of $E$ with respect to $AB$, denoted
$C_{AB}(E)$, is the set $AB - E$. The measure of the set $C_{AB}(E)$ is at most $m_e(C_{AB}(E))$, so the
measure of $E$ is at least $m(AB) - m_e(C_{AB}(E))$; this number is the interior measure $m_i(E)$.
The exterior measure is never less than the interior measure. The definition of the exterior and
interior measures paved the way for defining measurable sets.
Analytically, Lebesgue characterised sets as measurable when their exterior and interior
measures coincided. In this case, the common value of these measures denoted the measure
of the set, provided the problem of measure could be solved. If $E$ was a measurable set, the
number $m(E)$ so defined would satisfy the conditions of the problem of measure.
Another equivalent definition of measurable sets was proposed: A set $E$ was considered
measurable if it was possible to enclose its points in intervals $\alpha$ while enclosing the points
of its complement $C(E)$ in intervals $\beta$ in such a manner that the total length of the
segments common to $\alpha$ and $\beta$ is as small as desired.
Further, it was proved that if $E_1, E_2, \ldots$ is a finite or countably infinite sequence of
measurable sets, then the sum set $E$ is measurable, and such a statement could be made only
for measurable sets.
Indeed, Lebesgue never claimed that the problem of measure was impossible for the
sets whose exterior and interior measures are unequal. However, the procedure he used to
define a set consisted of two steps:
1. Forming the sum of a finite or countably infinite number of previously defined sets.
2. Considering the set of points common to a given finite or infinite number of sets.
These two steps, applied to measurable sets, yield measurable sets.
Now, we know that an interval is a measurable set. Finitely many additions and subtractions
of intervals give measurable sets. Borel sets are obtained by applying the previous two
procedures finitely many times along with the process of taking complements of the sets.
The sets resulting from this procedure (the Borel sets) were called measurable by Borel and
$B$ measurable by Lebesgue. Furthermore, Lebesgue estimated their cardinality to be that of
the continuum. Examples of Borel sets include all sets formed by sums of intervals and closed
sets whose complements are sums of intervals.
Another example of a Borel measurable set is the following perfect set (Cantor’s set):
the set $E$ formed of the points
$$x = \frac{a_1}{3} + \frac{a_2}{3^2} + \frac{a_3}{3^3} + \cdots,$$
where the $a_i$ are equal to $0$ or $2$. This set is obtained by removing the middle third
from the interval $(0, 1)$ and from each of the subsequent resulting intervals. It is a perfect set, and
hence it is $B$ measurable. Its complement is formed of the interval $\left(\frac{1}{3}, \frac{2}{3}\right)$ of length $\frac{1}{3}$, of the two
intervals $\left(\frac{1}{9}, \frac{2}{9}\right)$, $\left(\frac{2}{3} + \frac{1}{9}, \frac{2}{3} + \frac{2}{9}\right)$ of length $\frac{1}{3^2}$, of four intervals of length $\frac{1}{3^3}$, etc.;
therefore the measure of the complement is
$$\frac{1}{3} + 2 \cdot \frac{1}{3^2} + 2^2 \cdot \frac{1}{3^3} + \cdots = 1.$$
Consequently, $E$ is of measure zero and has the cardinality of the continuum. Therefore, we
can form an infinite number of sets with the points of $E$. All of them have exterior measure zero
and thus are measurable. The cardinality of the set of these sets is that of the set of all sets of
points.
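The measure of the complement of the perfect set above can be followed stage by stage. The Python sketch below is illustrative, and the function name `removed_length` is an assumption. It adds up the lengths of the middle thirds removed in the first few steps, using exact fractions.

```python
from fractions import Fraction

def removed_length(stages):
    """Total length of the middle thirds removed in the first `stages` steps."""
    # Step k removes 2**(k - 1) intervals, each of length 1 / 3**k.
    return sum(Fraction(2 ** (k - 1), 3 ** k) for k in range(1, stages + 1))

for n in (1, 2, 5, 20):
    print(n, removed_length(n), float(1 - removed_length(n)))
```

After $n$ steps the removed length is $1 - \left(\frac{2}{3}\right)^n$, so the length left over tends to zero, in agreement with the sum of the series being $1$.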
Therefore, we conclude that there exist measurable sets (in Lebesgue’s sense) that are
not $B$ measurable, and the cardinality of the set of measurable sets is that of the set of all sets
of points. Therefore, the class of Lebesgue measurable sets is wider than that of Borel.
The segment $AB$, which carries $E$, is divided into partial intervals. Let $l$ be the sum of the
lengths of those intervals all of whose points are interior to $E$, and $L$ the sum of the lengths of
those intervals containing points of $E$ or of its boundary. It is shown that when the division
of $AB$ is varied in any way so that the maximum length of the partial intervals tends towards
zero, both numbers $l$ and $L$ tend towards definite limits, the interior and exterior extents of
$E$.
From this definition, it follows that the exterior extent is at least equal to the exterior
measure and that the interior extent is at most equal to the interior measure. Mathematically:
$$\text{interior extent of } E \;\le\; m_i(E) \;\le\; m_e(E) \;\le\; \text{exterior extent of } E.$$
Jordan called the sets whose exterior and interior extents are equal measurable; these sets,
which Lebesgue called $J$ measurable, are therefore measurable in Lebesgue’s sense, and
the two definitions (Jordan’s and Lebesgue’s) of measure agree when both are applicable.
It can be said that the interior extent of $E$ is the measure of the set of its interior points. The
set of interior points is open; that is, it contains no points of its boundary. Therefore, its
complement is a closed set, and hence it is a $B$ measurable set. The exterior extent of $E$ is the
measure of the set formed by $E$ and its boundary, which is closed and thus $B$ measurable.
Therefore, for a set to be $J$ measurable, it is necessary and sufficient for its boundary to
have zero measure.
Since the exterior extent of a closed set is its measure, a closed set of measure zero is
$J$ measurable. In particular, the perfect set defined in Sect. 1.6.1 is $J$ measurable; the
same goes for all the sets that can be formed with its points. Hence, the set of $J$ measurable
sets has the same cardinality as the set of all point sets, and there exist $J$ measurable sets that
are not $B$ measurable.
All this implies that the cardinality of the set of $J$ measurable sets is greater than that of the
set of Borel measurable sets. Lebesgue compares the $J$ and $B$ measurable sets in one of his
letters to M. E. Borel. He states thus:
. . .
You mentioned that Jordan’s definitions of measure are more general than yours. Indeed,
Jordan associates two numbers with every set, but these two numbers seem to me to have
no other property than their existence, except in the case where they coincide, and then they
match yours. Please tell me if I am wrong and whether, in your opinion, Jordan’s definitions
have any interest beyond providing a simpler definition of measure in the case of a somewhat
unusual set.
Please let me know, for the time when I will be writing, if you are aware of any other definitions
of measure and if the one you have adopted has already been used. I firmly believe that this
is the correct one, as I have applied it myself [6]; and it seems impossible, as you mentioned,
to give up the ability to add measures as we do with sets. Unfortunately, it seems difficult to
demonstrate that these are the only ones we can measure or even that these others exist [7].
. . .
In another interesting letter, Lebesgue writes to Borel to compare sets measurable in his and
Borel’s senses.
We are absolutely in agreement, I believe. I have slightly modified the language, that’s all. If
we consider a set $E$ measurable (in my sense) [9], and we add to it one of those sets (analogous
to the one on page 67 [10]) that is non-measurable in your sense but measurable in mine (let
it be $e_1$), we have a set $E + e_1$ measurable in your sense. $e_1$ has measure zero (in my sense).
$E + e_1$ has the same measure (in my sense) as $E$. Therefore, with the remarks on pages 48 and
49 [11], $E$ has a measure at most equal to that of $E + e_1$. Besides, $e_1$ is part of a set measurable
in your sense and of measure zero, let it be $e_2$. So, by removing from the set $E + e_2$ the set
$e_1$, we obtain a set $E$ measurable in your sense contained in $E$ and having the same measure
as $E + e_1$, so that $E$ having a measure at least equal to that of $E$ has exactly the measure of
$E + e_2$.
Is this not what you wanted to convey to me? I may have expressed myself very poorly, but
this was the small demonstration I intended to include in my note. I do not have the text of my
note, but I hoped I had been clearer than I fear I was. It seemed useful to me to clearly specify
that the sets I called measurable were not exactly the same as those you referred to as such. I
also believe there is no harm in calling measurable (as I do) even those sets that the remarks
on pages 48 and 49 allow us to measure. However, this changes some statements, for example,
the one on page 110 [12]: the set of measurable sets has the power of the continuum [13]. With
the extended sense: the set of measurable sets has the same power as the set of all point sets,
since there are sets of null measure that have the power of the continuum.
Perhaps we could, and I believe this is certain, limit ourselves to the sets you call measurable,
provided we only consider the functions of the Baire set [14]; this is a question I had not
Lebesgue stated the problem of integration from the geometric point of view as follows:
Let a curve $C$ be given by its equation $y = f(x)$ (where $f$ is a continuous positive
function, and the axes are rectangular). We have to find the area of the domain bounded by
an arc of $C$, a segment of $Ox$, and two parallels to the $y$-axis at the given abscissae $a$ and
$b$ ($a < b$).
This area is called the definite integral of $f$ taken between the limits $a$ and $b$; it is
represented by $\int_a^b f(x)\,dx$.
The conventional approach for the general case involves determining a domain’s interior
and exterior extents by dividing the plane into rectangles with sides parallel to the $Oy$ and
$Ox$ axes. To draw these rectangles, start by drawing parallels to $Oy$, then cut the bands
formed by lines parallel to $Ox$ using ordinates that extend from one band to the next. When
considering one of these rectangles, say $R_1$, to calculate one of the extents (exterior or
interior), all rectangles $R$ situated within the same band and enclosed between $R_1$ and $Ox$
must be taken into account for calculations of extent. Therefore, these extents (exterior and
interior) are calculated by taking limits of the sum of areas of rectangles with their bases
along $Ox$.
Let $\delta_1, \delta_2, \ldots$ be the lengths of these bases; $m_1, m_2, \ldots$ and $M_1, M_2, \ldots$ be the lower and
upper bounds of $f$ in the corresponding intervals. If we assume these $\delta$ are given, i.e., the
parallels to $Oy$ have been drawn, and if we choose the segments parallel to $Ox$ in such a
way as to draw rectangles of very small size (to obtain the most approximate values possible
for the extents), we will obtain for these approximate values of extents:
\[ s = \sum \delta_i m_i, \qquad S = \sum \delta_i M_i. \]
Thus, we know how to calculate the two extents of the domain. We show that they are equal.
The problem we have posed, therefore, has a meaning, and we know how to solve it.
If $f$ is a bounded function, Darboux showed¹³ that the two sums $s$ and $S$ tend towards
perfectly determined limits. He called them integral by defect and integral by excess. When
these two integrals are equal, which occurs for functions other than continuous functions,
the function is called integrable, and the common limit of $s$ and $S$ has been called, since Riemann,¹⁴
the definite integral of $f$ taken between $a$ and $b$.
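The two sums $s$ and $S$ can be illustrated numerically. The following is only a minimal sketch, not part of the original text: it assumes an increasing function (here the hypothetical choice $f(x) = x^2$ on $[0, 1]$) and equal bases $\delta_i$, so that the lower bound $m_i$ is attained at the left end of each base and the upper bound $M_i$ at the right end.

```python
def darboux_sums(f, a, b, n):
    """Return (s, S) for an increasing f on [a, b], using n equal bases.

    Assumption (for illustration only): f is monotonically increasing, so on
    each base the lower bound m_i is f(left end) and the upper bound M_i is
    f(right end).
    """
    delta = (b - a) / n
    xs = [a + i * delta for i in range(n + 1)]
    s = sum(delta * f(xs[i]) for i in range(n))       # sum of delta_i * m_i
    S = sum(delta * f(xs[i + 1]) for i in range(n))   # sum of delta_i * M_i
    return s, S


# Hypothetical example: f(x) = x^2 on [0, 1], whose integral is 1/3.
s, S = darboux_sums(lambda x: x * x, 0.0, 1.0, 1000)
print(s, S, S - s)
```

For an increasing function and equal bases, $S - s$ equals $(f(b) - f(a))\,\delta$, so the two sums enclose the integral ever more tightly as the bases shrink.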
Summable Function
Geometric viewpoint: To understand a summable function geometrically, Lebesgue associated
with $f$ the set $E$ of points whose coordinates satisfy the following two inequalities:
\[ a \le x \le b, \qquad 0 \le y \le f(x) \]
is the integral.
When the function $f$ takes both positive and negative values, we associate with it a set $E$ of
points whose coordinates satisfy the following three inequalities:
\[ a \le x \le b, \qquad y\,f(x) \ge 0, \qquad 0 \le y^2 \le f(x)^2. \]
In this case, set $E$ is the sum of two sets, $E_1$ and $E_2$, formed from points with positive
ordinates for $E_1$ and negative ordinates for $E_2$.¹⁵ The integral by defect is the interior extent
of $E_1$ minus the exterior extent of $E_2$. The integral by excess is the exterior extent of $E_1$
minus the interior extent of $E_2$. If $E$ is $J$ measurable (in which case $E_1$ and $E_2$ are also $J$
measurable), the function is integrable, the integral being $m(E_1) - m(E_2)$.
These results immediately lead to the following generalisation: If the set $E$ is measurable
(in Lebesgue’s sense), in which case $E_1$ and $E_2$ are also measurable (in Lebesgue’s sense),
the definite integral of $f$, taken between $a$ and $b$, is the quantity:
\[ m(E_1) - m(E_2). \]
\[ m_i(E_1) - m_e(E_2), \qquad m_e(E_1) - m_i(E_2). \]
These two numbers are contained between the integrals by defect and excess.
Analytic viewpoint: Since . E is measurable, it is contained in a set . E and contains a set
. E where . E and . E are . B measurable and of measure .m(E), .§7. Further, the reasoning
which gave us this result proves that we can assume . E and . E formed of segments parallel
to . O y and have their base on . O x, that is, they correspond to two functions . f 1 and . f 2 such
that . f 1 ≥ f 2 .
15 It does not matter whether the points $x \in (a, b)$ for which $f(x) = 0$ are counted in $E_1$ or $E_2$.
1.7 Integration in Lebesgue’s Sense 35
Lebesgue showed that the set of values of $x$ for which $f(x)$ is greater than $m > 0$ is
measurable. Similarly, the set of values of $x$ for which $f(x)$ is less than $m < 0$ is measurable.
Hence, it follows that the set of values of $x$ for which $f(x)$ is less than or equal to $m > 0$
(greater than or equal to $m < 0$) is measurable. Therefore, the set of points for which either
$a \ge f(x) > b > 0$, or $0 > c > f(x) \ge d$, or $e \ge f(x) \ge g$ with $g \cdot e < 0$, is measurable, and
by letting $b$ tend towards $a$, or $d$ towards $c$, or $e$ and $g$ towards $0$, it was seen that the set of
points for which $y$ has a given value ($y = k$) is measurable. In short, irrespective of the signs
of $a$ and $b$, if $f$ is summable, then the set of values of $x$ for which
\[ a > f(x) > b \]
is measurable.
Conversely, if for any $a$ and $b$ the set of values of $x$ for which $a > f(x) > b$ is measurable,
and if the function $f$ is bounded, then it is summable.
Geometric Definition of Integral of Summable Functions
Lebesgue’s work distinguishes him from his predecessors, whose work will be discussed
here. His revolutionary idea of looking at the integration process by partitioning the co-
domain of a function is as follows.
Lebesgue divided the interval of variation of $f(x)$. He considered $a_0, \ldots, a_n$ as the
points of division, and $e_i$ ($i = 0, 1, \ldots, n$) as the set of values of $x$ for which $f(x) = a_i$
(See [5]).
Let $e_i'$ ($i = 0, 1, \ldots, n - 1$) be the set of values of $x$ for which $a_i < f(x) < a_{i+1}$.
The points in the set $E$ attached to $f(x)$, corresponding to the values of $x$ belonging to
$e_i$, form a measurable set in the plane with a measure of $|a_i| \cdot m_l(e_i)$ (where $m_l(e_i)$ denotes
a linear measure).
Therefore, $E$ contains a set of measure
\[ \sum_{0}^{n} |a_i|\, m_l(e_i) + \sum_{1}^{n} |a_{i-1}|\, m_l(e_i') \]
These two measures differ by less than $(a_n - a_0)\alpha$, where $\alpha$ is the maximum of $a_i - a_{i-1}$;
therefore, they can be made as close as desired, and $E$ is measurable; thus, $f$ is summable.
Additionally, it is known how to calculate the measure of $E$; therefore, if $f$ is positive,
the integral is the common limit of two sums
36 1 Historical Perspective
\[ \sigma = \sum_{0}^{n} a_i\, m_l(e_i) + \sum_{1}^{n} a_{i-1}\, m_l(e_i'), \]
\[ \Sigma = \sum_{0}^{n} a_i\, m_l(e_i) + \sum_{1}^{n} a_i\, m_l(e_i') \]
when $a_i - a_{i-1}$ tends towards zero. Now if $f$ is not always positive, the limit of the sum of
those terms of $\sigma$ or $\Sigma$ which are positive gives a measure of the set that is denoted by $E_1$
(see Sect. 1.7), and the limit of the sum of the negative terms gives $-m(E_2)$; therefore, in all
cases $\sigma$ and $\Sigma$ define the integral.
Analytic Definition of Integral of Summable Functions
It will be useful to show that analytic reasoning could have led to the consideration of
summable functions and to what are now called their integrals.
Lebesgue considered a monotonically increasing continuous function $f(x)$ defined
between $\alpha$ and $\beta$ ($\alpha < \beta$) and varying between $a$ and $b$. For $x$ the following values are
taken arbitrarily
\[ \alpha = x_0 < x_1 < x_2 < \cdots < x_n = \beta \]
In the ordinary sense of the word, the definite integral is the common limit of the two sums
\[ \sum_{1}^{n} (x_i - x_{i-1})\, a_{i-1} \quad \text{and} \quad \sum_{1}^{n} (x_i - x_{i-1})\, a_i \]
$f(x) = a_i$ for the points of a closed set $e_i$ ($i = 0, 1, \ldots, n$); $a_i < f(x) < a_{i+1}$ for the points
of a set, sum of intervals, $e_i'$ ($i = 0, 1, \ldots, n - 1$); the sets $e_i$, $e_i'$ are measurable.
The two quantities
\[ \sigma = \sum_{0}^{n} a_i\, m(e_i) + \sum_{1}^{n} a_i\, m(e_i'), \]
\[ \Sigma = \sum_{0}^{n} a_i\, m(e_i) + \sum_{1}^{n} a_{i+1}\, m(e_i') \]
tend towards $\int_\alpha^\beta f(x)\,dx$ when the number of $a_i$ increases in such a way that the maximum
of $a_i - a_{i-1}$ tends towards zero.
This property being obtained, it can be used to define the integral of $f(x)$. However,
the two quantities $\sigma$ and $\Sigma$ have a meaning for functions other than continuous, namely
summable functions. Lebesgue showed that for these functions, $\sigma$ and $\Sigma$ have the same
limit, which is independent of the choices of $a_i$; this limit would be, by definition, the
integral of $f(x)$ taken between $\alpha$ and $\beta$.
When, between the $a_i$, new points of division are introduced, $\sigma$ does not decrease, $\Sigma$
does not increase, therefore $\sigma$ and $\Sigma$ have limits. They are equal because $\Sigma - \sigma$ is at most
equal to $(\beta - \alpha)$ multiplied by the maximum of $(a_i - a_{i-1})$.
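The behaviour of the two sums built on a division of the range can be checked on a simple case. The sketch below is an illustration only, under stated assumptions: it takes the hypothetical function $f(x) = x^2$ on $[0, 1]$ and equally spaced division points $a_i = i/n$ of the range. For this function the set where $f = a_i$ is a single point (measure zero), and the set where $a_i < f < a_{i+1}$ is the interval $(\sqrt{a_i}, \sqrt{a_{i+1}})$, so the measures are known in closed form.

```python
import math


def lebesgue_sums(n):
    """Lower and upper sums over a division of the RANGE of f(x) = x^2 on [0, 1].

    Assumptions (illustrative, not from the text): division points a_i = i/n;
    the level sets {f = a_i} are single points and contribute measure 0; the
    set {a_i < f < a_{i+1}} is the interval (sqrt(a_i), sqrt(a_{i+1})).
    """
    a = [i / n for i in range(n + 1)]
    widths = [math.sqrt(a[i + 1]) - math.sqrt(a[i]) for i in range(n)]
    lower = sum(a[i] * widths[i] for i in range(n))        # plays the role of sigma
    upper = sum(a[i + 1] * widths[i] for i in range(n))    # plays the role of Sigma
    return lower, upper


sigma, Sigma = lebesgue_sums(1000)
print(sigma, Sigma, Sigma - sigma)
```

Here the difference between the two sums is exactly the length of the interval of $x$ (which is 1) times the spacing $1/n$ of the division points, in line with the bound stated above.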
Next, the interval of variation of $f(x)$ was partitioned in another way using points $b_i$. In
this mode $\sigma'$ and $\Sigma'$ are the corresponding values of $\sigma$ and $\Sigma$. Again, $\sigma''$ and $\Sigma''$ are the
corresponding values from the division mode in which $a_i$ and $b_i$ are used simultaneously.
The pair of inequalities
\[ \sigma \le \sigma'' \le \Sigma'' \le \Sigma, \qquad \sigma' \le \sigma'' \le \Sigma'' \le \Sigma' \]
proves that the six sums $\sigma, \sigma', \sigma'', \Sigma, \Sigma', \Sigma''$ have the same limit.
Hence, the existence of the integral is demonstrated. When adopting this slightly different
approach, it is apparent that Riemann’s definition of integral consistently agrees with the
previous one. This agreement is supported by the fact that the points of discontinuity in
integrable functions constitute a set of measure zero.¹⁶ Let $f(x)$ be an integrable function
and let $E$ be the set of points for which we have
\[ a \le f(x) \le b \]
where $a$ and $b$ are any two numbers. The limit points of $E$ that do not belong to $E$ are the
points of discontinuity; therefore, they form a set $e$ of measure zero. $E + e$ being closed is
measurable, $e$ is measurable, and so is $E$. This is sufficient to conclude that $f$ is summable.
If in an interval of length $l$, the maximum of $f$ is $M$ and the minimum $m$, the integral is
contained between $lM$ and $lm$. Hence, it follows that the integral of a summable function
is included between the integrals by defect and excess, and in particular, the two integral
definitions coincide when they are both applicable.
16 Riemann stated this property in the following way: For a function to be integrable, it is necessary
that “the total sum of the intervals, for which the oscillations are greater than $\sigma$, for any $\sigma$, can be
made infinitesimally small”.
Cardinality of Set of Summable Functions
Lebesgue showed that the elementary arithmetic operations applied to summable functions
give back summable functions. Specifically, the sum of any finite number of summable
functions is a summable function and the integral of the sum function is the sum of the
integrals of individual term functions. Similarly, the product of two summable functions is
a summable function, and the reciprocal of a summable function, if it is bounded away from
$0$, is a summable function. The $k$th root of a summable function, if it exists, is summable. If
$f$ and $\varphi$ are two summable functions, and if $f(\varphi)$ exists, then it is summable.
Another important result was deduced: if a bounded function $f$ is the limit of a
sequence of summable functions $f_i$, then $f$ is summable.
These results led to the definition of an important class of summable functions: the
polynomials. Since $y = h$ and $y = x$ are summable functions, $p x^q$ is summable; hence,
all polynomials are summable.
Ever since Weierstrass, it has been known that every continuous function is a limit of
polynomials; hence every continuous function is summable.
However, apart from continuous functions, there exist other functions that are limits of
polynomials, as studied by M. Baire, referred to as class one functions [17]. Consequently,
these first-class functions also fall under the category of summable functions. The limits of
first or second-class functions are also summable, etc. All functions from the set denoted by
$E$ by M. Baire (p. 70 Loc. Cit.) are summable.
These results offer numerous examples of summable functions that are both discontinuous
and non-integrable (in Riemann’s sense). We can generate such examples using the following
method: The reasoning from Sect. 1.7.1 showed that every integrable function is summable,
proving that if, after ignoring a set of measure zero, a set remains where a function is
continuous at each point, that function is summable. For instance, let $f$ and $\varphi$ be two
continuous functions and let $F$ be equal to $f$ except at the points of a set $E$ of
measure zero, where $F = f + \varphi$. If $\varphi$ is never zero and $E$ is dense in every
interval, then all points are points of discontinuity for $F$, which is therefore
non-integrable (in the sense of Riemann).
This approach enables the construction of summable functions that form a set equal in
cardinality to the set of functions.
Term-by-Term Integration of Series
The calculation of the integral of a given function presents the same difficulties as the
calculation of the measure of a given set. Most of the discontinuous functions that we have
considered so far in Analysis were defined by series, therefore it may be worthwhile to note
the following result:
Both functions on the right-hand side have integrals, and so does $f$, and the integral of $f$ is
the sum of the integrals of $f_n$ and $f - f_n$. Now, an upper limit of the second integral is to
be found.
A positive number $\varepsilon$ is arbitrarily chosen. Let $e_n$ be the set of values of $x$ for which one
does not have
\[ |f - f_{n+p}| < \varepsilon \]
However, each set $e_n$ contains all sets with greater indices, and there exists no point
common to all $e_n$. Therefore, $m(e_n)$ tends towards zero with $\frac{1}{n}$, and consequently the same
goes for
\[ \int (f - f_n)\,dx \]
When $f$ is bounded the proposition can be stated thus: When a sequence $f_1, f_2, \ldots$ of
summable functions, bounded from above in absolute value in their set, has a limit $f$, the
integral of $f$ is the limit of integrals of functions $f_n$.
Here is another form of the statement relative to the general case.
When the set of the remainder of a convergent series of functions having integrals is
bounded from above in absolute value, the series is term-by-term integrable.
As a very particular case, the theorem on the integration of uniformly convergent series
follows from the above.
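A small numerical check can make the statement for bounded sequences concrete. The sketch below is an illustration under assumptions that are not in the text: it uses the hypothetical sequence $f_n(x) = x^n$ on $[0, 1]$, which is bounded by 1 in absolute value and converges (not uniformly) to a function equal to 0 for $x < 1$ and 1 at $x = 1$, whose integral is 0. The integrals are approximated with a plain midpoint rule.

```python
def integral_of_power(n, steps=100_000):
    """Midpoint-rule approximation of the integral of x**n over [0, 1]."""
    h = 1.0 / steps
    return sum(((k + 0.5) * h) ** n for k in range(steps)) * h


# The exact values are 1 / (n + 1); they tend to 0, the integral of the limit.
exponents = (1, 10, 100, 1000)
values = [integral_of_power(n) for n in exponents]
for n, v in zip(exponents, values):
    print(n, v, 1 / (n + 1))
```

The convergence of $x^n$ is not uniform on $[0, 1]$, so this case is covered by the statement for bounded sequences but not by the classical theorem on uniformly convergent series.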
17 The most interesting particular case of this theorem, the one where $f$ and the $f_i$ are continuous
functions, has already been obtained, using very different considerations, by M. Osgood in his Memoire
on non-uniform convergence (American Journal, 1894).
Therefore, every function having an indefinite integral admits an infinite number of indefinite
integrals which differ only by a constant $F(a)$.
The indefinite integral is a continuous function.¹⁸ This is obvious if the function $f(x)$
is bounded. The following inequality must be shown to prove it in a general case, which is
stated without proof.
\[ |F(a + h) - F(a)| = \left| \int_a^{a+h} f(x)\,dx \right| < \varepsilon, \]
hence,
\[ m < \frac{F(a + h) - F(a)}{h} < M, \]
therefore for $x = a$, if $f(x)$ is continuous at this point, $F(x)$ has a derivative equal to $f(a)$.
If $f(x)$ is continuous for every value of $x$, $F(x)$ is any one of the functions that has $f(x)$
as its derivative, i.e., one of the primitive functions of $f(x)$.
Thus, in the case of continuous functions, there is an identity between the search for
primitive functions and the search for indefinite integrals of a given function. This
well-known result still holds when dealing with a function whose derivative is integrable in the
sense of Riemann.¹⁹ However, there exist derivative functions that are not integrable in
Riemann’s sense;²⁰ given one of these functions, its primitive function cannot be calculated
using integration in the sense of Riemann.
It was shown that any bounded derivative function has an indefinite integral that is one
of its primitive functions. Therefore, the primitive function of a given bounded function can
be calculated if it exists.
18 We can also add that it is of bounded variation; this variation being at most equal to $\int |f(x)|\,dx$.
The proof is the same as given by M. Jordan, Cours d’Analyse §81 [18].
19 See Darboux [4].
20 M. Volterra gave the first example of such functions. Giornale de Battaglini, t. XIX, 1881.
In the case of unbounded derivative functions, it was demonstrated that if they have
integrals, there is an identity between their primitive functions and their indefinite integrals.
Primitive of Bounded Derivative Functions
The derivative of a function $f(x)$ is the limit of the expression:
\[ \frac{f(x + h) - f(x)}{h} = \varphi(x) \]
when $h$ tends towards zero, which, when $h$ is fixed, represents a continuous function;
therefore, the derivative is the limit of continuous functions, and so it is summable.
Let us assume that the derivative $f'$ is always less than $M$ in absolute value. By virtue
of the theorem of finite increment, $\varphi(x) = f'(x + \theta h)$; therefore the functions $\varphi(x)$ are
bounded in their set, and as a result, we have (see Sect. 1.7.1)
\[ \int_a^b f'\,dx = \lim \int_a^b \varphi(x)\,dx = \lim_{h \to 0} \frac{1}{h} \left[ \int_x^{x+h} f(x)\,dx \right]_a^b ; \]
therefore,
\[ \int_a^b f'\,dx = f(b) - f(a). \]
Every bounded derivative function admits its primitive functions as its indefinite integrals;
this result remains true even if it is a right or left bounded derivative or a limit towards which
$\varphi(x)$ tends for certain values of $h$ tending towards zero.
The Necessary and Sufficient Condition for Existence of Integral of a Derivative Func-
tion
The primitive functions discussed above are of bounded variation.²¹ Lebesgue showed that
the necessary and sufficient condition for the integral of the derivative (whether bounded or
not) of a differentiable function to exist is that the function must be of bounded variation. If
this condition is met, the function is one of the indefinite integrals of its derivative.
Further Results and An Example
Some more results can be derived from the necessary and sufficient condition for the integral
of $|f'|$ to exist, which may be obtained by resuming the above reasoning.
The total positive variation of $f(x)$ between $a$ and $x$ is equal to the integral of $f'(x)$
extended to the set of points for which $f'$ is positive, which means to the integral
\[ p(x) = \frac{1}{2} \int_a^x (f' + |f'|)\,dx. \]
Similarly, for the negative variation, we have:
\[ -n(x) = \frac{1}{2} \int_a^x (f' - |f'|)\,dx. \]
And as we have:
21 Lebesgue used some of the properties of these functions here (See Jordan, Comptes Rendus de
l’Academie des Sciences 1881 and Cours d’Analyse [18].)
Thus, when a function $f(x)$ is given, we know how to find out if it is the derivative of a function
of bounded variation, and, if it is so, we know how to find its primitive functions. If $f(x)$
is bounded, its primitive functions, if they exist, are of bounded variation, and we can find
them.
An Example of a Summable Unbounded Function That Does Not Have an Integral
The function $f(x)$ given by
\[ f(x) = x^2 \sin\frac{1}{x^2} \quad \text{for } x \ne 0, \qquad f(0) = 0, \]
has a derivative $f'(x)$ that provides an example of an unbounded function that is summable
but does not have an integral. The primitive function $f(x)$ of the summable function $f'(x)$
cannot be found by using previous methods.
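The reason this example escapes the previous methods can be seen numerically: the function $x^2 \sin(1/x^2)$ is not of bounded variation near $0$, so by the necessary and sufficient condition stated earlier its derivative has no integral. The sketch below is an illustration only; the sample points $x_k = 1/\sqrt{(k + 1/2)\pi}$ are a choice made here (they are the points where the sine equals $\pm 1$), and the sum of the jumps between consecutive sample points is a lower bound for the total variation on $[0, 1]$.

```python
import math


def f(x):
    """f(x) = x^2 sin(1/x^2) for x != 0, and f(0) = 0."""
    return x * x * math.sin(1.0 / (x * x)) if x != 0 else 0.0


def variation_lower_bound(N):
    """Sum of |f(x_{k+1}) - f(x_k)| over the first N sample points.

    The points x_k = 1 / sqrt((k + 1/2) * pi) are chosen for illustration:
    there sin(1/x^2) = +1 or -1 alternately, so f(x_k) = (-1)^k / ((k + 1/2) pi).
    """
    xs = [1.0 / math.sqrt((k + 0.5) * math.pi) for k in range(N + 1)]
    return sum(abs(f(xs[k + 1]) - f(xs[k])) for k in range(N))


for N in (100, 10_000, 1_000_000):
    print(N, variation_lower_bound(N))
```

The printed values keep growing, roughly like $(2/\pi)\ln N$, because the jumps behave like the terms of the harmonic series; no finite bound on the variation can therefore exist.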
It is interesting to note that the classical definition of the integral of a function that
becomes infinite in the vicinity of a point allows us to find $f(x)$ when $f'(x)$ is known. In
cases where the function to be integrated is unbounded, the definition Lebesgue has adopted
is not a generalisation of the classical definition; it differs from this definition but agrees
with it when both apply. It would also be very easy to generalize the notion of a definite
integral so that the classical definition and the one given by Lebesgue become special cases
of a more general definition.
We have seen that every indefinite integral is continuous. If now we consider this property
as one of the parts of the definition of indefinite integrals, we are led to the following result:
A function $f(x)$ defined on $(\alpha, \beta)$ has an indefinite integral $F(x)$ in this interval, if there
exists one and only one continuous function $F(x)$, up to an additive constant, such that
\[ F(b) - F(a) = \int_a^b f(x)\,dx \]
for all $a$ and $b$ between $\alpha$ and $\beta$ for which the right-hand side has a meaning.²²
Remarks of M. E. Borel on Lebesgue’s Work
M. E. Borel, in his Memoire,²³ speaks about the work of H. Lebesgue. He writes:
It has been particularly painful for me to find myself compelled to discuss questions of priority
with Mr. Lebesgue; for twenty years, in fact, I have been several times directly or indirectly
aware of discussions of priority between Mr. Lebesgue and various mathematicians, and while
I often thought that Mr. Lebesgue’s opponents were right, I never believed that Mr. Lebesgue
was wrong. This stance, which has often led his opponents to accuse me of unfairness, is
based on my opinion of his good faith and the nature of his mind. I have no reason to change
this opinion just because I find myself personally involved: patere legem quam fecisti. I am
therefore led, after explaining why I believed I was right, to indicate why I am convinced that
Mr. Lebesgue is not wrong in his judgment of his own work.
To do this, I must try to explain what, in my eyes, characterizes Mr. Lebesgue’s talent; I will
not try to confine it in narrow formulae nor to specify in a few sentences his considerable
contribution; even if I believed myself capable of doing so, I would not, because reading the
passages of his Memoir, where he gives some details about his work and the stages of his
thought, proves to all those who might have been unaware that, if he were to write a detailed
Notice on his work, he would render the greatest service to Science. But I believe I can say,
without fear of contradiction, that Mr. Lebesgue’s talent consists mainly of a great capacity
for assimilation, rare originality of thought, and exceptional creative power; all his works are
original and belong entirely to him, are entirely personal to him, because he never takes an
idea from someone else as it is, to develop it and use it; if a foreign idea is useful to him, it is
in the way a powerful tree creates leaves and fruits with all the nutrients it finds in the soil.
My teacher Jules Tannery often quoted a sentence from Liouville; after comparing long demon-
strations to short ones, he concluded: “In short, long demonstrations have a great advantage,
which is to be long, and short demonstrations have a great advantage, which is to be short.”
And Liouville recalled, it seems, when he was in a good mood, the joke attributed to Lagrange:
“Mathematics is like the pig, everything about it is good.” Mr. Lebesgue’s demonstrations have
a very great advantage, in that they are complicated; this complexity brings an extraordinary
richness, a richness of thought such that Mr. Lebesgue hardly manages to fit all his ideas into a
book or a Memoir. There are many ideas that he does not express; there are others that are only
hinted at or incidentally mentioned. Therefore, Mr. Lebesgue alone knows all the richness of
his thought and work; it is not surprising, therefore, if he perhaps more frequently than others
has to make claims of priority. They are certainly always justified in his eyes, and that is why
I have always assured those who complained about him that they had no right to hold it against
him; today I address this exhortation to myself, in order to soften the pain caused by certain
passages of his Memoir.
22 Compare this definition with the one given by M. Jordan of the definite integral of an unbounded
function. Cours d’Analyse, pp. 46–94 [2].
23 See pp. 71–92 [19].
Stieltjes, in his paper of 1894 [20], Recherches sur les fractions continues, did
path-breaking work by creating an analytic theory of continued fractions. In this paper, he
established the connections between function theory and integral theory which were essential for a
better understanding of an important class of continued fractions. Stieltjes had been working
on the problem of summation of divergent power series for quite a while. In his thesis
of 1886 [21], Stieltjes did fundamental work on remainders in several asymptotic series.
Then, in the period 1889–1890, he published quite a few papers containing examples of
continued fraction expansions for asymptotic series. Such series arose as formal power
series expansions of definite integrals of the form:
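\[ \int_0^\infty \frac{f(u)\,du}{z + u}, \]
where $f(u) > 0$, and the continued fractions are of the form:
\[ \cfrac{1}{z + \cfrac{a_1}{1 + \cfrac{a_2}{z + \cfrac{a_3}{\ddots}}}}, \]
\[ \cfrac{1}{z + b_1 - \cfrac{p_1}{z + b_2 - \cfrac{p_2}{z + b_3 - \cfrac{p_3}{\ddots}}}}, \]
\[ \cfrac{1}{k_1 z + \cfrac{1}{k_2 + \cfrac{1}{k_3 z + \cfrac{1}{\ddots}}}}, \]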
in which $k_p$ are real and positive and $z$ is a complex variable. Stieltjes found that in some
cases the value of the continued fraction has the form
\[ \int_0^\infty \frac{f(u)\,du}{z + u}, \]
where $f(u)$ is a positive function of $u$, while in some other cases, the value of the continued
fraction has the form
\[ \sum_{p=1}^{\infty} \frac{L_p}{z + x_p}, \]
where the $L_p > 0$ and $0 < x_1 < x_2 < x_3 < \cdots$. Further, in other cases, the value of the
continued fraction may be a sum of an integral and an infinite series of the above form. This
situation led Stieltjes to define an integral of the form²⁴
\[ \int_0^\infty \frac{d\varphi(u)}{z + u}, \]
encompassing all three types of functions. This integral is indeed the Stieltjes integral, which
we see in the modern form.
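How a single integral with respect to $\varphi$ can reproduce the series case may be seen from a finite sum. The following sketch is illustrative only and rests on assumptions not found in the text: $\varphi$ is taken to be a pure step function with hypothetical jumps $L_p$ at hypothetical points $x_p$, the infinite range is truncated to $[0, 4]$, $z$ is a positive real number, and the integral is approximated by a sum of the form $\sum g(u_i)\,[\varphi(u_i) - \varphi(u_{i-1})]$ with $g(u) = 1/(z + u)$.

```python
def stieltjes_sum(g, phi, lo, hi, n):
    """Approximating sum of g with respect to phi on [lo, hi] with n equal cells.

    g is evaluated at the right end of each cell; only cells across which phi
    changes contribute to the total.
    """
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        u0, u1 = lo + i * h, lo + (i + 1) * h
        total += g(u1) * (phi(u1) - phi(u0))
    return total


# Hypothetical data: jumps L_p at the points x_p (keys are x_p, values are L_p).
jumps = {1.0: 0.5, 2.0: 0.25, 3.0: 0.125}


def phi(u):
    """Step function: the sum of all jumps located at or before u."""
    return sum(L for x, L in jumps.items() if x <= u)


z = 2.0
approx = stieltjes_sum(lambda u: 1.0 / (z + u), phi, 0.0, 4.0, 4000)
exact = sum(L / (z + x) for x, L in jumps.items())
print(approx, exact)
```

With a step function the sum collapses to the finite analogue of $\sum L_p / (z + x_p)$; a smooth increasing $\varphi$ would instead give an ordinary integral with $f = \varphi'$, which is why one definition can cover both situations.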
The Riemann integral is meaningful when the integrated function is continuous or when its
points of discontinuity form a set of measure zero. On the other hand, the Lebesgue integral
applies to any function $f$ that is measurable and bounded or summable. Both integrals,
calculated between $a$ and $x$, yield a continuous function of $x$, say $F(x)$ with $a$ fixed, and
their derivative is $f$, excluding a set of points of measure zero. However, there exist derivative
functions that are integrable neither in the Riemann sense nor in the Lebesgue sense. Denjoy
developed a calculation method that works for any derivative function, resulting
in a continuous function with $f$ as its derivative.
Denjoy extended the Lebesgue integral, establishing a link between totalisation and
integration in a clear, coherent, and expressive language. Integration primarily focuses on
the definite integral, while totalisation emphasises the indefinite total. Totalisation aligns
more closely with the integration of differential equations than with the computation of
quadrature. Notably, the definite total aligns with Duhamel’s definite integral discussed in
Chap. 7.
Denjoy initially defined a sequence of finite or transfinite operations called Totalisation,
performed on a function $f(x)$ assumed to be finite throughout the interval $(a, b)$. If these
operations are feasible, $f(x)$ is considered totalisable. Totalisation associates $f(x)$ with a
continuous function $F(x)$ over the entire $(a, b)$, representing the indefinite total of $f(x)$.
$F(x)$ is determined up to an additive constant. For any interval $(l, m)$ in $(a, b)$, the difference
$F(m) - F(l)$ of $F(x)$ in $(l, m)$ represents the definite total of $f(x)$ in $(l, m)$. Hence,
totalisation serves the dual purpose of obtaining a function (indefinite total) or a value (definite
total).
Operations of Totalisation
Arnaud Denjoy, in his Memoire [22], considered the totalisation of $f$ in an interval
$(a, b)$, $a < b$ (this is for calculating a definite total; for calculating an indefinite total it
may be $(a, x)$) and established some rules, concepts, definitions, and conditions. He applied
these rules to $f$ (the function to be totalised) indefinitely many times. The following concepts
may be defined.
points from $P$ or, (c) at a point of $P$, if $\varphi$ is integrable (non-integrable) over that interval
or at that point. The points of $P$ where $f$ is non-integrable form a closed set.
2. Absolute convergence of series in an interval. Denjoy associated a number $A_n$ with
each of the contiguous intervals $u_n$ of a perfect set $P$. Denjoy called the series $A_n$
absolutely convergent in an interval $i$ if the series of numbers $A_n$ corresponding to
the intervals $u_n$ interior to $i$ is absolutely convergent. Therefore, the series $A_n$ is absolutely
convergent in any interval interior to $i$.
Absolute convergence of series at a point. He called the series $A_n$ absolutely
convergent at a point $M$ of $P$ if $M$ is interior to an interval $i$ where the series $A_n$ is
absolutely convergent.
The points of non-absolute convergence form a closed set.
\[ V(\alpha, \beta) = \sum V(u_n) + \int_P f, \]
Denjoy called $f$ totalisable in the interval $(a, b)$ if it satisfied the following conditions.
1. For any perfect set $P$ in $(a, b)$, the set of points of $P$ where $f$ is non-summable over $P$
is non-dense on $P$.
2. Continuity of $V$: If $V(c', d')$ has been calculated for any $c' < d'$, both $c'$, $d'$ interior to
$(c, d)$, $V(c', d')$ tends to a limit when $c'$ tends towards $c$ and $d'$ tends towards $d$.
3. For any perfect set $P$, if $V$ has been calculated in every interval $u_n$ contiguous to $P$, the
set of points of $P$ where the series $W_n$ is not convergent is non-dense on $P$.
With these concepts, definitions and conditions, we find the indefinite and definite total of
$f$ in the following.
Indefinite Totalisation
Denjoy assumed the indefinite totals $V_k(x)$ were known in the intervals $(a_k, b_k)$ such that
$a_k$ tend towards $a$ and $b_k$ tend towards $b$. Then, he formed the continuous function $V(x)$
in $(b_{k-1}, b_k)$; using Definition 2, the indefinite total $V(x)$ in $(a, b)$ can be found.
Definite Totalisation
Denjoy took a perfect set $P$ contained in an interval $(\alpha, \beta)$. He assumed that the totals of
$f(x)$ were known in various intervals $(l, m)$ contiguous to $P$, with respect to $(\alpha, \beta)$, and he
supposed that the series $\sum [V(m) - V(l)]$ provided by these totals is convergent and that
$f(x)$ is summable over $P$. Then, using Definition 3
\[ \int_P f\,dx + \sum [V(m) - V(l)] \]
1.10 Summary
mately settled by Fourier: a function need not have an explicit expression. Discontinuities,
in the modern sense, were admitted into the realm of admissible functions determining
the initial shape of the string. Dirichlet, in his 1829 memoir, further expanded this class
by including functions that were bounded, piecewise monotonic, piecewise continuous,
and even possessing infinite discontinuities within finite intervals. His iconic “Dirichlet
function,” with distinct values for rational and irrational $x$, exemplified functions defying
representation by series.
5. Riemann’s Integration: Taming Discontinuities with Sums. Section 1.3 focuses
on Riemann’s ground-breaking work on integration, born from his doctoral studies under
Dirichlet. Initially fascinated by representing functions using the Fourier series, he proved the
convergence of Fourier coefficients for bounded and integrable functions. Subsequently, Rie-
mann diverged into defining integration through Cauchy-Darboux-type sums. This approach
enabled integrating functions with densely packed discontinuities within the integration
interval. Here, the distribution of singularities held paramount importance in Riemann’s
analysis. Functions with reducible sets of discontinuities and zero mean oscillation were
deemed integrable. The integration process involved partitioning the domain of the function
(unlike Lebesgue’s later focus on the co-domain), followed by summation and limit-taking
procedures.
6. Critiques and Refinements: Pushing the Boundaries of Integration. The subse-
quent subsections analyse critiques of Riemann’s definition. One critical limitation was its
inability to explain the non-solvability of the fundamental theorem of calculus, as illustrated
by Dini’s counterexample. Term-by-term integration of series for Riemann integrable func-
tions also faced restrictions, demanding additional assumptions on the series’ sum function.
Furthermore, Riemann’s approach proved inadequate for resolving the multidimensional
integral problem without ambiguity.
7. Jordan’s Extent and Borel’s Measurable Sets: Stepping Stones to Lebesgue.
Section 1.4 dives into Camille Jordan’s concept of “extent,” a precursor to Lebesgue’s mea-
sure theory. Jordan assigned two numbers to sets—exterior and interior extents—enabling
integration definitions based on this framework. This led to the concepts of integration by
excess and defect, further paving the way for measure-theoretic advancements.
Section 1.5 explores the contributions of Émile Borel, who focused on constructing and
characterising measurable sets. He defined a method for constructing “Borel sets” through
operations like addition, subtraction, and intersection with intervals. He then provided a
measure specifically for Borel sets, capturing size in both topological and measure-theoretic
senses (illustrated by examples in Sect. 1.5.1).
Crucially, Borel acknowledged that his definition of measure might not encompass all
possible sets. However, he emphasised that Borel sets retain key properties similar to inter-
vals, making them a valuable stepping stone towards a more comprehensive measure theory.
8. Problem of Measure: As Visualised by Lebesgue. Section 1.6 unveils Henri
Lebesgue’s revolutionary approach to measure and integration. He began by formulating
the problem of measure based on a few key axioms. Unlike Borel, Lebesgue defined exte-
rior and interior measures of sets differently by enclosing the set and its complement in
progressively smaller intervals and taking limits. Sets with equal inner and outer measures
were deemed Lebesgue measurable.
Significantly, Lebesgue designed his measure theory to inherit the properties of line
segment measures (without common points) and allow countable additivity. He even esti-
mated the cardinality of sets measurable under each framework (Jordan, Borel, Lebesgue),
demonstrating that while Jordan and Lebesgue have the same cardinality, some Lebesgue-
measurable sets are not Borel-measurable.
9. Lebesgue’s Revolutionary Ideas of Integration. Section 1.7 dives into Lebesgue’s
original and fundamental integration theory. Here, he partitions the co-domain of the func-
tion, leveraging the concepts of measurable functions, summable functions, and domain
measure. This novel approach encompasses Riemann integration as a special case, over-
coming limitations like the non-solvability of the fundamental theorem and problematic
term-by-term integration of series. Lebesgue further explores indefinite integration, high-
lighting its connection to differentiation and establishing it as the inverse of the differentiation
process.
10. Stieltjes Integration: From Theory of Continued Fractions to Integration.
Section 1.8 explores Thomas Joannes Stieltjes’s generalisation of Lebesgue and Riemann
integrals. He integrated functions against functions of bounded variation (the integrator),
using continued fractions as a springboard for this novel method. This work had profound
ramifications for developing the theory of distributions, where the distributional derivative
of functions of bounded variation became known as the “Radon measure.”
11. Denjoy’s Work: Concept of Totalisation. Finally, Sect. 1.9 discusses Arnaud Denjoy’s
work, another generalisation of Lebesgue and Riemann integrals. Denjoy’s “totalisation”
method focuses on integrating functions that are not summable in the Lebesgue sense,
primarily addressing indefinite totalisation.
2 The Integration Before Riemann
Integration was first defined as the inverse operation of differentiation; it is the operation
that makes it possible to solve the problem of primitive functions:
Find the functions $F(x)$ which admit a given function $f(x)$ as their derivative.
We know that, if this problem has a solution, it has infinitely many, and that all the
primitive functions $F(x)$ of the same function $f(x)$ differ only by an additive constant.
We propose to find any one of these functions $F(x)$.
At the time when the problem of primitive functions was posed in the form that I have
indicated, that is, in the times of Newton and of Leibnitz, the word function was quite poorly
defined. Most often it denoted a quantity $y$ linked to the variable $x$ by an equation
involving a certain number of the usual symbols of operation. The foremost of
these operations are: arithmetic operations (addition, subtraction, multiplication, division,
extraction of roots), trigonometric operations (with the signs sin, cos, tan, arc sin, arc
cos, arc tan), and logarithmic and exponential operations (with the signs $\log$, $a^x$).
For a large number of functions expressed in this manner, the primitive functions could be
expressed in the same manner, so that it appeared to some that any function admits
a primitive function. Moreover, there was a ready reply to anyone who doubted this proposition.
Let the curve $y = f(x)$ (Fig. 2.1) represent the given function $f(x)$, the axes being
rectangular. Let us suppose, for simplicity, that $f(x)$ is positive; let $aA$, $bB$ be two lines
parallel to the $y$-axis, of abscissae $a$ and $b$. These two parallel lines, the arc $AB$ of the
curve, and the segment $ab$ of $Ox$ bound a domain of area $S(x)$. By evaluating the increment
$bBCc$ of this area, we see that $f(x)$ is the derivative of $S(x)$.1
1 For the proof as well as for the case where $f(x)$ is not always positive, see the classical treatises
on differential and integral calculus.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025
R. Jain, Lebesgue’s Theory of Integration,
https://doi.org/10.1007/978-981-96-1169-0_2
Let us note that in the previous considerations the word function has already received
a considerable extension. The relation between $S(x)$ and $x$ is in fact a geometric relation,
no longer an algebraic, trigonometric, or logarithmic one. Such relations were still
considered as defining functions. One only distinguished carefully between the geometric
figures defined with the help of laws expressible by geometric equalities and the figures which
were not defined in this way. The curves $y = f(x)$ of the first kind, or geometric curves,
defined the functions $f(x)$. The curves of the second kind, or arbitrary curves, did not define
true functions. When the word function was used for both types of correspondence
between $y$ and $x$, the first were distinguished from the second by being called continuous
functions.2
There was also an intermediate category of functions, the ones that were represented with
the help of several arcs of the geometric curves. We would more readily consider them as
forming parts of functions.
The continuous functions were the true functions. The word function was thus given quite a
restricted sense, because it was believed that any continuous function, whether defined
geometrically or not, could be represented by an analytic expression of the kind
previously discussed, and that this was impossible for functions which were
not continuous.
But Fourier showed that trigonometric series, which represented continuous functions
in a wide range of cases, could also be used to represent discontinuous functions formed
of parts of functions. In particular, a function equal to $0$ from $0$ to $\pi$ and to $1$ from
$\pi$ to $2\pi$ admits a convergent trigonometric expansion. Thus, the only criterion that made it
possible to distinguish the true functions from the false ones disappeared. It was necessary
either to extend the sense of the word function, or to restrict the category of algebraic,
trigonometric, and exponential expressions that could be used to define functions.
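As a hedged numerical illustration of Fourier's remark, the sketch below sums the standard sine expansion of the step function equal to 0 on (0, π) and to 1 on (π, 2π). The explicit coefficients, the helper name `step_partial_sum`, and the number of terms are assumptions introduced here for illustration; they are not quoted from the text.

```python
import math

def step_partial_sum(x, terms):
    """Partial sum of 1/2 - (2/pi) * sum_{k>=0} sin((2k+1)x) / (2k+1).

    This is the usual expansion of the function equal to 0 on (0, pi)
    and to 1 on (pi, 2*pi); `terms` is the number of sine terms kept.
    """
    s = sum(math.sin((2 * k + 1) * x) / (2 * k + 1) for k in range(terms))
    return 0.5 - (2.0 / math.pi) * s

# Sample one point inside each piece and the point of discontinuity itself.
for x in (math.pi / 2, 3 * math.pi / 2, math.pi):
    print(x, step_partial_sum(x, 2000))
```

Inside each piece the partial sums approach the value of the function; at the jump every sine term vanishes, so the sum stays at the mean value 1/2.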
Cauchy remarked that the difficulties that resulted from the research of Fourier arose
even when very simple expressions were used; that is, functions appeared as continuous or
discontinuous according to the method used to define them. Cauchy cited, as an example,
the function equal to $+x$ for $x$ positive and to $-x$ for $x$ negative. This function is not
continuous; it is formed of parts of the two continuous functions $+x$ and $-x$. It appears,
on the contrary, as continuous when it is defined by $+\sqrt{x^2}$.
To preserve the words continuous function in their original sense, it would therefore have
been necessary to consider only very particular analytic expressions.3 Cauchy preferred
instead to modify the definitions considerably.
For Cauchy, $y$ is a function of $x$ when, to each state of magnitude of $x$, there corresponds
a perfectly well-determined state of magnitude of $y$.
This definition appears to be the same as the one later given by Riemann. In reality,
however, the correspondences that Cauchy considered are still those that can be established
using analytic expressions. After defining functions, Cauchy adds: a function is said to be
explicit if the equation which links $x$ to $y$ is solved for $y$, and implicit if it is not. The fact
that the correspondences are established using analytic expressions never entered into the
reasoning of Cauchy, so that the properties and proofs obtained by Cauchy apply immediately
to the functions satisfying Riemann’s definition.4
For Cauchy, a function $f(x)$ is continuous at the value $x_0$ if, for any positive number
$\varepsilon$, we can find a number $\eta(\varepsilon)$ such that the inequality $|h| \le \eta(\varepsilon)$ implies
$$|f(x_0 + h) - f(x_0)| \le \varepsilon;$$
the function $f(x)$ is continuous in $(a, b)$ if the correspondence between $\varepsilon$ and
$\eta(\varepsilon)$ can be chosen independently of the number $x_0$ in $(a, b)$.
We recognize here the definitions that are now classic.
To prove the existence of primitive functions of continuous functions, it is sufficient to
consider the geometric proof indicated previously. In this proof, we have appealed to the
concept of area. This concept, already quite delicate for domains bounded
by simple geometric curves such as circles or ellipses, becomes even more delicate for
the domains involved in the proofs which concern us.
The curves which bound these domains are no longer necessarily geometric curves;
they could be formed of parts of geometric curves $(y = +\sqrt{x^2})$. Therefore, we know
that these curves could be complicated without knowing where this complication stops. Also,
3 This is what Méray did when he assigned to the word function a meaning very close to the one
given to continuous function in the past. Méray defined a function through its Taylor series and
its analytic continuation. If we adopt Méray’s definition, the existence of primitive functions
follows immediately from the properties of power series.
However, if we apply Méray’s definition to functions of a complex variable, we are necessarily
led, as M. Borel pointed out to me, to consider discontinuous functions of a real variable. For
example, when a Taylor series is convergent on its circle of convergence, its values on this circle
could define two discontinuous functions of a real variable.
4 I do not mean by this, that the definition of Cauchy is less general than Riemann’s but only that, if
there existed functions satisfying Riemann’s definition without satisfying Cauchy’s, they would not
be excluded from the reasoning.
Cauchy thought it necessary to clarify what should be understood by the number . S(x) in the
previous proof. All he had to do for this was to repeat the operations that were ordinarily
used to calculate the approximate value of . S(x) considered as area and to demonstrate that
these calculations led to a limit.5 We thus have the, now classical, proof of the existence of
primitive functions.
Let $(a, X)$ be the considered interval. Let us partition $(a, X)$ into partial intervals using
increasing numbers
$$a_0 = a,\ a_1,\ a_2,\ \ldots,\ a_{n-1},\ a_n = X,$$
and let us form the sum
$$S = (a_1 - a_0) f(x_1) + (a_2 - a_1) f(x_2) + \cdots + (a_n - a_{n-1}) f(x_n),$$
where $x_i$ is any number included between $a_{i-1}$ and $a_i$. We prove that $S$ tends to a definite
number $S(x)$ when the maximum of $a_i - a_{i-1}$ tends to zero in any manner.
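The limiting process just described can be tried numerically. The following is a minimal sketch, assuming the sample function $f(x) = x^2$ on $(0, 1)$, a randomly chosen partition, and randomly chosen intermediate points $x_i$; the function name `cauchy_sum` and these choices are illustrative assumptions, not part of the original.

```python
import random

def cauchy_sum(f, a, X, n, rng):
    """Return (S, mesh) for a random partition of (a, X) into n partial intervals.

    S is the sum of (a_i - a_{i-1}) * f(x_i), with x_i chosen at random in
    each partial interval; mesh is the largest length a_i - a_{i-1}.
    """
    cuts = sorted(rng.uniform(a, X) for _ in range(n - 1))
    points = [a] + cuts + [X]
    total = 0.0
    mesh = 0.0
    for left, right in zip(points, points[1:]):
        tag = rng.uniform(left, right)      # arbitrary point of the partial interval
        total += (right - left) * f(tag)
        mesh = max(mesh, right - left)
    return total, mesh

rng = random.Random(0)
for n in (10, 100, 1000, 10000):
    s, mesh = cauchy_sum(lambda x: x * x, 0.0, 1.0, n, rng)
    print(n, mesh, s)
```

For this particular $f$, whose derivative is bounded by 2 on $(0, 1)$, each sum differs from 1/3 by at most twice the largest partial interval, which is why the printed sums settle as the partition is refined, whatever the choice of the points $x_i$.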
The number $S(x)$ thus obtained is called the definite integral of the function $f(x)$ in the
interval $(a, X)$. Since Fourier, we represent it by the notation $\int_a^X f(x)\,dx$. Up to now
this symbol made sense only in positive intervals $(a, X)$, $(X \ge a)$; by definition, we set
$$\int_a^X f(x)\,dx + \int_X^a f(x)\,dx = 0.$$
5 Very often in mathematics we take, as Cauchy did here, the procedure for calculating a number
as its very definition. So much so that some mathematicians accept no definition of a number
other than one that permits its calculation.
The value of such a straightforward approach, as demonstrated by Cauchy, lies in its ability to
reduce the number of prerequisites needed to start our reasoning. It is said that Descartes reduced
geometry to algebra; however, this would not have been true if Cauchy, by his definition of the
integral, had not given a logical construction of concepts hitherto deduced from geometric
intuition, such as area and volume.
This was progress of immense philosophical importance; but, as the work of Cauchy did
not enrich the concept of integral, its mathematical interest is slight. For this reason,
Cauchy presented it primarily as a pedagogical tool.
2.2 Integration of the Discontinuous Functions 57
$\xi$ being included between $a$ and $b$;6 this is the mean value theorem.
The number $S(x)$ now being defined in a precise manner, we prove the existence of the
primitive function of $f(x)$ without difficulty. In fact, we have
$$\frac{S(x_0 + h) - S(x_0)}{h} = \frac{1}{h} \int_{x_0}^{x_0 + h} f(x)\,dx = f(x_0 + \theta h),$$
an equality which proves that the function $S(x)$ is continuous and has $f(x)$ as its derivative.
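The equality above can be checked numerically. This is only a sketch under stated assumptions: the sample function is $\cos x$, the lower limit is $a = 0$, and the integral is approximated by a midpoint rule; the names `integral` and `S` below are introduced here for illustration.

```python
import math

def integral(f, a, b, n=20000):
    """Midpoint-rule approximation of the definite integral of f over (a, b)."""
    step = (b - a) / n
    return step * sum(f(a + (i + 0.5) * step) for i in range(n))

def S(x, f=math.cos, a=0.0):
    """Approximate value of the definite integral of f from a to x."""
    return integral(f, a, x)

x0 = 1.0
for h in (0.1, 0.01, 0.001):
    quotient = (S(x0 + h) - S(x0)) / h
    # The quotient should equal f at some point between x0 and x0 + h.
    print(h, quotient, math.cos(x0), math.cos(x0 + h))
```

Since the cosine is decreasing near $x_0 = 1$, each quotient is expected to fall between $\cos(x_0 + h)$ and $\cos(x_0)$, in agreement with the value $f(x_0 + \theta h)$, and to approach $f(x_0)$ as $h$ tends to zero.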
The function $S(x)$ which figures in the previous proof, or more precisely the function
$$S(X) + K = K + \int_a^X f(x)\,dx = K_1 + \int_\alpha^X f(x)\,dx,$$
in which $K$ and $K_1$ are arbitrary constants and $\alpha$ is a value of $x$ taken in the interval
of definition of $f(x)$, is called the indefinite integral of the function $f(x)$ and is denoted
by $\int f(x)\,dx$. We see that the indefinite integral of the function $f(x)$ is the most general
function $F(x)$ such that we have, for any $\alpha$ and $\beta$ in the interval of definition of $f(x)$,
$$F(\beta) - F(\alpha) = \int_\alpha^\beta f(x)\,dx. \tag{2.1}$$
We also see that, for continuous functions, there is an identity between the indefinite integrals
and the primitive functions.7
In what preceded, the definite integral appeared as an element that enabled us to calculate the
primitive function. In practice, on the contrary, the primitive functions serve to calculate the
definite integrals. These definite integrals are limits of sums whose number of terms increases
indefinitely while the absolute values of the terms tend to zero. They are encountered in
a large number of questions of analysis, geometry and mechanics.8 For the calculation
6 This proof does not exclude the equalities $\xi = a$, $\xi = b$. In some cases it is good to prove
that we could choose $\xi$ different from $a$ and $b$; the proof is immediate.
The considered theorem is the mean value theorem for the function
$$S(x) = \text{const.} + \int_a^x f(x)\,dx;$$
it provides, in fact, an expression for the increment $S(b) - S(a)$ undergone by $S(x)$ when we
pass from $a$ to $b$.
7 This would not hold true if we had not introduced the constant $K$ in the definition of the
indefinite integral.
8 The simplest application of the concept of integrals is in the quadrature of plane domains. Because
of this application, we often trace the concept of the definite integral back to Archimedes and to the
of some of these limits of sums, for example, for the definition and the calculation of
the area bounded between a curve and its asymptotes, the integration of the continuous
functions alone is not sufficient anymore. We are thus led to focus on the integration of
functions that are infinite at some points or in the neighborhood of some points. On the other
hand, in certain applications of definite integrals—such as calculating the coefficients of
the trigonometric series representing a given function—it seems advantageous to define the
integral of a function which, while remaining finite, is discontinuous at some points. Ever
since the introduction of the notion of the definite integral, this concept has been extended
to certain discontinuous functions.
We were led to the definition that will be given later by assuming as a principle the
identity, observed in the case of continuous functions, between the indefinite integral and
the primitive function. Let us consider the function $f(x)$ which, for $x \neq 0$, is equal to
$\frac{1}{\sqrt[3]{x}}$.
tend to a definite limit when $h$ (positive) tends to zero, then we have by definition
quadrature of the parabola. It is true that many quadratures had been carried out before the
introduction of integral calculus, but mathematicians did not attach any particular importance
to the special domains whose areas must be calculated to obtain definite integrals. The importance
of these domains became quite evident only after the introduction of the concept of the derivative.
9 This function, not defined for $x = 0$, admits, as we know, a trigonometric expansion. We could also
denote it by $+\frac{\sqrt{x^2}}{x}$.
10 It is good to add that the definite integrals, that we could thus attach to the two kinds of discontinuous
functions we consider, permit us to express the coefficients of trigonometric expansion of the functions
using the formulae of Euler and Fourier which are applied in case of continuous functions.
11 Cauchy did not deal with the value of the function at the point $x = c$. Moreover, for him, if $f(x)$
tends to a definite value when $x$ tends to $c$, this limit value is $f(c)$. If $f(x)$ does not tend to a
unique limit, $f(c)$ is any one of the values included between the smallest and the largest of the
limits of $f(x)$. P. Du Bois Reymond adopted these conventions in some Memoirs.
$$\int_a^b f(x)\,dx = \lim_{h \to 0} \left[ \int_a^{c-h} f(x)\,dx + \int_{c+h}^b f(x)\,dx \right].$$
If12 in $(a, b)$ there exist several points of discontinuity, we partition $(a, b)$ into a sufficient
number of partial intervals so that, in each of them, there exists not more than one singular
point, if that is possible. Then we take the sum of the numbers thus obtained.
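A small sketch may help to separate the two situations mentioned in this passage and in the footnote on the principal value. It assumes the sample functions $1/\sqrt[3]{x}$ on $(-1, 8)$ and $1/x$ on $(-1, 2)$, both singular at $c = 0$, and evaluates the truncated integrals from known primitives rather than by numerical quadrature; the helper names are hypothetical.

```python
import math

def F_cuberoot(x):
    """A primitive of 1/cbrt(x) on each side of 0: (3/2) * |x|**(2/3)."""
    return 1.5 * abs(x) ** (2.0 / 3.0)

def F_reciprocal(x):
    """A primitive of 1/x on each side of 0: log|x|."""
    return math.log(abs(x))

def truncated_pieces(F, a, c, b, h):
    """Integrals over (a, c - h) and (c + h, b), computed from a primitive F."""
    return F(c - h) - F(a), F(b) - F(c + h)

for h in (1e-2, 1e-4, 1e-8):
    left, right = truncated_pieces(F_cuberoot, -1.0, 0.0, 8.0, h)
    log_left, log_right = truncated_pieces(F_reciprocal, -1.0, 0.0, 2.0, h)
    print(h, left, right, log_left, log_right, log_left + log_right)
```

For $1/\sqrt[3]{x}$ each piece has a limit of its own, so the definition applies. For $1/x$ neither piece has a limit, yet their sum stays equal to $\log 2$; that sum is what Cauchy called the principal value.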
These definitions are linked to the known criteria regarding the existence of integrals of
functions that are infinite around a point.
For research related to the theory of functions, and in particular for the study of
trigonometric series, Lejeune–Dirichlet extended the concept of integral. This research,
which Lejeune–Dirichlet mentioned himself, has never been published. But,
according to Lipschitz, it can be summed up as follows.
Let a function $f(x)$ be defined on a finite interval $(a, b)$, in which it is to be integrated.
Let $e$ be the set of points of discontinuity of $f(x)$. If $e$ contains only a finite number of
points, we apply the definitions of Cauchy.
According to Lipschitz, the case that Dirichlet studied is the one where the derived set $e'$
of $e$ contains only a finite number of points,13 as is seen, for example, for the function
$\frac{1}{\sin\frac{1}{x}}$, where $e'$ contains only $x = 0$.
The points of $e'$ then divide $(a, b)$ into a finite number of partial intervals. Let $(\alpha, \beta)$
be one of them. In $(\alpha + h, \beta - k)$, there are only a finite number of points of $e$. If, in this
interval, the definitions of Cauchy are not applicable, we say that the function does not
have an integral in $(a, b)$. If, on the contrary, they are applicable, we consider the integral
$\int_{\alpha+h}^{\beta-k} f(x)\,dx$ and we let $h$ and $k$ tend to zero simultaneously in an arbitrary manner. If
we do not obtain a definite limit, $f(x)$ does not have an integral in $(a, b)$. If, on the contrary,
we have a definite limit, we set
$$\int_\alpha^\beta f(x)\,dx = \lim_{\substack{h \to 0 \\ k \to 0}} \int_{\alpha+h}^{\beta-k} f(x)\,dx.$$
The integral in $(a, b)$ is, by definition, the sum of the integrals in the intervals $(\alpha, \beta)$.
We see that the definition of Dirichlet is based on the same principles as that of Cauchy.
The general definition which follows from these principles can be stated thus:
A function $f(x)$ has an integral in a finite interval $(a, b)$ if there exists in $(a, b)$ one and
only one continuous function $F(x)$, up to an additive constant, such that we have
12 Cauchy also dealt with the case where the right-hand side of this equality has a meaning
without the two integrals which appear in it having limits. In this case, he called this right-hand
side the principal value of the integral $\int_a^b f(x)\,dx$.
13 For the definition of derived sets and the properties of reducible sets, see the Note at the end
of the volume. In the recently published translation of the Memoir of Lipschitz (Acta Math., t. 36),
M. Montel observes that Lipschitz explicitly assumed only that the derived set is non-dense (see
below). However, Lipschitz believed he could conclude that it contained only a finite number of
points. This error further obscured the intended meaning of Lipschitz’s text.
$$\int_\alpha^\beta f(x)\,dx = F(\beta) - F(\alpha), \tag{2.2}$$
in any interval where $f(x)$ is continuous. $F(x)$ is the indefinite integral of $f(x)$ and we set
$$\int_a^b f(x)\,dx = F(b) - F(a).$$
For this definition to be applicable, it is first necessary that there exist a continuous
function $F(x)$ satisfying the formula (2.2). This amounts, in both cases treated by Cauchy and
Dirichlet, to assuming the existence of the limits used in the definition. We assume this
condition satisfied and we shall investigate how the singular points of $f(x)$ must be distributed
for this function to have an integral. From the point of view which concerns us, the singular
points of $f(x)$ are those which are not interior to any interval in which $f(x)$ is continuous.
Therefore, these are the points of $e$ and of $e'$. These points form a set which we denote by
$E$. Any limit point of points of $E$, by its very definition, is also a point of $E$. Therefore, $E$
contains all its limit points. These are the sets that Jordan called perfect and M. Borel called
relatively perfect. We shall call such sets closed sets, in conformity with the usage which
is now universal.
For the formula (2.2) to define $F(x)$ completely, it is necessary that, in any interval, there
exist another interval where $f(x)$ is continuous. Therefore, the set $E$ must be such that, in
any interval, we can find an interval that does not contain points of $E$. This is what we
mean by saying that $E$ must be non-dense in every interval.14
This property of $E$ is by no means sufficient. To state the necessary and sufficient
property that $E$ must satisfy, we must resort to the properties of derived sets.
The closed set $E$ has successive derived sets $E', E'', \ldots, E^{\omega}, \ldots$. We know that, if one
of the derived sets is empty, $E$ is called reducible and is a countable set. Otherwise, one of the
derived sets is perfect, and $E$ and all of its derived sets have the cardinality of the continuum.15
These are the properties which will be useful to us. Let us suppose that there exists a
function $F(x)$ satisfying the equality (2.2) in all intervals where $f(x)$ is continuous, and let
us investigate whether $F(x)$ is well-determined; when this is the case, the equality (2.2) will be
used in the definition of the integral.
We will rely on this obvious remark: if the integral $\int_\alpha^\beta f(x)\,dx$, which appears in the
left-hand side of (2.2), has a meaning in all the intervals which do not contain any of the
14 P. Du Bois Reymond, to whom we owe the distinction between two remarkable classes of sets
(those we call sets dense in every interval on the one hand, and sets non-dense in every interval on
the other), referred to the former as systèmes pantachiques or pantachies and to the latter as
systèmes apantachiques or apantachies. It was also Du Bois Reymond who gave the general method of
formation of closed sets and of apantachies. This procedure consists of removing a suitably chosen
finite or countable number of intervals from an interval. On the subject of closed sets and of
non-dense sets, see Borel, Leçons sur la théorie des fonctions, Chap. III.
15 See the Note at the end of the Volume.
finitely many points $x_1, x_2, \ldots, x_n$, the different continuous functions $F(x)$ satisfying the
equality (2.2) can differ only by a constant.
If $E$ contains only a finite number of points, $F(x)$ is therefore well-determined; hence
the definition of Cauchy.
The left-hand side of (2.2) now has a meaning in any interval not containing points of
$E'$; therefore, if $E'$ has only a finite number of points, $F(x)$ is well-determined; hence
the definition of Dirichlet. The same reasoning extends step by step to the successive derived
sets $E'', E''', \ldots$ and $E^{\omega}$, then to the case where either $E^{\omega+1}$ or $E^{\omega+2}, \ldots$ has only a
finite number of points, then to the case where $E^{2\omega}$ possesses this property, and so on.
Thus, we see that, if $E$ is reducible, $F(x)$ is well-determined, so that our definition is
applicable. Then, there exists an integral obtained by repeated application of the method of
Cauchy–Dirichlet.
To obtain examples of functions to which this method is applicable, it is sufficient to
take a reducible set $E$, arrange its points in a simply infinite sequence $x_1, x_2, \ldots$, and form
the series
$$f(x) = \sin\frac{1}{x - x_1} + \frac{1}{2}\sin\frac{1}{x - x_2} + \cdots + \frac{1}{2^p}\sin\frac{1}{x - x_{p+1}} + \cdots.$$
Now17 suppose that the set $E$ of singular points of $f(x)$ is not reducible. We shall see
that if there exists a function $F(x)$ satisfying the condition (2.2) in any interval where $f(x)$
is continuous, there exist infinitely many such $F(x)$.
Let $E^{\alpha}$ be the derived set of $E$ which is perfect. $E^{\alpha}$ is obtained by removing from
the considered interval $(a, b)$ the points interior to the intervals $\delta_1, \delta_2, \ldots$, which form a
countable sequence if $E$ is non-dense in every interval. This is the only case we are interested
in.
Let us define a function $\varphi(x)$ by the condition of being zero for $x = a$ and equal to $1$ for
$x = b$. At all the points of $\delta_1$, $\varphi(x) = \frac{1}{2}$. At all the points of $\delta_2$, $\varphi(x) = \frac{1}{4}$ if $\delta_2$ is
between $a$ and $\delta_1$, and $\varphi(x) = \frac{3}{4}$ if $\delta_2$ is between $\delta_1$ and $b$. In a general way, having
assigned to $\varphi(x)$, in $\delta_1, \delta_2, \ldots, \delta_{n-1}$, the values $\ell_1, \ell_2, \ldots, \ell_{n-1}$, we assign to $\varphi(x)$, in
$\delta_n$, the value $\frac{\ell_i + \ell_j}{2}$, $i$ and $j$ being the indices of the two intervals which enclose $\delta_n$.
Any point of $E^{\alpha}$ is the limit point of some intervals $\delta_n$. It is easy to see that if the points of
$\delta_{\alpha_1}, \delta_{\alpha_2}, \ldots$ tend to $x$, then $\ell_{\alpha_1}, \ell_{\alpha_2}, \ldots$ tend to a definite limit. We take this limit as the value
16 Because, in such an interval, one of the sets $E^n$ has only a finite number of points.
17 According to the properties of uniformly convergent series, $f(x)$ has all the points of $E$ as its
points of discontinuity. We easily see that the previous series is term-by-term integrable.
For examples of reducible sets, see the Note at the end of the volume.
Now, if we note that the sets which (Sect. 2.2) are denoted by $E$ and $e$ are reducible
simultaneously,18 we see that for the adopted definition to be applicable, it is necessary
and sufficient that the set of points of discontinuity of the function $f(x)$ to be integrated be
reducible and that there exist a continuous function $F(x)$ satisfying (2.2) in the intervals
where $f(x)$ is continuous.
18 It is important to note that $e$ can be countable without $E$ being countable; in such a case, $e$
is a countable yet non-reducible set. Such is the case of the set of rational numbers.
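The construction of $\varphi(x)$ described in this chapter can be sketched in a concrete case. The code below assumes, purely for illustration, that $(a, b) = (0, 1)$, that the perfect set is the middle-third Cantor set, and that the removed intervals are enumerated level by level; the function name `staircase_values` is hypothetical. Each removed interval receives the mean of the two values already assigned on either side of it, starting from 0 at $a$ and 1 at $b$.

```python
from fractions import Fraction

def staircase_values(depth):
    """Map each removed middle-third interval (left, right) to its assigned value.

    `depth` is the number of levels of removed intervals to generate.
    """
    values = {}

    def fill(lo, hi, v_lo, v_hi, level):
        if level == 0:
            return
        third = (hi - lo) / 3
        gap = (lo + third, hi - third)   # open interval removed at this step
        mid = (v_lo + v_hi) / 2          # mean of the two enclosing values
        values[gap] = mid
        fill(lo, gap[0], v_lo, mid, level - 1)
        fill(gap[1], hi, mid, v_hi, level - 1)

    fill(Fraction(0), Fraction(1), Fraction(0), Fraction(1), depth)
    return values

gaps = staircase_values(3)
for (left, right), value in sorted(gaps.items()):
    print(left, right, value)
```

The resulting function is constant on every removed interval and yet rises from 0 to 1. Adding it to a function $F(x)$ therefore changes none of the differences $F(\beta) - F(\alpha)$ taken inside a removed interval, which is one way to see how several functions $F(x)$ can satisfy the same condition when the singular set is not reducible.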
3 The Definition of Integral Given by Riemann
The functions to which the previous definitions apply can have an infinite number of points
of discontinuity; but these points are still exceptional, in the sense that they form a non-dense
set. Dirichlet incidentally encountered the function
which is discontinuous at every point, since it is zero for $x$ irrational and equal to $1$ for $x$ rational.
The considerations of Cauchy and Dirichlet therefore do not apply to all functions in
the sense of Cauchy. Riemann1 showed, by an example, how the use of series makes it
possible to construct functions whose points of discontinuity form an everywhere dense
set. The previous definitions therefore cannot apply to such functions.
Let $(x)$ be the difference between $x$ and its nearest integer; if $x$ is equal to an integer plus
$\frac{1}{2}$, we take $(x) = 0$. The function thus defined is called the excess of $x$; it is a function in the
sense of Cauchy because it admits a Fourier expansion, proceeding by trigonometric functions
of the multiples of $2\pi x$, which is everywhere convergent. Let us consider the function, in
the sense of Cauchy,
$$f(x) = \frac{(x)}{1^2} + \frac{(2x)}{2^2} + \frac{(3x)}{3^2} + \cdots;$$
we immediately see that if $x$ is not of the form $\frac{2p+1}{2n}$ ($n$ and $2p + 1$ being co-prime), $f(x)$ is
continuous.2 On the contrary, if $x$ is of the indicated form, when $x$ increases to $\frac{2p+1}{2n}$, $f(x)$
converges to a limit which we denote
1 On the possibility of representing a function by a trigonometric series (Bulletin des Sciences
Mathématiques, 1873; Œuvres de Riemann).
2 We rely on the uniform convergence of the series representing $f(x)$.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025
R. Jain, Lebesgue’s Theory of Integration,
https://doi.org/10.1007/978-981-96-1169-0_3
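$$f\left(\frac{2p+1}{2n} - 0\right)$$
and³ which is
$$f\left(\frac{2p+1}{2n} - 0\right) = f\left(\frac{2p+1}{2n}\right) + \frac{\pi^2}{16n^2};$$
when $x$ decreases to $\frac{2p+1}{2n}$, $f(x)$ converges to
$$f\left(\frac{2p+1}{2n} + 0\right) = f\left(\frac{2p+1}{2n}\right) - \frac{\pi^2}{16n^2}.$$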
In any interval, $f(x)$ has a point of discontinuity. Therefore, the considerations of the previous
chapter are not applicable to $f(x)$.
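The two one-sided limits stated above can be checked numerically. The following is only a minimal sketch: the names `excess` and `riemann_f`, the truncation at 2000 terms, and the step `h` are illustrative assumptions, and a partial sum only approximates $f$. It evaluates the series on either side of $x = \frac{1}{2}$ (the case $p = 0$, $n = 1$) and compares the jumps with $\frac{\pi^2}{16}$.

```python
import math

def excess(x):
    """Signed difference between x and its nearest integer; 0 at half-integers."""
    d = x - math.floor(x + 0.5)      # d lies in [-0.5, 0.5)
    return 0.0 if d == -0.5 else d

def riemann_f(x, terms=2000):
    """Partial sum of (x)/1^2 + (2x)/2^2 + (3x)/3^2 + ... (truncation is an assumption)."""
    return sum(excess(k * x) / (k * k) for k in range(1, terms + 1))

x0, h = 0.5, 1e-6                    # x0 = (2p+1)/(2n) with p = 0, n = 1
left_jump = riemann_f(x0 - h) - riemann_f(x0)    # approximates f(x0 - 0) - f(x0)
right_jump = riemann_f(x0 + h) - riemann_f(x0)   # approximates f(x0 + 0) - f(x0)
print(left_jump, right_jump, math.pi ** 2 / 16)
```

The two printed jumps should be close to $+\frac{\pi^2}{16}$ and $-\frac{\pi^2}{16}$ respectively, up to the truncation error of the partial sum.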
By using a procedure similar to that of Riemann, it has been possible to form many
examples of very discontinuous functions. By using the now classic concept of
uniformly convergent series, it is easy to give a general statement: a uniformly convergent
series of discontinuous functions $f_n$ defines a function $f$ which admits as points of
discontinuity all the points of discontinuity of the functions $f_n$, provided that each of these points is a
point of discontinuity for only one function $f_n$. When this is not so, as in the example
of Riemann, it is necessary to check whether the different discontinuities encountered at
the values considered compensate one another in such a manner that $f$ becomes
continuous.
We often have the opportunity of applying a similar procedure when, knowing
functions $f_n$ which exhibit a certain singularity at isolated points $A_n$, we want to construct
a function that exhibits this singularity throughout an interval. We try to obtain the desired
result by taking a uniformly convergent series of functions $f_n$, such that the corresponding
points $A_n$ form an everywhere dense set. This method of construction is named the principle of
condensation of singularities.⁴
The example of Riemann shows that the functions to which the methods of definition
examined in the previous chapter cannot be applied do not constitute a very particular class
in the set of functions in the sense of Cauchy. Moreover, the restriction⁵ imposed, according
to Cauchy, on the functions $f(x)$, namely that the relation between $f(x)$ and $x$ be analytically
expressible, never played a role in our reasoning; it simplified neither the statements nor
the solutions of the problems that we have proposed. There is therefore no disadvantage in
saying, with Riemann: $y$ is a function of $x$ if, to each value of $x$, there corresponds
a well-determined value of $y$, regardless of the method by which this correspondence
is established. It is this definition that we shall now adopt. Only, instead of always supposing
that $x$ can be taken arbitrarily in an interval $(a, b)$, we shall, at times, suppose that $x$ must
be taken in a certain set $E$, at the points of which the function $y$ is thus defined, without its
being defined at all the points of an interval. For example, the function $\left(\frac{1}{x}\right)!$ is defined on the
set of inverses of the positive integers.
Before taking up the study of the integration of functions in the sense of Riemann, I
shall give those properties of functions which will be useful to us in what follows.
If we know that a function always remains confined between two finite numbers $A$ and
$B$, we say that it is bounded.⁶ We shall mostly limit ourselves to the study of bounded
functions.⁷ When the function is bounded, it admits an upper limit $L$ and a lower limit $l$.
These numbers are defined, as we know, by the condition that $(l, L)$ is the smallest interval
containing all the values of $f(x)$. $\omega = L - l$ is said to be the oscillation of $f(x)$.
Let $A$ be a limit point of the set $E$ on which $f(x)$ is defined.⁸ Let $\delta_1$ be an interval
containing $A$; in this interval there exist points of $E$; they form a set $e_1$. The function $f(x)$
defined on $e_1$ admits upper and lower limits $L_1$, $l_1$ and an oscillation $\omega_1$. Let $\delta_2$ be an interval
containing $A$ and contained in $\delta_1$; the numbers corresponding to $\delta_2$ are $L_2$, $l_2$, $\omega_2$, and we
obviously have
$$l_1 \le l_2 \le L_2 \le L_1, \qquad L_2 - l_2 = \omega_2 \le \omega_1.$$
If we consider intervals $\delta_1, \delta_2, \delta_3, \ldots$ all containing $A$, each containing the next, and
whose lengths converge to zero, we have a sequence of upper limits and lower limits
satisfying the inequality
$$l_1 \le l_2 \le l_3 \le \cdots \le L_3 \le L_2 \le L_1.$$
The $l_i$ on the one hand, the $L_i$ on the other, therefore converge to two limits $l$ and $L$ $(l \le L)$,
and the $\omega_i$ converge to
$$\omega = L - l.$$
We shall see that the numbers $L$, $l$, $\omega$ thus obtained are also the limits of the numbers
$L'_i$, $l'_i$, $\omega'_i$ corresponding to intervals $\delta'_i$ containing $A$ and whose two extremities converge
to $A$ in any manner as $i$ goes to infinity. In other words, they are independent of the choice
of the intervals $\delta_i$, and we can assume that these intervals are not necessarily contained in
6 It is well understood that an unbounded function can nevertheless be finite everywhere; such is the case with the
function
$$f(0) = 0, \qquad f(x) = \frac{1}{x} \ \text{ for } x \ne 0.$$
If we only know that a function is always smaller than a fixed number, we say that it is bounded
from above.
7 We often find that questions are very simple to treat when they are limited to bounded
functions, whereas they tend to become very complicated for the most general functions. I have carefully
indicated in the following whether the theorems obtained are valid for all functions or only for bounded
functions. Often, we omit to indicate explicitly that the functions considered are bounded.
8 $A$ does not necessarily belong to $E$.
$$l_i \le l_j \le l_k \le L_k \le L_j \le L_i,$$
the right side of $A$, beyond $A$. If $\omega_d = 0$, that is, if $M_d = m_d$, then $f(x_0 + 0)$ exists and is
equal to $M_d$; the function is said to be right continuous at point $x_0$ to the right of $x_0$. If
$M_d = m_d = f(x_0)$, the function $f(x)$ is said to be right continuous at point $x_0$. We similarly
where $n$ is an integer. $\varphi$ is zero for $|x| < 1$; to calculate $\varphi$ in this case we can choose arbitrarily
a sequence of increasing integers $n_1, n_2, \ldots$ and take the limit of the corresponding sequence
$x^{n_i}$. If $x$ is no longer included between $-1$ and $+1$, by so operating and suitably choosing the
9 These definitions are related to the very important concepts of lower semi-continuous functions and
upper semi-continuous functions introduced by M. Baire. These are the functions $f$ which are equal
at each point, respectively, to the number $l$ or $L$ attached to $f$ at this point.
10 The previous definition is that of the maximum, minimum and oscillation of $f(x)$ to the right of $x_0$, $x_0$
being excluded. We also often consider similar numbers, including $x_0$; it is then necessary to take
the values of $x$ equal to or greater than $x_0$ $(x \ge x_0)$. It is this method of operating that is linked to
the concept of right-continuity at the point $x_0$.
3.1 Properties Relating to Functions 67
integers $n_i$, we could still have a limit, but this limit would sometimes depend on the choice
of the integers $n_i$. For $x < -1$, the set $A$ contains only $+\infty$ and $-\infty$,¹¹ which are the two
limits of indetermination.
For $x = 1$, $\varphi$ is equal to $1$. For $x > 1$, $\varphi$ is equal to $+\infty$. The concept of the limits of
indetermination can often be replaced by the simpler concept of smallest and largest limit,
a concept that we attribute to Cauchy.
Let us suppose that the number $\varphi$ is defined as the limit of a function $\psi(\lambda)$ for $\lambda = \lambda_0$;
$\lambda$ takes either all possible values or only those from a certain set of which $\lambda_0$ is a limit
point (the previous example reduces to this case if we take $\lambda = \frac{1}{n}$, where $n$ is an integer, and
$\lambda_0 = 0$). The function $\psi(\lambda)$ is not defined for $\lambda = \lambda_0$, but we know that it has a minimum
or limit inferior $l$ and a maximum or limit superior $L$¹² for $\lambda = \lambda_0$; these numbers, finite
or not, are respectively the smallest and the largest of the limits that we can obtain when,
in $\psi(\lambda)$, we let $\lambda$ converge to $\lambda_0$. $l$ and $L$ are the two limits of indetermination defined
previously; but, in the case we are dealing with, these numbers are contained in the set $A$ of limit
values, while, in the general case, they belong either to $A$ or to the derived set $A'$ of $A$.
But it is also possible, and we shall soon see examples, that the function $\psi(\lambda)$ is no longer
a well-determined function, but a function of several determinations.
We say that we have such a function if, to each value of $\lambda$ taken from the set where the
function is defined, there corresponds a set of numbers; each of these numbers is represented by
the notation $\psi(\lambda)$. What has been said about the limit superior and the limit inferior
of single-valued functions applies without any change to functions of several
determinations. $\psi(\lambda)$ therefore has a limit inferior $l$ and a limit superior $L$ for $\lambda = \lambda_0$, which
are, respectively, the smallest and the greatest of the limits that one can attain by choosing
a sequence of numbers $\lambda_i$ converging to $\lambda_0$ and by suitably choosing the corresponding
numbers $\psi(\lambda_i)$. These two numbers are the limits of indetermination of the limit of $\psi(\lambda)$
when $\lambda$ converges to $\lambda_0$.¹³
Now, let us go back to the study of functions.
There is a very simple relation between the oscillations relative to the intervals contained
in $(a, b)$ and the oscillations at the different points of $(a, b)$. We can express it thus:
If, at every point of $(a, b)$, the oscillation is at most equal to $\omega$, then in any interval of
length $\lambda$, interior to $(a, b)$, the oscillation is less than $\omega + \varepsilon$ when $\lambda$ is sufficiently small, $\varepsilon$
being any positive number.
Suppose the contrary: there would then exist intervals $(a_p, b_p)$, interior to $(a, b)$, whose lengths
converge to zero and in each of which the oscillation is at least $\omega + \varepsilon$.
The set of numbers $a_p$ has at least one limit point $\alpha$. If we take a sequence of numbers
$a_p$ converging to $\alpha$, the numbers $b_p$ also converge to $\alpha$; therefore at $\alpha$ the oscillation is at
least $\omega + \varepsilon$. There is, then, a contradiction with the hypothesis.
The property is proved. In the case where $\omega = 0$, it reduces to a well-known fact: a function
continuous at all the points of an interval is continuous in this interval.¹⁴
The converse of our property is not true. Let a function be equal to $-1$ for $x$ negative, to
$+1$ for $x$ positive, and to zero for $x$ zero. Its oscillation at $x = 0$ is $2$ and yet, if we use the
point of division $x = 0$, the function has an oscillation equal to $1$ in each of the intervals
obtained.
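This counterexample is easy to check by direct computation. The sketch below is illustrative only: the helper `oscillation` estimates the upper limit minus the lower limit by sampling, which is exact for this three-valued function but is merely an approximation in general, and the function names are assumptions, not notation from the text.

```python
def oscillation(f, lo, hi, samples=1001):
    """Estimate (upper limit - lower limit) of f on [lo, hi] by sampling, endpoints included."""
    values = [f(lo + (hi - lo) * k / (samples - 1)) for k in range(samples)]
    return max(values) - min(values)

def step(x):
    """Equal to -1 for x negative, +1 for x positive, 0 for x = 0."""
    return -1 if x < 0 else (1 if x > 0 else 0)

print(oscillation(step, -1e-9, 1e-9))   # a small interval around 0: oscillation 2
print(oscillation(step, -1.0, 0.0))     # left interval of the division at 0: oscillation 1
print(oscillation(step, 0.0, 1.0))      # right interval of the division at 0: oscillation 1
```

However small the interval around $0$, it contains values $-1$ and $+1$, so the oscillation at the point is $2$, while each interval of the division ending at $0$ only sees the values $0$ and $\pm 1$.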
We shall now define the mean oscillation of a bounded function $f(x)$ defined on a finite
interval $(a, b)$. Let us partition the interval $(a, b)$ into partial intervals $\delta_1, \delta_2, \ldots, \delta_n$. Let
$\omega_i$ be the oscillation of $f(x)$ in the interval $\delta_i$; the extremities of $\delta_i$ may or may not be
considered as part of the interval. And let us form the following quantity
$$A = \frac{\delta_1\omega_1 + \delta_2\omega_2 + \cdots + \delta_n\omega_n}{b - a}.$$
If $\Omega$ is the oscillation of $f(x)$ in $(a, b)$, then, $\omega_1, \omega_2, \ldots, \omega_n$ being at most equal to $\Omega$, $A$ is at
most equal to $\Omega$. In the same way, if we divide $\delta_i$ into partial intervals $\delta_i^1, \delta_i^2, \ldots, \delta_i^{p_i}$ in which the
corresponding oscillations are $\omega_i^1, \omega_i^2, \ldots, \omega_i^{p_i}$, we have
$$\sum_{j=1}^{p_i} \delta_i^j \omega_i^j \le \delta_i \omega_i.$$
By subdividing the intervals $\delta_i$ we therefore replace $A$ by a smaller or at most equal number.
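As a concrete check of this monotonicity, here is a small sketch in exact rational arithmetic. It assumes the increasing function $f(x) = x^2$ on $(0, 1)$, for which the oscillation on $[u, v]$ is simply $v^2 - u^2$; the helper names are illustrative and not part of the text.

```python
from fractions import Fraction

def mean_oscillation(points, osc):
    """A = (sum of delta_i * omega_i) / (b - a) for the division whose points are `points`."""
    a, b = points[0], points[-1]
    total = sum((v - u) * osc(u, v) for u, v in zip(points, points[1:]))
    return total / (b - a)

def osc_square(u, v):
    """Oscillation of f(x) = x^2 on [u, v] for 0 <= u <= v (f is increasing there)."""
    return v * v - u * u

coarse = [Fraction(k, 4) for k in range(5)]   # four intervals of length 1/4
fine = [Fraction(k, 8) for k in range(9)]     # each interval of `coarse` cut in two
print(mean_oscillation(coarse, osc_square), mean_oscillation(fine, osc_square))
```

Here the oscillation of $f$ in $(0, 1)$ is $1$, the coarse division gives $A = \frac{1}{4}$, and the subdivided one gives $A = \frac{1}{8}$, in agreement with the inequality above.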
Let us consider two series of divisions of $(a, b)$ into partial intervals; to the divisions $D_i$
of the first series correspond the numbers $A_1, A_2, \ldots$; to the divisions $\Delta_i$ of the second, the numbers
$\alpha_1, \alpha_2, \ldots$. Let us suppose that, for each of the two series, the maximum of the lengths of
the intervals used in the $i$th division converges to zero with $\frac{1}{i}$;¹⁵ under these conditions we shall
see that $A_i$ and $\alpha_i$ have the same limit.
Let us compare $A_i$ and $\alpha_j$. The intervals of the division $\Delta_j$ are of two types: some, the
intervals $d$, contain points of the division $D_i$ in their interior; while the others, the
14 It is this property that we state by saying that the continuity is uniform. We express it by saying that the quantity
$\eta(\varepsilon)$ can be chosen uniformly in the considered interval, that is, independently of the variable $x$; see
Sect. 2.1.
The general theorem that we have proved here is due to M. Baire.
15 The points of division used in the $i$th division are not necessarily used in the $(i + 1)$th; in other
words, in passing from one division to the next, we do not subdivide the intervals of the former;
we mark out the new intervals without regard to those previously used.
3.2 Conditions of Integrability 69
intervals $d'$, are entirely contained in the intervals of $D_i$. The contribution of the intervals
$d$ to the numerator of $\alpha_j$ is at most $n\Omega\lambda_j$, if $n$ is the number of points of division of $D_i$,
$\Omega$ the oscillation of $f(x)$ in $(a, b)$, and $\lambda_j$ the maximum of the lengths of the intervals of $\Delta_j$. The intervals $d'$ are part of
the division $\Delta'_j$, obtained by combining the points of division of $D_i$ and $\Delta_j$; therefore their
contribution to the numerator of $\alpha_j$ is at most equal to $(b - a)A'_j$, where $A'_j$ is the number
analogous to $A$ and related to $\Delta'_j$. But, since we know that $A'_j$ is at most equal to $A_i$, we
deduce
$$\alpha_j \le A_i + \frac{n\Omega\lambda_j}{b - a}.$$
Since $\lambda_j$ converges to zero, all the $\alpha_j$, starting from a certain index, are smaller than $A_i + \varepsilon$ $(\varepsilon > 0)$; therefore their
largest limit is at most $A_i + \varepsilon$ and, since $i$ and $\varepsilon$ are arbitrary, the largest limit of the $\alpha_j$ is at
most equal to the smallest limit of the $A_i$. Nothing stops us from interchanging $A_i$ and $\alpha_j$ in
the reasoning. Therefore all the limits of the $A_i$ and of the $\alpha_j$ are equal, and $A_i$ converges to a definite
limit. This limit $\omega$ is the mean oscillation of the function in $(a, b)$.
It is necessary to note what we have proved: $A_i$ converges uniformly to $\omega$; that is, when
the lengths of all the intervals are smaller than a certain number $\lambda$, the number $A$ differs from
$\omega$ only by a quantity smaller than an $\varepsilon$ chosen in advance.
Given these definitions, let us come to the definition of the integral as given by Riemann.
Riemann turned his attention to the operative procedure which allows us to calculate the
integral with as close an approximation as we want in the case of continuous functions. He then
asked in what cases this procedure gives a definite number¹⁶ when applied to
discontinuous functions.
Let $f(x)$ be a bounded function defined on a finite interval $(a, b)$. Let us divide the
interval $(a, b)$ into partial intervals $\delta_1, \delta_2, \ldots, \delta_n$ and let us choose arbitrarily, for each $i$, a
point $x_i$ in $\delta_i$ or coinciding with one of the extremities of $\delta_i$. Let us consider the sum
$$S = \sum \delta_i f(x_i).$$
16 Cauchy applied his method of defining the integral only to the functions which were considered a priori
as ‘interesting’: the continuous functions. Now, on the contrary, any function to which the method
of definition applies will be considered interesting.
Hence, on the one hand, a new classification of functions arises and, on the other, an enrichment of
the concept of integral follows. If we compare the results of Cauchy and those of Riemann (note 1,
Sect. 2.1), it is necessary to note the mathematical character of the progress due to the latter.
The way in which this progress was attained, namely by delimiting the domain of application of a definition
without introducing a priori any restriction on its use, has frequently been used for a century.
Let us continually increase the number of intervals $\delta$ and choose them in such a manner
that the maximum of their lengths converges to zero.¹⁷ Then, if $S$ converges to a definite limit,
independent of the intervals and of the chosen points $x_i$, Riemann said that the function $f(x)$
is integrable and has for integral, in $(a, b)$, the limit of $S$.
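To make the procedure concrete, here is a minimal numerical sketch of the sums $S = \sum \delta_i f(x_i)$ with arbitrarily placed division points and arbitrarily chosen points $x_i$. The choice $f(x) = x^2$ on $(0, 1)$, the random generator, and the helper names are assumptions made only for illustration.

```python
import random

def riemann_sum(f, points, rng):
    """S = sum of delta_i * f(x_i), each x_i drawn arbitrarily from its interval."""
    return sum((v - u) * f(rng.uniform(u, v)) for u, v in zip(points, points[1:]))

def random_division(a, b, n, rng):
    """Division of (a, b) into n partial intervals with randomly placed points."""
    inner = sorted(rng.uniform(a, b) for _ in range(n - 1))
    return [a] + inner + [b]

rng = random.Random(0)
for n in (10, 100, 1000, 10000):
    pts = random_division(0.0, 1.0, n, rng)
    width = max(v - u for u, v in zip(pts, pts[1:]))   # maximum length of the intervals
    print(n, width, riemann_sum(lambda x: x * x, pts, rng))
```

As the maximum length of the intervals decreases, the printed sums should approach $\frac{1}{3}$ whatever the divisions and the points $x_i$; for this particular $f$ the error is at most twice the maximum length, since the oscillation in $\delta_i$ is at most $2\delta_i$.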
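When $\delta_1, \delta_2, \ldots, \delta_n$ are chosen, the number $S$ is not completely determined; its lower
limit and upper limit are:
$$\underline{S} = \sum l_i \delta_i, \qquad \overline{S} = \sum L_i \delta_i,$$
where $l_i$ and $L_i$ represent the lower limit and upper limit of $f(x)$ in $\delta_i$. Let us set $L_i - l_i = \omega_i$,
then
$$S - \underline{S} \le \overline{S} - \underline{S} = \sum \delta_i \omega_i.$$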
$$\overline{\Sigma}_j \le \overline{S}_i + nL\lambda_j.$$
From this inequality we conclude, as previously, that $\overline{S}_i$ and $\overline{\Sigma}_j$ have the same limit and
even that they converge uniformly to this limit. The property is proved for $f + k$, therefore
it is true for $f$, because, in passing from $f$ to $f + k$, we increase all the sums $\overline{S}$ by $k(b - a)$.
In the following, it is important to note that we have proved the existence of a limit for $\overline{S}$
without making any assumption on the bounded function $f(x)$. The condition that $f(x)$ has
mean oscillation zero appears only when, from the existence of a limit for $\overline{S}$, we deduced
the existence of a limit for $S$.
We can transform the integrability condition obtained: it is necessary and sufficient that
the sum $\sum \delta_i\omega_i$ converges to zero. This is equivalent to saying that the intervals $\delta_i$ in which $\omega_i$
17 It is understood, of course, that in passing from one division to the next, we are not bound to use
the points of division already used.
is greater than an arbitrarily chosen positive number $\varepsilon$, for sufficiently large $i$, have a total
length $\lambda$ as small as we want, because we have:
$$\lambda\varepsilon \le \sum \delta_i\omega_i \le (b - a - \lambda)\varepsilon + \lambda\Omega,$$
$\Omega$ being the oscillation of $f(x)$ in $(a, b)$. We thus have the statement given by Riemann:
For a bounded function to be integrable over $(a, b)$, it is necessary and sufficient that we
can divide $(a, b)$ into partial intervals such that the sum of the lengths of those intervals
in which the oscillation is greater than $\varepsilon$ can be made as small as we want, for any $\varepsilon > 0$.
If such a division is possible, there exist an infinite number of them in any sequence of
divisions such that the maximum length of the partial intervals converges to zero, since for any
such sequence $\sum \delta_i\omega_i$ always converges to the same number.
From this property of $\sum \delta_i\omega_i$, it also follows that if, to one sequence of divisions of the
given nature, there correspond numbers $\underline{S}$ and $\overline{S}$ having the same limit, we can conclude
that the considered function is integrable.
The form given by Riemann to the condition of integrability shows whether a
function is integrable, but it does not highlight the role of the points of discontinuity of the
function. Paul du Bois-Reymond has highlighted this role by a transformation of the
condition of integrability. The statement of Paul du Bois-Reymond assumes that the definition
of integrable groups is known.
A set of points of a straight line constitutes an integrable group if the points of the set
can be enclosed in a finite number of segments, the sum of whose lengths can be made
as small as we want.¹⁸
A finite number of points constitutes an integrable group, but an integrable group need not be finite.
Let us consider the set $Z$ of points whose abscissae are given by the formula
$$x = \frac{a_1}{3} + \frac{a_2}{3^2} + \frac{a_3}{3^3} + \cdots,$$
in which all the $a$'s are either $0$ or $2$. This set is obtained by removing from the interval
$(0, 1)$, first the points interior to the interval $\left(\frac{1}{3}, \frac{2}{3}\right)$, then the points interior to the
intervals $\left(\frac{1}{3^2}, \frac{2}{3^2}\right)$ and $\left(\frac{2}{3} + \frac{1}{3^2}, \frac{2}{3} + \frac{2}{3^2}\right)$, then the points interior to the intervals $\left(\frac{1}{3^3}, \frac{2}{3^3}\right)$
and $\left(\frac{2}{3^2} + \frac{1}{3^3}, \frac{2}{3^2} + \frac{2}{3^3}\right)$ and $\left(\frac{2}{3} + \frac{1}{3^3}, \frac{2}{3} + \frac{2}{3^3}\right)$ and $\left(\frac{2}{3} + \frac{2}{3^2} + \frac{1}{3^3}, \frac{2}{3} + \frac{2}{3^2} + \frac{2}{3^3}\right)$, etc.
Thus we always divide each remaining interval into three equal parts and remove the
middle part. After $n$ such operations, there remain $2^n$ intervals. These $2^n$ intervals can be
used to enclose¹⁹ the points of $Z$. However, they have a total length $\frac{2^n}{3^n}$. Therefore, $Z$ is an
18 We can, whenever we want, consider that a point is enclosed in an interval, either if it is in the
interior of this interval or coincides with its extremities; or if it is in the interior of the interval,
excluding the extremities. The two corresponding definitions of the integrable groups are obviously
identical. For passing from the first to the second it is sufficient to stretch the intervals, and their two
extremities, as little as desired.
19 To ‘enclose’ is taken here in the larger sense.
integrable group. This construction of $Z$ also shows that $Z$ is perfect; therefore it has the
cardinality of the continuum.
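The construction of $Z$ by successive removal of middle thirds is easy to reproduce in exact arithmetic. The following sketch (the function name `cantor_intervals` is an illustrative assumption) builds the $2^n$ remaining intervals and sums their lengths.

```python
from fractions import Fraction

def cantor_intervals(n):
    """The 2^n closed intervals that remain after n removals of open middle thirds."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        refined = []
        for lo, hi in intervals:
            third = (hi - lo) / 3
            refined.append((lo, lo + third))     # left third is kept
            refined.append((hi - third, hi))     # right third is kept
        intervals = refined
    return intervals

for n in (1, 2, 5, 10):
    ivs = cantor_intervals(n)
    print(n, len(ivs), sum(hi - lo for lo, hi in ivs))
```

The total length printed at stage $n$ is $\left(\frac{2}{3}\right)^n$, which can be made as small as we want; this is the sense in which $Z$ is an integrable group.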
It is obvious that the set formed by the union of points of two integrable groups is an
integrable group.
Here is the statement of du Bois-Reymond:
For a bounded function to be integrable, it is necessary and sufficient that, for any $\varepsilon > 0$,
the points where the oscillation is greater than $\varepsilon$ form an integrable group.
Suppose $f$ is integrable; then we can divide $(a, b)$ into partial intervals such that those
intervals in which the oscillation is greater than $\varepsilon$ have a total length smaller than $\eta$. A point
where the oscillation is greater than $\varepsilon$ cannot be interior to an interval where the oscillation is
not greater than $\varepsilon$. Therefore, such a point is necessarily one of the points used
in the division of $(a, b)$, or else it lies in the intervals of total length $\eta$. The points of division being
finite in number, the points where the oscillation is greater than $\varepsilon$ can be enclosed in a
finite number of intervals of total length $2\eta$, and, as $\eta$ is arbitrary, they form an integrable
group.
Conversely, suppose that the points of oscillation greater than $\varepsilon$ form an integrable
group. We can therefore enclose them in a finite number of intervals $I$ of total length $\eta$. Let
us use these intervals $I$ for the division of $(a, b)$ and let $I'$ be the other intervals. In each $I'$
there are no points of oscillation greater than $\varepsilon$; each of these intervals can therefore
be divided into partial intervals $I''$ in each of which the oscillation is at most $2\varepsilon$. The only
intervals with oscillation greater than $2\varepsilon$ are therefore some of the intervals $I$; their total
length is at most $\eta$, and that is sufficient, according to the criterion of Riemann, for asserting
that $f$ is integrable.
In the previous statement, we can replace the set $G(\varepsilon)$ of points where the oscillation is
greater than $\varepsilon$ by the set $G_1(\varepsilon)$ of points where the oscillation is not smaller than $\varepsilon$, because
$G\left(\frac{\varepsilon}{2}\right)$ contains $G_1(\varepsilon)$, which itself contains $G(\varepsilon)$.
The set $G_1(\varepsilon)$ enjoys a property which will allow us one last transformation
of the condition of integrability: $G_1(\varepsilon)$ is closed. Indeed, if $A$ is a limit point of $G_1(\varepsilon)$, any
interval containing $A$ contains points of $G_1(\varepsilon)$, and $f$ has an oscillation at least equal to $\varepsilon$ in
this interval.
For the new statement of the condition of integrability, I will introduce a concept that
will be encountered later: that of a set of measure zero. This is a set whose points can be
enclosed in a finite or countably infinite number of intervals whose total length
can be made as small as we want.
A point and an integrable group are examples of sets of measure zero. The set $E$ formed
by the union of a finite or countably infinite number of sets $E_n$ of measure zero
is obviously also of measure zero;²⁰ any countable set is of measure zero. This is enough to
20 Because we can enclose $E_n$ in a countably infinite number of intervals $\alpha_n$ of total length $\frac{\varepsilon}{2^{n+1}}$, and
the set $E$, the union of the sets $E_n$, can be enclosed in the countably infinite number of intervals $\alpha_1, \alpha_2, \ldots$
of total length at most
$$\sum \frac{\varepsilon}{2^{n+1}} \le \varepsilon.$$
show the difference between a set of measure zero and an integrable group: the first can be
everywhere dense, the second is always non-dense.
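The contrast can be illustrated with the rational numbers of $(0, 1)$, which are everywhere dense and yet of measure zero. The sketch below is only an illustration: the enumeration order, the helper names, and the choice of enclosing the $n$-th point in an interval of length $\frac{\varepsilon}{2^{n+1}}$ are assumptions. It shows that the enclosing intervals have total length below $\varepsilon$ however many points are taken.

```python
from fractions import Fraction

def rationals_in_unit_interval(count):
    """First `count` rationals of [0, 1], listed by increasing denominator."""
    seen, out, q = set(), [], 1
    while len(out) < count:
        for p in range(q + 1):
            x = Fraction(p, q)
            if x not in seen and len(out) < count:
                seen.add(x)
                out.append(x)
        q += 1
    return out

def cover(points, eps):
    """Enclose the n-th point (n = 1, 2, ...) in an interval of length eps / 2^(n+1)."""
    intervals = []
    for n, x in enumerate(points, start=1):
        half = eps / 2 ** (n + 2)            # half-length, so the full length is eps / 2^(n+1)
        intervals.append((x - half, x + half))
    return intervals

eps = Fraction(1, 100)
ivs = cover(rationals_in_unit_interval(50), eps)
print(float(sum(hi - lo for lo, hi in ivs)))   # stays below eps
```

No finite number of intervals of small total length could do the same, since a finite family enclosing an everywhere dense subset of $(0, 1)$ must have total length at least $1$.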
Let $f(x)$ be an integrable function; its points of discontinuity are those of the set obtained
by the union of the integrable groups $G(1), G\left(\frac{1}{2}\right), G\left(\frac{1}{3}\right), \ldots$; therefore they form a set of
measure zero.
Now, let $f(x)$ be a bounded function whose points of discontinuity form a set of measure
zero. $G_1(\varepsilon)$, being a subset of this set, is of measure zero, and it is closed. We will show
later that this is sufficient to assert that $G_1(\varepsilon)$ is an integrable group.²¹ It follows that $f$ is
integrable. Therefore:
For a bounded function to be integrable, it is necessary and sufficient that the set of its
points of discontinuity be of measure zero.²²
As an example of discontinuous integrable function, Riemann cites the function
irrational and $\varphi$ would be $\frac{1}{q}$ for $x = \frac{p}{q}$ ($p$ and $q$ being co-prime). $\varphi$ is integrable since its
points of discontinuity, being those of rational abscissa, form a countable set.
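The integrability of $\varphi$ can also be checked against Riemann's criterion. The sketch below assumes that $\varphi$ is zero at irrational points, as the passage indicates, so that the oscillation of $\varphi$ in a closed interval is the largest value $\frac{1}{q}$ taken there; the division of $(0, 1)$ into $n$ equal closed intervals and the name `bad_length` are illustrative choices.

```python
from fractions import Fraction
from math import gcd

def bad_length(n, eps):
    """Total length of the intervals [k/n, (k+1)/n] of (0, 1) in which the
    oscillation of phi exceeds eps, i.e. which contain some p/q with 1/q > eps."""
    bad = set()
    q = 1
    while Fraction(1, q) > eps:              # only denominators with 1/q > eps matter
        for p in range(q + 1):
            if gcd(p, q) == 1:               # p/q in lowest terms (0 = 0/1, 1 = 1/1)
                k, r = divmod(p * n, q)      # p/q lies in [k/n, (k+1)/n)
                if k < n:
                    bad.add(k)
                if r == 0 and k > 0:         # p/q is a division point: it also ends interval k-1
                    bad.add(k - 1)
        q += 1
    return Fraction(len(bad), n)

for n in (100, 1000, 10000):
    print(n, bad_length(n, Fraction(1, 10)))
```

For a fixed $\varepsilon$ only finitely many points $\frac{p}{q}$ satisfy $\frac{1}{q} > \varepsilon$, so the total length of the intervals of oscillation greater than $\varepsilon$ is at most a fixed multiple of $\frac{1}{n}$ and converges to zero.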
The function $f(\varphi)$ is the non-integrable function $\chi(x)$ of Dirichlet (Sect. 3.1), since all
its points are points of discontinuity.
We can sharpen the first two theorems which have just been obtained. Let $f$ and $\varphi$ be two
integrable functions; let us partition the interval where they are given into parts $\delta_1, \delta_2, \ldots, \delta_n$
in which we choose the values $x_1, x_2, \ldots, x_n$. We have
$$\sum \delta_i [f(x_i) + \varphi(x_i)] = \sum \delta_i f(x_i) + \sum \delta_i \varphi(x_i).$$
Now the three sums which appear in this equality are approximate values of the integrals
of $f + \varphi$, $f$, $\varphi$; therefore the integral of $f + \varphi$ is the sum of the integrals of $f$ and $\varphi$.²⁴ In
other words:
The integral of a sum is the sum of the integrals. We suppose, of course, that we are dealing with a
true sum, that is, the sum of a finite number of terms and not of a series.
To reach the case of a uniformly convergent series, it will be advantageous for
us to use the theorem of means.
Let $f(x)$ be a function bounded between $l$ and $L$ in $(a, b)$. The integral of $f$ is, as we
know, the limit of the sum $S = \sum \delta_i f(x_i)$, but we have
24 It suffices to slightly modify the wording in order to simultaneously demonstrate the integrability
of . f + ϕ, which is assumed to have been previously established in the text’s exposition.
3.3 Properties of Integral 75
$$(b - a)l = \sum \delta_i l \le \sum \delta_i f(x_i) \le \sum \delta_i L = (b - a)L.$$
Therefore $S$, and as a result its limit, the integral, lies between $(b - a)l$ and
$(b - a)L$; the integral is therefore of the form $(b - a)\mu$, where $\mu$ lies between $l$ and $L$; this
Let $s_n$ be the sum of the first $n$ terms, $r_n$ that of the remaining terms, and $F$, $U_n$, $S_n$, $R_n$ the
integrals of $f$, $u_n$, $s_n$, $r_n$. We have
$$S_n = U_1 + U_2 + \cdots + U_n,$$
according to the theorem on the integration of a sum. This same theorem shows that
$$F = S_n + R_n.$$
However, when $n$ is greater than $n_1$, $r_n$ is smaller than $\varepsilon$ in modulus, and $R_n$ is smaller than
$|b - a|\varepsilon$ in modulus. When $n$ is greater than $n_1$, $|F - S_n|$ is therefore smaller than $|b - a|\varepsilon$. The series
$\sum U_n$ is therefore convergent, with sum $F$.
A uniformly convergent series of integrable functions is term-by-term integrable.
The previous theorems are proved only in the case where the interval $(a, b)$ is a positive
interval $(b > a)$, since the integral has been defined only in this case. We complete the
definition as previously.
The integral in $(a, b)$ is always denoted $\int_a^b f(x)\,dx$. The complementary definition is
expressed by the equality
$$\int_b^a f(x)\,dx + \int_a^b f(x)\,dx = 0.$$
It is obvious that the previous theorems proved for positive intervals are also true for
negative intervals.
I add that we immediately verify the following:
$$\int_a^b f(x)\,dx + \int_b^c f(x)\,dx + \int_c^a f(x)\,dx = 0.$$
The definition we are concerned with has been obtained by applying the methods of integral
calculus for continuous functions to discontinuous functions. We know that among bounded
functions there exist non-integrable functions, for which this method does not lead to a definite
number. Nevertheless, we can use this procedure to attach two well-defined numbers to
each bounded function.
We have seen (Sect. 3.2) that the sums $\overline{S} = \sum \delta_i L_i$ converge to a definite limit when the
numbers $\delta_i$ converge to zero in an arbitrary manner. This limit is one of the two numbers
with which we are dealing. We call it the upper sum integral and we represent it by the symbol
$\overline{\int_a^b} f(x)\,dx$, which is read: upper sum integral from $a$ to $b$ of $f(x)$.
Similarly, we can prove the existence of a limit for the sums $\underline{S} = \sum \delta_i l_i$. Moreover, while
studying the mean oscillation, we saw that $\sum \delta_i\omega_i$ converges to a definite limit $(b - a)\omega$,
and as we have
$$\overline{S} - \underline{S} = \sum \delta_i\omega_i,$$
the existence of the limit of $\underline{S}$ is proved.²⁵ This lower sum integral will be denoted by
$\underline{\int_a^b} f(x)\,dx$.
These two numbers were defined for the first time, in a precise way, by Darboux.²⁶
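The sums $\sum \delta_i l_i$ and $\sum \delta_i L_i$ can be computed exactly whenever the lower and upper limits of the function in each interval are known. The sketch below is illustrative only: each function is represented by a pair of hypothetical helpers giving its lower and upper limit in $[u, v]$. It treats the increasing function $x^2$ on $(0, 1)$, and Dirichlet's function $\chi$, whose limits in every interval are $0$ and $1$.

```python
from fractions import Fraction

def darboux_sums(inf_on, sup_on, points):
    """Lower and upper sums (sum of l_i * delta_i, sum of L_i * delta_i) for a division."""
    pairs = list(zip(points, points[1:]))
    lower = sum(inf_on(u, v) * (v - u) for u, v in pairs)
    upper = sum(sup_on(u, v) * (v - u) for u, v in pairs)
    return lower, upper

# f(x) = x^2 on (0, 1): increasing, so its limits in [u, v] are u^2 and v^2.
square = (lambda u, v: u * u, lambda u, v: v * v)
# Dirichlet's chi: every interval contains rational and irrational points.
chi = (lambda u, v: 0, lambda u, v: 1)

for n in (4, 16, 64):
    pts = [Fraction(k, n) for k in range(n + 1)]
    print(n, darboux_sums(*square, pts), darboux_sums(*chi, pts))
```

For $x^2$ the two sums close in on $\frac{1}{3}$ from either side, their difference being $\frac{1}{n}$; for $\chi$ they remain $0$ and $1$ for every division, so the lower sum and upper sum integrals differ.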
To complete their definitions, given only for $b > a$, we set
$$\overline{\int_a^b} + \overline{\int_b^a} = 0, \qquad \underline{\int_a^b} + \underline{\int_b^a} = 0.$$
It is necessary to remark that, in a negative interval, the upper sum integral is smaller
than the lower sum integral.
We always have
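$$\overline{\int_a^b} + \overline{\int_b^c} + \overline{\int_c^a} = 0, \qquad \underline{\int_a^b} + \underline{\int_b^c} + \underline{\int_c^a} = 0;$$
$$\overline{\int} (f + \varphi) \le \overline{\int} f + \overline{\int} \varphi, \qquad \underline{\int} (f + \varphi) \ge \underline{\int} f + \underline{\int} \varphi,$$
as we see by reasoning similar to that of Sect. 3.3; but we do not have the same
relations with the inequality signs replaced by equality signs; the inequality signs are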
25 We could also deduce the existence of this limit from the existence of the upper sum integral for
$-f$.
26 Annales de l’École Normale Supérieure, 1875.
3.4 Integration by Lower Sum and Upper Sum 77
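indispensable. For example, let us take $f(x) = \chi(x)$ (Sect. 3.1) and $\varphi(x) = -\chi(x)$; we
would have in $(0, 1)$,
$$\overline{\int} f = 1, \quad \overline{\int} \varphi = \overline{\int} (f + \varphi) = 0, \quad \underline{\int} f = \underline{\int} (f + \varphi) = 0, \quad \underline{\int} \varphi = -1.$$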
When the maximum length $\lambda$ of the intervals $\delta_i$ converges to zero. Let us set $S = \psi(\lambda)$; we
thus define a multiple-valued function (Sect. 3.1). The limits of indetermination of
$\psi(\lambda)$, as $\lambda$ converges to $0$, are the two integrals by lower and upper sums. This suggests that
these two integrals often reveal to us the limit superior and limit inferior of a number when we
know that this number is given by the integral $\int f\,dx$ whenever $f$ is integrable.
To better study the limits of indetermination of $S$, it is necessary to determine the set $A$ of
all the limit points of $S$.²⁸ For the case of integration, we have this property which I have the
privilege of stating: any number included between the upper sum and lower sum integrals
is one of the limits of the sum $S$ when $\lambda$ converges to zero.²⁹
27 We could use in a similar manner an arbitrary non-integrable function as an example in which the
signs of inequality are indispensable.
28 In certain cases, we have determined not only the set of limits of a function .ψ(λ), but also the
frequency of each of these limits. This has been done, notably, for summation of certain divergent
series (see BOREL, Leçons sur les séries divergentes, p. .5).
29 See LEBESGUE, Ann. de l’École Norm. sup., 1910. As an exercise concerning the upper sum and
lower sum integrals, we could prove that, $f(x)$ being a function of bounded mean oscillation $\omega$ in
$(a, b)$ and whose limit inferior, limit superior and oscillation at $x$ are $l(x)$, $L(x)$ and $\omega(x)$, we have
The same relations are true if, in the definition of $L(x)$, $l(x)$, $\omega(x)$, we exclude the value $x$ of the
variable, or if, by these notations, we denote the limit superior, limit inferior and oscillation to the
right or to the left of $x$, $x$ being excluded or not. (See note 2, Sect. 3.1.)
4 Geometric Definition of the Integral
In the first chapter, the definition of the integral was linked to certain areas; we will now
investigate if, through a similar geometric approach, we could arrive at the general definition
of Riemann. We will see that it is possible, so that Riemann’s integral appears to be the
natural generalisation of Cauchy’s integral, whether we place ourselves in the geometric or
analytical point of view.1 First, I will associate sets of numbers that will serve as analogues of
lengths, areas, volumes corresponding to line segments, plane domains, or spatial domains.
The first definition of these numbers is credited to Cantor; I would adopt Jordan’s method
of presentation which has simplified and completed the definition given by Cantor.2 Let . E
1 In what follows, I assume that the Euclidean length of a segment and the Euclidean area of a polygon
are defined.
To avoid any difficulties, it is easy to consider a point as a set of three numbers $x, y, z$, and a
displacement as a change of coordinates whose coefficients satisfy known conditions. Then, by definition, the
distance between two points $(a, b, c)$ and $(\alpha, \beta, \gamma)$ is
$$+\sqrt{(a - \alpha)^2 + (b - \beta)^2 + (c - \gamma)^2}.$$
The function defined in this way is, up to a multiplicative constant, the only function of two points
which remains invariant under displacements and satisfies the relation
$$d(P, Q) + d(Q, R) = d(P, R)$$
when $Q$ is on the segment $PR$. This is where the importance of the number called length comes from.
The area of a polygon is defined by the theorems of elementary Geometry; the importance of this
number is justified in the same way as that of length. (See the Géométrie élémentaire of M. Hadamard,
note . D, or the Géométrie of M. M. Gérard et Niewenglowski.)
2 In the case of a set of points in space, the definition which Cantor used (Acta mathematica, t. IV)
can be stated thus: from each point $M$ in a set $E$ as centre, draw a sphere of radius $\rho$. The set of points
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025 79
R. Jain, Lebesgue’s Theory of Integration,
https://doi.org/10.1007/978-981-96-1169-0_4
be a bounded set3 of numbers or, if we want, of points on a straight line. Let $(a, b)$ be one of
the intervals containing $E$. Let us divide $(a, b)$ into a finite number of partial intervals. Let $\lambda$
be the maximum of the lengths of these intervals. I denote by $A$ the sum of the lengths of the
partial intervals which contain points of $E$ and by $B$ the sum of the lengths of those all of whose
points belong to $E$.4 Jordan showed that $A$ and $B$ converge to two perfectly definite limits
when $\lambda$ converges to zero. For us, the existence of these limits is obvious, because $A$ and $B$
are the approximate values, by upper and lower sums, of the integrals of the function $\psi$ equal
to $1$ at the points of $E$ and to zero at the other points.5
The limit of $A$ is called the exterior extent of $E$, $e_e(E)$; that of $B$ is the interior extent, $e_i(E)$.
When these two extents are equal, we say that the set is $J$ measurable, that is, measurable by
the method of Jordan, and of extent6
$$e(E) = e_e(E) = e_i(E);$$
in this case, the function $\psi$ attached to $E$ is integrable in Riemann’s sense and its integral
in $(a, b)$ is $e(E)$.
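As a numerical illustration, which is not part of Lebesgue's text, the sums $A$ and $B$ can be computed for a set $E$ made of finitely many closed intervals, using a division of $(a, b)$ into equal parts. The function name `jordan_sums`, the sample set `E`, and the conventions (closed partial intervals, pairwise disjoint pieces) are assumptions made only for this sketch.

```python
from fractions import Fraction

def jordan_sums(pieces, a, b, n):
    """Return (A, B) for E = union of the closed intervals in `pieces`,
    using the division of (a, b) into n equal closed partial intervals.
    A sums the lengths of the partial intervals that meet E;
    B sums the lengths of the partial intervals contained in E."""
    a, b = Fraction(a), Fraction(b)
    step = (b - a) / n
    A = B = Fraction(0)
    for k in range(n):
        lo, hi = a + k * step, a + (k + 1) * step
        # [lo, hi] meets E if it meets at least one piece.
        if any(lo <= q and p <= hi for p, q in pieces):
            A += step
        # [lo, hi] lies in E if a single piece contains it
        # (assumes the pieces are pairwise disjoint and not adjacent).
        if any(p <= lo and hi <= q for p, q in pieces):
            B += step
    return A, B

# Hypothetical example: two intervals of total length 3/5 inside (0, 1).
E = [(Fraction(1, 10), Fraction(3, 10)), (Fraction(1, 2), Fraction(9, 10))]
for n in (10, 100, 1000):
    A, B = jordan_sums(E, 0, 1, n)
    print(n, float(A), float(B))
```

Only the partial intervals containing one of the four end points of the pieces can count in $A$ without counting in $B$, so here $A - B$ is at most eight times the common length of the partial intervals, and both sums approach the extent 3/5.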
Let us interpret the condition of integrability of $\psi$. The points of discontinuity of $\psi$ are the
points of $E$ which are limits of points not belonging to $E$, and the limit points of $E$ which
do not belong to $E$. These points are called, by Jordan, the frontier points of $E$; their set is
interior to these spheres form one or several domains whose volume (in the ordinary sense of the
word) is obtained by a triple integral. Let $f(\rho)$ be this volume; the limit of $f(\rho)$, when $\rho$ converges
to zero, is the volume of $E$.
This definition is equivalent to that of the exterior extent given by Jordan (t. 1 of the 2nd edition
of his Cours d’Analyse).
Minkowski used the number $f(\rho)$. In the case where $E$ is formed from the points of a curve, Minkowski
considered the ratio $\frac{f(\rho)}{\pi\rho^2}$; if it has a limit, Minkowski called it the length of the curve. The area of
a surface is defined by the ratio $\frac{f(\rho)}{2\rho}$.
We see that the number $f(\rho)$ could be used in the theory of sets. The above discussion seems
to suggest that it could be used in different manners depending on the number of dimensions of $E$.
Moreover, Cantor indicated in his memoir that the concept of volume had been useful to him in defining
the number of dimensions of a continuous set. In many questions, such a definition would seem quite useful;
however, Cantor did not publish his work on this subject. The recent works concerning the number of
dimensions of a set raise strong doubts about whether Cantor could have arrived at these important
results in this way.
Related to the number $f(\rho)$, one can consult a very interesting book, Theory of sets of points,
by Mr. and Mrs. W. H. Young, published since the first edition of these Lessons.
3 That is, a set all of whose numbers are included between two finite limits.
4 We can give two meanings to each of the expressions “an interval containing points” and “all
points of an interval”, as for the term “enclosed” (see note 19, Sect. 3.2). It does not matter whether
we adopt one or the other.
5 M. de la Vallée Poussin defined the exterior extent and the interior extent with the help of $\psi$.
6 The word extent is deliberately used here; the word measure, which is often used as a synonym of
extent, will be defined later.
4.1 The Measure of the Sets 81
the frontier set of $E$. Therefore, for a set to be $J$ measurable, it is necessary and sufficient
that its frontier set forms an integrable group.
This condition can be transformed if we note that, by definition, for an integrable group,
$A$ converges to zero. Therefore, an integrable group is a set of exterior extent zero or, if we
want, a $J$ measurable set of extent zero. The previous method could be applied to sets
formed of points of a space of several dimensions only if we had first studied the multiple
integrals by upper sum and lower sum. Such a study does not present difficulties, but it is
simpler to use Jordan’s method which is, in short, the proof of the existence of these integrals
in the particular case of the function $\psi$.
Let us consider a bounded set of points $E$ in the plane, that is, a set such that the coordinates of
its points are bounded. Such a set is entirely contained in a suitably chosen square, of
area $R$. Let us divide the plane into small squares whose diagonals have length at most $\lambda$. Let
$A$ be the sum of the areas of the squares which contain points of $E$ and $B$ be the sum of
the areas of those squares all of whose points belong to $E$. Both $A$ and $B$ are smaller than $R$.
It is necessary to show that they converge to definite limits when $\lambda$ converges to zero. For
that, let us first consider a sequence of divisions $D_1, D_2, \ldots$, with corresponding
numbers $A_1, B_1, A_2, B_2, \ldots$, and such that the corresponding values of $\lambda$ converge to zero.
Also, consider a second sequence of divisions $\Delta_j$, with corresponding numbers $\alpha_j$ and $\beta_j$,
and such that the corresponding numbers $\lambda_j$ converge to zero.
Let us compare $A_i$ and $\alpha_j$. The squares of $\Delta_j$ that contribute to $\alpha_j$ are of two types:
the squares $d'$, which contain in their interior points of the edges of the squares of $D_i$
contributing to $A_i$, and the others, the squares $d''$. The points of the squares $d'$ form a set
which is contained in the set of points at a distance less than $\lambda_j$ from at least one
of the points on the edges of the squares of $D_i$.
If only a single square of $D_i$, of perimeter $4c$, contributed to the sum $A_i$, this set could be
decomposed into domains whose sum of areas, in the elementary sense of the word, would
be $8c\lambda_j + (\pi - 4)\lambda_j^2$ for $c > 2\lambda_j$; more generally, if in $D_i$ the sum of the perimeters of the
squares contributing to the sum $A_i$ is $l$, the corresponding set could be divided into domains
whose sum of areas is at most $2l\lambda_j$. This value is also the maximum contribution of the
squares $d'$ to the sum $\alpha_j$.
As for the squares $d''$, they obviously give a contribution at most equal to $A_i$; therefore
we have
$$\alpha_j \le A_i + 2l\lambda_j,$$
and that is sufficient7 to show that $\alpha_j$ and $A_i$ converge to the same limit $\mathcal{A}$.
The number $\mathcal{A}$, whose existence has just been shown, is the exterior extent of $E$, $e_e(E)$;
but this is a surface extent. This distinction is important to note, because any set of points on
a straight line has exterior surface extent zero and could have any exterior linear extent.
Similarly, we would show that $B_i$ and $\beta_j$ converge to the same limit $\mathcal{B}$. We can also note
that if, to the division $\Delta_j$ and to the set of points of the square of area $R$ which do not
belong to $E$, we associate two numbers $\alpha'_j$ and $\beta'_j$, analogous to $\alpha_j$ and $\beta_j$, we have
$$\alpha'_j + \beta_j = R$$
and the existence, which we have just proved, of the limit of $\alpha'_j$ shows the existence of the
limit of $\beta_j$. This limit is the interior surface extent of $E$, $e_i(E)$.
As for the linear sets, we say that a set is $J$ measurable and of extent $e(E) = e_e(E)$, if
the exterior and interior extents are equal.
If we note that the squares which contribute to the sum $A$ without contributing to the
sum $B$ are the ones that we must consider to obtain the exterior extent of the frontier of $E$,
we see that the frontier of $E$ has exterior extent $e_e(E) - e_i(E)$; hence, the necessary and
sufficient condition for a set to be $J$ measurable is deduced.
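The following sketch, which is an illustration and not part of the original text, computes the sums $A$ and $B$ for the closed unit disk, taken here as a hypothetical example of a set $E$. The function name `disk_sums` and the choice of a grid of equal closed squares covering $[-1, 1] \times [-1, 1]$ are assumptions of the sketch.

```python
import math

def disk_sums(n):
    """Return (A, B) for the closed unit disk, using the division of the
    square [-1, 1] x [-1, 1] into n*n equal closed squares.
    A: total area of the squares meeting the disk;
    B: total area of the squares contained in the disk."""
    h = 2.0 / n
    A = B = 0
    for i in range(n):
        x0, x1 = -1 + i * h, -1 + (i + 1) * h
        for j in range(n):
            y0, y1 = -1 + j * h, -1 + (j + 1) * h
            # Point of the square nearest to the origin (clamp 0 into it).
            nx = min(max(0.0, x0), x1)
            ny = min(max(0.0, y0), y1)
            # The point of the square farthest from the origin is a corner.
            fx = max(abs(x0), abs(x1))
            fy = max(abs(y0), abs(y1))
            if nx * nx + ny * ny <= 1.0:
                A += 1
            if fx * fx + fy * fy <= 1.0:
                B += 1
    return A * h * h, B * h * h

for n in (10, 100, 400):
    A, B = disk_sums(n)
    print(n, A, B, A - B)
print("pi =", math.pi)
```

The difference $A - B$ is the total area of the squares meeting the circumference, that is, the sum used for the exterior extent of the frontier; it shrinks with the side of the squares, in agreement with the fact that the disk is $J$ measurable.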
I have already used the word domain; it needs to be specified here what one means by
that.
A curve is a set of formulae
$$x = x(t), \quad y = y(t), \quad z = z(t),$$
where $x(t), y(t), z(t)$ are continuous functions defined on a finite interval $(t_0, t_1)$. The points
of the curve are those that are obtained by giving any definite value to $t$; the points which
correspond to only one value of $t$ are called simple; the others are called multiple. If the two
points corresponding to $t_0$ and $t_1$ are identical, the curve is said to be closed; if this common
point does not correspond to any other value of $t$, it is not considered multiple.
If we replace .t by a monotonically increasing or decreasing function of .θ, we obtain
a new curve which is not considered as different from the first; but two curves, to which
correspond the same set of points, could be different; it is the case of two curves, defined on
π π 2 π2
.(− , + ), by . x = sint, y = 0, z = 0, and . x =
π sin 4t , y = 0, z = 0.
2t
2 2
In the case of a closed curve, we could make the transformation $\theta = \frac{t - t_0}{t_1 - t_0}$ and consider the
function of $\theta$ obtained as periodic and of period $1$. Then, to define the curve, it would be
sufficient to prescribe it in any interval of extent $1$ and not necessarily in $(0, 1)$ only. Finally
we could, in this interval, replace $\theta$ by a monotonically increasing or decreasing function of
$\tau$. All the curves thus obtained are considered identical.
Jordan first proved, in the second edition of his Cours d’Analyse, that a closed curve
without multiple points separates the plane into two regions8 ; we will accept this result,
8 See also the Traite d’Analyse of M. de la Vallée Poussin. In this second edition, I should mention
many other references, as the recent works on this topic and related questions are numerous. I will
content myself with referring to the article of M. Zoretti: Recherches récentes sur la théorie des fonctions
(Encyclopédie des Sciences Mathématiques, II, I), and to the first volumes of the Fundamenta
Mathematicæ.
which appears intuitively so obvious that we initially have some difficulty understanding
why it needs to be proven.
The points of the interior region constitute what we call the domain bounded by the
curve. Relative to the points of this curve, we could make two conventions: consider them
as points of the domain or not, which is generally of little importance.
The frontier of a domain is formed by the closed curve used to define it.
When the exterior and interior extents of the domain are equal, the domain is said to be
quadrable and its surface extent is called its area.9
For a domain to be quadrable, it is necessary that its frontier curve be of exterior extent
zero; such a curve is called a quadrable curve. A square is obviously quadrable.
From the definition of quadrable domains, it follows that nothing would have changed if
we had assumed that the division $\Delta_j$ (Sect. 4.1) was a division into quadrable domains of
diameter less than $\lambda_j$.
Now, here are examples of the various circumstances which we have just envisaged.
The integrable groups provide us with the first example of linear $J$ measurable sets.
In particular, the set $Z$ (Sect. 3.2) is of exterior extent zero. The same holds true, a fortiori,
for any set formed using points of $Z$; all these sets are therefore $J$ measurable and of
extent zero. As $Z$ has the cardinality of the continuum, it is possible to establish a bijective
correspondence between the points of $Z$ and those of an interval, so that any set of points
in this interval corresponds to a set of points in $Z$. Therefore, the set of $J$ measurable sets
has a cardinality at least equal to that of the set of all point sets and, since it obviously cannot
have a higher cardinality, it has exactly that cardinality.10
Another example of a linear $J$ measurable set is provided by a finite number of intervals.
If we remove an integrable group from such a set, there remains a $J$ measurable set, and the
extent has not changed.
We easily see that the most general $J$ measurable set differs from a $J$ measurable set
formed by a countably infinite number of intervals only by the addition of an integrable
group $G_1$ and the subtraction of another integrable group $G_2$.11
It is also easy to mention surface $J$ measurable sets. Any bounded set $Z_1$ which projects
onto the $x$-axis as the set $Z$ is a $J$ measurable set of surface extent zero. The sets
of exterior surface extent zero play the same role in the theory of double integrals, in the
sense of Riemann, as integrable groups on the line; we can call them the integrable groups
of the plane.
9 Moreover, some authors always use, in place of the linear extent and surface extent, the words
length and area.
10 A very important theorem on the comparison of cardinalities is used here; it can be found in
Note 1 of the Leçons sur la théorie des fonctions of M. Borel, with a proof due to M. F. Bernstein. Since this
theorem is frequently used, we may state it as follows:
If a set $E$ contains a set $E_1$ and is contained in a set $E_2$, where $E_1$ and $E_2$ have the same
cardinality, then $E$, $E_1$ and $E_2$ have the same cardinality.
11 If by the points of an interval we mean the points interior to this interval, the consideration of $G_2$
is futile.
A square is a surface $J$ measurable set. Starting with squares and integrable groups of
the plane, we can construct any $J$ measurable set of the plane, just as we did in the case of the line.
The integrable groups of the plane can be very different from the integrable groups of
the line. $Z_1$ is, like $Z$, a discrete set, at least when each point of $Z$ is the projection of only
a single point of $Z_1$. In other words, we cannot move continuously from one point to another
in this set without passing through points that are not part of the set. However, an integrable
group in the plane could be a continuous set, that is, a set such that any two of its points
could be joined by a curve passing only through points of the set. We know in fact that a
segment, a polygonal line, a circumference, an ellipse have exterior surface extent zero.
The curves which are integrable groups are the ones that we called quadrable.
To obtain a set which is not $J$ measurable, it is sufficient to take a set which is everywhere dense
and contains no interval if it is a set on the line, or contains no domain if it is a set in
the plane. Indeed, for such a set, the interior extent is zero, while the exterior extent is not.
Therefore, the set of points with rational coordinates (or rational coordinate, on the line) is not $J$ measurable.
P. Du Bois Reymond noticed that a set can be non-dense without being $J$ measurable. Let
us take a sequence of fractions $\alpha_1, \alpha_2, \ldots$, such that the infinite product $P = \alpha_1 \times \alpha_2 \times \cdots$
is convergent and non-zero; we can take, for example, $\alpha_n = \frac{4n^2 - 1}{4n^2}$. Let us divide the interval
$(a, b)$ into three parts, with the middle one having length $(b - a)(1 - \alpha_1)$, and the two
extreme ones being equal. Let us remove the middle interval and operate on each of the remaining
two intervals as we did on $(a, b)$, with $\alpha_1$ being replaced by $\alpha_2$, and so on. Let $R$ be the
set of points remaining after all the operations. If we use the successive divisions which have
given $R$ for calculating the exterior extent of $R$, we see that this extent is $P(b - a)$, which
is different from zero. However, the interior extent is zero, since $R$ is non-dense. Therefore
$R$ is not $J$ measurable.12
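To make the example concrete, the sketch below (an illustration added here, with the hypothetical function name `remaining_length`) follows the total length that remains after each stage of the construction when $\alpha_n = \frac{4n^2 - 1}{4n^2}$. That the infinite product equals $2/\pi$ is Wallis's classical product, a fact not stated in the text and used here only as a check.

```python
import math

def remaining_length(total, steps):
    """Total length left after `steps` removal stages, starting from an
    interval of length `total`. Stage n keeps the fraction
    alpha_n = (4n^2 - 1) / (4n^2) of every remaining interval."""
    length = total
    for n in range(1, steps + 1):
        length *= (4 * n * n - 1) / (4 * n * n)
    return length

for steps in (1, 10, 100, 10000):
    print(steps, remaining_length(1.0, steps))
print("2/pi =", 2 / math.pi)
```

The printed lengths decrease towards a non-zero limit, which is the exterior extent $P(b - a)$ of $R$ for $b - a = 1$, whereas the interior extent of $R$ is zero.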
An entirely similar construction can be carried out in the case of the plane; we could, for example,
divide a rectangle, by two series of parallels to its edges, into nine rectangles and remove
the points interior to the middle one, which we will choose in such a manner that its area is
$(1 - \alpha_1)$ times that of the initial rectangle. Then we operate on each of the above rectangles
12 If we had $\alpha_n = \frac{2}{3}$, we would have the set $Z$, which is $J$ measurable, because $P$ would be zero.
13 PEANO, Sur une courbe qui remplit toute une aire (Math. Ann., Bd XXXVI); HILBERT, Ueber
die stetige Abbildung einer Linie auf ein Flächenstück (Math. Ann., Bd XXXVIII). The curve of
M. Hilbert is defined on page 23 of volume I of the second edition of the Traité d’Analyse of M. Picard.
The method of definition which will be indicated, which differs from that of MM. Peano and Hilbert,
4.2 Definition of the Integral 85
specifically for the functions $x$ and $y$ of $t$ defined on $Z$. And this follows from the fact that,
when $t$ (belonging to $Z$) is sufficiently close to $\theta$ (also belonging to $Z$), the first $2n$ digits
$a_1, a_2, \ldots, a_{2n}$ of $t$, written in the system of base $3$, are the same as those of $\theta$. In other words,
the first $n$ digits of $x(t)$ and $x(\theta)$ on the one hand, and those of $y(t)$ and $y(\theta)$ on the other
hand, are the same when we write these coordinates in the system of base $2$.14
Our curve indeed fills the entire square, and it even passes several times through some
points. We will easily show that it cannot be otherwise.15
What has just been done in the case of one and two dimensions can obviously be repeated
in the case of any number of dimensions.
could be used for spaces of any finite number of dimensions and similarly for spaces of a countably
infinite number of dimensions (see LEBESGUE, Journal de Mathematiques, 1905).
14 Translator’s Note: The original text says ‘system of base 2,’ but this is likely a typo. Given the
earlier reference to base-3 digits, ‘system of base 3’ seems intended.
15 This follows from the earlier work of Lüroth (Sitz. Phys. med. Soc. Erlangen, t. 10); for limiting the
order of the multiplicity of singular points, see LEBESGUE, Fundamenta Mathematicæ, II, 1921.
We will find, in Chap. 7 (Sect. 5), an example of the application that can be made in some arguments
involving Peano and similar curves.
The Peano curve is $J$ measurable and of non-zero extent, but it cannot be used to bound a
domain. There exist curves without multiple points which are non-quadrable; these curves are not $J$
measurable, and they can be used to bound non-quadrable domains. See W.-F. OSGOOD, A Jordan
curve of positive area (Trans. of Amer. Math. Soc., 1903) or H. LEBESGUE, Sur le problème des
aires (Bull. de la Soc. math. de France, 1903).
In particular, in the case of three dimensions, we will define the volume of a domain.
That would require, beforehand, the precise definition of a closed surface and, for defining a
domain, studies similar to that of Jordan on closed curves.
Let $f(x)$ be a continuous and positive function, defined on a positive interval $(a, b)$, and
let the domain $abBA$ be the one which we have attached to it (Fig. 2.1). Let us find out whether this
domain is quadrable. For that, let us divide $(a, b)$ into partial intervals $\delta_1, \delta_2, \ldots, \delta_p$. The
largest rectangle of base $\delta_i$ all the points of which belong to the domain $abBA$ has a
height equal to the lower limit $l_i$ of $f$ in $\delta_i$. The smallest rectangle of base $\delta_i$ which
contains all the points of the domain which project onto $\delta_i$ has a height equal to the upper limit
$L_i$ of $f$ in $\delta_i$. Therefore the two sums
$$\underline{S} = \sum \delta_i l_i, \qquad \overline{S} = \sum \delta_i L_i$$
converge to definite limits, which are the interior and exterior extents of the domain, when the
maximum of the $\delta_i$ converges to zero. However, $\overline{S} - \underline{S}$ converges to zero, because continuous
functions are of mean oscillation zero; the domain $abBA$ is therefore quadrable.
If, following the initial method, we call definite integral of $f$ in $(a, b)$ the area of $abBA$,
we arrive at Cauchy’s integral. There is only a difference of form between this definition
and that of Cauchy.
In the case where $f(x)$ is not always positive, the curve $AB$ meets the $x$-axis a finite or infinite
number of times and we have two types of domains: those above $ox$ and those below.
Each of these domains is quadrable, as explained earlier.
The sum of the areas of those which are above $ox$, minus the sum of the areas of those
which are below, is, by definition, the integral of $f(x)$.16
Let us now consider any function $f(x)$, defined on the positive interval $(a, b)$. Let $E(f)$
be the set of points whose two coordinates are linked only by the condition that $y$ cannot be
exterior to the positive or negative interval $[0, f(x)]$. In other words, we have
$$y\, f(x) \ge 0 \quad \text{and} \quad 0 \le y^2 \le f(x)^2.$$
The $x$-axis partitions this set into two other sets: the points situated above $ox$ form $E_1[f(x)]$,
and those which are below $ox$ form $E_2[f(x)]$. Regarding the points situated on $ox$, we put
them indifferently either in $E_1$ or in $E_2$; that is not of much importance for what follows,
because they form an integrable group of the plane.
16 The two sums or the series which appear in this definition indeed exist, since the set of all domains
can be enclosed in a circle of finite radius.
By analogy with the previous definition, it is natural to call integral of $f$ the difference of the
extents of $E_1(f)$ and $E_2(f)$, which lies between the two limits
$$\underline{I} = e_i[E_1(f)] - e_e[E_2(f)], \qquad \overline{I} = e_e[E_1(f)] - e_i[E_2(f)].$$
We will calculate these two limits of indetermination and for that we first assume that $f$
is never negative, that is, $E_2$ does not contain any points. The calculation of the interior and
exterior extents of $E$ (or $E_1$) is done as in the case where $f$ is continuous, that is, these
extents are the limits of the numbers $\underline{S}$ and $\overline{S}$. The extents are therefore the integrals by lower
sum and upper sum of $f$.
To study the general case let us set $f = f_1 - f_2$, where $f_1$ is equal to $f$ when $f$ is
positive or zero, and equal to zero when $f$ is negative. We then have, obviously,
$$e_i[E_1(f)] = \underline{\int} f_1\,dx, \qquad e_e[E_1(f)] = \overline{\int} f_1\,dx,$$
$$e_i[E_2(f)] = \underline{\int} f_2\,dx, \qquad e_e[E_2(f)] = \overline{\int} f_2\,dx,$$
therefore
$$\underline{I} = \underline{\int} f_1\,dx + \underline{\int} (-f_2)\,dx, \qquad \overline{I} = \overline{\int} f_1\,dx + \overline{\int} (-f_2)\,dx.$$
It is, in general, impossible to replace sums of upper or lower integrals with the upper or
lower integrals of the sum, because the maximum of a sum is, in general, less than the sum
of the maxima of the terms of the sum, while the minimum is, generally, greater than the sum
of the minima. But here, in any interval, the maximum (or the minimum) of $f = f_1 - f_2$ is
indeed the sum of the maxima (or of the minima) of $f_1$ and $-f_2$. We can therefore write
$$\underline{I} = \underline{\int} f\,dx, \qquad \overline{I} = \overline{\int} f\,dx.$$
We have thus found the Darboux integrals and we have their geometric significance.
Let us note that $E(f)$ is $J$ measurable when both $E_1$ and $E_2$ are. And conversely, if $E(f)$
is $J$ measurable, then so are $E_1$ and $E_2$. Thus, our geometric definition of the integral is
applicable when $E$ is $J$ measurable. However, in this case, and only in this case, $\underline{I}$ and $\overline{I}$ are
equal, that is, the integrals $\underline{\int} f\,dx$ and $\overline{\int} f\,dx$ are equal, and therefore:
The geometric definition of integral is completely equivalent to the analytic definition given
by Riemann.
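The sums by inscribed and circumscribed rectangles discussed in this chapter can be sketched numerically. The code below is an illustration only: the name `darboux_sums` is hypothetical, and the sketch assumes that $f$ is monotone on each partial interval, so that its lower and upper limits there are its values at the end points.

```python
def darboux_sums(f, a, b, n):
    """Lower and upper sums of f on (a, b) for n equal partial intervals.
    Assumes f is monotone on each partial interval, so that its lower and
    upper limits there are attained at the end points."""
    h = (b - a) / n
    lower = upper = 0.0
    for k in range(n):
        left, right = f(a + k * h), f(a + (k + 1) * h)
        lower += min(left, right) * h   # inscribed rectangle
        upper += max(left, right) * h   # circumscribed rectangle
    return lower, upper

f = lambda x: x * x  # positive and increasing on (0, 1)
for n in (10, 100, 1000):
    print(n, darboux_sums(f, 0.0, 1.0, n))
```

For this increasing function the difference of the two sums is $(f(1) - f(0))/n$, so the interior and exterior extents of the attached domain coincide and the domain is quadrable, with area $1/3$.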
5 The Functions of Bounded Variation
$$a = a_0 \le a_1 \le a_2 \le \cdots \le a_n = b;$$
the sum
$$\nu = |f(a_1) - f(a_0)| + |f(a_2) - f(a_1)| + \cdots + |f(a_n) - f(a_{n-1})|$$
is what we call the variation of $f(x)$ for the system of points $a_0, a_1, \cdots, a_n$. If $\nu$ remains
bounded whatever the system of points of division, the function is said to be of finite total variation
or, simply, of bounded variation. The total variation, finite or infinite, is, by definition, the greatest
limit of $\nu$ when the maximum $\lambda$ of the lengths of the partial intervals used tends to zero. It is
to be noted that if we add new points between the chosen points of division, we increase $\nu$
or, at least, we do not decrease it; thus by indefinitely adding new points, in such a manner
that $\lambda$ tends to zero, we have a sequence of numbers $\nu$ tending to a limit, whether finite or
1 It is moreover obvious that an unbounded function cannot satisfy the following definitions.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025 89
R. Jain, Lebesgue’s Theory of Integration,
https://doi.org/10.1007/978-981-96-1169-0_5
infinite, which is at least equal to the number $\nu$ from which we started. We can therefore say
that the total variation of $f$ is the upper limit of the set of numbers $\nu$.2
We also see very simply that, in the previous definition, we can replace $\nu$ by
$$o = \omega_1 + \omega_2 + \cdots + \omega_n,$$
2 And no longer merely the greatest of the limits of indetermination of the numbers $\nu$.
5.1 The Functions of Bounded Variation 91
and let $p$ be the sum of those increments $f(a_i) - f(a_{i-1})$ which are positive and $-n$ the
sum of those which are negative. We obviously have
$$\nu = p + n, \qquad f(b) - f(a) = p - n,$$
hence
$$\nu = 2p + f(a) - f(b), \qquad \nu = 2n + f(b) - f(a);$$
$p$ is the positive variation for the chosen division, $n$ the negative variation. The last two
equalities show that the limits superior $V, P, N$ of $\nu, p, n$, which we call total variation, total
positive variation, total negative variation, are linked by the same relations as $\nu, p, n$.
This being said, let $V(x), P(x), N(x)$ be the three total variations in $(a, x)$, $(x > a)$; we
have
$$f(x) = f(a) + P(x) - N(x).$$
But $P(x)$ and $N(x)$ cannot decrease when $x$ increases, therefore the statement of the theorem
is proved.
We have, moreover,
$$V(x) = P(x) + N(x).$$
Hence it follows that, if one of the three variations is finite, so are the other two, since we
have the two relations between the variations and the increment $f(b) - f(a)$:
$$V = P + N, \qquad f(b) - f(a) = P - N.$$
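The relations between $\nu$, $p$ and $n$ for a given division can be checked numerically. The sketch below is an added illustration; the function name `variations` and the choice of $\sin$ on $(0, 2\pi)$ with eight equal partial intervals are assumptions made for the example.

```python
import math

def variations(f, points):
    """Return (nu, p, n) for the division given by the increasing list
    `points`: nu is the variation, p the positive variation and n the
    negative variation (counted positively)."""
    nu = p = n = 0.0
    for left, right in zip(points, points[1:]):
        inc = f(right) - f(left)
        nu += abs(inc)      # every increment counts in absolute value
        if inc > 0:
            p += inc        # positive increments
        else:
            n -= inc        # negative increments, sign changed
    return nu, p, n

a, b = 0.0, 2 * math.pi
points = [a + (b - a) * k / 8 for k in range(9)]
nu, p, n = variations(math.sin, points)
print(nu, p, n)
# Both differences below should vanish up to rounding errors.
print(abs(nu - (p + n)), abs((math.sin(b) - math.sin(a)) - (p - n)))
```

Since this division contains the points where $\sin$ reaches its maximum and minimum, the three numbers printed are already, up to rounding, the total variations $V = 4$, $P = 2$, $N = 2$.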
A function of bounded variation can be put in the form of a difference of two increasing
functions in an infinite number of ways. If we add the same non-decreasing function $\lambda(x)$
to $P(x)$ and $N(x)$, we obtain two non-decreasing functions $P_1(x)$ and $N_1(x)$ such that we
have
$$f(x) = f(a) + P_1(x) - N_1(x).$$
We easily see that the most general non-decreasing functions $P_1$ and $N_1$ satisfying this
equality are the ones which we have just constructed; so $P(x)$ and $N(x)$ are the smallest among all the
functions $P_1(x)$ and $N_1(x)$ which are non-negative and non-decreasing and satisfy the
previous equality.
To calculate the total variation of a discontinuous function as the limit of a sequence of
variations .ν, it is necessary to choose the division points in a very particular manner; for
example, for a function which is zero everywhere, except at the origin, it is necessary that
the origin be a point of division. For the continuous functions, on the contrary, we have this
property: the variation of a continuous function, relative to an arbitrary division, uniformly
tends to the total variation of this function when the maximum .λ of the lengths of the intervals
used tends to zero.
Indeed, let $D_1, D_2, \cdots$ and $\Delta_1, \Delta_2, \cdots$ be two sequences of divisions for which $\lambda$ tends to
zero, and let $\lambda_j$ be the value of $\lambda$ for $\Delta_j$. The maximum of the oscillation of $f(x)$ in an
interval of extent $\lambda_j$ is a number $\varepsilon_j$ which tends to zero as $\lambda_j$ does. Let us compare the
variations $\nu_i, \nu_j$, relative to $D_i$ and $\Delta_j$.
The intervals of $\Delta_j$ are again partitioned into two classes; let $d$ be those which do not
contain any point of division of $D_i$. Let us consider all those $d$ which are between $x_i$ and
$x_{i+1}$; they cover an interval whose origin $O_i$ is between $x_i$ and $x_i + \lambda_j$ and whose extremity
$E_i$ is between $x_{i+1} - \lambda_j$ and $x_{i+1}$. The values of $f(x)$ at this origin and this extremity differ
from the numbers $f(x_i), f(x_{i+1})$ by $\varepsilon_j$ at most. The contribution to $\nu_j$ of the considered
intervals is therefore at least $|f(x_{i+1}) - f(x_i)| - 2\varepsilon_j$; summing over the $n$ intervals of $D_i$, we have
$$\nu_j \ge \nu_i - 2n\varepsilon_j,$$
and one of the limits of $\nu_j$ is at least equal to one of the limits of $\nu_i$. However, we can
permute $\nu_j$ and $\nu_i$, therefore $\nu_j$ and $\nu_i$ tend to the same definite limit.
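The contrast between continuous and discontinuous functions can be observed numerically. The sketch below is an added illustration with hypothetical names (`variation`, `g`): it uses divisions into equal parts, the function $\sin$ on $(0, 2\pi)$, whose total variation is $4$, and a function equal to zero everywhere except at the origin, whose total variation on $(-1, 1)$ is $2$.

```python
import math

def variation(f, a, b, n):
    """Variation of f for the division of (a, b) into n equal intervals."""
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    return sum(abs(f(xs[k + 1]) - f(xs[k])) for k in range(n))

# Continuous case: the variation approaches 4 for arbitrary fine divisions.
for n in (3, 7, 50, 1001):
    print(n, variation(math.sin, 0.0, 2 * math.pi, n))

# Discontinuous case: zero everywhere except at the origin.
g = lambda x: 1.0 if x == 0 else 0.0
# With 7 equal parts the origin is not a division point; with 8 it is.
print(variation(g, -1.0, 1.0, 7), variation(g, -1.0, 1.0, 8))
```

For the discontinuous function, a division that misses the origin gives variation zero however fine it is, which is why the division points must then be chosen in a particular manner.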
The stated proposition is thus proved; and to make it clear, it is convenient to state it
as follows: the variation .ν of a continuous function . f (x), for a division of the considered
interval into partial intervals of length smaller than .λ, differs from the total variation .V of
. f (x) by at most an infinitely small .θ(λ) if . V is finite, and if . V is infinite, .ν is greater than
because .V (x) is increasing; and, since . f (x0 ) − f (xn ) tends to zero when the maximum
of .xi+1 − xi tends to zero, the value .V (x0 ) is at most equal to .V (x0 − 0). But .V (x) is an
increasing function, therefore we have
. V (b) − V (x).
Since $P(x)$ and $N(x)$ are continuous functions, any continuous function of bounded
variation is the difference of two continuous non-decreasing functions.
The variation $\nu$, for the division $D$, has been defined only in the case where $D$ contains only
a finite number of intervals; in the following, it is useful to study a case where $D$ contains
an infinite number of intervals. This is the case where the points of division of $D$ form a
closed reducible set $E$; then we call variation $u$, for this division, the sum of the series
$\sum |f(x_i) - f(y_i)|$, extended over all the intervals $(y_i, x_i)$ contiguous3 to $E$.
We can compare the set of variations $u$ which has just been defined with the set of
variations $\nu$ defined previously.4
The set of the $u$ contains the set of the $\nu$, when we vary the reducible set $E$; therefore, the limit
superior of the set of the $u$ is at least equal to the limit superior of the set of the $\nu$. To prove that the
limit superior of the $u$ is the total variation $V$, it is therefore sufficient to show that $u$ is
never greater than the total variation.
Let $(\alpha, \beta)$ be an interval contiguous to $E'$. Let $\alpha_1$ and $\beta_1$ be two points of $E$ belonging
to $(\alpha, \beta)$; the contribution of the subset of $E$ in $(\alpha_1, \beta_1)$ for the calculation of $u$ is at most
3 An interval $(y_i, x_i)$ is said to be contiguous to a set $E$ if it does not contain points of $E$ and if its extremities
belong either to $E$ or to $E'$. The notion of contiguous intervals is due to M. R. Baire.
4 Because we have not yet proved that the series defining $u$ is convergent, the case
where the variation $u$ has the value $+\infty$ is not excluded.
equal to the contribution which it provides in $V$, since $E$ contains only a finite number of
points in $(\alpha_1, \beta_1)$. Let the points $\alpha_1$ and $\beta_1$ tend to $\alpha$ and $\beta$; the proposition remains true and
we find that $(\alpha, \beta)$ furnishes in $V$ a contribution at least equal to the one it gives in $u$. If, in
$(\alpha, \beta)$, there were no points of $E$ close to $\alpha$, it would be necessary to take $\alpha_1 = \alpha$, and we
would operate in a similar way if, in $(\alpha, \beta)$, there were no points of $E$ close to $\beta$.
We could similarly prove that the proposition is true in an interval contiguous to $E''$, or to
$E'''$, $\cdots$; but one of the derived sets of $E$ being empty in $(a, b)$, the proposition is true for
$(a, b)$.
terms, these two sums tend towards . P and .−N when .λ tends towards zero.
It is important to note that we cannot replace the reducible set $E$ by an arbitrary non-dense
set without some of the previously mentioned properties ceasing to hold. Let, in fact,
the function $\xi(x)$ be defined by
$$2\xi(x) = \frac{a_1}{2} + \frac{a_2}{2^2} + \frac{a_3}{2^3} + \cdots,$$
5 The notation $\sum [f(x_i) - f(y_i)]$ assumes that the contiguous intervals have been numbered. This
is possible in infinitely many ways, but none is preferable, so that the terms of our series are not
actually arranged in a specific order; the series can, therefore, be convergent only if it is absolutely
convergent.
We will encounter this fact in the sequel for all series of numbers associated with the intervals
contiguous to a set.
when
\[ x = \frac{a_1}{3} + \frac{a_2}{3^2} + \frac{a_3}{3^3} + \cdots, \]
where the numbers $a_i$ are equal to $0$ or $2$. Then $x$ belongs to the set $Z$. We immediately verify that, for the two extremities of an interval contiguous to $Z$, $\xi$ takes the same value; we require $\xi$ to remain constant in such an interval. $\xi(x)$ is now defined everywhere; it is a non-decreasing function and yet we find zero for $u$ if the points of $Z$ are among the points of division used.
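The behaviour of $\xi$ can be checked on a small scale. The following is only a sketch under stated assumptions: the names `cantor_xi` and `contiguous_intervals` are hypothetical, the ternary expansion is truncated at `depth` digits, and only the intervals contiguous to $Z$ that appear in the first six steps of the usual middle-third construction are used.

```python
from fractions import Fraction

def cantor_xi(x, depth=40):
    """xi(x) for x in [0, 1], read off the first `depth` ternary digits of x."""
    x = Fraction(x)
    if x == 1:
        return Fraction(1)
    value, weight = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = int(x)                  # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:                  # x is in a contiguous interval (or at its left end)
            return value + weight       # constant value of xi on that interval
        value += weight * (digit // 2)  # ternary digit 2 becomes binary digit 1
        weight /= 2
    return value

def contiguous_intervals(levels):
    """Open middle thirds removed in the first `levels` steps of the construction of Z."""
    kept, removed = [(Fraction(0), Fraction(1))], []
    for _ in range(levels):
        next_kept = []
        for lo, hi in kept:
            third = (hi - lo) / 3
            removed.append((lo + third, hi - third))
            next_kept += [(lo, lo + third), (hi - third, hi)]
        kept = next_kept
    return removed

intervals = contiguous_intervals(6)
# u: sum of |xi(b_i) - xi(a_i)| over the contiguous intervals (a_i, b_i)
u = sum(abs(cantor_xi(b) - cantor_xi(a)) for a, b in intervals)
print("u =", u, " xi(1) - xi(0) =", cantor_xi(1) - cantor_xi(0))
```

Every contiguous interval contributes nothing to $u$, although $\xi$ increases by $1$ over $(0, 1)$; exact rational arithmetic is used so that the equality at the two extremities is not blurred by rounding.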
We have seen that the finite or infinite total variation of a function is the limit superior of
the numbers .ν for this function, and therefore also of its corresponding numbers .u. Let us
show that, if the total variation of . f (x) is infinite, there is a reducible set . E for which the
corresponding numbers .u are infinite.
Let us suppose that the interval is $(a, b)$ and let $x$ increase from $a$ to $b$. If, from $a$ to $x_0$, $f(x)$ is of bounded variation, $f(x)$ is a fortiori of bounded variation from $a$ to $x < x_0$; therefore, when $x$ varies in $(a, b)$, it attains a value $\xi$ which is either the first for which the total variation from $a$ to $\xi$ is infinite, or the last for which this variation is finite; $\xi$ could moreover coincide with $a$ or $b$.
In the first envisaged case . f (x) would have an unbounded total variation in any interval
.(ξ − h, ξ); in the second case its total variation would be infinite in any interval .(ξ, ξ + h).
We could choose in .(ξ, b) the points .b > b1 > b2 > · · · > b p > ξ, such that the variation
.ν in .(ξ, b) for this system of points exceeds .o + 1, o being the oscillation of . f (x) in .(a, b).
We therefore have
therefore
Let us choose in $(\xi, b_p)$ the points $b_p > b_{p+1} > \cdots > b_{p+q} > \xi$, such that the variation $\nu$ in $(\xi, b_p)$ calculated with the help of these points exceeds $o + 1$; then, in $(\xi, b_{p+q})$, the points
\[ b_{p+q} > b_{p+q+1} > \cdots > b_{p+q+r} > \xi, \]
such that they give for the variation $\nu$ a value greater than $o + 1$ in $(\xi, b_{p+q})$, and so on.
It is clear that the set $E$ formed from the point $\xi$ and this infinite sequence $b_1, b_2, \cdots$ answers the question, since the series $\sum [f(b_i) - f(b_{i+1})]$ is divergent.
I will conclude by giving some examples of the various peculiarities which have been mentioned.
The function $x \sin\frac{1}{x}$ is equal to $\frac{(-1)^{K+1}}{K\pi - \frac{\pi}{2}}$ for $x = \frac{1}{K\pi - \frac{\pi}{2}}$; therefore, if we use these values of $x$ for calculating $u$ in the interval $(0, \frac{1}{\pi})$, we find
\[ u = \frac{1}{K\pi - \frac{\pi}{2}} + \frac{2}{2K\pi - \frac{\pi}{2}} + \frac{2}{3K\pi - \frac{\pi}{2}} + \cdots, \]
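A numerical sketch of this divergence, under assumptions that are not in the text: the division points are taken to be $x_K = 1/(K\pi - \pi/2)$ for $K = 1, 2, 3, \cdots$, the helper name `partial_u` is hypothetical, and only finitely many points are used, so the code shows the growth of the partial sums rather than the series itself.

```python
import math

def f(x):
    return x * math.sin(1.0 / x)

def partial_u(n):
    """Sum of |f(x_K) - f(x_{K+1})| for K = 1..n, with x_K = 1/(K*pi - pi/2)."""
    xs = [1.0 / (k * math.pi - math.pi / 2) for k in range(1, n + 2)]
    return sum(abs(f(xs[k]) - f(xs[k + 1])) for k in range(n))

for n in (10, 100, 1000, 10000):
    # consecutive values of f have opposite signs, so each term is a sum of two
    # quantities of order 1/(K*pi): the partial sums behave like a harmonic series
    print(n, round(partial_u(n), 3))
```

Each tenfold increase of `n` adds roughly $(2/\pi)\ln 10$ to the partial sum, which is the harmonic-type growth that makes $u$ infinite.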
certain set . E 1 .
. f 2 (x) is a continuous function which vanishes at the points of . E 1 and which, in the
$f_2(x)$ has the same total variation as that of $f_1(x)$ because, in $(\alpha, \beta)$, the total variation of $f_2(x)$ is $\frac{\beta - \alpha}{2} V$.
The function $f_1 + \frac{1}{2^2} f_2$ has an infinite number of maxima and minima in each interval $(\alpha, \beta)$. Indeed, if $f_1 = a$, it is of unbounded variation in $(\alpha, \beta)$, and this implies that it has an infinite number of maxima and minima. This is because, if a function has only a finite number of maxima and minima, it is sufficient, for calculating the total variation $V$, to calculate the number $\nu$ relative to a division $x_1, x_2, \cdots, x_p$ containing the abscissae of all the maxima and minima; $V$ is therefore finite. If $f_1 = b$, $f_1$ has a bounded derivative in $(\alpha, \beta)$, while the derivative of $f_2$ takes all the positive and negative
values, hence again the existence of an infinite number of maxima and minima. Let $E_2$ be the set of values of $x$ for which $f_1 + \frac{1}{2^2} f_2$ is maximum or minimum.
By operating on $E_1 + E_2$ as we did on $E_1$, we will form $f_3$, resulting in $f_1 + \frac{1}{2^2} f_2 + \frac{1}{3^2} f_3$ and $E_3$.6
By continuing in this way, we define the different terms of the series
\[ f(x) = f_1 + \frac{1}{2^2} f_2 + \frac{1}{3^2} f_3 + \cdots \]
which is uniformly convergent, because $|f_i|$ is smaller than $1$.
The continuous function $f(x)$ has maxima and minima in every interval. Indeed, in an arbitrary interval $(l, m)$, provided that $n$ is large enough, there are more than two points of $E_n$. Let us suppose that there are three consecutive points $r, s, t$ of $E_n$; $f$ being equal to the sum $s_n = f_1 + \frac{1}{2^2} f_2 + \frac{1}{3^2} f_3 + \cdots + \frac{1}{n^2} f_n$ at these three points, $f$ will have at least one maximum or minimum between $r$ and $t$, depending on whether $s$ corresponds to a maximum or a minimum.
The function $f$ admits all the maxima and minima of $s_n$, therefore $f$ is of unbounded variation in any interval if $f_1 = a$. On the contrary, if $f_1 = b$, the total variation of $s_n$ being finite and smaller than $V(1 + \frac{1}{2^2} + \cdots + \frac{1}{n^2})$, $f$ is of bounded variation in any interval (see Sect. 5.1).
Let us focus now on the discontinuous functions of bounded variation.
Here is a property of singular points, which is easy to prove directly, and which follows
immediately from the construction of the most general function of bounded variation starting
from two increasing functions: all points of discontinuity of a function of bounded variation
are of first kind.
Let .x0 be a point of discontinuity; the quantity
is the jump at the point $x_0$. This being said, let us consider the jump function of $f(x)$
6 To be absolutely rigorous, it would be necessary to show that the sum of the lengths of the intervals
contiguous to . E 1 + E 2 , intervals which play the role of .(α, β), is equal to .2, like the sum of the
differences .β − α. It is almost obvious and follows, if we want, from the fact that . E 1 + E 2 is of
exterior extent zero.
\[ \varphi(x) = \sum_{a \le x_i < x} s_d(x_i) + \sum_{a < x_i \le x} s_g(x_i), \]
where each of the series contains all the $x_i$ which satisfy the inequality placed below the corresponding sign $\sum$. We easily see that these two series are absolutely convergent and that, if we set
\[ f(x) = \varphi(x) + \psi(x), \]
$\psi(x)$ is a continuous function of bounded variation, the total variation of $f$ being the sum of those of $\varphi$ and $\psi$.
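The decomposition can be tried on a toy function with a single discontinuity. This is a sketch under assumptions that are not in the text: the function `f` below is a hypothetical example on $(0, 2)$, and the jumps are taken to be $s_g(x) = f(x) - f(x - 0)$ to the left and $s_d(x) = f(x + 0) - f(x)$ to the right, which is the reading consistent with the two inequalities under the sums.

```python
from fractions import Fraction as Fr

A, B = Fr(0), Fr(2)

def f(x):
    """Hypothetical example: f(x) = x for x < 1, f(1) = 3, f(x) = x + 1 for x > 1."""
    if x < 1:
        return x
    if x == 1:
        return Fr(3)
    return x + 1

# One discontinuity x_i = 1 with s_g = f(1) - f(1-0) = 2 and s_d = f(1+0) - f(1) = -1.
JUMPS = [(Fr(1), Fr(2), Fr(-1))]          # entries are (x_i, s_g, s_d)

def phi(x):
    """Jump function: right jumps for a <= x_i < x, left jumps for a < x_i <= x."""
    right = sum(sd for xi, sg, sd in JUMPS if A <= xi < x)
    left = sum(sg for xi, sg, sd in JUMPS if A < xi <= x)
    return right + left

def psi(x):
    return f(x) - phi(x)

def variation(g, points):
    """The number nu of g for the division given by `points`."""
    return sum(abs(g(q) - g(p)) for p, q in zip(points, points[1:]))

grid = [Fr(k, 100) for k in range(201)]   # division of (0, 2) that contains x = 1
print(variation(phi, grid), variation(psi, grid), variation(f, grid))
```

Here $\psi(x) = x$ is continuous, the variation of $\varphi$ is $|s_g| + |s_d| = 3$, that of $\psi$ is $2$, and the number $\nu$ of $f$ for a division of step $h$ containing the point $1$ is $5 - 2h$, which tends to $3 + 2 = 5$.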
The most general discontinuous function which is of bounded variation is therefore
obtained either by taking the difference of two discontinuous increasing functions or by
adding to a continuous function of bounded variation a jump function .ϕ(x). This second
method shows that we could construct functions with bounded variation by choosing at will
the countable set of points of discontinuity, and even the jumps to the right and to the left
$s_d$ and $s_g$, provided that the series $\sum s_d(x)$, $\sum s_g(x)$ are absolutely convergent.
For example, the set of points of discontinuity could be the set of rational numbers, the jumps being, when $x$ is written $\frac{a}{b}$ in irreducible form,
\[ s_d = (-1)^a \frac{1}{a^2 b^2}, \qquad s_g = (-1)^b \frac{1}{a^3 b^3}. \]
Let us show that, for calculating the total variation of a discontinuous function $f$, it is sufficient to take the limit of the numbers $\nu$ provided by a sequence of divisions $D_1, D_2, \cdots$ into intervals of length tending to zero and such that any point of discontinuity of $f$ is a point of division of $D_i$ for a certain value of $i$. It is sufficient to prove this for the jump functions $\varphi$. Now, if $\xi$ is a point of discontinuity of $\varphi$ and belongs to $D_i, D_{i+1}, \cdots$, there exists in the division $D_{i+p}$ an interval $(\xi, \xi + h_p)$ of origin $\xi$, whose contribution to $\nu_{i+p}$ tends to $|s_d(\xi)|$ when $p$ increases indefinitely. Thus, in $\nu_n$, there are terms which tend to the first $K$ terms of $\sum (|s_d(x_i)| + |s_g(x_i)|)$, for $n$ increasing to infinity. Therefore, the limit of $\nu_n$ cannot be smaller than the total variation of $\varphi$, but since it cannot be greater either, $\nu_n$ does indeed tend to the total variation of $\varphi$.
We can also use the divisions . Di satisfying the indicated conditions but obtained with
the help of reducible sets of points and no longer only with the help of a finite number of
points. It is only necessary to define with precision what we mean by the number $u$; this would be, if we denote by $\delta_i = (a_i, b_i)$ the different intervals contiguous to the set $E$ of division points and by $x_1, x_2, \cdots$ the points of this set,
\[ u = \sum |f(b_i - 0) - f(a_i + 0)| + \sum \left( |s_g(x_i)| + |s_d(x_i)| \right). \]
It is clear that the first term tends to the total variation of the continuous part $\psi$ of $f$ and that the second term ends up containing the absolute values of all the jumps of $f$, thus providing the total variation of $\varphi$.
5.2 The Rectifiable Curves 99
When at any point $f(x)$ is contained between $f(x + 0)$ and $f(x - 0)$ we can take for $u$ the quantity
\[ u = \sum |f(b_i - 0) - f(a_i + 0)| + \sum |f(x_i + 0) - f(x_i - 0)|; \]
but, in the general case, this quantity would give too small a limit. When $f$ is of bounded variation, the series constituting $u$ is convergent, therefore we could arrange, as we choose, the terms of
\[ \sum [f(b_i - 0) - f(a_i + 0)] + \sum [f(x_i + 0) - f(x_i - 0)]. \]
Let us group those provided by the $\delta_i$ and the $x_i$ interior to an interval $\delta'$ contiguous to the derived set $E'$ of $E$. It is clear that the sum of these terms is $[f(\beta' - 0) - f(\alpha' + 0)]$ if $\delta'$ is the interval $(\alpha', \beta')$. Therefore, by such groupings, we transform the sum related to $E$ into a similar sum related to $E'$; then into the sum related to $E''$, etc. And finally we conclude that, if $(a, b)$ is the considered interval, we have, for $f$ of bounded variation,
\[ f(b) - f(a) = \sum [f(b_i - 0) - f(a_i + 0)] + \sum [f(x_i + 0) - f(x_i - 0)]. \]
Let us consider a polygon . P inscribed in this curve, with its vertices, in the order they appear
on . P, corresponding to increasing values of .t,7 .a, α1 , α2 , · · · , α p , b. We can consider . P as
a curve defined on .(a, b) with the help of functions .ξ(t), η(t), ζ(t) equal to .x = x(t), y =
y(t), z = z(t) for the values .a, t1 , t2 , · · · , t p , b of .t.
This being said, let two sequences of polygons inscribed in $C$, $P_i$ and $\Pi_j$, be chosen such that the maximum of the differences $t_k - t_{k-1}$ tends to zero as $\frac{1}{i}$ on the one hand, and as $\frac{1}{j}$ on the other. The length of a polygon is, by definition, the sum of the lengths of its edges; we shall compare the length $s_i$ of $P_i$ with the length $\sigma_j$ of $\Pi_j$.
Let us suppose that two consecutive vertices $m_1, m_2$ of $P_i$ correspond to $t = \theta_1$ and $t = \theta_2$. The points $\mu_1, \mu_2$ of $\Pi_j$ which correspond to these values of $t$ tend to $m_1, m_2$ when $j$ increases to infinity. The smallest limit, for $j$ infinite, of the length of the arc $\mu_1 \mu_2$ is therefore at least equal to the length of the edge $m_1 m_2$. But, this being true for each edge, the smallest limit of $\sigma_j$ is at least equal to $s_i$. And since we could permute $P_i$ and $\Pi_j$, the lengths $s_i$ and $\sigma_j$ tend to the same limit when $i$ and $j$ increase to infinity, and they are always smaller than their limit.
7 When we speak of a polygon inscribed in a curve, we always assume this last condition satisfied.
When the maximum length of the edge of a polygon inscribed in a curve tends towards
zero, the length of this polygon tends towards the upper limit of the lengths of polygons
inscribed in the curve. It is this limit that we call the length of the curve.
A curve is said to be rectifiable if it is of finite length. The study of the rectifiable
curves was initiated by Ludwig Scheeffer,8 then continued by Jordan9 to whom we owe the
following result:
For a curve to be rectifiable, it is necessary and sufficient that the functions $x(t), y(t), z(t)$ which define it be of bounded variation.
Indeed, any edge of a polygon inscribed in the curve has a length at least equal to each of the projections $\delta_x, \delta_y, \delta_z$ of this edge onto the axes, and at most equal to $\delta_x + \delta_y + \delta_z$. But the sum of the projections $\delta_x$ is the variation $\nu_x$ of the function $x(t)$ for the values of $t$ corresponding to the vertices.10 The length of the polygon is, therefore, greater than $\nu_x$; it is, similarly, greater than $\nu_y$ or $\nu_z$, but it is less than $\nu_x + \nu_y + \nu_z$; the property is proved.
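These inequalities can be observed on an example. The sketch below rests on assumptions of its own: the curve is a helix chosen only for illustration, the vertices are equally spaced in $t$, and `polygon_data` is a hypothetical helper name.

```python
import math

def curve(t):
    """Illustrative curve (a helix): x = cos t, y = sin t, z = t."""
    return (math.cos(t), math.sin(t), t)

def polygon_data(n, a=0.0, b=2 * math.pi):
    """Length of the inscribed polygon with n edges and the numbers nu_x, nu_y, nu_z."""
    ts = [a + (b - a) * k / n for k in range(n + 1)]
    pts = [curve(t) for t in ts]
    edges = list(zip(pts, pts[1:]))
    length = sum(math.dist(p, q) for p, q in edges)
    nus = [sum(abs(q[i] - p[i]) for p, q in edges) for i in range(3)]
    return length, nus

for n in (4, 16, 64, 256):
    length, (nu_x, nu_y, nu_z) = polygon_data(n)
    # each of nu_x, nu_y, nu_z <= polygon length <= nu_x + nu_y + nu_z
    print(n, round(max(nu_x, nu_y, nu_z), 4), round(length, 4),
          round(nu_x + nu_y + nu_z, 4))
```

Doubling `n` keeps the old vertices, so the polygon length can only grow; for this helix it approaches $2\pi\sqrt{2}$ from below.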
Moreover the arc length from .t0 to .t (t > t0 ) of a rectifiable curve is a continuous non-
decreasing function of .t, since the increment of this arc, in an arbitrary interval, is contained
between the increments of .νx and .νx + ν y + νz .
To calculate the length of a curve, we could use polygons having infinitely many vertices corresponding to values of $t$ forming a reducible set, because the initial reasoning applies to these polygons.
A rectifiable plane curve is quadrable, because if we divide it into $n$ pieces of equal length $\frac{s}{n}$, each of them can be enclosed in a circle of radius $\frac{s}{2n}$, and the sum $\frac{\pi s^2}{4n}$ of the areas of these circles tends to zero as $\frac{1}{n}$.
Let us suppose that $x(t), y(t), z(t)$ have integrable derivatives; then $|x'(t)|, |y'(t)|, |z'(t)|$ are also integrable, because we can write
\[ x'^2 = u, \qquad |x'| = +\sqrt{u}, \]
and if we square or if we take the arithmetic square root of an integrable function, we still have an integrable function.
If $l, m, n, L, M, N$ are the lower and upper limits of $|x'|, |y'|, |z'|$ in an interval $(t_1, t_2)$, the sums such as $\sum (t_2 - t_1)(L - l)$, extended to any division of $(a, b)$ into partial intervals, tend to zero when the intervals used tend to zero.
The chord $(t_1, t_2)$ has a length $\delta$ which satisfies the inequality
\[ a = (t_2 - t_1)\sqrt{l^2 + m^2 + n^2} \le \delta \le (t_2 - t_1)\sqrt{L^2 + M^2 + N^2} = A. \]
Therefore, an inscribed polygon has a length contained between the corresponding sums $\sum a$, $\sum A$. If the length of the edges of the polygon tends to zero, $\sum a$ and $\sum A$ tend to the same limit, because we have
\[ \sum A - \sum a \le \sum (t_2 - t_1)(L - l) + \sum (t_2 - t_1)(M - m) + \sum (t_2 - t_1)(N - n). \]
The limit of $\sum a$ and $\sum A$ is the length of the curve. But, since the integral $\int_a^b \sqrt{x'^2 + y'^2 + z'^2}\,dt$, which exists according to our hypothesis, is also always contained between $\sum a$ and $\sum A$, we can conclude that, if $x'$, $y'$, $z'$ exist and are integrable, the arc length over $(a, b)$ is
\[ \int_a^b \sqrt{x'^2 + y'^2 + z'^2}\,dt. \]
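The sandwich between $\sum a$ and $\sum A$ can be followed numerically. This is a sketch under assumptions of its own: the curve is the parabola $x = t$, $y = t^2$, $z = 0$ on $(0, 1)$, chosen because $|x'| = 1$ and $|y'| = 2t$ are monotone, so that $l = L = 1$, $m = 2t_1$, $M = 2t_2$ and $n = N = 0$ on each interval; the name `sums` is hypothetical.

```python
import math

def sums(n):
    """Sum of a, polygon length and sum of A for n equal intervals of (0, 1)."""
    lower = chord = upper = 0.0
    for k in range(n):
        t1, t2 = k / n, (k + 1) / n
        lower += (t2 - t1) * math.sqrt(1 + (2 * t1) ** 2)   # a, built from l, m, n
        upper += (t2 - t1) * math.sqrt(1 + (2 * t2) ** 2)   # A, built from L, M, N
        chord += math.hypot(t2 - t1, t2 ** 2 - t1 ** 2)     # length of the edge
    return lower, chord, upper

# Closed form of the integral of sqrt(1 + 4 t^2) over (0, 1), used only as a reference.
exact = math.sqrt(5) / 2 + math.asinh(2) / 4

for n in (10, 100, 1000):
    lower, chord, upper = sums(n)
    print(n, round(lower, 6), round(chord, 6), round(upper, 6), round(exact, 6))
```

The gap between the two sums stays below $\sum (t_2 - t_1)(M - m) = 2/n$, so the polygon length and the integral are squeezed to the same value.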
The previous reasoning also shows that if $f'(x)$ exists without being integrable, and we will see that this is possible, the length of the curve $y = f(x)$ is included between the integrals by lower and upper sums of $\sqrt{1 + f'^2}$.
We will obtain the generalisation of this proposition to the curves $x(t)$, $y(t)$, $z(t)$, as well as a result related to the case where $\sqrt{x'^2 + y'^2 + z'^2}$ is a derivative, using the following considerations.
We suppose that $x', y', z'$ exist; then, from any point $x_0, y_0, z_0, t_0$ taken as the origin, we can draw a chord whose length $\sqrt{\delta x_0^2 + \delta y_0^2 + \delta z_0^2}$ differs by at most $\varepsilon\,\delta t_0$ from the quantity $\delta t_0 \sqrt{x_0'^2 + y_0'^2 + z_0'^2}$; and we could even require $\delta t_0$ to be smaller than a certain predetermined quantity $\lambda$.
The curve being defined on $(a, b)$, from the point $a = t_1$ as origin, we could draw a chord satisfying the indicated conditions; it corresponds to $(t_1, t_2)$. From $t_2$ we could draw a new chord which corresponds to $(t_2, t_3)$ and so on. If, after a finite number of operations, we arrive at $b$, the construction is thus achieved. Otherwise, $t_n$ has a limit point $t_\omega$ from which, taking it as origin, we can draw a chord $(t_\omega, t_{\omega+1})$, then from $t_{\omega+1}$ we draw $(t_{\omega+1}, t_{\omega+2})$ and so on. If we do not attain $b$, we approach a limit point $t_{2\omega}$, starting from
\[ I = \sum (t_{\alpha+1} - t_\alpha)\sqrt{x'^2(t_\alpha) + y'^2(t_\alpha) + z'^2(t_\alpha)}, \]
by at most
\[ \varepsilon \sum (t_{\alpha+1} - t_\alpha) = \varepsilon(b - a). \]
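When the chain reaches $b$ after finitely many steps, the construction is an ordinary loop. The sketch below makes several assumptions that are not in the text: the curve is the plane parabola $x = t$, $y = t^2$, each step starts from $\lambda$ and is halved until the chord condition holds (only one of many admissible rules), and the names `chain`, `point` and `speed` are hypothetical. The transfinite case, with limit points $t_\omega$, is not covered.

```python
import math

def point(t):
    return (t, t * t)

def speed(t):
    """sqrt(x'^2 + y'^2) for the parabola."""
    return math.sqrt(1 + 4 * t * t)

def chain(a, b, eps, lam):
    """Points t_1 = a < t_2 < ... = b such that every chord (t, t + dt) has dt <= lam
    and a length differing from dt * speed(t) by at most eps * dt."""
    ts = [a]
    while ts[-1] < b:
        t0 = ts[-1]
        dt = min(lam, b - t0)
        while abs(math.dist(point(t0), point(t0 + dt)) - dt * speed(t0)) > eps * dt:
            dt /= 2                       # shorten the chord until the condition holds
        nxt = t0 + dt
        ts.append(b if nxt > b else nxt)  # guard against overshooting b by rounding
    return ts

ts = chain(0.0, 1.0, eps=0.01, lam=0.1)
I = sum((t2 - t1) * speed(t1) for t1, t2 in zip(ts, ts[1:]))
chords = sum(math.dist(point(t1), point(t2)) for t1, t2 in zip(ts, ts[1:]))
exact = math.sqrt(5) / 2 + math.asinh(2) / 4   # reference value of the arc length
print(len(ts) - 1, round(chords, 6), round(I, 6), round(exact, 6))
```

The sum of the chords and $I$ differ by at most $\varepsilon(b - a)$, and both approach the arc length when `eps` and `lam` are reduced together.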
If $\varepsilon$ and $\lambda$ simultaneously tend towards zero, $\varepsilon(b - a)$ tends to zero and the sum of the lengths of the chords tends to the length $s$ of the curve; $I$ therefore tends to $s$. But, according to the form of $I$, we can write, if $\sqrt{x'^2 + y'^2 + z'^2}$ is bounded,
\[ \underline{\int_a^b} \sqrt{x'^2 + y'^2 + z'^2}\,dt \le s \le \overline{\int_a^b} \sqrt{x'^2 + y'^2 + z'^2}\,dt. \]
Let us now suppose that $\sqrt{x'^2 + y'^2 + z'^2}$, bounded or not, is the derivative of a function
.σ(t). If we have chosen each interval .(tα , tα+1 ) in a manner which satisfies, not only the
. I tends to the increment .σ(b) − σ(a) of .σ(t) in .(a, b) when .ε and .λ simultaneously tend
towards zero. We therefore have
.s = σ(b) − σ(a).
constructed, by this method, an interval .(t1 , t2 ) with origin .t1 = a, then an interval .(t2 , t3 )
with origin .t2 and so on, and if necessary, an interval .(tω , tω+1 ) whose origin is the limit of
.t1 , t2 , · · · , and so on. It has been proved that in this way we necessarily arrive at .b after a
finite or countably infinite number of operations, so that the constructed chain will indeed
cover the entire .(a, b).11
11 When the given method associates multiple intervals with the same origin $t_\alpha$, one must choose among these intervals the one that will be designated as $(t_\alpha, t_{\alpha+1})$. This choice can be made arbitrarily
if the need to choose arises only a finite number of times. If it arises infinitely many times, to avoid
the difficulties that arise from the expression "choose infinitely many times," it is better to eliminate
the choice by indicating the rule according to which .(tα , tα+1 ) will be determined among all possible
intervals. In the previous proof each interval .(tα , tα+1 ) can be required to be the largest one which
satisfies the imposed conditions; moreover, within the set of these intervals, there is indeed one
interval larger than all the others.
In the final Note we find a detailed study of the use of these chains of intervals.
6 The Search for Primitive Functions
Let $f(x)$ be a bounded integrable function defined on $(a, b)$; the function
\[ F(x) = \int_a^x f(x)\,dx + K \]
may be a derivative different from $f(\alpha)$, which is the case for $\alpha = 0$ when $f(x)$ is zero everywhere except for $x = 0$; there may not be any derivative, which is the case for $\alpha = 0$ when $f(x) = \cos L|x|$ for $x \neq 0$ and $f(0) = 0$.2
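The absence of a derivative at $0$ in the second example can be seen on the difference quotient. The sketch assumes that $L$ denotes the natural logarithm and uses the indefinite integral $\frac{x}{2}(\sin L|x| + \cos L|x|)$ with $K = 0$; the two sequences of values of $h$ are chosen here for illustration and the name `quotient` is hypothetical.

```python
import math

def F(x):
    """Indefinite integral of cos(log|x|) vanishing at 0 (L read as the natural log)."""
    if x == 0:
        return 0.0
    L = math.log(abs(x))
    return 0.5 * x * (math.sin(L) + math.cos(L))

def quotient(h):
    """(F(h) - F(0)) / h = (sin log h + cos log h) / 2 for h > 0."""
    return (F(h) - F(0.0)) / h

for k in range(1, 6):
    h_hi = math.exp(math.pi / 4 - 2 * math.pi * k)       # log h + pi/4 = pi/2 - 2 k pi
    h_lo = math.exp(-3 * math.pi / 4 - 2 * math.pi * k)  # log h + pi/4 = -pi/2 - 2 k pi
    print(k, quotient(h_hi), quotient(h_lo))
```

Along the first sequence the quotient stays at $\frac{\sqrt{2}}{2}$ and along the second at $-\frac{\sqrt{2}}{2}$, although both sequences of $h$ tend to zero, so the quotient has no limit.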
Thus, integration can lead to functions that do not have a derivative everywhere. This consequence was indicated by Riemann, who drew attention to the indefinite integral of the function
\[ f(x) = \sum \frac{(nx)}{n^2}. \]
1 I leave it to the reader to show that the total variation of $F(x)$ on $(a, b)$ is exactly equal to $\left| \int_a^b |f(x)|\,dx \right|$. This proposition had also been proved incidentally (Sect. 5.1) as a particular case of the result related to the length of a curve $x(t), y(t), z(t)$, when $x'(t), y'(t), z'(t)$ are integrable. Indeed, it is enough to consider the curve $x(t) \equiv f(t)$, $y(t) \equiv z(t) \equiv 0$.
2 Then the indefinite integral is $\frac{x}{2}(\sin L|x| + \cos L|x|)$.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025 103
R. Jain, Lebesgue’s Theory of Integration,
https://doi.org/10.1007/978-981-96-1169-0_6
This3 indefinite integral $F(x)$ admits $f(x)$ as its derivative when $x$ is not of the form $\frac{2p+1}{2n}$.
Let us suppose $\alpha = \frac{2p+1}{2n}$ and let $\beta$ increase towards $\alpha$. We just saw that $f(\beta)$ tends to $f(\alpha) + \frac{\pi^2}{16n^2}$. Therefore, according to the mean value theorem, the same holds true for $\frac{F(\beta) - F(\alpha)}{\beta - \alpha}$.
On the contrary, this ratio would tend to $f(\alpha) - \frac{\pi^2}{16n^2}$ if $\beta$ decreases towards $\alpha$; therefore $F(x)$ does not have a derivative for the values of the form $\frac{2p+1}{2n}$.
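The two one-sided limits of $f$ can be approximated for $\alpha = \frac{1}{2}$, that is $n = 1$, $p = 0$. The sketch relies on assumptions that should be kept in mind: $(y)$ is read as the difference between $y$ and the nearest integer, with value $0$ when $y$ is half an odd integer; the series is truncated at 2000 terms, so the jumps are recovered only up to an error of about $10^{-4}$; the names `paren` and `f_partial` are hypothetical.

```python
from fractions import Fraction
import math

def paren(y):
    """(y): y minus the nearest integer, taken as 0 when y is half an odd integer."""
    frac = y - math.floor(y)
    if frac == Fraction(1, 2):
        return Fraction(0)
    return frac if frac < Fraction(1, 2) else frac - 1

def f_partial(x, terms=2000):
    """Partial sum of the series sum (n x) / n^2; x is an exact Fraction."""
    return sum(float(paren(n * x)) / (n * n) for n in range(1, terms + 1))

alpha = Fraction(1, 2)
eps = Fraction(1, 10 ** 7)
left_jump = f_partial(alpha - eps) - f_partial(alpha)    # beta increasing towards alpha
right_jump = f_partial(alpha + eps) - f_partial(alpha)   # beta decreasing towards alpha
print(left_jump, right_jump, math.pi ** 2 / 16)
```

Exact fractions are used for the arguments so that the half-integers are detected without rounding; the two differences come out close to $+\frac{\pi^2}{16}$ and $-\frac{\pi^2}{16}$.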
This is the first known example of a function for which it is not clearly legitimate to say that it generally admits a derivative. Many functions were known, such as Cauchy's function $+\sqrt{x^2}$, for example, which did not have derivatives at certain points. However, these points were exceptional; they never formed an everywhere dense set. In Riemann's example, on the contrary, there are points without a derivative in every interval. The principle of condensation of singularities will give us as many examples as we want of functions similar to that of Riemann; if the numbers $a_p$ are all the rational numbers, $\int \sum \frac{\cos L|x - a_p|}{p^2}\,dx$ is one of these functions.
The integration produces functions which do not always have a derivative. By an entirely
different method, Weierstrass constructed a function that is nowhere differentiable4 . It is
obvious that the integration cannot yield such functions: The points at which an indefinite
integral does not have a derivative form a set of measure zero, since these points belong to
the set of points of discontinuity of the integrated function . f (x). Thus, the points without
derivatives are still, in some sense, exceptional.
When a function $f(x)$ is bounded, but not integrable, we can attach to it the two indefinite integrals by upper and lower sums
\[ \overline{F}(x) = \overline{\int_a^x} f(x)\,dx + K, \qquad \underline{F}(x) = \underline{\int_a^x} f(x)\,dx + K. \]
These two functions are continuous, of bounded variation, and admit $f$ as their derivative at all points where $f$ is continuous.5
The concept of an indefinite integral is related to an important generalisation of the
definite integral.
If a function $f(x)$ defined on $(a, b)$ is non-integrable over $(a, b)$, but integrable over any interval $(\alpha, \beta)$ interior to $(a, b)$, we can hope to define an integral of $f(x)$ over $(a, b)$ by assuming, in principle, the continuity of the indefinite integral and by applying Cauchy's methods.
We easily see that the assumed conditions are never realised if $f(x)$ is bounded. However, if $f(x)$ is unbounded, Cauchy's method can lead us to a definite number; it will be so, in particular, if, around $a$ and $b$, $|f(x)|$ is less than a function that becomes infinite with a definite order less than $1$.6
Regarding Riemann integration, all reasoning applied about Cauchy integration and the
Cauchy–Dirichlet procedures can be repeated; I will not dwell on this point.7
Integration applies to functions that are not derivatives. A function that is zero everywhere, except at $x = 0$, is not a derivative, since its primitive function, if it existed, would be continuous and constant for positive and for negative $x$, therefore constant everywhere, and yet its derivative would not be zero for $x = 0$. This shows that the notions of indefinite integral and primitive function are different.
It seems that it has long been admitted that the first of these notions includes the second and that, as a result, integration always allows us to solve the problem of the search for primitive functions. In any case, instead of dealing with this problem, we have studied what services integration could provide in solving problems that generalise, in various senses, the problem of primitive functions.
For the study of this problem it will be useful for us to know a few properties of derivative
numbers.
Let $f(x)$ be a continuous function,8 let us consider the ratio
\[ r[f(x), x_0, x_0 + h] = \frac{f(x_0 + h) - f(x_0)}{h}; \]
and let $h$ tend to $0$. If we restrict $h$ to negative values only, the smallest and the largest of the limits of the ratio are the two derivative numbers to the left at the point $x_0$. These two numbers, which have been defined and studied by P. du Bois-Reymond and Dini, are also called the anterior oscillatory extremes. The smallest limit is the left-hand inferior derivative number,
6 In a more general sense, we can apply all standard theorems given related to the existence of an
integral when the quantity placed under the integration sign becomes infinite at a point.
7 These questions are related to a generalisation of the integral given by Jordan in Volume II of the second edition of his Cours d'Analyse. If the generalisations in the text allow us to define the integral of $f(x)$ in any interval contiguous to a closed set $E$, Jordan calls the integral of $f(x)$ the sum of the integrals over the intervals contiguous to $E$. For the integral of a sum to be the sum of the integrals, it is necessary to add that the outer extent of $E$ must be zero. These questions are connected to the work of Harnack (Math. Ann., Bd. XXI, XXIV), Hölder (Math. Ann., Bd. XXIV), de la Vallée Poussin (J. de Liouville, série 4, vol. VIII), Stolz (Wiener Berichte, Bd. CVII), Moore (Trans. Amer. Math. Soc., vol. II).
8 We can also consider the case of discontinuous functions, but the definitions of the text would be
sufficient for us.
the greatest limit is the left-hand superior derivative number. By giving positive values to $h$, we define the two right-hand derivative numbers, or posterior oscillatory extremes.
These four numbers, which are not necessarily finite, are denoted
\[ \lambda_g, \quad \Lambda_g, \quad \lambda_d, \quad \Lambda_d; \]
if we want to emphasise the function $f$ and the value $x_0$ in question, we write $\lambda_g f(x_0)$, $\Lambda_g f(x_0)$.9
The geometric significance of these numbers is simple. Let $y = f(x)$ be a curve; let us consider an arc $AB$ of this curve corresponding to the interval $(x_0, x_0 + h)$, and let us assume $h$ to be positive. The straight lines joining $A$ to the points of $AB$ are all the lines of a certain angle $XAY$. Let $h$ tend to zero; the angle $XAY$ varies in such a manner that, for a given value of $h$, it contains all the angles corresponding to the values less than $h$.
This is enough for us to conclude the existence of limiting straight lines $\xi A$, $\eta A$ for $XA$ and $YA$. The angular coefficients of these two limiting lines are the right-hand derivative numbers.
We could draw the figure for the curve $y = x \sin\frac{1}{x}$; for $x = 0$ the two inferior derivative numbers are $-1$ and the two superior derivative numbers are $+1$. For this curve the angle $XAY$ is fixed. On the contrary, it varies for the function
\[ y = x \sin\frac{1}{x} + x^2 \sin\frac{1}{x}, \]
which admits the same derivative numbers as the previous one for $x = 0$. The derivative numbers could replace, in some studies, the ordinary derivatives. In the study of the variation of a function, for example: if all four derivative numbers are positive, the function is increasing; if the two posterior derivative numbers are positive and the two anterior numbers negative, the function admits a minimum for $x = x_0$; if the two right-hand derivative numbers are of opposite signs, the function is neither increasing nor decreasing to the right of $x = x_0$, but if one of the two is zero we can no longer say anything.
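The derivative numbers at $x = 0$ of the two curves can be estimated by sampling the ratio $r$. This is a sketch under assumptions of its own: the extremes over $0 < |h| \le \delta$ are approximated on a finite grid of values of $h$, both functions are given the value $0$ at $x = 0$, and the names `r`, `f1`, `f2` and `extremes` are hypothetical.

```python
import math

def r(f, x0, h):
    """The ratio r[f(x), x0, x0 + h]."""
    return (f(x0 + h) - f(x0)) / h

def f1(x):
    return x * math.sin(1 / x) if x != 0 else 0.0

def f2(x):
    return (x + x * x) * math.sin(1 / x) if x != 0 else 0.0

def extremes(f, x0, delta, sign=+1, samples=20000):
    """Smallest and largest sampled values of r for h = sign * delta * k / samples."""
    vals = [r(f, x0, sign * delta * k / samples) for k in range(1, samples + 1)]
    return min(vals), max(vals)

for delta in (1e-2, 1e-3):
    print(delta, extremes(f1, 0.0, delta), extremes(f2, 0.0, delta))
```

For the first function the sampled extremes stay at $-1$ and $+1$ whatever $\delta$ is, which corresponds to a fixed angle $XAY$; for the second the largest value is slightly above $1$ and decreases towards $1$ with $\delta$, so the angle varies while the derivative numbers are the same.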
When $\Lambda_d = \lambda_d$, we say that the function admits a right-hand derivative equal to $\Lambda_d$; if $\Lambda_g = \lambda_g$, the value of $\Lambda_g$ is the left-hand derivative.
\[ l \le r[F(x), \alpha, \beta] \le L, \]
if $F$ is one of the three indefinite integrals and if $l$ and $L$ are the lower and upper limits of $f$ in $(\alpha, \beta)$; we can even suppose that $\alpha$ is excluded from the interval $(\alpha, \beta)$.
If .β tends towards .α by values less than .α, we see that the left-hand superior derivative
number at .x = α of one of the indefinite integrals of a bounded function . f (x) is at most
equal to the limit superior of . f (x) to the left of .α and the left-hand inferior derivative
number of . f (x)11 is at most equal to the limit inferior of . f (x), to the left of .α. Here, we
are talking about limit inferior and limit superior at .α, excluding the point .α (see Sect. 3.1).
Let us suppose that . f (α − 0) exists, then the two limits of . f (x) to the left of .α are
. f (α − 0), therefore; when . f (α − 0) exists, one of any indefinite integrals of the bounded
There exists a proposition for the derivative numbers similar to the theorem of finite
increment12 :
If . L and .l are the upper and lower limits of any one of the four derivative numbers of
the function . f (x) in .(a, b), we have
.l ≤ r [ f (x), a, b] ≤ L.
I suppose that $l$ and $L$ are related to $\Lambda_d$; the other cases reduce to this one, because we obviously have:
We, therefore, assume that $l$ and $L$ are limits of $\Lambda_d$; to demonstrate the second inequality, it suffices to prove that there exist values of $\Lambda_d$ at least equal to
11 Translator’s Note: The original text states ‘left-hand inferior derivative number of. f (x)’. However,
given the context and standard mathematical conventions, M. Lebesgue may have intended to refer
to the ‘left-hand inferior derivative number of one of the indefinite integrals of . f (x)’ ensuring con-
sistency with the earlier reference to the ‘left-hand superior derivative number of one of the indefinite
integrals of . f (x)’.
12 We know that this theorem is stated as follows:
If a function . f (x) is continuous in the interval .(a, b), and admits a well-determined derivative for
each value of .x interior to .(a, b), there exists a number .ξ in this interval such that
This statement does not assume $f'(x)$ to be bounded or even finite, but if $f'(x)$ is infinite, it must be $+\infty$, and not $\pm\infty$.
.r [ f (x), a, b].
For this, I adopt the geometric language because it appears to me more expressive; we can translate it easily, if we want, into analytic language. The property is obvious if the curve $C$ which represents $f(x)$ reduces to the chord $AB$ joining its extremities (Fig. 6.1).
If this is not the case and if there exist points of the curve $C$ above $AB$ (i.e. on the side of $y = +\infty$), I displace the line $AB$ parallel to itself to $A'B'$ in such a manner that it cuts $C$. Above $A'B'$ there are arcs of $C$; let $PQ$ be one of them. At the point $P$ of $A'B'$, $\Lambda_d$ and $\lambda_d$ are obviously greater than or at least equal to the angular coefficient of $PQ$, i.e., $r[f(x), a, b]$, and the property is proved in this case.
Finally, if $C$ does not have a point above $AB$ (Fig. 6.2), I displace $AB$ parallel to itself towards $y = -\infty$, and let $A'B'$ be the last position in which it has common points with $C$. If $P$ is any one of these points, at this point $\Lambda_d$ and $\lambda_d$ are at least equal to $r[f(x), a, b]$. Hence, the property is proved in all the cases. In both the cases shown in the figures, we have reasoned on an arc $Pp$, of origin $P$, located above the parallel to $AB$ drawn through $P$. Furthermore, for the sake of what follows, we took $P$ to be different from $A$.
From the previous theorem, it follows that the four derivative numbers have the same
upper limit and the same lower limit in any interval.
Indeed, let us compare the upper limits $L$ and $L'$ of $\Lambda_d$ and $\lambda_g$. Since $\Lambda_d$ has $L$ for its upper limit and $\Lambda_d$ is a limit of the ratio $r[f(x), \alpha, \beta]$, where $\alpha$ and $\beta$ belong to the considered interval $(a, b)$, we can find $\alpha$ and $\beta$ in $(a, b)$ such that $r[f(x), \alpha, \beta]$ is greater than $L - \varepsilon$. The maximum of $\lambda_g$ in $(\alpha, \beta)$, and therefore in $(a, b)$, is, as a result, at least equal to $L - \varepsilon$. This is enough to prove that $L$ and $L'$ are equal.
The common value of $L$ and $L'$ is at the same time the upper limit of the ratio $r[f(x), \alpha, \beta]$.
The property stated for the upper and lower limits in an interval implies the same property
for the limit superior and the limit inferior at a point. In particular, if for one of the derivative
numbers these two limits are equal, the same holds true for the others. This can be stated as:
If at a point .x0 , one of the derivative numbers is continuous, the same holds true for the other
three derivative numbers and furthermore, the function admits a derivative for .x = x0 .
Here is another obvious consequence: if the four derivative numbers are bounded, they admit the same upper integral and the same lower integral; if one of them is integrable, then so are the others and they have the same integral.
In the case of derivatives, Rolle's theorem13 is a particular case of the theorem of finite increment; in the case of derivative numbers the theorem similar to Rolle's theorem can be stated as: If the continuous function $f(x)$ vanishes at $a$ and $b$, the limits of the derivative numbers in $(a, b)$ are either all zero, or all different from zero and of opposite signs.
This statement is justified by noting that if . f (x) is not constant, .r [ f (x), α, β] takes
positive and negative values.
We can therefore say: if the continuous function . f (x), not constant in .(a, b), vanishes
at .a and .b, there exist points, interior to the interval .(a, b), for which the two right-hand
or left-hand derivative numbers are positive and non-zero and the other points where they
are negative and non-zero. If in Figs. 6.1 and 6.2, we suppose .bB > a A, the two derivative
numbers at point . P, interior to .(a, b), are in fact different from zero and positive.
The converse can be stated in the following form: if we know that the two right-hand or
left-hand derivative numbers of . f (x) are never both different from zero and are of the same
sign, then . f (x) is a constant.14
Among the continuous functions, it is worth noting the functions of bounded
derivative numbers, which possess many properties of the differentiable functions. This class of
functions includes the indefinite integrals of bounded functions. The functions of bounded
derivative numbers are those for which we always have
defined on .(a, b), and let .s(t) be its arc from .a to .t.
The equation .s(t) = s can be solved in .t when .s is in the interval .[0, s(b)]; it admits a
unique solution, except in the case where .x(t), y(t), z(t) would be constant simultaneously
in an interval. Except in this case, .t(s) is a well-determined increasing function,
represents the given curve and these functions of .s are the continuous functions of derivative
numbers at most equal to .1.
The study of rectifiable curves and, as a result, of the functions of bounded variation
is therefore linked, initially, to the study of functions of bounded derivative numbers. We
will have occasion to use this remark.
Furthermore, there exist continuous functions of bounded variation and unbounded
derivative numbers, the function .x 2 sin x1 is such an example.
The functions we have just seen are defined on the whole interval, but it is clear that the
concept of derivative and the derivative numbers extend immediately to a function defined
only for points of a set, or considered only for points of a set. The function .χ(x) (Sect. 3.1)
is discontinuous at every point; however it admits a derivative equal to zero, in the set of
rational numbers at rational points .x and a derivative equal to zero, in the set of irrational
numbers at irrational points .x.
Let us return to the search for primitive functions. The problems:
$A'$. To find a function whose right-hand superior derivative number (or one of the other
derivative numbers) is given.
$B'$. To determine whether a given function is the right-hand superior derivative number of an
unknown function.
$C'$. To find a function when its right-hand superior derivative number is known.
We shall first clarify the indeterminacy of the solution of problem $C'$ by showing that a
function is determined, up to an additive constant, when we know the finite value of one of
its derivative numbers for each value of the variable.
Let, in fact, two functions $f_1(x)$ and $f_2(x)$ have at each point the same right-hand superior
derivative number. We have, by hypothesis,
$$\Lambda_d f_1(x) = \Lambda_d f_2(x)$$
and also
$$\lambda_d[-f_2(x)] = -\Lambda_d f_2(x),$$
as we see by referring to the geometric or analytic definition of the derivative numbers. This
definition also provides the inequality
has positive or zero derivative numbers at the points of . Z . Therefore, if we find a continuous
function . f 1 (x) having a definite derivative at every point, this derivative being .+∞ at the
points of . Z , the continuous function . f 2 (x) = f 1 (x) + ξ(x) has, at every point, the same
derivative as that of . f 1 (x) without differing from it by a constant.
112 6 The Search for Primitive Functions
the summation being extended to the intervals $\delta_n$ included between $0$ and $x$. If $(\alpha, \beta)$ is an
interval contiguous to $Z$, we take
$$f_1(x) = f_1(\alpha) + (x - \alpha)^k, \quad \text{for } \alpha \le x \le \frac{\alpha+\beta}{2},$$
and
$$f_1(x) = f_1(\beta) - (\beta - x)^k, \quad \text{for } \frac{\alpha+\beta}{2} \le x \le \beta.$$
It is clear that $f_1(x)$ is continuous at the point $\frac{\alpha+\beta}{2}$, because
$$f_1(\beta) - f_1(\alpha) = 2^{1-k}(\beta - \alpha)^k = 2\left(\frac{\beta-\alpha}{2}\right)^k;$$
therefore, for any $x$ greater than a value $x_0$ belonging to $Z$, we have
$$f_1(x) - f_1(x_0) \ge (x - x_0)^k.$$
As a result, $f_1(x)$ has, at the point $x_0$, a right-hand derivative at least equal to that of $(x - x_0)^k$
at this point, therefore equal to $+\infty$.
Similarly, we will prove that . f 1 (x) has a left-hand derivative equal to .+∞ at the points
of . Z . Therefore . f 1 (x) does indeed fulfil the imposed conditions; the functions . f 1 (x) and
. f 2 (x) show us that a function is not determined up to an additive constant by the knowledge
a derivative number, is bounded if one could not establish cases where the solution functions
to the problems $C$ and $C'$ remain determined, up to an additive constant, although the given
function may not be finite everywhere. This is what we are going to do:
If the finite number $\Lambda_d f(x)$ is given for all the values of the variable, except for the
points of a set $E$, the continuous function $f(x)$ is determined, up to an additive constant, in
every interval which does not contain points of $E$ in its interior; therefore, the same holds for
every interval if $E$ is reducible, as we see by revisiting the reasoning used in Chapter 1
in the context of the research of Cauchy and Dirichlet.
We would have a similar result whenever we know a set solving one of the following
problems:
$D$. On what set of points is it sufficient to know the finite derivative of a function for
this function to be determined up to an additive constant?
$D'$. On what set of points is it sufficient to know the finite value of the right-hand superior
derivative number of a function for this function to be determined up to an additive
constant?
We have just cited a family of sets answering the question: the sets complementary
to the reducible sets; that is, those which we obtain by removing the points of
a reducible set from the considered interval. We owe to Ludwig Scheeffer a more general
solution:
A function is determined, up to an additive constant, when we know for each value of $x$,
except perhaps for those of a countable set $D$, the finite value of the right-hand superior
derivative number of this function.
15 The previous example is due to M. Hans Hahn (Mon. Math. Phys., 1905). It was constructed in
response to a question that I had asked in the first edition of this book.
Since then other examples have been given, in particular by M. Denjoy. See also M. Ruziewicz
(Fund. Mat., t. 1).
Let . f 1 (x) and . f 2 (x) be the two functions having, in general, the same finite right-hand
superior derivative number we will show that we always have
d φc (xc ) ≤ 0.
.
. d φc (x) ≥ c > 0.
To each value of $c$ in the interval $\left(0, \frac{H}{2(b-a)}\right)$ there thus corresponds a point $x_c$ of $D$. But, if $c$
and $c_1$ are different, then $x_c$ and $x_{c_1}$ are as well; because the equality
implies
. c(xc − a) = c1 (xc1 − a)
6.3 Functions Determined by One of Their Derivative Numbers 115
. x α2 and not overlapping with .δ1 . If . x α3 is the first of . x i which belongs neither to .δ1 nor to
.δ2 , . x α3 is the mid-point of an interval of incommensurable length that overlaps neither with
The function . f (x), equal to the sum of the lengths of the intervals .δ and the parts of the
interval .δ, contained between .0 and .x, is a continuous increasing function of .x, which admits
.+1 as derivative for all the rational values of . x. However, this function is not necessarily of
the form .x+const., since . f (+∞) − f (0) is the sum of the lengths of .δ, a sum that can have
any desired positive value.
The continuous function $f(x) - x$ is not constant, and in any interval there exist points
where its derivative is zero.
It was in the context of studying functions whose derivatives vanish in every interval,
that Ludwig Scheeffer embarked on his research on the determination of a function by its
derivative numbers.
As functions whose derivative vanishes in every interval, we can still cite the function
.φ(x) (Sect. 3.3) and the function .ξ(x) (Sect. 5.1).
The previous proof, inspired by certain methods of proof of the theorem of finite
increment or of Taylor's formula, is quite artificial; here is another.
The two functions $f_1$ and $f_2$ having the same $\Lambda_d$ at any point, except, possibly, at the
points of $D$, the function $f(x) = f_1 - f_2$ has, at any point not belonging to $D$, a $\Lambda_d$ positive
or zero and a $\lambda_d$ negative or zero. If $\alpha$ is such a point, let us associate with it the largest
interval $(\alpha, \alpha + h)$ such that we have
Let us suppose that the points of $D$ are arranged in a simply infinite sequence $x_1, x_2, \ldots$.
Let us associate with $x_n$ the largest interval $(x_n, x_n')$ such that we have
$$f(x_n') - f(x_n) < \frac{\varepsilon}{2^n}.$$
16 The previous proof is, almost exactly, that of L. Scheeffer. I have also followed his statement, but
it is worth noting that the proof only assumes that . D does not have the cardinality of the continuum,
which does not perhaps mean that . D is countable.
Each point of $(a, b)$ is now the origin of an interval $\delta$ attached to this point; we can
cover $(a, b)$, starting from $a$, with the help of a chain of intervals $\delta$ (Sect. 5.2). Using
these intervals to calculate $f(b) - f(a)$, we find that this quantity is at most equal to
$$\varepsilon \sum h + \varepsilon \sum \frac{1}{2^n} \le \varepsilon(b - a + 1);$$
but $\varepsilon$ is arbitrary, therefore $f(b) = f(a)$; and, since this reasoning could be used for any
interval contained in $(a, b)$, the function $f(x)$ is constant.
This method of proof leads to another result. Let us suppose that the exceptional set is,
no longer a countable set . D, but a set . E of measure zero. This means that the points of . E
could be covered with the help of an infinite countable number of intervals .d whose sum of
lengths is as small as we want.
To a point $\alpha$ not belonging to $E$, let us associate an interval $(\alpha, \alpha + h)$, or $\delta$, as was
done above. To a point $\alpha$ of $E$ we now associate, as the interval $\delta$, the interval $\delta_1$
whose origin is $\alpha$ and whose extremity is the extremity of the interval $d$ containing $\alpha$.
We cover $(a, b)$, starting from $a$, with the help of a chain of intervals $\delta$ and $\delta_1$; this chain gives,
as upper limit of the increment $f(b) - f(a)$ of $f(x)$ in $(a, b)$, the number $\varepsilon \sum h$ increased
by the sum of the increments of $f(x)$ in the intervals $\delta_1$. The sum $\lambda$ of the lengths of the $\delta_1$ is
smaller than the sum relative to the $d$, therefore it is as small as we want. In general, this does
not allow us to conclude that the corresponding sum of increments of $f(x)$ is as small as
we want. However, if $f_1$ and $f_2$ have derivative numbers less than $M$ in absolute value, this
sum is smaller than $2M\lambda$. Thus:
A function, of bounded derivative numbers, is determined, up to an additive constant,
when we know its right-hand superior derivative number, for any value of .x, except at those
of a set of measure zero.
This statement does not provide us with any information relating to the indetermination
of the problem .C when the given derivative numbers are not bounded.
However, let us notice that it is only at the points of $E$ that we needed to know
that the upper limit of the derivative numbers of $f(x)$ is of the form $2M$, and that we
do not need to know $M$ itself; this leads us to the following statement, which
contains all the previous ones:
A continuous function is determined, up to an additive constant, when we know that it
has a finite right-hand superior derivative number at any point, except, possibly, at points
of a countable set . D, and when we know this finite derivative number, except at most at the
points of a set . E of measure zero.
Let $f = f_1 - f_2$ again be the difference of two functions fitting the given data, and let
us cover $(a, b)$, starting from $a$, with a chain of intervals. Those intervals which have for
origin the points of $D$, or points belonging neither to $D$ nor to $E$, are chosen as described and give
in $f(b) - f(a)$ a contribution equal to $\varepsilon(b - a + 1)$ at most. Let us partition $E$ into sets
$E_i$, where each set $E_i$ consists of the points where the right-hand superior derivative number of $f$ satisfies
$$i - 1 \le |\Lambda_d f| < i;$$
and let us enclose $E_i$, which is of measure zero, in intervals $d_i$ of total length $\frac{\varepsilon}{2^i i}$. An interval
$(x, x + \delta)$ of the chain having for origin a point of $E_i$ is taken interior to $d_i$ and such
that
$$\frac{f(x + \delta) - f(x)}{\delta} < i.$$
Then the intervals of the chain having for origin the points of $E_i$ give a contribution to
$f(b) - f(a)$ at most equal to their length multiplied by $i$, therefore less than $\frac{\varepsilon}{2^i i} \times i = \frac{\varepsilon}{2^i}$.
We keep only one restriction: the right-hand superior derivative number is finite.
This restriction is moreover necessary: The function .ξ(x) (Sect. 5.1) is not a constant,
although it has its derivative zero everywhere, except, possibly, at the points of . Z , which is
of measure zero.
The previous theorem can be advantageously transformed; for these transformations I
shall use a useful generalisation of the notions of lower limit and upper limit which is
due to M. Baire.17
Let $f(x)$ be a function; the upper limit of $f(x)$ in an interval $(a, b)$ is a number $L$ such
that the set $E(f > m)$, formed from the points $x$ of $(a, b)$ at which $f(x)$ is greater than
$m$, contains points as soon as $m$ is smaller than $L$, while it does not contain any point for $m > L$; the
This example is enough to understand what we mean by the limit superior or limit
inferior, in an interval or at a point, of a function when we neglect the countable sets, or the
non-dense sets, or the sets of measure zero. If, by neglecting certain sets, we obtain equal
limit inferior and limit superior, we can say that, up to these sets, the function is continuous.
Given these definitions, here are the two propositions which I have in view: The limit
superior and inferior of a derivative number are the same, whether or not we neglect the
countable sets.
The limit superior and inferior of a finite derivative number are the same, whether or not
we neglect the sets of measure zero.
I prove, for example, the first of these two propositions. If the limits superior $L$ and $L_1$
of a derivative number $\varphi(x)$, obtained by taking into account and then by ignoring the
countable sets, are unequal, and if $K$ is a finite number contained between $L$ and $L_1$, the
derivative number $[\varphi(x) - Kx]$ is negative, except for the points of a countable set for
which it is positive.
However, it is sufficient to revisit, with a slight modification, one of the two reasonings that
led us to the theorem of Scheeffer to see that this is impossible.
We will try to solve the problems $B'$ and $C'$ in the case where the function given as $\Lambda_d$ is
bounded.
Let us divide the positive interval $(a, b)$ into partial intervals. In $(\alpha, \beta)$ the lower and upper
limits of $f(x)$ are $l$ and $L$; therefore, if $F$ is the sought function such that $\Lambda_d F(x) = f(x)$,
we have
$$(\beta - \alpha)\, l \le F(\beta) - F(\alpha) \le (\beta - \alpha)\, L.$$
If we add the similar inequalities relative to the partial intervals, we have, by letting the lengths of these
intervals tend towards zero,
$$\underline{\int_a^b} \Lambda_d F(x)\, dx \le F(b) - F(a) \le \overline{\int_a^b} \Lambda_d F(x)\, dx.$$
From this inequality it follows, in particular, that if one of the derivative numbers of
a function $f(x)$ is integrable, in which case the three others are as well and have the
same integral, its indefinite integral is of the form $f(x) + \text{const}$. More specifically: when a
derivative is integrable, there is an identity between its primitive functions and its indefinite
integrals.
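The bracketing of $F(b) - F(a)$ between lower and upper sums can be checked numerically. The following is only a minimal sketch under assumptions that are not in the text: the example $F(x) = \sin x$ on $(0, 2)$, the uniform subdivision, and the helper name `darboux_sums` are all chosen here for illustration.

```python
import math

def darboux_sums(f_inf, f_sup, a, b, n):
    """Lower and upper sums on a uniform subdivision of (a, b) into n parts.

    f_inf(alpha, beta) and f_sup(alpha, beta) must return the lower and upper
    limits l and L of the given function on (alpha, beta).
    """
    width = (b - a) / n
    lower = upper = 0.0
    for i in range(n):
        alpha = a + i * width
        beta = alpha + width
        lower += width * f_inf(alpha, beta)   # (beta - alpha) * l
        upper += width * f_sup(alpha, beta)   # (beta - alpha) * L
    return lower, upper

# Hypothetical example: F(x) = sin x. Its derivative cos x decreases on (0, 2),
# so on (alpha, beta) its lower limit is cos(beta) and its upper limit cos(alpha).
a, b = 0.0, 2.0
increment = math.sin(b) - math.sin(a)
for n in (4, 16, 64):
    lower, upper = darboux_sums(lambda al, be: math.cos(be),
                                lambda al, be: math.cos(al), a, b, n)
    print(n, lower, increment, upper)
```

For each subdivision the increment of $F$ lies between the two sums, and the gap between the sums shrinks as the partial intervals shrink, which is the integrable case described above.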
These statements would obviously apply to the cases where the given function becomes
infinite in the neighbourhood of the points of reducible sets, on the condition that the
generalisation of the integral mentioned in Sect. 6.1 is used.
If we take into account the theorems stated at the end of the previous section, we see
that if we know the derivative number everywhere except at the points of a countable
set, or if we know it everywhere except at the points of a set of measure zero and
moreover know that it is bounded everywhere, we can still apply the previous
theorems, on the condition of extending the integral which appears there to the set on which
we know the derivative number.
To this observation is linked another, more important one. The case in which
we can solve problem $C'$ is the one where the given derivative number is integrable. This
derivative number then has points of continuity; at these points, the integral of this derivative
number has a derivative equal to the given derivative number, and we know the derivative of
the unknown function everywhere, except at the points of a set of measure zero. It is then
sufficient to use the known values of the derivative to obtain the function. The solved case
of problem $C'$ is therefore actually reduced to problem $C$.
The previous reasoning allows us to answer the questions $B$ and $B'$ in an important
case: the one where the given function is integrable. To determine, for example, whether a given
6.5 Riemannian Integration Considered as an Inverse … 119
integrable function $f(x)$ is an exact derivative, we form its indefinite integral $F(x)$,
then we check whether we have
$$f(x) = \lim_{h \to 0} \frac{F(x + h) - F(x)}{h}.$$
We therefore have a regular method of calculation allowing us to determine whether $f$ is or is not
an exact derivative. It is true that it is necessary to investigate whether a certain expression does
or does not have the known limit $f(x)$. However, since a derivative is, by definition, a limit, it is
unlikely that we could replace the indicated method of calculation by another in which
limits are not used.
We have found a necessary and sufficient condition for an integrable function to be a
derivative. It does not take the usual form of such conditions. Most often, a necessary and
sufficient condition for the existence of a fact $A$ is stated as the existence of a property $B$
that always accompanies fact $A$ and is always accompanied by it. However, to avoid a mere
tautology, it is necessary that we know a regular method of calculation allowing us to determine
whether or not we have the property $B$. It is this method which has been directly provided for the
case at hand.
If we want to state the necessary and sufficient condition found in the usual form, we
could call, as Darboux did, the mean value in $(a, b)$ of an integrable function $f(x)$ the
quantity $\frac{1}{b-a}\int_a^b f(x)\,dx$; then we call the mean value at a point $x_0$, if it exists, the limit of the mean
value in $(x_0 - h, x_0 + k)$ when the positive numbers $h$ and $k$ tend to zero; and we have the
following statement:
For an integrable function to be a derivative function, it is necessary and sufficient that it
has, at any point, a definite mean value and, that it be equal to its mean value everywhere.
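To make the criterion concrete, here is a small sketch using exact rational arithmetic. The two example functions (a step function equal to $-1$ for $x < 0$ and $+1$ for $x > 0$, and the function $f(x) = x$) and the helper names are assumptions chosen for illustration; they do not come from the text.

```python
from fractions import Fraction

def mean_value_of_sign(h, k):
    """Mean value over (-h, k) of the step function: -1 for x < 0, +1 for x > 0.

    The integral over (-h, k) is k - h, so the mean value is (k - h) / (h + k).
    Pass h and k as positive Fractions to keep the arithmetic exact.
    """
    return (k - h) / (h + k)

def mean_value_of_identity(x0, h, k):
    """Mean value over (x0 - h, x0 + k) of f(x) = x, from the primitive x^2 / 2."""
    upper, lower = x0 + k, x0 - h
    return (upper**2 - lower**2) / (2 * (h + k))

for t in (Fraction(1, 10), Fraction(1, 1000), Fraction(1, 10**6)):
    # The step function: the result depends on the ratio k / h, not on t.
    print(mean_value_of_sign(t, t), mean_value_of_sign(t, 2 * t),
          mean_value_of_sign(2 * t, t))
    # The identity function: the result tends to x0 = 1/2 as t tends to zero.
    print(mean_value_of_identity(Fraction(1, 2), t, 2 * t))
```

However small $h$ and $k$ are taken, the mean values of the step function around $0$ range over different numbers according to the ratio $k/h$, so there is no definite mean value at $0$ and the step function is not a derivative, whatever value it is given at $0$. For $f(x) = x$ the mean value at $x_0$ exists and equals $f(x_0)$.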
We have generalised the problem of finding primitive functions in various
ways. Let us now investigate whether one of these generalisations allows us to consider integration
in the sense of Riemann as the inverse problem of differentiation.
If we recall that an indefinite integral admits as derivative the integrated function at all
the points where the integrated function is continuous, we are led to pose, along with M.
Volterra, the following problem: Find a continuous function which admits a given bounded
function $f(x)$ as its derivative at all points where $f(x)$ is continuous.18
18 In reality, M. Volterra is looking for the functions which admit . f (x) as its derivative at all points
which are neither points of discontinuity of . f (x), nor limit points of such discontinuities. Moreover,
M. Volterra assumes implicitly that the functions he is looking for have bounded derivative numbers.
For these two reasons the results he obtains differ from those in the text. Moreover, any function
is obviously a solution of M. Volterra’s problem, if the points of discontinuity of . f (x) form an
everywhere dense set. In contrast, only very specific functions satisfy the statement in the text.
This problem is always possible, because both the lower and upper integral of . f (x) can
be used to address the question. However, in general, it is indeterminate, meaning not all
its solutions can be expressed in a formula of the form . F(x)+const. When . f (x) is not
integrable, the problem is always indeterminate. If . f (x) is integrable, the problem may be
determinate; for example when the set of points of discontinuity is reducible. However, it
may also be indeterminate, as is the case when the set of points of discontinuity contains
a perfect set . E. We have learned (Sect. 2.2) how to construct a continuous function that is
not constant everywhere but is constant in every interval contiguous to . E. When added to
a function that solves the proposed problem, this function provides a new solution to the
problem.
Thus, our problem encompasses as a particular case the problem of indefinite Riemannian
integration, but it is more general than the previous problem.
Now, let us propose to find a function of bounded derivative numbers which admits a
bounded function $f(x)$ as derivative at all points where $f(x)$ is continuous.
This new problem is always possible and still admits the two integrals of $f(x)$ as solutions.
However, if $f(x)$ is integrable, it is determined, because the derivative of the sought function
of bounded derivative numbers is known everywhere, except at the points of a set of measure
zero. This problem is, therefore, determined only for integrable functions; and when it is
determined, its solution is the indefinite integral of $f(x)$.
We could thus, in a certain sense, consider Riemannian integration as the inverse operation
of differentiation.
7 Definite Integration Using Primitive Functions
We have obtained theorems that theoretically allow us, in extended cases, to determine
whether a given function is a derivative function and, if it is so, to find its primitive function.
In practice, only one of these theorems is commonly used: every continuous function is
a derivative function. As for the actual calculation of the primitive functions, it is never
done by means of a definite integral,1 but with the help of the methods known by the names
of integration by parts and integration by substitution. These two methods can be applied
whether the functions are continuous or not.
We can also use the following theorem: A uniformly convergent series of derivative
functions represents a derivative function.
Its primitive function is obtained by taking the sum of the primitive functions of the terms
of the given series, the constants being so chosen that the series obtained is convergent for
one of the values of the variable.
1 However, it is sometimes possible to practically perform the search for a primitive function with
the help of definite integrals. We find an example of such a search in the Introduction à l'étude des
fonctions d'une variable réelle of J. Tannery, p. 284.
Moreover, the mathematicians, especially those who used the method of indivisibles, have per-
formed a number of quadratures by applying what would later become the definition of the definite
integral. After the invention of differential calculus and the introduction of the notions of derivative
and primitive function, the quadratures performed earlier formed the first elements of the table of
derivatives and the primitive functions. Currently, this table is primarily presented in the form of
a table of derivatives; we see that the historical order is exactly the reverse of that adopted in our
courses.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025
R. Jain, Lebesgue's Theory of Integration,
https://doi.org/10.1007/978-981-96-1169-0_7
Let
$$f = u_1 + u_2 + \cdots = u_1 + \cdots + u_n + r_n = s_n + r_n,$$
$$F = U_1 + U_2 + \cdots = U_1 + \cdots + U_n + R_n = S_n + R_n,$$
be the given series and the series of the primitive functions, which is, by hypothesis,
convergent for a certain value $x_0$.
Let us choose $n$ sufficiently large so that we have, for any positive $p$,
This inequality shows that the series $F$ is uniformly convergent in $(a, b)$, since it is convergent
for $x_0$.
Let us evaluate the ratio
$$r[F(x), x, x + h] = \frac{F(x + h) - F(x)}{h} = \Delta F(x),$$
$$\Delta F = \Delta S_n + \Delta R_n = \Delta S_n + \lim_{p \to \infty} \Delta(S_{n+p} - S_n).$$
The quantity $\Delta(S_{n+p} - S_n)$ is less than $\varepsilon$ in absolute value, according to the theorem of
finite increment. Therefore, if we let $h$ tend to zero, any of the limits of $\Delta F$ differs from the
limit $s_n(x)$ of $\Delta S_n$ by at most $\varepsilon$. Since $\varepsilon$ is arbitrary, it is thus shown that $F(x)$ admits $f(x)$
for its derivative.
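As a numerical illustration of this theorem, one may take a specific uniformly convergent series. The series below, with terms $\cos(kx)/k^2$ and term-by-term primitives $\sin(kx)/k^3$ vanishing at $x_0 = 0$, is an example chosen here and not taken from the text; the truncation level and step $h$ are likewise arbitrary choices of this sketch.

```python
import math

N_TERMS = 2000  # truncation level, an arbitrary choice for this sketch

def s(x, n_terms=N_TERMS):
    """Partial sum s_n(x) of the series f = sum of cos(k x) / k^2."""
    return sum(math.cos(k * x) / k**2 for k in range(1, n_terms + 1))

def S(x, n_terms=N_TERMS):
    """Partial sum S_n(x) of the primitives U_k(x) = sin(k x) / k^3, with U_k(0) = 0."""
    return sum(math.sin(k * x) / k**3 for k in range(1, n_terms + 1))

def ratio(func, x, h):
    """The ratio r[func, x, x + h] = (func(x + h) - func(x)) / h."""
    return (func(x + h) - func(x)) / h

x, h = 1.0, 1e-4
print(ratio(S, x, h), s(x))   # the two numbers agree to several decimal places
```

The terms are bounded by $1/k^2$, so the series of derivative functions converges uniformly, and the ratio of the summed primitives is close to the sum of the series, as the theorem asserts.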
This theorem also allows us to use the principle of condensation of singularities in the
construction of derivative functions.
When a derivative function is given by a series of non-negative derivative functions, we
can take the primitive functions term-by-term on the condition of choosing the constants in
a manner that the obtained series is convergent.
To demonstrate this, I retain the earlier notation and, to simplify the language, I
assume that the series $F$ is convergent at the origin of the interval $(a, b)$ under consideration,
and that $U_1, U_2, \ldots$ vanish for $x = a$. Let $\mathcal{F}$ be the primitive function of $f$ which
vanishes at $x = a$. It needs to be shown that $\mathcal{F} = F$.
All the $U_i$ are positive, therefore $S_n$ increases with $n$. However, since $f$ is at least equal
to $s_n$, $\mathcal{F}(x)$ is at least equal to $S_n(x)$, and $S_n(x)$ converges to a limit $F(x)$, at most equal to
$\mathcal{F}(x)$.
The same reasoning, applied to the positive interval $(x, x + h)$, shows that $\mathcal{F}(x + h) - \mathcal{F}(x)$
is at least equal to $F(x + h) - F(x)$, and as a result $f(x)$, the derivative of $\mathcal{F}(x)$,
is at least equal to $\Lambda_d F(x)$.
7.1 Direct Search for Primitive Functions 123
On the other hand, $F(x + h) - F(x)$ is greater than $S_n(x + h) - S_n(x)$, therefore $\lambda_d F(x)$
is at least equal to the derivative $s_n(x)$ of $S_n(x)$, and, since $n$ is arbitrary, $\lambda_d F(x)$ is at least
equal to $f(x)$.
Therefore, $F(x)$ has a right derivative equal to $f(x)$; by a similar reasoning in the negative
interval $(x, x - h)$, we see that $F(x)$ also admits $f(x)$ for the left derivative. Hence, the
theorem is proved.
We can also say: if derivative functions . f n increase to a derivative function . f , their
primitive functions converge to the primitive function of . f (x), if the constants are suitably
chosen.
Indeed, we can write
. f = f1 + ( f2 − f1 ) + ( f3 − f2 ) + · · · ,
and all the terms, which are derivative functions, are positive, with possible exception of the
first term.
The theorem is still true if, instead of considering the functions increasing with the
integer .n, we consider the derivative functions . f (x, α) increasing with the parameter .α and
converging to a derivative function . f when .α tends to .α0 .
Finally, it must be noted that it is necessary to know that the function $f$, whether a limit
or a sum, is a derivative function, in order to have the right to apply the previous theorem:
the function
$$f(x, \alpha) = -e^{-\alpha x^2}$$
increases to the function $f(x)$, zero everywhere except for $x = 0$ where it is equal to $-1$,
when $\alpha$ increases to infinity. However, $f(x, \alpha)$ is a derivative function whereas $f(x)$ is not.
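The behaviour of this family can be tabulated directly. The sketch below is only illustrative: the sample points and the sample values of $\alpha$ are arbitrary choices, and `limit_function` is a name introduced here for the pointwise limit described in the text.

```python
import math

def f(x, alpha):
    """The derivative function f(x, alpha) = -exp(-alpha * x^2) from the text."""
    return -math.exp(-alpha * x * x)

def limit_function(x):
    """Pointwise limit as alpha increases to infinity: 0 for x != 0, -1 at x = 0."""
    return -1.0 if x == 0 else 0.0

alphas = [1, 10, 100, 1000, 10000]
for x in (0.0, 0.1, 0.5):
    values = [f(x, a) for a in alphas]
    print(x, values, limit_function(x))
```

At $x = 0$ every value is $-1$, while at any other point the values increase towards $0$. The limit takes the values $-1$ and $0$ but none in between, so it lacks the intermediate value property that every derivative function possesses.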
These two properties will allow us to perform the search for the primitive functions in
the extended cases.
First of all, when a function is the sum of a uniformly convergent series of derivative
functions, it is a derivative function for which we know how to find the primitive functions.
This is an important theoretical application.
Let a continuous function $f(x)$ be defined on $(a, b)$. Let us mark the points
$a = x_0, x_1, x_2, \ldots, x_n = b$ taken sufficiently close to each other so that, in $(x_i, x_{i+1})$, the
oscillation of $f$ is smaller than $\varepsilon$.
In the curve $y = f(x)$, let us inscribe the polygonal line $y = \varphi(x)$ whose vertices have the
abscissas $x_0, x_1, x_2, \ldots, x_n$; $f(x)$ and $\varphi(x)$ differ by less than $\varepsilon$. This means that $\varphi(x)$
converges uniformly to $f(x)$ when $\varepsilon$ tends towards zero. Therefore, it will be sufficient to
show that $\varphi(x)$ is a derivative function for us to assert the same for $f(x)$. However, $\varphi(x)$
being, in $(x_i, x_{i+1})$, the polynomial of first degree
$$\varphi(x) = f(x_i) + (x - x_i)\,\frac{f(x_{i+1}) - f(x_i)}{x_{i+1} - x_i},$$
it is the derivative of the continuous function $\Phi(x)$ given in $(x_i, x_{i+1})$ by
$$\Phi(x) = \Phi(x_i) + (x - x_i) f(x_i) + \frac{(x - x_i)^2}{2}\,\frac{f(x_{i+1}) - f(x_i)}{x_{i+1} - x_i}.$$
It is thus shown that any continuous function is a derivative function, and this is done without
resorting to integration.2
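The construction of the polygonal line and of its primitive, piece by piece, can be sketched as follows. This is a minimal illustration under assumptions made here: the function $\sin x$, the subdivision of $(0, 1)$ into ten equal parts, the normalisation $\Phi(x_0) = 0$, and the helper name `build_polygon_and_primitive` are all hypothetical choices.

```python
import bisect
import math

def build_polygon_and_primitive(f, nodes):
    """Return (phi, Phi) for the polygonal line through the points (x_i, f(x_i)).

    phi is piecewise linear; Phi is built piece by piece with the quadratic
    formula, taking Phi(x_0) = 0 (an arbitrary choice of the constant).
    """
    values = [f(x) for x in nodes]
    slopes = [(values[i + 1] - values[i]) / (nodes[i + 1] - nodes[i])
              for i in range(len(nodes) - 1)]
    # Values of Phi at the nodes, accumulated so that Phi is continuous.
    Phi_nodes = [0.0]
    for i, slope in enumerate(slopes):
        dx = nodes[i + 1] - nodes[i]
        Phi_nodes.append(Phi_nodes[i] + dx * values[i] + dx * dx / 2 * slope)

    def locate(x):
        # Index i of the piece (x_i, x_{i+1}) containing x.
        i = bisect.bisect_right(nodes, x) - 1
        return min(max(i, 0), len(slopes) - 1)

    def phi(x):
        i = locate(x)
        return values[i] + (x - nodes[i]) * slopes[i]

    def Phi(x):
        i = locate(x)
        dx = x - nodes[i]
        return Phi_nodes[i] + dx * values[i] + dx * dx / 2 * slopes[i]

    return phi, Phi

nodes = [i / 10 for i in range(11)]   # hypothetical subdivision of (0, 1)
phi, Phi = build_polygon_and_primitive(math.sin, nodes)
x, h = 0.37, 1e-6
print((Phi(x + h) - Phi(x)) / h, phi(x), math.sin(x))
```

The ratio of $\Phi$ agrees with $\varphi$ both inside a piece and at a vertex, where the left and right pieces join with the common value $f(x_i)$, and $\varphi$ stays within the prescribed distance of $f$.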
When we are able to represent a function in the form of a series of derivative functions,
all having the same sign, we will have a systematic method of calculation that allows us to
determine if . f is an exact derivative, since the primitive function of . f cannot be other than
the sum of the primitive functions of the terms of the given series (compare Sect. 6.4).
Thus, the two theorems on the primitive functions of series allow us, in certain cases, with
respect to the determination of the primitive functions, to accomplish what the theorems on
the integration enable us to do for integrable functions.
I leave aside similar remarks related to the search for a function whose derivative num-
ber is a given function. I will now outline some properties of derivative functions which
will sometimes immediately allow us to recognise that a given function is not a derivative
function.
A derivative function cannot pass from one value to another without taking all the
intermediate values. Indeed, let us assume that we have $f'(a) = A$, $f'(b) = B$, and let $C$ be a number
included between $A$ and $B$. We can take $h > 0$ sufficiently small for $r[f(x), a, a + h]$ to be
included between $A$ and $C$ and $r[f(x), b - h, b]$ to be included between $B$ and $C$. The
function $r[f(x), x, x + h]$ is, $h$ being fixed, a continuous function of $x$. When $x$ varies from $a$ to
$b - h$, it passes from a value included between $A$ and $C$ to a value included between $C$ and
$B$; it therefore takes the value $C$ for a certain value $x_0$, and the theorem
of finite increment shows that in $(x_0, x_0 + h)$ there exists a value $c$ such that $f'(c) = C$.3
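The two steps of this argument can be followed numerically on an example. The sketch below rests on assumptions made here and not found in the text: the function $F(x) = x^2 \sin\frac{1}{x}$ with $F(0) = 0$ (whose derivative is discontinuous at $0$), the endpoints $a = 0$ and $b = 1/\pi$, the value $C = 0.5$, the step $h$, and the helper `bisect_root`. In this particular example the derivative happens to be monotone on $(x_0, x_0 + h)$, so plain bisection suffices for the second step; in general the theorem of finite increment only guarantees that $c$ exists.

```python
import math

def F(x):
    """F(x) = x^2 sin(1/x), with F(0) = 0; differentiable everywhere."""
    return x * x * math.sin(1 / x) if x != 0 else 0.0

def dF(x):
    """Derivative of F; it is discontinuous at 0, where dF(0) = 0."""
    return 2 * x * math.sin(1 / x) - math.cos(1 / x) if x != 0 else 0.0

def bisect_root(g, lo, hi, iterations=200):
    """Bisection for a root of g on [lo, hi], given opposite signs at the ends."""
    g_lo, g_hi = g(lo), g(hi)
    if g_lo * g_hi > 0:
        raise ValueError("g must take values of opposite signs at the endpoints")
    for _ in range(iterations):
        mid = (lo + hi) / 2
        g_mid = g(mid)
        if g_lo * g_mid <= 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return (lo + hi) / 2

a, b = 0.0, 1 / math.pi   # dF(a) = 0 and dF(b) = 1, up to rounding
C, h = 0.5, 1e-3          # C lies between A = 0 and B = 1

def r(x):
    """The ratio r[F, x, x + h], a continuous function of x for fixed h."""
    return (F(x + h) - F(x)) / h

# Step 1: r passes from a value below C to a value above C, so it equals C at some x0.
x0 = bisect_root(lambda x: r(x) - C, a, b - h)
# Step 2: inside (x0, x0 + h) the derivative itself takes the value C.
c = bisect_root(lambda t: dF(t) - C, x0, x0 + h)
print(x0, c, dF(c))
```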
Therefore, the derivative functions possess one of the properties of the continuous
functions. Darboux, in his Mémoire sur les fonctions discontinues,4 placed great emphasis on this
property. In France, it had become customary to define continuous functions as those
which cannot pass from one value to another without passing through all the intermediate
2 We might be tempted to apply the theorem on uniformly convergent series of derivative functions by relying on a
proposition due to Weierstrass: every continuous function can be represented by a uniformly
convergent series of polynomials. However, to make this method suitable for the purpose, it is
necessary to establish Weierstrass's theorem without using integration. The proof I gave in the Bulletin
des Sciences Mathématiques of 1898, in a Note Sur l'approximation des fonctions, satisfies this
condition. In another Note, Remarques sur la définition de l'Intégrale, published in the same Bulletin in
1905, I used a different idea from the one we have just used here.
3 This does not assume that $f'(x)$ is finite, but only that $f'(x)$ is always well-defined in magnitude
and sign.
4 Annales de l'École Normale, 1875.
7.2 Properties of Derivative Functions 125
values, and this definition was considered equivalent to that of Cauchy. Darboux, who
constructed in his Mémoire a derivative function discontinuous in Cauchy's sense, was
able to show that the two definitions of continuity were very different.5
It is easy to cite discontinuous functions which do not pass from one value to another
without taking, at least once, each intermediary value. This is the case of the function equal
to $\sin\frac{1}{x}$ for $x \neq 0$ and to any value in the interval $(-1, +1)$ for $x = 0$.
It is very curious that a function can possess this property, which has been taken for the
definition of continuity, and yet be discontinuous at every point. To construct such a function,
I write the number $x$, taken between $0$ and $1$, in a system of numeration, the decimal system,
for example
$$x = \frac{a_1}{10} + \frac{a_2}{10^2} + \frac{a_3}{10^3} + \cdots$$
Let us consider the sequence of digits of odd rank $a_1, a_3, a_5, \dots$. If it is not periodic, we
will take $\varphi(x) = 0$; if it is periodic, and if the first period starts at $a_{2n-1}$, then we will take
$$\varphi(x) = \frac{a_{2n}}{10} + \frac{a_{2n+2}}{10^2} + \frac{a_{2n+4}}{10^3} + \frac{a_{2n+6}}{10^4} + \cdots$$
It is obvious that the function $\varphi(x)$ thus defined takes all the values in the interval $(0, 1)$ in any
interval, however small. Therefore, $\varphi(x)$ is discontinuous at every point. Moreover,
$\varphi(x)$ does not take values outside of $(0, 1)$. So it does not pass from one value $a$ to another
value $b$ without taking all the values of $(0, 1)$, and, consequently, all the values between $a$
and $b$.
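The construction of $\varphi$ can be tried out on terminating decimals, for which the digits of odd rank are eventually all zero and hence periodic. The following is only a sketch under stated assumptions: the helper names `phi_terminating`, `to_fraction` and `witness` are hypothetical, and each terminating decimal is taken in its expansion ending in zeros. Given an interval and a target value with finitely many digits, `witness` builds a point of the interval at which $\varphi$ takes that value.

```python
import math
from fractions import Fraction

def phi_terminating(digits):
    """phi(x) for x = 0.a1 a2 ... am 000..., with digits = [a1, ..., am]."""
    # The odd-rank digits are eventually all 0, hence periodic; the first
    # period starts right after the last non-zero digit of odd rank.
    odd_nonzero = [r for r in range(1, len(digits) + 1, 2) if digits[r - 1] != 0]
    start = odd_nonzero[-1] + 2 if odd_nonzero else 1  # this is the rank 2n - 1
    value = Fraction(0)
    for i, r in enumerate(range(start + 1, len(digits) + 1, 2)):
        value += Fraction(digits[r - 1], 10 ** (i + 1))  # a_{2n+2i} / 10^(i+1)
    return value

def to_fraction(digits):
    """Exact value of 0.a1 a2 ... am."""
    return sum(Fraction(d, 10 ** (r + 1)) for r, d in enumerate(digits))

def witness(lo, hi, y_digits):
    """Digits of an x with lo < x < hi and phi(x) = 0.y1 y2 ... (0 <= lo < hi <= 1)."""
    k = 1
    while Fraction(1, 10 ** (2 * k)) > (hi - lo) / 2:
        k += 1
    mid = (lo + hi) / 2
    prefix = str(math.floor(mid * 10 ** (2 * k))).zfill(2 * k)
    digits = [int(c) for c in prefix] + [1, 0]  # a non-zero odd-rank digit, then 0
    for d in y_digits:
        digits += [0, d]  # odd ranks stay 0, even ranks carry the digits of y
    return digits

lo, hi = Fraction(1, 3), Fraction(1, 3) + Fraction(1, 1000)
digits = witness(lo, hi, [7, 2, 5])
x = to_fraction(digits)
```

The non-zero digit placed at an odd rank just after the prefix forces the first period of the odd-rank digits to start immediately afterwards, so the even-rank digits that follow are exactly the digits of the target value.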
It is necessary to note that, with the definition criticized by Darboux, the sum of two
continuous functions is no longer necessarily a continuous function. In fact, if
$$f_1(x) = \sin\frac{1}{x} \ \text{ for } x \neq 0 \quad \text{and} \quad f_1(0) = 1,$$
and if
$$f_2(x) = -\sin\frac{1}{x} \ \text{ for } x \neq 0 \quad \text{and} \quad f_2(0) = 1,$$
the two functions $f_1$ and $f_2$ cannot pass from one value to another without taking all the
intermediary values, and this does not hold for $f_1 + f_2$, since $f_1 + f_2$ is zero for $x \neq 0$
and equal to $2$ for $x = 0$.
The sum of two derivative functions being a derivative function, it is necessary, according
to the previous remark, to state as a new property that the sum of two derivative functions
cannot pass from one value to another without taking all the intermediary values. We can also
5 Allow me to point out that in 1903 the definition criticized by Darboux in 1875 was still
taught in the high schools of Paris. This is all the more surprising since the property stated in Cauchy’s
definition is the one which appears directly in almost all the proofs, while the property of continuous
and derivative functions is hardly used except in the theorem of substitutions and its consequences.
say that the difference of two derivative functions cannot change sign without vanishing,
which, if we think about the geometric representation, we can state thus: Two derivative
functions cannot cross each other without meeting.
Here is an example of the application of this property. Let $\psi(x)$ be a function equal to
the function $\varphi(x)$ (Sect. 7.2) when $\varphi(x)$ is not equal to $x$, and equal to $0$ when
$\varphi(x) = x$. $\psi(x)$, like $\varphi(x)$, cannot pass from one value to another without passing through all the
intermediary values; therefore the first theorem does not allow us to assert that $\psi(x)$ is not
a derivative function. But since $\psi(x)$ crosses the continuous function $x$ in any interval and
meets it, however, only for $x = 0$, the second property shows that $\psi(x)$ is not a derivative.
Before investigating whether the function $\varphi(x)$ is a derivative, I will show how an important
special case of Scheeffer’s theorem is deduced immediately from the theorem of Darboux.
Let us assume that the derivative of a function $f(x)$ is always well defined in magnitude
and sign (we do not suppose it to be finite). Then, if it is not always equal to a given number $A$,
the set of values of $x$ for which $f'(x)$ is different from $A$ has the cardinality of the continuum.
In fact, either $f'(x)$ is constant and the property is proved, or $f'(x)$ takes two values $B$ and
$C$, and in that case it also takes all the values included between $B$ and $C$, which are all
different from $A$, except perhaps one. The set of these values of $f'(x)$ different from $A$ has
the cardinality of the continuum, and the same holds for the set of corresponding values of
$x$.
With this in mind, if $f(x)$ always has a derivative, and if this derivative is zero, except
perhaps for a countable set of values of $x$, we can assert that it is always zero. This is the
theorem of Scheeffer in a particular case.
Let us return to the function $\varphi(x)$. Is it a derivative? The previous two theorems do not
seem to provide an easy answer to this question. One method is to apply a previously
proved theorem: a bounded derivative function has the same maximum whether or not we
neglect the sets of measure zero.6 It is not difficult to show that $\varphi(x)$ is non-zero only for a set
of values of $x$ of measure zero (see Sect. 8.2); $\varphi(x)$ is, therefore, not a derivative function.
This result may be obtained in a totally different manner. A derivative cannot be discontinuous
at every point, and $\varphi(x)$ is discontinuous at every point.
This property of derivative functions follows from a theorem due to M. R. Baire. $f'(x)$ is
the limit, for $h = 0$, of the function $r[f(x), x, x + h]$, continuous in $x$ when $h$ is constant;
therefore, it is a function of first class, that is, a limit of continuous functions. But
M. Baire showed that if we consider a function of class one on any perfect set, there exist
points where it is continuous on this perfect set; this is expressed by saying that it is
pointwise discontinuous on any perfect set.7
In many cases, we can, without resorting to integration, determine whether a given function
is a derivative, and we could also hope to find the primitive function of a given derivative
without integration. Previously, we resolved these questions by using the definite integral;
one might wonder if, conversely, we could define the integral with the help of primitive
functions. This is the method of Duhamel and Serret.8 For these authors a function $f(x)$
has an integral in $(a, b)$ when it admits a primitive function $F(x)$ in $(a, b)$. This integral $I_a^b$
is, by definition,
$$I_a^b f(x)\,dx = F(b) - F(a).$$
This definition is not equivalent to Riemann’s definition. On one hand, we know that there
exist functions that are integrable in Riemann’s sense which are not derivative functions;
on the other hand, as we will see, there exist derivative functions which are not integrable
in Riemann’s sense.
The first example of such a function is due to M. Volterra (Giornale de Battaglini, 1881);
here is how we obtain it:
Let $E$ be a non-dense perfect set which is not an integrable group (Sect. 4.1). Let $(a, b)$
be an interval contiguous to $E$, and let us consider the function
$$\varphi(x, a) = (x - a)^2 \sin\frac{1}{x - a};$$
its derivative vanishes an infinite number of times between $a$ and $b$; let $a + c$ be the greatest
value of $x$ not greater than $\frac{a+b}{2}$ at which $\varphi'$ vanishes. Given this, we define a function $F(x)$
by the following conditions: it is zero at the points of $E$; in any interval $(a, b)$ contiguous
to $E$, it is equal to $\varphi(x, a)$ from $a$ to $a + c$; from $a + c$ to $b - c$, the function $F$ is constant
and equal to $\varphi(a + c, a)$; from $b - c$ to $b$, $F$ is equal to $-\varphi(x, b)$.
This function $F(x)$ is obviously continuous. It has a derivative; this is obvious for the
points which do not belong to $E$. Let $x_0$ be a point of $E$; the ratio $r[F(x), x_0, x_0 + h]$ is zero
if $x_0 + h$ is a point of $E$. If $x_0 + h$ is not a point of $E$, it belongs to an interval contiguous
to $E$; let $\alpha$ be the extremity of this interval which is in $(x_0, x_0 + h)$. We obviously
have
$$|r[F(x), x_0, x_0 + h]| = \left|\frac{F(x_0 + h)}{h}\right| \le \frac{(x_0 + h - \alpha)^2}{|h|} \le |h|,$$
therefore $F(x)$ has a zero derivative at all the points of $E$.
8 In reality, Duhamel and Serret hardly considered anything other than continuous functions. For
these functions, according to what has been mentioned earlier, their definition is equivalent to that of
Cauchy; it is only a matter of differences in exposition at that point.
The derivative $F'$ of $F$ is bounded, because the derivative of $x^2 \sin\frac{1}{x}$, which is zero for
$x = 0$, and which, for $x$ different from zero, is equal to
$$2x \sin\frac{1}{x} - \cos\frac{1}{x},$$
is bounded. However, this derivative $F'$ is not integrable in Riemann’s sense, because at all
points of $E$ the maximum of $F'$ is $+1$ and its minimum is $-1$, as is the case at the point $x = a$
for the function $\varphi'(x, a)$; but $E$ is not an integrable group, by hypothesis.
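The behaviour of the building block $x^2 \sin\frac{1}{x}$ near the origin can be checked numerically. This is a minimal sketch, not part of Volterra's construction itself; the names `g` and `g_prime` are introduced here only for illustration. It samples the derivative at points where $\cos\frac{1}{x} = \pm 1$ and evaluates the difference quotients at $0$.

```python
import math

def g(x):
    """x^2 sin(1/x), extended by 0 at x = 0."""
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

def g_prime(x):
    """Derivative: 2x sin(1/x) - cos(1/x) for x != 0, and 0 at x = 0."""
    return 2 * x * math.sin(1.0 / x) - math.cos(1.0 / x) if x != 0 else 0.0

# Arbitrarily close to 0 the derivative returns to values close to -1 and +1,
# so its oscillation at 0 is 2 even though it is bounded.
near_minus_one = [g_prime(1.0 / (2 * k * math.pi)) for k in range(1, 6)]
near_plus_one = [g_prime(1.0 / ((2 * k + 1) * math.pi)) for k in range(1, 6)]

# The difference quotients g(h)/h at 0 are bounded by |h|, so g'(0) = 0.
quotients = [(h, g(h) / h) for h in (10.0 ** -n for n in range(1, 8))]
```

In the construction above the same oscillation occurs at every point of $E$, which is why the size of $E$ decides whether $F'$ is Riemann integrable.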
By a suitable application of the principle of condensation of singularities, we obtain a
derivative function which is not integrable over any interval, however small it may be.9
Duhamel’s definition applies, therefore, to bounded functions to which Riemann’s
definition does not apply. Furthermore, Duhamel’s definition applies to unbounded functions,
because there exist derivatives that are unbounded yet always finite: that of $x^2 \sin\frac{1}{x^2}$,
for example.
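A short numerical sketch of this last example follows; the name `h_prime` is hypothetical. For $x \neq 0$ the derivative of $x^2 \sin\frac{1}{x^2}$ is $2x \sin\frac{1}{x^2} - \frac{2}{x}\cos\frac{1}{x^2}$, and at $x = 0$ it is $0$ because the difference quotient is at most $|h|$ in absolute value. Sampling at the points $x_k = 1/\sqrt{2k\pi}$ shows the values growing without bound.

```python
import math

def h_prime(x):
    """Derivative of x^2 sin(1/x^2): finite everywhere, equal to 0 at x = 0."""
    if x == 0:
        return 0.0
    return 2 * x * math.sin(1.0 / x**2) - (2.0 / x) * math.cos(1.0 / x**2)

# At x_k = 1/sqrt(2 k pi) the cosine is (nearly) 1 and the sine (nearly) 0,
# so h'(x_k) is close to -2 sqrt(2 k pi), which is unbounded as k grows.
samples = [(k, h_prime(1.0 / math.sqrt(2 * k * math.pi))) for k in (1, 10, 100, 1000)]
```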
To the definition of Duhamel and Serret, we can apply the generalisation used by Cauchy
and Dirichlet. I will not dwell on this generalisation, nor, for the time being, on the following
one, which contains as particular cases the definitions of Riemann and of Duhamel for
bounded functions: A bounded function $f(x)$ is said to be summable if there exists a function
$F(x)$ with bounded derivative numbers such that $F(x)$ admits $f(x)$ as its derivative except for a
set of values of $x$ of measure zero. The integral in $(a, b)$ is then, by definition, $F(b) - F(a)$.10
Let us adopt the definition of Duhamel and Serret without generalisation. The integral of
Duhamel (Integral $D$) possesses some of the properties of Riemann’s integral.
We have
$$I_a^b + I_b^c + I_c^a = 0.$$
The sum of two $D$ integrable functions is a $D$ integrable function and has an integral that
is the sum of the integrals; but the product of two $D$ integrable functions is not necessarily $D$
integrable.11
A uniformly convergent series of $D$ integrable functions is a $D$ integrable function and
the integration can be performed term-by-term; this is the proposition of Sect. 7.1. From the
proposition in Sect. 7.1 we deduce that if the $D$ integrable functions $f_n(x)$ increase to a $D$
integrable function $f(x)$, the integral of $f_n$ increases to that of $f$, for a positive interval of
integration.
9 M. Köpcke has constructed differentiable functions with bounded derivatives which vanish in every
interval. These derivatives are obviously not integrable.
10 Compare with Sect. 6.5, where, as soon as $f$ is given, we know at what points we may not necessarily
have $F'(x) = f(x)$; here, on the contrary, we do not know.
Different functions $F(x)$ corresponding to the same function $f(x)$ differ only by an additive
constant.
We will encounter these bounded summable functions in the following chapters.
11 For example the product . x x 2 sin 1 is not . D integrable.
x
7.3 The Integral Deduced from the Primitive Functions
The similar proposition for Riemann’s integral is true. We would base its proof on the
one provided in Sect. 7.1.
Let us keep the notation of Sect. 7.1. $f, u_1, u_2, \dots$ are now positive integrable
functions; $F, U_1, U_2, \dots$ are their indefinite integrals which vanish at the origin $a$ of the
considered interval.
We obviously have $f \ge s_n$, hence $F \ge S_n$, and since the $S_n$ are increasing, the series of the
$U$ is convergent. The increment of $F$, in an arbitrary positive interval, is at least equal to that
12 We can note that this property remains true when dealing with so-called summable functions,
which are integrable according to the generalisation mentioned in Sect. 7.3.
13 I can only mention another application of the $D$ integrals: when a bounded derivative function
admits a trigonometric expansion, the coefficients of its expansion are given by the well-known formulae
of Euler and Fourier, the integrals appearing in these formulae being $D$ integrals.
Indeed, there exist bounded derivative functions which are not Riemann integrable, but still admit a
trigonometric expansion. For the proof of these properties, we can refer to a Memoire Sur les séries
trigonométriques that I published in the Annales de l’École Normale (November 1903).
New studies on the integral are, however, necessary, because we have still not solved
the problem of finding primitive functions; moreover, for calculating the length of curves
with tangents, both integrations are insufficient.14
Further, I would like to add that if the two integrations we have studied generally seem
sufficient, this is solely because, almost always, we deliberately restrict ourselves to the
consideration of continuous functions, and often even to the consideration of analytic functions.
14 It is easy to see that $\sqrt{1 + \left[\left(x^2 \sin\frac{1}{x}\right)'\right]^2}$ is not an exact derivative.
Starting from there, we show without any difficulty that the quantity $\sqrt{1 + F'(x)^2}$, where $F'$ is
the non-integrable derivative function of M. Volterra, is integrable neither in the sense of Riemann nor in
the sense of Duhamel.
Therefore, the curve $y = F(x)$ cannot be rectified by either of the two methods used.
For the application mentioned in the previous Note, both integrations are also insufficient. This
becomes obvious when considering the sum of a non-integrable derivative that can be represented
trigonometrically and a non-derivative function that can be represented trigonometrically.
8 The Definite Integral of Summable Functions
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025
R. Jain, Lebesgue’s Theory of Integration,
https://doi.org/10.1007/978-981-96-1169-0_8
3.
$$\int_a^b [f(x) + \varphi(x)]\,dx = \int_a^b f(x)\,dx + \int_a^b \varphi(x)\,dx;$$
4. If we have $f \ge 0$ and $b > a$, we also have
$$\int_a^b f(x)\,dx \ge 0;$$
5. We have
$$\int_0^1 1 \times dx = 1;$$
6. If, when the index $n$ increases, $f_n(x)$ increases to $f(x)$, the integral of $f_n(x)$ tends
towards that of $f(x)$.
The significance, necessity and consequences of the first five conditions of this problem of
integration are quite obvious; we will not dwell on them.
Condition 6 has a special role. It has neither the same simplicity as the first
five, nor the same character of necessity.2 Moreover, while it is easy to construct numbers
satisfying any four of the first five conditions without satisfying all five, which shows that the
five conditions are independent, we do not know whether the six conditions of the problem of
integration are independent or not.3
By stating the six conditions of the problem of integration, we define the integral. This
definition belongs to the class of what can be called descriptive definitions; in these definitions,
we state the characteristic properties of the ‘being’ that we want to define. In
constructive definitions, we state the operations that are necessary to obtain the ‘being’ that
we want to define. Constructive definitions are more often used in Analysis; however, we
sometimes use descriptive definitions4 as well. The definition of the integral, according to Riemann,
is constructive, while the definition of primitive functions is descriptive.
2 It appears so insignificant that it is generally unknown, even for the case where $f$ and $f_n$ are
integrable in the sense of Riemann or even continuous. However, it could be the case that some of its
consequences have, on the contrary, a very high degree of necessity. To prepare for the introduction
of this condition 6, I dealt in Sects. 7.1 and 7.3 with the integration, and with the search for the primitive
function, of the limit of an increasing sequence of functions.
3 The answer to this question does not matter for applications, but it is of interest from the viewpoint
of principles. If it were shown that this sixth condition is independent of the other five, it would be
necessary to try to replace it with a simpler sixth condition and especially to explore whether, among
the systems of numbers which satisfy only the first five conditions, there are any as useful as the one
being studied.
Just before the second edition, M. St. Banach studied the question posed in the first edition, and
concluded that the six conditions of the problem of integration are independent (Fund. Math., t. IV).
4 The use of these descriptive definitions is indispensable for the initial stages of a science when
we want to construct that science in a purely logical and abstract manner. See the thesis of M. J.
Drach (Annales de l’École Normale, 1898) and the Memoire of M. Hilbert Sur les fondements de la
Géométrie (Annales de l’École Normale, 1900). Such a definition is then called axiomatic, since it
enumerates the necessary axioms. Thus, it is self-sufficient and forms a complete whole.
In contrast, the descriptive definitions introduced in the course of development of a theory, such
as the definition of the integral, do not claim to enumerate all the axioms on which they rely. They do
not form a complete whole and cannot be isolated from the exposition of the rest of the theory.
8.1 The Problem of Integration
When we state a constructive definition, it is necessary to show that the operations
mentioned in this definition are possible. Similarly, a descriptive definition is also subject to
certain conditions: it is necessary that the stated conditions be compatible.5 The general
method used so far to show that the conditions are compatible is the following: we choose,
in a class of ‘beings’ previously defined, ‘beings’ which possess all the stated properties.
This class of ‘beings’ is typically the class of integers;6 and it is assumed that the descriptive
definition of these integers does not contain any contradictions.
It is also necessary to study the nature of the indetermination of the ‘beings’ that we want
to define. Let us assume, for example, that we have shown the impossibility of the existence
of two different classes of ‘beings’ satisfying the specified conditions, and additionally,
that we have shown the compatibility of these conditions by choosing a class of ‘beings’ that
satisfy them. This class of ‘beings’ will be the only one defined, so that the constructive
definition used to make the choice is exactly equivalent to the given descriptive definition.
We will seek a constructive definition equivalent to the descriptive definition of the
integral.7
First, we will easily show, relying on conditions 3 and 4, that we have condition $S$:
$$\int_a^b k f(x)\,dx = k \int_a^b f(x)\,dx \tag{S}$$
where $k$ is a constant. This being said, let $f(x)$ be any function. We denote by
$E[\alpha < f(x) < \beta]$ the set of values of $x$ for which we have $\alpha < f(x) < \beta$, and by $E[f(x) = \alpha]$
the set of values of $x$ for which $f(x) = \alpha$; and we will use other similar notations.
Let $(l, L)$ be a positive interval containing in its interior the interval of variation of
$f(x)$;8 let us partition this interval into partial intervals with the help of the numbers
5 Meaning thereby that none of their consequences is of the form: $A$ is not $A$. It is also necessary, as I
have already said, to investigate whether the conditions are independent.
6 See the Memoire by M. Hilbert already cited. It is because we can show the compatibility of the
conditions stated in the descriptive definitions of the early stages of Geometry with the help of the
system of integers that it is legitimate to say that Geometry can be entirely constructed starting
from the idea of number. From the viewpoint of the arithmetisation of science, the main interest of
the precise definition of the integral, as posed by Cauchy, is that it reduces the various notions of
magnitude involved in geometry (area, volume, length of curves, etc.) to that of the length of a
segment, i.e., the difference of two numbers. This definition of Cauchy completes the work of
Descartes who, by the use of coordinates, reduced all the geometries to that of a straight line.
7 Placing ourselves at the same point of view, we can say that the work presented in this book
has the primary goal of seeking a constructive definition equivalent to the descriptive definition of
primitive functions.
8 In other words, the lower and upper limits of $f(x)$ are included between $l$ and $L$ but not equal to
$l$ or $L$; with the notations of the text, there would be no need to change anything if $l$ and $L$ were equal
to the limits of $f(x)$; however, at other times, we would need to take some precautions for the extreme
values of the indices which appear in the summation. Here, we could extend the first summation only
from $0$ to $n - 1$ and the second from $1$ to $n$.
$\Psi_i$ $(i = 0, 1, 2, \dots, n)$ the function equal to $1$ when $x$ belongs to $E[l_{i-1} < f(x) < l_i]$ or
$$\varphi(x) = \sum_{i=0}^{n} l_i\,\psi_i(x) \le f(x) \le \sum_{i=0}^{n} l_i\,\Psi_i(x) = \Phi(x).$$
When we know how to integrate the functions $\psi$ which take only the values $0$ and $1$, we will
deduce, thanks to the conditions 3 and $S$, the integrals of $\varphi(x)$ and $\Phi(x)$, which include between them the
integral of $f(x)$ (conditions 3 and 4).9
Moreover, $\varphi(x)$ and $\Phi(x)$ differ from $f(x)$ by $\varepsilon$ at most, and therefore converge uniformly
to $f(x)$ when $\varepsilon$ tends towards zero; it is easy to conclude that their integrals converge to that
of $f(x)$.
Indeed, if the lower and upper limits of $g(x)$ are $m$ and $M$, then according to 3 and 4, $\int_a^b g(x)\,dx$
is included between
$$\int_a^b m\,dx = m \int_a^b dx \quad \text{and} \quad \int_a^b M\,dx = M \int_a^b dx;$$
now, by making
$$g(x) = f(x) - \varphi(x),$$
we have
$$\varepsilon \ge M \ge m \ge 0,$$
therefore, the integral of $g(x)$ is smaller in modulus than $\varepsilon \int_a^b dx$, a quantity which tends
towards zero with $\varepsilon$.
To be able to calculate the integral of any function, it is sufficient to know how to calculate
the integral of functions $\psi$ which take only the values $0$ and $1$.
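The bracketing of the integral by sums formed from a partition of the range can be illustrated on a simple case. The sketch below rests on assumptions not in the text: the function is $f(x) = x^2$ on $(0, 1)$, the level sets $E[l_i \le f(x) < l_{i+1}]$ are intervals because $f$ is increasing, and their measure is taken to be their length. The name `range_partition_sums` is hypothetical.

```python
import math

def range_partition_sums(n):
    """Lower and upper sums for f(x) = x^2 on (0, 1), cutting the range [0, 1] into n parts."""
    lower = upper = 0.0
    for i in range(n):
        l_i, l_next = i / n, (i + 1) / n
        # Measure of E[l_i <= f(x) < l_{i+1}]: an interval here, since f is increasing.
        measure = math.sqrt(l_next) - math.sqrt(l_i)
        lower += l_i * measure      # contribution to the integral of the lower step function
        upper += l_next * measure   # contribution to the integral of the upper step function
    return lower, upper

for n in (10, 100, 1000):
    lower, upper = range_partition_sums(n)
    print(n, lower, upper)
```

With $\varepsilon = 1/n$ the two sums differ by $\varepsilon(b - a)$ up to rounding, and both approach $\frac{1}{3}$ as $n$ grows.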
It should be noted that we have incidentally shown the possibility of term-by-term integration
of uniformly convergent series, if the problem of integration is possible.
The quantity $\int_a^b dx$, which appears in the previous proof, is easily calculated; by using
1, 2 and 5, we see that it is equal to $b - a$.
If the function is included between $l$ and $L$, its integral in $(a, b)$ is included between
$l(b - a)$ and $L(b - a)$; this is the mean value theorem.
If we apply this theorem after having decomposed $(a, b)$ into partial intervals, we find
that $\int_a^b f(x)\,dx$ is included between the sums which are used to define the integrals by
lower and upper sums; the integral is therefore included between the integrals by lower and
upper sums. In particular, if the problem of integration is possible for integrable functions
in the sense of Riemann, then for these functions it does not admit a solution other than
Riemann’s integral.
Let us now concentrate on functions $\psi$ which take only the values $0$ and $1$. Such a function
is entirely defined by the set $E[\psi(x) = 1]$ of values where it is different from $0$; the integral
of such a function, in a positive interval, is a positive number or zero which we can consider
as attached to the subset of the set $E[\psi(x) = 1]$ included in the interval of integration. If we
translate into geometric language the conditions of the problem of integration of functions
$\psi$, we have a new problem, the problem of the measure of sets.
To state it, I recall that two sets of points on a line are considered equal if, by the
displacement of one of the two, we can make them coincide. Moreover, a set $E$ is considered
the sum of sets $e$ if every point of $E$ belongs to at least one of the sets $e$.10 Here is the question
to be resolved:
We propose to associate with each bounded set $E$, formed from points of $ox$, a positive
number or zero, $m(E)$, that we call the measure of $E$, and which satisfies the following
conditions:
The condition 3′ replaces the condition 5; the condition 2′ results from the application of
conditions 3 and 6 to the series
$$\psi = \psi_1 + \psi_2 + \cdots,$$
in which all the terms and the sum are functions $\psi$; as for condition 1′, it is equivalent to
condition 1. However, an explanation is necessary. There are two types of equal sets: those
which can be made to coincide by a translation along $ox$ and those which can be made
to coincide by a rotation of $\pi$ around a point of $ox$. Condition 1 applies only to the former.
I did not put this restriction in the statement because, in the subsequent reasoning, we can
10 With our definition, the sets $e$ could, therefore, have common points.
restrict ourselves to using only translations as displacements, and yet we will still always
obtain equal measures11 for two equal sets in either case.
A simple consequence of conditions 1′, 2′, 3′ is that any positive interval $(a, b)$ has a
measure equal to its length $b - a$, whether or not the extremities form part of the interval.12
If we refer to Chap. 3, we immediately see that, if the problem of measure is possible,
we have
$$e_i(E) \le m(E) \le e_e(E);$$
for the $J$ measurable sets, the problem of measure is possible in at most one way, and
measure corresponds to extent in the sense of Jordan.
Now, let $E$ be an arbitrary set. We can enclose its points in a finite or countably infinite
number of non-intersecting intervals. The measure of the set of points of these intervals
is, according to 2′, the sum of the lengths of these intervals. This sum is an upper limit of
the measure of $E$. The set of these sums has a lower limit denoted by $m_e(E)$, the exterior
measure of $E$, and we obviously have
$$m(E) \le m_e(E).$$
Let $C_{AB}(E)$ be the complement of $E$ with respect to $AB$, that is, the set of points not
belonging to $E$ but belonging to a segment $AB$ of $ox$ containing $E$. We must have
$$m(E) + m[C_{AB}(E)] = m(AB),$$
therefore
$$m(E) = m(AB) - m[C_{AB}(E)] \ge m(AB) - m_e[C_{AB}(E)];$$
the lower limit thus found for $m(E)$, a limit which is necessarily positive or zero, is called
the interior measure of $E$, denoted as $m_i(E)$; it is obviously greater than or at least equal to the
interior extent of $E$.
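The definition of the exterior measure by enclosing intervals can be illustrated on a countable set. The sketch below is an assumption-laden illustration, not taken from the text: it lists the first rationals of $(0, 1)$, encloses the $k$-th one in an interval of length $\varepsilon / 2^k$, and merges overlapping intervals so that the enclosing intervals are non-intersecting. The function names are hypothetical.

```python
from fractions import Fraction

def rationals(count):
    """First `count` distinct rationals of (0, 1), listed by increasing denominator."""
    found, q = [], 2
    while len(found) < count:
        for p in range(1, q):
            r = Fraction(p, q)
            if r.denominator == q and len(found) < count:  # skip repeats such as 2/4
                found.append(r)
        q += 1
    return found

def disjoint_cover(points, eps):
    """Enclose the k-th point in an interval of length eps / 2^k, then merge overlaps."""
    raw = sorted((x - eps / 2 ** (k + 1), x + eps / 2 ** (k + 1))
                 for k, x in enumerate(points, start=1))
    merged = [raw[0]]
    for lo, hi in raw[1:]:
        if lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

def total_length(intervals):
    return sum(hi - lo for lo, hi in intervals)

points = rationals(200)
cover = disjoint_cover(points, Fraction(1, 100))
```

The total length stays below $\varepsilon$ however many points are listed, which is the reason a countable set has exterior measure zero.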
For comparing the two numbers $m_e$, $m_i$, we use a theorem due to M. Borel:
11 All the conditions required for the integration problem concerning the functions $\psi$ have been stated.
However, one might fear that this is not sufficient for the integral of arbitrary functions, which is defined as
soon as the integrals of the functions $\psi$ are, to satisfy these conditions as well. In the sequel, it is shown
that these fears are not justified.
We can now show this without using the values of the integral of functions $\psi$, and we could
also show that, if we remove the words or a countably infinite from 2′, we obtain a new problem
of measure that corresponds exactly to the problem of integration posed with conditions 1, 2, 3, 4, 5
without condition 6.
12 This has already been expressed by the equality $\int_a^b dx = b - a$.
8.2 The Measure of the Sets
If we have a family of intervals $\Delta$ such that every point of an interval $(a, b)$, including $a$
and $b$, is interior13 to at least one of the $\Delta$, then there exists a family consisting of a finite
number of the intervals of $\Delta$ which has the same property [every point of $(a, b)$ is interior
to one of them].
Let $(\alpha, \beta)$ be one of the intervals of $\Delta$ containing $a$; the property to be shown is obvious
for the interval $(a, x)$ if $x$ is included between $\alpha$ and $\beta$. I mean that this interval can be
covered with the help of a finite number of intervals of $\Delta$, which I express by saying that
the point $x$ is reached. It needs to be shown that $b$ is reached. If $x$ is reached, all the points
of $(a, x)$ are reached; if $x$ is not reached, then none of the points of $(x, b)$ are reached.
Therefore, if $b$ is not reached, there must be a first ‘unreached’ point, or a last reached point.
Let $x_0$ be this point. It is interior to an interval $(\alpha_1, \beta_1)$ of $\Delta$. Let $x_1$ be a point of $(\alpha_1, x_0)$ and $x_2$
a point of $(x_0, \beta_1)$; $x_1$ is reached by hypothesis, and the finite number of intervals of $\Delta$
which are used to reach it, plus the interval $(\alpha_1, \beta_1)$, allow us to reach $x_2 > x_0$. Therefore,
$x_0$ is neither the last reached point nor the first ‘unreached’ point; hence, $b$ is reached.14
From the theorem of M. Borel it follows that if we cover the whole interval $(a, b)$ with
the help of a countably infinite number of intervals of $\Delta$, the sum of the lengths of these
intervals is at least equal to the length of the interval $(a, b)$.15 Indeed, we can also cover
$(a, b)$ with the help of a finite number of intervals of $\Delta$, and the theorem, being obviously
true when we consider only these finitely many intervals, is all the more true when we
consider all the intervals of $\Delta$.
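The idea of ‘reached’ points in the proof suggests a left-to-right sweep. The following is a sketch of that sweep for a family given as a finite list of open intervals, which is an assumption made here so that the search terminates; the proof itself concerns arbitrary families. The name `finite_subcover` and the example family are hypothetical.

```python
from fractions import Fraction

def finite_subcover(family, a, b):
    """Pick finitely many open intervals (lo, hi) of `family` covering [a, b]."""
    chosen, reached = [], a
    while True:
        # Intervals having the current point in their interior.
        candidates = [iv for iv in family if iv[0] < reached < iv[1]]
        if not candidates:
            raise ValueError(f"{reached} is interior to no interval of the family")
        best = max(candidates, key=lambda iv: iv[1])  # the one reaching farthest right
        chosen.append(best)
        if best[1] > b:
            return chosen
        reached = best[1]  # first point not yet covered by the chosen intervals

# Example family: intervals centred at j/50 with radii 1/60, 2/60 or 3/60.
family = [(Fraction(j, 50) - Fraction(1 + j % 3, 60),
           Fraction(j, 50) + Fraction(1 + j % 3, 60)) for j in range(51)]
cover = finite_subcover(family, Fraction(0), Fraction(1))
total = sum(hi - lo for lo, hi in cover)
```

Since the chosen intervals cover the whole of $(0, 1)$, the sum of their lengths is at least $1$, in line with the consequence drawn from the theorem.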
Let us now go back to the set $E$ and its complement $C_{AB}(E)$. Let us enclose the first in
a countably infinite number of intervals $\alpha$, and the second in intervals $\beta$; we have
is measurable.
Let $E_1, E_2, \dots$ be measurable sets, finite or countable in number, which are pairwise
disjoint, and let $E$ be their sum set.
From the definition of measurable sets, it follows that we can enclose $E_i$ in a countably
infinite number of intervals $\alpha_i$ and $C_{AB}(E_i)$ in intervals $\beta_i$ in such a manner that the measure
16 It is only for these sets that we will study the problem of measure. I do not know if we can define,
or even if there exist, sets other than measurable sets. If they exist, what is stated in the text is not
sufficient to affirm either that the problem of measure is possible or that it is impossible for these sets.
On the subject of the possibility and determination of the problem of measure for all sets, see the
work of M. Banach, cited in Sect. 8.1. As for the question of the existence of non-measurable
sets, there has not been much progress since the first edition of this book. However, the existence of
such sets is certain for those who accept a certain mode of reasoning based on what we call the axiom
of Zermelo. By this reasoning, we indeed arrive at the conclusion: there exist non-measurable sets;
but this affirmation must not be considered contradicted if we came to show that no man will ever be
able to name a non-measurable set!
On these questions we could consult the second edition of Leçons sur la théorie des fonctions of
M. Emile Borel.
17 The geometric definition of measure allows us to compare not only two equal sets, but also two
similar sets. The ratio of the measures of two similar sets with ratio $k$ is $|k|$, a condition that could
have been imposed a priori; it corresponds to condition $S_1$ for the problem of integration
$$\int_a^b f(x)\,dx = k \int_{a/k}^{b/k} f(kx)\,dx \tag{$S_1$}$$
The conditions $S$ (Sect. 8.1) and $S_1$ constitute what we can call the condition of similitude; they reveal
how an integral should transform under a transformation
of the common subsets of $\alpha_i$ and $\beta_i$ is equal to $\varepsilon_i$. The positive numbers $\varepsilon_i$ are chosen in
such a way that the series $\sum \varepsilon_i$ is convergent and has the sum $\varepsilon$.
Let $\alpha'_2, \beta'_2$ be the subsets of $\alpha_2, \beta_2$ which are contained in the intervals $\beta_1$; let $\alpha'_3, \beta'_3$ be
the subsets of $\alpha_3, \beta_3$ which are contained in $\beta'_2$, and so on. $E_i$ is enclosed in $\alpha'_i$. Therefore,
$E$ is enclosed in $\alpha_1 + \alpha'_2 + \cdots$, and hence its exterior measure is at most equal to the sum
$$m(\alpha_1) + m(\alpha'_2) + m(\alpha'_3) + \cdots = s;$$
let us evaluate this sum. We obviously have
hence, by addition,
and this is sufficient for showing that the series $s$ is convergent; moreover we have
therefore, $s$ is included between $\sum m(E_i)$ and $\sum m(E_i) + \varepsilon$. This gives
$$m_e(E) \le \sum m(E_i).$$
The complement of $E$, $C_{AB}(E)$, can be enclosed in $\beta'_i$. Now, $\beta'_i$ has, in common with
$\alpha_1 + \alpha'_2 + \alpha'_3 + \cdots$, the intervals $\alpha'_{i+1} + \alpha'_{i+2} + \cdots$, along with a subset of the intervals
common to $\alpha_1, \beta_1$, a subset of those common to $\alpha_2, \beta_2$, $\dots$, and a subset of those common
to $\alpha_i, \beta_i$. Therefore, the measure of $\beta'_i$ is at most equal to
$$[m(AB) - s] + \varepsilon_1 + \varepsilon_2 + \cdots + \varepsilon_i + m(\alpha'_{i+1}) + m(\alpha'_{i+2}) + \cdots,$$
and, as a result,
$$m_e[C_{AB}(E)] \le m(AB) - \sum m(E_i),$$
that is,
$$m(AB) - m_e[C_{AB}(E)] \ge \sum m(E_i),$$
or
$$m_i(E) \ge \sum m(E_i).$$
The upper and lower limits found, respectively, for $m_e(E)$ and $m_i(E)$ show that these two
quantities are equal. Therefore, the set $E$ is measurable and of measure $\sum m(E_i)$, and hence
the condition 2′ is indeed verified.
The set of measurable sets contains the set of $J$ measurable sets, but it is much wider,
as we will see. Indeed, we can perform the following two operations on measurable sets
without leaving the set of measurable sets:
To show this, let us first note that the second operation does not differ essentially
from the first one, because if $E$ is the subset common to $E_1, E_2, \dots$, $C(E)$ is the sum of
$C(E_1), C(E_2), \dots$. Therefore, it is enough to concentrate on the first operation. Let
$$E = E_1 + E_2 + E_3 + \cdots;$$
if $E'_i$ denotes the subset of $E_i$ not belonging to $E_1 + \cdots + E_{i-1}$, we have
$$E = E_1 + E'_2 + E'_3 + \cdots,$$
the terms of the sum being pairwise disjoint. Now, it is easy to see that $E'_2$ is measurable.
Indeed, let us enclose $E_1$ in the intervals $\alpha_1$, $C(E_1)$ in the intervals $\beta_1$, $E_2$ in $\alpha_2$, $C(E_2)$ in
$\beta_2$, and let $\varepsilon_1$ and $\varepsilon_2$ be the lengths of the subsets common to $\alpha_1$ and $\beta_1$ on one hand, and
to $\alpha_2$ and $\beta_2$ on the other. If $\alpha'_2$ and $\beta'_2$ are the subsets of $\alpha_2$ and $\beta_2$ common to $\beta_1$, $E'_2$
can be enclosed in $\alpha'_2$ and $C(E'_2)$ in $\alpha_1 + \beta'_2$, and the subsets common to these two systems
of intervals have a measure at most equal to $\varepsilon_1 + \varepsilon_2$; therefore $E'_2$ is measurable. Hence, it
follows that
$$E_1 + E_2 = E_1 + E'_2$$
is measurable, and therefore the set $E'_3$, which is the subset of $E_3$ not belonging to
the measurable set $E_1 + E_2$, is also measurable, and so on. All the $E'_i$ are measurable,
and consequently, . E 18 is measurable. An interval being a measurable set, by applying the
operations . I and . I I a finite or a countably infinite number of times starting from intervals,
we obtain measurable sets; these are the ones which M. Borel called measurable sets; let us
call them . B measurable sets. These are the most important of the measurable sets. For an
arbitrary set, while can only assert the existence of the two numbers .m e , m i , without being
able to specify the sequence of operations needed to calculate them, it is easy to obtain the
measure of a . B measurable set by following step-by-step construction of this set. We will
make use of the property .2 whenever we use operation . I ; when we use the operation . I I ,
we will employ a theorem with an immediate proof:
The measure of the subset common to the sets $E_1, E_2, \ldots$ is the limit of $m(E_i)$ if each set $E_i$ contains all those with a higher index.19
The closed sets are $B$ measurable, since they are complements of the sets formed of points interior to a finite or a countably infinite number of intervals. Let $E$ be such a set; the measure of its complement is obviously the interior extent of this complement, therefore the measure of a closed set is its exterior extent. Hence follows the property which we have used: a closed set of measure zero is an integrable group (Sect. 3.2).
As an application of these theoretical considerations, let us calculate the measure of the set $E$ of points of $(0, 1)$ such that the sequence of their decimal digits of odd rank is periodic (Sect. 7.2). Let
$$x = \frac{a_1}{10} + \frac{a_2}{10^2} + \frac{a_3}{10^3} + \cdots$$
be such a number, written as
$$x = y + \frac{a_2}{10^2} + \frac{a_4}{10^4} + \frac{a_6}{10^6} + \cdots = y + z,$$
where $y$ is rational. The set of numbers $y$ is countable. To each rational number $y$ there corresponds a set of numbers $x$ that has the same measure as the set of the numbers $z$ whose digits of odd rank are all zero. To show that $E$ is measurable and of measure zero, it is, therefore, sufficient to show that the set of numbers $z$ possesses this property.
Now, this set is obtained by removing from the interval $(0, 1)$ the interval $\left(\frac{1}{10}, 1\right)$. Then from $\left(0, \frac{1}{10}\right)$ we remove the intervals $\left(\frac{p}{10^2} + \frac{1}{10^3}, \frac{p+1}{10^2}\right)$, where $p$ is an integer smaller than $10$. After that, from each remaining interval $\left(\frac{p}{10^2}, \frac{p}{10^2} + \frac{1}{10^3}\right)$, we remove the intervals $\left(\frac{p}{10^2} + \frac{q}{10^4} + \frac{1}{10^5}, \frac{p}{10^2} + \frac{q+1}{10^4}\right)$, and so on. In each operation, we remove $\frac{9}{10}$ of the remaining intervals. The set of $z$ is therefore $B$ measurable and of measure zero.
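The successive removals can be followed numerically. The sketch below is only an illustration, not part of the original argument: the function names are hypothetical, and it assumes that after the $k$-th operation there remain $10^{k-1}$ intervals of common length $10^{-(2k-1)}$, as the construction in the text suggests. It checks that the remaining length is $10^{-k}$, which tends towards zero.

```python
from fractions import Fraction

def remaining_measure(steps: int) -> Fraction:
    """Total length left after `steps` removal operations, starting from (0, 1)."""
    length = Fraction(1)
    for _ in range(steps):
        # Each operation removes 9/10 of what remains.
        length -= Fraction(9, 10) * length
    return length

def remaining_by_counting(steps: int) -> Fraction:
    """Same quantity as (number of intervals) x (common length), for steps >= 1."""
    count = 10 ** (steps - 1)                    # 1, 10, 100, ... remaining intervals
    width = Fraction(1, 10 ** (2 * steps - 1))   # 1/10, 1/10^3, 1/10^5, ...
    return count * width

for k in (1, 2, 3, 6):
    print(k, remaining_measure(k), remaining_by_counting(k))
```

Exact fractions are used so that the two ways of counting can be compared without rounding.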
19 M. Borel has indicated (note 1, p. 48 of Leçons sur la théorie des fonctions) the principles which guided us in the theory of the measure. When we seek to construct the sets to which the application of these principles permits us to attach a measure, we are immediately led to the class of the $B$ measurable sets. This class has, hitherto, sufficed in practice. The primary advantage of reasoning about measurable sets, and not just about $B$ measurable sets, is not that we consider a larger class of sets, but that we start from the fundamental property of the sets to which we can attach a measure, rather than from a construction process that is perpetually evolving. As a result, as we have seen above, it was possible to show quite simply the compatibility of the conditions of the problem of the measure for all measurable sets, whereas this proof has not been achieved when considering only the $B$ measurable sets.
Among the $B$ measurable sets, it seems that M. Borel initially considered only those obtained by performing only a finite number of operations I and II; following the links I made between the $B$ measurable sets and the functions considered by M. Baire, the use of the definition in the text became more widespread. On these questions and on the existence of non $B$ measurable sets, see a Memoire that I published in the Journal de Mathematique in 1905, and the book of M. de la Vallée Poussin cited in Sect. 7.1. For more recent research, see the collection of Fundamenta and a Memoire of MM. Lusin and Sierpinski (Journ. de Math., 1923).
is measurable; but when we give to $\varepsilon$ a sequence of decreasing values $\varepsilon_1, \varepsilon_2, \ldots$ tending towards zero, we have
necessary and sufficient that the set . E[α < f (x)] is measurable for any .α; or equivalently,
that the sets . E[α ≤ f (x)] are measurable. We would also see that . f (x) is measurable if, and
only if, for each value of .α, there exists a set, which we will denote by . E[α ≤ f (x), f (x) <
α], contained in . E[α ≤ f (x)] and containing . E[α < f (x)] which is measurable.
The sum of two measurable functions is a measurable function. Let $f_1$ and $f_2$ be two measurable functions. To any number $\varepsilon$ there corresponds a division of their interval of variation, whether it is finite or infinite, with the help of numbers $l_i$ such that $l_{i+1} - l_i$ is at most equal to $\varepsilon$. Now, let us consider the sets $E_{ij}$ of values of $x$ such that we have both
$E_{ij}$ is measurable as the subset common to $E[l_i < f_1(x)]$ and to $E[l_j < f_2(x)]$.
The sum $E(\varepsilon)$ of the sets $E_{ij}$ is measurable, since each of them is measurable; and if we give to $\varepsilon$ the values $\varepsilon_i$ tending towards zero, we have
by taking the sum of the sets $E_n$, where $E_n$ is the subset common to the sets $E[f_n(x) < \alpha], E[f_{n+1}(x) < \alpha], \ldots$, and all these sets are measurable if the functions $f_n$ are measurable.
Let us apply these results. The two functions $f = \text{const.}$ and $f = x$ are obviously measurable, therefore any polynomial is measurable. Any function which is the limit of polynomials is also measurable. Therefore, according to a theorem of Weierstrass, any continuous function is measurable. The discontinuous functions which are limits of continuous functions, which M. Baire called functions of first class, are measurable. The functions which are not of first class and which are limits of functions of first class (M. Baire called them functions of second class) are measurable functions.
Let us again note that the functions thus formed step by step are $B$ measurable, meaning that the corresponding sets are $B$ measurable; these are the only functions we will encounter.
We can often show that a function is measurable by using the following property: if, when disregarding a set of values of $x$ of measure zero, the function $f(x)$ is continuous, it is measurable. This is because the limit points of a set $E[\alpha \le f(x)]$ which do not belong to this set necessarily belong to the neglected set of measure zero; therefore they form a set of measure zero. The set $E[\alpha \le f(x)]$, being closed except for a set of measure zero, is measurable. In particular, this shows that any Riemann integrable function is measurable; we also see that the function $\chi(x)$ of Dirichlet, which is not integrable, is measurable.
Now, let us define the integral of a bounded measurable function, assuming that the interval of integration $(a, b)$ is positive. We know that, if it is a function $\psi$, this integral is
$$m[E(\psi = 1)],$$
and, if we are dealing with an arbitrary function $f(x)$, the integral must be the common limit of the integrals of $\varphi$ and $\Phi$ (Sect. 8.1) when the maximum of $l_{i+1} - l_i$ tends towards zero. According to the conditions of the integration problem, these integrals are
$$\sigma = \sum_{i=0}^{n-1} l_i \big( m\{E[f(x) = l_i]\} + m\{E[l_i < f(x) < l_{i+1}]\} \big),$$
$$\Sigma = \sum_{i=1}^{n} l_i \big( m\{E[l_{i-1} < f(x) < l_i]\} + m\{E[f(x) = l_i]\} \big).$$
We already know that these two numbers differ by less than $\varepsilon(b - a)$, because $\Phi - \varphi$ is smaller than $\varepsilon$. If we let $\varepsilon$ tend towards zero by interspersing new numbers between the $l_i$, then $\sigma$ increases, $\Sigma$ decreases, and $\Sigma - \sigma$ tends towards zero; therefore $\sigma$ and $\Sigma$ have the same limit.
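As a worked illustration of the two sums, here is a minimal sketch under stated assumptions: it takes the particular function $f(x) = x^2$ on $(0, 1)$, which is not an example from the text, and uses the fact that for this increasing function the set $E[l_i \le f(x) < l_{i+1}]$ is an interval whose measure is $\sqrt{l_{i+1}} - \sqrt{l_i}$. The function name `lebesgue_sums` is hypothetical. The lower and upper sums enclose the value $1/3$ and differ by the step of the subdivision of the range.

```python
import math

def lebesgue_sums(n: int):
    """Lower and upper sums for f(x) = x**2 on (0, 1), the range (0, 1) cut into n equal parts."""
    levels = [i / n for i in range(n + 1)]   # l_0 = 0 < l_1 < ... < l_n = 1
    lower = 0.0
    upper = 0.0
    for lo, hi in zip(levels, levels[1:]):
        # E[lo <= f < hi] is the interval [sqrt(lo), sqrt(hi)), of measure sqrt(hi) - sqrt(lo).
        measure = math.sqrt(hi) - math.sqrt(lo)
        lower += lo * measure   # f is replaced by the smaller level on this set
        upper += hi * measure   # f is replaced by the larger level on this set
    return lower, upper

for n in (10, 100, 1000):
    print(n, lebesgue_sums(n))
```

Here the subdivision is made on the values of $f$, not on the values of $x$; the measures of the level sets carry all the information about $x$.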
Let $\sigma_1, \Sigma_1$; $\sigma_2, \Sigma_2$; $\ldots$ be the sums obtained by this method. Let $\sigma'_1, \Sigma'_1$; $\sigma'_2, \Sigma'_2$; $\ldots$ be the sums obtained by letting $\varepsilon$ tend towards zero in another manner20; let $\sigma''_1, \Sigma''_1$ be the sums obtained by combining the numbers $l_i$ yielding $\sigma_1, \Sigma_1$ and $\sigma'_1, \Sigma'_1$; let $\sigma''_2, \Sigma''_2$ be the ones obtained by combining the $l_i$ yielding $\sigma_2, \Sigma_2$; $\sigma''_1, \Sigma''_1$; $\sigma'_2, \Sigma'_2$; and so on. We obviously have
$$\sigma'_i \le \sigma''_i \le \Sigma''_i \le \Sigma'_i, \qquad \sigma_i \le \sigma''_i \le \Sigma''_i \le \Sigma_i;$$
the second of these inequalities shows that $\sigma''_i$ and $\Sigma''_i$ have the same limit as $\sigma_i$ and $\Sigma_i$, because we know that $\sigma_i$ and $\Sigma_i$ have a limit and that $\Sigma_i - \sigma_i$ tends towards zero. The first one shows that this limit is also that of $\sigma'_i$ and $\Sigma'_i$.
The value of the limit, that is, the integral, is therefore independent of how the maximum of $l_{i+1} - l_i$ tends towards zero.
Let us complete this definition by setting
$$\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx.$$
It remains to see if the integral satisfies the conditions of the integration21 problem.
First, let us get rid of the conditions 1, 2, 4, 5. There is nothing to say about the latter. The condition 4 follows from the fact that $\sigma$, for example, is positive or zero when $f$ is positive or zero.
The condition 1 follows, for the case of a positive interval, from the fact that the sums $\sigma$ formed for the two integrals $\int_a^b f(x)\,dx$ and $\int_{a+h}^{b+h} f(x - h)\,dx$ using the numbers $l_i$ are identical. From there we pass on to the case of negative intervals.
To verify the condition 2, it is obviously sufficient to examine the case $a < c < b$; then, if we use the same $l_i$ for calculating the approximate values $\sigma_a^c, \sigma_c^b, \sigma_a^b$ of the integrals
$$\int_a^c f(x)\,dx, \quad \int_c^b f(x)\,dx, \quad \int_a^b f(x)\,dx,$$
we obviously have
$$\sigma_a^b = \sigma_a^c + \sigma_c^b,$$
because we have similar equalities between the corresponding measures of the sets which appear in the three sums $\sigma$.
Therefore, there remain the conditions 3 and 6, which need to be verified for a positive interval $(a, b)$. To do this, let us first prove that, in such an interval, we have
20 The $l_i$ that yield $\sigma'_p$ and $\Sigma'_p$ do not necessarily contain those that yielded $\sigma'_{p-1}$ and $\Sigma'_{p-1}$, while the $l_i$ that yield $\sigma_p$ and $\Sigma_p$ include the $l_i$ related to $\sigma_{p-1}$ and $\Sigma_{p-1}$.
21 In the case where non-measurable functions exist, it is necessary to add that we confine ourselves
to the consideration of only measurable functions.
8.4 Constructive Definition of the Integral 145
$$\int_a^b f\,dx \le \int_a^b g\,dx \le \int_a^b f\,dx + \eta(b - a),$$
when we have
$$f \le g \le f + \eta.$$
Let us use the same numbers $l, L, l_i$, chosen appropriately, to calculate the approximate values $\sigma(f)$ and $\sigma(g)$ of the two integrals. Let us denote by $F_i$ and $G_i$ the two sets
$$F_i = E[l_i \le f(x) < l_{i+1}], \qquad G_i = E[l_i \le g(x) < l_{i+1}];$$
we have
$$\sigma(f) = \sum_{i=0}^{n-1} l_i\, m(F_i) = l_{n-1}\, m(F_0 + F_1 + \cdots + F_{n-1}) - \sum_{i=0}^{n-2} (l_{i+1} - l_i)\, m(F_0 + F_1 + \cdots + F_i),$$
$$\sigma(g) = \sum_{i=0}^{n-1} l_i\, m(G_i) = l_{n-1}\, m(G_0 + G_1 + \cdots + G_{n-1}) - \sum_{i=0}^{n-2} (l_{i+1} - l_i)\, m(G_0 + G_1 + \cdots + G_i).$$
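The rearrangement of $\sigma(f)$ and $\sigma(g)$ used above is a summation by parts. The following sketch, with hypothetical names and made-up measures, checks the identity on exact fractions. It also illustrates what the rearranged form seems to be intended for, which is an assumption on our part: when the cumulative measures $m(F_0 + \cdots + F_i)$ are at least equal to the corresponding $m(G_0 + \cdots + G_i)$, as happens when $f \le g$, the sum for $f$ does not exceed the sum for $g$.

```python
from fractions import Fraction

def direct_sum(levels, measures):
    """Sum of l_i * m(F_i), written term by term."""
    return sum(l * m for l, m in zip(levels, measures))

def rearranged_sum(levels, measures):
    """Same sum after summation by parts, using the cumulative measures m(F_0 + ... + F_i)."""
    n = len(levels)
    cumulative = []
    total = Fraction(0)
    for m in measures:
        total += m
        cumulative.append(total)
    head = levels[n - 1] * cumulative[n - 1]
    tail = sum((levels[i + 1] - levels[i]) * cumulative[i] for i in range(n - 1))
    return head - tail

# Hypothetical data: four levels l_0 < l_1 < l_2 < l_3 and two lists of measures
# with the same total; those of f are concentrated on the lower levels.
levels = [Fraction(i, 4) for i in range(4)]
measures_f = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
measures_g = [Fraction(1, 8), Fraction(1, 8), Fraction(1, 4), Fraction(1, 2)]

print(direct_sum(levels, measures_f), rearranged_sum(levels, measures_f))
print(direct_sum(levels, measures_g), rearranged_sum(levels, measures_g))
```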
Since we have
$$g \le f + \eta,$$
we also have
$$\int_a^b g\,dx \le \int_a^b (f + \eta)\,dx;$$
to calculate approximately the last integral, let us use the numbers $l + \eta, L + \eta, l_i + \eta$; it is clear that we find
$$\sigma(f + \eta) = \sum_{i=0}^{n-1} (l_i + \eta)\, m(F_i) = \sum_{i=0}^{n-1} l_i\, m(F_i) + \eta \sum_{i=0}^{n-1} m(F_i) = \sigma(f) + (b - a)\eta.$$
Hence
$$\int_a^b g\,dx \le \int_a^b (f + \eta)\,dx = \int_a^b f\,dx + \eta(b - a).$$
We can also say that, if two functions differ by less than $\zeta$, their integrals differ by less than $\zeta(b - a)$. Indeed, saying that $f$ and $g$ differ by less than $\zeta$ means that $g$ is included between $f - \zeta$ and $f + \zeta$; therefore $\int_a^b g\,dx$ is included between
$$\int_a^b f\,dx - \zeta(b - a) \quad \text{and} \quad \int_a^b f\,dx + \zeta(b - a).$$
This being said, let $f$ and $\varphi$ be two measurable and bounded functions in the positive interval $(a, b)$; we have learned (Sect. 8.1) to associate with them the functions $f_1$ and $\varphi_1$, taking only a finite number of values and differing, respectively, from $f$ and $\varphi$ by less than $\varepsilon$.
Then $f_1 + \varphi_1$ differs from $f + \varphi$ by less than $2\varepsilon$. Therefore, we have
$$\left| \int_a^b f\,dx - \int_a^b f_1\,dx \right| < \varepsilon(b - a),$$
$$\left| \int_a^b \varphi\,dx - \int_a^b \varphi_1\,dx \right| < \varepsilon(b - a),$$
$$\left| \int_a^b (f + \varphi)\,dx - \int_a^b (f_1 + \varphi_1)\,dx \right| < 2\varepsilon(b - a).$$
Now, let us suppose that $f_1$ takes only the values $a_1, a_2, a_3, \ldots$ at the respective points of the sets $E_1, E_2, \ldots$, and that $\varphi_1$ takes only the values $b_1, b_2, \ldots$ at the points of $e_1, e_2, \ldots$. Let $E_{ij}$ be the set of points common to $E_i$ and $e_j$; we have
$$\int_a^b (f_1 + \varphi_1)\,dx = \sum_{i,j} (a_i + b_j)\, m(E_{ij}) = \sum_i a_i \Big[ \sum_j m(E_{ij}) \Big] + \sum_j b_j \Big[ \sum_i m(E_{ij}) \Big] = \sum_i a_i\, m(E_i) + \sum_j b_j\, m(e_j) = \int_a^b f_1\,dx + \int_a^b \varphi_1\,dx.$$
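The computation above can be checked on a small case. The sketch below is an illustration under simplifying assumptions: the sets $E_i$ and $e_j$ are taken to be intervals, so that the measure of $E_{ij}$ is the length of an overlap, and the two step functions are made up for the purpose. All names are hypothetical.

```python
from fractions import Fraction as F

# A function taking finitely many values is given as a list of pieces (left, right, value),
# the pieces being disjoint intervals that together cover (0, 1).
f1 = [(F(0), F(1, 3), F(2)), (F(1, 3), F(1), F(-1))]
phi1 = [(F(0), F(1, 2), F(5)), (F(1, 2), F(1), F(7))]

def integral(pieces):
    """Sum of a_i * m(E_i) for a function given by its pieces."""
    return sum(value * (right - left) for left, right, value in pieces)

def overlap(p, q):
    """Measure of the set E_ij common to the interval of p and the interval of q."""
    return max(F(0), min(p[1], q[1]) - max(p[0], q[0]))

def integral_of_sum(f_pieces, phi_pieces):
    """Sum over all pairs (i, j) of (a_i + b_j) * m(E_ij)."""
    return sum((p[2] + q[2]) * overlap(p, q) for p in f_pieces for q in phi_pieces)

print(integral_of_sum(f1, phi1), integral(f1) + integral(phi1))
```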
The condition 3 is, therefore, indeed fulfilled. The condition 6 is also satisfied, as shown by the following property:
If the measurable functions $f_n(x)$, bounded for all $n$ and $x$, have a limit $f(x)$, then the integral of $f_n(x)$ tends towards the integral of $f(x)$.
If we always have $|f_n(x)| < M$, and if $f - f_n$ is less than $\varepsilon$ in $E_n$, then $f - f_n$ is less than the function equal to $\varepsilon$ in $E_n$ and to $M$ in $C(E_n)$, whose integral is at most equal in modulus to
$$\varepsilon\, m(E_n) + M\, m[C(E_n)].$$
But $\varepsilon$ is arbitrary, and $m[C(E_n)]$ tends towards zero with $\frac{1}{n}$, since there is no point common to all the $C(E_n)$; therefore
$$\int_a^b (f - f_n)\,dx$$
tends towards zero.
22 M. Osgood, in a Memoire of the American Journal, 1897: On the non uniform convergence, has shown a particular case of this theorem in which $f$ and $f_n$ are continuous. The method of M. Osgood is totally different from the one presented in the text.
$$\sigma = \sum_{-\infty}^{+\infty} l_i\, m\{E[l_i \le f(x) < l_{i+1}]\},$$
$$\Sigma = \sum_{-\infty}^{+\infty} l_i\, m\{E[l_{i-1} < f(x) \le l_i]\}.$$
By revisiting the previous arguments, we immediately see that if one of them is convergent, and consequently absolutely convergent since all the terms with sufficiently small indices are negative or zero and all those with sufficiently large indices are positive or zero, then the other is also convergent. Under these conditions, both $\sigma$ and $\Sigma$ converge to a well determined limit when the maximum of $l_{i+1} - l_i$ tends towards zero in any manner. This limit is, by definition, the integral of $f(x)$ in the positive interval of integration. From there, we extend it to a negative interval as done previously.
We call summable functions those functions to which the completed constructive definition of the integral applies.23 Any bounded measurable function is summable.24 If $f$ is a summable function, $|f|$ is also a summable function and, if the interval of integration $(a, b)$ is positive, we have
$$\left| \int_a^b f\,dx \right| \le \int_a^b |f|\,dx,$$
because the approximate values of the two members are $|\sigma|$ and the sum of the absolute values of the terms of $\sigma$. If $f$ is summable and if $f_{M,N}$ denotes the function equal to $f$ when $f$ is included between $-M$ and $+N$ and to zero otherwise, then, as $M$ and $N$ tend towards $+\infty$, we have
$$\int_a^b f\,dx = \lim \int_a^b f_{M,N}\,dx,$$
because the approximate values of the two members are $\sigma$ and the limit of the contribution to $\sigma$ of the terms with indices ranging between $-m$ and $n$, when $m$ and $n$ increase indefinitely.25 We would have the same equality if we had required $f_{M,N}$ to be equal to $-M$ or to $+N$, instead of being zero, when $f$ is respectively less than $-M$ or greater than $+N$.
23 I depart here from the language used in my thesis, where I called summable the functions that I now refer to as measurable. With the conventions of the text, which are now adopted by all, the word summable plays, in the theory of integration, the same role as the word integrable in Riemannian integration.
When a function is measurable without being summable, it does not have an integral. However, when it concerns a function that is always positive or bounded from below (or always negative or bounded from above), we may still say that it has an infinite integral, for reasons that will become clear shortly.
24 The bounded summable functions considered here are the same as those discussed at the end of the previous chapter; this will become apparent in Chap. 9.
25 The converse is true, that is to say that if $\int_a^b f_{M,N}\,dx$ tends towards a definite limit when $M$ and $N$ tend towards $+\infty$ independently and in an arbitrary manner, then $f(x)$ is summable and its integral is the considered limit.
We do not know any bounded non-summable function; on the contrary, it is easy to cite unbounded non-summable functions. The function which is zero for $x = 0$ and equal to
$$\left( x^2 \sin\frac{1}{x^2} \right)' = 2x \sin\frac{1}{x^2} - \frac{2}{x} \cos\frac{1}{x^2}$$
elsewhere is an example; however, this function can be integrated by the methods of Cauchy and Dirichlet developed in Chap. 1. We could, in some cases, apply these methods to non-summable functions to define their integrals. We will return to this generalisation, but for the time being let us limit ourselves to the summable functions.
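To see numerically why this function is not summable although it has a primitive, one can bound the integral of its absolute value from below. The sketch below is our own illustration, not the argument of the text: it assumes the interval $(0, 1)$ and uses the points $x_k = 1/\sqrt{k\pi + \pi/2}$, at which the primitive $x^2 \sin\frac{1}{x^2}$ equals $\pm 1/(k\pi + \pi/2)$. On each interval $(x_{k+1}, x_k)$ the integral of the absolute value is at least the absolute variation of the primitive, and these variations add up like a harmonic series. The function names are hypothetical.

```python
import math

def primitive(x: float) -> float:
    """F(x) = x^2 sin(1/x^2), with F(0) = 0; its derivative is the function of the text."""
    return 0.0 if x == 0 else x * x * math.sin(1.0 / (x * x))

def abs_integral_lower_bound(K: int) -> float:
    """Lower bound for the integral of |F'| over (x_K, x_0), summing |F(x_k) - F(x_{k+1})|."""
    total = 0.0
    for k in range(K):
        x_hi = 1.0 / math.sqrt(k * math.pi + math.pi / 2)
        x_lo = 1.0 / math.sqrt((k + 1) * math.pi + math.pi / 2)
        total += abs(primitive(x_hi) - primitive(x_lo))
    return total

for K in (10, 100, 1000, 10000):
    print(K, abs_integral_lower_bound(K))   # grows roughly like (2 / pi) * log(K)

# By contrast, the integral in the sense of Cauchy over (eps, 1) is F(1) - F(eps),
# which tends towards sin(1) because |F(eps)| <= eps^2.
print(primitive(1.0) - primitive(1e-3), math.sin(1.0))
```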
But we are going to give this concept a new extension: let us suppose that a function $f(x)$ is given, or is considered, only at the points of a set $E$; we can still form the sets $E[l_i \le f(x) < l_{i+1}]$ for any $l_i$ and $l_{i+1}$, but now these sets will all be contained in $E$. If these sets are measurable, regardless of the choice of $l_i$ and $l_{i+1}$, we will say that $f$ is measurable in $E$. Let us note that this implies that $E$ itself is measurable, because $E$ is the sum of the different sets $E[l_i \le f(x) < l_{i+1}]$ associated with a choice of numbers $l_i$.
If $f$ is measurable in $E$, we can therefore form $\sigma$ and $\Sigma$. If one of these sums is convergent, in which case both must be, $f$ is said to be summable over $E$, and the integral of $f$ extended to $E$, $\int_E f\,dx$, is the common limit to which $\sigma$ and $\Sigma$ converge when we vary the choice of the $l_i$ such that the maximum of $l_{i+1} - l_i$ tends towards zero.
In short, nothing changes in the definition and we could have simply stated that
$$\int_E f\,dx = \int_a^b (f)\,dx;$$
from the properties of integrals in the intervals follow the properties of integrals in the sets.
We have deviated from the problem of integration as we initially posed it at the beginning of the chapter; how should we modify it to provide a descriptive definition of the integrals of summable functions in the measurable sets? To answer this question, let us first note that we still have a proposition analogous to condition 6 of the integration problem. To be more precise, if, as the index $n$ increases, the functions $f_n(x)$, which are summable over a measurable set $E$, increase monotonically to the function $f(x)$, which is also summable over $E$, then the integral of $f_n(x)$ over $E$ increases monotonically to the integral of $f(x)$.
Indeed, we can assume that the $f_n$ are non-negative; otherwise we would reason on the functions $f_n - f_1$. Since the $f_n$ are positive or zero, the same holds for $f$. Let us set
$$f = g + h, \qquad f_n = g_n + h_n,$$
g being equal to . f when . f is less than an arbitrarily chosen positive number . N , and equal
.
to be truthful, this is only entirely clear when it concerns integrals extended to an interval,
but by converting . f to .( f ), we can always assume that it is so.
We said, a moment ago, that if we let $N$ increase indefinitely, then $\int g\,dx$ converges to $\int f\,dx$; therefore $\int h\,dx$ converges to zero. Let us take $N$ sufficiently large so that $\int h\,dx$ is less than $\varepsilon$; the same will apply, a fortiori, to $\int h_n\,dx$. However, according to the condition 6, for the bounded measurable functions $g$ and $g_n$, $\int g_n\,dx$ converges to $\int g\,dx$. Therefore, we have
$$\lim \int f_n\,dx = \lim \left( \int g_n\,dx + \int h_n\,dx \right) \le \lim \int g_n\,dx + \varepsilon = \int g\,dx + \varepsilon \le \int f\,dx + \varepsilon,$$
$$f = f_1 + f_2 + \cdots,$$
therefore
$$\int_E f\,dx = \int_{E_1} f\,dx + \int_{E_2} f\,dx + \cdots$$
26 This is the case of integration known as monotone sequences or series. A series is called monotone
if the sequence of sums .sn of this series is monotone, that is, it is either non-decreasing or non-
increasing.
If $f$ is sometimes negative, we will arrive at the same result by applying the above formula to $|f| + f$, then to $|f| - f$, and by subtracting.
By combining this result with the previous one, we can say: the integral $\int_E f(x)\,dx$, extended to a measurable set $E$, of a function $f(x)$ summable in $E$, is a number satisfying the following properties:
$$\int_E f(x)\,dx = \int_{E_1} f(x)\,dx + \int_{E_2} f(x)\,dx + \cdots;$$
3.
$$\int_E [f(x) + \varphi(x)]\,dx = \int_E f(x)\,dx + \int_E \varphi(x)\,dx,$$
$\varphi(x)$ being assumed to be, like $f(x)$, summable over $E$;
5.
$$\int_0^1 1 \times dx = 1.$$
The reader can easily verify that these properties are characteristic of the integral; therefore, we have here a descriptive definition of the integral of a summable function in a measurable set. The new statement of the problem of integration now contains only five conditions, but that does not reveal any significant difference between the new problem and the older one. In reality, we could have combined the old conditions 3 and 6 into a single statement related to a case of integration of sums or series. Furthermore, let us note that we have already performed (Sect. 8.2), in the context of the measure of sets, the transformation of the condition 6, which pertained to a series of functions, into our new condition 2, which pertains to a series of sets.
It is especially important, for the effective calculation of the integrals of functions given by a series expansion, to know cases of term-by-term integration. M. Vitali has written on this subject a very important Memoire, which I can only point out here27; I will confine myself to providing a case of integration a little more extensive than the previous ones.
A convergent series of summable functions $f_n$ is term-by-term integrable when all its remainders $r_n(x)$ are, in absolute value, less than a definite summable function $\varphi(x)$.
shows that $f(x)$ is summable. This being true, we partition the interval or the set $E$ in which we integrate into three measurable, pairwise disjoint sets: the first one, say $A$, is formed of the points in which $\varphi(x)$ exceeds a number $N$; the second one, say $B_p$, is formed of the points not belonging to $A$ in which the remainder $r_p$ is less than $\varepsilon$ in modulus; and the remaining points form the third set, say $C_p$. We have
$$\int_E f(x)\,dx = \int_E f_p(x)\,dx + \int_A r_p\,dx + \int_{B_p} r_p\,dx + \int_{C_p} r_p\,dx;$$
$$\left| \int_A r_p\,dx \right| \le \int_A |r_p|\,dx \le \int_A \varphi\,dx.$$
Let us assume $N$ to be chosen in such a way that $\int_A \varphi\,dx$ is less than $\varepsilon$:
$$\left| \int_{B_p} r_p\,dx \right| \le \int_{B_p} |r_p|\,dx \le \int_{B_p} \varepsilon\,dx = \varepsilon\, m(B_p) \le \varepsilon\, m(E);$$
$$\left| \int_{C_p} r_p\,dx \right| \le \int_{C_p} |r_p|\,dx \le \int_{C_p} N\,dx = N\, m(C_p).$$
Now, when $p$ increases indefinitely, $m(C_p)$ tends to zero, because the set $C_p$ is contained in all those of lower indices, and there are no points common to all the sets $C_p$, since the series would diverge at such a point. Therefore, for sufficiently large $p$, we have
$$\left| \int_E f(x)\,dx - \int_E f_p(x)\,dx \right| < \varepsilon + \varepsilon\, m(E) + \varepsilon_1.$$
need to assume anything about the limit $f(x)$. To give this proposition its full scope, let us define what is meant by a measurable function which is not always finite. It is a function which, at every point of the interval or set $E$ under consideration, takes a value that is definite in magnitude and sign, but not always finite. For such a function $f$, there is generally a set $E(f = +\infty)$ and a set $E(f = -\infty)$. By stating that $f$ is measurable, we express that these two sets are measurable and that $f$ is measurable on the set of points where it is finite. We can also state that the set $E[\alpha \le f(x) < \beta]$, or the set $E[\alpha < f(x)]$, or the set $E[f(x) \le \beta]$, is measurable regardless of whether $\alpha$ and $\beta$ are finite or infinite.
The statement made concerns increasing sequences of measurable functions; such a sequence necessarily has a measurable limit, but this limit is not necessarily finite everywhere.
Let $f(x)$ be the limit of an increasing sequence of finite and summable functions $f_n(x)$.
If the sequence of integrals of the functions $f_n(x)$ converges, then $f$ is infinite only at the points of a set of measure zero, $f$ is summable over the set of points where it is finite, and the integral of $f$ is the limit of the integrals of $f_n$.
If the sequence of integrals of $f_n$ tends to infinity, then $f$ is infinite at the points of a set of non-zero measure, or is not summable over the set of points where it is finite.
$f$ never takes the value $-\infty$. Furthermore, we can reason solely on the set of points where $f_1$, and therefore $f$, are positive. If $E(f = +\infty)$ is of non-zero measure $\lambda$, then, for $p$ large enough, $f_p$ will be greater than some number $N$ at the points of a set of measure at least $\frac{\lambda}{2}$, and the integral of $f_p$ will be greater than $\frac{\lambda}{2} N$. Since $N$ can be an arbitrary value, the sequence of the $\int f_p\,dx$ cannot converge.
Therefore, let us assume that $E(f = +\infty)$ is of measure zero, and let us remove this set from the set of integration, which does not modify the integral of $f_n$. Now we are dealing with increasing sequences of functions which are always finite and have a limit that is also finite. Therefore, if $f$ is summable, the sequence of integrals of $f_n$ converges to the integral of $f$ (Sect. 8.4). If $f$ is not summable, it means that the integral of the function $g$, which is equal to $f$ when $f$ is less than $N$ and zero elsewhere, increases indefinitely as $N$ increases indefinitely. And since the limit of $\int f_n\,dx$ exceeds $\int g\,dx$, the sequence of integrals $\int f_n\,dx$ is divergent. This justifies the statement.
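Both cases of the statement can be illustrated with increasing sequences whose integrals are known in closed form. The two examples below are ours, not the author's, and the function names are hypothetical: on $(0, 1)$ we take $f_n = \min(n, 1/\sqrt{x})$, whose integrals $2 - 1/n$ converge, and $f_n = \min(n, 1/x)$, whose integrals $1 + \log n$ increase indefinitely, the limit $1/x$ being finite in $(0, 1)$ but not summable.

```python
import math

def integral_truncated_inverse_sqrt(n: int) -> float:
    """Integral over (0, 1) of min(n, 1/sqrt(x)), for n >= 1.

    The function equals n on (0, 1/n^2), contributing 1/n,
    and 1/sqrt(x) on (1/n^2, 1), contributing 2 * (1 - 1/n).
    """
    return 1.0 / n + 2.0 * (1.0 - 1.0 / n)

def integral_truncated_inverse(n: int) -> float:
    """Integral over (0, 1) of min(n, 1/x), for n >= 1.

    The function equals n on (0, 1/n), contributing 1,
    and 1/x on (1/n, 1), contributing log(n).
    """
    return 1.0 + math.log(n)

for n in (1, 10, 1000, 10**6):
    print(n, integral_truncated_inverse_sqrt(n), integral_truncated_inverse(n))
```

In the first case the integrals converge to $2$, the integral of $1/\sqrt{x}$; in the second they exceed any given number.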
We have just defined the integral, both through its properties and through a construction, and we have obtained methods for calculating the integral of functions given by series expansions. At this point, it is worth taking a step back to summarise what we have achieved.
For the construction of the integral of a bounded measurable function defined on an interval, we have deduced from the conditions of our original problem of integration:
a. The value of the integral for functions $\varphi$ taking only two values, zero and a constant;
b. The case of term-by-term integration of monotone series, or of series which become monotone when we remove the first terms;
c. We have proved that any measurable function is the sum of a series of functions $\varphi$ that is term-by-term integrable according to b.
It is clear that this construction of the integral could be modified in many ways; it would suffice: a. to begin with the knowledge of the integral of a sufficiently broad class of particular functions; b. to use a criterion for term-by-term integration of series, laid down a priori and sufficiently general to assert that: c. any measurable function is the sum of a series of the considered nature of functions belonging to the envisaged class. I will29 examine very rapidly a definition of M. W.-H. Young.
Sticking to the previous definitions,30 these functions would not encompass all measurable functions, but they would include all $B$ measurable functions, that is, all the measurable functions that are useful to consider in practice. Furthermore, M. Young's method provides, in the
29 We can take this opportunity to mention the remarks or the works of MM. Fubini, F. Riesz, Weyl, Egoroff, Lusin, Borel. See, for example, the work I published in Annales sc. de l'École Normale in 1918.
30 M. Young, by a new extension, also covers all measurable functions. See, for example, Proc. of the Lond. Math. Soc., 1910.
The letters $l$ and $u$ which appear in the notation of M. Young are the initials of the words lower and upper; the functions $f_l$ and $f_u$ are the lower and upper semi-continuous functions of M. Baire (Sect. 3.1).
We can also use this method of definition for unbounded functions and for functions defined on sets.
Finally, one can also, either with this definition or with the others, deal with the case where the interval or domain of definition of the function is not entirely at a finite distance but extends indefinitely.
8.5 Other Forms of the Definition of Integral 155
pairwise disjoint sets $E_1, E_2, \ldots$. Let $\delta_i$ be the measure of $E_i$, let $l_i$ and $L_i$ be the lower and upper limits of $f(x)$ in $E_i$, and let us form the sums or series
$$\underline{S} = \sum l_i \delta_i, \qquad \overline{S} = \sum L_i \delta_i;$$
then, varying the choice of the $E_i$, let us determine the upper bound $m$ of $\underline{S}$ and the lower bound $M$ of $\overline{S}$. These two bounds are equal to each other and to $\int_a^b f(x)\,dx$.
Indeed, let us calculate the contribution of the points of
in $\underline{S}$ and $\overline{S}$. The points of $e_n$ are distributed in some of the $E_i$; they form the set $e_n^{i_1}$ contained in $E_{i_1}$, the set $e_n^{i_2}$ contained in $E_{i_2}$, etc. For all the values $i_1, i_2, \ldots$ of $i$, the number $l_i$ is at most equal to $(n + 1)\varepsilon$; therefore the contribution of $e_n^{i_k}$ is at most $(n + 1)\varepsilon \times \mathrm{mes}(e_n^{i_k})$ and that of $e_n$ is at most $(n + 1)\varepsilon \times \mathrm{mes}(e_n)$. Therefore, we have
$$\underline{S} \le \sum_n (n + 1)\varepsilon \times \mathrm{mes}(e_n),$$
But it is sufficient to take the $E_n$ identical to the $e_n$ for the difference $\overline{S} - \underline{S}$ to be at most equal to $\varepsilon(b - a)$. Therefore, $M = m$.
The similarity between the definition of M. W.-H. Young and that of Riemann (as presented in Sect. 3.2) is obvious. Note that our constructive definition of the integral is also
31 Proc. Lond. Math. Soc., .1905, and Ph. Trans. London, .1905.
very similar to that of Riemann; the difference is that, while Riemann divided the interval of variation of $x$ into small partial intervals, it is the interval of variation of $f(x)$ that we have subdivided.
This approach is necessary and its advantages are obvious. When we form the sum $S = \sum f(\xi_i)(x_{i+1} - x_i)$ for a continuous function $f(x)$, we group the values of $x$ that yield values of $f(x)$ that are very close to each other. It is because these values are close that we can replace them in $S$ by any one of them, $f(\xi_i)$. However, if $f(x)$ is discontinuous, there is no longer any reason why choosing intervals $(x_i, x_{i+1})$ that become smaller and smaller should lead to grouping values of $f(x)$ that differ less and less. This is why the method of Riemann succeeds only rarely and, so to speak, by chance. Since we want to group the values of $f(x)$ that are close, it is quite clear that we should, as we have done in this chapter, subdivide the interval of variation of $f(x)$ and not the interval of variation of $x$.
We can also say, adopting the language of the seventeenth century, the language of indivisibles, that we are summing up the various indivisibles associated with the given function $f(x)$, i.e., the positive or negative ordinates of the points $[x, y = f(x)]$. To do this, just as in algebra when we perform the reduction of similar terms, or in arithmetic when we add numbers by summing the digits in the units place, then the digits in the tens place, etc., we have grouped the indivisibles of the same magnitude or of nearly the same magnitude.
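The comparison with the reduction of similar terms can be made concrete on a finite list of ordinates. The sketch below is a simple analogy with hypothetical names and made-up data: it computes the same total once by taking the ordinates in the order in which they occur, and once by first grouping equal ordinates and multiplying each value by the number of times it occurs, the count playing the role of the measure of the set where the value is taken.

```python
from collections import Counter

def sum_in_order(values):
    """Add the ordinates one after another, in the order of the abscissas."""
    total = 0
    for v in values:
        total += v
    return total

def sum_by_grouping(values):
    """Group equal ordinates first, then add value * (number of occurrences)."""
    counts = Counter(values)
    return sum(value * count for value, count in counts.items())

ordinates = [3, -1, 3, 0, 2, -1, 3]   # hypothetical data
print(sum_in_order(ordinates), sum_by_grouping(ordinates))
```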
Now, we shall perform the summation of these indivisibles by grouping all those which are positive and all those which are negative, and thus we will have a definition similar to that in Chap. 3. For this, we will assume that the problem of the measure of sets formed of the points of a plane has been solved. This problem is posed similarly to the case of the straight line, with condition 3 becoming: the measure of the set of points whose coordinates satisfy the inequalities
$$0 \le x \le 1, \qquad 0 \le y \le 1,$$
is $1$.
We will easily prove that the measure of a square is its area in the elementary sense of the
word. Hence, we will deduce that the measure of an arbitrary set lies between its exterior
measure and interior measure, which are defined as in the case of the straight line, with
squares replacing the intervals.
To prove that the interior measure never exceeds the exterior measure, it is necessary to
show that a square .C can be covered with the help of a finite number of squares .ci only if
the sum of areas of .ci is at least equal to the area of .C, which can be done in an elementary
way.32 Then we must prove the theorem of M. Borel by replacing the word interval in its
statement with the word square or the word domain.
32 For this question and everything related to the measure of polygons, one may consult with
interest Note D of the Géométrie élémentaire of M. Hadamard.
The proof can be given similarly to the case of the line. However, I would like to mention
here how we can use M. Peano’s curves and other similar curves (Sect. 4.1).
Let the domain $D$ be such that any point interior to $D$ or on the boundary of $D$ is interior
to one of the domains $\Delta$. We can define, with the help of a parameter $t$ varying from $0$ to $1$,
a curve $C$ which fills the domain $D$ and which does not pass through any exterior point.33
Each domain $\Delta$ cuts off arcs on $C$ corresponding to certain intervals of variation of $t$; let $\delta$ be
these intervals. A domain $\Delta$ may also have points of its boundary in common with $C$, and
these points do not form intervals; we neglect these points and deal only with the intervals.
$(0, 1)$ is obviously covered by the $\delta$, and therefore by a finite number of them, according
to the theorem of M. Borel for the case of the line. Consequently, $D$ is covered by a
finite number of the $\Delta$, namely those which correspond to these $\delta$.
Having established this property, the sequence of arguments and definitions continues
as in the case of the line, with intervals always being replaced by squares. Just as in the case
of the line, we define the measurable sets and the $B$-measurable sets, and we establish the same
properties for them.
The measure of sets of points in the plane must not be confused with that of
sets of points on a line. When there is doubt, we will distinguish between them by referring
to them as the surface measure $m_s$ and the linear measure $m_l$.34
Let us arrive at the definition of the integral.
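To any function $f(x)$ let us associate two surface sets $E_1(f)$ and $E_2(f)$, the sets of points
whose coordinates satisfy, respectively, $0 \le y \le f(x)$ and $f(x) \le y \le 0$;
by analogy with what has been done earlier (Chap. 3, Sect. 4.2), it is natural to call the
integral of the function $f$ the quantity
$$I = m_s[E_1(f)] - m_s[E_2(f)].$$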
Let us study under what conditions this definition is applicable; we will show that it is applicable when
the function $f$ is measurable, and only in this case. For this purpose, it will obviously be
sufficient to show it for the function $\varphi(x)$ equal to $f(x)$ when $f(x)$ is not negative, and zero
when $f(x)$ is negative; it is this function $\varphi(x)$ we are going to deal with.
33 To do this, we could establish a bijective and continuous correspondence between the points of a
square and those of the domain $D$, then choose the curve $C$ that corresponds to the Peano curve filling
the square. The existence of this correspondence is clear when the curve bounding the domain $D$ is
simple, like a polygon, for example. However, the general case requires delicate reasoning. We can
refer to, for example, the Thesis of M. Antoine (Journal de Math., 1921).
If we were to consider domains other than those bounded by a Jordan curve, the correspondence
may no longer exist. Furthermore, for such domains, there may not always exist a curve,
similar to that of M. Peano, which exactly fills them. A slight modification of the reasoning of the text
would be necessary in this case; but it is unnecessary to consider it here.
34 These definitions make it possible to define the measurable functions of two variables and the double
integrals related to these functions. I will not address these questions or others that are related, such as
integration by parts and integration under the summation sign.
When $\alpha$ decreases, the linear set $E(\varphi \ge \alpha)$ does not lose any point. Hence, we deduce
that the interior and exterior linear measures $m_{l,i}[E(\varphi \ge \alpha)]$ and $m_{l,e}[E(\varphi \ge \alpha)]$ are
non-increasing functions of $\alpha$. Moreover, $E(\varphi \ge \alpha)$ is the set of points which belong to all the
$E(\varphi \ge \alpha - h)$; hence we deduce that $m_{l,i}[E(\varphi \ge \alpha)]$ and $m_{l,e}[E(\varphi \ge \alpha)]$ are
continuous from the left as functions of $\alpha$. This being said, let us suppose that we have
$$m_{l,e}[E(\varphi \ge \alpha)] - m_{l,i}[E(\varphi \ge \alpha)] > \varepsilon;$$
then it will still be the same in a certain interval $(\alpha - h, \alpha)$. Let us consider the subset $E$
of $E_1(\varphi)$ included between $y = \alpha - h$ and $y = \alpha$. Let us enclose the points of $E$ in
squares $A$, and the points of $C(E)$ in squares $B$. We can suppose that $A$ and $B$ have their
edges parallel to $ox$ and $oy$. They have in common rectangles $C$, whose total area
is at least $m_{s,e}(E) - m_{s,i}(E)$ and can be made to differ from it by as little as we want. The intersection of the squares $A$
with the straight line $y = K$ is composed of intervals $a$ which enclose $E[\varphi(x) \ge K]$; the
intersection of the squares $B$ is composed of intervals $b$ which enclose $C\{E[\varphi(x) \ge K]\}$; the
intersection of the rectangles $C$ is formed by the parts $c$ common to both $a$ and $b$; therefore,
we have
$$m_l(c) \ge m_{l,e}\{E[\varphi(x) \ge K]\} - m_{l,i}\{E[\varphi(x) \ge K]\};$$
$m_l(c)$ is therefore greater than $\varepsilon$ when $K$ varies from $\alpha - h$ to $\alpha$, and $m_{s,e}(E) - m_{s,i}(E)$ is
at least equal to $\varepsilon h$. $E$, and consequently $E_1(\varphi)$, is measurable only if $\varphi$ is measurable.
Let us suppose that $\varphi$ is bounded and measurable and let us partition the interval of
variation of $\varphi$ using the numbers $l_i$. Let $E$ be the subset of $E_1(\varphi)$ included between $y = l_{i-1}$
and $y = l_i$; we will evaluate its measure. Let us enclose the points of $E(\varphi \ge l_i)$ in intervals
$a$ and the points of $C[E(\varphi \ge l_i)]$ in intervals $b$. Let $c$ be the intervals that are common
to both $a$ and $b$. Let us consider the set $\mathcal{A}$ of points whose abscissas are the points of $a$ and
whose ordinates are included between $l_{i-1}$ and $l_i$; let $\mathcal{C}$ be the set similarly related to $c$. The
set $\mathcal{A} - \mathcal{C}$ being contained in $E$, we have
$$m_{s,i}(E) \ge m_s(\mathcal{A} - \mathcal{C}).$$
Hence, we deduce
$$m_{s,i}(E) \ge (l_i - l_{i-1})\, m_l[E(\varphi \ge l_i)].$$
We have shown that both the quantities $\sigma$ and $\Sigma$ converge to the same limit when the
maximum of $l_{i+1} - l_i$ tends towards zero; therefore, $E_1(\varphi)$ is measurable. The approximate
values $\sigma$ and $\Sigma$ found for the measure of $E_1(\varphi)$ lead us to the definition of the integral
already given. Therefore, the present geometric definition and
the constructive definition studied earlier are identical.35
35 We can say that the reasoning of the text provides an expression of the surface measure based
on linear measure. Generalising suitably, these considerations yield the formula which allows for
replacing the calculation of a multiple integral by the calculations of successive simple integrals.
9 The Indefinite Integral of Summable Functions
$C$ being a constant; the definite integral of $f(x)$ in $(a, b)$ is the increment $F(b) - F(a)$ of
the indefinite integral in the interval $(a, b)$.
The indefinite integrals are continuous functions. If $f(x)$ is a bounded function, this is
obvious by virtue of the theorem of finite increments. Let us assume, then, that $f(x)$ is summable but
unbounded; then we can find $N$ sufficiently large that the integrals of $f(x)$ over the
sets $E(f > N)$ and $E(f < -N)$ are both less than $\varepsilon$ in absolute value. Let us set $f = f_1 + f_2$,
where $f_1$ is zero on both sets $E(f > N)$ and $E(f < -N)$ and $f_2$ is zero on the set
$E(-N \le f \le N)$. Then the indefinite integral of $f_1$ is a continuous function, and the integral of $f_2$
over any interval is at most $2\varepsilon$ in absolute value; around any point $x_0$, therefore, we can find an interval in
which the increment of $F(x)$ is at most $3\varepsilon$ in absolute value. This proves that $F(x)$ is continuous.
If $f(x)$ is summable then $|f(x)|$ is also summable and, in any interval, the indefinite
integral of $f(x)$ undergoes an increment at most equal in absolute value to that of the indefinite integral of
$|f(x)|$. Since this latter integral exists and is increasing, it is therefore of bounded
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025 161
R. Jain, Lebesgue’s Theory of Integration,
https://doi.org/10.1007/978-981-96-1169-0_9
proved.1 This leads, very naturally, to the study of the derivation of the indefinite integrals
and the search for the primitive functions. But first, we will give a new meaning to the words
“indefinite integral”.
Where does the name indefinite integral come from? It is clear that in the expressions
“definite integral” and “indefinite integral”, indefinite does not have the meaning of infinite
but rather of not definite, while definite has the meaning of determined. These two expressions
should, therefore, be applicable to the same quantity $\int_\alpha^\beta f(x)\,dx$; this integral will be called
definite when the interval of integration $(\alpha, \beta)$ is itself definite, that is, determined and given,
and it will be called indefinite when $(\alpha, \beta)$ is indefinite, meaning not definite, not
determined, unknown, or variable. Going back to this original sense of the terms, we
will, therefore, say that the indefinite integral of $f(x)$ is the function $\Phi(\alpha, \beta)$
$$\Phi(\alpha, \beta) = \int_\alpha^\beta f(x)\,dx = F(\beta) - F(\alpha);$$
this will be a function of two variables or, even better, a function of the interval of integration
$(\alpha, \beta)$. The word “function” here implies a “correspondence”, as in Sect. 3.1, but here the task
is to associate a number to every interval $(\alpha, \beta)$. The interval $(\alpha, \beta)$ serves as the variable
or argument of our function and the number is the value of the function. The indefinite integral
is a function of interval. Any value taken by this function is a definite integral.
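The idea of an indefinite integral as a function of the interval can be illustrated with a short sketch. The names `make_interval_function`, `phi` and `F`, and the example $f(x) = 2x$, are assumptions made for illustration only; they are not notation from the text.

```python
def make_interval_function(F):
    """Return the interval function (alpha, beta) -> F(beta) - F(alpha)."""
    def phi(alpha, beta):
        return F(beta) - F(alpha)
    return phi

def F(x, C=7.0):
    # one indefinite integral of f(x) = 2x; the constant C is arbitrary
    return x * x + C

phi = make_interval_function(F)
whole = phi(0.0, 3.0)                    # definite integral over (0, 3)
parts = phi(0.0, 1.0) + phi(1.0, 3.0)    # the same interval split at 1
```

The additive constant disappears from every value of the interval function, and the value on an interval is the sum of the values on the two parts into which it is split.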
The relation that links $\Phi(\alpha, \beta)$ with $F(x)$ allows one to translate any property of $F(x)$
into a property of $\Phi(\alpha, \beta)$ and conversely. And this is why we ordinarily confine ourselves
to the consideration of $F(x)$, which we will call, when a doubt may arise, the indefinite integral
function of a single variable. In short, it is the properties of $\Phi(\alpha, \beta)$ which we typically
study through the intermediary of $F(x)$. Our new language will be more directly suited
to our purpose. However, as we have also considered the integral of $f(x)$ extended to a
measurable set $E$, we should consider the indefinite integral of $f(x)$ as the set function
$$\Psi(E) = \int_E f(x)\,dx;$$
it is implied that the argument $E$ of this function must be measurable and formed of
points of an interval or a set at whose points the values of $f(x)$ are known. To fix ideas,
we will always assume, unless otherwise stated, that $f(x)$ is defined on an interval which
we will call $(a, b)$; this is not in reality a restriction (Sect. 8.4). Therefore, we will finally
consider: a. the indefinite integral function of set, $\Psi(E)$; b. the indefinite integral function
of interval, $\Phi(\alpha, \beta) = \Phi(\delta)$, where $\delta$ denotes the interval $(\alpha, \beta)$; c. the indefinite integral
function of one variable, $F(x)$. In this chapter we will examine whether the knowledge of
one of these functions implies the knowledge of the other two and how the properties of these
functions correspond among themselves.
1 Only now we can make use of the maxima and minima obtained by neglecting sets of measure zero,
because if we modify the value of a function at points of such a set, we do not modify the integral of
this function.
9.1 The Three Indefinite Integrals. The Additive Functions of a Set
We will focus on two properties of $\Psi(E)$: complete additivity and absolute
continuity. We will see later that these properties characterise the functions of a set that are
indefinite integrals and consequently summarise and imply all their other properties.
A function $\Psi(E)$ of a measurable set is called additive if, whenever $E_1, E_2, \ldots$ are pairwise disjoint
sets, we have
$$\Psi(E_1 + E_2 + \cdots) = \Psi(E_1) + \Psi(E_2) + \cdots$$
We can distinguish, with M. de la Vallée Poussin,2 the case of restrictive additivity, in which
the previous equality is assured only if the $E_i$ are finite in number, and the case of complete
additivity, where the equality holds even if there are countably infinitely many $E_i$. Only complete
additivity will be important for us.3 An indefinite integral is completely additive; this is
property 2 of Sect. 8.4.
On its own, complete additivity implies many properties. First, let us note that an additive
set function which is unbounded has some points at which it is not bounded. By this, it
should be understood that there exists a point $x$ such that, however small an interval
$I$ containing $x$ in its interior may be, and however large a number $N$ may be, it is possible to find a
set $E$ formed from points of $I$ for which $\Psi(E)$ exceeds
$N$ in absolute value.
Indeed, if to every point of $(a, b)$ we could attach an interval $I$ and a number $N$ for
which this is impossible, we could cover $(a, b)$ using a finite number, $p$, of these intervals $I$,
and the corresponding numbers $N$ would have an upper bound $\mathcal{N}$. If $E$ is a set of points
of $(a, b)$, it could be considered the sum of sets $E_1, E_2, \ldots, E_p$ located in the
$p$ intervals considered, and we would have
$$|\Psi(E)| = |\Psi(E_1 + E_2 + \cdots + E_p)| \le |\Psi(E_1)| + \cdots + |\Psi(E_p)| \le p\,\mathcal{N},$$
so that $\Psi$ would be bounded.
2 On set functions, see the book of M. DE LA VALLÉE POUSSIN, Intégrales de Lebesgue,
fonctions d’ensemble, classes de Baire.
3 The reader will easily verify that a function .(E), equal to the sum of the lengths of the intervals in
which the complement of. E is everywhere non-dense, possesses restrictive additivity but not complete
additivity.
Hitherto, the restrictive additivity has not been introduced in Analysis. However, there is a great
need to distinguish it from complete additivity. Let me explain it with an example. The condition .2
of problem of the measure imposes complete additivity on function .m(E). Let us suppose, on the
contrary, that we have modified this condition .2 in a way so as to require only restrictive additivity,
then the function would no longer be defined, except for the . J measurable sets, and this function
would have been the extent .e(E). Let us note that the function thus obtained, indeed possesses the
complete additivity in the domain of the . J measurable sets, since it does not differ from .m(E), but
the function.e(E) is not always defined for the sum. E 1 + E 2 + · · · , when it is defined for. E 1 , E 2 , . . .
Let $x_0$ be a point at which $\Psi(E)$ is unbounded and let us consider the two intervals
$(x_0 - h, x_0)$, $(x_0, x_0 + h)$. It is clear that $\Psi(E)$ is unbounded in one of these two. This being
said, to prove that any completely additive finite function is bounded, it would be sufficient,
therefore, to consider the case where the extremity $b$ of the considered interval $(a, b)$4 is a
point where the function $\Psi(E)$ is unbounded, and to show that $\Psi(E)$ cannot be both finite and
completely additive at the same time. Let $a, a_1, a_2, \ldots$ be a sequence of increasing values
tending towards $b$. Let us denote by $\delta_1, \delta_2, \ldots$ the sets of points defined respectively by
$$a \le x < a_1, \qquad a_1 \le x < a_2, \qquad \ldots$$
Let $N_1, N_2, \ldots$ be the upper bounds of $|\Psi(E)|$, respectively, for the sets formed of the points
of $\delta_1, \delta_2, \ldots$.
If $E$ is a set formed from points of $(a_k, b)$ and if $E_i$ is the subset common to $E$ and $\delta_i$,
we have, when $\Psi$ is assumed to be completely additive,
$$|\Psi(E)| = \Bigl|\sum \Psi(E_i)\Bigr| \le \sum |\Psi(E_i)| \le \sum_{k}^{\infty} N_i;$$
since $\Psi(E)$ is not bounded at the point $b$, the series in the last term is divergent for any $k$.
Moreover, it may contain infinite terms.
Finally, let $e_i$ be a set formed of points of $\delta_i$ and for which $|\Psi(e_i)|$ exceeds the smaller,
$M_i$, of the two numbers $N_i - \frac{1}{i}$ and $i$; the series $\sum M_i$ is divergent. Let us partition the $e_i$ into
those which yield positive or zero values for $\Psi(e_i)$ and those which yield
negative values; if $i'$ and $i''$ denote, respectively, the indices of the first and of the second, at
least one of the series $\sum M_{i'}$ and $\sum M_{i''}$ is divergent. Let us assume that the first of them is
divergent. Then we have
$$\Psi\Bigl(\sum e_{i'}\Bigr) = \sum \Psi(e_{i'}) > \sum M_{i'} = +\infty;$$
the function $\Psi$ therefore cannot be finite and completely additive. As we are discussing
only finite functions of sets, unless expressly stated otherwise, we can simply say: every
completely additive function is bounded.
A completely additive function5 is of bounded variation. By this we mean that it is the
difference of two completely additive functions that take only positive or zero values.
To show this, a few definitions are necessary.
4 If we had a function $\Psi(E)$ defined only for measurable sets formed of points of a given measurable
set $\mathcal{E}$, we could extend the definition of $\Psi(E)$ to all measurable sets formed of points of an interval
$(a, b)$ containing $\mathcal{E}$, by agreeing that, for such a set $E$ whose part common with $\mathcal{E}$ is
$e$, we have $\Psi(E) = \Psi(e)$, and, if $e$ does not exist, $\Psi(E) = 0$. Therefore, we can always suppose that
we have a function defined for all measurable sets of an interval $(a, b)$.
5 I will no longer repeat that we are dealing with a function taking a definite and finite value
for each measurable set formed of points of the interval $(a, b)$ or of a measurable set $\mathcal{E}$, a case that
reduces to the previous one.
Let us consider the values taken by a completely additive function $\Psi$ for the sets $e$
formed of points of a set $E$. These values are included between $M$ and $-M$, where
$M$ is the upper bound of $|\Psi|$ in the whole of $(a, b)$. Let $-N(E) \le 0 \le +P(E)$ be the smallest
interval containing these values of $\Psi$ and zero. These two functions, $N(E)$ and $P(E)$, are
completely additive; let us verify this for $P(E)$. Let $E_1, E_2, \ldots$ be pairwise disjoint sets, and let
$e_1, e_2, \ldots$ be formed from points of $E_1, E_2, \ldots$; we have
$$\Psi(e_1 + e_2 + \cdots) = \Psi(e_1) + \Psi(e_2) + \cdots \le P(E_1) + P(E_2) + \cdots,$$
hence
$$P(E_1 + E_2 + \cdots) \le P(E_1) + P(E_2) + \cdots;$$
but we can choose $e_i$ in such a way that $\Psi(e_i)$ exceeds $P(E_i) - \frac{\varepsilon}{2^i}$ if $P(E_i)$ is greater than
$0$ [if $P(E_i) = 0$, we leave aside this value of $i$], and then we have
$$\Psi(e_1 + e_2 + \cdots) > P(E_1) + P(E_2) + \cdots - \varepsilon,$$
or
$$P(E_1 + E_2 + \cdots) > P(E_1) + P(E_2) + \cdots - \varepsilon.$$
The complete additivity of $P$ follows from comparison of these two inequalities.
The function $P(E)$ is called the total positive variation of $\Psi$ in $E$, $N(E)$ is its total
negative variation,
$$V(E) = P(E) + N(E)$$
is its total variation. This total variation is also a completely additive function, because it
is clear that the sum or the difference of two completely additive functions is a completely
additive function. $V(E)$ is the upper bound of $\sum |\Psi(E_i)|$ for the divisions of $E$ into partial
sets $E_i$.
Let us prove that $\Psi(E)$ is of bounded variation by showing that we have
$$\Psi(E) = P(E) - N(E).$$
With the points of $E$, we can form a set $e_1$ for which $\Psi(e_1)$ exceeds $\frac{1}{2} P(E)$. With the points
of $e_1$, we can form a set $e_2$ for which $\Psi(e_2)$ is at most equal to $-\frac{1}{2} N(e_1)$. We have
$$\Psi(e_1 - e_2) \ge \Psi(e_1), \qquad N(e_1 - e_2) < \frac{1}{2} N(e_1).$$
With the points of $e_1 - e_2$, we form a set $e_3$ for which $\Psi(e_3)$ is at most equal to
$-\frac{1}{2} N(e_1 - e_2)$; we have
$$\Psi(e_1 - e_2 - e_3) \ge \Psi(e_1 - e_2), \qquad N(e_1 - e_2 - e_3) < \frac{1}{2^2} N(e_1).$$
By continuing in this way, we arrive at a set $e_1 - e_2 - e_3 - \cdots$, or $E_1$. For this set, we have
$$N(E_1) = 0, \qquad P(E_1) = \Psi(E_1) \ge \frac{1}{2} P(E), \qquad P(E - E_1) \le \frac{1}{2} P(E).$$
With the points of $E - E_1$, we can similarly form a set $E_2$ such that
$$N(E_2) = 0, \qquad P(E_2) = \Psi(E_2) \ge \frac{1}{2} P(E - E_1), \qquad P(E - E_1 - E_2) \le \frac{1}{2^2} P(E).$$
Then, with the points of $E - E_1 - E_2$, we will form $E_3$ such that
$$N(E_3) = 0, \qquad P(E_3) = \Psi(E_3) \ge \frac{1}{2} P(E - E_1 - E_2), \qquad P(E - E_1 - E_2 - E_3) \le \frac{1}{2^3} P(E),$$
and so on.
It is clear that, for $E_1 + E_2 + \cdots = E_p$, we have
$$N(E_p) = N(E_1) + N(E_2) + \cdots = 0, \qquad P(E - E_p) \le P(E - E_1 - E_2 - \cdots - E_k) \le \frac{1}{2^k} P(E).$$
Therefore
$$P(E - E_p) = 0$$
and, since
$$P(E) = P(E - E_p) + P(E_p),$$
we have
$$P(E_p) = P(E);$$
moreover
$$\Psi(E_p) = P(E_p).$$
In a similar way, we will form $E_n$ such that we have
$$P(E_n) = 0, \qquad N(E_n) = N(E), \qquad \Psi(E_n) = -N(E_n).$$
Let $e$ be the set of points common to $E_p$ and $E_n$; then, from the two relations
$$N(E_p) = 0, \qquad P(E_n) = 0,$$
it follows that both non-negative numbers $P(e)$ and $N(e)$ are zero; consequently $\Psi(e)$ is
zero. Therefore, if we remove $e$ from $E_n$ we do not alter either $P(E_n)$ or $N(E_n)$ or $\Psi(E_n)$.
We can therefore suppose that the two sets $E_p$ and $E_n$ are disjoint. Similarly, we will
see that $P$, $N$ and $\Psi$ are zero for the set of points of $E$ belonging neither to $E_p$ nor to $E_n$,
so that this set can be added to $E_p$ or to $E_n$ without changing our relations. Finally, we
see that we can assume $E$ to be divided into two disjoint sets $E_p$ and $E_n$ such that we
have
$$\Psi(E_p) = P(E_p), \qquad N(E_p) = 0, \qquad \Psi(E_n) = -N(E_n), \qquad P(E_n) = 0.$$
But
$$E = E_p + E_n,$$
therefore, we have
$$\Psi(E) = \Psi(E_p) + \Psi(E_n) = P(E) - N(E).$$
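For a set function carried by finitely many points, the decomposition just obtained can be checked directly. The sketch below is an illustration under assumptions, not the general construction of the text: the dictionary `weights` of point masses and all function names are hypothetical, and the sets $E_p$ and $E_n$ are found simply by the sign of each point mass.

```python
from itertools import combinations

def psi(weights, E):
    """Value of the additive set function on E: the sum of the point masses in E."""
    return sum(weights[x] for x in E)

def variations(weights, E):
    """Return (P, N, E_p, E_n): the two variations and the sets where they are attained."""
    E_p = {x for x in E if weights[x] >= 0}   # points contributing to the positive variation
    E_n = set(E) - E_p                        # points contributing to the negative variation
    P = psi(weights, E_p)
    N = -psi(weights, E_n)
    return P, N, E_p, E_n

def brute_force_bounds(weights, E):
    """Largest and smallest value of psi(e) over all subsets e of E (exponential cost)."""
    pts = list(E)
    values = [psi(weights, c)
              for r in range(len(pts) + 1)
              for c in combinations(pts, r)]
    return max(values), min(values)

weights = {"a": 2.0, "b": -1.5, "c": 0.5, "d": -3.0}   # hypothetical point masses
E = set(weights)
P, N, E_p, E_n = variations(weights, E)
upper, lower = brute_force_bounds(weights, E)
```

In this finite case the upper and lower bounds over all subsets coincide with $P(E)$ and $-N(E)$, and they are attained on the two disjoint sets returned by `variations`.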
Therefore, we can express a function of bounded variation in the form of a difference of two
non-negative functions in an infinite number of ways. Furthermore, the method we have just
mentioned is the most general one, that is, if we have
$$\Psi(E) = P_1(E) - N_1(E),$$
where $P_1(E)$ and $N_1(E)$ are two completely additive and non-negative functions, we have
$$P_1(E) = P(E) + \lambda(E), \qquad N_1(E) = N(E) + \lambda(E),$$
where $\lambda(E)$ is a completely additive and non-negative function. Indeed, let $\lambda(E)$ be the
function defined by this double equality. It is completely additive. Let us show that it is
non-negative. Let $E_p$ be the set that we have attached to $E$. We have, since $E_p$ is contained
in $E$,
$$P_1(E) \ge P_1(E_p) = P(E_p) + \lambda(E_p), \qquad N_1(E_p) = N(E_p) + \lambda(E_p);$$
therefore, since $P(E_p) = P(E)$ and $N(E_p) = 0$,
$$\lambda(E) = P_1(E) - P(E) \ge \lambda(E_p) = N_1(E_p)$$
is not negative.
In other words, the variations $P(E)$ and $N(E)$ of $\Psi(E)$ are, among all the completely
additive functions $P_1(E)$ and $N_1(E)$ which are non-negative and satisfy the identity
$$\Psi(E) = P_1(E) - N_1(E),$$
the ones which are the least. This property corresponds exactly to the one in Sect. 5.1 for
functions of bounded variation of one variable. In addition, however, we have incidentally
seen that $P(E)$ and $-N(E)$ are the upper and lower limits of $\Psi(e)$ as $e$ varies in $E$,
and that these limits are actually attained, respectively, for $e = E_p$ and for $e = E_n$. These are
properties that have no counterparts for functions of one variable.6 For the previous
properties to be entirely proved, it is nevertheless necessary to show that $\Psi(E)$ is necessarily
zero for certain sets, because if, for example, $\Psi(E)$ were always positive, $-N(E)$ would
be always zero and would not be the lower limit of $\Psi(E)$, and $E_n$ would not exist. But
this is impossible,7 because the sets $E$ reduced to a point which give $\Psi(E)$ a non-zero
value form at most a countable infinity.
In fact, there cannot be infinitely many points, each forming a set $E$ where $\Psi(E)$ exceeds
the positive number $K$; for a set formed of a countable infinity of these points,
$\Psi(E)$ would be infinite. Then, by letting $K$ run through a sequence of positive numbers
tending towards zero, we see that the points for which $\Psi(E)$ has a positive value form a
countable set. The same conclusion applies to the points for which $\Psi(E)$ is negative.
Each point constituting on its own a set in which $\Psi$ is not zero is called a point of
discontinuity of $\Psi$. Let us form the function $\psi(E)$ equal to the sum of the values taken by
$\Psi$ at those points of discontinuity which belong to $E$. It is clear that $\psi(E)$ is completely
additive, and that it has for positive and negative variations the functions $\pi(E)$ and $\nu(E)$, formed
in a similar way with $P(E)$ and $N(E)$, respectively. Its total variation is the sum $\pi(E) + \nu(E)$,
which is the function deduced in the same way from $V(E) = P(E) + N(E)$.
$\psi(E)$ is called the jump function of $\Psi(E)$.
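A small sketch may help to separate the jump function from the remaining continuous part. It rests on assumptions made only for illustration: sets are represented as lists of pairwise disjoint closed intervals, the continuous part is given by the increments of a hypothetical primitive `G`, and the points of discontinuity are stored in a dictionary `atoms`.

```python
def in_set(x, intervals):
    """True if the point x lies in the union of the closed intervals."""
    return any(lo <= x <= hi for lo, hi in intervals)

def jump_function(atoms, intervals):
    """Sum of the point masses located in the set (the jump part)."""
    return sum(w for x, w in atoms.items() if in_set(x, intervals))

def set_function(G, atoms, intervals):
    """Continuous part (increments of G over disjoint intervals) plus the jump part."""
    continuous = sum(G(hi) - G(lo) for lo, hi in intervals)
    return continuous + jump_function(atoms, intervals)

def G(x):
    return x  # primitive of the constant density 1 (illustrative choice)

atoms = {0.25: 1.0, 0.5: -2.0}   # hypothetical points of discontinuity and their values
E = [(0.0, 0.3), (0.5, 0.5)]     # disjoint closed intervals; the second is a single point
point = [(0.5, 0.5)]             # a set reduced to one point of discontinuity

total_on_E = set_function(G, atoms, E)
jump_on_E = jump_function(atoms, E)
residual_at_point = set_function(G, atoms, point) - jump_function(atoms, point)
```

After the jump function is subtracted, the value on a set reduced to a single point is zero, which is the sense in which the difference has no points of discontinuity.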
The function $\Psi(E) - \psi(E)$ no longer has points of discontinuity, nor do the functions
$P(E) - \pi(E)$, $N(E) - \nu(E)$, $V(E) - \pi(E) - \nu(E)$. Let us show that, for such
6 However, these properties correspond to properties of certain functions of one variable since,
in a moment, we will define $\Psi(E)$ using a function $F(x)$ of one variable. Nonetheless, these are
properties that we would hardly have thought of, had we not discussed set functions. It is
because of such properties that there is an interest in considering set functions.
7 An exception is made for the case where the set of definition $\mathcal{E}$ is composed of a finite or
countably infinite number of points.
which is
$$V_0(E) = V(E) - \pi(E) - \nu(E).$$
Let us choose in the interval $\delta = (a, b)$ the values $a_i$ increasing towards $x_0$ and the values
$b_i$ decreasing towards $x_0$; let $\alpha_i$ and $\beta_i$ be the sets defined, respectively, by
$$a_{i-1} \le x < a_i, \qquad b_i < x \le b_{i-1},$$
with $a_0 = a$ and $b_0 = b$. We have
$$\delta = x_0 + \sum \alpha_i + \sum \beta_i;$$
since the sets of the right-hand side are pairwise disjoint, and $V_0(x_0)$ is zero, we
have
$$V_0(\delta) = \sum V_0(\alpha_i) + \sum V_0(\beta_i).$$
The sums of the right-hand side being convergent, if we take $k$ sufficiently large we have,
for $\delta_k = (a_k, b_k)$,
$$V_0(\delta_k) = \sum_{k+1}^{\infty} V_0(\alpha_i) + \sum_{k+1}^{\infty} V_0(\beta_i) < \varepsilon;$$
Thus a completely additive function, defined on an interval $(a, b)$, is continuous at any
point if, and only if, it takes the value zero for any set formed of a single point.9 An indefinite
integral is, therefore, continuous.
Now, let us examine what properties of $\Phi(\delta)$ and $F(x)$ correspond to those
of $\Psi(E)$ that we have just considered.
When a function $\Psi(E)$ is given, a function of interval $\Phi(\delta) = \Psi(E)$ is thereby given.
This function is defined for any positive or null interval, a null interval meaning an interval which reduces
to a point. Generally, it does not have the same value for the open interval
$$\alpha < x < \beta,$$
the closed interval
$$\alpha \le x \le \beta,$$
and the half-open intervals
$$\alpha < x \le \beta, \qquad \alpha \le x < \beta.$$
However, there is no need to make this distinction if $\Psi(E)$ is continuous at every point, in
which case $\Phi(\delta)$ is zero for any null interval.
8 This needs to be shown, since we have given two definitions: one for the points of continuity, and
another for the points of discontinuity.
9 If such a function takes the values $A$ and $B$, it takes all the values included between $A$ and $B$.
To any additive property of $\Psi$, there corresponds an additive property of $\Phi$, which is
stated as: if an interval $\delta$ is the sum of disjoint intervals $\delta_1, \delta_2, \ldots$, we have
$$\Phi(\delta) = \Phi(\delta_1) + \Phi(\delta_2) + \cdots$$
If $\Psi$ has restrictive additivity, $\Phi$ also has restrictive additivity, that is, the $\delta_i$ must be finite
in number. The $\delta_i$ may be countably infinite if $\Psi$ has complete additivity; $\Phi$ is then called
completely additive.
As for the phrase pairwise disjoint, it must be taken in the strict sense if $\Psi$ has points of
discontinuity; the constituent intervals need only be without common interior points if $\Psi$ is
continuous, for example, in the case of an indefinite integral.
$\Psi$, being assumed completely additive, is of bounded variation; $\Phi$ is then of bounded
variation, that is, for pairwise disjoint intervals $\delta_1, \delta_2, \ldots$ (same remark as above), the
sum $\sum |\Phi(\delta_i)|$ remains bounded; its upper bound, when the $\delta_i$ are taken from $\delta$, is, indeed, at
most the value $V(\delta)$ that the function $V(E)$ takes when $E = \delta$. $\Phi(\delta)$ presents itself as the
difference of the two non-negative functions of interval $P(\delta)$ and $N(\delta)$, deduced from $P(E)$
and $N(E)$.
The points of discontinuity and continuity are defined as above; in short, the previous
definitions are applicable. However, we now only consider sets reduced to a closed or open
interval, positive or null. Consequently, a property that involves sets which cannot be reduced
to intervals does not have an equivalent; for example, the property: any function $\Psi(E)$ of
bounded variation attains its upper bound.
Now, let us pass on from a completely additive function of intervals to a function of $x$.
We will denote by $\Phi[a \le x \le b]$ and similar notations the values taken by the function $\Phi$
for the intervals defined by the inequalities written in the brackets. We can go from $\Phi(\delta)$ to
$F(x)$ using either of the following formulae
$$F(X) = \Phi[a \le x \le X] + C, \qquad F(X) = \Phi[a \le x < X] + C,$$
in which $C$ denotes a constant. These two formulae are equivalent for all the points $X$ which
are points of continuity for $\Phi$; for points of discontinuity, they yield different values for
$F(X)$. The definition of $F(X)$, therefore, contains a certain arbitrariness. We will adopt the
first formula; the second choice would yield results that can be deduced
immediately from the ones we will obtain.
We have
we would have
And, as the right-hand side is at most equal to $V[a < x \le b]$, the function $F(x)$ is of bounded
variation.
In the formula of definition of $F$, let $X$ decrease towards $X_0$, that is, let us give $X$ a sequence
of values $X', X'', \ldots$ decreasing towards $X_0$; we have
= F(X 0 ) − (X 0 ),
the function $F(x)$ is discontinuous from the left at the points where $\Phi$ is discontinuous, and only at these
points.
We thus recover, in particular, this result: an indefinite integral $F(x)$ is a continuous
function of bounded variation.
From the equalities
which follow from the preceding, it follows that the function $\Phi$ is defined by the function $F$
only for intervals, null or non-null, that do not have their origin at $a$ when $F$ is known only
in $(a, b)$. For $F$ to define $\Phi$ in the whole of $a \le x \le b$, let us agree that the formula
$$F(X) = \Phi[a \le x \le X] + C$$
by setting $F(a - 0) = F(a)$, $F(b + 0) = F(b)$, we would be led to consider that any of
the functions $F(X)$ of bounded variation satisfying the previous relations are attached to
$\Phi$. Two functions $F(X)$ satisfying these conditions will differ, up to an additive constant,
only at some of their points of discontinuity. Conversely, if $F(X)$ answers the question, any
function of bounded variation equal to $F(X)$ at all points where they are both continuous
also answers it. We will often encounter this indetermination of $F(X)$, which one should
keep in mind at once when arriving at conclusions that seem contradictory.
Let us examine the converse transition from a function of bounded variation $F(X)$ to a
function of intervals defined by the formulae
and in those which follow from it, for open or half-open sets when we want $\Phi$ to be additive.
We want to prove that the function $\Phi$ thus obtained is completely additive.
Let us consider an interval $\Delta = (l \le x \le m)$ and let us divide it by a reducible set of
points that belong to $E$, in the family of the open intervals
contiguous to . E, including points .l and .m, .x1 , x2 , . . .. We would thus have the most general
division of an interval into disjoint subsets, except that we could combine .δi at one or two
of its extremities for constituting a semi closed or closed interval. The formula to be shown
is, therefore,
$$\Phi(\Delta) = \sum \Phi(\delta_i) + \sum \Phi(x_i),$$
that is
$$F(m + 0) - F(l - 0) = \sum [F(m_i - 0) - F(l_i + 0)] + \sum [F(x_i + 0) - F(x_i - 0)].$$
Now, this formula follows (Sect. 5.1) from the fact that $F$ is of bounded variation.
If we take the set $E$ suitably, the sum
$$\sum |\Phi(\delta_i)| + \sum |\Phi(x_i)|$$
would yield a value as close as we want to the total variation of $\Phi$ in $\Delta$. But this sum is
written
$$\sum |F(m_i - 0) - F(l_i + 0)| + \sum |F(x_i + 0) - F(x_i - 0)|,$$
a quantity that comes as close as we want to the total variation of the function $F_1(x)$, where
$F_1(x)$ is the function deduced from $F$ by modifying the latter at its points of discontinuity,
except $a$ and $b$ if those are points of discontinuity, so as to obtain a function continuous
from the right, except, possibly, at $a$.
Therefore, we have a relation between the total variation $V(\delta)$ of $\Phi(\delta)$ and the total
variation $\nu(X)$ of $F_1(X)$ in $a \le x \le X$,
$$V[a \le x \le X] = \nu(X).$$
Between the total positive and negative variations of $\Phi(\delta)$, say $P(\delta)$ and $N(\delta)$, and the total
positive and negative variations of $F_1(x)$, say $p(x)$ and $n(x)$, we have the relations
Hence
$$P[a \le x \le X] = p(X), \qquad N[a \le x \le X] = n(X),$$
equations which further solidify the relationship between $F(X)$ and $\Phi(\delta)$.
We could easily see that the jump functions of $\Phi(\delta)$, $P(\delta)$, $N(\delta)$, $V(\delta)$, which are defined
in the same way as those of $\Psi(E)$, $P(E)$, $N(E)$, $V(E)$, are the interval functions which
can be deduced from the jump functions of $F(X)$ or $F_1(X)$, as well as those of $p(x)$, $n(x)$,
and $\nu(x)$.
Having thus studied the transition from . F(x) to an interval function .(δ), let us ask whether
we can deduce from completely additive .(δ), a completely additive set function .(E).
It is clear that .(E) is defined for all closed or open intervals; from absolute additivity,
we can deduce the value of .(E) for all . B measurable sets, since these sets could be
obtained through the summation or differences from the intervals.10 However, for attaining
all . B measurable sets, it is necessary for us to bank on the second property of the indefinite
integral that we have named the absolute continuity.
A function .(E) is called absolutely continuous if, for any .ε positive, we can find a
number .η such that the following condition is satisfied
In reality, this property has only been used for additive functions, and it is only for such
functions that it deserves to be considered as defining a mode of continuity. Indeed, let .
be an additive function and . E 1 and . E 2 be two sets.
Let
. E 1 = E + e1 , E 2 = E + e2 ,
where . E is the subset common to . E 1 and . E 2 . The sets . E and .e1 on one hand, . E and .e2 on
the other, are disjoint.
Let us agree to say that two sets . E 1 and . E 2 are at a distance .η 11 if we have
.m(e1 ) ≤ η, m(e2 ) ≤ η.
We have
thus, to two sets . E 1 and . E 2 , not very distant, correspond values of functions not very
different. This is indeed a kind of continuity, and even uniform continuity.12 This mode of
continuity is first noted for indefinite integrals; we have, in fact, the following proposition:
10 However, any. B measurable set can be obtained in several ways through additions and subtractions.
Therefore, the definition of .(E) would only be complete for . B measurable sets if we could prove
that it is free from contradictions—and this must be done without using the condition of absolute
continuity which is about to be introduced.
11 M. Borel stated that $E_1$ and $E_2$ differ by two $\eta$; a better expression in some respects.
12 A completely additive function of a measurable set, which is continuous in the manner described
in the text, that is, with respect to our notion of distance between two sets, for all measurable sets,
is necessarily uniformly continuous. This mode of uniform continuity is what we call the absolute
continuity.
9.2 The Absolutely Continuous Functions
The integral of a summable function $f(x)$, extended to a variable set $E$, tends towards zero with the measure of $E$. Indeed, we know that we can choose $N$ in such a way that the integrals $\int_a^b \varphi\,dx$ and $\int_a^b \varphi_{N,N}\,dx$ differ by less than $\frac{\varepsilon}{2}$, where $\varphi$ is the absolute value of $f$ and $\varphi_{N,N}$ is the function deduced from $\varphi$ as indicated in Sect. 8.4. Let, then, $E$ be a set of measure at most equal to $\frac{\varepsilon}{2N}$. Let us divide $E$ into the set $e_1$ of those points where $\varphi_{N,N}$ is zero and the set $e_2$ of those points where $\varphi_{N,N}$ is positive.
We have
\[ \left|\int_E f(x)\,dx\right| = \left|\int_{e_1} f(x)\,dx + \int_{e_2} f(x)\,dx\right| \le \int_{e_1} \varphi(x)\,dx + \left|\int_{e_2} f(x)\,dx\right| \le \frac{\varepsilon}{2} + N\,m(e_2) \le \frac{\varepsilon}{2} + N\,\frac{\varepsilon}{2N} = \varepsilon. \]
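As an illustration of the estimate above, here is a minimal Python sketch for one concrete summable function, $f(x) = x^{-1/2}$ on $(0, 1]$. The choice of $f$, the helper names and the closed-form integrals are assumptions made for this example and do not come from the original text; in particular the truncated function is read as equal to $f$ where $f \le N$ and zero elsewhere, which is an assumption about Sect. 8.4. Since this $f$ is positive and decreasing, among all sets of measure $\delta$ the interval $(0, \delta]$ gives the largest integral, namely $2\sqrt{\delta}$, so it is enough to test that interval.

```python
import math

def truncation_level(eps):
    """N chosen as in the proof (assumed reading of the truncation):
    the integral of f over {f > N} = (0, 1/N^2) equals 2/N,
    and we ask that it be at most eps/2."""
    return 4.0 / eps

def eta(eps):
    """Measure threshold eps / (2N) used in the proof."""
    return eps / (2.0 * truncation_level(eps))

def largest_integral(delta):
    """Largest integral of f over a set of measure delta in (0, 1],
    attained on (0, delta]: integral of x**-0.5 from 0 to delta."""
    return 2.0 * math.sqrt(delta)

if __name__ == "__main__":
    for eps in (1.0, 0.1, 0.001):
        # every set of measure at most eta(eps) gives an integral below eps
        print(eps, eta(eps), largest_integral(eta(eps)))
```

For this particular $f$ the threshold works out to $\eta = \varepsilon^2/8$, and the largest integral over a set of that measure is $\varepsilon/\sqrt{2}$, which is indeed below $\varepsilon$.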
If, conversely, we start from a function. F(X ) of bounded variation and absolute continuity,
we would deduce a function .(δ), having the two mentioned properties. Let us now look for
a set function .(E) that also has these two properties and reduces to .(δ) on the intervals.
It is clear that if it is a question of calculating .(E) for a set . E we could proceed thus: we
determine a set . Ai of intervals at a distance less than a positive number .ηi from . E; which is
easy, for example, by enclosing . E in the intervals. For . Ai , we know .(Ai ) as equal to the
sum of the values of . for the various intervals constituting . Ai . Then we let .ηi tend towards
zero and .(Ai ) tends towards the sought value .(E).
Therefore, this value .(E) will exist if, by replacing . Ai with another set of intervals,
. Bi , also at a distance less than .ηi , from . E, .|(Ai ) − (Bi )| tends towards zero with .ηi .
However, it is indeed so, since . Ai and . Bi are obviously at a distance of .2ηi at most.13 Finally
.(E) itself is calculated using the sums of increments of . F(β) − F(α), for this reason we
will also say that .(E) is the increment of . F(X ) on the set . E.
Thus, the three families of functions, absolutely continuous and completely additive
set functions, interval functions having the same two properties, and functions of a single
variable that are absolutely continuous and of bounded variation, are entirely equivalent.
In particular, we see that the indefinite integral of a function $f(x)$, considered as a function of one variable, completely determines the indefinite integral of $f(x)$ considered as a set function. And we have learned to calculate $\int_E f(x)\,dx$ from the integrals of $f(x)$ in various intervals, by a method similar to the one that allows us to calculate $m(E)$ from the measure of the intervals.
The previous property implies this very important consequence: two functions $f_1(x)$ and $f_2(x)$, which have the same integral in every interval, are equal, except at most at the points of a set of measure zero. Indeed, by hypothesis, two such functions have the same indefinite integral $F(X)$, and therefore the same indefinite integral considered as a set function. Now, since $E[(f_1 - f_2) \neq 0]$ is the limit, for $\varepsilon > 0$ tending towards zero, of $E[(f_1 - f_2) > \varepsilon] + E[(f_2 - f_1) > \varepsilon]$, one of the two sets just named will, for $\varepsilon$ sufficiently small, be of non-zero measure if $f_1$ and $f_2$ differ on a set of points of positive measure.14 And it is clear that, in this set $e$ of non-zero measure, the integral of the function $f_1 - f_2$, either constantly greater than $\varepsilon$, or constantly less than $\varepsilon$,15 would not be zero. In other words, $\int_e f_1\,dx$ and $\int_e f_2\,dx$ would be different, which is contrary to the hypothesis.
Thus a function $f(x)$ is determined, except at the points of a set of measure zero, by knowing one of its indefinite integrals.
The indetermination that we encounter in this statement is indeed real; because, if we arbitrarily modify $f(x)$ at the points of an arbitrarily chosen set of measure zero, we do not alter its indefinite integral. We will later explore how we can calculate $f(x)$ when we know one of its indefinite integrals. However, to give the results obtained the widest possible significance, it would be useful to pursue the study of the set functions for a while.
13 This method of extending a function defined only on the family of interval sets to all measurable sets is related to the proposition that M. Baire called the principle of extension (see Baire: Leçons sur les théories générales de l’Analyse), whose most well-known application is the definition of the exponential function. It can be stated as follows: If a function $f(x)$ is defined for all rational values of $x$ and if it is uniformly continuous over the set of these values, we can extend it, in a unique way, to any value of $x$ such that it remains continuous.
We could combine this property with the one that serves our purpose into a single statement; for that we would imitate the considerations developed by M. M. Fréchet in his Thesis (Rendiconti del Circolo Matematico di Palermo, 1906).
14 Since $f_1$ and $f_2$ are summable, they are measurable, and the set of points where $f_1$ and $f_2$ are different is indeed measurable.
15 Translator’s note: It should have been $-\varepsilon$ in the original text.
9.3 The Singularities of Non-Absolutely Continuous Functions
Of the two properties mentioned for the indefinite integral function of one variable: being
of bounded variation and being absolutely continuous, the first is contained in the second.
Indeed, for a function $F(X)$ of unbounded variation, it is possible to choose (Sect. 5.1) a countable system of non-intersecting intervals in such a way that the corresponding series $\sum [F(\beta_i) - F(\alpha_i)]$ diverges. However, such a series, according to the very definition of absolute continuity (Sect. 5.1), is always convergent for an absolutely continuous function.
On the contrary, there exist continuous functions of bounded variation, which are not
absolutely continuous; the function .ξ(x) of Sect. 5.1 is an example. Indeed, .ξ(x) has a total
variation equal to .1 in any system of intervals enclosing . Z , even though . Z is of measure
zero.
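The following Python sketch illustrates this remark under the assumption that $\xi$ is the classical Cantor function and $Z$ the middle-thirds Cantor set; Sect. 5.1 is not reproduced here, so this identification, like the helper names `cantor` and `covering`, is an assumption of the example. It encloses $Z$ in the $2^n$ closed intervals of the $n$-th construction stage, whose total length $(2/3)^n$ tends to zero, and checks with exact rational arithmetic that the increments of $\xi$ over them still add up to 1.

```python
from fractions import Fraction

def cantor(x):
    """Cantor function at a rational x in [0, 1]; exact whenever the ternary
    expansion of x terminates or meets a digit 1."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    value, scale = Fraction(0), Fraction(1, 2)
    for _ in range(200):            # guard against non-terminating expansions
        x *= 3
        digit = int(x)              # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:              # first digit 1: the value is now fixed
            return value + scale
        if digit == 2:
            value += scale
        scale /= 2
        if x == 0:                  # the expansion has ended
            return value
    return value                    # truncated approximation otherwise

def covering(n):
    """The 2**n closed intervals of stage n of the Cantor construction."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = [piece
                     for a, b in intervals
                     for piece in ((a, a + (b - a) / 3), (b - (b - a) / 3, b))]
    return intervals

def total_length(intervals):
    """Measure of a system of non-overlapping intervals."""
    return sum(b - a for a, b in intervals)

def total_increment(intervals):
    """Sum of the increments of the Cantor function over the intervals."""
    return sum(cantor(b) - cantor(a) for a, b in intervals)
```

With these helpers, `total_length(covering(n))` shrinks like $(2/3)^n$ while `total_increment(covering(n))` stays equal to 1, which is exactly the behaviour that absolute continuity forbids.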
When considering a function $F(X)$ of bounded variation and assessing to what extent it deviates from absolute continuity, it suffices to take sets of intervals of measure at most equal to $\eta$ and to form for them the sums $\Sigma^+$ and $\Sigma^-$ of the positive and negative differences $F(\beta_i) - F(\alpha_i)$. When the $(\alpha_i, \beta_i)$ are chosen in any manner, $\Sigma^+$ and $|\Sigma^-|$ have two upper limits $M^+(\eta)$, $M^-(\eta)$ which, as we let $\eta$ decrease towards zero, decrease towards two limits $p_0$ and $n_0$. It is clear that we can assert that $F(X)$ deviates from absolute continuity: in terms of positive variation by $p_0$, in terms of negative variation by $n_0$, and from the point of view of total variation by $p_0 + n_0$.
These numbers $n_0$ and $p_0$ could have been defined by applying the previous procedure no longer to $F(X)$, but, respectively, to its positive variation $P(X)$ and its negative variation $N(X)$. In other words we would have replaced $\Sigma^+$, for example, by the sum of the positive variations of $F(X)$ in all the intervals $(\alpha_i, \beta_i)$. Indeed, by operating in this way, we have $\sum [P(\beta_i) - P(\alpha_i)]$; but, in $(\alpha_i, \beta_i)$, we can find non-intersecting intervals $(\alpha_{i,j}, \beta_{i,j})$ such that all the differences $F(\beta_{i,j}) - F(\alpha_{i,j})$ are positive, and their sum, for the single variable $j$, differs as little as we want from $P(\beta_i) - P(\alpha_i)$. We therefore have, by suitably choosing the $(\alpha_{i,j}, \beta_{i,j})$,
\[ \sum_i [P(\beta_i) - P(\alpha_i)] - \varepsilon < \sum_{i,j} [F(\beta_{i,j}) - F(\alpha_{i,j})] \le \sum_i [P(\beta_i) - P(\alpha_i)] \]
and, as the measure of the set of the $(\alpha_{i,j}, \beta_{i,j})$ is at most that of the $(\alpha_i, \beta_i)$, the two methods of defining $p_0$ are quite equivalent. Let us add that we can obviously require the system of intervals used to contain only a finite number of them. Obviously, to each method of defining $p_0$, $n_0$, and therefore $\nu_0 = p_0 + n_0$, corresponds a different formulation of the condition of absolute continuity.
Let us denote by $P_s(X)$, $N_s(X)$, $V_s(X)$ the numbers $p_0$, $n_0$, $\nu_0$ relative to the interval $(a, X)$; it is obvious that in the positive interval $(\alpha, X)$ the numbers $p_0$, $n_0$, $\nu_0$ are $P_s(X) - P_s(\alpha)$, $N_s(X) - N_s(\alpha)$, $V_s(X) - V_s(\alpha)$. It is also clear that these three numbers are positive or zero, and are at most equal, respectively, to $P(X) - P(\alpha)$, $N(X) - N(\alpha)$, $V(X) - V(\alpha)$. In other words, the six functions $P_s(X)$, $N_s(X)$, $V_s(X)$; $P(X) - P_s(X)$, $N(X) - N_s(X)$,
only in .a < X ≤ b. At point .a, we will take them equal to zero and set
. Fs (X ) = Ps (X ) − Ns (X ).
continuous, the one which has the least total variation, and which vanishes at .x = a.
It is obvious, according to the very definition of . Ps and . Ns , there cannot be a corrective
function .G s (X ) whose variation is less than . Ps (X ), Ns (X ), Vs (X ) in .(a, X ), and the only
function for which these values will attain minima is
. Fs (X ) + const.
as the two brackets of the right-hand side are positive or zero, it is, therefore, necessary to
prove that the number . p0 relative to the first bracket and the number .n 0 relative to the second
are zero.
If the number . p0 relative to . P(X ) − Ps (X ) was equal to .λ > 0, that is, we could find
finitely many points in .(a, b),
such that the measure of the set of intervals of even rank $(x_{2i-1}, x_{2i})$ is less than $\eta$ and, however, the sum
\[ \sum_i \bigl\{[P(x_{2i}) - P_s(x_{2i})] - [P(x_{2i-1}) - P_s(x_{2i-1})]\bigr\} \]
exceeds $\lambda$. This can be further written as
\[ \sum_i [P(x_{2i}) - P(x_{2i-1})] \ge \sum_i [P_s(x_{2i}) - P_s(x_{2i-1})] + \lambda. \]
On the other hand, we can find in each interval of odd rank $(x_{2i}, x_{2i+1})$ intervals $(\alpha, \beta)$ whose total measure is as small as we want, and which yield a sum
\[ \sum [P(\beta) - P(\alpha)] \]
at least equal to $P_s(x_{2i+1}) - P_s(x_{2i})$, according to the very definition of $P_s$. So we can assume that the set of the $(\alpha, \beta)$ relative to all the values of $i$ has a measure less than $\eta$ and, however, we have
\[ \sum [P(\beta) - P(\alpha)] \ge \sum_i [P_s(x_{2i+1}) - P_s(x_{2i})]. \]
Hence, by addition,
\[ \sum_i [P(x_{2i}) - P(x_{2i-1})] + \sum [P(\beta) - P(\alpha)] \ge \lambda + \sum_j [P_s(x_j) - P_s(x_{j-1})] = \lambda + P_s(b); \]
and this is impossible, according to the definition of $P_s$, since the set of the $(\alpha, \beta)$ and of the $(x_{2i-1}, x_{2i})$ is of measure $2\eta$, as small as we want.
. F = S + C = Fs + AC = S + Cs + AC,
.C is the continuous part of . F (function .ψ of Sect. 5.1 .61) and .Cs is both the continuous part
of . Fs and the function of the singularities of the continuous part .C of . F.16
Let us consider a sequence $I_1, I_2, \ldots$ of sets of intervals whose measures tend towards zero and which yield sums $\sum_1 \Delta V$, $\sum_2 \Delta V$, $\ldots$ tending towards the largest possible limit $\nu_0 = V_s(b)$. The similar sums relative to $AC$ tend towards zero, because of the absolute continuity of $AC$. Therefore, the sums $\sum_1 \Delta V_s$, $\sum_2 \Delta V_s$, $\ldots$ also tend towards $\nu_0$.
By removing, if necessary, some of the first $I$, we could assume that the series of the measures of the $I$'s is convergent. So, if we denote by $I^p$ the set of intervals $\sum_{k=p}^{\infty} I_k$, the $I^p$ form a sequence possessing all the properties mentioned for the sequence of the $I$, and additionally, $I^p$ contains $I^{p+1}$. Let $E_s$ be the set of points common to all the $I^p$; it is of measure zero and, for any system of open intervals17 enclosing $E_s$, the sum $\sum \Delta V_s = \nu_0$.
Indeed, let $J$ be such a system of intervals. For a sufficiently large value of $p$, $I^p$ is contained in $J$. Otherwise, as the set $K^p$, obtained by removing from $I^p$ the subsets contained in $J$, contains $K^{p+1}$, there would exist points common to all the $K^p$, hence to all the $I^p$, and not belonging to $J$, which is impossible. Therefore, $J$ yields a sum $\sum \Delta V_s$ at least equal to the one yielded by $I^p$ for $p$ very large, and therefore at least equal to $\nu_0$, and exactly equal to $\nu_0$ since no sum $\sum \Delta V_s$ could exceed $\nu_0 = V_s(b)$.
16 The reader can show that the function of jumps is, among all the corrective functions $F_c$ such that $F - F_c$ is continuous, the one which has the least total variation. $S$ and $F_s$ are, therefore, susceptible to similar definitions.
I also leave aside a lot of propositions which may be easily proved, such as this one: the functions
of singularities and of jumps of a sum are the sums of the functions of singularities and of jumps of
the added functions.
17 That is, in the strict sense, the points of $E_s$ are interior to the considered intervals. For the construction of $E_s$, on the contrary, the intervals that formed the $I^p$ were taken closed; that is, the extremities of the constituent intervals are considered to be part of $I^p$.
When a set is of measure zero and any system of open intervals enclosing it yields a sum $\sum \Delta V_s$ equal to $\nu_0$, that is, yields a sum $\sum \Delta V$ at least equal to $\nu_0$, this set is called the set of singularities of $F$ because, in some sense, all the variation of $F_s$ is concentrated at the points of this set.
Therefore, set . E s that we have just constructed is the set of singularities of . F, or, if we
want, a set of singularities, because it is clear that the set of singularities is very indeterminate.
For example, by adding an arbitrary set of measure zero to . E s , we again have a set of
singularities. Any set of singularities necessarily contains points of discontinuities of . F.
However, these are the only points that it necessarily contains. Indeed, let $x_0$ be a point of continuity of $F$ and let us assume that it belongs to $E_s$. Consider a set $L$ of open intervals enclosing $E_s - x_0$. This set $L$ encloses the subsets $E_{s1}$ and $E_{s2}$ of $E_s$ located in $(a, x_0 - \varepsilon)$ and $(x_0 + \varepsilon, b)$, which are obviously sets of singularities of $F$ in these intervals. Therefore $L$ yields a sum $\sum \Delta V_s$ at least equal to $V_s(x_0 - \varepsilon) + [V_s(b) - V_s(x_0 + \varepsilon)]$. But, by hypothesis, $V_s(x_0 + \varepsilon) - V_s(x_0 - \varepsilon)$ tends towards zero with $\varepsilon$, and therefore $L$ yields a sum $\sum \Delta V_s$ equal to $V_s(b)$.
Similarly, from the set of singularities, we can remove an arbitrary countably infinite
number of points of continuity of . F, without it ceasing to be a set of singularities.
Let us consider the function
\[ F(x) = \xi(x) + \frac{1}{2^2}\,\xi(2x) + \frac{1}{2^4}\,\xi(2^2 x) + \frac{1}{2^6}\,\xi(2^3 x) + \cdots, \]
where .ξ(x) is the function of Sect. 5.1 but assumed to be extended outside .(0, 1), in a way
that it has a period .2 and, is even. . F(x) is continuous and of bounded variation in every
interval, it is not absolutely continuous in any interval, so that the set of singularities of . F(x)
is everywhere dense and nevertheless it does not necessarily contain any particular point.
Every point is equally likely to be part of this set of measure zero.18
Let . E s be the set of singularities of . F(X ); then for any sequence of set of open intervals
enclosing . E s and of measures tending towards zero, we have
\[ \lim \Bigl[\sum \Delta P + \sum \Delta N\Bigr] = \lim \sum \Delta V = V_s(b) = P_s(b) + N_s(b); \]
now $\lim \sum \Delta P$ and $\lim \sum \Delta N$ cannot exceed respectively
\[ P_s(b) = p_0, \quad N_s(b) = n_0, \]
18 This fact is paradoxical; however, it becomes less surprising when we consider that, to compute $\int f(x)\,dx$, it is necessary to retain some points of the interval or set over which the integral is extended, and yet we can still remove any point from the set.
Conversely, the way we have determined the set of singularities of . P(x) and . N (x), shows
that their sum yields the set of singularities of . F(X ); because it is obvious that the sum of
sets of singularities of the terms of a sum is the set of singularities of the sum. . E s being the
set of singularities of . F(x), for any sequence of open intervals enclosing . E s and of measure
tending towards zero, the sum
\[ \sum \Delta F = \sum (\Delta P - \Delta N) = \sum \Delta P - \sum \Delta N \]
tends towards
\[ p_0 - n_0 = P_s(b) - N_s(b) = F_s(b). \]
In other words, the method which has allowed us to associate with each measurable set $E$ an increment $\mathcal{A}_F(E)$, when $F$ was absolutely continuous (a method consisting of enclosing $E$ in a sequence of sets of open intervals19 whose measures tend towards that of $E$, and taking the limit of the sums $\sum \Delta F$ provided by this sequence of sets of intervals), applies to every function $F$ of bounded variation, when we take its set of singularities as $E$. But the set of singularities is the only set for which this method still applies, at least when we are dealing with a continuous function.
More precisely, we could show that if we call . F1 (x) as a function equal to . F(x), except
at the points where we have
for which
. F1 (x) = F(x − 0),
—that is, if we call . F1 (x) the function which provides the same function of intervals as
F(x), but which is free from the unnecessary singularities of . F(x)—the previously used
.
method of definition for the increment in a set of an absolutely continuous function can still
be applied to . F(x), but only for the sets of singularities for . F1 (x).
By a completely different procedure we will define the increment of $F(x)$ in an extended class of sets. For that, let us decompose $F$ into $S + C$. To the function of jumps $S$, we will attach in $E$ an increment equal to
\[ \sum_E [F(x+0) - F(x-0)], \]
where the summation is taken over those points of discontinuity of $F$ that belong to $E$.
Let $\mathcal{V}(x)$ be the total variation of $C$ from $a$ to $x$; the change of variable
\[ \theta = x + \mathcal{V}(x) \]
19 When it is a matter of continuous function it does not matter whether the intervals are open or
closed.
transforms $C(x)$ into a function of $\theta$, say $\mathcal{C}(\theta)$.20 $\mathcal{C}$, having in every interval $(\theta_1, \theta_2)$ a total variation $\mathcal{V}(x_2) - \mathcal{V}(x_1)$ less than $\theta_2 - \theta_1$, has, with respect to $\theta$, derivative numbers in absolute value less than $1$. $\mathcal{C}(\theta)$, being absolutely continuous, has a definite increment in each measurable set $E_x$. We agree to define the increment in $E_x$ as follows
\[ \mathcal{A}_C(E_x) = \mathcal{A}_{\mathcal{C}}(\mathcal{E}_\theta). \]
The sets $E_x$ that we reach in this way are all measurable; because if we enclose $\mathcal{E}_\theta$ and its complement $\mathcal{F}_\theta$ in two families of intervals, having common subsets of total measure $\varepsilon$, by the change of variable from $\theta$ to $x$ we deduce two families of intervals enclosing $E_x$ and its complement $F_x$, and their common subsets have at most measure $\varepsilon$. Indeed, to an interval $(x_1, x_2)$, there corresponds an interval $(\theta_1, \theta_2)$ of equal or greater length.
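To make the last remark concrete, here is a small Python sketch, assuming (as an illustration only) that $C$ is the classical Cantor function on $(0, 1)$, so that its total variation from $0$ to $x$ is $C(x)$ itself and the change of variable reads $\theta = x + C(x)$. Each of the $2^n$ stage-$n$ intervals enclosing the Cantor set has length $3^{-n}$ and carries an increment $2^{-n}$ of $C$; these two facts are taken as known rather than recomputed, and the function names are hypothetical.

```python
from fractions import Fraction

def x_cover_length(n):
    """Total length, in the variable x, of the 2**n stage-n intervals."""
    return 2**n * Fraction(1, 3**n)

def theta_cover_length(n):
    """Total length of their images under theta = x + C(x): an interval of
    length 3**-n carrying an increment 2**-n becomes one of length
    3**-n + 2**-n."""
    return 2**n * (Fraction(1, 3**n) + Fraction(1, 2**n))

if __name__ == "__main__":
    for n in range(6):
        print(n, x_cover_length(n), theta_cover_length(n))
```

The image lengths are never smaller than the original ones, and they tend to 1 while the original lengths tend to 0: in this example a set of measure zero in $x$ corresponds to a set of measure 1 in $\theta$, which is where the whole variation of $C$ is found.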
However, we cannot guarantee that the increment is defined for all measurable sets . E x . In
any case, it is defined for all . E x which reduce to an interval or a point, because for them the
.Eθ are intervals or points. The change of variable transforms the additions and subtractions
of sets into additions and subtractions. It follows that the . B measurable sets . E x correspond
to . B measurable sets .Eθ , and consequently, the increment of . F is defined, in particular, on
every . B measurable set . E x .
To state the result, let us note that $V(x)$ will always denote the total variation of $F(x)$ from $a$ to $x$. We could have reasoned directly on $F(x)$ by setting
\[ t = x + V(x), \]
20 I had used this fact to study the Stieltjes integral; it was M. de la Vallée Poussin who, in the conference of the Strasbourg congress, pointed out its current application. Prior to this, M. de la Vallée Poussin had shown how to obtain the set function associated with a non-absolutely continuous function $F(x)$, thanks to a method which exactly generalises the one which led to the measure of the sets. See his book already cited and, later, Chap. 11.
to define the increment of . F(x) using a method similar to the previous one and providing
the same result, as we immediately see. Therefore: we can attach to each function . F(x) of
bounded variation, a completely additive set function, which we call the increment of . F(x),
and which is defined on a family of measurable sets, that varies with . F, but which always
includes all the . B measurable sets; it is the family of sets which, by the change of variable
.t = x + V (x) interpreted suitably, provides measurable sets in .t.
We have seen that, using the same procedure that allowed us to construct . E s , we can
always ensure that the set of singularities of . F is . B measurable. Then, for this set, we have
two different definitions of increment of . F. It is necessary to verify that they are consistent
with each other.
Now, let $E_x$ be a $B$ measurable set that corresponds to a set $\mathcal{E}_t$; let us enclose $\mathcal{E}_t$ in open intervals $\mathcal{J}_t$ 21 and let their measures approach that of $\mathcal{E}_t$. If we take care to choose the $\mathcal{J}_t$ in such a way that they do not have any extremity in the interior of the intervals $(t_1, t_2)$ corresponding to the singular points of $F$, which is possible, then the $\mathcal{J}_t$ correspond to a set of open intervals $I_x$ 22 enclosing $E_x$ and of measure tending towards that of $E_x$. The sum $\sum_{I_x} \Delta F$ then tends towards $\mathcal{A}_F(E_x)$, because it is equal to $\sum_{\mathcal{J}_t} \Delta \mathcal{F}$, which tends towards $\mathcal{A}_{\mathcal{F}}(\mathcal{E}_t)$.
If, therefore, . E x is the set of singularities of . F, the only new aspect is that we are no
longer bound to use the particular set . I x deduced from .Jt for calculating .A F(x) (E x ). We can
use any other set .i x of open intervals enclosing . E x and of variable measure tending towards
zero.
If we restrict ourselves23 to the consideration of the sets $E_x$ to which correspond measurable sets $\mathcal{E}_t$, we can say that the sets of singularities are those for which we have both
\[ m(E_x) = 0, \quad \mathcal{A}_{V(x)}(E_x) = V_s(b). \]
Hence, we deduce that if $E_1$ and $E_2$ are two sets of singularities for $F$, of this category, $B$ measurable say, the set $E_{12}$ of points common to $E_1$ and $E_2$ is also a set of singularities of $F$.
Indeed, we have
21 The intervals of .J which would have an extremity at .a or . V would be, however, taken closed.
t
22 In $I_x$ there could however be a closed interval at $a$ or at $b$.
23 I do not know if this restriction is real, that is, whether there are sets . E , to which corresponds the
1
non-measurable set .Et .
Hence
\[ \mathcal{A}_{V(x)}(E_1 - E_{12}) = 0, \]
the example of $F$ zero everywhere in $(0, 1)$, except for $x = \frac{1}{2}$, shows it right away. $\mathcal{A}(E)$ is then identically zero, therefore so is $\mathcal{V}(E)$, and for $E$ reduced to the point $x = \frac{1}{2}$, we have
\[ \mathcal{A}_{V(x)}(E) = 2\left|F\!\left(\tfrac{1}{2}\right)\right|. \]
To rule out such singularities, let us modify . F(x) at its points of discontinuity, interior to
.(a, b), in such a way that the new function . F1 (x) does not have two jumps of opposite signs,
at any point.
To fix ideas, let us take . F1 (x) continuous from the right in the interior of .(a, b). Let
. V1 (x), P1 (x), N1 (x) be the three total variations of . F1 (x); we will prove that we have
\[ \mathcal{V}(E) = \mathcal{A}_{V_1(x)}(E); \quad \mathcal{P}(E) = \mathcal{A}_{P_1(x)}(E); \quad \mathcal{N}(E) = \mathcal{A}_{N_1(x)}(E); \]
.P(a ≤ x ≤ b) is the upper limit of the values of .A(E); but each value of .A(E) can be
calculated by using a system of intervals . I x , therefore, .P(a ≤ x ≤ b) is the upper limit of
numbers .A(I x ) for the system . I x of open intervals, except, possibly, at .a and .b. On the other
hand, . P1 (b) is the upper limit of numbers .A(Jx ) for the system . Jx of closed intervals. As we
exclude from a system $J_x$ of intervals all those which result in negative $\Delta F_1$, we increase $\mathcal{A}(J_x)$, and since we can combine two intervals that yield non-negative $\Delta F_1$ and have a common extremity, we can assume that the intervals $J_x$ have no common extremities. Let $(l, m)$ be one of the intervals; $F_1(x)$ being continuous from the right, the interval $l \le x \le m$ yields the same increment as $l < x \le m$ and almost the same as $l < x < m + h$, for $h$ very small.
Therefore, $\mathcal{A}(J_x)$, for $J_x$ formed of closed intervals, differs as little as we want from $\mathcal{A}(I_x)$ for $I_x$ suitably formed of open intervals, except, possibly, at $a$ and $b$. And we have
\[ \mathcal{P}(a \le x \le X) = \mathcal{A}_{P_1(x)}(a \le x \le X), \]
hence it will follow that, in any interval $I$, open, closed or semi-closed, we have
\[ \mathcal{P}(I) = \mathcal{A}_{P_1(x)}(I); \]
then we will deduce the equality $\mathcal{P}(E) = \mathcal{A}_{P_1(x)}(E)$ for all the $B$ measurable sets.
The similar equalities related to $\mathcal{N}$ and $\mathcal{V}$ will follow.
Let . F1s , .V1s , . P1s , . N1s be the functions of singularities of . F1 , .V1 , . P1 , . N1 ; .V1s , . P1s , . N1s
are the three variations of . F1s . To these functions, which are continuous from the right in the
interior of .(a, b), correspond the increments .As (E), Vs (E), .Ps (E), .Ns (E) linked together
as.A, V, P, N. These functions.As , Vs , Ps , Ns are called the functions of singularities of.A,
.V, .P, .N. Since . F1s (x) is the function of smallest total variation such that . F1 (x) − F1s (x) is
mean any . B measurable set . E, which is of measure zero and for which the function .V(E),
the total variation of .A(E), attains the largest value that it can attain on the sets of measure
zero.
The set of singularities of .P(E), is consequently a set . E s,P at which
for every set . E, not having any point in common with . E s,P , .Ps (E) will be consequently
zero, since .Ps cannot be either negative or greater than . P1,s (b). The set . E s,P , is the set
on which .Fs (E), or .Ps (E), attains its upper limit. The set .Es,N of singularities of .N(E) is
that on which .Fs (E) or .−N(E), attains its lower limit. The set . E s,P + E s,N is the set of
singularities of .A(E).
The set of singularities of a completely additive function .A for . B measurable sets decom-
poses itself in two disjoint sets, which are respectively the sets of singularities of the positive
variation of .A and its negative variation; we have seen, indeed (Sect. 9.1), that the sets . E p
and . E n , on which a function [here .Fs (E)] attains its upper limit and lower limit, can be
taken without common points.
Thus, the sets of singularities of . P1 (x) and . N1 (x) can be chosen to be disjoint, and we
can assume that any singular point of . F(x) does not belong to these sets, since these singular
points form only finite or countable set. But to obtain the set of singularities of . F(x) it is
sufficient to add to the set of singularities of . P1 (x) the points at which . F(x) has at least
one positive jump. Therefore, the sets of singularities of positive and negative variations,
. P(x) and . N (x), of a function of bounded variation, . F(x), can be taken without any other
common points than those at which . F(x) has a right jump and a left jump different from
zero and of opposite signs.
From all this analysis it is necessary to remember that a completely additive function
of . B measurable set is absolutely continuous if, and only if, it has a value zero on any . B
measurable set of measure zero. The function can then be extended to any measurable set.24
Let $F(x)$ be a function having a derivative $f(x)$; we know that $f(x)$ is measurable because it is a function of first class (Sect. 7.2). Let us assume that $f(x)$ is bounded; then $r[F(x), x, x+h]$ is also bounded, for any $x$ and $h$. Since $f(x)$ is the limit of $r[F(x), x, x+h]$ for $h = 0$, we can write, according to a theorem stated in Sect. 8.4,
\[ \int_a^x f(x)\,dx = \lim_{h \to 0} \int_a^x \frac{F(x+h) - F(x)}{h}\,dx = F(x) - F(a), \]
smallest limits of $u_n$, taken for $x$ constant and $n$ increasing indefinitely. These are the envelopes of indetermination of the limits of $u$. Here is how we can obtain the upper envelope $\overline{u}$: $v_i$ is the function which, for each value of $x$, is equal to the largest of the functions $u_1, u_2, \ldots, u_i$; $w_1$ is the limit of the increasing sequence $v_1, v_2, \ldots$; $w_i$ is defined from $u_i, u_{i+1}, \ldots$ in the same way as $w_1$ is defined from $u_1, u_2, \ldots$; $\overline{u}$ is the limit of the decreasing sequence $w_1, w_2, \ldots$. If the $u_i$ are continuous functions, the same is true of the $v_i$, so the $w_i$ are at most of first class and $\overline{u}$ is at most of second class. A similar argument applies to $\underline{u}$. If we
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025 187
R. Jain, Lebesgue’s Theory of Integration,
https://doi.org/10.1007/978-981-96-1169-0_10
188 10 The Search for Primitive Functions.The Existence of Derivatives
only assume that the $u_i$ are measurable, we see that $\overline{u}$ and $\underline{u}$ are also measurable, and that does not require $u_i$, $\overline{u}$ and $\underline{u}$ to be finite everywhere.
The definition of envelopes of indetermination could have been given for a function
. g(x, h), where .h is a parameter replacing the index of the function .u i . One of the derivative
1. Each interval has a length at most equal to a positive number .λ, that we will subsequently
allow to go to zero. Then, if we take the sums . p and .−n, formed respectively by the
positive and negative increments provided by the chain, we know that when .λ tends
2 This distinction is necessary because we have applied the classification of M. Baire only to functions
that are finite everywhere. However, we can extend this classification to all functions.
3 To satisfy the logical requirements, mentioned in the note in Sect. 5.2, it would be appropriate, and
easy, to specify these conditions so that the construction of the chain no longer involves choices; we
will not dwell on this.
The condition .1 is, in reality, unnecessary; conditions .2 and .3 are sufficient.
We can note that what follows does not assume for a moment that. f (x) is the right-hand superior
derivative number of . f (x), but only that . f (x) is measurable and lies at all points between the two
derivative numbers to the right, of . f (x); .λd ≤ f ≤ d . From our considerations, it follows that . f
is determined by . f , when . f is summable. This is the answer to a question posed by M. Denjoy.
10.1 The Search for Primitive Functions 189
towards zero, . p and .+n tend uniformly, in some sense, respectively towards the total
positive and negative variations . P and . N , as it has been said in Sect. 5.1.
We propose only to calculate . P. We know that we obtain an approximate value of . P by
summing the positive increments of . f (x), provided by the intervals of chain. Now, the
interval .(x0 , x0 + h) yields an increment equal to .hr [ f (x), x0 , x0 + h], and this number
.r is an approximate value of . f (x 0 ). It is necessary to specify this approximation; to
relative to the various values of integers .l, positive, negative and zero, where .ε is a very
small positive number, and in the sets
of . E . Those intervals, which originate from the points of . El , l > 0, have a contribution
i p
equal to their total length.l multiplied by a number lying between.(l − 1)ε and.(l + 1)ε.
In other words, this contribution is .l lε with an error of .l ε at most. This result is also
true for the intervals originating from the points of . E 0 , because the contribution of these
intervals, in the approximate value of . P, is at most .0 ε. Therefore, we can take for the
approximate value of . P
+∞
.p = l lε + M i p ,
0
∞
since we deviate from the value given by the chain only by .ε l at most, which is at
0
most .ε(b − a).
Now, it is necessary to be able to evaluate the .l and .i p . To do this, we will suppose
that we have enclosed each of the sets, that we have considered, in a set of intervals. We
will thus have the sets . Al , Ai p , Ain , each formed of non-intersecting intervals. Each of
them exceeds in measure the set of the same index by a quantity .εl , εi p , εin that we can
choose as small as we want.
Therefore, let us suppose that the series $\varepsilon_{ip} + \varepsilon_{in} + \sum_{-\infty}^{+\infty} \varepsilon_l$ and $\sum_{-\infty}^{+\infty} |l|\,\varepsilon_l$ are convergent
and have arbitrarily small sums $\eta$ and $\zeta$. And let us set the condition:
3. Each interval of the chain is completely contained in set . A which contains the set to
which its origin belongs.
Then .l is at most equal to .m(El ) + εl , and at least equal to the largest of the two
numbers .0 or .m(El ) − η; for .i p we have similar limits. Hence, the lower and upper
limits for . p are deduced; the lower limit is
∞
p
. [m(El ) − η]lε + M[m(E i p ) − η],
0
only the positive terms are retained in . p .
Therefore, this sum contains only a finite number of terms, the number varying with .η. These
terms are less than those of
\[ \sum_{0}^{\infty} m(E_l)\, l\varepsilon, \]
but they tend respectively towards these for .η tending towards zero. Now, we can let .η tend
towards zero, hence the limit
\[ \sum_{0}^{\infty} m(E_l)\, l\varepsilon + M\, m(E_{ip}); \]
then we can take $\varepsilon$ arbitrarily small and $M$ arbitrarily large; therefore we have, the integral
having a finite or infinite value,
\[ P \ge \int_{E[0 < f < +\infty]} f\,dx + M\, m(E_{ip}) \]
finite. Let us conclude by recalling that we could have taken for . f (x), any one of the four
derivative numbers, and there exists a similar inequality for . N , as found for . P.
where . f is any one of the four derivative numbers. The points where . f is infinite are
excluded from the sets or the intervals of integration.
Let us now assume that the right-hand superior derivative number . f is summable over
the set . E[0 < f < +∞], and assume the set . E i p is of measure zero; let us compute the
upper limit of . P
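\[ \sum_{0}^{\infty} [m(E_l) + \varepsilon_l]\, l\varepsilon + M\,[m(E_{ip}) + \varepsilon_{ip}]. \]
The first term is less than $\sum_{0}^{\infty} m(E_l)\, l\varepsilon + \zeta\varepsilon$; therefore, for $\zeta$ and $\varepsilon$ tending towards zero, it
tends towards $\int_{E[0 < f < +\infty]} f\,dx$.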
Since .m(E i p ) is zero, the second term reduces to the product . M εi p of an arbitrarily large
number by an arbitrarily small number, therefore it tells us nothing.
4 We will agree to say that a property holds almost everywhere in an interval .(a, b), or on a set .E, if
the points of .(a, b) or of .E where it does not hold either do not exist, or form a set of measure zero.
This expression, introduced in the first edition of this book, has been generally adopted. If we
recall that M. Denjoy did not find the expression "the point $P$ is a point of the set $E$" sufficiently
precise and rejected it, we will not be surprised that he considered the expression almost everywhere
unacceptable. This is because, in his opinion, it had two meanings: one qualitative or descriptive,
the other quantitative or metric. I believe it is reasonable to say that we could have agreed to give
almost everywhere the following meaning: with the exception of points forming an everywhere non-dense
set. Certainly, this is one possible interpretation; however, M. Denjoy says that a property
holds over a full thickness when I say that it holds almost everywhere. Could full thickness not have
received a meaning different from the one M. Denjoy chose to give it?
Almost everywhere would be unacceptable if, in ordinary language, this expression had a precise
meaning, but it does not. Consequently, the reader, when encountering a statement such as the one
above, cannot assign it any specific meaning without referring to its defined usage. Therefore, no
error is possible.
Compelling the reader to refer to a definition has its own drawbacks. I would gladly concede
this to anyone except M. Denjoy, who has used a tremendous number of new words in his Memoirs.
Moreover, altering or even improving a vocabulary that is already gaining traction hardly mitigates
this inconvenience.
Let us enclose $E_{ip}$ in a set $I$ of non-intersecting intervals, and let us suppose that the chosen
$A_{ip}$ are enclosed in $I$. Then, if $P(I)$ denotes the sum of the positive variations of $f$ in the
intervals of $I$, the second term $M\varepsilon_{ip}$ is at most $P(I)$. Therefore, if we denote by $P(E_{ip})$ one
of the limits of $P(I)$, when we vary $I$ in such a way that $m(I)$ tends towards zero, we will
have
\[ P \le \int_a^b \tfrac{1}{2}\,[f + |f|]\,dx + P(E_{ip}), \]
And since, as $m(I)$ tends towards zero, $m(I_1)$ also tends towards zero, so does the last
integral of the right-hand side, while $P(I_1)$ tends towards $P(E_{ip})$, we have
\[ P \ge \int_a^b \tfrac{1}{2}\,[f + |f|]\,dx + P(E_{ip}). \]
By combining the two inequalities obtained in the opposite directions, we finally conclude
\[ P = \int_a^b \tfrac{1}{2}\,[f + |f|]\,dx + P(E_{ip}). \]
Therefore . P(E i p ), which denotes one of the limits of . P(I ), has a definite value; we can
state this result as follows:
If . f is one of the derivative numbers of a continuous function . f (x), for . f (x) to be of
bounded variation, it is necessary and sufficient that . f be summable over the set of points,
where it is finite and positive, that the set . E i p of points where . f is infinitely positive is
of measure zero and it may be enclosed in the intervals . I providing a sum of total positive
variations . P(I ) that is bounded.
P(I) then tends towards a finite and definite limit, when we vary . I in such a way that its
measure tends towards zero. This limit is the difference between the total positive variation
of . f (x) in the considered interval and the integral of . f over the set of points where it is
positive.
We have, of course, a similar statement by changing positive to negative, . E i p to . E in , . P
to . N , which expresses the equality
\[ N = \int_a^b \tfrac{1}{2}\,[\,|f| - f\,]\,dx + N(E_{in}). \]
From . P and . N , by addition and subtraction, we deduce the total variation .V , and the
increment . f (b) − f (a) of . f (x) in .(a, b). The set . E i = E i p + E in , of points where . f is
infinite, is of measure zero. Any set . I of non-intersecting intervals enclosing . E i , and whose
measure tends towards zero, can, therefore, be used simultaneously to calculate . P(E i p ) and
$N(E_{in})$, which can consequently be denoted by $P(E_i)$, $N(E_i)$; the limit of the sum $V(I)$
$A(I) = P(I) - N(I)$,
$P(E_i)$, $N(E_i)$, $V(E_i)$, $A(E_i)$ are the finite and well-determined limits, towards which tend
the sums of the various total variations and increments of . f (x) in the non-intersecting
intervals of . I , which enclose the set . E i of points where . f is infinite, when we vary the
system . I of intervals, in such a way that .m(I ) tends towards zero.
Let us leave this general statement aside for a moment and focus on the case where . E i
does not exist5 ; the numbers . P(E i ), . N (E i ), .V (E i ), . A(E i ), are then zero. If, then, we apply
the previous theorem to the interval .(a, x), we will have:
The necessary and sufficient condition for the derivative number . f , which is finite
everywhere, of a continuous function . f (x) to be summable, is that . f (x) is of bounded
variation.
The primitive function . f (x) of . f is, then, the indefinite integral, function of one vari-
able, of . f .
Therefore, we can say that we know how to solve the problems . A , . B , .C , (Sect. 6.2),
and as a result, the problems . A, . B, .C, for all summable functions.
There is another case in which . P(E i ), . N (E i ), .V (E i ), . A(E i ), are all zero, and that
is when . f (x) is absolutely continuous; because then .V (I ) tends towards zero with .m(I ).
Therefore: an absolutely continuous function is the indefinite integral of each of its four
derivative numbers;6 its total variation is the indefinite integral of the absolute value of any
of its derivative numbers.
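As a numerical illustration of this statement, consider the sketch below. It rests on assumptions that are not part of the text: the sample function $\sin x$ on $(0, 2\pi)$, which is absolutely continuous with derivative $\cos x$, the hypothetical helper names `variation_sum` and `midpoint_integral`, and a midpoint Riemann sum used in place of the Lebesgue integral (legitimate here because the integrand is continuous).

```python
import math


def variation_sum(F, a, b, n):
    """Sum of |F(x_{i+1}) - F(x_i)| over a uniform partition of (a, b).

    For a fine partition this approximates the total variation of F.
    """
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(abs(F(xs[i + 1]) - F(xs[i])) for i in range(n))


def midpoint_integral(g, a, b, n):
    """Midpoint Riemann sum of g over (a, b) with n equal cells."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


# Sample absolutely continuous function and its derivative (assumed example).
F = math.sin
dF = math.cos
a, b, n = 0.0, 2 * math.pi, 10_000

total_variation = variation_sum(F, a, b, n)                      # close to 4
integral_abs = midpoint_integral(lambda x: abs(dF(x)), a, b, n)  # close to 4
increment = F(b) - F(a)                                          # close to 0
integral = midpoint_integral(dF, a, b, n)                        # close to 0

if __name__ == "__main__":
    print(total_variation, integral_abs)
    print(increment, integral)
```

Both the increment and the total variation are recovered from the derivative alone, which is what the statement asserts for absolutely continuous functions.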
Now that we have this statement at our disposal, we could replace some of the previous
results by simple sufficient conditions of absolute continuity:
A continuous function . f , which has summable and everywhere finite derivative numbers
. f in .(a, b), is absolutely continuous.
A continuous function . f (x) of bounded variation, which has its derivative numbers . f
finite everywhere in .(a, b), is absolutely continuous.
Let us return to the general case in which $E_i$ exists, and where the numbers $P(E_i)$,
$N(E_i)$, $V(E_i)$, $A(E_i)$ are not necessarily zero.
5 The order of the text has been adopted to avoid redundancy, but it is worthwhile to note how much
the previous considerations simplify when we limit ourselves to the case of an always finite derivative
number . f ; in particular, it should be noted that in this case, the notion of a set function no longer
comes into play.
The historical order is the reverse of the one in the text; the theorems related to the case of an always
finite $f$ were included in the first edition of this book; the idea of evaluating the difference
$f(b) - f(a) - \int_a^b f\,dx$ as the limit of $A(I)$ is due to M. de la Vallée Poussin, who first obtained the general theorem,
initially for monotonic functions (Cours d’Analyse infinitesimale, 2nd edition, t. I, p. 269).
We will note that the idea of M. de la Vallée Poussin already inherently contains the notions
of function and the set of singularities. These notions, along with all those related to set functions,
were examined in my Memoire: Sur l’integration des fonctions discontinues (Ann. sc. de l’Ec. Norm,
1910).
6 It is understood that, in the definition of this indefinite integral, we neglect the set . E i of measure
zero, formed by points at which the considered derivative number . f is infinite. Or, equivalently,
we take integral of the function that is zero on the points of . E and equal to . f elsewhere.
10.2 The Differentiation of Functions of Bounded Variation 195
but . P(ei p ) is at most equal to . P(E i p ) and the integral of the right-hand side tends towards
zero with .m(J ). Therefore, when .m(J ) tends towards zero, the largest of the limits of . P(J )
is $P(E_{ip})$, and this largest limit is attained when we take $J$ enclosing $E_{ip}$. In other words:
$E_{ip}$ is the set of singularities of the function $P(x)$ representing the total positive variation from
$a$ to $x$.
formulae which reveal the functions of singularities of . P(x), . N (x), .V (x), . A(x). As for the
set of singularities, it is . E i p for . P(x), . E in for . N (x), it is . E i for . P(x), . N (x), .V (x), . f (x).
We have, in particular
therefore, these numbers are all zero only in the previously examined case, where $f$ is
absolutely continuous.7 Earlier, we decomposed (Sect. 9.3) a continuous function of bounded
variation into its function of singularities and an absolutely continuous kernel; let $AC$ be
the kernel of $f(x)$; we just proved that we have
\[ AC(x) = f(a) + \int_a^x f\,dx, \]
We will now take into account that each of our results has four interpretations depending
on whether, by . f , we have denoted the right-hand superior derivative number of . f , or the
right-hand inferior derivative number, etc.
7 For . f non-absolutely continuous, one of the numbers . P(E i ), . N (E i ), . A(E i ) can be zero, but only
one.
We first saw that $E_{ip}$ plays a very special role, as we have just shown by proving that
$E_{ip}$ is the set of singularities of $P(x)$, when $P(x)$ is not absolutely continuous. There are
four sets $E_{ip}$; let $\mathcal{E}_{ip}$ be their common subset; $\mathcal{E}_{ip}$ is the set of points at which $f(x)$ has a
derivative equal to $+\infty$; but the common subset of several sets of singularities, if they are
$B$ measurable, is also a set of singularities (Sect. 9.3); therefore, $\mathcal{E}_{ip}$ is a set of singularities
of $P(x)$.
Thus, in all that precedes, we can replace $E_{ip}$, $E_{in}$, $E_i$ respectively with the sets $\mathcal{E}_{ip}$, $\mathcal{E}_{in}$,
$\mathcal{E}_i = \mathcal{E}_{ip} + \mathcal{E}_{in}$ formed by the points at which $f(x)$ has a definite derivative, in magnitude
and sign, equal to $+\infty$, $-\infty$, $+\infty$ or $-\infty$, respectively. These three sets, $\mathcal{E}_{ip}$, $\mathcal{E}_{in}$, and
$\mathcal{E}_i$, are sets of singularities of $P(x)$, $N(x)$, $f(x)$ respectively, when these functions are not
absolutely continuous.
In particular, let us mention that a continuous function of bounded variation, which, at
any point does not have a well defined derivative, in magnitude and sign, equal to .+∞ or
.−∞, is absolutely continuous.
We then saw that the kernel $AC(x)$ of $f(x)$ was given by the formula
\[ AC(x) = f(a) + \int_a^x f\,dx, \]
therefore, the integral of the right-hand side is the same for each of the four derivative numbers;
thus (Sect. 9.2) these four derivative numbers are equal, except possibly at points of a set
of measure zero. A continuous function8 of bounded variation $f(x)$ has a derivative almost
everywhere and the kernel of $f(x)$ is, therefore, given by the formula
\[ AC(x) = f(a) + \int_a^x f'(x)\,dx. \]
$f'(x)$ and $AC'(x)$ are, therefore, equal almost everywhere as they have the same indefinite
integral, and consequently, the derivative of $f(x) - AC(x)$ is zero almost everywhere. In
other words: the derivative of the function of singularities $f_s(x)$ of a continuous function9
of bounded variation $f(x)$ is zero almost everywhere. The converse is true; more
precisely: a continuous function10 of bounded variation whose derivative is zero
almost everywhere is its own function of singularities. Indeed, from the previous formula,
its kernel is identically zero.
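A standard example of a continuous function of bounded variation whose derivative is zero almost everywhere, and which is therefore its own function of singularities, is the Cantor function. This example is not taken from the text; it is offered here as an assumption-laden sketch, with hypothetical names `cantor` and `quotient`, and with the ternary expansion truncated at a finite `depth`.

```python
def cantor(x, depth=40):
    """Cantor function on [0, 1], computed from the ternary digits of x.

    The expansion is truncated after `depth` digits, so the value is an
    approximation at points of the Cantor set and exact (up to rounding)
    on the removed middle-third intervals.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    total, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3.0
        digit = int(x)          # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:          # x lies in a removed interval: constant there
            return total + scale
        total += scale * (digit // 2)
        scale /= 2.0
    return total


def quotient(F, x0, h):
    """Incremental ratio r[F, x0, x0 + h]."""
    return (F(x0 + h) - F(x0)) / h


if __name__ == "__main__":
    # The function rises from 0 to 1 ...
    print(cantor(0.0), cantor(1.0))
    # ... yet it is constant near every point of a removed interval.
    print(quotient(cantor, 0.5, 0.05), quotient(cantor, 0.2, 0.01))
```

The removed intervals have total length one, so the derivative vanishes almost everywhere, while the increment over $(0, 1)$ equals one; the integral of the derivative therefore contributes nothing to the increment.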
As has been mentioned in the notes, the previous three theorems can be extended to all
functions of bounded variation, whether continuous or discontinuous.11 To do so, it is
sufficient to use the formula of Sect. 9.3,
\[ F = S + C = F_s + AC = S + C_s + AC, \]
and to demonstrate that a function of jumps has a zero derivative almost everywhere.
To each point of discontinuity .x0 of the considered function . S, let us attach two functions
.ϕ(x); the first equal to
.(x) = V S(x);
since each function .ψ is greater than the corresponding function .|ϕ| in the interval where
they differ.
The functions . and .ψ being non-decreasing and continuous, they simultaneously have
derivatives almost everywhere. At a point where all these derivatives exist, we have, for any
positive integer . p.
. = ψ1 + ψ2 + · · · + ψ p + ρ p ,
11 The differentiation of discontinuous functions was first studied by M. and Mrs. W. H. Young
(Quarterly Journal, .1910 and Proceedings of the London math. Soc., .1910).
p ∞
=
. ψi + ρp = ψ + θ,
1 1
where .θ is positive or zero. Hence, according to the theorems of Sects. 10.1 and 8.4,
b ∞
b b
.(b) ≥ (x) d x = ψ d x + θ dx
a 1 a a
∞
b b
= ψ(b) + θ d x = (b) + θ d x,
1 a a
and, since .θ is never negative, .θ is zero almost everywhere. However, at points of . E, all
derivatives .ψ are zero. Therefore, almost everywhere in . E, we have
. = ψ = 0.
(x) = V S(x)
.
at points of . E, .(x) > V S(x) at points of .C E; at any point of . E the function .V S(x) has
derivative numbers to the right at most equal to those of .(x), therefore almost everywhere
in . E the total variation .V S, and therefore . S, has a right-hand derivative that is equal to
zero. But the measure of .C E is at most the sum . h which we could make arbitrarily
small. Therefore, . S(x) admits zero as right-hand derivative almost everywhere. A similar
conclusion is obviously applicable to the left-hand derivative; the theorem is proved. We are
now in a position to answer the question posed previously and to justify some assertions. We
have already, Sect. 9.1, alluded to the following property: for a function of one variable . F(x)
to be the indefinite integral of an unknown function . f (x), it is necessary and sufficient that
$F(x)$ be absolutely continuous.12 The justification of this statement is now immediate; we
have seen, indeed, that any indefinite integral is absolutely continuous (Sect. 9.2), and then,
in Sect. 10.1, that every absolutely continuous function is an indefinite integral.
From this, it immediately follows that: for a function of set or interval to be the indefinite
integral of an unknown function $f(x)$, it is necessary and sufficient that it be completely
additive and absolutely continuous; this follows from what we know about the transition from
such a function to an absolutely continuous function of one variable and about the converse
transition.
12 In the first edition of this book, I had indicated this statement, in a note on page 128, quite
incidentally and without proof. M. Vitali rediscovered this theorem and published the first proof
of it (Acc. Reale delle Sc. di Torino, 1904-1905). It was on the occasion of this theorem that M.
Vitali introduced, for the functions of one variable, the concept of absolute continuity and showed
the simplicity and clarity that the whole theory acquires when we put this notion into its foundation.
In Sect. 9.2, we formulated this problem: to find a function when its indefinite integral
is known. Let us take this indefinite integral in the form of a function of one variable
$F(X)$; $F(X)$ is absolutely continuous, therefore it has a derivative almost everywhere and
is the indefinite integral of this derivative. But two functions which have the same indefinite
integral are almost everywhere equal, therefore the function whose integration gave $F(X)$
is almost everywhere equal to $F'(X)$. In other words: an indefinite integral $F(X)$ admits
the integrated function as its derivative almost everywhere.
If the indefinite integral is given as function of set or interval, the problem is no less
solved. Let us examine what happens to the differentiation operation in this case. Let .h be
positive or negative, the ratio.r [F(x), x0 , x0 + h] is the quotient of the function., evaluated
on the interval .δ with extremities .x0 and .x0 + h, divided by the measure of this interval
(δ)
.r [F(x), x0 , x0 + h] = .
m(δ)
Therefore, it is through the consideration of such ratios that we obtain the derivative; with
the ordinary derivative, we would be led to use the two families of intervals $(x_0 - h, x_0)$,
$(x_0, x_0 + h)$ having the studied point as the extremity or origin. However, it is better to note
that the derivative can just as well be defined using the intervals $(x_0 - h, x_0 + k)$, because
we have
\[ r[F(x), x_0 - h, x_0 + k] = \frac{h}{h+k}\, r[F(x), x_0 - h, x_0] + \frac{k}{h+k}\, r[F(x), x_0, x_0 + k], \]
which shows that the value of .r in the left-hand side is included between those which appear
in the right-hand side. We are thus led to say: to differentiate at a point .x0 , the function of
intervals .(δ), we seek the limit of the ratio .(δ) when .m(δ) tends to zero, .δ denotes a
variable interval containing .x0 .13
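The relation between the incremental ratio over $(x_0 - h, x_0 + k)$ and the two one-sided ratios can be checked numerically. The sketch below is only illustrative: the helper names `r` and `weighted_ratio` and the sample function $x^3$ are assumptions introduced for the example.

```python
def r(F, a, b):
    """Incremental ratio r[F, a, b] = (F(b) - F(a)) / (b - a)."""
    return (F(b) - F(a)) / (b - a)


def weighted_ratio(F, x0, h, k):
    """Weighted mean of the left and right ratios, with weights
    h / (h + k) and k / (h + k)."""
    left = r(F, x0 - h, x0)
    right = r(F, x0, x0 + k)
    return h / (h + k) * left + k / (h + k) * right


def cube(x):
    """Sample function (assumed for illustration)."""
    return x ** 3


if __name__ == "__main__":
    x0, h, k = 1.0, 0.3, 0.1
    print(r(cube, x0 - h, x0 + k))        # ratio over the two-sided interval
    print(weighted_ratio(cube, x0, h, k)) # same value, as a weighted mean
```

Since the weights are positive and add up to one, the two-sided ratio always lies between the left-hand and right-hand ratios, which is the point used in the text.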
This definition is satisfactory for a function of an interval, but for a function of a set,
we would like to define the limit of . (E)m(E) as .m(E) tends towards zero and the set . E tends
towards .x0 ; that is, . E is contained in an interval . containing .x0 and whose measure tends
towards zero. If . E were subject only to these conditions, we could take it, in particular,
reduced to a point .x0 and an interval .δ not containing .x0 . Therefore, it would be necessary
that the derivative appeared as the limit of
(δ)
. = r [F(x), x0 + h, x0 + h + k],
m(δ)
13 We will compare this definition, which is merely the translation of the definition of ordinary
derivative, with the one given by M. Volterra for the derivative of a function of line (Leçons sur
les équations intégrales et les équations intégro-différentielles, p. .12). The definition of the text can
obviously be applied to set functions of points in a plane, in space, etc. If we apply it to a set of points
in a plane, the resemblance with the definition of M. Volterra becomes more apparent.
when .h and .k tend towards zero, even by the values of the same sign. According to the
mean value theorem, the right-hand side is included between the extreme values taken in
.(x 0 + h, x 0 + h + k), by the function . f whose indefinite integral is . F. Moreover, let . x 0 + h
be a point in the neighbourhood of .x0 at which . F exists, we could then find an interval
.(x 0 + h, x 0 + h + k) such that .hk is positive and as small as we want, and .r [F(x), x 0 +
let . f i be the function equal to . f at points of . E i and zero elsewhere, let finally .gi = f − f i .
A set of points of measure zero being excepted, the indefinite integrals . f i d x and . |gi | d x
have derivatives, in the ordinary sense of word, equal to the functions. f i and.|gi |, respectively.
Then, if the non exceptional point .x0 belongs to . El , we have
E f dx fl d x gl d x
. = E + E ,
m(E) m(E) m(E)
f dx
E fl d x E |gl | d x |gl | d x m()
− ≤ ≤ ;
m(E) m(E) m(E) m() m(E)
where . denotes the smallest interval containing both . E and .x0 . The first ratio of the
right-hand side tends towards the derivative of . |gl | d x at .x0 , which is zero by hypothesis.
Therefore, we will only have to deal with the influence of . fl , that is, with the points of
m()
. E l , if the ratio .
m(E) does not increase indefinitely. Hence the definition: We will call the
derivative at .x0 of a set function . the limit, if it exists, the ratio . (E)
m(E) , for sets . E belonging
to a regular family. By this, we mean that for an arbitrarily chosen positive integer .k, we
will consider a set . E only if, . being the smallest interval containing . E and .x0 , we have
and we will vary . E in such a way that .m() tends towards zero.
14 In short, it is necessary and sufficient that $f$ be continuous at $x_0$ when we neglect the sets of
measure zero [Sect. 9.1, in note].
f dx f dx
E
We have just seen that with this choice of set . E the two incremental ratios . m(E) , . Em(E)
l
have the same limits almost everywhere. Let us study the second. Let . Fl be the set of points
common to . E and . El , and let .G l = E − Fl ; we have
E fl d x Fl f l d x F f l d x m(Fl )
. = = l .
m(E) m(E) m(Fl ) m(E)
The first factor of the last expression is included between .lε and .(l + 1)ε; as for the second,
l)
it can also be written .1 − m(G
m(E) . Now, if .h l denotes the zero function in . E l , and equal to .1
outside . El , we have
m(G l ) m(G l ) G hl d x 1 hl d x
. ≤ = l ≤ .
m(E) km() km() k m()
Outside an exceptional set of measure zero, all the indefinite integrals . h i d x have deriva-
tives equal to .h i . Therefore, if .x0 is also taken outside this new exceptional set, the last
expression
of the previous relations tends towards zero; therefore, the incremental ratio
E f dx
.
m(E) has all its limits included between .lε and .(l + 1)ε, that is, it differs from . f (x0 ) by
at most $\varepsilon$. But, as $\varepsilon$ is arbitrarily small, if we take $x_0$ outside the sum of the exceptional
sets attached to the values $\varepsilon = 1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \ldots$, that is, outside a set of measure zero, the
derivative of the set function $\int_E f\,dx$ will be $f(x_0)$. Therefore, an indefinite integral, considered as a set
function, has as its derivative the integrated function, almost everywhere.
Therefore, we have the same statement for three kinds of indefinite integral. However, it
is quite necessary to note that, in its last form, it expresses a property which is much more
precise than in the earlier two forms, which were equivalent. If the indefinite integral .(E)
has a derivative at point .x0 , then . F(X ) also has one, and these two derivatives are equal; but
the converse may not hold.
Therefore, it is appropriate to express the results we have just obtained in terms of
. F(X ); further developments related to the calculation of .(E), when . F(X ) is known, lead
immediately to the statement: . F(X ) being an absolutely continuous function in .(a, b) and
. x 0 a point of .(a, b), we enclose . x 0 in an interval . and we choose, in ., non-intersecting
We can give the following form to this statement: $f(x)$ being a summable function, the
function $\int |f(x) - f(x_0)|\,dx$ admits a zero derivative for $x = x_0$, provided that $x_0$ is not
taken in some exceptional set of measure zero.
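This form of the statement can be illustrated on a simple discontinuous function. In the sketch below, the step function, the helper name `mean_deviation`, the use of symmetric intervals only, and the midpoint sum standing in for the integral are all assumptions made for the example.

```python
def mean_deviation(f, x0, h, n=1000):
    """Approximate (1 / 2h) * integral of |f(x) - f(x0)| over (x0 - h, x0 + h).

    The integral is replaced by a midpoint sum over n equal cells, so the
    result is the average of |f - f(x0)| at the n midpoints.
    """
    step = 2.0 * h / n
    total = sum(abs(f(x0 - h + (i + 0.5) * step) - f(x0)) for i in range(n))
    return total / n


def step_function(x):
    """Sample summable function with a single jump at x = 0."""
    return 1.0 if x >= 0.0 else 0.0


if __name__ == "__main__":
    # At a point of continuity the mean deviation vanishes for small h.
    print(mean_deviation(step_function, 0.3, 0.1))
    # At the jump it stays near 1/2 however small h is taken.
    print(mean_deviation(step_function, 0.0, 0.1))
    print(mean_deviation(step_function, 0.0, 0.001))
```

The jump point is the only exceptional point here, and a single point is a set of measure zero, in agreement with the statement.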
Indeed, let us show that this exceptional set $\mathcal{E}$ is identical to the set $\mathcal{E}_1$ of
points $x_0$ at which the indefinite integral of $f(x)$, considered as a set function, does not admit a derivative
equal to $f(x_0)$.
Indeed, let us suppose that $x_0$ belongs to this latter set $\mathcal{E}_1$, which means that we can find
sets $E$, all of whose points approach $x_0$ indefinitely, which belong to a regular family of
parameter $k$ and for which we have
\[ \left| \frac{\int_E f(x)\,dx}{m(E)} - f(x_0) \right| > \alpha, \]
E p [ f (x) − f (x 0 )] d x E [ f (x) − f (x 0 )] d x
. , n
m() m()
tends towards two definite limits .β and .−γ, and we have
E p [ f (x) − f (x 0 )] d x β γ
. > or < ,
m() 2 4
according to .β is positive or zero, and
E n [ f (x) − f (x0 )] d x γ β
. <− or >− .
m() 2 4
according to .γ is positive or zero.
Furthermore, let us note that .β − γ is at least equal to .α, therefore that .β and .γ are never
both zero, and
.m(E n ) + m(E p ) = m(),
therefore, at least one of the two sets, $E_n$ and $E_p$, belongs to the regular family of sets of
parameter $k = \frac{1}{4}$.
If, . E p is regular and .β > 0, by taking . E ≡ E p , we have
E f (x) d x β m() β
. > f (x0 ) + > f (x0 ) + .
m(E) 2 m(E) 2
If these two conditions are not satisfied at the same time and if . E n is regular and .γ > 0, we
take . E ≡ E n ,
E f (x) d x γ m() γ
. < f (x0 ) − < f (x0 ) − .
m(E) 2 m(E) 2
If . E p is irregular, .β > 0, and if .γ = 0, then . E n is regular; we take . E ≡ and we will have
E f (x) d x E p f (x) d x E f (x) d x
. = + n
m(E) m() m()
m(E p ) β m(E n ) β β
> f (x0 ) + + f (x0 ) − = f (x0 ) + .
m() 2 m() 4 4
Finally, there remains only the case where . E p being regular and . E n irregular, .β = 0; with
E ≡ , then we have
.
E f (x) d x γ
. < f (x0 ) − .
m(E) 4
Therefore, if $\lambda > 0$ is less than both $\frac{\beta}{4}$ and $\frac{\gamma}{4}$, we always have
\[ \left| \frac{\int_E f(x)\,dx}{m(E)} - f(x_0) \right| > \lambda. \]
The sequence of these sets $E$, which belong to the regular family for $k = \frac{1}{4}$, allows us to see
that $x_0$ belongs to $\mathcal{E}_1$.15
In the course of the previous arguments, we introduced a concept that needs to be
explained. Let $E$ be a measurable set; the ratio $\frac{m(e)}{\beta - \alpha}$ of the measure of the subset $e$ of
15 In a paper concerning Fourier series that I published in Math. Annalen (Bd. L X I ,.1905) we will find
another proof of this theorem, from which we can deduce the proposition regarding the differentiability
of set functions which are indefinite integrals, in a manner completely different from the one which
we have used here.
$E$ located in an interval $(\alpha, \beta)$ to the measure of that interval is called the mean density of
$E$ in $(\alpha, \beta)$. If the mean density of $E$ in the intervals $(\alpha, \beta)$ containing a point $x_0$ tends
towards a definite limit when $\beta - \alpha$ tends towards zero, this limit is called the density at $x_0$;
$x_0$ need not be a point of $E$ for this definition to apply.16
Let $\varphi$ be the measurable function which is equal to one at points of $E$ and zero elsewhere.
The mean density of $E$ in $(\alpha, \beta)$ is the incremental ratio $\frac{1}{\beta - \alpha}\int_\alpha^\beta \varphi\,dx$; therefore the theorem on
the differentiation of $\int_\alpha^x \varphi\,dx$ yields: the density of a measurable set $E$ is equal to one almost
everywhere at points of $E$, and equal to zero almost everywhere outside $E$.
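The notion of mean density is easy to compute when $E$ is a finite union of non-intersecting intervals. The sketch below makes exactly that assumption; the names `measure_in` and `mean_density` and the sample set are hypothetical and serve only as an illustration.

```python
def measure_in(intervals, alpha, beta):
    """Measure of the part of E lying in (alpha, beta), where E is given
    as a list of non-intersecting intervals (a, b)."""
    return sum(max(0.0, min(b, beta) - max(a, alpha)) for a, b in intervals)


def mean_density(intervals, alpha, beta):
    """Mean density of E in (alpha, beta): m(e) / (beta - alpha)."""
    return measure_in(intervals, alpha, beta) / (beta - alpha)


# Sample set E = (0, 1) + (2, 3), assumed for illustration.
E = [(0.0, 1.0), (2.0, 3.0)]

if __name__ == "__main__":
    h = 0.01
    # Interior point of E: the mean density is one in small intervals.
    print(mean_density(E, 0.5 - h, 0.5 + 2 * h))
    # Point at a positive distance from E: the mean density is zero.
    print(mean_density(E, 1.5 - h, 1.5 + 2 * h))
    # Boundary point x0 = 1: the value depends on how the interval
    # is placed around x0, so there is no definite density.
    print(mean_density(E, 1.0 - h, 1.0 + h))
    print(mean_density(E, 1.0 - h, 1.0 + 3 * h))
```

In this example the only points without a density equal to one or zero are the four endpoints, which form a set of measure zero.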
We can consider this property as a fundamental geometric theorem on the differentiation of
definite integrals. To establish these theorems, we can follow a course exactly opposite
to the one presented here, namely, to prove first the theorem on density.17 We can simply
deduce from this theorem an interesting proposition concerning any measurable function
$f$, using the notations introduced for the study of the differentiation of set functions
(Sect. 10.2).
Except at the points of an exceptional set of measure zero, the different sets $E_i$ have density
equal to either one or zero, depending on whether it concerns the density at a point of $E_i$
or not. We can suppose this exceptional set is chosen in such a manner that it corresponds
to the values $1, \frac{1}{2}, \frac{1}{3}, \ldots$ assigned to $\varepsilon$. After choosing $x_0$ outside this exceptional set, let
$l(\varepsilon)$ be the index of the $E_l$ corresponding to $\varepsilon$ which contains $x_0$. Then, at $x_0$, for each of
the indicated values of $\varepsilon$, $E_{l(\varepsilon)}$ has a density equal to one, while the other $E_i$ have a density
zero.
Let us choose an interval.1 with center.x0 and sufficiently small so that, in every interval
contained in .1 and containing .x0 , the mean density of . El(1) is greater than .1 − ε1 , where
.ε1 is a number chosen arbitrarily between zero and one. Let .1 − ε1 + η be the lower limit
.1 − ε2 + η2 ,
16 I introduced these notations, which are almost necessary, given the physical meaning of these
expressions, in the cited Memoire in the Math. Annalen. M. Denjoy replaces the word density with
thickness; he is bothered by phrases such as: "A set everywhere non-dense can have a density equal
to one almost everywhere." It is certain that such a phrase may seem like a pun. However, between
the two terms, ‘dense’ and ‘density’, it is the former which is poorly chosen because it, in my
opinion, evokes an idea of measure, a qualitative idea.
Be that as it may, the reader who has followed me so far will not allow himself to be disturbed by
these semantic concerns, as he would have agreed to delve into the quest for definition of the definite
integral, which may seem absurd, and the definition of the integral that remains indefinite even after
being defined, which is, to some extent, a humbling realisation.
17 See, for example, my Memoire in Ann. Sc. de l’Ec. Norm., 1910.
the lower limit of this mean density and, starting from $\eta_2$ and $\varepsilon_2$, just as before with $\eta_1$ and
$\varepsilon_1$, we will define, through the intermediary of $E_{l(1/3)}$, a number $\varepsilon_3$ and an interval $\Delta_3$, and
so on.
Let $E$ be the set formed by the part of $E_{l(1)}$ which is exterior to $\Delta_2$ and contained in $\Delta_1$,
and the part of $E_{l(1/2)}$ which is exterior to $\Delta_3$ and contained in $\Delta_2$, .... It is clear that $f(x)$
is continuous at $x_0$ on $E$. We will see that $E$ is of density one at the point $x_0$.
To do this, consider a set $\mathcal{E}$ which coincides with $E_{l(1)}$ in $\Delta_1 - \Delta_2$ and which, in $\Delta_2$, has a mean
density exceeding $1 - \varepsilon_2$ in any interval contained in $\Delta_2$ and containing $x_0$. For example,
we could take $\mathcal{E}$ to be identical to $E_{l(1/2)}$ in $\Delta_2$.
The intervals contained in $\Delta_1$ and containing $x_0$ are either contained in $\Delta_2$, and then we
know for them the lower limit $1 - \varepsilon_2$ of their mean density, or not contained in $\Delta_2$. Let $L$
be such an interval and $l$ be its part located in $\Delta_2$. In $L$, $E_{l(1)}$ had a measure at least equal
to $(1 - \varepsilon_1 + \eta_1)\,m(L)$; but all of $l$ may belong to $E_{l(1)}$ and, in $l$, $\mathcal{E}$ has a measure greater
than $(1 - \varepsilon_2)\,m(l)$. Finally, in $L$, $\mathcal{E}$ has a mean density greater than
$$1 - \varepsilon_1 + \eta_1 - \varepsilon_2 \frac{m(l)}{m(L)} > 1 - \varepsilon_1 + \eta_1 - \varepsilon_2 > 1 - \varepsilon_1.$$
Now, let $\mathcal{E}_2$ be the set $\mathcal{E}$ identical to $E_{l(1/2)}$ in $\Delta_2$. Let $\mathcal{E}_i$ be the set identical to $\mathcal{E}_{i-1}$ to
the exterior of $\Delta_i$ and identical to $E_{l(1/i)}$ in $\Delta_i$. Now, it is clear that the mean densities
of $\mathcal{E}_2, \mathcal{E}_3, \dots$ are all greater than $1 - \varepsilon_1$ in the intervals containing $x_0$, contained in $\Delta_1$
and containing $\Delta_2$; that the mean densities of $\mathcal{E}_3, \mathcal{E}_4, \dots$ are all greater than $1 - \varepsilon_2$ in the
intervals containing $x_0$, contained in $\Delta_2$ and containing $\Delta_3$, ....
Therefore, the mean density of $E$ is at least equal to $1 - \varepsilon_1, 1 - \varepsilon_2, \dots$, respectively, in
the various types of considered intervals; the density of $E$ at the point $x_0$ is therefore equal to
one and that of the complement $CE$ of $E$ is consequently zero, because the sum of the mean
densities, in the same interval, of two complementary sets is quite obviously equal to one.
Thus:
A measurable function $f(x)$ is continuous at each point $x_0$, provided we neglect the points
of a set of zero density at $x_0$,18 with the exception, however, of the points $x_0$ belonging to
an exceptional set of measure zero.
Hence, it follows, in particular, that the derivatives of functions $F(x)$ of bounded variation,
whose existence has been shown, are continuous almost everywhere, provided we neglect sets of
zero density. Everyone can explore the relations between this continuity and possibility
18 The neglected set, of zero density at $x_0$, varies with the point $x_0$; it is the set $CE$ of the text.
10.3 The Rectification of Curves
Let a curve be given by the continuous functions $x(t), y(t), z(t)$, defined on some interval
$(a, b)$. By applying to the radical, which represents the length of a chord of this curve, the
inequality
$$|l| \le \sqrt{l^2 + m^2 + n^2} \le |l| + |m| + |n|,$$
we concluded in Chap. 4 that the curve is rectifiable if and only if $x(t), y(t), z(t)$ are of
bounded variation. When this condition is met, the arc given by the values of $t$ in an interval
$\delta$ has a length between the total variation $V_x(\delta)$ of $x$ in $\delta$ [or $V_y(\delta)$, or $V_z(\delta)$], and the sum
$V_x(\delta) + V_y(\delta) + V_z(\delta)$.
If $s(t)$ denotes the arc length from $t = a$ to an arbitrary $t$, we can write:
$$V_x(\delta) \le V_s(\delta) \le V_x(\delta) + V_y(\delta) + V_z(\delta).$$
Let us apply this double inequality to all the intervals of a set $I$ of non-intersecting intervals,
and add. We obtain a result that can be denoted by
$$V_x(I) \le V_s(I) \le V_x(I) + V_y(I) + V_z(I).$$
Let us now vary $I$ in such a way that $m(I)$ tends towards zero; we see that $s(t)$ is
absolutely continuous if, and only if, $x(t), y(t), z(t)$ are absolutely continuous. The set of
singularities of $s(t)$ is the sum of the sets of singularities of $x(t), y(t), z(t)$.
Therefore, we can consider as a set of singularities the set $\mathcal{E}_s$ of values of $t$ where
$x(t), y(t), z(t), s(t)$ do not all have finite and definite derivatives.
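The double inequality between the length and the variations of the coordinates already holds for every inscribed polygon, chord by chord. The following is only a numerical sketch of that fact, not part of the original argument: the helix, the number of sample points, and the helper names `variation` and `polygon_length` are assumptions made for the illustration.

```python
import math

def variation(values):
    """Sum of the absolute increments of one coordinate along the subdivision."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))

def polygon_length(xs, ys, zs):
    """Length of the inscribed polygon with vertices (xs[k], ys[k], zs[k])."""
    points = list(zip(xs, ys, zs))
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

# Hypothetical example curve: one turn of the helix (cos t, sin t, t).
n = 1000
ts = [2 * math.pi * k / n for k in range(n + 1)]
xs = [math.cos(t) for t in ts]
ys = [math.sin(t) for t in ts]
zs = list(ts)

vx, vy, vz = variation(xs), variation(ys), variation(zs)
length = polygon_length(xs, ys, zs)

# The polygon length lies between V_x and V_x + V_y + V_z on the same subdivision.
print(vx, length, vx + vy + vz)
```

For this smooth curve the polygon length is also close to the integral of the speed, $2\pi\sqrt{2}$, which is the absolutely continuous situation described in the text.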
This being said, let us enclose $\mathcal{E}_s$ in a set $A_i$ of non-intersecting intervals. Then, a positive
$\varepsilon$ being arbitrarily chosen, let us enclose in another set $A_p$ of non-intersecting intervals the
set $E_p$ of values of $t$ for which
$$p\varepsilon \le \sqrt{x'^2 + y'^2 + z'^2} < (p + 1)\varepsilon.$$
Now, let us cover $(a, b)$, starting from $a$, using a chain of intervals chosen among the
following:
a. To a point $t_0$ of $\mathcal{E}_s$ let us attach the interval with its origin at $t_0$ and its extremity coinciding
with the extremity of the interval from the set $A_i$ which contains $t_0$;
b. To a point $t_0$ of $E_p$, let us attach an interval of origin $t_0$, contained in the interval from the
set $A_p$ which contains $t_0$, and such that the length of the chord provided by this interval
$\delta$ differs from
$$m(\delta)\sqrt{x'(t_0)^2 + y'(t_0)^2 + z'(t_0)^2}$$
by less than $\varepsilon$.
Let us use this chain, or rather the corresponding inscribed polygon, to calculate an approximate
value of $s(b)$. The contribution of the intervals of type a is at most $V_s(A_i)$. Moreover,
it will be as close as we want to this value if we modify the construction of the chain as
follows: we proceed as indicated only outside the first $p$ intervals $\delta_1, \dots, \delta_p$ constituting
$A_i$, and we subdivide $\delta_1, \dots, \delta_p$ into extremely small partial intervals. With this
modification, we can thus assume, by suitably choosing the $A_i$, that the contribution of the
intervals of type a differs as little as we want from the lower limit of $V_s(A_i)$ when we let
$m(A_i)$ tend towards zero, that is, from the value $s_s(b)$ taken for $t = b$ by the function of
singularities of $s(t)$.
where $\alpha_p$ denotes the part of $A_p$ not covered by $A_i$ and the other $A_p$ intervals. By appropriately
choosing $\varepsilon$, $m(A_i)$, $m(A_p)$ to be sufficiently small, we can ensure that the two previous
sums differ arbitrarily little from
$$\int_{\mathcal{E}} \sqrt{x'^2 + y'^2 + z'^2}\,dt,$$
where $\mathcal{E}$ is the set of points of $(a, b)$ which do not belong to $\mathcal{E}_s$. Before concluding, let us note
that if we had formed $\mathcal{E}_s$ using the points where one of the numbers $x', y', z'$ was infinite with
a definite sign, and $\mathcal{E}$ from points where $x', y', z'$ were all three finite and definite, it would
not have changed either the lower limit of $V_s(A_i)$ or the integral $\int_{\mathcal{E}} \sqrt{x'^2 + y'^2 + z'^2}\,dt$;
therefore:
If we denote by $\mathcal{E}$ the set of values of $t$ for which $x'(t), y'(t), z'(t)$ are finite and definite,
and by $\mathcal{E}_s$ the set of values of $t$ for which at least one of the derivatives $x'(t), y'(t), z'(t)$ has
an infinite value of definite sign, the length of the rectifiable curve $x(t), y(t), z(t)$ is equal
to
$$\int_{\mathcal{E}} \sqrt{x'^2 + y'^2 + z'^2}\,dt + s_s(b),$$
where $s_s(b)$ is the lower limit of the sums of the lengths of arcs of the curve which contain all
the points of this curve given by values of $\mathcal{E}_s$.19
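A classical example, which is not taken from the text, shows both terms of this formula at work: the plane curve $x = t$, $y = c(t)$ on $(0, 1)$, where $c$ is the Cantor function. Almost everywhere $x' = 1$ and $y' = 0$, so the integral term equals 1, while the graph is known to have length 2; the remaining unit is the singular term $s_s(1)$. The sketch below, with the assumed helper names `cantor` and `staircase_polygon_length`, computes inscribed polygons with vertices at the abscissas $k/3^n$ and shows their lengths increasing towards 2.

```python
import math

def cantor(k, n):
    """Cantor function at x = k / 3**n (0 <= k <= 3**n), read off the ternary digits."""
    if k == 3**n:
        return 1.0
    value, weight = 0.0, 0.5
    for j in range(n - 1, -1, -1):
        digit = (k // 3**j) % 3
        if digit == 1:
            # x lies in the closure of a removed interval, where the function is flat.
            return value + weight
        if digit == 2:
            value += weight
        weight /= 2
    return value

def staircase_polygon_length(n):
    """Length of the polygon inscribed in the graph, with vertices at x = k / 3**n."""
    step = 3.0 ** (-n)
    heights = [cantor(k, n) for k in range(3**n + 1)]
    return sum(math.hypot(step, b - a) for a, b in zip(heights, heights[1:]))

lengths = [staircase_polygon_length(n) for n in (2, 4, 6, 8)]
print(lengths)
```

At level $n$ the polygon has $3^n - 2^n$ horizontal chords of length $3^{-n}$ and $2^n$ rising chords, which gives the closed form $1 - (2/3)^n + \sqrt{1 + (2/3)^{2n}}$ used as a check.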
If we correct $s(t)$ by its function of singularities, we have an absolutely continuous
function
19 The proof shows that we can replace in this statement the words ‘the lower limit of the sum of
the arc lengths’ by ‘the lower limit of the limit of the sum of the chord lengths of the arcs’.
$$AC_s(t) = s(t) - s_s(t) = \int_a^t \sqrt{x'^2 + y'^2 + z'^2}\,dt,$$
which has $\sqrt{x'^2 + y'^2 + z'^2}$ as its derivative almost everywhere; but $s_s(t)$ has a zero derivative
almost everywhere. Therefore, we have almost everywhere
$$s'(t)^2 = x'(t)^2 + y'(t)^2 + z'(t)^2,$$
it holds almost everywhere; the expression ‘almost everywhere’ now being relative to the
measure with respect to the variable $s$. Wherever it holds, $x'(s), y'(s), z'(s)$ exist and are
not all three zero; therefore, if we exclude the points of a rectifiable curve given by a set
of measure zero for the values of the arc length $s$, at every point of the curve there exists a
well-defined tangent, and we have
We will now demonstrate the theorem that we have already stated in Sect. 7.2:
For a function to be of class one at most, it is necessary and sufficient that it be point-wise
discontinuous on any perfect set.
This theorem, due to M. Baire,1 is beyond the scope of our subject. However, on the one hand,
we will use the necessary condition it expresses, and on the other hand, and most importantly,
the transfinite method that we will use to prove that the stated condition is sufficient, is the
very same method that enabled M. Denjoy to completely solve the problem of primitive
functions.2 Also, we will henceforth consistently use transfinite numbers. Readers who are
not familiar with the use of these numbers should study the Note located at the end of the
volume before reading this chapter.
Let us show that the condition is necessary, meaning that if $f$ is the limit of a convergent
sequence of continuous functions $f_1, f_2, \dots$, and if $E$ is a perfect set, there exist points in
$E$ at which the function $f$, considered only on $E$, is continuous. Let us denote by $E_{n,p}$ the
1 Annali di Matematica, 1899. See also the book published by M. Baire in this collection.
2 The Memoires of M. Denjoy published in Journal de Mathematiques, 1915, in Bulletin de la Societe
mathematique de France, 1915 and in Annales scientifiques de l’Ecole Normale, 1916, 1917.
3 By point contained in an interval, we mean a point included between the extremities and different
from these extremities.
to $E_1$,4 where $I_1$ is an interval with midpoint $x_0$ and is taken to be sufficiently small. $I_1$ will
be interior to $I_0$; it will contain points of $E$ and no points of $E_1$.
If, in $I_1$, $E$ and $E_2$ are not identical, then in $I_1$ we can find a point $x_1$ belonging to $E$
but not belonging to $E_2$, starting from which we can define an interval $I_2$ contained in $I_1$,
containing points of $E$ and no points from $E_2$, etc.
Now the sequence of $I_1, I_2, \dots$ cannot be infinite because, in the interior of all these
intervals, there would be points; these points would belong to $E$ and would not belong to
any of the $E_n$, which is impossible.5 Therefore, we will arrive at one last interval, say $I_p$, and,
in $I_p$, $E$ and $E_{p+1}$ are identical.
Thus, we can find an interval $I$ and a value of $n$ such that, in $I$, $E$ and $E_n$ are identical
and actually have points. In $I$ let us take an interval $i$, containing points of $E$, and such that the
oscillation of $f_n$ in $i$ is smaller than $\varepsilon$. Then, for $x_1$ and $x_0$ taken in $i$ and in $E$, we have,
for any $p$ positive or zero,
whence
$$|f_{n+p}(x_1) - f_n(x_0)| \le 2\varepsilon,$$
and as a result
$$|f(x_1) - f(x_0)| \le 2\varepsilon.$$
In other words, in $i$, $f$ and $f_{n+p}$ ($p$ positive or zero) are each constant to within $2\varepsilon$ and,
when $x$ and $p$ vary, they are equal to each other to within $4\varepsilon$. If now we take the numbers
$\varepsilon_1, \varepsilon_2, \varepsilon_3, \dots$ tending towards zero, we can find intervals $i_1, i_2, \dots$ contained within each
other, containing points of $E$, and in which the oscillation of $f$ over $\varepsilon$6 would be at most
$2\varepsilon_1, 2\varepsilon_2, \dots$ respectively. In the interior of all these intervals $i_k$, there will exist at least one
4 I will refrain from specifying the rule for the choices of $x_0, x_1, \dots, I_0, I_1, \dots$, as would be required
to meet the logical requirements mentioned in Sect. 5.2.
5 In other words, a perfect set $E$ cannot be the sum of a finite or a countably infinite number of sets
that are everywhere non-dense on $E$.
6 Translator’s note: Instead of $\varepsilon$, it is probably $i_1, i_2, \dots$ that the author intends to say.
the series is uniformly convergent, since it has its terms majorised by those of the series
$\sum \left(\frac{1}{2^k} + \frac{1}{2^{k-1}}\right)$, and all its terms are of class one at most.
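Therefore, $f_k$ is the limit of a sequence of continuous functions $\varphi_k^p$, and as $|f_k|$ does not
exceed $\frac{1}{2^k} + \frac{1}{2^{k-1}}$, if we set, for $k > 1$,
$$\psi_k^p = \begin{cases}
\varphi_k^p, & \text{when we have } |\varphi_k^p| \le \frac{1}{2^k} + \frac{1}{2^{k-1}}; \\
+\left(\frac{1}{2^k} + \frac{1}{2^{k-1}}\right), & \text{when we have } \varphi_k^p > \frac{1}{2^k} + \frac{1}{2^{k-1}}; \\
-\left(\frac{1}{2^k} + \frac{1}{2^{k-1}}\right), & \text{when we have } \varphi_k^p < -\left(\frac{1}{2^k} + \frac{1}{2^{k-1}}\right);
\end{cases}$$
for fixed $k$ and $p$ increasing indefinitely, the sequence of $\psi_k^p$ has the same limit as that of
$\varphi_k^p$.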
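The truncation defining $\psi_k^p$ is an ordinary clamp to the interval $\pm\left(\frac{1}{2^k} + \frac{1}{2^{k-1}}\right)$. As a small sketch (the function names `bound` and `psi`, and the sample sequence, are assumptions made for this illustration), one can check at a single point that clamping does not change the limit when the limit itself respects the bound.

```python
def bound(k):
    """The bound 1/2**k + 1/2**(k-1) that |f_k| does not exceed."""
    return 1 / 2**k + 1 / 2**(k - 1)

def psi(phi_value, k):
    """Truncate one value of phi_k^p to the interval [-bound(k), +bound(k)]."""
    b = bound(k)
    return max(-b, min(b, phi_value))

# Hypothetical data at one point: a limit value within the bound,
# approached by terms that overshoot the bound for small p.
k = 3
limit_value = 0.3                      # |0.3| <= bound(3) = 0.375
phi_values = [limit_value + 1 / p for p in range(1, 2001)]
psi_values = [psi(v, k) for v in phi_values]

print(phi_values[0], psi_values[0], psi_values[-1])
```

The truncated values stay within the bound for every $p$, which is what allows the series of the truncated continuous functions to be majorised term by term.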
p is a continuous function because the series of the second member has its terms majorised
.
most equal to $\omega$, the oscillation of $f(x)$ is less than $\omega + \varepsilon$ on a subset of $F$ contained in
an interval of length $\lambda$, as long as $\lambda$ is small enough, where $\varepsilon$ is any positive number.
With these remarks in mind, given a function $f(x)$ that is point-wise discontinuous on
any perfect set in an interval $(a, b)$, we can obtain a function $\varphi(x)$ that differs from $f(x)$
by at most $\eta$ through the repetition of the following operation: let us suppose that $\varphi(x)$ is
already constructed except at the points of a closed set $F$; we consider the set $\Phi$ of points
of $F$ at which the oscillation of $f(x)$ on $F$ is at least equal to $\frac{\eta}{2}$. $\Phi$ is a closed set non-dense
everywhere on $F$, since $f(x)$ is point-wise discontinuous on $F$.
$(\alpha, \beta)$ being any of the intervals contiguous to $\Phi$, let us subdivide it into two equal intervals
$(\alpha, z_0), (z_0, \beta)$; let us subdivide each of them into two equal intervals, giving us, in particular,
the two extreme intervals $(\alpha, z_{-1}), (z_1, \beta)$, which we further subdivide into two equal intervals
by the points $z_{-2}$ and $z_2$, respectively. Let us continue similarly by subdividing the extreme
intervals $(\alpha, z_{-2}), (z_2, \beta)$ into two equal intervals, etc. The considered interval $(\alpha, \beta)$ is thus
divided into a sequence, infinite in both directions, of intervals $(z_i, z_{i+1})$. In $(z_i, z_{i+1})$ there
are no points at which the oscillation of $f(x)$ on $F$ exceeds $\frac{\eta}{2}$; therefore, by subdividing
$(z_i, z_{i+1})$ into a sufficient number of equal parts, the oscillations of $f(x)$ on the subsets of $F$
$$\cdots < x_{-2} < x_{-1} < x_0 = \frac{\alpha + \beta}{2} < x_1 < x_2 < \cdots$$
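The points $z_i$ described above accumulate only at the two extremities $\alpha$ and $\beta$, so that every closed interval interior to $(\alpha, \beta)$ meets finitely many of the intervals $(z_i, z_{i+1})$. A minimal sketch of the repeated halving, using exact rationals, is given below; the function name `dyadic_points` and the truncation at a finite `depth` are assumptions of the illustration.

```python
from fractions import Fraction

def dyadic_points(alpha, beta, depth):
    """Return {i: z_i} for -depth <= i <= depth.

    z_0 is the midpoint of (alpha, beta); for i > 0, z_i is the midpoint of
    (z_{i-1}, beta) and z_{-i} is the midpoint of (alpha, z_{-i+1}).
    """
    alpha, beta = Fraction(alpha), Fraction(beta)
    z = {0: (alpha + beta) / 2}
    for i in range(1, depth + 1):
        z[i] = (z[i - 1] + beta) / 2
        z[-i] = (alpha + z[-i + 1]) / 2
    return z

z = dyadic_points(0, 1, 3)
ordered = [z[i] for i in sorted(z)]
print(ordered)
```

On $(0, 1)$ this gives $z_i = 1 - 2^{-(i+1)}$ and $z_{-i} = 2^{-(i+1)}$ for $i \ge 0$.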
We agree that, at points of $F$ where we have
$$x_i \le x < x_{i+1},$$
we take
$$\varphi(x) = \frac{l_i + L_i}{2},$$
where $l_i$ and $L_i$ are the lower and upper limits of the values taken by $f(x)$ on the subset of $F$
located in $(x_i, x_{i+1})$.
If $a$ belongs to $F$ and not to $\Phi$, we will set $\varphi(a) = f(a)$, and similarly, if $b$ belongs to $F$
without belonging to $\Phi$, we will set $\varphi(b) = f(b)$.
It is clear that $\varphi(x)$, now defined for every point not belonging to $\Phi$, differs from $f(x)$
by $\eta$ at most outside of $\Phi$.
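The choice of the mid-value $\frac{l_i + L_i}{2}$ can be tried out in the simplest possible situation. The sketch below is not the general construction: it assumes that $F$ is the whole interval, that $f$ is increasing (so that its lower and upper limits on a subinterval are its values at the endpoints), and that a single interval is cut into equal parts; the names `midrange_step_function` and `phi` are hypothetical. When every oscillation is at most $\eta$, the step function differs from $f$ by at most $\frac{\eta}{2}$.

```python
def midrange_step_function(f, a, b, eta):
    """Return (nodes, values): the step function equals values[i] on [nodes[i], nodes[i+1]).

    Assumes f is increasing on [a, b]. The number of equal parts is doubled
    until the oscillation of f on every part is at most eta.
    """
    n = 1
    while True:
        nodes = [a + (b - a) * k / n for k in range(n + 1)]
        if all(f(r) - f(l) <= eta for l, r in zip(nodes, nodes[1:])):
            break
        n *= 2
    values = [(f(l) + f(r)) / 2 for l, r in zip(nodes, nodes[1:])]
    return nodes, values

def phi(x, nodes, values):
    """Evaluate the step function at a point x with nodes[0] <= x < nodes[-1]."""
    width = nodes[1] - nodes[0]
    i = min(int((x - nodes[0]) / width), len(values) - 1)
    return values[i]

f = lambda x: x * x                      # hypothetical increasing function on [0, 1]
nodes, values = midrange_step_function(f, 0.0, 1.0, eta=0.1)
worst = max(abs(phi(j / 1000, nodes, values) - f(j / 1000)) for j in range(1000))
print(len(values), worst)
```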
We will explain how the transfinite repetition of this process allows determining $\varphi(x)$
throughout $(a, b)$, then we will prove that $\varphi(x)$ is at most of class one outside $\Phi$.
Let us apply our method to the case where $F$ is the entire interval $(a, b)$; this operation
$O_1$ provides us with $\varphi$ except at the points of a closed set $\Phi$, which we denote by $e_1$. Let us
take this set $e_1$ as the new $F$. The subsequent operation, the operation $O_2$, will provide us
with $\varphi$ except at the points of a set $\Phi$ that we denote by $e_2$. Then we will take $e_2$ as the new
$F$, leading to an operation $O_3$, etc. If it happens that $e_n$ does not exist, then $\varphi$ will be entirely
determined by the operation $O_n$. This can occur for an arbitrary value of $n$, for example, for
$n = 1$. However, it can also happen that we exhaust the sequence of finite integer indices $n$
without determining $\varphi(x)$ in the whole of $(a, b)$. All the sets $e_1, e_2, \dots$ then exist; they are closed, and each
of them contains the next. Therefore there are points common to all these sets, and these
points constitute a closed set which we will denote by $e_\omega$.
The operation $O_\omega$, which is of a different nature from the previous ones, will simply
involve constructing $e_\omega$ and recognising that, after the operations preceding $O_\omega$, the
function $\varphi$ is known except at the points of $e_\omega$.
The operation $O_{\omega+1}$ will be the one in which we take $e_\omega$ as the known set $F$. In general,
the operation $O_{\alpha+1}$, following immediately the operation $O_\alpha$ (which provides $\varphi(x)$ except
at the points of $e_\alpha$), will be the one in which we take $e_\alpha$ as the new set $F$.
Whenever we have exhausted the finite and transfinite indices less than a transfinite
number of the second kind $\alpha$ without arriving at the definition of $\varphi$ over the whole of $(a, b)$, it means
that the sets $e_\beta$ exist for all $\beta < \alpha$. In this case, there are points common to all the $e_\beta$ sets,
and they constitute a closed set $e_\alpha$. The operation $O_\alpha$ then reduces to constructing $e_\alpha$ and
recognising that, after the operations $O_\beta$ $(\beta < \alpha)$, we know $\varphi(x)$ except at the points of $e_\alpha$.
The family of operations $O$ is thus defined and provides a well-ordered sequence of
closed sets $e_1, e_2, \dots$ such that each one contains all the subsequent ones, and each one is
non-dense everywhere in those which precede it, since $f$ is point-wise discontinuous on any
closed set $F$. Therefore, two sets $e_i, e_j$ of different indices cannot be identical. Also (see
the Note at the end of this volume) the sequence of the $e_i$ is at most countable. In other words,
after a finite or countably infinite number of operations, we arrive at an operation $O_\mu$ for
which there is no exceptional set $e_\mu$, meaning that it reveals $\varphi(x)$ over the entire $(a, b)$.
Let us now show that $\varphi(x)$ is of class one at most in the interval $(a, b)$, which we denote
by $e_0$.
Considering an arbitrary point $X$ from $e_0$, $X$ belongs to the sets $e_0, e_1, \dots$. However,
since $e_\mu$ does not exist, there is a first index $\alpha$, at most equal to $\mu$, from which $X$ no longer
belongs to $e_\alpha$. Moreover, $\alpha$ is of the first kind; a point belonging to all the $e_\beta$ sets of indices
less than a number $\gamma$ of the second kind indeed belongs to $e_\gamma$, by the very definition of $e_\gamma$. Therefore,
the function $\varphi(x)$ has been defined at the point $X$ during the operation $O_\alpha$ and through the
intermediary of an interval $(x_i, x_{i+1})$ containing $X$ obtained from the subdivision of an
interval contiguous to $e_\alpha$. If $X$ is at a distance of at least $\frac{1}{p}$ from $e_\alpha$, and if we have
$$x_i \le X \le \frac{p\,x_{i+1} + x_i}{p + 1},$$
we set
$$\varphi_p(X) = \varphi(X).$$
We set
$$\varphi_p(a) = \varphi(a) = f(a) \quad \text{and} \quad \varphi_p(b) = \varphi(b) = f(b).$$
The points at which $\varphi_p(x)$ is thus defined, excluding the points $a$ and $b$, naturally form
closed sets; $E_\alpha$ will be the set of those points $X$, belonging to all the $e_\beta$ sets for which $\beta$
is less than $\alpha$ and not belonging to $e_\alpha$, to which our definition of $\varphi_p(x)$ applies.
The different $E_\alpha$ sets are at least at a distance of $\frac{1}{p}$ from each other; therefore there
are at most $p(b - a)$ of them that actually exist. Each of them decomposes, through the
consideration of the intervals $\left(x_i, \frac{p\,x_{i+1} + x_i}{p + 1}\right)$, into a finite number of closed sets and, on each of
these, $\varphi(x)$, and therefore $\varphi_p(x)$, is constant.
Finally, $\varphi_p(x)$ is defined by the condition of being constant on a finite number of closed
sets separated from each other; therefore, we can complete the definition of $\varphi_p(x)$ in $(a, b)$
in such a way that $\varphi_p(x)$ is continuous in the entire interval $(a, b)$.
It is clear that $\varphi(x)$ is the limit of the functions $\varphi_p(x)$ when $p$ increases indefinitely; at any
point $X$ we indeed have $\varphi(X) = \varphi_p(X)$ from a certain value of $p$ onwards, which we determine as
follows: let $\alpha$ be the index from which $X$ no longer belongs to $e_\alpha$, and let $l$ be the distance
of $X$ from $e_\alpha$. Consider $(x_i, x_{i+1})$, the interval obtained from the subdivision of intervals
contiguous to $e_\alpha$ and such that we have
$$x_i \le X \le x_{i+1};$$
$p$ is the smallest integer at least equal to both $\frac{1}{l}$ and $\frac{X - x_i}{x_{i+1} - X}$.
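The second bound on $p$ is just the condition $X \le \frac{p\,x_{i+1} + x_i}{p + 1}$ solved for $p$: it is equivalent to $X - x_i \le p\,(x_{i+1} - X)$. A small sketch with exact rationals checks this on one hypothetical configuration; the names `first_index` and `defined_at`, and the sample values, are assumptions of the example, and the case $X < x_{i+1}$ is assumed.

```python
import math
from fractions import Fraction

def first_index(X, x_i, x_next, distance):
    """Smallest integer p >= 1 with p >= 1/distance and p >= (X - x_i)/(x_next - X).

    Requires x_i <= X < x_next and distance > 0.
    """
    X, x_i, x_next, distance = map(Fraction, (X, x_i, x_next, distance))
    return max(1, math.ceil(1 / distance), math.ceil((X - x_i) / (x_next - X)))

def defined_at(p, X, x_i, x_next, distance):
    """The two conditions under which the text sets phi_p(X) = phi(X)."""
    X, x_i, x_next, distance = map(Fraction, (X, x_i, x_next, distance))
    return distance >= Fraction(1, p) and X <= (p * x_next + x_i) / (p + 1)

# Hypothetical values of X, x_i, x_{i+1} and of the distance l from X to e_alpha.
args = (Fraction(3, 4), 0, 1, Fraction(1, 5))
p0 = first_index(*args)
print(p0, [defined_at(p, *args) for p in range(1, 8)])
```

Both conditions remain satisfied for every larger $p$, which is why $\varphi_p(X) = \varphi(X)$ from that index onwards.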
Let $f(x)$ be a derivative function which is finite everywhere in $(a, b)$. From what we have
learned so far, we know how to find the primitive function of $f(x)$ when $f(x)$ is summable.
It is clear that the processes of Cauchy and Dirichlet allow us to attain, from there, the
primitive function of $f(x)$ when the points of non-summability7 of $f(x)$ form a reducible
set. Let us limit ourselves to the case where the two extremities of the considered interval
$(a, b)$ are the only points of non-summability of $f(x)$; then we can obtain the primitive
When we let $\alpha$ and $\beta$ tend towards $b$, in such a way that we have
From this particular case we will immediately deduce a very extensive result. For this
purpose, let us note that the points of non-summability of a function $f(x)$ necessarily form
a closed set $E$, since, if $x_0$ is a point of summability, meaning that $x_0$ is interior to an
interval in which $f(x)$ is summable, all the points sufficiently close to $x_0$ to be
interior to the same interval are also points of summability.
To say that a point belongs to $E$ means that $f(x)$ is not summable over any interval
containing this point. However, it does not necessarily imply that $f(x)$ is not summable
over $E$ around that point. We will examine precisely this case where $f(x)$ is summable over
$E$.
$F_1(x)$ at $\alpha$. Similarly, we know the right-hand derivative of $F_1(x)$ at $\beta$. Therefore, if $F_1(x)$
is of bounded variation, meaning that if its right-hand derivative, for example, is summable,
we can compute $F_1(x)$. Now, the right-hand derivative of $F_1(x)$ is summable only if it is
summable, on the one hand over $E$, that is, if $f(x)$ is summable over $E$, and on the other hand
over the set of intervals contiguous to $E$, that is, if the series $\sum [F(\beta) - F(\alpha)]$ extended
to the intervals contiguous to $E$ is absolutely convergent. Therefore, when the previous
conditions are fulfilled, we have
$$F(b) - F(a) = \int_E f(x)\,dx + \sum [F(\beta) - F(\alpha)].$$
To give this result its complete scope, let us note that we have only used the fact that $E$ is
closed; therefore:
If we know the primitive function $F(x)$ of a function $f(x)$, given on an interval $(a, b)$,
in every interval $(\alpha, \beta)$ contiguous to a closed set $E$;
if the series $\sum [F(\beta) - F(\alpha)]$ is absolutely convergent; if $f(x)$ is summable over $E$,
then we have
$$F(b) - F(a) = \int_E f(x)\,dx + \sum [F(\beta) - F(\alpha)].$$
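The bookkeeping of this formula can be followed on a deliberately simple case, in which $E$ is a finite union of closed intervals and there are only finitely many contiguous intervals; the formula then reduces to a telescoping sum. The sketch below makes that check with exact rationals. The choices $F(x) = x^2$, $f(x) = 2x$, the set $E$, and the helper `midpoint_integral` are all assumptions of the illustration; the midpoint rule is exact here only because $f$ is linear.

```python
from fractions import Fraction as Fr

def F(x):
    """Hypothetical primitive."""
    return x * x

def f(x):
    """Its derivative."""
    return 2 * x

def midpoint_integral(g, c, d, panels=4):
    """Midpoint rule on [c, d]; exact for the linear integrand used here."""
    h = (d - c) / panels
    return sum(g(c + (j + Fr(1, 2)) * h) * h for j in range(panels))

a, b = Fr(0), Fr(1)
E = [(Fr(1, 5), Fr(2, 5)), (Fr(3, 5), Fr(7, 10))]   # closed intervals forming E

# Intervals of (a, b) contiguous to E: the gaps left by the pieces of E.
cuts = [a] + [p for piece in E for p in piece] + [b]
contiguous = [(cuts[j], cuts[j + 1]) for j in range(0, len(cuts), 2)]

integral_over_E = sum(midpoint_integral(f, c, d) for c, d in E)
sum_over_gaps = sum(F(beta) - F(alpha) for alpha, beta in contiguous)
print(integral_over_E, sum_over_gaps, F(b) - F(a))
```

The interest of the theorem lies in the cases this sketch does not cover, where $E$ is non-dense with infinitely many contiguous intervals and the absolute convergence of the series is a genuine hypothesis.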
8 We introduce here the extremities $a$ and $b$ of the considered interval, because we have agreed to
consider $(a, x_1)$ and $(x_\omega, b)$ as two intervals contiguous to $E_1$, where $x_1$ and $x_\omega$ are, respectively,
the points of smallest and greatest abscissa of $E_1$.
Now, we will see that, as soon as the first of the three conditions of the previous statement9
is met, there exists a partial interval $i$, in the interior of which $E$ has points, and in which
the two other conditions of the statement are also satisfied.
Let us suppose, indeed, that $F(x)$ is known in every interval contiguous to a closed set
$E$. Then:
Either $E$ is not perfect; let us take an interval $i$ containing in its interior only one point of
$E$, which is possible since $E$ has isolated points; in $i$, the three conditions of the statement
are fulfilled;
Or $E$ is perfect. We have learned, in Sect. 10.1, to choose a sequence of values $h_n$ tending
towards zero and such that the ratio $r[F(x), x, x + h]$ has an oscillation equal to $\frac{1}{n}$ at
most, for $h$ included between $h_{n+1}$ and $h_n$. Let us consider $f(x)$ as the limit of the continuous
functions $r[F(x), x, x + h_k] = f_k$. We know that we can determine an interval $i$, containing
points of $E$, and in which $f$ and $f_{n+p}$ are, on $E$, equal to within $4\varepsilon$, for all
positive values of $p$, with $n$ chosen suitably (Sect. 11.1). I assert that this interval $i$ answers
the question. Indeed, for $x$ located in $i$ and on $E$, $r[F(x), x, x + h]$, for $h < h_n$, differs by
$\frac{1}{n}$ at most from one of the functions $f_i$. Therefore, the value of $r[F(x), x, x + h]$ is within
$4\varepsilon + \frac{1}{n}$. Thus, except maybe for the intervals contiguous to $E$ which have a length greater
than $h_n$, and there are a finite number of them, every interval $(\alpha, \beta)$ contiguous to $E$ and
contained in $i$ provides for $r[F(x), \alpha, \beta]$ a value within $4\varepsilon + \frac{1}{n}$. Therefore, we have, for any
interval contiguous to $E$ and included in $i$,
for all the contiguous intervals contained in $i$. Therefore, it is quite clear that, for the part
of $E$ located in $i$, the series $\sum [F(\beta) - F(\alpha)]$ is absolutely convergent. On the
other hand, $f(x)$, being constant to within $2\varepsilon$ on $E$, is bounded, and therefore summable, and
all the requisite conditions for the application of our theorem are satisfied in $i$.
Hence, in any interval containing points of $E$, we can find another one, containing
points of $E$, and in which we know how to determine the primitive function of $f(x)$. The
points of $E$ which are not interior to such intervals therefore form a set, which is obviously
closed and non-dense everywhere on $E$. Let $H$ be this set; if $(l, m)$ is an interval contiguous
to $H$ and if we take $(\lambda, \mu)$ such that
$$l < \lambda < \mu < m,$$
9 I mentioned this statement in my Thesis in note in Sect. 4.1. It represents the extreme point that I
had reached in the research of primitive functions.
we know how to calculate the primitive function of $f(x)$ in $(\lambda, \mu)$, and therefore a passage
to the limit will provide us this function in $(l, m)$.
Thus: if we know how to determine, up to an additive constant, the primitive function of
a derivative function $f(x)$ in any interval contiguous to a closed set $E$, we know through
this very fact how to determine it in any interval contiguous to a closed set $H$, formed of
points of $E$ and non-dense everywhere on $E$.
This proposition will allow us to proceed by transfinite induction. First, let us take
the interval $(a, b)$ itself for $E$. Then the set $H$ is the set $E_1$ of points of non-summability
of $f(x)$, and we will call $O_1$ the operation which reveals $F(x)$ in the intervals contiguous
to $E_1$. Then, let us take $E_1$ for the set $E$. We will denote by $E_2$ the set $H$ given by $E_1$, and
$O_2$ will denote the operation which provides $F(x)$ in the intervals contiguous to $E_2$. And
so on. If we exhaust all the finite indices without arriving at $F(x)$ in the entire interval $(a, b)$, it
means that all the sets $E_1, E_2, E_3, \dots$ exist. As they are closed and each one contains all the
following ones, there are then points common to all these sets. These points form a closed
set $E_\omega$, contained in every $E_n$ and non-dense on each of them. The operation $O_\omega$ will consist of
deducing $F(x)$ in the intervals contiguous to $E_\omega$ from the knowledge of $F(x)$ in the intervals
contiguous to the $E_n$, by passing to the limit.
In a more general way, if the operations of indices less than $\alpha$ do not provide $F(x)$ in
the entire interval $(a, b)$, the operation $O_\alpha$ is defined as follows:
If $\alpha$ is finite or transfinite of the first kind, the operation $O_\alpha$ is the one which consists of
calculating $F(x)$ in the intervals contiguous to a closed set $E_\alpha$, which is obtained as the set $H$
when one assigns the role of $E$ to $E_{\alpha-1}$.
If $\alpha$ is transfinite of the second kind, then all the $E_\beta$ sets, for $\beta < \alpha$, exist; there are points
common to all these $E_\beta$ sets. These points form a closed set $E_\alpha$. The operation $O_\alpha$ consists
of determining $F(x)$ in the intervals contiguous to $E_\alpha$, by passing to the limit.
This finite or transfinite sequence of operations, which constitutes the totalisation, was
conceived by M. A. Denjoy. It is clear that the sequence of closed sets $E_1, E_2, E_3, \dots$, all
different from those which precede them and contained within them, can contain only a
finite or countably infinite number of terms. Therefore, the totalisation permits, in all cases,
the determination of the primitive function of a known derivative function, in the entire
interval where this derivative is given.
We will return to the operation of totalisation and investigation of the primitive functions
of derivative numbers later. For the time being, we will modify our operational procedure,
while preserving the essential transfinite reference, and in doing so, we will find the primitive
functions of the derivatives without using the concept of integral of the summable function.10
This procedure generalises that of Sect. 7.1.
10 The possibility of dispelling this notion is certain since we can always replace an integral by one
of the sums which serve its definition chosen in such a way as to commit only a negligible error,
then passing to the limit. However, this always involves the use of the integral, albeit in a concealed
manner.
Let $\eta$ be an arbitrarily chosen positive number. We will construct a function $\Phi(x)$ which
differs from $F(x)$ only by $\eta$ at most from the point of view of the differential; that is,
which is such that, in any positive interval $(\alpha, \beta)$, we have
$$|[\Phi(\beta) - \Phi(\alpha)] - [F(\beta) - F(\alpha)]| \le \eta(\beta - \alpha).$$
It is clear that, if we know how to construct this function $\Phi(x)$ for any $\eta$, we will deduce
$F(x)$ by passing to the limit.11 The construction of this function $\Phi(x)$ is based on the
following remarks:
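If we know a function $\Phi_1(x)$ for the positive interval $(x_1, x_2)$, and a function $\Phi_2(x)$ for
the positive interval $(x_2, x_3)$, then the function $\Phi(x)$ equal to $\Phi_1(x)$ in $(x_1, x_2)$ and given
by
$$\Phi(x) = \Phi_2(x) - \Phi_2(x_2) + \Phi_1(x_2)$$
in $(x_2, x_3)$, is a function $\Phi(x)$ for $(x_1, x_3)$. Indeed, if we take an interval $(\alpha, \beta)$ located in
$(x_1, x_2)$ or $(x_2, x_3)$, it is clear that we have
$$|[\Phi(\beta) - \Phi(\alpha)] - [F(\beta) - F(\alpha)]| \le \eta(\beta - \alpha);$$
if we have
$$x_1 \le \alpha < x_2 < \beta \le x_3,$$
we have
$$\begin{aligned}
|[\Phi(\beta) - \Phi(\alpha)] - [F(\beta) - F(\alpha)]| &\le |[\Phi(\beta) - \Phi(x_2)] - [F(\beta) - F(x_2)]| \\
&\quad + |[\Phi(x_2) - \Phi(\alpha)] - [F(x_2) - F(\alpha)]| \\
&\le \eta(\beta - x_2) + \eta(x_2 - \alpha).
\end{aligned}$$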
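The gluing rule can be tried numerically. In the sketch below everything is hypothetical: $F = \sin$, the two pieces `phi1` and `phi2` (each within $\eta$ of $F$ from the point of view of the differential on its own interval, and differing by an unrelated additive constant), and the grid on which the inequality is tested. The glued function `phi` follows the formula of the text, and `defect` measures by how much the required inequality would fail.

```python
import math

eta = 0.1
x1, x2, x3 = 0.0, 1.0, 2.5
F = math.sin                                   # hypothetical F

def phi1(x):
    """A function within eta of F, in the differential sense, on (x1, x2)."""
    return F(x) + 0.5 * eta * x

def phi2(x):
    """Likewise on (x2, x3), with an unrelated additive constant."""
    return F(x) - 0.5 * eta * x + 7.0

def phi(x):
    """Glued function: phi1 on (x1, x2), shifted phi2 on (x2, x3)."""
    if x <= x2:
        return phi1(x)
    return phi2(x) - phi2(x2) + phi1(x2)

def defect(alpha, beta):
    """|[phi(beta) - phi(alpha)] - [F(beta) - F(alpha)]| - eta * (beta - alpha)."""
    return abs((phi(beta) - phi(alpha)) - (F(beta) - F(alpha))) - eta * (beta - alpha)

grid = [x1 + (x3 - x1) * k / 200 for k in range(201)]
worst = max(defect(al, be) for al in grid for be in grid if al < be)
print(worst)   # negative: the inequality holds on every tested pair
```

The shift by $-\Phi_2(x_2) + \Phi_1(x_2)$ only removes the jump at $x_2$; it does not affect increments inside $(x_2, x_3)$, which is why the estimate on each piece is preserved.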
If we have an increasing (or decreasing) sequence of numbers $x_i$ tending towards a limit
$X$, then the repeated application of the previous procedure yields, for every $(x, X)$, a function
$\Phi$ derived from functions $\Phi_i(x)$ relative to the intervals $(x_i, x_{i+1})$.
It is obviously sufficient to prove that the function $\Phi$ resulting from the construction in the
statement is continuous at the point $X$. Now we have, by assuming, for example, an increasing
sequence with $x_n < x < X$,
The method which is about to be presented, and that I introduced in an article of Acta Mathematica
(t. 49), deviates more significantly from the totalisation as just presented and, on the contrary,
approaches the construction of the proof of the sufficient condition of the theorem of M. Baire.
11 The right-hand superior derivative number $\Lambda_d \Phi(x)$, for example, tends towards $f(x)$ when $\eta$
tends towards zero; we will compare the construction of $\Lambda_d \Phi(x)$ and the function $\varphi(x)$ defined in
Sect. 11.1.
therefore, in $(x_n, X)$, the oscillation of $\Phi(x)$ is, at most, the oscillation of $F(x)$, augmented
by $\eta(X - x_n)$. Therefore, the oscillation of $\Phi(x)$ in $(x_n, X)$ tends towards zero with the
length of $(x_n, X)$.
In particular, it follows from this that when we know how to determine a function $\Phi(x)$
in any interval $(\alpha, \beta)$ entirely interior to an interval $(a, b)$,
where $f_0$ is one of the values taken by $f(x)$ on $\mathcal{E}$; the indices $a$ and $x$ indicate that we will
deal only with the subsets of $\mathcal{E}$ and with the intervals contiguous to $\mathcal{E}$ located in $(a, x)$.
It will be sufficient for us to show this proposition for $x = b$. To do this, let us cover $(a, b)$,
starting from $a$, with a chain of intervals, some of which will be intervals contiguous to $\mathcal{E}$,
and the others intervals of length $\lambda$ at most, having as origin and extremity points $x, x + h$
from $\mathcal{E}$, and such that we have
$$m \le r[F(x), x, x + h] \le M,$$
where $m$ and $M$ are two numbers at a distance $\eta$ at most, and all the values taken by $f(x)$ on $\mathcal{E}$
are contained between them. We will evaluate $F(b) - F(a)$ using this chain, but first, it must
be noted that the series $\sum [F(\beta) - F(\alpha)]$, extended to the intervals $(\alpha, \beta)$ contiguous to
$\mathcal{E}$, is absolutely convergent, because $F(\beta) - F(\alpha)$ differs from $\Phi(\beta) - \Phi(\alpha)$ by $\eta(\beta - \alpha)$
at most.
This being said, the intervals of the chain will contribute to $F(b) - F(a)$ as
follows:
On the one hand, a part of the sum $\sum [F(\beta) - F(\alpha)]$, containing in particular all the terms
arising from the intervals $(\alpha, \beta)$ with a length greater than $\lambda$, therefore tending towards
$\sum [F(\beta) - F(\alpha)]$ when $\lambda$ tends towards zero; and on the other hand, the measure of the
lengths of the intervals of the second kind, that is, a measure tending towards $m(\mathcal{E})$ when $\lambda$
tends towards zero, multiplied by a number included between $m$ and $M$.
The sum of these two contributions is
$$\sum [\Phi(\beta) - \Phi(\alpha)] + f_0\, m(\mathcal{E})$$
within
$$\eta \sum (\beta - \alpha) + \eta\, m(\mathcal{E}) = \eta(b - a).$$
The statement is legitimate.
We have seen that, a closed set .E being given, it is possible to determine an interval .i
containing points of .E in its interior, in which . f (x) is within .η on .E , and for which the
sum . i [F(β) − F(α)], extended to subsets of the intervals contiguous to .E , which are
situated in .i, is absolutely convergent. Then, if we know the functions .(x) for each interval
contiguous to .E , the sum . i [(β) − (α) is also absolutely convergent, and we are within
the conditions of application of the previous theorem.
In other words, as soon as the first of the conditions in the previous statement is met, the
points of .E which are not interior to the intervals .i, in which the three conditions of that
statement are satisfied, necessarily form a closed set, which is non-dense everywhere on .E .
As a result, if we are able to determine functions . for all intervals contiguous to a closed
set .E , we can determine functions . for all intervals contiguous to a closed set .H, formed
from points of .E and non-dense everywhere on .E .
It is then clear that this statement allows us the construction of .(x) through transfinite
induction:
The operation . O1 will be the one in which we take .(a, b) as the set .E . The set .H will be
a set .E1 , which contains all the points at which the oscillation of . f (x) is greater than .η, and
some of those at which the oscillation is equal to .η. . O1 will reveal .(x) in every interval
not containing points of .E1 in its interior.
If .α is finite or transfinite of first kind, the operation . Oα−1 would have revealed . in any
interval not containing points of a closed set .Eα−1 in its interior. The operation . Oα will be
the one in which .Eα−1 plays the role of .E , it will lead as set .H to a set .Eα and reveal .(x)
in any interval not containing any point of .Eα in its interior.
If .α is of second kind, the points common to all .Eβ sets for .β < α, form a set .Eα . The
operations . Oβ have formed .(x) in every interval not containing any point of .Eα , neither
in its interior nor as origin or extremity.
The operation . Oα will provide .(x) in every interval not containing the points of .Eα in
its interior.
The function .(x) will be provided in the entire interval .(a, b) through this transfinite
induction. To determine this function, which, in the preceding discussion, depends on arbitrary
choices, it would be sufficient to fix these choices by laws. This would be easy, but it is
completely futile to insist on it.
Then, .(x) being defined for each number .η, by letting .η tend towards zero, we
would have . F(x) as the limit of .(x).
Therefore, we have two slightly different transfinite procedures, both of which allow us
to obtain the primitive function $F(x)$ from a given derivative $f(x)$; let us show by examples
that all the planned steps of these transfinite processes are necessary.12
12 A new operation $O_\alpha$ may be required due to the fact that one or another of the various conditions
stated in our propositions is not satisfied in the entire $(a, b)$. We may aim to show, through examples,
Let us denote by $\varphi(x)$ a function defined on $(0, 1)$, which is continuous and differentiable,
such that we have
$$|\varphi(x)| \le x^2, \qquad |\varphi(x)| \le (1 - x)^2,$$
which is of bounded variation in $(h, 1 - h)$, and of unbounded variation in $(0, h)$, $(1 - h, 1)$,
however small and positive $h$ may be, and whose derivative is continuous except for $x = 0$
and $x = 1$.
We could take for example,
$$\varphi(x) = x^2 (1 - x)^2 \sin \frac{1}{x^2 (1 - x)^2};$$
the set of roots of $\varphi'(x) = 0$ then forms a set whose derived set reduces to $0$ and $1$. But we
can also choose $\varphi(x)$ in such a way that, among the roots of $\varphi'(x) = 0$, all the points of any
perfect or closed set can be found.
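A short check of the properties claimed for this example may be useful. The sketch below assumes the function is $\varphi(x) = u(x) \sin \frac{1}{u(x)}$ with $u(x) = x^2 (1 - x)^2$, so that both bounds $|\varphi(x)| \le x^2$ and $|\varphi(x)| \le (1 - x)^2$ hold on $(0, 1)$. The abbreviation $u$ and the points $x_k$ are introduced here only for the computation; they are not the author's notation.

```latex
% Assumed abbreviation: u(x) = x^2 (1-x)^2, so that \varphi = u \sin(1/u) on (0,1).
\[ u'(x) = 2x(1-x)(1-2x), \qquad
   \varphi'(x) = u'(x)\left[\sin\frac{1}{u(x)} - \frac{1}{u(x)}\cos\frac{1}{u(x)}\right]. \]
% Differentiability at the endpoints follows from the two bounds,
% with \varphi(0) = \varphi(1) = 0:
\[ \left|\frac{\varphi(x)}{x}\right| \le x \to 0 \quad (x \to 0), \qquad
   \left|\frac{\varphi(x)}{x-1}\right| \le 1-x \to 0 \quad (x \to 1). \]
% Unbounded variation near 0: u increases from 0 to 1/16 on (0,1/2), so for
% every k >= 5 there is a point x_k in (0,1/2) with 1/u(x_k) = \pi/2 + k\pi,
% and x_k -> 0. At these points \sin(1/u) = (-1)^k, hence
\[ \varphi(x_k) = \frac{(-1)^k}{\pi/2 + k\pi}, \qquad
   \sum_{k \ge 5} \left|\varphi(x_k) - \varphi(x_{k+1})\right|
   = \sum_{k \ge 5} \left[\frac{1}{\pi/2 + k\pi} + \frac{1}{\pi/2 + (k+1)\pi}\right] = +\infty . \]
```

The same computation applies near $x = 1$ by the symmetry $x \mapsto 1 - x$ of $u$. The factor $\frac{u'(x)}{u(x)} \cos \frac{1}{u(x)}$ in $\varphi'$ is what makes the derivative unbounded in every neighbourhood of $0$ and of $1$, although $\varphi'(0) = \varphi'(1) = 0$.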
Let $\lambda$ be any finite or transfinite number. We will define the functions $F_1(x), F_2(x), \ldots, F_\lambda(x)$
whose determination from their derivatives will respectively require the operations $O_1$; $O_1$
and $O_2$; $O_1$, $O_2$ and $O_3$; $\ldots$; $O_1$, $O_2$, $\ldots$, $O_\lambda$. And this applies to both of our
transfinite processes.
For that, let us arrange in a simply infinite sequence $S$ the finite and transfinite numbers
up to $\lambda$, say $\lambda_1, \lambda_2, \ldots$. If $\beta$ is a transfinite number of second kind, at most equal to $\lambda$,
we will call the sequence determining $\beta$ the one obtained by crossing out, in $S$, first every
number greater than or equal to $\beta$, and then any number in the resulting sequence that is
preceded by a larger number.
Let us denote by $E$ a closed set non-dense everywhere, chosen once and for all in $(0, 1)$.
$F_1(x)$ would be the function $\varphi(x)$.
$F_\alpha(x)$ would be, for an index $\alpha$, finite or transfinite of first kind, the function zero on $E$
and equal to
$$(m - l)\, F_{\alpha-1}\!\left(\frac{x - l}{m - l}\right)$$
in the interval $(l, m)$ contiguous to $E$. $F_\beta(x)$ would be,
for transfinite $\beta$ of second kind and determined by the sequence $\beta_1, \beta_2, \ldots$, the function equal
to
$$\frac{1}{2^p}\, F_{\beta_p}\!\left[2^p \left(x - \frac{1}{2^p}\right)\right]$$
for
$$\frac{1}{2^p} \le x \le \frac{1}{2^{p-1}}.$$
It is clear that the functions thus constructed are continuous and have derivatives which are
formed from $\varphi'$ just as the $F$ are formed from $\varphi$. We immediately see that the determination
of the function $F_\gamma$, whatever be its index $\gamma$, from its derivative, requires the operations
that each of these conditions, by itself, requires all the steps of the transfinite. This is what M. Denjoy
has done. The reader may refer to his work. Here, we will only show that the set of conditions in our
propositions necessitates the use of the transfinite in all its generality.
which are zero at the origin of these contiguous intervals. There is no need to integrate over $E$,
since $F_2(x)$ is zero on $E$. But it would be sufficient to add to each function $F_\alpha(x)$ a function
$u_\alpha(x)$ with a continuous derivative, in order to obtain a function $\mathcal{F}_\alpha(x)$, whose determination
from its derivative would require all the operations $O_1$, $O_2$, up to $O_\alpha$, these operations
comprising integrations over sets, integrations that are exact in the first method, that is, the
totalisation method, and approximate in the second.
When attempting to extend the methods from the previous section to the determination of the
primitive function of a given derivative number, which is everywhere finite, we encounter
difficulties from the start. These methods are indeed based on the fact that a derivative is a
function of class one at most and, as a result, is point-wise discontinuous on any perfect set.
However, we only know (Sect. 10.1) that the derivative numbers are of the second class at
most, and from this we have only been able to deduce that they are $B$ measurable.
If we examine the two methods from the previous section a little more closely, we note
that while the second one indeed makes use of the fact that the derivative is point-wise
discontinuous on any perfect set, the first one, in reality, relies on a proposition that we can
formulate thus: An everywhere finite derivative function is point-wise unbounded on any
perfect or closed set.14
However, M. Denjoy obtained a proposition concerning any derivative number, which
precisely replaces the previous one in the search for the primitive function, and it can be
stated as follows:
When the right-hand superior derivative number $\Lambda_d F(x)$ of a continuous function $F(x)$
is everywhere finite or at least never equal to $+\infty$, it is point-wise unbounded from above
on any closed set.
Or, in a more precise way: when the right-hand superior derivative number $\Lambda_d F(x)$ of
a continuous function $F(x)$ is not equal to $+\infty$ at any point of a closed set $E$, then there
exists a positive number $M$ and an interval $I$, containing points of $E$ in its interior, such
13 Indeed, it happens that, on the closed sets that the transfinite sequence of operations lead us to
consider, there is an identity between the points of discontinuity of the derivative, its points of non
summability and the points around which the derivative is unbounded.
14 It is appropriate to make, in context of this statement and the following ones, an observation similar
to the one formulated in note, Sect. 7.2; therefore, we must understand that any derivative is either
bounded or point-wise unbounded on any closed set.
11.3 The Primitive Functions of Everywhere Finite Derivative Numbers 223
that for any interval $(\alpha, \beta)$ whose origin is a point of $E$ and of $I$, we have
$$r[F(x), \alpha, \beta] < M.$$
It is clear that the second statement implies the first one.15 It is also clear that, when we have
demonstrated them for any perfect set, they will be proved by that very fact for any closed
set $E$, since any isolated point of $E$ is a point at which $\Lambda_d$ is not equal to $+\infty$, and as a
result, can be enclosed in an interval $I$ satisfying the conditions of the second statement.
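As a reading aid, the two statements can be written out in symbols. The display below is only a sketch: it assumes, following the usage of the earlier chapters, that $r[F(x), \alpha, \beta]$ denotes the difference quotient of $F$ between $\alpha$ and $\beta$, and that the right-hand superior derivative number is the upper limit of this quotient as the right endpoint approaches the origin.

```latex
% Assumed notation (restated here as a reading aid, not quoted from the text):
\[ r[F(x),\alpha,\beta] = \frac{F(\beta)-F(\alpha)}{\beta-\alpha}, \qquad
   \Lambda_d F(x_0) = \limsup_{h \to 0^+} r[F(x),x_0,x_0+h]. \]
% Second statement, in symbols:
\[ \exists\, M > 0,\ \exists\, I : \quad r[F(x),\alpha,\beta] < M
   \quad \text{for every } (\alpha,\beta) \text{ with } \alpha \in E \cap I . \]
% Taking \beta = \alpha + h and letting h -> 0^+ gives the first statement on E \cap I:
\[ \Lambda_d F(\alpha) \le M \quad \text{for every } \alpha \in E \cap I . \]
```

Read this way, the second statement bounds all difference quotients issuing from points of $E$ in $I$, which is stronger than bounding only their upper limit.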
Proof of the second statement.16
Let us denote by $E_{n,p}$ the set of points $x_0$ of a perfect set $E$ for which we have
$$r[F(x), x_0, x_0 + h] \le n,$$
$$r[F(x), \alpha, \beta] \le M,$$
$$F(b) - F(a) \le \int_{E - E_{in}} \Lambda_d F(x)\,dx + \sum [F(\beta) - F(\alpha)].$$
15 This second statement refines the first one, just as the one in Sect. 11.1 refines the necessary
condition for a function to be of class one.
16 The two previous statements replace the proposition that M. Denjoy calls the first fundamental
theorem (descriptive) concerning derivative numbers.
17 This property will replace here the second fundamental theorem (metric) concerning the derivative
numbers, of M. Denjoy.
Let $G(x)$ be the continuous function equal to $F(x)$ at points of $E$ and linear in intervals
contiguous to $E$. For $x_0$ the origin or an interior point of such an interval $(\alpha, \beta)$, we have
if $k$ is the upper limit mentioned in the statement. At the points of $E$, which are not origins
of intervals contiguous to $E$, we have moreover
$$\Lambda_d G(x) \le \Lambda_d F(x) \le k.$$
Since $\Lambda_d G(x)$ is bounded from above in $(a, b)$, $G(x)$ is of bounded variation and we
have, denoting $(a, b)$ by $H_0$ and the set of points where $\Lambda_d G(x) = -\infty$ by $E^G_{in}$,
$$G(b) - G(a) = \int_{H_0 - E^G_{in}} \Lambda_d G(x)\,dx - N(E^G_{in})$$
$$= \int_{E - E^G_{in}} \Lambda_d G(x)\,dx + \sum [G(\beta) - G(\alpha)] - N(E^G_{in});$$
the symbol $N(E^G_{in})$ has the meaning indicated in Sect. 10.1. At any point of $E$, we have
$\Lambda_d F(x) \ge \Lambda_d G(x)$, an inequality of which the second member is bounded except at the points
of $E^G_{in}$. Therefore, the set $E_{in}$ is contained in $E^G_{in}$, and as a result, is of measure zero.
Furthermore, the series $\sum l\varepsilon\, m(E_l)$ cannot be less than
$$\int_{E - E^G_{in}} [\Lambda_d G(x) - \varepsilon]\,dx,$$
since, almost everywhere on $E$, $\Lambda_d F(x)$ is at least equal to $\Lambda_d G(x)$. To be more precise,
we can say that negative values of $l$ provide sets $E_l$ which contribute at least as much to
$\sum l\varepsilon\, m(E_l)$ as they do to the previous integral, while the positive values of $l$ give $l\varepsilon$ at most
equal to $\lambda$; therefore $\int_{E - E_{in}} \Lambda_d F(x)\,dx$ exists and is at least equal to $\int_{E - E^G_{in}} \Lambda_d G(x)\,dx$.
It is true that the two integrals are extended to different sets, $E - E_{in}$ and $E - E^G_{in}$,
In the case where $E_{in}$ does not exist, let us cover the entire interval $(a, b)$ from $a$, using a
chain of intervals chosen as follows.18
Let us define, as was done in Chap. 9, Sect. 10.1 and following, sets $E_l$ and $A_l$ by
considering numbers $\varepsilon, \eta, \zeta$. However, this time we are constructing $E_l$ using only points
of $E$. The intervals in the chain with origins at points in $E$ are chosen to satisfy the three
conditions indicated in Chap. 9. The interval with origin at a point $x_0$ in an interval $(\alpha, \beta)$
contiguous to $E$ is the interval $(x_0, \beta)$.
The intervals of the first kind yield a contribution to the expression $F(b) - F(a)$, which
tends towards $\int_E \Lambda_d F\,dx$ when $\varepsilon, \eta, \zeta$ tend towards zero.
An interval $(x_0, \beta)$ of the second kind gives a contribution
$F(x_0) - F(\alpha)$ is at most $k(x_0 - \alpha)$, therefore the sum of the terms $F(x_0) - F(\alpha)$ is at most
$k\eta$ (see, in Chap. 9, the significance of $\eta$) and as a result, have zero or negative limits when
As a result we have
$$F(b) - F(a) \ge \int_E \Lambda_d F\,dx + \sum [F(\beta) - F(\alpha)].$$
From these two theorems it follows, in particular, that: If $E$ is a closed set, at the points
of which $\Lambda_d F(x)$ is finite, there exists an interval $i$ containing points of $E$ in its interior,
in which $\Lambda_d F(x)$ is summable over $E$, and for which the series $\sum [F(\beta) - F(\alpha)]$ of the
increments of $F$ in intervals contiguous to $E$ is absolutely convergent.
Let us show, without assuming this time $\Lambda_d F(x)$ to be bounded from above in $i$, that the
increment of $F$ in $i$ is the sum of the integral of $\Lambda_d F(x)$ on the subset of $E$ located in $i$, and
of the series $\sum [F(\beta) - F(\alpha)]$ relative to $E$ and $i$, which we would denote
$$\mathcal{A}_{F(x)}(i) = \int_{i,E} \Lambda_d F(x)\,dx + \sum_i^E [F(\beta) - F(\alpha)].$$
Indeed, if it were not so, the points of $i$ which are not interior to intervals $j$ in which we
would have
18 It is clear that we can replace, in the above, the use of results borrowed from Chap. 9 with their
demonstration using chains of intervals; thus, at the cost of some length, we will have a more
homogeneous presentation.
$$\mathcal{A}_{F(x)}(j) = \int_{j,E} \Lambda_d F(x)\,dx + \sum_j^E [F(\beta) - F(\alpha)],$$
would form a closed set $H$. An interval contiguous to $H$ is the sum of a countably infinite
number of $j$ intervals without common interior points. For each of these $j$ intervals we have
the previous equality. The sum of all these equalities may be carried out since, by hypothesis,
$\int_{i,E} |\Lambda_d F(x)|\,dx$ and $\sum_i^E |F(\beta) - F(\alpha)|$ exist. And this proves that any interval contiguous
to $H$ is itself a $j$ interval.
Now, from the theorems of this section, it follows that we can find an interval $\lambda$ containing
points of $H$ in its interior, for which $r[F(x), x_0, x_0 + h]$ is bounded when $x_0$ is a point of
$H$, so that we have
$$\mathcal{A}_{F(x)}(\lambda) = \int_{\lambda,H} \Lambda_d F(x)\,dx + \sum_\lambda^H [F(m_r) - F(l_r)]$$
$$= \int_{\lambda,H} \Lambda_d F(x)\,dx + \sum_\lambda \mathcal{A}_{F(x)}(\delta_r),$$
where $\delta_r = (l_r, m_r)$ denotes the intervals contiguous to $H$ and located in $\lambda$. Each $\delta_r$ being a $j$
interval, we have
$$\mathcal{A}_{F(x)}(\delta_r) = \int_{\delta_r,E} \Lambda_d F(x)\,dx + \sum_{\delta_r}^E [F(\beta) - F(\alpha)].$$
Hence, it follows
$$\mathcal{A}_{F(x)}(\lambda) = \int_{\lambda,E} \Lambda_d F(x)\,dx + \sum_\lambda^E [F(\beta) - F(\alpha)],$$
there exists a positive number $M$ and an interval $I$ containing in its interior points of $E$,
such that, for every interval $(\alpha, \beta)$ whose origin is a point of $E$ and $I$, we have
$$r[F(x), \alpha, \beta] < M.$$
Indeed, $E$ is still the sum of a countably infinite number of closed sets, namely the $E_n$, and
various points $P_i$, each considered as forming its own closed set. These sets cannot all be
non-dense everywhere on $E$ (Sect. 11.1); however, all the $P_i$ are non-dense on $E$. Therefore,
one of the $E_n$ is dense in $E$ within an interval, meaning that it is identical to $E$ within that
interval, and the proof is complete as previously described.
Then, agreeing as always that we will leave aside the points $P_i$ at which $\Lambda_d F(x)$ is
infinite in the study of the summability and for the calculation of the integral, given a closed
set $E$, there exists an interval $i$:
Indeed, either $E$ contains an isolated point, in which case an interval $i$ containing only that
single point of $E$ in its interior answers the question; or $E$ is perfect, and nothing changes
in the previous reasoning.
From this, it can be further deduced that whenever the conditions a. and b. are met, c.
follows. The only difference is that the interval $\lambda$ containing points of $H$ will be defined
by the condition that $r[F(x), x_0, x_0 + h]$ is bounded for $x_0$ a point of $i$ and $H$, only if $H$ is
perfect. If $H$ is not perfect, an interval containing a single point of $H$ will be taken for $\lambda$.
Thus, totalisation still allows the calculation of $F(x)$ when we know the finite value of
one of its derivative numbers, except at the points of a countable set.
There is a distinction to be made between the results in this section and the preceding
one. The fundamental operation of the totalisation consists of, given a closed set $E$ and
the function $F$ to be obtained already known up to an additive constant in the intervals
contiguous to $E$, determining an interval $i$ containing points from $E$ for which $\int_{i,E} f(x)\,dx$ and
$\sum_i^E [F(\beta) - F(\alpha)]$ exist. However, in the case where $f$ is a derivative, we were able to
restrict $i$ further to ensure that $f$ is bounded on the subset $E_i$ of $E$ located in $i$. If $f$ is a superior
derivative number, we were able to restrict $i$ further to ensure that $f$ is bounded above on
$E_i$. Thus, there are methods of special totalisation alongside the general totalisation, which
Let us begin by clarifying the definition of a finite or transfinite sequence of operations that
constitutes totalisation. This sequence of operations is performed based on a given function,
denoted by $f(x)$, assumed to be finite everywhere19 in $(a, b)$. When the indicated operations
are feasible, $f(x)$ is considered totalisable. Totalisation then associates a continuous function
$F(x)$ with $f(x)$ in the entire interval $(a, b)$, which is the indefinite total of $f(x)$. $F(x)$ is
determined only up to an additive constant. If $(l, m)$ is any interval contained in $(a, b)$, the
increment $F(m) - F(l)$ of $F(x)$ in $(l, m)$ is the definite total of $f(x)$ in $(l, m)$. Therefore,
totalisation can be considered for two different purposes: obtaining a function, which is
indefinite totalisation; or obtaining a number, which is definite totalisation.20
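The relation between the two notions can be made explicit in a few lines. The display below is a sketch for illustration; the symbol $\mathrm{T}_l^m f$ is a hypothetical shorthand for the definite total of $f$ in $(l, m)$, introduced here because the text only names this quantity in words.

```latex
% T_l^m f : hypothetical shorthand (not the author's) for the definite total of f in (l, m).
\[ \mathrm{T}_l^m f = F(m) - F(l), \qquad a \le l < m \le b . \]
% The additive constant of the indefinite total plays no role:
\[ (F + C)(m) - (F + C)(l) = F(m) - F(l) . \]
% Additivity over adjacent intervals, for l < m < n in (a, b):
\[ \mathrm{T}_l^m f + \mathrm{T}_m^n f
   = [F(m) - F(l)] + [F(n) - F(m)] = F(n) - F(l) = \mathrm{T}_l^n f . \]
```

In this form the definite total is derived from the indefinite total, which is the order of ideas the text follows for totalisation.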
The operations of the totalisation are constructed based on the following two:
A. We assume the indefinite totals $F_k(x)$ are known in intervals $(a_k, b_k)$ such that we have
the $a_k$ tend towards $l$ and the $b_k$ tend towards $m$. Then, we form the continuous function
$F(x)$, equal to $F_1(x)$ in $(a_1, b_1)$, equal to
$$\int_E f\,dx + \sum [F(\beta) - F(\alpha)],$$
19 Previously we had neglected the points where . f was infinite, and therefore, took . f (x) = 0 at
these points.
20 This very coherent language introduced by M. Denjoy, clearly shows a complete parallelism
between totalisation and integration. However, it should be noted that, in the case of integration, it is
the concept of definite integral which is primordial, while, in the case of totalisation, it is the notion
of indefinite total which is most important. Totalisation is more directly related to the integration of
differential equations than to the calculations of quadratures: the definite total is equivalent to the
definite integral of Duhamel (Chap. 6).
11.4 The Totalisation 229
1. The operation $A$ must lead to a function $F(x)$ which is continuous at $l$ and $m$;
2. For any closed set $\mathcal{E}$, there must exist an interval $(l, m)$ enclosing points of $\mathcal{E}$ such
that, on the subset $E$ of $\mathcal{E}$ located in $(l, m)$, $f(x)$ is summable, and the series
$\sum [F(\beta) - F(\alpha)]$, extended to the intervals contiguous to $E$, is convergent.21
These conditions being satisfied, and taking $\mathcal{E}$ as the interval $(a, b)$ itself, we see that the points
of $(a, b)$ where $f(x)$ is not summable form a set $E_1$, non-dense everywhere in $(a, b)$.
Operations $B$, followed by operations $A$, reveal $F(x)$ in every interval contiguous to $E_1$;
this set of operations constitutes the first operation $O_1$ of the totalisation.
If $\alpha$ is a finite or a transfinite number and if the operations with index lower than $\alpha$ have
not revealed $F(x)$ in the entire $(a, b)$, they have revealed $F(x)$ in every interval that does
not contain any points of a certain closed set $H$. If $\alpha$ is not of second kind, this closed set $H$
is called $E_{\alpha-1}$, and it has been provided by operation $O_{\alpha-1}$, which revealed $F(x)$ in every
interval contiguous to $E_{\alpha-1}$. Then, by taking $E_{\alpha-1}$ as the set $\mathcal{E}$, it follows from the second
condition satisfied by $f(x)$ that the points of $E_{\alpha-1}$ which are not interior to intervals in which
we can carry out the operation $B$ form a closed set $E_\alpha$ non-dense everywhere on $E_{\alpha-1}$.
Operations $B$, followed by operations $A$, reveal $F(x)$ in every interval contiguous to $E_\alpha$.
The set of these operations constitutes the operation $O_\alpha$ of the totalisation.
If $\alpha$ is of second kind, the set $H$ consists of the points common to all the sets $E_i$ with
indices smaller than $\alpha$. This set $H$ is then denoted by $E_\alpha$, and the operation $O_\alpha$ is reduced to
the operations $A$ necessary to construct $F(x)$ in intervals contiguous to $E_\alpha$ from the known
functions $F(x)$ in intervals contiguous to the $E_i$ with indices lower than $\alpha$.
The sets $E_1, E_2, \ldots$ are closed, and since each contains all that follow, we know they are
finite or countably infinite in number. Therefore, the totalisation clearly determines $F(x)$ in
the entire $(a, b)$ after a finite or countably infinite number of operations $O_i$.
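The recursion just described can be summarised schematically. This is only a restatement under conventions introduced here, not the author's: $E_0$ stands for the whole interval taken as a closed set, $\mathcal{B}(E)$ for the union of the open intervals in which operation $B$ can be carried out relative to the closed set $E$, and a set that "does not exist" is written as the empty set.

```latex
% Conventions assumed for this sketch (not in the text):
%   E_0 = [a,b];  \mathcal{B}(E) = union of the open intervals in which
%   operation B can be carried out relative to the closed set E.
\[ E_0 = [a,b], \]
\[ E_\alpha = E_{\alpha-1} \setminus \mathcal{B}(E_{\alpha-1})
   \qquad (\alpha \text{ finite or transfinite of first kind}), \]
\[ E_\alpha = \bigcap_{\beta < \alpha} E_\beta
   \qquad (\alpha \text{ of second kind}). \]
% At first-kind steps E_\alpha is closed and non-dense on E_{\alpha-1}, so the
% sets decrease strictly as long as they are non-empty:
\[ E_0 \supsetneq E_1 \supsetneq E_2 \supsetneq \cdots \supsetneq E_\alpha \supsetneq \cdots \]
% A strictly decreasing well-ordered family of closed sets is at most countable, hence
\[ E_N = \varnothing \quad \text{for some finite or countable index } N . \]
```

The second condition imposed on $f(x)$ is exactly what guarantees that $\mathcal{B}(E_{\alpha-1})$ meets $E_{\alpha-1}$ at each first-kind step, so that the recursion cannot stall on a non-empty set.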
Thus, the stated conditions are sufficient for the operations of totalisation to be possible,
but it is not so obvious whether they are necessary. The first condition is certainly necessary
since it is indispensable for the continuity of $F(x)$ at $l$ and $m$. However, it is sufficient for the
second condition to be satisfied when taking as $\mathcal{E}$ the sets $E_\alpha$ to which the very operations
of totalisation lead, along with the subsets of $E_\alpha$ located in the various intervals $(l, m)$ contained
in $(a, b)$, for the operations of the totalisation to be legitimate. Now, we will see that as
soon as the second condition is met for these special sets, meaning when totalisation is
possible, the second condition is fulfilled for any closed set $\mathcal{E}_0$.
Let $O_N$ be the last operation of totalisation. Therefore, the set $E_N$ does not exist, while all
the $E_i$ with indices lower than $N$ do exist.22
21 We could associate with this statement the following properties: the sum of several totalisable
functions is totalisable; the total is the sum of the totals.
22 The index . N of this last operation is, therefore, never a transfinite number of second kind.
Let $\alpha$ be the smallest index such that $\mathcal{E}_0$ is not entirely in $E_\alpha$. The number $\alpha$ exists and
is less than $N$; moreover, $\alpha$ is not of second kind, as otherwise $E_\alpha$ would contain all the
points belonging to all the $E_i$ with indices lower than $\alpha$, and would contain $\mathcal{E}_0$. Therefore there is
a set $E_{\alpha-1}$ which contains $\mathcal{E}_0$. Since $E_\alpha$ does not contain all of $\mathcal{E}_0$, we can find an interval
$(l, m)$ entirely interior to an interval contiguous to $E_\alpha$ and in which there are points of $\mathcal{E}_0$,
where the sets of the right-hand side are pairwise disjoint. Now, since in $(l, m)$ there are no
points of $E_\alpha$, the operation $O_\alpha$ reveals $F(x)$ in $(l, m)$, and therefore in any portion of $(l, m)$.
Therefore, $f$ is summable over $e_{\alpha-1}$, and hence also on $e_0$ and on the $e_i$, and we have
$$\int_{e_{\alpha-1}} f\,dx = \int_{e_0} f\,dx + \sum \int_{e_i} f\,dx.$$
The series $\sum_{(l,m)}^{e_{\alpha-1}} [F(\beta) - F(\alpha)]$ of increments of $F(x)$ in intervals contiguous to $e_{\alpha-1}$ and
contained in $(l, m)$ is therefore convergent, and hence so is the series $\sum_{(\alpha_i,\beta_i)}^{e_i} [F(\beta) - F(\alpha)]$.
And we have
$$F(m) - F(l) = \int_{e_{\alpha-1}} f\,dx + \sum_{(l,m)}^{e_{\alpha-1}} [F(\beta) - F(\alpha)];$$
which can be written, since convergence of the series necessitates its absolute convergence,
$$F(m) - F(l) = \int_{e_0} f\,dx + \sum \int_{e_i} f\,dx + \sum \left\{ \sum_{(\alpha_i,\beta_i)}^{e_i} [F(\beta) - F(\alpha)] \right\}$$
$$= \int_{e_0} f\,dx + \sum \left\{ \int_{e_i} f\,dx + \sum_{(\alpha_i,\beta_i)}^{e_i} [F(\beta) - F(\alpha)] \right\}.$$
Now, if $F(\beta_i) - F(\alpha_i)$ was provided by an operation prior to $O_\alpha$, then $e_i$ does not exist and
the sum $\sum_{(\alpha_i,\beta_i)}^{e_i}$ reduces to $F(\beta_i) - F(\alpha_i)$. If $F(\beta_i) - F(\alpha_i)$ was provided by $O_\alpha$,
then we have
$$F(\beta_i) - F(\alpha_i) = \int_{e_i} f\,dx + \sum_{(\alpha_i,\beta_i)}^{e_i} [F(\beta) - F(\alpha)].$$
In other words, the operation $B$ applies to $e_0$. Therefore, the second condition stated in
Sect. 11.4 is satisfied by $\mathcal{E}_0$.23
We have thus characterised the totalisable functions. Let us try to characterise the
functions given by totalisation: the indefinite totals.24
A function $F(x)$ is an indefinite total if, and only if:
1. It is continuous;
2. $E$ being a closed set, the continuous function $G(x)$, equal to $F(x)$ at the points of $E$ and
linear in any interval contiguous to $E$, is absolutely continuous in an interval containing
points of $E$ in its interior.
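For reference, the auxiliary function $G(x)$ used in the second condition can be written explicitly. The display below is a sketch added for illustration; it only spells out "equal to $F(x)$ on $E$ and linear in each contiguous interval" and assumes the `cases` environment of the amsmath package.

```latex
% Explicit form of the function G attached to a closed set E and to F
\[ G(x) = \begin{cases}
   F(x), & x \in E, \\[4pt]
   F(\alpha) + \dfrac{F(\beta) - F(\alpha)}{\beta - \alpha}\,(x - \alpha),
         & \alpha < x < \beta,\ (\alpha,\beta) \text{ contiguous to } E .
\end{cases} \]
% Special case E = [a,b]: there are no contiguous intervals, so G = F, and
% condition 2 asks that F be absolutely continuous in some interval of (a,b).
```

Since $G$ agrees with $F$ at the endpoints of every contiguous interval, $G$ is continuous whenever $F$ is, and the increments of $G$ and of $F$ coincide on any interval whose endpoints belong to $E$.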
These conditions are necessary; the two properties, stated in Sect. 11.4, that a totalisable
function $f(x)$ possesses, immediately imply the previous properties for the indefinite total
$F(x)$ of $f(x)$.
These conditions are sufficient; because once they are met, we can, by recourse to the
transfinite, construct a function $f(x)$ of which $F(x)$ is the indefinite total, operating as follows:
23 The reader could also use this mode of reasoning to prove that if we have managed to attach a
definite total to . f (x) taken in .(a, b) using operations . A and . B, but subjecting the closed sets . E
appearing in the statement of . B to the conditions indicated in this statement and, in addition, to the
supplementary conditions (Sect. 11.3), the number obtained is the one that the general totalisation
would have attached to . f (x) taken in .(a, b).
In other words, there is never a disagreement between the numbers or functions provided by
special totalisation and the general totalisation.
There is no need to elaborate on this reasoning here because in Sects. 11.2 and 11.3 of this chapter,
we have justified the use of the general totalisation for the search for the primitive functions. However,
we have noted that certain special totalisations are sufficient to obtain the result.
24 In Comptes rendus in 1912, M. Denjoy solved the problem of primitive functions for derivatives
using a special totalisation which he later referred to as complete totalisation. In Comptes rendus
in 1915, he implicitly introduced the general totalisation for solving the problem of the primitive
functions for derivative numbers. It was not until his Memoirs, published from 1916 onwards, that
he embarked on the study of all questions relating to totalisation. Before the publication of these
Memoirs, various authors, building on the results already published by M. Denjoy, had also delved
into the study of these questions.
It is worth mentioning that M. Lusin was the first to characterise the indefinite totals (Comptes
rendus, 1912) and to study the differentiation of the indefinite totals (Moscow thesis, 1915).
However, he did so only for the indefinite totals provided by complete totalisation.
M. Khintchine (Comptes rendus, 1916) introduced a new method of differentiation to study the
differentiation of the indefinite totals provided by general totalisation, which he called the asymptotic
derivative and which we will refer to as the approximate derivative, following M. Denjoy.
Let us take the set $E$ as the interval $H_0 = (a, b)$ itself, where $G(x)$ is identical to $F(x)$.
Therefore, the set of points of non-absolute continuity of $F(x)$ is non-dense everywhere in
$(a, b)$. Let $H_1$ be this set. Outside of $H_1$, we take for $f(x)$ the derivative of $F(x)$ where it
elsewhere.
To complete the definition of $f(x)$, it is sufficient to state that by $H_\beta$, where $\beta$ is a
transfinite number of second kind, we mean the set of points common to all the $H$ with
indices lower than $\beta$.
Our statement is thus legitimate. We will now show an equivalent statement due to M.
Denjoy.25
When a continuous function $F(x)$ is such that the series
$$\sum [F(\beta) - F(\alpha)],$$
extended to the intervals contiguous to a closed set $E$, is convergent, we say that the increment26
of $F(x)$ over $E$ is defined and equal to
$$F(b) - F(a) - \sum [F(\beta) - F(\alpha)].$$
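Two standard examples may help fix this definition. They are illustrations added here and are not taken from the text: $C$ denotes the Cantor ternary set in $(0, 1)$, and $c(x)$ the Cantor function, which is continuous, non-decreasing, and constant on each interval contiguous to $C$.

```latex
% Total length of the intervals contiguous to the Cantor set C in (0,1):
% at stage n there are 2^{n-1} removed intervals, each of length 3^{-n}.
\[ \sum (\beta - \alpha) = \sum_{n \ge 1} \frac{2^{\,n-1}}{3^{\,n}}
   = \frac{1/3}{1 - 2/3} = 1 . \]
% Example 1: F(x) = x. The increment of F over C is
\[ F(1) - F(0) - \sum [F(\beta) - F(\alpha)] = 1 - 1 = 0 = m(C) . \]
% Example 2: F(x) = c(x), with c(\beta) = c(\alpha) on every contiguous interval:
\[ c(1) - c(0) - \sum [c(\beta) - c(\alpha)] = 1 - 0 = 1 . \]
```

In the second example the whole increment of the function is carried by a closed set of measure zero. Here the function $G(x)$ associated with $C$ coincides with $c(x)$ itself and is not absolutely continuous in any interval containing points of $C$, so $c(x)$ fails the second condition of the characterisation above.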
25 The main interest of M. Denjoy’s statement, lies in the fact that it is the culmination of a fine
analysis of certain notions, such as that of a function of bounded variation, for example. The reader
may refer to the Memoirs of M. Denjoy.
It is necessary to state that the sufficient condition of M. Denjoy appears to demand less, but
potentially more immediately applicable than the previous one. However, in what follows, we have
not needed the statement of M. Denjoy.
To better illustrate the difference between the two statements, let us note that the first one is
equivalent to the following: For a continuous function . F(x) to be an indefinite total, it is necessary
and sufficient that, for any closed set . E, there exists an interval .(l, m) containing points of . E in its
interior, such that if we take a set of non-intersecting intervals . I whose extremities belong to . E and
.(l, m), the sum of the increments of . F(x) in the intervals in . I , tends towards zero with the measure
of . I .
If we compare this statement to the one given later in the text, we see that the latter requires, as a
sufficient condition, that a certain property holds for every closed set of measure zero, whereas the
statement of this note requires that the same property holds in a more uniform manner.
26 M. Denjoy used the word variation in place of the increment.
When this is the case, the increment of $F(x)$ over $E$ is the limit of the increment of $F(x)$
in a family of intervals $I$, consisting of a finite number of non-intersecting intervals whose
origins and extremities are points of $E$, which enclose $E$, and whose measure $m(I)$ tends
towards that of $E$. Indeed, the complement of $I$ is formed of intervals contiguous to $E$, and
any interval contiguous to $E$ ends up being a subset of this complement when $m(I)$ tends
towards $m(E)$.27 In particular, when the function $G(x)$ associated with $E$ is absolutely
continuous, the increment $\mathcal{A}_{F(x)}(E)$ of $F(x)$ over $E$ is defined and equal to the number
$\mathcal{A}_{G(x)}(E)$ which follows from the previous definitions, applicable only to the absolutely
This condition is sufficient. According to the first statement, if $F(x)$ is not an indefinite
total, there exists a closed set $E$ such that the corresponding function $G(x)$ is not absolutely
continuous in any interval containing points of $E$ in its interior. We will show that we can
even replace this set $E$ with another of measure zero. For that, we will distinguish three
cases.
a. Let us suppose that for any interval $(l, m)$ containing in its interior points of $E$, the series
$\sum_{(l,m)}^{E} [F(\beta) - F(\alpha)]$, extended to the subsets of the intervals contiguous to $E$ located in
$(l, m)$, contains an infinite number of positive terms with an infinite sum.
Let us choose a finite number of terms of $\sum_{(l,m)}^{E} [F(\beta) - F(\alpha)]$ such that their sum exceeds
$1$; let $(\alpha_1, \beta_1), \ldots, (\alpha_i, \beta_i)$ be the corresponding intervals. Let us take the intervals
$(a_1, \alpha_1), (\beta_1, b_1), \ldots, (a_i, \alpha_i), (\beta_i, b_i)$, that are not included in the chosen intervals
$(\alpha, \beta)$, have their origins and extremities at points of $E$, and form a set $I_1$ of measure $\varepsilon$
27 The increment of . F(x) on . E is therefore defined by one, and determined by one, of the methods
that can be adopted when we take into account the . B measurable nature of . E. See Sect. 9.1.
28 The same conclusion is true of all the continuous functions of bounded variation . F(x), if for them,
the increment on a closed set is defined as described in Chap. 8.
$$\sum_{(l,m)}^{E} [F(\beta) - F(\alpha)] > 0,$$
for every interval $(l, m)$ of $I_1$. Finally, $I_2$ must have measure smaller than $\frac{\varepsilon}{2}$.
It is clear that by continuing in this manner, we would construct the sets $I_1, I_2, \ldots$ of measures tending towards zero, each of which contains the following ones and, consequently, the points common simultaneously to all of these sets form a set $\mathcal{E}$ of measure zero, closed, and even perfect. Among the intervals contiguous to $\mathcal{E}$, there are, in particular, all the intervals $(\alpha_p, \beta_p)$ that are used to construct $I_1$, all those used for constructing $I_2, \ldots$. Therefore the series $\sum_{(\lambda,\mu)}^{\mathcal{E}} [F(\beta) - F(\alpha)]$, extended to the intervals contiguous to $\mathcal{E}$ and located in an interval $(\lambda, \mu)$ containing in its interior points of $\mathcal{E}$, is divergent.
The increment of $F(x)$ over the subset of $\mathcal{E}$ located in $(\lambda, \mu)$ is not defined; $F(x)$ is non-resolvable.
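The measure claim in case a can be checked in one line. The following is only a sketch: it assumes that the sets are chosen with $m(I_k) \le \varepsilon / 2^{k-1}$ (a rate suggested by the construction, not stated explicitly), that each $I_k$ is a finite union of closed intervals, and it writes $\mathcal{E}$ for the set of points common to all the $I_k$.

```latex
% Assumption: I_1 \supset I_2 \supset \cdots, each a finite union of closed intervals,
% with m(I_k) \le \varepsilon / 2^{k-1}.
\mathcal{E} = \bigcap_{k \ge 1} I_k ,
\qquad
m(\mathcal{E}) \le m(I_k) \le \frac{\varepsilon}{2^{k-1}}
\xrightarrow[k \to \infty]{} 0 .
```

Under these assumptions $\mathcal{E}$ is closed, as an intersection of closed sets, and of measure zero, since its measure is bounded by every $m(I_k)$.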
b. When the series $\sum_{(l,m)}^{E} [F(\beta) - F(\alpha)]$ does not always contain an infinite number of positive terms with a sum $+\infty$, nor does it always have an infinite number of negative terms with a sum $-\infty$, for any interval $(l, m)$ containing points of $E$, it means that there exists such an interval $(l, m)$ for which the series $\sum_{(l,m)}^{E} [F(\beta) - F(\alpha)]$ is convergent. If we remove the points of $E$ outside of $(l, m)$, we can say that $\sum_{(a,b)}^{E} [F(\beta) - F(\alpha)]$ is convergent.
Let us suppose that $G(x)$ is not of bounded variation in any interval $(l, m)$ containing points of $E$. Then, let $(l, m)$ be an interval which contains points of $E$ in its interior. In $(l, m)$, $G(x)$ is of unbounded variation; therefore, we can find a finite number of intervals in $(l, m)$ whose set $I$ provides a value $A_{G(x)}(I)$ as large as we want. Moreover, we can remove any number of intervals from $I$ not containing the points of $E$ in their interior, as this modifies $A_{G(x)}(I)$ only by
$$\sum_{(l,m)}^{E} |F(\beta) - F(\alpha)|$$
at most. As a result, we can choose the intervals of $I$ with origins at the points of $E$ which are not origins of intervals contiguous to $E$, and with extremities at points of $E$ which are not extremities of intervals contiguous to $E$, assuming $I$ to be of measure as small as we want. In such a set $I$, we have $A_{F(x)}(I) = A_{G(x)}(I)$.
With this in mind, let us choose in $(a, b)$ a sequence of sets of intervals $I_1, I_2, \ldots$, satisfying the following conditions: each of them contains the following ones, the measures of $I_p$ tend towards zero, each origin of an interval must have points of $E$ to its right as close as we want, and each extremity must have points of $E$ to its left, of which it is the limit point. The origins and extremities of the intervals of $I_{k-1}$ are origins and extremities of intervals of $I_k$. Finally, the subset $i_k$ of $I_k$ contained in any of the intervals of $I_{k-1}$ must yield a value greater than $k$ for $A_{G(x)}(i_k)$. It is clear that the complement $J_k$ of $I_k$ provides an increment $A_{F(x)}(J_k)$ which tends towards $-\infty$ since we have
and the same holds if we envisage only the subset of $J_k$ located in an interval $(l, m)$ containing points of $E$. In this case, the function $F(x)$ is not resolvable, because the increment of $F(x)$ is not, in fact, defined on any subset of the set $\mathcal{E}$, which is perfect and of measure zero, formed of the points common to all the $I_k$.
c. If none of the previously examined cases apply, then there exists an interval containing points of $E$ in which $G(x)$ is of bounded variation. By removing points of $E$ exterior to this interval, we can assume that $G(x)$ is of bounded variation in $(a, b)$ and that it admits $E$ as the set of points of non-absolute continuity. We know that if we decompose $G(x)$ in its absolutely continuous kernel and the positive and negative variations of its function of singularities according to the formula
at least one of the two numbers $P_s(b)$, $N_s(b)$ is non-zero; let us assume that $P_s(b)$ is positive.
We can find a set $I_1$ consisting of a finite number of intervals whose origins and extremities are points at which $P_s(x)$ is increasing respectively to the right and left, and therefore are points of $E$. Additionally, we can assume that the measure of $I_1$ is less than $\varepsilon$, and that we have the two inequalities
In . Ik we can find a set . Ik+1 of measure less than . 2εk , formed of a finite number of intervals
whose extremities and origins satisfy the same conditions as above. The family of these extremities and origins for $I_{k+1}$ includes those related to $I_k$, and for any interval $(\lambda, \mu)$ in $I_k$, the subset $i_{k+1}$ of $I_{k+1}$ contained in it satisfies the inequalities
where $\varepsilon_k$ are numbers such that the product $\prod (1 - 2\varepsilon_{k+1})$ is convergent and of value $\frac{1}{2}$.
It is clear that the set $\mathcal{E}$ formed from points common to all these $I_k$ is perfect and of measure zero; the increment of $F(x)$, that is, of $G(x)$, on $\mathcal{E}$ is defined and its value is the limit of $A_{G(x)}(I_k)$. Now, we have
and as the first of these numbers tends towards zero with the measure of $I_k$, we have
$$A_{F(x)}(\mathcal{E}) = A_{G(x)}(\mathcal{E}) = \lim A_{G(x)}(I_k) \ge P_s(b) \prod (1 - 2\varepsilon_k) = \frac{P_s(b)}{2} > 0.$$
Thus, the increment of $F(x)$ on $\mathcal{E}$ is not zero, and a similar conclusion holds for any subset of $\mathcal{E}$ contained in an interval containing points of $\mathcal{E}$ in its interior. $F(x)$ is non-resolvable, and M. Denjoy's criterion is entirely legitimate.
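Case c relies on numbers $\varepsilon_k$ for which the infinite product of the factors $(1 - 2\varepsilon_{k+1})$ converges to $\frac{1}{2}$. The text does not say which numbers are meant; the sketch below uses one hypothetical choice, $\varepsilon_{k+1} = \frac{1}{2(k+1)^2}$, for which the partial products telescope to $\frac{N+2}{2(N+1)}$ and so tend to $\frac{1}{2}$.

```python
def partial_product(n_terms):
    """Product of (1 - 2*eps_{k+1}) for k = 1..n_terms, with the
    illustrative (hypothetical) choice eps_{k+1} = 1 / (2 * (k + 1)**2)."""
    p = 1.0
    for k in range(1, n_terms + 1):
        eps_next = 1.0 / (2.0 * (k + 1) ** 2)
        p *= 1.0 - 2.0 * eps_next  # factor equals 1 - 1/(k+1)^2
    return p


if __name__ == "__main__":
    for n in (1, 10, 1000, 100000):
        # Closed form of the partial product: (n + 2) / (2 * (n + 1))
        print(n, partial_product(n), (n + 2) / (2.0 * (n + 1)))
```

Any other sequence with the same limiting product would serve equally well; only the value $\frac{1}{2}$ of the product enters the final inequality.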
In our first criterion, we saw how, given an indefinite total $F(x)$, we could determine a function $f(x)$ of which $F(x)$ is the indefinite total. We did this using some sets $H_1, H_2, \ldots$, defined by considering points of non-absolute continuity of $F(x)$ and certain functions $G(x)$. Let us show that these sets $H$ can be defined just as well from any function $\varphi(x)$ that admits $F(x)$ as its indefinite total.
Indeed, the first operation of the totalisation of $\varphi(x)$ reveals the exceptional set $E_1$, formed of points at which $\varphi(x)$ is not summable. We claim that $E_1$ is identical to $H_1$. First of all, since the indefinite total of $\varphi(x)$ is absolutely continuous at every point not belonging to $E_1$, $H_1$ is contained in $E_1$. If it were not identical to it, there would exist an interval $(\alpha, \beta)$ contiguous to $H_1$ and containing several points of $E_1$, therefore also an interval $(\alpha_1, \beta_1)$ contiguous to $E_1$. In any interval entirely interior to $(\alpha_1, \beta_1)$, $\varphi(x)$ is summable, by hypothesis, and we
Therefore, almost everywhere in the entire $(\alpha_1, \beta_1)$ we have $\varphi(x) = F'(x)$, and $\varphi(x)$ is summable over the entire interval $(\alpha_1, \beta_1)$,$^{29}$ since $F(x)$ is absolutely continuous in the interior of $(\alpha, \beta)$, therefore in $(\alpha_1, \beta_1)$.
However, there exists an interval $(l, m)$, entirely interior to $(\alpha, \beta)$, containing points of $E_1$ and no point of the exceptional set $E_2$, formed by the second operation of the totalisation of $\varphi$. Let us even suppose that $(l, m)$ contains no points of $E_2$, either as origins or extremities. Then $\varphi(x)$ is summable over the subset $e_1$ of $E_1$, located in $(l, m)$. However, $(l, m)$ can be
29 It is sometimes necessary to reduce $(\alpha_1, \beta_1)$, on the side of $\alpha_1$ if $\alpha_1$ was at $\alpha$, and on the side of $\beta_1$ if $\beta_1$ was at $\beta$, so that this remains true in one or the other of these hypotheses.
expressed as the sum of $e_1$ and of intervals, or subsets of the intervals contiguous to $E_1$; let us write
$$(l, m) = e_1 + i_1 + i_2 + \cdots$$
We know that, in $i_1$, $\varphi$ is summable and we have, almost everywhere, $\varphi(x) = F'(x)$. Therefore, the series
$$\int_{e_1} \varphi \, dx + \int_{i_1} \varphi \, dx + \int_{i_2} \varphi \, dx + \cdots$$
is convergent, since we have
$$\int_{i_1} |\varphi| \, dx + \int_{i_2} |\varphi| \, dx + \cdots \le \int_l^m |F'(x)| \, dx.$$
And this shows that $\varphi(x)$ will be summable over the entire $(l, m)$, which implies a contradiction. Thus $E_1$ and $H_1$ are identical.
But the exceptional set $E_2$ provided by the second operation of totalisation, which yields $F(x)$ from $\varphi(x)$, is the first exceptional set one would encounter when seeking, by totalisation, the function $G(x)$ constructed from $E_1$. Meanwhile, the set $H_2$ related to $F(x)$ serves as the counterpart of $H_1$ for $G(x)$, meaning it is the set of points of non-absolute continuity of $G(x)$. Therefore, it is clear that $E_2$ and $H_2$ are identical; the same holds for $E_3$ and $H_3$, $E_4$ and $H_4$, and so on. With $E_\omega$ being the set of points common to all $E_n$ of indices lower than $\omega$, while $H_\omega$ is the set of points common to all $H_n$ of indices lower than $\omega$, $E_\omega$ and $H_\omega$ are also identical. Continuing this, we see that there is an identity between the $E$ and $H$ of the same index.
At the same time it can be observed that we have: $\varphi(x) = F'(x)$ almost everywhere at any point exterior to $H_1$; $\varphi(x) = G_1'(x)$ almost everywhere at any point of $H_1 - H_2$, where $G_1(x)$ is the function $G$ constructed using $H_1$; $\varphi(x) = G_2'(x)$ almost everywhere at any point of $H_2 - H_3$, where $G_2(x)$ is constructed using $H_2$, and so on. Now, $H_0 = (a, b)$ can be expressed as the sum of the sets $H_{\alpha-1} - H_\alpha$ in a finite or countably infinite number; therefore, finally, $\varphi(x)$ is determined almost everywhere by its indefinite total $F(x)$.$^{30}$
And this determination is always achieved through derivations. Let us examine these derivations more closely. To differentiate $G_1(x)$ at a point $x_0$ in $E_1$ amounts to differentiating $F(x)$ at this point, but by taking into account only points of $E_1$; in other words, we examine the ratio $r[F(x), x_0, x_0 + h]$ in which $x_0 + h$ is also a point of $E_1$. This is what we call the derivative of $G_1(x)$ over $E_1$. Therefore, for any closed set $E$, there exists an interval $(l, m)$ containing points of $E$, and within $(l, m)$, $f(x)$ is, almost everywhere on $E$, the derivative (taken over $E$) of its indefinite total.
30 We will reach this result more quickly by proving that a function, non-zero almost everywhere, has an indefinite total not identically equal to zero. Throughout this chapter, I have preferred, despite the consequent length, the analytic examination of the transfinite operational process of totalisation over synthetic reasoning, which is more rapid but, in my opinion, would lead to a less profound understanding.
In reality, this statement is proved by the preceding text only if $E$ is one of the sets $H_1, H_2, \ldots$. But if $\alpha$ is the smallest index such that $E$ is not completely in $H_\alpha$, it means there is an interval $(\lambda, \mu)$ containing a subset $e$ of $E$ which is contained in $H_{\alpha-1} - H_\alpha$. Consequently, almost everywhere on $e$, $f(x)$ is the derivative of $G_\alpha(x)$, and therefore, the derivative over $E$ of $F(x)$.
This statement can be advantageously replaced by the following, which is due to M. Khintchine, and based on the concept of approximate derivative.
A continuous function $F(x)$ is said to possess, at $x_0$, an approximate derivative equal to $f(x_0)$, if $f(x_0)$ is the derivative of $F(x)$ over a set of density $1$ at point $x_0$.
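For reference, the definition can be written out in symbols. This is a sketch in the usual modern formulation, which is an assumption here and not quoted from the text: $E$ stands for a measurable set containing $x_0$, $m$ for Lebesgue measure, and $F'_a$ for the approximate derivative.

```latex
% E has density 1 at x_0:
\lim_{h \to 0^{+}} \frac{m\bigl(E \cap (x_0 - h,\; x_0 + h)\bigr)}{2h} = 1 .

% F has approximate derivative f(x_0) at x_0 when, for some such set E,
F'_{a}(x_0)
  = \lim_{\substack{x \to x_0 \\ x \in E}} \frac{F(x) - F(x_0)}{x - x_0}
  = f(x_0) .
```

The difference quotient is the ratio $r[F(x), x_0, x_0 + h]$ used throughout the chapter, restricted to increments $h$ for which $x_0 + h$ belongs to $E$.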
It is clear that $F(x)$ cannot have, at $x_0$, two different approximate derivatives $f(x_0)$ and $g(x_0)$, because they would be the derivatives of $F(x)$ over the two sets $E(f)$ and $E(g)$, both having density $1$ at $x_0$. As a result, $E(f)$ and $E(g)$ would have common points in as small a neighbourhood of $x_0$ as we want, and consequently $f(x_0)$ and $g(x_0)$ would be equal.
From this, it follows, in particular, that the approximate derivative coincides with the ordinary derivative when the latter exists. Moreover, if $F'_a(x_0)$ is the approximate derivative of $F(x)$ at $x_0$, and if $\lambda_g$, $\Lambda_g$, $\lambda_d$, $\Lambda_d$ are the four derivative numbers of $F(x)$ at $x_0$, we have
$$\lambda_g \le F'_a(x_0) \le \Lambda_g, \qquad \lambda_d \le F'_a(x_0) \le \Lambda_d,$$
because $F'_a(x_0)$ is the limit of the ratios $r[F(x), x_0, x_0 + h]$ for which $h$ is positive, and the limit of ratios for which $h$ is negative.
A totalisable function is almost everywhere the approximate derivative of its indefinite total.
Indeed, almost everywhere at the points of $H_{\alpha-1} - H_\alpha$, the totalisable function $f(x)$ is the derivative, taken over $H_{\alpha-1} - H_\alpha$, of its total $F(x)$. However, almost everywhere on $H_{\alpha-1} - H_\alpha$, the density of $H_{\alpha-1} - H_\alpha$ is equal to $1$. Therefore, almost everywhere on $H_{\alpha-1} - H_\alpha$, $f(x)$ is the approximate derivative of $F(x)$, and the theorem is proved.
This theorem generalises the one related to summable functions, which states that a summable function is almost everywhere the derivative of its indefinite integral, but it uses a generalisation of the concept of an ordinary derivative. To understand the concept itself, let us compare a totalised function with the derivative numbers of its total, which the reasonings in Sect. 11.3 and the following would allow us to do.
First, let us note that to show that a property holds, at most, only at the points of a set of measure zero, it is sufficient to show that it never takes place at all points of a closed set of non-zero measure.$^{31}$ Indeed, if it were to take place at all the points of a set $\mathcal{E}$ of non-zero measure,$^{32}$ it would be sufficient to enclose the complement of $\mathcal{E}$ in the interior of intervals whose measure is less than that of the entire interval $(a, b)$ considered, so that the set $E$, formed of the points not interior to these intervals, becomes a closed set of non-zero measure in which the property holds at all points.
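The reduction to closed sets can be made quantitative. The following is a sketch with one particular choice of covering, which is an assumption made for definiteness: $\mathcal{E} \subset [a, b]$ is taken measurable with $m(\mathcal{E}) > 0$, and $U$ denotes the union of the covering intervals, taken open.

```latex
% Cover the complement C = [a,b] \setminus \mathcal{E} by open intervals with union U such that
m(U) < m(C) + \tfrac{1}{2}\, m(\mathcal{E}) .

% Then E = [a,b] \setminus U is closed, E \subset \mathcal{E}, and
m(E) \;\ge\; (b - a) - m(U)
     \;>\; (b - a) - m(C) - \tfrac{1}{2}\, m(\mathcal{E})
     \;=\; \tfrac{1}{2}\, m(\mathcal{E}) \;>\; 0 .
```

Any covering whose total length is less than $b - a$ already gives $m(E) > 0$; the choice above merely records how much of $\mathcal{E}$ can be retained.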
With this in mind, let us refer to the two statements at the beginning of the previous Sect. 11.3.
We see that the right-hand superior derivative number $\Lambda_d F(x)$ of a continuous function $F(x)$ is equal to $-\infty$ only, at most, at the points of a set of measure zero. If it were not so, we could find a closed set $E$ of non-zero measure at all points of which $\Lambda_d F(x) = -\infty$. According to the second statement of Sect. 11.3, we could even suppose that $r[F(x), x_0, x_0 + h]$ is bounded from above for all points $x_0$ of $E$, in which case the theorem of Sect. 11.3 would apply. However, it asserts that, on $E$, the set of points where $\Lambda_d F(x) = -\infty$ is of measure zero.
More generally, the set of points where one of the $\Lambda$ is equal to $-\infty$, or one of the $\lambda$ is $+\infty$, is of measure zero.
Therefore, in what follows, we can argue on the closed sets of non-zero measure at points of which we have neither $\Lambda = -\infty$ nor $\lambda = +\infty$.
In the set of points where a continuous function $F(x)$ has a finite right-hand superior derivative number, $F(x)$ almost everywhere has an approximate derivative equal to this number.
Let us suppose, indeed, that there exists a closed set $E$ of non-zero measure at the points of which $\Lambda_d F(x)$ is finite without being the approximate derivative of $F(x)$, and let us choose, as in Sect. 11.3, an interval where the conditions for application of the theorem in Sect. 11.3 are met. Then, according to this theorem, the function $G(x)$ constructed from $E$ has a derivative equal to $\Lambda_d F(x)$ almost everywhere on $E$. That is to say, almost everywhere on $E$, $F(x)$ has a derivative taken over $E$, equal to $\Lambda_d F(x)$, and consequently $F(x)$ has, almost everywhere on $E$, an approximate derivative equal to $\Lambda_d F(x)$. Hence it follows, in particular, that a totalisable function $f(x)$ is almost everywhere equal to the right-hand superior derivative number of its indefinite total, in the set of points where this derivative number is finite.
Naturally, we have the same statements for the four derivative numbers.
However, as we will see, there exist indefinite totals whose derivative numbers are infinite in sets of non-zero measure.
Let, indeed, $P$ be a closed set, nowhere dense and of non-zero measure; let us enumerate the intervals contiguous to $P$, in any manner.
Successively place these intervals $u_1, u_2, \ldots$ on $(a, b)$. Each interval $u_i$ will thus be placed in an interval $u'_i$, which is one of those obtained by subtracting $u_1, u_2, \ldots, u_{i-1}$ from $(a, b)$. Let $\rho_i$ be the length of $u'_i$; the number $\rho_i$ will be infinitesimally small with $\frac{1}{i}$.
Let us take $F(x)$ zero at the points of $P$ and equal in each $u_i$ to a continuous function with continuous derivative, which vanishes at the extremities of $u_i$ and has maximum absolute value $\sqrt{\rho_i}$.
It is clear that $F(x)$ is continuous since $\sqrt{\rho_i}$ tends towards zero with $\frac{1}{i}$; $F(x)$ is the indefinite total of the function that is zero on $P$ and equal to $F'(x)$ outside of $P$.
Let $\xi$ be a point of $P$ which is neither the origin nor the extremity of an interval contiguous to $P$. $\xi$ is found in a sequence of intervals $u'_{i_1}, u'_{i_2}, \ldots$. At a point of $u_{i_j}$, we can affirm that,
Therefore, at $\xi$ at least one of the four derivative numbers of $F(x)$ is infinite. Since the set of $\xi$ has the same measure as $P$, which is of non-zero measure, it follows that one of the four derivative numbers of $F(x)$ is infinite on a set of points of non-zero measure. However, $F(x)$ has zero derivative with respect to $E$ at every point of $E$, and thus its approximate
Additionally, we have
Furthermore, let us assume that in each .u i , . F(x) is chosen in a way that, in .u i , in the
neighbourhood of the extremities of .u i , the curve .Y = F(x) is similar to the one which
represents, in the neighbourhood of .x = 0 the function which, in . (k+1)π , kπ 1 1
is equal to
2k k
.
1 1
k sin x if it is . F1 (x), or equal to . k1 sin x1 if it is . F2 (x). Then, the above equalities
33 This example, in its various forms, is, with minor modifications, extracted from the Memoirs of
M. Denjoy.
determine the values of the four derivative numbers of $F(x)$ not only at the points $\xi$, but at every point of $P$. In addition, if $P$ is perfect and the density of $P$ is equal to $1$ at every point $\xi$ of $P$, then $F(x)$ has an approximate derivative equal to zero at every point of $P$.
Therefore, two indefinite totals have been constructed which have a finite approximate derivative at every point. However, at every point of the perfect set of non-zero measure $P$, we have
Thus, we cannot establish the relationships between a totalised function and its indefinite total by solely employing the ordinary differentiation taken over an interval; it is essential to resort to a generalisation of the derivative, namely, differentiation over a set or an approximate derivative. Consequently, the generalisation of the statement "every absolutely continuous function has a derivative almost everywhere", namely "any resolvable function has an approximate derivative almost everywhere", cannot be replaced by a proposition concerning the existence of the ordinary derivative.$^{34}$
Nevertheless, we can obtain information about the ordinary derivative by simultaneously
applying the obtained theorems to several derivative numbers, just as we did in Chap. 9 with
the theorems related to integration.
Two derivative numbers of the same function, $\Lambda_d F(x)$ and $\lambda_g F(x)$ for example, are equal almost everywhere in the set of points where they are both simultaneously finite. Indeed, we know, on the one hand, that $F(x)$ admits almost everywhere in this set an approximate derivative equal to $\Lambda_d F(x)$, and, on the other hand, that this approximate derivative is almost everywhere equal to $\lambda_g F(x)$.
In particular, a continuous function has a derivative almost everywhere in the set of points where its four derivative numbers are finite.$^{35}$
A continuous function $F(x)$ has its derivative number $\lambda_g F(x)$ finite almost everywhere in the set of points where $\Lambda_d F(x)$ is finite.
Otherwise, there would exist a closed set $E$ of non-zero measure, where $\lambda_g F(x)$ is equal to $-\infty$, and yet, for points $x_0$ in this set, the ratio $r[F(x), x_0, x_0 + h]$ would always be less than a fixed number $k$. Let us then cover $(a, b)$ from $b$, with a chain of intervals chosen as follows: if $\eta$ is a point of $E$, we associate with it an interval $(\xi, \eta)$ in which we have
34 There exists a mode of totalisation called by M. Denjoy, the complete totalisation, which has been
studied by M. Denjoy and M. Lusin. With this mode, on the contrary, all theorems considered in the
text can be extended to indefinite totals without requiring any differentiation beyond the ordinary
one.
35 P. MONTEL, Comptes Rendus, 1912.
However, this is impossible since the right-hand side is as small as we want; the proposition is therefore proved. We have, of course, a similar statement relative to $\Lambda_g$ and $\lambda_d$.
By combining the various results obtained, we have the remarkable theorem due to M. Denjoy:
Except at most at the points of a set of measure zero, the four derivative numbers of a continuous function,$^{36}$ $F(x)$, exhibit one of the following arrangements:
1. The four derivative numbers are equal, meaning there is an ordinary derivative;
2. $\Lambda_g = +\infty$, $\lambda_g = \Lambda_d = \text{finite value}$, $\lambda_d = -\infty$,
   or
   $\lambda_g = -\infty$, $\Lambda_g = \lambda_d = \text{finite value}$, $\Lambda_d = +\infty$;
3. $\Lambda_g = \Lambda_d = +\infty$, $\lambda_g = \lambda_d = -\infty$.
The two functions $F_1(x)$ and $F_2(x)$, constructed earlier, also show that cases $2$ and $3$ indeed occur in sets of points of non-zero measure.
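The phrase "except at most at the points of a set of measure zero" matters: at isolated points other arrangements are possible. A minimal numerical sketch, using an illustrative function that is not taken from the text, $F(x) = x \sin(1/x)$ with $F(0) = 0$: at $x_0 = 0$ the difference quotient equals $\sin(1/h)$, so the four derivative numbers are $\Lambda_d = \Lambda_g = 1$ and $\lambda_d = \lambda_g = -1$, which is none of the three arrangements. The code only samples the quotient over a band of small $h$; it estimates the upper and lower limits rather than computing them.

```python
import math


def F(x):
    # Continuous at 0, but the difference quotient at 0 oscillates between -1 and 1.
    return x * math.sin(1.0 / x) if x != 0.0 else 0.0


def quotient_range(func, x0, steps):
    """Largest and smallest value of r[func, x0, x0 + h] over the sampled steps h."""
    ratios = [(func(x0 + h) - func(x0)) / h for h in steps]
    return max(ratios), min(ratios)


# Sample h = 1/t for t on a fine grid, so that sin(1/h) sweeps many full periods.
ts = [1000.0 + 1e-3 * j for j in range(100_001)]
right_steps = [1.0 / t for t in ts]       # h > 0
left_steps = [-h for h in right_steps]    # h < 0

upper_right, lower_right = quotient_range(F, 0.0, right_steps)
upper_left, lower_left = quotient_range(F, 0.0, left_steps)

if __name__ == "__main__":
    print("right-hand estimates:", upper_right, lower_right)
    print("left-hand estimates: ", upper_left, lower_left)
```

A single point is a set of measure zero, so this behaviour is consistent with the theorem.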
From this theorem, M. Denjoy deduced that a continuous function which does not have, at any point, all four of its derivative numbers infinite simultaneously is an indefinite total.
Let us show that any continuous function $F(x)$ that is not an indefinite total has all four of its derivative numbers infinite at some points. For such a function, there exists a perfect set $E$
36 Ms Chisholm Young extended this result to any finite measurable function defined on a measurable set (Comptes rendus, 1916).
such that the corresponding function $G(x)$ is not absolutely continuous in any interval $(l, m)$ containing points of $E$ in its interior. This means that, in $(l, m)$, the absolute value of the increment $A_{G(x)}(I)$ of $G(x)$ over a set $I$ of non-intersecting intervals does not tend towards zero when $m(I)$ tends towards zero. Since $G(x)$ is linear in the intervals contiguous to $E$, we may assume that the origins and extremities of the intervals constituting $I$ are points of $E$, without affecting the greatest limit of $|A_{G(x)}(I)|$.
a. In every interval $(l, m)$, as $m(I)$ tends towards zero, the greatest limit of $A_{G(x)}(I)$ is positive and the smallest is negative;
b. Or else, there exists an interval $(l, m)$ for which one of these limits is zero.
a. In this case, in every $(l, m)$ we can find intervals bounded by points $x_0$ and $x_0 + h$, $h > 0$, of $E$, for which $r[F(x), x_0, x_0 + h]$ is as large or as small as we want. Because, for example, if this ratio could not exceed $k$, then $A_{G(x)}(I)$ would be at most $k\,m(I)$. $r[F(x), x_0, x_0 + h]$ is therefore unbounded from above everywhere on $E$; therefore, there are points where $\Lambda_g F(x) = +\infty$, and since these ratios are also unbounded from below, there are points where $\lambda_d = -\infty$ and points where $\lambda_g = -\infty$.
Let us specify this result by referring to the proof of our first theorem on the derivative numbers (Sect. 11.3). We had noted then that the set $E_n$ of points $x_0$ of $E$ where
$$r[F(x), x_0, x_0 + h] \le n,$$
for any positive $h$, was a closed set. In the current assumption, this is a closed set which is nowhere dense on $E$, since in every interval containing points of $E$, at some of these points, $r$ exceeds $n$ for some positive values of $h$.
either for any positive $h$, or for any negative $h$, the sum of the four sets similar to $E_n$, is closed and nowhere dense on $E$.
If we note that the sum of the $\mathcal{E}_n$ is the set of points of $E$ where one of the four derivative numbers is finite, the proof is easily completed. $E$ cannot be the sum of the $\mathcal{E}_n$, which are nowhere dense on $E$ (Sect. 11.1); therefore there are points of $E$ that do not belong to any $\mathcal{E}_n$; at these points, the four derivative numbers of $F(x)$ are infinite.
b. Let us admit that, in an interval, say the interval $(a, b)$ itself, the smallest limit of $A_{G(x)}(I)$, for $m(I)$ tending towards zero, is zero. It is clear that the total negative variation of $G(x)$ is bounded, therefore $G(x)$ is of bounded variation. Moreover, if we decompose $G(x)$ in its absolutely continuous kernel and its function of singularities, the latter reduces to
$$t = x + P_s(x),$$
transforms $P_s(x)$, $AC(x)$, $G(x)$ into the absolutely continuous functions $p_s(t)$, $ac(t)$, $g(t)$ and $E_{ip}$ into $e_{ip}$, a set $I$ of intervals enclosing $E_{ip}$ into a set $i$ of intervals enclosing $e_{ip}$, the total variations of $P_s(x)$ and $AC(x)$, in $I$, into the total variations of $p_s(t)$ and $ac(t)$, in $i$.
However, for $P_s(x)$, the total variation in $I$ is equal to $P_s(b)$, that is, the total variation in the entire $(a, b)$, since $E_{ip}$ is the set of singularities of $P_s(x)$. Therefore, the total variation of $p_s(t)$ in $e_{ip}$ is equal to its total variation in $(a, b)$ and, consequently, is non-zero. This shows that $e_{ip}$ is of non-zero measure.
For $AC(x)$, the total variation in $I$ tends towards zero with $m(I)$ since $AC(x)$ is absolutely continuous. Therefore, the total variation of $ac(t)$ is zero in $e_{ip}$ and consequently, we have, almost everywhere on $e_{ip}$,
$$ac'(t) = 0.$$
then
$$1 = x'(t) + p_s'(t), \qquad p_s'(t) = 1.$$
Therefore, the derivative of $g(t) = ac(t) + p_s(t)$ exists and is equal to $1$ at all points of a set $e_0$ contained in $e_{ip}$ and having the same measure as $e_{ip}$.
By restricting $e_0$ without changing its measure, we can even assume that at the points of $e_0$, conditions are satisfied that are fulfilled almost everywhere in the transformed set $e$ of $E$, and thus almost everywhere in $e_{ip}$. We thus admit that the points of $e_0$ are neither origins nor extremities of the intervals contiguous to $e$ and that, at these points, the four derivative numbers of the function $f(t)$ which is transformed from $F(x)$, $f(t) \equiv F(x)$, exhibit one of the four associations indicated by the previous theorem.
From the first of these assumptions, as we have $f(t) = g(t)$ at the points of $e$, it follows that to any value $t_0$ belonging to $e$, we can associate two sequences of values of $t$ tending towards $t_0$, one by values greater than $t_0$, and the other by values smaller than $t_0$, such that $r[f(t), t_0, t]$ has a limit equal to $1$. It is sufficient, in fact, to take these sequences of values of $t$ belonging to $e$; then $r$ tends towards the derivative of $f(t)$, taken over $e$, which is $g'(t) = 1$.
$\alpha$. That if we are in the case where the four derivative numbers are equal, we have $f'(t) = 1$;
$\beta$. That if we are in the case where $\Lambda_g = +\infty$, $\lambda_d = -\infty$, $\lambda_g = \Lambda_d = \text{finite value}$, this finite value is equal to $1$, since $1$ must be common to the two intervals $(\lambda_g, \Lambda_g)$, $(\lambda_d, \Lambda_d)$;
$\gamma$. That if we are in the case where $\lambda_g = -\infty$, $\Lambda_d = +\infty$, $\Lambda_g = \lambda_d = \text{finite value}$, this finite value is $1$;
$\delta$. Finally, we can have $\Lambda_d = \Lambda_g = +\infty$, $\lambda_g = \lambda_d = -\infty$.
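$$\Lambda_d = \Lambda_g = \lambda_d = \lambda_g = +\infty; \tag{A}$$
$$\Lambda_d = \Lambda_g = \lambda_g = +\infty, \quad \lambda_d = -\infty; \tag{B}$$
$$\Lambda_d = \Lambda_g = \lambda_d = +\infty, \quad \lambda_g = -\infty; \tag{C}$$
$$\Lambda_d = \Lambda_g = +\infty, \quad \lambda_d = \lambda_g = -\infty. \tag{D}$$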
The theorem of M. Denjoy is proved. It follows that if we know a finite function $f(x)$, and if we know that it is, at every point, equal to one of the four derivative numbers of a continuous function $F(x)$ (equal to $\Lambda_d$ at some points, to $\lambda_d$ at others, etc.), then the function $F(x)$ is determined, and obtained by totalisation of $f(x)$. Indeed, $F(x)$ is an indefinite total, and the totalisable function of which it is the total is almost everywhere equal to $f(x)$ in the set of points where $f(x) = \Lambda_d F(x)$, in the set of points where $f(x) = \lambda_d F(x)$, etc. These sets are unknown, but it does not matter since it follows that the function to be totalised is almost everywhere equal to $f(x)$ in the whole of $(a, b)$.
Thus, the totalisation allows not only solving the problems $A$, $B$, $C$ posed in Chap. 5 and their extensions $A'$, $B'$, $C'$; it also allows dealing with even broader problems like the one mentioned earlier.
In reality, this problem might appear a little strange; it is hard to understand how one could have known that $f(x)$ is everywhere equal to one of the derivative numbers of $F(x)$ without knowing at least which one it is at each point. Therefore, the reader will perhaps wonder whether it would be enough to know that the finite function $f(x)$ is, at every point, one of the limits of $r[F(x), x, x + h]$ as $h$ tends towards zero, for the knowledge of $f(x)$ to determine $F(x)$ up to a constant. The answer is negative. Let us note, indeed, as M. Denjoy did, that the function $F_2(x)$, Sect. 11.4, has a definite derivative $F_2'(x)$ at every point exterior to the perfect set $P$ and has as its derivative numbers $\Lambda_d = \Lambda_g = +\infty$, $\lambda_d = \lambda_g = -\infty$ at the points of $P$. The function $F_2(x) + m(x)$, where $m(x)$ denotes the measure of the subset of $P$ located in $(a, x)$, always has the same derivative numbers as $F_2(x)$. Therefore, for each of these functions, one of the limits of $r$ is the function $f(x)$ which is equal to $F_2'(x)$ outside of $P$ and zero on $P$. We can even say that $f(x)$ is one of the limits of $r$ when we let $h$ tend towards zero by values of definite sign; by positive values, for example. Nevertheless, the difference between these functions is not constant.
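The reason the derivative numbers do not change can be written out in a few lines. This is a sketch, assuming $P \subset (a, b)$ and writing $m(x) = m\bigl(P \cap (a, x)\bigr)$ as in the text.

```latex
% m(x) is non-decreasing and grows no faster than x, so for every h \neq 0
0 \;\le\; r[m(x), x, x + h] \;=\; \frac{m(x + h) - m(x)}{h} \;\le\; 1 .

% At a point of P, the quotient of the sum differs from that of F_2 by a bounded term,
r[F_2(x) + m(x),\, x,\, x + h] \;=\; r[F_2(x), x, x + h] + r[m(x), x, x + h] ,
% hence the upper limits remain +\infty and the lower limits remain -\infty.

% Outside P (an open set), m is constant near x, so
\bigl(F_2 + m\bigr)'(x) = F_2'(x) .

% Yet the difference of the two functions is m(x), and
m(b) - m(a) = m(P) > 0 .
```

So the two functions share $f(x)$ as a limit of $r$ at every point while differing by the non-constant function $m(x)$.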
12 The Integral of Stieltjes
In 1894, Stieltjes, in the course of his research on continued fractions,$^{1}$ defined a new method of integration of continuous functions. It is important to fully understand the originality of Stieltjes' generalisation and how it fundamentally differs from what we have examined so far. In Chap. 1, we recalled what is referred to as integration in the introductory course of infinitesimal calculus. It is a well-defined operation, which associates a number to each continuous function $f(x)$. In Chaps. 2, 3, 6, 7, 10 we defined this operation for increasingly larger families of functions $f(x)$. We extended the notion of integration in depth, into the realm of functions $f(x)$. Stieltjes, in this case, leaves the family of considered functions $f(x)$ invariable. However, for a given function $f(x)$, he defines as many integrals as we want. Each of them associates a number to $f(x)$. He extends the notion of integration in breadth, into the field of functional operations.
In this chapter, we will present the definition of the Stieltjes integral of a continuous function, which is similar to that in Chap. 1. Then, as done in Chaps. 2, 3, 6, 7, 10, we will extend this concept to increasingly larger classes of functions. Finally, as done in Chaps. 4, 5, 8, 9, we will examine the concepts and problems associated with the new integration. The execution of this program would assume that research, which has not yet been addressed, has already been conducted. On many points, we will content ourselves with posing the problems.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025
R. Jain, Lebesgue’s Theory of Integration,
https://doi.org/10.1007/978-981-96-1169-0_12
Let $\alpha(x)$ be a function of bounded variation in an interval $(a, b)$; we will call it the determinate function of the integration to be defined.
Let $f(x)$ be a continuous function in $(a, b)$. We call the Stieltjes integral of $f(x)$, taken over $(a, b)$ with respect to the determinate function$^{2}$ $\alpha(x)$, the limit of the sum
$$S = \sum_{0}^{n} f(\xi_i)[\alpha(x_{i+1}) - \alpha(x_i)],$$
$$\sum_{1} f(\eta_j)[\alpha(y_{j+1}) - \alpha(y_j)],$$
where the summation is extended to some values of $j$. However, with the same values of $j$, we have
$$f(\xi_i)[\alpha(x_{i+1}) - \alpha(x_i)] = \sum_{1} f(\xi_i)[\alpha(y_{j+1}) - \alpha(y_j)].$$
The difference between the two contributions from $(x_i, x_{i+1})$ is, therefore,
$$\Bigl| \sum_{1} [f(\eta_j) - f(\xi_i)][\alpha(y_{j+1}) - \alpha(y_j)] \Bigr| \le \omega_i \sum_{1} |\alpha(y_{j+1}) - \alpha(y_j)| \le \omega_i V_i,$$
where $\omega_i$ denotes the oscillation of $f(x)$ in $(x_i, x_{i+1})$, and $V_i$ is the total variation of $\alpha(x)$ in the same interval.
Hence, by addition,
$$|S_k - S_{k+l}| \le \Omega_k V,$$
where $V$ is the total variation of $\alpha(x)$ in $(a, b)$ and $\Omega_k$ is the maximum oscillation of $f(x)$ in the intervals of the division $D_k$. However, by assumption, $\Omega_k$ tends towards zero as $k$ increases indefinitely; therefore, the sequence of $S_k$ is convergent.
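The convergence of the sums $S$ can be observed numerically. The sketch below is only an illustration under stated assumptions: the function names, the choice $f(x) = \cos x$, the determinate function $\alpha(x) = x^2$ plus a unit jump at $x = \frac{1}{2}$, the equal subdivisions of $(0, 1)$ and the midpoints $\xi_i$ are all hypothetical choices, not taken from the text. For this example the limit is $2(\sin 1 + \cos 1 - 1) + \cos\frac{1}{2}$, the last term being the contribution $f(\frac{1}{2}) \cdot 1$ of the jump.

```python
import math


def stieltjes_sum(f, alpha, a, b, n):
    """Sum of f(xi_i) * [alpha(x_{i+1}) - alpha(x_i)] over n equal
    subintervals of (a, b), with xi_i taken as the midpoint."""
    total = 0.0
    for i in range(n):
        x_lo = a + (b - a) * i / n
        x_hi = a + (b - a) * (i + 1) / n
        xi = 0.5 * (x_lo + x_hi)
        total += f(xi) * (alpha(x_hi) - alpha(x_lo))
    return total


def alpha(x):
    # Bounded variation on (0, 1): a smooth part x^2 plus a unit jump at 1/2.
    # Its total variation there is V = 1 + 1 = 2.
    return x * x + (1.0 if x >= 0.5 else 0.0)


# Successively refined divisions D_k with 2^k equal parts.
sums = [stieltjes_sum(math.cos, alpha, 0.0, 1.0, 2 ** k) for k in range(1, 15)]
reference = 2.0 * (math.sin(1.0) + math.cos(1.0) - 1.0) + math.cos(0.5)

if __name__ == "__main__":
    for k, s in enumerate(sums, start=1):
        print(k, s, abs(s - reference))
```

Since $|f| \le 1$ on $(0, 1)$ and $V = 2$ in this example, every sum is bounded in absolute value by $2$.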
2 Translator’s Note: The function $\alpha(x)$ is called the determinate function, a term used by M. Lebesgue to refer to what is now commonly called the integrator function in the Stieltjes integral.
12.1 The Integral of Stieltjes Defined by the Theory of Summable Functions 249
$$|f(x)| < M,$$
it follows that
$$\left| \int_a^b f(x)\, d[\alpha(x)] \right| < M V,$$
where $V$ is the total variation of $\alpha(x)$ in $(a, b)$.
All these properties follow immediately from an examination of the sums . S.
The function
$$F(x) = \text{const.} + \int_a^x f(x)\, d[\alpha(x)]$$
is called the indefinite integral function of one variable, in the sense of Stieltjes, of $f(x)$, taken with respect to $\alpha(x)$. This definition is applicable for $a < x \le b$. We will soon complete the definition for $x = a$.
The indefinite integral is a function of bounded variation. Let us represent by $V(x)$ the total variation of $\alpha(x)$ from $a$ to $x$. Then, we have
$$|F(l) - F(m)| = \left|\int_m^l f(x)\,d[\alpha(x)]\right| \le M|V(l) - V(m)|,$$
where $M$ denotes the upper bound of the absolute value of $f(x)$ in $(a, b)$. Therefore, if we partition $(a, b)$ into a finite number of intervals $(x_i, x_{i+1})$, we have
$$\sum |F(x_{i+1}) - F(x_i)| \le M \sum |V(x_{i+1}) - V(x_i)| = MV.$$
This inequality proves the proposition and provides the upper bound . M V for the total
variation of the indefinite integral.
When the determinate function is continuous, the indefinite integral is continuous. Indeed, in this case, $V(l) - V(m)$ tends towards zero with the length of $(m, l)$; therefore, the same holds for $F(l) - F(m)$.
More generally, the indefinite integral is continuous at any point of continuity of .α(x).
However, it is discontinuous at .x0 , if . f (x0 ) is non-zero and if .α(x) has a discontinuity at
. x 0 . Indeed, since . x 0 is assumed to be different from .a, let .β(x) and .γ(x) be the two functions
defined as follows:
.β(x) and .γ(x) are two functions of bounded variation, whose sum is .α(x). The indefinite
integral . F(x) = A(x), relative to .α(x), is therefore the sum of those relative to .β(x) and
.γ(x); say . B(x) and .C(x). Now, . B(x) is continuous at . x 0 , because .β(x) is continuous at . x 0 ,
$$\Phi(x) = \sum_{a \le x_i < x} f(x_i)\,s_d(x_i) + \sum_{a < x_i \le x} f(x_i)\,s_g(x_i),$$
where the summation is extended to all values indicated by the inequalities, or, equivalently,
all those values which are points of discontinuity of .α(x). Nevertheless, by a new convention
that complements the definition of the indefinite integral, we include the point .a in the first
summation.
As a result, the integral corrected for its jump function, $F(x) - \Phi(x)$, is the indefinite integral, in the sense of Stieltjes, with respect to the determinate function obtained by correcting $\alpha(x)$ for its jump function:
$$\alpha(x) - \sum_{a \le x_i < x} s_d(x_i) - \sum_{a < x_i \le x} s_g(x_i).$$
The Stieltjes integral for discontinuous determinate functions is therefore easily calculated using those for continuous determinate functions. In most cases, these can be calculated immediately.
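As an illustration of this reduction, the following Python sketch compares a direct Stieltjes sum with the sum of a jump term and a Stieltjes sum for the continuous part. Everything specific here is an assumption made for the example: the helper `stieltjes_sum`, the determinate function $\alpha(x) = x$ plus a single discontinuity at $x_0 = 0.5$ with left jump $0.3$ and right jump $0.7$, and $f(x) = \cos x$. The division is chosen so that $x_0$ is a point of division.

```python
import math

def stieltjes_sum(f, alpha, points):
    """S = sum of f(xi_i) * [alpha(x_{i+1}) - alpha(x_i)], with xi_i at the midpoint."""
    total = 0.0
    for lo, hi in zip(points, points[1:]):
        total += f((lo + hi) / 2) * (alpha(hi) - alpha(lo))
    return total

X0 = 0.5           # single point of discontinuity (hypothetical example)
LEFT_JUMP = 0.3    # s_g(x0) = alpha(x0) - alpha(x0 - 0)
RIGHT_JUMP = 0.7   # s_d(x0) = alpha(x0 + 0) - alpha(x0)

def alpha_continuous(x):
    """alpha corrected for its jump function."""
    return x

def alpha(x):
    """Determinate function with a left jump and a right jump at X0."""
    if x < X0:
        return alpha_continuous(x)
    if x == X0:
        return alpha_continuous(x) + LEFT_JUMP
    return alpha_continuous(x) + LEFT_JUMP + RIGHT_JUMP

f = math.cos
n = 4096
points = [i / n for i in range(n + 1)]   # 0.5 is exactly a point of division

direct = stieltjes_sum(f, alpha, points)
jump_part = f(X0) * RIGHT_JUMP + f(X0) * LEFT_JUMP
continuous_part = stieltjes_sum(f, alpha_continuous, points)
decomposed = jump_part + continuous_part

print(direct, decomposed)   # the two values agree up to the discretisation error
```

The continuous part here is an ordinary integral of $\cos x$, so the decomposed value can also be compared with $\cos(0.5) + \sin(1)$.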
Let us suppose, for example, that $\alpha(x)$ is continuous and strictly increasing, $\alpha(x_1) > \alpha(x_0)$ for $x_1 > x_0$. The change of variable $\alpha = \alpha(x)$ transforms $f(x)$ into a function $g(\alpha)$ and transforms the definition of $\int f(x)\,d[\alpha(x)]$ into that of an ordinary integral of $g(\alpha)$. Thus
$$\int_a^b f(x)\,d[\alpha(x)] = \int_{\alpha(a)}^{\alpha(b)} g(\alpha)\,d\alpha,$$
Therefore, if we set
$$f(x) = g_1(\alpha_1) = g_2(\alpha_2),$$
we deduce
$$\int_a^b f(x)\,d[\alpha(x)] = \int_{\alpha_1(a)}^{\alpha_1(b)} g_1(\alpha_1)\,d\alpha_1 - \int_{\alpha_2(a)}^{\alpha_2(b)} g_2(\alpha_2)\,d\alpha_2.$$
Finally, the most general case is treated similarly. When $\alpha_1(x)$ and $\alpha_2(x)$ are formed using the total variations of $\alpha(x)$, corrected for its jump function, we obtain, using previous notations,
$$\int_a^b f(x)\,d[\alpha(x)] = \sum_{a \le x_i < b} f(x_i)\,s_d(x_i) + \sum_{a < x_i \le b} f(x_i)\,s_g(x_i) + \int_{\alpha_1(a)}^{\alpha_1(b)} g_1(\alpha_1)\,d\alpha_1 - \int_{\alpha_2(a)}^{\alpha_2(b)} g_2(\alpha_2)\,d\alpha_2.$$
Therefore, every Stieltjes integral is expressed using the ordinary integrals. Before drawing
conclusions from this essential fact, let us provide other equivalent formulae. The advantage
of this approach is that it relies only on the basic form of integration: the integral of a
continuous function over an interval. However, it requires two integrals, a series summation,
and a change of variables.
Let us suppose that $\alpha(x)$ is continuous, strictly increasing, and has a continuous derivative. Then, a change of variable is not necessary because we have
$$\int_a^b f(x)\,d[\alpha(x)] = \int_{\alpha(a)}^{\alpha(b)} g(\alpha)\,d\alpha = \int_a^b f(x)\,\alpha'(x)\,dx,$$
returning from the variable $\alpha$ to $x$ using the classical change of variable formula.³ In short, we will treat the Stieltjes integral by noting that it reduces to the curvilinear integral $\int_{a,\alpha(a)}^{b,\alpha(b)} f(x)\,d\alpha$, associated with the curve $\alpha = \alpha(x)$.
The previous formula extends to the case where $\alpha(x)$ is only assumed to be absolutely continuous; then, let $\dot{\alpha}(x)$ denote the function, determined only up to sets of measure zero, of which $\alpha(x)$ is the indefinite integral. This function can be called the almost derivative of $\alpha(x)$. We have
$$S = \sum f(\xi_i)[\alpha(x_{i+1}) - \alpha(x_i)] = \sum f(\xi_i) \int_{x_i}^{x_{i+1}} \dot{\alpha}(x)\,dx.$$
Therefore
$$S - \int_a^b f(x)\,\dot{\alpha}(x)\,dx = \sum \int_{x_i}^{x_{i+1}} [f(\xi_i) - f(x)]\,\dot{\alpha}(x)\,dx.$$
As a result, with $\Omega$ denoting the maximum oscillation of $f(x)$ in the intervals $(x_i, x_{i+1})$ and $V$ always representing the total variation of $\alpha(x)$ in $(a, b)$,
$$\left|S - \int_a^b f(x)\,\dot{\alpha}(x)\,dx\right| \le \Omega \sum \int_{x_i}^{x_{i+1}} |\dot{\alpha}(x)|\,dx = \Omega V.$$
When the function of bounded variation $\alpha(x)$ is simply assumed continuous, a change of variable is sufficient to bring us back to the previous case; indeed, it is clear that if we have a change of variable $(t|x)$ that is single-valued in both directions, associating $(t_1, t_2)$ with $(a, b)$, we always have
$$\int_a^b f(x)\,d[\alpha(x)] = \int_{t_1}^{t_2} f[x(t)]\,d\{\alpha[x(t)]\}.$$
3 The extension to various integration methods of classical procedures, for exact or approximate
calculation of integrals of continuous functions (integration by parts, by substitution, second theorem
of the mean, inequality of Schwarz, etc.) does not find place in our presentation. What follows is
actually related to the method of integration by substitution.
$$t = x + V(x),$$
where $V(x)$ is, as previously, the total variation of $\alpha(x)$ from $a$ to $x$. Then we have, by setting $\alpha[x(t)] = A(t)$,
$$\int_a^b f(x)\,d[\alpha(x)] = \int_a^{b+V} f[x(t)]\,\dot{A}(t)\,dt.$$
If $\alpha(x)$ were continuous and variable in every interval (that is, constant in no subinterval), we could set $\nu = V(x)$ and we would have
$$\int_a^b f(x)\,d[\alpha(x)] = \int_0^V f[x(\nu)]\,\dot{\alpha}[x(\nu)]\,d\nu.$$
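The change of variable $\nu = V(x)$ can be tried on a simple case. The sketch below is an illustration under assumptions that are not in the text: $\alpha(x) = \sin x$ on $(0, \pi)$, which is continuous, nowhere constant, and of total variation $V = 2$, and $f(x) = x$; the helpers `stieltjes_sum`, `x_of_nu`, `A_dot` and `integral_in_nu` are hypothetical names. Both sides approximate the same integral, whose value by integration by parts of $\int_0^\pi x \cos x\,dx$ is $-2$.

```python
import math

def stieltjes_sum(f, alpha, a, b, n):
    """Stieltjes sum on n equal intervals of (a, b), with xi_i at the midpoint."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        lo, hi = a + i * h, a + (i + 1) * h
        total += f((lo + hi) / 2) * (alpha(hi) - alpha(lo))
    return total

f = lambda x: x
alpha = math.sin          # on (0, pi): increases to 1, then decreases to 0
V_TOTAL = 2.0             # total variation of sin on (0, pi)

def x_of_nu(nu):
    """Inverse of nu = V(x), where V(x) = sin x up to pi/2 and 2 - sin x beyond."""
    if nu <= 1.0:
        return math.asin(nu)
    return math.pi - math.asin(2.0 - nu)

def A_dot(nu):
    """Derivative in nu of alpha[x(nu)]: +1 while alpha increases, -1 while it decreases."""
    return 1.0 if nu < 1.0 else -1.0

def integral_in_nu(n):
    """Midpoint-rule value of the ordinary integral of f[x(nu)] * A_dot(nu) over (0, V)."""
    h = V_TOTAL / n
    return sum(f(x_of_nu((k + 0.5) * h)) * A_dot((k + 0.5) * h) * h for k in range(n))

lhs = stieltjes_sum(f, alpha, 0.0, math.pi, 4000)
rhs = integral_in_nu(4000)
print(lhs, rhs)           # two approximations of the same Stieltjes integral
```

In the variable $\nu$ the integrand is multiplied only by $\pm 1$, which is what makes the total variation a convenient universal variable.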
We wish to extend this formula to all cases. To do this, let us lay down the following definitions: $V(x)$ denotes the total variation, from $a$ to $x$, of the determinate function $\alpha(x)$; to any value $\nu_0$ contained between $0$ and $V = V(b)$ there corresponds:
a. Either one or several values of $x$ such that $\nu_0 = V(x)$; we then choose one of these values $x_0$ and we will set
$$A(\nu_0) = \alpha(x_0 - 0) + \frac{\alpha(x_0) - \alpha(x_0 - 0)}{V(x_0) - V(x_0 - 0)}\,[\nu_0 - V(x_0 - 0)];$$
in the second
$$A(\nu_0) = \alpha(x_0) + \frac{\alpha(x_0 + 0) - \alpha(x_0)}{V(x_0 + 0) - V(x_0)}\,[\nu_0 - V(x_0)].$$
With these conventions, we have
$$\int_a^b f(x)\,d[\alpha(x)] = \int_0^V f[x(\nu)]\,\dot{A}(\nu)\,d\nu.$$
To justify this statement, let us partition .(a, b) into partial intervals .δi , in each of which the
oscillation of . f (x) is smaller than .ε. In each interval .δi = (li , m i ) let us arbitrarily choose
a value .li ≤ ξi ≤ m i and set
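Then, we have
$$\left|\int_a^b f(x)\,d[\alpha(x)] - \sum_i [f(\xi_i)\{\alpha(m_i) - \alpha(l_i)\}]\right| = \left|\sum_i \int_{\delta_i} [f(x) - f(\xi_i)]\,d[\alpha(x)]\right| \le \varepsilon \sum_i \int_{\delta_i} dV(x) = \varepsilon \sum_i \Delta_i V = \varepsilon V;$$
$$\left|\int_0^V f[x(\nu)]\,\dot{A}(\nu)\,d\nu - \sum_i [f(\xi_i)\{\alpha(m_i) - \alpha(l_i)\}]\right| = \left|\sum_i \int_{\Delta_i} \{f[x(\nu)] - f(\xi_i)\}\,\dot{A}(\nu)\,d\nu\right| \le \varepsilon \sum_i \int_{\Delta_i} |\dot{A}(\nu)|\,d\nu = \varepsilon \sum_i \Delta_i V = \varepsilon V.$$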
we take the total variation .ν of .α(x) from .a to .x as the variable. In particular, this definition
will apply to all bounded functions that are measurable with respect to .ν.
Now, among the functions . f (x) that are measurable with respect to .ν, we must mention
all functions . f (x) that are . B measurable with respect to .x. Indeed, the formula .x = x(ν)
associates every interval in .x with an interval in .ν or a point, a sum of sets in .x with a sum
of sets in .ν, a difference of sets in .x, with a difference of sets in .ν plus sometimes with some
values of .ν corresponding to the intervals of constancy of .x. Since these values are finite or
countable, every . B measurable set in .x corresponds to a . B measurable set in .ν. Therefore, if
. f (x) is . B measurable in . x, meaning if the set . E[α < f (x) < β] is . B measurable, then the
set . E[α < f [x(ν)] < β] is also . B measurable, and . f [x(ν)] is . B measurable with respect
to .ν.
Therefore, the previous definition is applicable to a class of functions . f (x) that vary
with the determinate function .α(x), while always containing the family of bounded and . B
measurable functions . f (x).
The disadvantage of such a rapid method, which provided us with this result, is that
it does not clearly reveal the significance of extending the concept of Stieltjes Integral. A
theorem of M. Frédéric Riesz will highlight its significance.5
No one had dealt with the integration of a function with respect to another function since
Stieltjes, until in .1909, M.F. Riesz revealed that this concept had, in fact, been the subject
of quite a few studies–albeit under a different name, the linear functional operation.
A linear functional operation is one that associates with each function . f (x), belonging
to a certain class of functions, a number . A[ f (x)] such that:
where . M is a fixed number. The functional . A[ f (x)]6 provided by the linear operation is
itself called linear.
It was primarily questions in mathematical physics that led to the notion of linear functionals. The class of functions which presented itself at that time, varying with the questions, always contained the continuous functions but often various types of discontinuous functions as well. Thus the problem of extending the field of application of linear functional operations naturally arose. This extension is achieved through the earlier generalisation of the concept of Stieltjes integral and M. Riesz’s theorem.
5 It is in the context of this theorem of M. Riesz (C.R. Acad. Sc., 1909; see also Annales de l’École Normale Supérieure, 1911 and 1914) that I introduced (C.R. Acad. Sc., 1909) the extension of the concept of Stieltjes integral by the methods that we just mentioned, sometimes presented in a slightly different form.
6 Since the term function of functions is somewhat ambiguous, M. Volterra had referred to numbers such as $A[f(x)]$ as functions of lines. However, the expression functional proposed by M. Hadamard has generally prevailed.
Among the linear functionals defined on the field of continuous functions, we find those of the form
$$\int_a^b K(x) f(x)\,dx.$$
Expressions of this form were also considered when we tried to construct more general linear functionals; M. Hadamard and M. Fréchet had obtained quite interesting results in this direction. However, it was left to M. Riesz to completely solve the problem by showing that any linear functional defined for all continuous functions in $(a, b)$ is of the form $\int_a^b f(x)\,d[\alpha(x)]$, where $\alpha(x)$ is a function of bounded variation that characterises the functional.⁷
Let $A(f)$ be a linear functional defined on the field $C$ of continuous functions from $a$ to $b$.⁸ The equality
$$A(f_1 + f_2 + \cdots + f_p) = A(f_1) + \cdots + A(f_p)$$
therefore holds in $C$; there are two important cases where we can even assume that the functions $f_i$ are infinite in number: first, the case where the series of $f_i$ converges uniformly, and second, the case where all $f_i$, from a certain value of $i$ onwards, are of the same sign.
Indeed, let us assume that the series of $f_i$ converges uniformly; if $f$ is its limit and $s_p$ is the sum of its first $p$ terms, we have, for $p$ sufficiently large, $|f - s_p| < \varepsilon$, hence
7 I imitate in what follows, the proof given by M. Riesz in his Memoire of Annales de l’École Normale,
.1914. M. Riesz honours me by stating that a remark I made regarding the role of monotone sequences
of functions guided him. In reality, I had only very imperfectly understood this role; otherwise, I
would not have written, in my note of .1909, that it would be very difficult to extend the notion of
Stieltjes integral by a different method from the one I was using. Shortly after I made this imprudent
statement, M. W.H.Young showed that my method was far from essential and that the Stieltjes integral
is defined exactly as the ordinary integral by the method of monotone sequences indicated in Chap. 7,
Sect. 8.5 (Proceed. of the London Math. Society, .1913).
This work of M.Young was the first of those which eventually clarified what a Stieltjes integral
is. We have truly penetrated the depth of this notion thanks to the definition given by M. Radon
(Sitz. d: K. Ak. d. Wiss. in Wien, .1913) and the work of M. de la Vallée Poussin on the extension of
the concept of measure (See, in particular, in this collection, the aforementioned book of M. de la
Vallée Poussin). However, for these works to be possible, the concepts of set functions (Lebesgue),
functions of several variable of bounded variation (Vitali, Rend. della R. Acc. delle Sc. di Torino,
.1908), Stieltjes integral of a continuous function of several variables (Frechet, Nouv. Ann. de Math.,
.1905), had to be developed.
Since in this book I am only concerned with functions of a single variable, the difficulty and importance of some of these works may not be readily apparent. This is why I want to emphasise that while, with regard to functions of a single variable, the Memoire of M. Radon has brought us only a new, particularly insightful definition of the Stieltjes integral, for the case of several variables this Memoire provides a real extension of the concept of integral.
8 We could start from a more restricted field, such as that of polynomials, for example.
12.2 The Linear Functionals 257
The first case is thus examined. Now, the second case reduces to the first, because if . f i are
all positive or zero and, if the sum . f of the series of . f i belongs to the field .C, the set . E p
of the points where we have .| f − f p | ≥ ε is a closed set when it exists. Since . E p contains
. E p+1 and there are no common points to all . E p , the set . E p no longer exists as when . p is
. A( f i ), we have
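$$\sum_{1}^{p} |A(f_i)| = \sum_{1}^{p} \theta_i A(f_i) = A\left(\sum_{1}^{p} \theta_i f_i\right) \le M \times \max \text{ of } \left|\sum_{1}^{p} \theta_i f_i\right| \le M \times \max \text{ of } \sum_{1}^{p} f_i \le M \times \max \text{ of } f.$$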
$$A(kf) = kA(f).$$
This property is obvious for integer $k$ or the inverse of an integer; we then arrive at commensurable (rational) $k$, then finally we attain any $k$ by passage to the uniform limit.
This being said, let us set $\alpha(a) = 0$ and, for $a < x_i \le b$, let us take $\alpha(x_i)$ equal to the limit of $A(f)$ for the decreasing sequence of continuous functions $f_{n,x_i}(x)$ equal to $1$ from $a$ to $x_i$, equal to $0$ from $x_i + \frac{1}{n}$ to $b$, linear from $x_i$ to $x_i + \frac{1}{n}$.
for
9 In the field .C, a similar examination for the case of uniform convergence would be unnecessary; on
the contrary, for other fields, it provides an interesting result, which, along with the one that will be
given in the text, can be used to directly study the extension of the domain of definition of a functional,
without resorting to the notion of Stieltjes integral.
it is, in each $(x_{i-1}, x_i)$, contained between $f(\xi_{i-1})$ and $f(\xi_i)$ when $\frac{1}{n}$ is less than the smallest difference $x_i - x_{i-1}$.
Therefore, if we choose $x_i$ in such a manner that the oscillation of $f(x)$ is less than $\varepsilon$ in each $(x_{i-1}, x_i)$, then the difference $f(x) - g_n(x)$ is less than $2\varepsilon$ when $n$ is large enough. And then the difference $A[f(x)] - A[g_n(x)]$ will be less than $2M\varepsilon$.
Now we know the limit . S of . A[gn (x)] for very large .n; it is written as
And consequently, $A(f)$ is the limit of the previous sum, meaning $A(f)$ is the Stieltjes integral of $f(x)$, taken with respect to $\alpha(x)$. The theorem of M. Riesz will be proved as soon as we have verified that $\alpha(x)$ is of bounded variation.
However, this is obvious; if $\alpha(x)$ were not of bounded variation, it would have a positive variation equal to $+\infty$. Therefore, we would be able to find intervals $(a_1, b_1), (a_2, b_2), \ldots$, distinct from each other and finite in number, such that $\sum[\alpha(b_i) - \alpha(a_i)]$ exceeds $2M$, where $M$ is always the number which appears in the second property of the linear functional. Therefore, for the continuous functions $\varphi_i$ equal to $1$ in $(a_k, b_k)$, zero in the intervals $\left(b_k + \frac{1}{i}, a_{k+1} - \frac{1}{i}\right)$ and linear in the intervals $\left(a_k - \frac{1}{i}, a_k\right)$, $\left(b_k, b_k + \frac{1}{i}\right)$, functions which decrease towards the function $\varphi$ equal to $1$ in $(a_k, b_k)$ and to $0$ outside, the numbers $A(\varphi_i)$ would tend towards
$$\sum[\alpha(b_i) - \alpha(a_i)] > 2M;$$
which is impossible, since $A(\varphi_i)$ cannot exceed $M$.
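The construction used in this proof can be followed on a concrete functional. The Python sketch below is an illustration only, with every specific choice an assumption: the functional `A` (two point evaluations plus an average over `N_SAMPLES` sample points of $(0, 1)$, so that it is exactly computable and bounded with $M = 4$), the helper names `ramp` and `alpha`, and the use of one very large $n$ as a stand-in for the limit $n \to \infty$. It builds $\alpha(x_i)$ from the functions $f_{n,x_i}$, then compares $A(f)$ with the Stieltjes sum of $f$ with respect to that $\alpha$.

```python
import math

N_SAMPLES = 1000
SAMPLES = [(k + 0.5) / N_SAMPLES for k in range(N_SAMPLES)]
M = 4.0   # |A(f)| <= (2 + 1 + 1) * max|f| for the functional below

def A(f):
    """Hypothetical bounded linear functional on continuous functions on (0, 1):
    2 f(0.25) - f(0.75) + average of f over SAMPLES."""
    return 2.0 * f(0.25) - f(0.75) + sum(f(s) for s in SAMPLES) / N_SAMPLES

def ramp(x_i, n):
    """f_{n,x_i}: equal to 1 up to x_i, to 0 from x_i + 1/n onwards, linear in between."""
    def f(x):
        if x <= x_i:
            return 1.0
        if x >= x_i + 1.0 / n:
            return 0.0
        return 1.0 - n * (x - x_i)
    return f

def alpha(x_i, n=10 ** 9):
    """alpha(a) = 0; otherwise A(f_{n,x_i}) for one large n, standing in for the limit."""
    return 0.0 if x_i == 0.0 else A(ramp(x_i, n))

points = [i / 256 for i in range(257)]      # division of (0, 1)
alphas = [alpha(x) for x in points]

f = math.cos
# Stieltjes sum with xi_i taken at the right end of each interval.
stieltjes = sum(f(points[i]) * (alphas[i] - alphas[i - 1]) for i in range(1, 257))
# Variation of alpha over the division; it cannot exceed M.
variation = sum(abs(alphas[i] - alphas[i - 1]) for i in range(1, 257))

print(A(f), stieltjes, variation)
```

The function $\alpha$ obtained this way has a jump of $2$ at $0.25$ and of $-1$ at $0.75$, and it is continuous to the right, as the construction by decreasing sequences implies.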
This theorem of M. Riesz associates with each linear functional, defined for the continuous
functions . f (x) in an interval .(a, b), a determinate function .α(x). As we have seen regarding
the extension of Stieltjes integral, such a functional can be extended to the field of all
functions which, through the change of variable from .x to the total variation .ν of .α(x)
in .(a, x), transform into summable functions of .ν. This family varies with .α(x), but it is
important to note that it always contains all . B measurable and bounded functions.
When we use the linear functionals of continuous functions . f (x), one of the most use-
ful properties is this: the sum . A[ f (x)] + B[ f (x)] of two linear functionals . A[ f (x)] and
. B[ f (x)] is itself a linear functional.
When we want to extend the field of functions . f (x) while still retaining the advantage
of this property, it is necessary to focus on a functional field independent of the determinate
function .α(x). Therefore, it is very interesting to know that all linear functionals defined on
the field of continuous functions can be extended to the field of . B measurable and bounded
functions.
It is clear that the functionals extended to the field of $B$ measurable and bounded functions, as we have obtained them, possess the following property:
We will verify that the properties .1, 2, and .3 are sufficient to characterise the extension
we have made to the field of . B measurable and bounded functions of a given linear
functional defined on continuous functions.
Indeed, we will see how, from properties $1$, $2$ and $3$, one can extend to a larger field a linear functional $A(f)$ given in the field of continuous functions.
We have seen that, for .n increasing indefinitely, the functions . f n,xi (x) decrease towards
the function . f x≤xi (x) which equals .1 for .a ≤ x ≤ xi and .0 for .xi < x ≤ b. Therefore,
according to .1 and .3, . A[ f x≤xi (x)] is deduced.
Let us set, for $\alpha < \beta$,
$$f_{\alpha < x \le \beta} = f_{x \le \beta}(x) - f_{x \le \alpha}(x).$$
$$E = E_1 + E_2 + E_3 + \cdots$$
$$A[f_E] = A[f_{E_1}] + A[f_{E_2}] + A[f_{E_3}] + \cdots$$
$$A[f_E] = A[f_{E_1}] - A[f_{E_2}],$$
$$f_\varepsilon(x) = \sum_i i\varepsilon\, f_{E[i\varepsilon \le f(x) < (i+1)\varepsilon]}(x);$$
therefore, the sequence of functions . f ε (x) tends uniformly towards . f (x) when .ε tends
towards zero and we deduce from .1 and .2, as we did previously, that the numbers . A[ f ε (x)]
converge to a limit that we must take as the value of . A[ f (x)].
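The uniform approximation by the functions $f_\varepsilon$ is easy to see in a small computation. The following Python sketch is only an illustration; the helper name `simple_approximation` and the bounded, discontinuous test function are assumptions made for the example. For each $x$ it finds the index $i$ with $i\varepsilon \le f(x) < (i+1)\varepsilon$ and returns $i\varepsilon$, so the error is everywhere smaller than $\varepsilon$.

```python
import math

def simple_approximation(f, eps):
    """Return f_eps, equal to i*eps on the set E[i*eps <= f(x) < (i+1)*eps]."""
    def f_eps(x):
        i = math.floor(f(x) / eps)   # index of the set containing x
        return i * eps
    return f_eps

# Hypothetical bounded, discontinuous function on (0, 1).
f = lambda x: math.sin(5 * x) + (1.0 if x > 0.5 else 0.0)

xs = [k / 1000 for k in range(1001)]
for eps in (0.5, 0.1, 0.01):
    f_eps = simple_approximation(f, eps)
    worst = max(abs(f(x) - f_eps(x)) for x in xs)
    print(eps, worst)                # the largest sampled error stays below eps
```

Since the bound $\varepsilon$ does not depend on $x$, the convergence is uniform, which is what allows properties $1$ and $2$ to be applied.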
Thus, the extension to the entire field of . B measurable bounded functions, if possible, is
unique. But, we have seen that it was possible.10
The extension to this large functional field11 is, therefore, well characterised by the
conditions .1, 2 and .3.
Thanks to the concept of the Stieltjes integral, we will obtain another extension. With
every definite integral, we have associated an indefinite integral providing a point function,
an interval function, and a measurable set function. Thus, the notion of the Stieltjes integral
leads to an indefinite Stieltjes integral as a point function, an indefinite Stieltjes integral as an interval function, and an indefinite Stieltjes integral as a set function. This last one
allows us to associate with the function . f (x) and a set . E x , a definite number
$$\int_{E_\nu} f[x(\nu)]\,\dot{A}(\nu)\,d\nu = \int_{E_x} f(x)\,d[\alpha(x)],$$
where . E ν is the set of values of .ν for which .x(ν) belongs to . E x . Thus, the Stieltjes integral
of . f (x) taken with respect to .α(x), is now defined and extended to the set . E x .
This integral is defined for the sets . E x for which . E ν is measurable; this family of sets
contains all the . B measurable sets . E x .
The integral is defined for every function having a definite value at the points of . E x
and is equal on . E x to a function . f (x) for which . f [x(ν)] is summable over .(0, V ). There-
fore, in particular this definition is applicable to any function . f (x), that is bounded and . B
measurable on a . B measurable set . E x .
Now, $\int_{E_x} f(x)\,d[\alpha(x)]$ is obviously a linear functional in the field of functions $f(x)$ given on $E_x$. Therefore, when a linear functional $A_1[f(x)]$ is defined on continuous functions in an interval $I$, we can deduce a family of linear functionals $A_E[f(x)]$, associated with each $B$ measurable set $E$ located in $I$ and defined on $B$ measurable functions on $E$, by the conditions
10 If we had not already extended it, thanks to the Stieltjes integral, we should, which would be easy,
verify here that the values we have assigned to . A[ f ] are well-determined for each . f of the new
functional field and that they satisfy the conditions .1, 2 and .3. That is, in fact, what we will do shortly.
11 We will see later how we can attain an even larger functional field, consisting of functions . f (x)
which yield measurable functions . f [x(ν)].
$$A[f(x)] = A_I[g(x)].$$
We can be surprised by this result because given . A I [ f (x)], the determinate function .α(x)
is not unique. Indeed, it is quite clear that if we modify .α(x) at a single point .x0 interior to
. I , it does not modify . A I [ f (x)] for the continuous functions . f (x) in . I , because we have
$$A_I[f(x)] = \int_{a \le x < x_0} f(x)\,d[\alpha(x)] + f(x_0)[\alpha(x_0 + 0) - \alpha(x_0 - 0)] + \int_{x_0 < x \le b} f(x)\,d[\alpha(x)];$$
This paradox arises solely from the fact that, for .x0 > a, the equality
$$A_{a \le x \le x_0}[f(x)] = \int_{a \le x \le x_0} f(x)\,d[\alpha(x)],$$
holds only with functions .α which are continuous to the right of .x0 . Such was the case for
the function .α(x) constructed in the course of proof of the theorem of M. Riesz, which can
be verified easily. However, with this function, we do not have
$$A_{x_0 \le x \le b}[f(x)] = \int_{x_0 \le x \le b} f(x)\,d[\alpha(x)],$$
by denoting $\mathcal{F}(x)$ as the indefinite integral $\int_a^x f(x)\,d[\alpha(x)]$, it is necessary that $\mathcal{F}(x)$ serves as a generating function for an additive set function and, as a result, be, as we have seen in the indicated section, a right continuous function. However, the right jump of $\mathcal{F}(x)$ at $x_0$ is equal to $f(x_0)[\alpha(x_0 + 0) - \alpha(x_0)]$; therefore, $\alpha(x)$ must be right continuous.
To circumvent the difficulty which thus arises, when we want to extend a known func-
tional to measurable sets in an interval using a determinate function .α(x) that is not right
continuous, we can decompose .α(x) into its jump function and its continuous part
This study of the linear functionals provides a better understanding of the meaning of condi-
tions .1I , 2I , . . . , 6I in the integration problem (Sect. 8.1). Let us compare these conditions
with conditions .1F, 2F, 3F and .4F established for linear functionals (Sect. 12.2). .3I is
identical to .1F; .6I replaces .3F; .2I replaces .4F; as for .2F, it turns out to be a consequence
of .4I and .5I . The conditions .1I , 4I , 5I serve only to characterise the functional . A f (x)
relative to continuous functions that needs to be extended. In short, when the integral of a
continuous function . f (x) is known in any interval where the function is given, in Chap. 7,
we could have limited ourselves to setting the conditions .1F, 2F, 3F and deducing the
extension of the integral by reasoning presented earlier. However, while for the extension
of the general linear functional, we could limit ourselves to proving that the extension was
unique because the case of the integral had been previously examined, we would now need
to directly verify that the extension is possible. This is what we will do by placing ourselves
in the case of the most general Stieltjes integral. In this way, we will obtain a direct definition
of this integral, from which the definition of the ordinary integral will be deduced by setting
.α(x) ≡ x.
the functional relative to $AC(x)$ is written $\int_a^b f(x)\,\dot{AC}(x)\,dx$; only the functional relative to $C_s(x)$ requires the use of the Stieltjes integral (FRÉCHET, Comptes rendus du Congrès des Sociétés savantes en 1913).
12.3 Direct Definition of Integral of Stieltjes 263
But, before looking for a new form of the definition of the Stieltjes integral, it is necessary
to determine the cases in which the initially established definition for continuous functions
can be employed without modification.
Let . f (x) be a bounded function in .(a, b), and let .α(x) be a function of bounded variation.
We will form, as was said, the sum
$$S = \sum_{0}^{n} f(\xi_i)[\alpha(x_{i+1}) - \alpha(x_i)] = \sum_{0}^{n} f(\xi_i)\,\delta_i\alpha.$$
If, in $(x_i, x_{i+1})$, $f(x)$ varies between $L_i$ and $l_i$, we will denote by $L'_i$ and $l'_i$ two numbers defined by the conventions
$$L'_i = L_i, \quad l'_i = l_i, \quad \text{if } \delta_i\alpha \ge 0,$$
$$L'_i = l_i, \quad l'_i = L_i, \quad \text{if } \delta_i\alpha < 0;$$
$L'_i$ and $l'_i$ will be the upper and lower bounds of $f(x)$ defined with respect to $\alpha(x)$. The oscillation of $f(x)$ in the given interval will be
$$\omega_i = L_i - l_i = |L'_i - l'_i|.$$
This being said, if we vary $\xi_i$ without varying $x_i$, then $S$ varies between
$$\overline{S} = \sum_{0}^{n} L'_i\,\delta_i\alpha \quad \text{and} \quad \underline{S} = \sum_{0}^{n} l'_i\,\delta_i\alpha.$$
We will show that these sums, for a sequence of divisions $D_1, D_2, \ldots$ into intervals whose maximum length tends towards zero, converge towards definite limits
$$\overline{\int_a^b} f(x)\,d[\alpha(x)], \qquad \underline{\int_a^b} f(x)\,d[\alpha(x)],$$
which we will call the upper and lower Stieltjes-Darboux integrals of $f$, provided that every point of discontinuity of $\alpha(x)$ is a point of division of $D_k$ from a certain value of $k$ onwards.
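The convention on $L'_i$ and $l'_i$ translates directly into a computation. The Python sketch below is an illustration under stated assumptions: the helper name `darboux_stieltjes_sums` is hypothetical, the bounds of $f$ on each interval are estimated from sampled values with the endpoints included (exact only for a monotone $f$), and the example takes $f(x) = x$ and $\alpha(x) = \sin x$ on $(0, \pi)$, a continuous determinate function of bounded variation that is not monotone.

```python
import math

def darboux_stieltjes_sums(f, alpha, points, samples=50):
    """Return (upper, lower) = (sum of L'_i * d_i alpha, sum of l'_i * d_i alpha)."""
    upper = lower = 0.0
    for lo, hi in zip(points, points[1:]):
        # Assumption: the bounds L_i and l_i of f are estimated from sampled values.
        values = [f(lo + (hi - lo) * k / samples) for k in range(samples + 1)]
        L_i, l_i = max(values), min(values)
        d_alpha = alpha(hi) - alpha(lo)
        if d_alpha >= 0:
            L_prime, l_prime = L_i, l_i    # bounds kept when alpha increases
        else:
            L_prime, l_prime = l_i, L_i    # bounds exchanged when alpha decreases
        upper += L_prime * d_alpha
        lower += l_prime * d_alpha
    return upper, lower

f = lambda x: x
alpha = math.sin

for n in (10, 100, 1000):
    points = [math.pi * i / n for i in range(n + 1)]
    upper, lower = darboux_stieltjes_sums(f, alpha, points)
    print(n, lower, upper)   # upper - lower is the sum of omega_i * |d_i alpha|
```

For a continuous $f$, as here, the two sums close in on a common value as the divisions are refined; for a discontinuous $f$ they may converge to two different limits.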
If . p(x), n(x), ν(x) are three total variations of .α(x)
we have
$$\sum_{0}^{n} l_i\,\delta_i p - \sum_{0}^{n} L_i\,\delta_i n \le S = \sum_{0}^{n} f(\xi_i)\,\delta_i[p - n] \le \sum_{0}^{n} L_i\,\delta_i p - \sum_{0}^{n} l_i\,\delta_i n.$$
Now, for the increasing functions $p$ and $n$, the study of sums such as $\sum L_i\,\delta_i p$ is easy. Let $\Delta_1, \Delta_2, \ldots$ be a second sequence of divisions subject to the same conditions as the sequence of $D_i$; let $s_i$ and $\sigma_j$ be the numbers $\sum L_i\,\delta_i p$ provided by $D_i$ and $\Delta_j$; we want to compare the sequence of $s_i$ with that of $\sigma_j$.
If $j$ is taken sufficiently large, $i$ being fixed, in each interval given by $\Delta_j$ at most one of the points of $D_i$ will be found. So, if $(x, y)$ is one of the intervals given by $D_i$, $x$ will be in an interval $(r, s)$ of $\Delta_j$ and $y$ in $(t, u)$; $r \le x \le s \le t \le y \le u$. If $x$ (or $y$) is a point of discontinuity of $\alpha(x)$, for $j$ sufficiently large, $r$ and $s$ will coincide with $x$ (or $t$ and $u$ with $y$); so we will assume $s - r$ (or $u - t$) is different from zero only if $x$ (or $y$) is a point of continuity of $\alpha(x)$. Then, let us replace the contribution of $(r, s)$ in $\sigma_j$ with $f(x)[p(s) - p(r)]$; as $f(x)$ differs from the upper bound of $f$ in $(r, s)$ by as little as we want when $j$ is taken sufficiently large (because of the smallness of $(r, s)$ and of the continuity of $f$ at the point $x$), we will modify $\sigma_j$ by as little as we want. Let us do that for each point of division of $D_i$, and we will have a number $(\sigma_j)$ differing from $\sigma_j$ by less than $\varepsilon$. If, between $s$ and $t$, the points of $\Delta_j$ are $z_1, z_2, \ldots$, we can say that the contribution of $(x, y)$ to $(\sigma_j)$ is of the form
$$[p(s) - p(x)]\,f(x) + L_1[p(z_1) - p(s)] + L_2[p(z_2) - p(z_1)] + \cdots + [p(y) - p(t)]\,f(y).$$
Now, $f(x), L_1, L_2, \ldots, f(y)$ are at most equal to the upper bound $L$ of $f$ in $(\alpha, \beta)$; therefore, the previous sum is at most equal to
$$L[p(y) - p(x)],$$
$$(\sigma_j) \le s_i$$
and as a result
$$\sigma_j \le s_i + \varepsilon,$$
as soon as $j$ is sufficiently large. It follows that the $s_i$ and the $\sigma_j$ converge towards the same limit.
In short, we have just proved the theorem for a monotone function, and the existence of the limits
$$\overline{\int_a^b} f(x)\,d[p(x)], \qquad \underline{\int_a^b} f(x)\,d[p(x)],$$
$$\overline{\int_a^b} f(x)\,d[n(x)], \qquad \underline{\int_a^b} f(x)\,d[n(x)]$$
is proved.
The difference between the extreme members of this inequality is, according to the manner
in which it was obtained, the limit of
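$$\left[\sum_{0}^{n} L_i\,\delta_i p - \sum_{0}^{n} l_i\,\delta_i n\right] - \left[\sum_{0}^{n} l_i\,\delta_i p - \sum_{0}^{n} L_i\,\delta_i n\right] = \sum_{0}^{n} (L_i - l_i)(\delta_i p + \delta_i n) = \sum_{0}^{n} \omega_i\,\delta_i\nu.$$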
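Now, we have
$$\overline{S} - \underline{S} = \sum_{0}^{n} L'_i\,\delta_i\alpha - \sum_{0}^{n} l'_i\,\delta_i\alpha = \sum_{0}^{n} (L'_i - l'_i)\,\delta_i\alpha = \sum_{0}^{n} \omega_i\,|\delta_i\alpha|.$$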
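$$0 \le \sum_{0}^{n} \omega_i\,\delta_i\nu - \sum_{0}^{n} \omega_i\,|\delta_i\alpha| = \sum_{0}^{n} \omega_i[\delta_i\nu - |\delta_i\alpha|];$$
all the $\omega_i$ are at most equal to the oscillation $\Omega$ of $f(x)$ in $(a, b)$; since the $\delta_i\nu - |\delta_i\alpha|$ are positive, this quantity is at most equal to
$$\Omega \sum_{0}^{n} [\delta_i\nu - |\delta_i\alpha|] = \Omega\left[V - \sum_{0}^{n} |\delta_i\alpha|\right].$$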
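But we know (Sect. 5.1) that in the conditions considered here $\sum_{0}^{n} |\delta_i\alpha|$ tends towards $V$. From this, the following relations follow:
$$D \le \lim \overline{S} \le E; \quad D \le \lim \underline{S} \le E; \quad \lim(\overline{S} - \underline{S}) = E - D;$$
Therefore, we have
$$\lim \overline{S} = E, \quad \lim \underline{S} = D.$$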
The theorem is proved and we have, for the integrals by upper and lower sums, the expressions
$$E = \overline{\int_a^b} f(x)\,d[\alpha(x)] = \overline{\int_a^b} f(x)\,d[p(x)] - \underline{\int_a^b} f(x)\,d[n(x)],$$
$$D = \underline{\int_a^b} f(x)\,d[\alpha(x)] = \underline{\int_a^b} f(x)\,d[p(x)] - \overline{\int_a^b} f(x)\,d[n(x)].$$
For our statements to reduce exactly to those of Chap. 2 when we set $\alpha(x) \equiv x$, let us define the mean oscillation of $f(x)$ in $(a, b)$, taken with respect to $\alpha(x)$, as the limit of the ratio
$$\frac{\sum_{0}^{n} \omega_i\,\delta_i\nu}{\sum_{0}^{n} \delta_i\nu},$$
13 We will note that here it is no longer necessary to consider only sequences of divisions $D_i$ such that every point of discontinuity of $\alpha(x)$ belongs to all the $D_i$ from a certain value of the index.
14 That is, . enclosed in the open intervals . .
By using the theory of measure that will be developed, the reader will easily show, as in
Chap. 2, the necessary and sufficient condition for a function to have a Stieltjes–Riemann
integral is that the set of its points of discontinuity should have a measure zero with respect
to the determinate function .α(x).
Let us not study the initial definition of the Stieltjes integral any longer and, to prepare a
broader definition of this integral, let us define the measure of a set, taken with respect to a
function .α(x), of bounded variation.
We will agree that the measure of closed interval .(l, m) is
from there we determine the measure of an open or a semi open interval by subtracting from
the measure of a closed interval, the measure of either or both of its two extreme points.
It is obvious that this function of intervals is completely additive. Therefore, as mentioned
in Sect. 9.3, it corresponds to a completely additive set function, defined particularly for all
. B measurable sets. We now elaborate on this result.
Let us recall that the sets . E x , for which we defined the function .Aα(x) (E x ), denoted here
as .m α(x) (E x ), are those which are transformed into measurable sets .Et with respect to .t by
means of the change of variable
.t = x + V (x);
.V (x) always denotes the total variation of .α(x) from .a to .x; this change of variable is to be
interpreted as it has been explained in Sect. 9.3.
To say that $\mathcal{E}_t$ is measurable means that we can enclose it in a set $e_t$ of open intervals and its complement $\mathcal{F}_t$ in a set $f_t$ of open intervals, such that the length of the subsets common to $e_t$ and $f_t$ is at most $\varepsilon$, as small as we want. To $e_t$ and $f_t$, there correspond sets $e_x$ and
. f x of intervals enclosing . E x and its complement, . Fx . This is achieved by ensuring that the
intervals constituting .et and . f t are chosen in such a way that none of them has an extremity
in the interior of an interval .(t1 , t2 ) corresponding to a singular point of .α(x); which is
possible since these intervals are completely in .Et or .Ft .15 In each interval of the .x-axis we
have
$$\delta x \le \delta t, \quad \delta V < \delta t.$$
From the first inequality, we have already deduced that $E_x$ is measurable, because it implies $\sum \delta x \le \varepsilon$, where the summation is extended over the intervals common to $e_x$ and $f_x$. The second inequality gives us $\sum \delta V < \varepsilon$ and shows us that the $E_x$ are measurable with respect to $\alpha(x)$, by agreeing that: a set $E_x$ is said to be measurable with respect to $\alpha(x)$ if it can be enclosed in an infinite number of open intervals $e_x$ and if we can enclose the complement $F_x$ of $E_x$ in an infinite number of open intervals $f_x$, such that the sum $\sum \delta V$, extended over the intervals common to $e_x$ and $f_x$, can be made as small as we want.
Therefore, we have defined $m_{\alpha(x)}(E)$ for sets which are at the same time measurable
in the ordinary sense and measurable with respect to $\alpha(x)$.16 If we were to stop there,
measurability in the ordinary sense would play a separate role. We completely generalise
the theory of measure only by defining measure with respect to $\alpha(x)$ for all the sets
measurable with respect to $\alpha(x)$, without additionally requiring that they be measurable in the
ordinary sense. That is what we are going to do now.17
The problem of measure that we have solved in Chap. 7 can be stated as follows.
Find a set function which is:
1. Positive or null;
2. Completely additive;
3. Equal, for open and closed intervals, to the usual measure of these intervals.
When it comes to the measure with respect to a non-decreasing function $\alpha(x)$, we can retain
this statement exactly. Then, we define the exterior measure of $E_x$ as the lower limit of the
sums $\sum \delta\alpha$ over the sets of intervals enclosing $E_x$. The measure of $(a, b)$ minus the exterior
measure of the complement $F_x$ of $E_x$ gives the interior measure of $E_x$. Clearly, the first
of these measures is at least as large as the second; these two measures are equal only for
sets measurable with respect to $\alpha(x)$. In short, for a non-decreasing $\alpha(x)$, the theory of the
measure with respect to $\alpha(x)$ is constructed identically to that of ordinary measure, that is,
for $\alpha(x) \equiv x$.
However, if $\alpha(x)$ is only of bounded variation, condition 1 cannot be retained, since it is
no longer satisfied for all intervals. We will replace it with the following:
The measure of a set with respect to $\alpha(x)$ is at most equal, in absolute value, to the
measure of the same set with respect to the total variation $\nu(x)$ of $\alpha(x)$.
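The replacement condition can be checked numerically on intervals. The sketch below is only an illustration under assumed data: $\alpha(x) = x^2$ on $(-1, 1)$, which is continuous, so that the measure of an interval is simply the increment of $\alpha$, together with its total variation from $-1$ written out by hand. Neither function comes from the text.

```python
from fractions import Fraction
from itertools import combinations


def alpha(x):
    # Hypothetical function of bounded variation on (-1, 1): decreasing, then increasing.
    return Fraction(x) ** 2


def total_variation(x):
    """Total variation of alpha from -1 to x, computed by hand for this alpha."""
    x = Fraction(x)
    return 1 - x**2 if x <= 0 else 1 + x**2


# Every interval (l, m) with l < m whose end points lie on a grid of step 1/10.
points = [Fraction(k, 10) for k in range(-10, 11)]
intervals = list(combinations(points, 2))

# |increment of alpha| <= increment of the total variation, for every interval.
bound_holds = all(
    abs(alpha(m) - alpha(l)) <= total_variation(m) - total_variation(l)
    for l, m in intervals
)
```

On the whole interval $(-1, 1)$ the measure with respect to this $\alpha$ is 0 while the measure with respect to its total variation is 2, which shows why condition 1 has to be given up.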
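If $E_x$ is a set, let us enclose it in sequences $\mathcal{E}_x^1, \mathcal{E}_x^2, \ldots$ of sets of open intervals such that
the corresponding sums $\sum_1 \delta\nu, \sum_2 \delta\nu, \ldots$ tend towards the smallest possible value. Let
$\mathcal{E}_x^{ij}$ be the set common to $\mathcal{E}_x^i$ and $\mathcal{E}_x^j$; as $\mathcal{E}_x^{ij}$ encloses $E_x$, it provides a sum $\sum_{ij} \delta\nu$ such
that $\sum_i \delta\nu - \sum_{ij} \delta\nu$ and $\sum_j \delta\nu - \sum_{ij} \delta\nu$ tend towards zero when $i$ and $j$ both increase
indefinitely. Now, the three sets $\mathcal{E}_x^i$, $\mathcal{E}_x^j$ and $\mathcal{E}_x^{ij}$ provide sums of increments of $\alpha(x)$ such
that we have
$$\left| \sum_i \delta\alpha - \sum_{ij} \delta\alpha \right| \le \sum_i \delta\nu - \sum_{ij} \delta\nu, \qquad \left| \sum_j \delta\alpha - \sum_{ij} \delta\alpha \right| \le \sum_j \delta\nu - \sum_{ij} \delta\nu.$$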
16 We have defined $m_{\alpha(x)}(E)$ for all the sets satisfying these two conditions at the same time.
17 It will be noted that the result we are going to obtain is the one that would be given by the change
of variable $\nu = V(x)$ used instead of the change $t = x + V(x)$.
Therefore, the numbers $\sum_i \delta\alpha$ converge when $i$ increases indefinitely; their limit is what
we call the exterior measure of $E_x$ with respect to $\alpha(x)$. The interior measure of $E_x$ is the
measure of $(a, b)$ minus the exterior measure of the complement $F_x$ of $E_x$.
Now, suppose $E_x$ is measurable with respect to $\alpha(x)$ and let us enclose $F_x$ in a sequence
of sets of intervals, the sets $\mathcal{F}_x^i$, providing sums $\sigma_i\,\delta\nu$ and $\sigma_i\,\delta\alpha$, such that the parts
common to $\mathcal{E}_x^i$ and $\mathcal{F}_x^i$ provide a sum $\tau_i\,\delta\nu$ tending towards zero when $i$ increases. Then,
the sum $\tau_i\,\delta\alpha$ provided by these common parts tends a fortiori towards zero and as we
obviously have
it follows
$$m(\text{ext.})_{\alpha(x)}[E_x] + m(\text{ext.})_{\alpha(x)}[F_x] = m_{\alpha(x)}[a \le x \le b],$$
that is
$$m(\text{ext.})_{\alpha(x)}[E_x] = m(\text{int.})_{\alpha(x)}[E_x].$$
Thus, the exterior and interior measures, with respect to $\alpha(x)$, of a set measurable with
respect to $\alpha(x)$ are equal. Their common value is the measure of the set, taken with respect
to $\alpha(x)$.
To prove this last point, let us note that the set $\mathcal{E}_x^i - E_x$, which is enclosed in the
parts common to $\mathcal{E}_x^i$ and $\mathcal{F}_x^i$, has measure with respect to $\nu(x)$ at most equal to $\tau_i\,\delta\nu$. A
fortiori, we have
$$\left| m_{\alpha(x)}[\mathcal{E}_x^i - E_x] \right| < \tau_i\,\delta\nu;$$
hence we conclude
After finding the unique number which can satisfy the conditions of the problem of measure,
it remains to verify that it actually satisfies them. To simplify, and to return to previous
considerations, let us deduce this from the correspondence between the sets $E_x$ located
in $(a, b)$ and the sets $E_\nu$ in $(0, V)$ that has already been useful18 to us; in this correspondence,
every point $x$ in $(a, b)$ is associated with the interval $[\nu(x - 0), \nu(x + 0)]$ in $(0, V)$. To any
interval $E_x$ corresponds an interval $E_\nu$ whose length is the measure of $E_x$ with respect to
$\nu(x)$. Consequently, the sets $E_x$ measurable with respect to $\nu(x)$ are those which provide
$$m_{\nu(x)}[E_x] = m[E_\nu].$$
Since the sets measurable with respect to $\alpha(x)$ and with respect to $\nu(x)$ are, by definition,
the same, we now know which sets are measurable with respect to $\alpha(x)$.
18 See Sect. 12.1. For direct proofs, we can refer to the aforementioned Book of M. de la Vallée
Poussin.
1. $I(f_1 + f_2) = I(f_1) + I(f_2)$;
2. $I(f_1 + f_2 + \cdots) = I(f_1) + I(f_2) + \cdots$,
when the series $f_1 + f_2 + \cdots$ is uniformly convergent;
3. the same equality holds when this series is convergent and of positive terms;
4. $I(f)$ reduces to the usual integral of $f$ when $f$ is continuous;
5. $I(f) = I(g)$ if $f$ differs from $g$ only at the points of a set of measure zero.
We will retain this statement for the extension of the Stieltjes integral; only, the integral and
the measure referred to in conditions 4 and 5 will now be the integral and the measure with
respect to $\alpha(x)$.
To solve this problem, we only need to revisit, with slight modifications due to condition
5, the reasoning used for the extension of a linear functional (Sect. 12.2), and we will fall
the continuous function $f_p$, differs less and less from $\alpha(m + 0) - \alpha(l - 0)$, that is, from
$m_{\alpha(x)}[E(\psi = 1)]$. We therefore have
when $E(\psi = 1)$ is an interval. By the addition of such functions and the application of
condition 3, we see that we have the same equality when $E(\psi = 1)$ is a set of intervals.
Now, let us suppose that $E(\psi = 1)$ is a set measurable with respect to $\alpha$, and let us enclose
this set in sets of intervals yielding sums $\sum \delta\nu$ tending towards the smallest possible limit.
Moreover, we can assume that each of these sets of intervals contains the following ones.
To these sets of intervals correspond the functions $\psi_1, \psi_2, \ldots$, taking only the values 0 and
1, and such that these sets are denoted $E[\psi_1 = 1]$, $E[\psi_2 = 1]$, $\ldots$. Let $\psi_0$ be the function
towards which the functions $\psi_i$ converge in a decreasing manner. We have, according to condition 3,
$$I[\psi_0] = \lim I[\psi_i] = \lim m_{\alpha(x)}[E(\psi_i = 1)] = m_{\alpha(x)}[E(\psi = 1)]$$
$$I(f) = \lim_{\varepsilon \to 0} I[\varphi(x)] = \lim_{\varepsilon \to 0} \sum_0^n l_i\, m_{\alpha(x)}[E(\psi_i = 1)].$$
$$\sum_0^n l_i\, m_{\nu(x)}[E(\psi_i = 1)]$$
is absolutely convergent for one choice of $l_i$, it will be so for any choice of $l_i$ and for any
$\varepsilon$. The function $f(x)$ is then said to be summable with respect to $\alpha(x)$. By condition 3, it
The integral, with respect to $\alpha(x)$, of a function summable with respect to $\alpha(x)$ is thus
defined in all cases exactly as in the particular case $\alpha(x) \equiv x$.
It is easy to show directly, using arguments completely similar to those of Chap. 7, that this
definition provides a definite number which indeed satisfies the conditions of our problem,
and to derive its main properties. However, we are going to do all of this in one step
by proving that $I(f)$ is identical to the integral $\int_a^b f(x)\,d[\alpha(x)]$ defined in Sect. 12.1. We
will use the functions $x(\nu)$ and $A(\nu)$ which were employed at that time.
The previous expression derived from the measure of a set, with respect to $\alpha(x)$, gives
us
$$I(f) = \lim_{\varepsilon \to 0} \sum l_i\, m_{\alpha(x)}[E(\psi_i = 1)] = \lim_{\varepsilon \to 0} \sum l_i \int_{E_\nu(\psi_i = 1)} A(\nu)\,d\nu;$$
The definition we have just given, which is due to M. Radon, is therefore equivalent to
that in Sect. 12.1, and this exempts us from any direct study of the properties of the Stieltjes
integral. We will nevertheless show, by way of example, how the generalisation of the notion of
the absolutely continuous function19 appears. First, however, let us examine how it happens
that the notions of $B$ measurable sets and $B$ measurable functions are independent of the
function $\alpha(x)$ determining the problems of measure or integration we are dealing with,
whereas the notions of measurable sets and measurable functions vary with $\alpha(x)$.
The reason is that these are notions of entirely different natures; M. Denjoy would say that
the first are descriptive and the second metric. M. Borel introduced the $B$ measurable
sets in the context of measure theory, and that is where their name comes from, but he
did not characterise them by a metric property: he indicated what geometric operations,
performed on intervals and points, allow us to obtain these sets. The importance of $B$
measurable functions in Analysis comes primarily from the fact that these functions encompass
all those which fall into the classification of M. Baire, all those which are amenable to an
analytic representation.20 Here, the importance of $B$ measurable sets and functions comes,
as we might have noticed, from the fact that for them the measure or integral is determined
by those conditions of our problems which can be expressed as equalities, without the
need to use those involving inequalities; that is to say, with the help of conditions 2, 3
(Sect. 12.3) and conditions 1, 2, 3, 4 (Sect. 12.3). Now, the field of these sets is so vast that it
took great efforts to construct some examples of sets or functions which are not $B$
measurable; that is to say, there would be no practical inconvenience in limiting ourselves
to the study of the $B$ measurable sets and functions.
Let us propose to characterise the functions $F(x)$ which are indefinite integrals with
respect to a known function of bounded variation $\alpha(x)$; these functions $F(x)$ are the ones
which we could call absolutely continuous functions with respect to $\alpha(x)$.
A first condition is that $F(x)$ is of bounded variation and has left and right jumps at
every point that are proportional to those of $\alpha(x)$, according to the expression of the jump
function of a given integral (Sect. 12.1)
19 Among the exercises that the reader can tackle, I highlight the following. Apply Jordan’s methods
to the problem of the measure relative to $\alpha(x)$; define the range with respect to $\alpha(x)$; show that the
$J$ measurable functions with respect to $\alpha(x)$ are the functions integrable in the Stieltjes–Riemann
sense with respect to $\alpha(x)$; compare the field of extension of the definition of M. Radon to that of the
various definitions given in Sect. 12.1 of this chapter. In particular, show that, just as the extension
of the notion of measure obtained (Sect. 12.3) using a change of variable of the form $t = x + \nu(x)$
applies only to sets that are measurable both in the ordinary sense and with respect to $\alpha(x)$, the
extension obtained (Sect. 12.1) using the same change of variable applies only to the functions that
are both measurable and summable in the ordinary sense and with respect to $\alpha(x)$.
It is also clear that two of these definitions are always consistent when they both apply, since
they define linear functionals satisfying the third condition in Sect. 12.2 and they are identical for the
continuous functions.
20 LEBESGUE, Journ. de Math., 1905.
$$\Phi(\nu) = C + \int_0^\nu f[x(\nu)]\,A(\nu)\,d\nu$$
is defined on the entire interval $(0, V)$. For any value of $\nu$ such that the equation $\nu = V(x)$ has at least
one root, we have
$$\Phi(\nu) = F[x(\nu)];$$
the only values of $\nu$ at which we do not have this equality are, therefore, those for which we
have an inequality of the form
Hence, it follows that if we set $\Psi[V(x)] = F(x)$ and if we agree to complete the definition of
$\Psi$ in such a manner that it is everywhere continuous and linear in the intervals where it was
not yet determined, we must have $\Phi(\nu) \equiv \Psi(\nu)$. In other words, $\Psi(\nu)$ must be absolutely
continuous with respect to $\nu$.
Let us show that this condition, in conjunction with the previous one, is sufficient. Therefore,
let us assume that these conditions are satisfied. If $x_0$ is a point of discontinuity of
$\alpha(x)$, we will take
a finite quantity since $A(\nu)$ has been taken equal to $+1$ or $-1$.
Finally, at the points where $f(x)$ is not yet defined, we will take $f(x)$ arbitrarily. Indeed,
these points correspond to a countably infinite number of values of $\nu = V(x)$ having no
influence on the integral taken from 0 to $V$.
With these choices, it is clear that we have
$$\Psi(\nu) = C + \int_0^\nu f[x(\nu)]\,A(\nu)\,d\nu,$$
therefore,
$$F(x) = C + \int_a^x f(x)\,d[\alpha(x)].$$
Let us transform the second condition we found; for that, let us note that, as soon as the first
condition is met, we can calculate $f(x)$ at the points of discontinuity of $\alpha(x)$, and therefore
$f[x(\nu)]$ in the various intervals $(\nu_1, \nu_2)$. Let us denote by $g(\nu)$ the function equal to $f[x(\nu)]$
in the intervals $(\nu_1, \nu_2)$ and zero elsewhere. The function $g(\nu)\,A(\nu)$ is summable, because
in $(\nu_1, \nu_0)$ its integral is $F(x_0) - F(x_0 - 0)$ and in $(\nu_0, \nu_2)$ it is $F(x_0 + 0) - F(x_0)$, and
the sum of the absolute values of all these integrals is bounded since $F(x)$ was assumed to
be of bounded variation.
The function
$$\Psi_1(\nu) = \int_0^\nu g(\nu)\,A(\nu)\,d\nu$$
can therefore be calculated as soon as the first condition is met. Moreover, it is clear from
the above that $\Psi_1(\nu)$ is the transform of the jump function $S(x)$ of $F(x)$.
$\Psi_1(\nu)$ being absolutely continuous, it is sufficient for us to express that $\Psi(\nu) - \Psi_1(\nu) =
\Psi_2(\nu)$ is also absolutely continuous. This means that if we form the sum $\sum |\Delta\Psi_2|$, extended
to a set $I_\nu$ of intervals (which we may assume not to have their origins or extremities in
the intervals $(\nu_1, \nu_2)$, since $\Psi_2(\nu)$ is constant in such intervals, and which we may assume to
be open, since $\Psi_2(\nu)$ is continuous), and if the total measure of these intervals is $\varepsilon$, this sum tends
towards zero with $\varepsilon$.
However, since the $I_\nu$ have neither their origins nor their extremities in the $(\nu_1, \nu_2)$ and
are open intervals, they are indeed the transforms of sets $I_x$ of open intervals
of the $x$-axis, and the sum to be considered is
$$\sum_{I_x} |\Delta[F(x) - S(x)]|.$$
$$\sum_{I_\nu} |\Delta\Psi_1| = \sum_{I_x} |\Delta S|$$
tends towards zero with $\varepsilon$; therefore it is necessary and sufficient that $\sum_{I_x} |\Delta F|$ tends
towards zero.
For a function $F(x)$ to be an indefinite integral with respect to $\alpha(x)$, it is necessary and
sufficient that:
The answer is much simpler when it comes to determining how to recognise that a
function of $B$ measurable sets is an indefinite integral with respect to a given $\alpha(x)$. First, let
us recall that for calculating such an indefinite integral, we can eliminate all unnecessary
singularities of $\alpha(x)$, meaning we replace $\alpha(x)$ with a function that is equal to $\alpha(x)$ at $a$,
$b$, and all points of continuity of $\alpha(x)$, and which does not exhibit two jumps of opposite
signs at any point. Therefore, let us assume that $\alpha(x)$ has only useful singularities left.
An indefinite integral with respect to $\alpha(x)$ is a function of $B$ measurable sets:
The stated property of the indefinite integrals is simply a translation of the fact that, when
considered as attached to sets of $(0, V)$, this set function is absolutely continuous. Let us
examine the converse.
To a function $\varphi$ of the $B$ measurable sets $E_x$, our change of variable associates a function
$\Phi$ of the $B$ measurable sets $\mathcal{E}_\nu$, but $\Phi$ is not defined for every $B$ measurable set of $(0, V)$.
Indeed, we know its value $W$ in an interval $(\nu_1, \nu_2)$ corresponding to a singular point $x_0$ of
$\alpha(x)$, but we do not know its value $U$ in a $B$ measurable set $e_\nu$ contained in $(\nu_1, \nu_2)$; let us
The final form we have given (Sect. 9.3) of the condition of absolute continuity can
be generalised quite literally; from this, we could easily derive the generalisation of other
forms of this condition.
12.4 Physical Significance of Integral of Stieltjes
We have just generalised one of the modes of analytic definition of the integral; the other
modes of analytic definition are amenable to similar generalisations.21 But is there not, for
the Stieltjes integral, a definition similar to the geometric definition of the integral, one
that appears as a simple refinement of an intuitive definition? This mode of definition does
exist; it certainly guided Stieltjes’ initial ideas, although Stieltjes does not emphasise it. His
analytic presentation is completely satisfactory from a logical point of view; as a result, the
intuitive significance of the Stieltjes integral was somewhat forgotten.
Nevertheless, Stieltjes does say: let us suppose that there is heavy matter spread over $Ox$.
Let $u(x)$ be the mass located on $(0, x)$; let us calculate the moment of the total mass with
respect to the origin. For that, let us partition the considered interval using increasing values
$\xi_i$. We will have an approximate value of the moment in the form
$$\sum \xi_i [u(\xi_{i+1}) - u(\xi_i)];$$
from this, we obtain, for the exact value of the moment, an integral $\int x\,d[u(x)]$.
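The approximating sums for the moment can be tried out on a small example. The following Python sketch assumes a hypothetical mass function on $(0, 1)$: unit density plus a point mass $1/2$ placed at $x = 1/2$, so that the exact moment is $1/2 + 1/4 = 3/4$. The names `u` and `moment_sum` and the uniform partition are illustrative choices, not part of the text.

```python
from fractions import Fraction


def u(x):
    # Hypothetical mass located on (0, x): unit density plus a point mass 1/2 at x = 1/2.
    x = Fraction(x)
    return x + (Fraction(1, 2) if x >= Fraction(1, 2) else 0)


def moment_sum(n):
    """Sum of xi_i * [u(xi_{i+1}) - u(xi_i)] over the uniform partition xi_i = i/n of (0, 1)."""
    xi = [Fraction(i, n) for i in range(n + 1)]
    return sum(xi[i] * (u(xi[i + 1]) - u(xi[i])) for i in range(n))


# Approximate moments for finer and finer partitions.
approximations = {n: moment_sum(n) for n in (10, 100, 1000)}
```

For even $n$ the sum equals $3/4 - 1/n$ exactly, so the approximations approach the moment $3/4$, the point mass being accounted for by the jump of $u$.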
However, the meaning of the Stieltjes integral is much more clearly provided by Cauchy,
who had considered integration with respect to a function before Stieltjes. He did so from
a more extensive physical perspective, albeit in a form much less precise, from the logical
point of view,22 than that of Stieltjes.
Cauchy’s starting point is the notion of coexisting quantities, a broader notion than that
of a function, with the latter being just a particular case.
The quantities are said to be coexistent when they are determined by the same geometric
or physical conditions. For instance, the surface and volume of a cylinder are coexisting
quantities, determined simultaneously by the datum of the cylinder. In a gaseous expanse,
we isolate the matter contained within a certain domain in our thoughts; we can consider
the volume of the conceived body, its mass, and the amount of heat needed to raise its
temperature by one degree at constant volume as three coexistent quantities.
The numbers that measure these quantities are not necessarily functions of each other, as
the previous examples illustrate. They can be functions in some cases, though. For example,
the radius, height, surface area, and volume of a cylinder of revolution are coexistent
quantities and any two of them can determine the other two. In a more general sense, if one
quantity is a function of another, then all these quantities are coexistent. We are accustomed
to reasoning about variables and functions, but there is just as much reason to reason about
coexisting quantities. For example, we can establish relationships of inequality between
the surface area and volume of a cylinder. To provide a solid base for arguments about
coexistent quantities, we need to clarify this notion, which will also limit its scope.
21 It has already been said that the definition of M. W. H. Young was first generalised (Sect. 12.2, in
note).
22 Sur le rapport différentiel de deux grandeurs qui varient simultanément (Ex. d’Analyse, t. II, pp.
188–229; Œuvres, 2nd series, t. XII, pp. 214–262). See also the Traité de Mécanique analytique
de l’abbé Moigno.
In the previous examples, coexistent quantities appear to be associated with the same
object, such as a cylinder or gaseous body, and they are functions of the same domain. Physically
measurable quantities also always appear as functions of domains, although these domains
are not always three dimensional. They can be domains on a straight line, that is,
intervals, planar domains, or domains with more than three dimensions. In the latter case,
the domain is not directly perceptible, and its purely mathematical conception is somewhat
artificial. For example, if we wanted to discuss the quantity of heat required to raise the
temperature of an isolated gaseous body $C$ by $\delta t$ degrees, and we wanted to vary both $\delta t$ and
the body itself, we would need to consider the quantity of heat associated with a domain of
four dimensional space in $x, y, z, t$ coordinates. This space would be obtained by translating
the body $C$, traced at time $t = t_0$, by $\delta t$ units parallel to the $t$-axis.
Therefore, we will admit that the quantities we are talking about are functions of a domain
and we will replace the notion of coexistent quantities with the much more precise notion
of functions of the same domain or, through a mathematician's abstraction, functions of
the same set.
In physics, one also encounters functions of points or, if you will, of a certain number
of variables. Some are still functions of a domain, but attached to special domains that
depend only on a finite number of parameters: for instance, the mass $m$ of the quantity of
water contained in a container up to the height $h$ is a function of $h$, as both $m$ and $h$ are
functions of the same domain. The others are truly attached to points; these numbers generally
serve to calibrate states and qualities, to distinguish, for example, movements of varying speeds
(velocity) or materials of varying densities (density).
If we consider the precise definition of these numbers, we will see that we obtain them
as the limit of the quotient of two functions of the same domain:
$$\text{Speed} = \lim\ \text{mean speed} = \lim \frac{\text{arc length of trajectory}}{\text{time taken to travel this arc}},$$
$$\text{Density} = \lim\ \text{mean density} = \lim \frac{\text{mass of a body}}{\text{volume of this body}}.$$
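To make this limit of a quotient of two functions of the same domain concrete, here is a small Python sketch in one dimension. It assumes a hypothetical rod whose mass between $a$ and $b$ is $b^3 - a^3$ (a point density of $3x^2$) and uses length in place of volume; the function names are illustrative only.

```python
from fractions import Fraction


def mass(a, b):
    # Hypothetical additive interval function: mass of the rod between a and b.
    return Fraction(b) ** 3 - Fraction(a) ** 3


def length(a, b):
    # Ordinary length of the interval (a, b), standing in for the volume.
    return Fraction(b) - Fraction(a)


def mean_density(x0, h):
    """Quotient mass / length for the interval (x0 - h, x0 + h) shrinking to x0."""
    return mass(x0 - h, x0 + h) / length(x0 - h, x0 + h)


# Mean densities around x0 = 1 for half-widths 1/10, 1/100, 1/1000, 1/10000.
ratios = [mean_density(1, Fraction(1, 10**k)) for k in range(1, 5)]
```

For this example the mean density around $x_0 = 1$ is exactly $3 + h^2$, so the quotients tend to the point density 3 as the interval shrinks.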
As our goal is not to study the numbers of physics, we do not need to investigate if all
these numbers fit well into one or the other of two categories mentioned, or if the distinction
between these two categories of numbers is absolute. It suffices to have noted the importance,
in physics, of functions of domains and the kind of derivation from one domain function
with respect to another which provides the functions of points.
It should not surprise us that domain functions are introduced into physics and appear to
be more directly suited to the physicist’s needs than point functions. A point is merely the
limiting concept of successively smaller bodies, and a point function can be introduced in
physics only as the limit of a body function, a domain function. However, little is spoken
about these functions because mathematicians have not yet created the Algebra and Analysis
of domain functions. On the contrary, we possess remarkably convenient notations for point
functions. Therefore, through various techniques, which essentially reduce to reasoning
about sufficiently special domains that depend on only a finite number of variables, we
always replace the use of domain functions with that of point functions.
The operation of differentiation we encountered is the one studied by Cauchy. It is defined
as follows: $\varphi(D)$ and $\psi(D)$ being two domain functions, to obtain the derivative at a point
$P$ of $\varphi$ with respect to $\psi$, we take the limit of the ratio $\frac{\varphi(D_i)}{\psi(D_i)}$ for a sequence of increasingly
smaller domains $D_i$ that ultimately reduce to the single point $P$.
We can similarly define the derivative of a set function with respect to another. We may
also need to impose additional restrictions on the sequence of $D_i$ or $E_i$ for the limit to exist,
as we did (Sect. 10.2); let us leave these details aside.
Let us propose, following Cauchy, to calculate $\varphi(D)$ knowing $\psi(D)$ and the derivative
$f(P)$ of $\varphi(D)$ with respect to $\psi(D)$. This problem would not be well defined, and we would
hardly know how to study it, if we allowed the notion of domain functions in all its possible
generality. We will assume that we are dealing with additive domain functions. This is an
important restriction to the concept of Cauchy: the surface and the volume of a body are two
coexistent quantities, these are certainly two domain functions, but only the second one is
additive.
In reality, for addressing the problem at hand, Cauchy, as we are going to do, unconsciously
restricts himself to the case of additive domain functions. Moreover, this restriction
is practically justified by the fact that the numbers provided by physics, which we call
measures of the quantities,23 are indeed additive domain functions.
Every point $P_0$ is then interior to a domain $D(P_0)$ such that the ratio $\frac{\varphi}{\psi}$ differs from
$f(P_0)$ by less than $\varepsilon$ for the domain $D(P_0)$ and for all the interior domains, subject to
the restrictions which could have been imposed in the definition of the derivative. Using a
finite number of these domains $D(P_0)$, we can cover, according to the theorem of M. Borel
(Sect. 8.2), the entire domain $D$ under consideration. By restricting these domains $D(P_0)$,
we can assume that they have no common interior points. Then we would have partitioned
$D$ into partial domains $D_1, D_2, \ldots, D_n$ and taken a particular point $P_i$ in each of them or
If $f(P)$ is continuous, this will be only slightly modified by assuming $P_i$ is taken inside $D_i$.
From this, it follows
$$\varphi(D) = \sum \varphi(D_i) = \sum \psi(D_i)\, f(P_i) + \theta\varepsilon \sum |\psi(D_i)|.$$
23 In my opinion, quantities must be defined right from the beginning as numbers associated with
domains, such that the quantities associated with domains resulting from the subdivision of another
domain have their sum equal to the quantity associated with the latter domain.
If, therefore, regardless of the fragmentation of $D$ into the $D_i$ and the choice of the $P_i$, the
first sum tends towards a definite limit for increasingly smaller $D_i$ and the second remains
bounded (the latter fact expresses that $\psi$ is of bounded variation), we
know how to calculate $\varphi(D)$.
Making these insights precise leads very naturally to the Stieltjes integral. We just need
to assume that it concerns one-dimensional domains, intervals; that $f(P)$ is $f(x)$; and that
$\psi(D)$ is the interval function associated with a function $\alpha(x)$ of bounded variation, to recover
In this case, we are dealing with a function $\psi(D)$ equal to the measure of the segment obtained
by projecting the arc $D$ on $Ox$.
Ordinarily, such integrals are considered collectively
$$\int P\,dx + Q\,dy + R\,dz.$$
If we compare this with the classical formula which gives the arc length of a curve in simple
cases,
$$\int \sqrt{dx^2 + dy^2 + dz^2},$$
a formula that we must treat as a curvilinear integral by substituting for $x, y, z$ functions
of the same parameter $t$, we are led to examine integration with respect to
several set functions or, if we want, with respect to several coexistent quantities. Let
$f(P, u_1, u_2, \ldots, u_p)$ be a function of the point $P$ and of $p$ variables. Let us assume $f$ to be
homogeneous of degree 1 with respect to the set of these variables. On the other hand,
let $\psi_1(E), \psi_2(E), \ldots, \psi_p(E)$ be $p$ coexistent additive domain functions.
The sum
$$\sum f[P_i, \psi_1(E_i), \psi_2(E_i), \ldots, \psi_p(E_i)]$$
extended over a division of a domain or a set into partial sets $E_i$, where $P_i$ denotes a point
of $E_i$, would define a sort of Stieltjes integral of $f$ with respect to the functions $\psi_1, \psi_2, \ldots, \psi_p$,
by its limit when it exists.
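The arc length formula is the simplest instance of such a sum. The Python sketch below is an illustration under assumptions of my own choosing: the curve is one turn of the helix $(\cos t, \sin t, t)$, the sets $E_i$ are $n$ equal parameter intervals, $\psi_1, \psi_2, \psi_3$ are the increments of the three coordinates over $E_i$, and $f$ is the Euclidean norm, which is homogeneous of degree 1 and here does not depend on the point $P_i$.

```python
import math


def f(u1, u2, u3):
    # Homogeneous of degree 1: f(t*u1, t*u2, t*u3) == t * f(u1, u2, u3) for t > 0.
    return math.sqrt(u1 * u1 + u2 * u2 + u3 * u3)


def helix(t):
    # Hypothetical curve: one turn of a circular helix.
    return (math.cos(t), math.sin(t), t)


def stieltjes_like_sum(n):
    """Sum of f[psi_1(E_i), psi_2(E_i), psi_3(E_i)] over n equal pieces of (0, 2*pi)."""
    total = 0.0
    for i in range(n):
        p = helix(2 * math.pi * i / n)
        q = helix(2 * math.pi * (i + 1) / n)
        # psi_k(E_i): increment of the k-th coordinate over the i-th piece of the curve.
        total += f(q[0] - p[0], q[1] - p[1], q[2] - p[2])
    return total


# Length of one turn of this helix, from the classical formula.
exact_length = 2 * math.pi * math.sqrt(2)
```

The sums are the lengths of inscribed polygons; they increase with $n$ in this example and approach `exact_length`.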
These summations have still not been studied; M. Hellinger,24 however, used an integration
which is denoted $\int f(x)\,\frac{d\psi\,d\chi}{d\rho}$ and which is the limit of the sums
$$\sum f(x_i)\,\frac{\psi(E_i)\,\chi(E_i)}{\rho(E_i)};$$
an integral which was subsequently studied by M. Radon.
Let us conclude this section by noting that, according to Cauchy, the notion of coexistent
quantities is of an elementary nature and would render great service if it were used from the
beginning of Analysis or even of Geometry. In any case, it seems to us that there would be a
significant advantage in initially presenting a notion of integration with respect to a function
of domain. This would provide a synthetic view of all types of integrals of continuous
functions, the theory of these integrals would be obtained more rapidly, and yet it would
better prepare for geometric and physical applications.
12.5 Function Primitives and Totalisation with Respect to a Function
For the case of functions of a single variable, the problem of function primitives that we
have just encountered is stated as follows: Being given a function of bounded variation $\alpha(x)$
and a function $f(x)$ in $(a, b)$, find a function $F(x)$ which admits $f(x)$ as derivative with
respect to $\alpha(x)$ at every point.
To say that $f$ is the derivative of $F$ means that we have
$$f(x) = \lim_{\substack{h \to 0^+ \\ k \to 0^+}} \frac{F(x + k) - F(x - h)}{\alpha(x + k) - \alpha(x - h)}$$
if we translate exactly the considerations from the previous section. However, in that case,
it is obvious that $F(x)$ would not be determined at any point since $F(x - 0)$ and $F(x + 0)$
$$f(x) = \lim_{h \to 0} \frac{F(x + h) - F(x)}{\alpha(x + h) - \alpha(x)},$$
to say that $F$ admits $f$ as derivative. In finding the limit of the right-hand side, we only
consider the numbers $h$ for which the right-hand side has a definite, finite or infinite, value.
At an interior point of an interval in which both $F$ and $\alpha$ are constant, the derivative would
be indeterminate. For any $\varphi(x)$, it could be considered equal to $f(x)$.
We will still use the method of chains of intervals to study this problem. However, we
need to proceed with caution, primarily because $F(x)$ is not continuous. Indeed, if we let
$h$ tend towards zero through positive values in the formula defining the derivative, we see that
$$\frac{F(x + 0) - F(x)}{\alpha(x + 0) - \alpha(x)} = f(x).$$
This formula and the similar formula for $x - 0$ reveal the points of discontinuity and the
jumps of $F(x)$.
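The jump formula can be observed on a small example. This Python sketch uses hypothetical functions chosen for the purpose: $\alpha(x) = x$ and $F(x) = 3x$, each with an added right jump at $x = 0$ (of sizes 2 and 10 respectively), so that the ratio of the right jumps is 5. The helper `quotient` returns `None` for the values of $h$ at which $\alpha$ does not change, mirroring the convention that only the $h$ giving a definite value are considered.

```python
from fractions import Fraction


def alpha(x):
    # Hypothetical alpha: slope 1, with a right jump of 2 at x = 0 (alpha(0) = 0).
    return Fraction(x) + (2 if x > 0 else 0)


def F(x):
    # Hypothetical F: slope 3, with a right jump of 10 at x = 0 (F(0) = 0).
    return 3 * Fraction(x) + (10 if x > 0 else 0)


def quotient(x, h):
    """(F(x+h) - F(x)) / (alpha(x+h) - alpha(x)), or None when alpha does not change."""
    d_alpha = alpha(x + h) - alpha(x)
    if d_alpha == 0:
        return None
    return (F(x + h) - F(x)) / d_alpha


# Quotients at x = 0 for h = 1/10, 1/100, ..., 1/100000 (h tending to zero from the right).
right_quotients = [quotient(0, Fraction(1, 10**k)) for k in range(1, 6)]
```

Here the quotient at $x = 0$ equals $(3h + 10)/(h + 2)$, which tends to $10/2 = 5$, the ratio of the right jumps of $F$ and $\alpha$.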
25 This is completely natural, because $\alpha(x)$ appears only to define the function $\psi(D)$, and $F(x)$ is
defined by a domain function $\varphi(D)$. Now, the functions $\alpha(x)$ and $F(x)$ correspond to $\psi(D)$ and
$\varphi(D)$, respectively, for which
are determined up to an additive constant, while $\alpha(x)$ and $F(x)$ are not.
Stieltjes had already noted that for a given distribution of mass on $Ox$, in other words, for a
function $\varphi(D)$, there does not correspond a unique function $\alpha(x)$. When $\alpha(x)$ is given directly, any
point of discontinuity $x_0$ of $\alpha(x)$ corresponds to the concentration of a mass $\alpha(x_0 + 0) - \alpha(x_0 - 0)$
at the point $x_0$. Stieltjes imagines that this concentration occurs at two geometrically coincident points
at $x_0$; the first one, carrying the mass $\alpha(x_0) - \alpha(x_0 - 0)$, belongs to $(a, x_0)$; and the second one, of
mass $\alpha(x_0 + 0) - \alpha(x_0)$, belongs to $(x_0, b)$.
This is equivalent to considering symbols like $x + 0$, $x - 0$ as numbers in the same way as the
symbols $x$, all these symbols being classified in order of magnitude. It is also equivalent to treating
sets of numbers $\alpha \le x \le \beta$, where $\alpha$ and $\beta$ are two different numbers, with $\alpha < \beta$, as intervals, and
to considering $\varphi(D)$ as a function of such intervals. The intervals would then fall into nine different
categories depending on whether their origin and extremity are represented by the symbols $x - 0$, $x$
or $x + 0$. There would be three kinds of null intervals,
These conventions would do away with the need for the precautions that we previously had to take when
dividing one interval into several others (Sect. 9.1); in such a division, any null interval of the form
Because of this discontinuity, if .a, x1 , x2 , . . . , are the division points of the chain, we
apply the formula
This formula is obviously true for $I + 1$ if it is true for $I$. Moreover, it also holds for an
index $I$ that is a transfinite number of the second kind, provided it holds for every $I_0 < I$, since as
$I_0$ increases towards $I$, $F(x_{I_0} - 0)$ tends towards $F(x_I - 0)$. In other words, whether $I$ is
of the first or of the second kind, the formula is true for $I$ because it is true for smaller indices. The
formula is general. We will take the intervals of the chain in such a manner that we have
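\[
\left| \frac{F(x_{i+1} - 0) - F(x_i - 0)}{\alpha(x_{i+1} - 0) - \alpha(x_i - 0)} - f(x_i) \right| < \varepsilon,
\]
that is,
\[
F(x_{i+1} - 0) - F(x_i - 0) = f(x_i)\,[\alpha(x_{i+1} - 0) - \alpha(x_i - 0)] + \theta_i\,\varepsilon\,|\alpha(x_{i+1} - 0) - \alpha(x_i - 0)|,
\]
where $\theta_i$ is included between $-1$ and $+1$; this is again written
\[
F(x_{i+1} - 0) - F(x_i - 0) = f(x_i)\, m_{\alpha(x)}[x_i \le x < x_{i+1}] + \theta_i\,\varepsilon\,[V(x_{i+1} - 0) - V(x_i - 0)].
\]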
. V (x) denotes, as always, the total variation of .α(x) from .a to .x. Hence
From this formula and the similar one for .(a, x), we easily deduce that, whenever . f (x) has
an integral of Stieltjes–Riemann, . F(x) is the indefinite integral of . f (x) taken with respect
to .α(x). Let us content ourselves by deducing that if . f (x) is identical to zero, then . F(x) is a
constant, from which it follows that the primitive function, with respect to a given function of
formula in which, for simplicity, we have included the contribution of $(a, x)$ under
the $\sum$ sign, despite its special form. Let us now specify the choice of the intervals of the chain,
as in Sect. 10.1 and following. For this purpose, let us assume that $f(x)$ is summable with
respect to $\alpha(x)$ and let $E_l$ be the set $E[l\varepsilon \le f(x) < (l + 1)\varepsilon]$. Let us enclose $E_l$ in a set
$A_l$ of non-intersecting intervals whose measure with respect to $V(x)$ does not
exceed that of $E_l$ by more than $\varepsilon_l$. Choose the numbers $\varepsilon_l$ in such a way that the
series $\sum \varepsilon_l$ and $\sum |l|\,\varepsilon_l$ are convergent, with very small sums $\eta$ and $\zeta$.
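Let us require the intervals $(x_i, x_{i+1})$ whose origin $x_i$ belongs to $E_l$ to be enclosed in $A_l$
and to be such that
\[
\left| \frac{F(x_{i+1} - 0) - F(x_i - 0)}{\alpha(x_{i+1} - 0) - \alpha(x_i - 0)} - l\varepsilon \right| < \varepsilon;
\]
our formula becomes
\[
F(b - 0) - F(a) = \lim_{\varepsilon \to 0} \sum_{i=0} l\varepsilon\, m_{\alpha(x)}[x_i \le x < x_{i+1}].
\]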
Let us show that the series which appears there is absolutely convergent. Let $M_l$ be the
measure, with respect to $\alpha(x)$, of those intervals of the chain whose origins are points of
$E_l$, and let $N_l$ be the measure of these same intervals with respect to $V(x)$. By summing the
absolute values of the terms of the series, we see that their sum is at most
\[
\sum |l|\,\varepsilon\, N_l \le \sum |l|\,\varepsilon\, m_{V(x)}[A_l] \le \sum |l|\,\varepsilon\,[m_{V(x)}(E_l) + \varepsilon_l],
\]
Since the series is absolutely convergent, we can group the terms and write
\[
F(b - 0) - F(a) = \lim_{\varepsilon \to 0} \sum l\varepsilon\, M_l.
\]
Let us show that, as $\eta$ and $\zeta$ tend towards zero, we can replace each $M_l$ under the $\sum$ sign
with its limit $m_{\alpha(x)}(E_l)$. Since the terms of the series on the right-hand side vary less than
those of the series $\sum |l|\,\varepsilon\, N_l$, it suffices to justify the passage to the limit for the latter series.
Now, we have
12.5 Function Primitives and Totalisation with Respect to a Function 285
We have just revisited the reasoning from Chap. 9, but under particularly simple condi-
tions. To follow the reasoning of this chapter more accurately, it is necessary to introduce the
concept of derivative numbers with respect to a function .α(x). The reader will easily see that
all the results of Chap. 9 would then extend to integration and differentiation with respect
to .α(x), and very often quite literally. Let us limit ourselves to verifying this statement in
connection with the preceding one: The indefinite integral function of one variable of . f (x),
with respect to a function of bounded variation .α(x), admits . f (x) as its derivative with
respect to .α(x), except at most in a set of points of measure zero, with respect to .V (x).
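The statement can be checked numerically in a simple case. The following is a minimal sketch under assumptions that are not in the text: the integrator is taken to be $\alpha(x) = x^3 + x$ (increasing, hence of bounded variation), the integrand is $f(x) = \cos x$, and a midpoint Riemann-Stieltjes sum stands in for the integral. The function names are hypothetical.

```python
import math

def alpha(x):
    # Assumed integrator: smooth and increasing, so V(x) = alpha(x) - alpha(0)
    return x**3 + x

def f(x):
    # Assumed integrand
    return math.cos(x)

def stieltjes_integral(a, b, n=2000):
    """Midpoint Riemann-Stieltjes sum of f with respect to alpha on [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        left, right = a + i * h, a + (i + 1) * h
        total += f(0.5 * (left + right)) * (alpha(right) - alpha(left))
    return total

def derivative_wrt_alpha(x0, h=1e-3):
    """Difference quotient [F(x0+h) - F(x0)] / [alpha(x0+h) - alpha(x0)]."""
    # F(x0+h) - F(x0) is the integral of f d(alpha) over (x0, x0+h)
    dF = stieltjes_integral(x0, x0 + h, n=50)
    return dF / (alpha(x0 + h) - alpha(x0))

if __name__ == "__main__":
    for x0 in (0.2, 1.0, 1.7):
        print(x0, derivative_wrt_alpha(x0), f(x0))
```

For a smooth increasing $\alpha$ the exceptional set of measure zero is empty, so the quotient should approach $f(x_0)$ at every point as $h$ decreases; the sketch does not exercise the case of a discontinuous $\alpha$.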
We have, by the very definition,
\[
F(x) = \int_a^x f(x)\, d[\alpha(x)] = \int_0^{V(x)} f[x(\nu)]\, A(\nu)\, d\nu.
\]
Except possibly for a set $E_\nu$ of values of $\nu$ with measure zero, the function
\[
\int_0^{\nu} f[x(\nu)]\, A(\nu)\, d\nu
\]
admits . f [x(ν)]. A(ν) as its derivative and . A(ν) admits . A(ν) as its derivative.
In other words,
that the derivative of an unknown function . F(x), finite everywhere, is given by . f (x) with
respect to a given function of bounded variation .α(x).
It is clear that, for the determination of $F(x)$, integration with respect to $\alpha(x)$ will be
insufficient, and we will need to turn to a generalisation of totalisation, since the operation
of totalisation is already necessary when $\alpha(x)$ reduces to the function $x$. Such a generalisation
presents itself naturally. When we decided (Sect. 12.1) to take the formula
\[
\int_a^b f(x)\, d[\alpha(x)] = \int_0^V f[x(\nu)]\, A(\nu)\, d\nu
\]
as the definition of the Stieltjes integral,
we considered only the case where the theory of summable functions gives meaning
to the right-hand side, without invoking the theory of totalisation. Let us now agree to call
the definite total of $f(x)$ with respect to $\alpha(x)$ the expression
in which the symbol $T_0^V$ on the right-hand side denotes the definite total, in the sense of M. Denjoy,
of the function $f[x(\nu)]\, A(\nu)$, assumed to be totalisable.²⁶
26 By using the word total instead of integral, and the symbol $T$ instead of the
symbol $\int$, I follow the example of M. Denjoy, who has always carefully distinguished integration
from totalisation in both vocabulary and formulae.
This new approach to totalisation simultaneously defines the indefinite total of . f (x) with
respect to .α(x). These two totals are obtained, as previously, by repeated use of operations
similar to the operations . A and . B of Sect. 11.4: by transfinite induction
in $(a_k, a_{k-1})$, and defined in a similar manner on $(b_{k-1}, b_k)$, is taken as the indefinite total
of $f(x)$ in $(l + 0, m - 0)$. We complete the determination of the indefinite total in $(l, m)$
by agreeing that $F(x)$ has, at $l$, a right jump equal to
\[
f(l)\,[\alpha(l + 0) - \alpha(l)]
\]
$B_1$. We have a closed set $E$ contained in the interior of an interval $(l, m)$; we assume the totals
of $f(x)$ to be known for the various intervals $(\alpha, \beta)$ contiguous to $E$ and contained in
$(l, m)$; if the series $\sum [F(\beta - 0) - F(\alpha + 0)]$ provided by these totals is convergent
and if $f(x)$ is summable over $E$ with respect to $\alpha(x)$, we take
\[
\int_E f(x)\, d[\alpha(x)] + \sum [F(\beta - 0) - F(\alpha + 0)]
\]
For $f(x)$ to be totalisable with respect to $\alpha(x)$, it is necessary that: 1. the operation $A_1$
yields a function for which $F(l + 0)$ and $F(m - 0)$ exist; 2. for any closed set $\mathcal{E}$, there exists
an interval $(l, m)$ containing points of $\mathcal{E}$ in its interior, in which the conditions necessary for
the operation $B_1$ are satisfied.
From the definition, it also follows that: a function . F(x) is an indefinite total with respect
to .α(x) if, and only if:
Other authors, on the contrary, have used the word integral and the symbol $\int$ for all cases.
Both approaches have their advantages and disadvantages.
Finally, from the relationship between the indefinite integral with respect to .α(x) and the
integrated function . f (x), it follows that the indefinite total, taken with respect to .α(x), of
a function . f (x) admits . f (x) as its approximate derivative with respect to .α(x), except at
points of a set of measure zero with respect to the total variation .V (x) of .α(x).
By the approximate derivative of $F(x)$ with respect to $\alpha(x)$ at a point $x_0$, we mean the limit
of the ratio
\[
\frac{F(x_0 + h) - F(x_0)}{\alpha(x_0 + h) - \alpha(x_0)}
\]
taken for numbers $x_0 + h$ forming a set $E$ of density one, with respect to $V(x)$, at the point
$x_0$. This amounts to saying that, if we let $x$ and $y$ tend towards $x_0$, with $x \le x_0 \le y$, and if $E_{xy}$
denotes the subset of $E$ located in $(x, y)$, the ratio
\[
\frac{m_{V(x)}(E_{xy})}{V(y + 0) - V(x - 0)}
\]
tends towards one.
Let us limit the theory of totalisation with respect to a function to these assertions,
which the reader can immediately justify, and return to the search for primitive functions.
Therefore, our goal is to construct a function . F(x) knowing the finite value . f (x) of its
derivative with respect to a given function .α(x) of bounded variation. Let us go back to what
we have already done (Sect. 12.3).
For any value $\nu_0$ of $\nu$ such that we have $\nu_0 = V(x_0)$ for one or several values $x_0$, let
us associate the number $\mathcal{F}(\nu_0) = F(x_0)$. This convention does not lead to any ambiguity
because, if several values $x_0$ correspond to $\nu_0$, they all yield the
same value for $V(x_0)$, therefore for $\alpha(x_0)$, and as a result for $F(x_0)$. Indeed, if $F(x)$ were
not constant in an interval where $\alpha(x)$ is constant, $F(x)$ would not have a finite derivative,
with respect to $\alpha(x)$, at all the points of this interval.
For a value .ν0 such that we have:
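\[
\begin{aligned}
\text{either}\quad \mathcal{F}(\nu_0) &= F(x_0 - 0) + \frac{F(x_0) - F(x_0 - 0)}{V(x_0) - V(x_0 - 0)}\,[\nu_0 - V(x_0 - 0)],\\
\text{or}\quad \mathcal{F}(\nu_0) &= F(x_0) + \frac{F(x_0 + 0) - F(x_0)}{V(x_0 + 0) - V(x_0)}\,[\nu_0 - V(x_0)].
\end{aligned}
\]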
It is clear that the function $\mathcal{F}(\nu)$, defined on the interval $(0, V)$, is continuous. Let us set aside
the countably infinite set $D$ of points $\nu$ provided by the intervals where $\alpha(x)$ is constant and
by the formulae
in which we will assign to $x_0$ only the values corresponding to discontinuities of $\alpha(x)$.
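The passage from $F(x)$ to the continuous function of $\nu$ can be illustrated on a toy case. The sketch below rests on several assumptions that are not in the text: $\alpha(x) = x$ on $[0, 2]$ except at the single point $x_0 = 1$, where $\alpha(1) = 1.5$; $F(x) = \alpha(x)^2$; and the two linear pieces are taken to apply for $V(x_0 - 0) \le \nu_0 \le V(x_0)$ and $V(x_0) \le \nu_0 \le V(x_0 + 0)$ respectively. The one-sided limits are written out by hand, and all names are hypothetical.

```python
X0 = 1.0  # single discontinuity of alpha (assumed for this example)

def alpha(x):
    # alpha(x) = x, except for an isolated value 1.5 at x = X0
    return 1.5 if x == X0 else x

def V(x):
    # Total variation of alpha on [0, x]: the jump up (0.5) and the jump down (0.5) both count
    if x < X0:
        return x
    if x == X0:
        return 1.5
    return x + 1.0

def F(x):
    # Assumed function with a finite derivative with respect to alpha
    return alpha(x) ** 2

# One-sided limits at X0, computed by hand for this particular alpha
F_left, F_at, F_right = 1.0, F(X0), 1.0
V_left, V_at, V_right = 1.0, V(X0), 2.0

def script_F(nu):
    """The function of nu built from F: F(x) where nu = V(x), linear across each jump."""
    if nu < V_left:                      # nu = V(x) with x = nu < X0
        return F(nu)
    if nu <= V_at:                       # first linear piece
        return F_left + (F_at - F_left) / (V_at - V_left) * (nu - V_left)
    if nu <= V_right:                    # second linear piece
        return F_at + (F_right - F_at) / (V_right - V_at) * (nu - V_at)
    return F(nu - 1.0)                   # nu = V(x) with x = nu - 1 > X0

if __name__ == "__main__":
    for nu in (0.5, 1.0, 1.25, 1.5, 1.75, 2.0, 2.5):
        print(nu, script_F(nu))
```

In this example the two linear pieces have slopes $+2.5$ and $-2.5$, and the resulting function of $\nu$ is continuous on $(0, 3)$ even though $F(x)$ itself jumps at $x_0$.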
Let us examine what the differentiation of .F(ν) yields at a point not belonging to . D. In an
interval .V (x0 − 0) < ν < V (x0 + 0) the function .F(ν) is formed by two linear functions
of slopes
we have
\[
\frac{\mathcal{F}(\nu_1) - \mathcal{F}(\nu_0)}{\nu_1 - \nu_0} = \frac{F(x_1) - F(x_0)}{\alpha(x_1) - \alpha(x_0)} \cdot \frac{\alpha(x_1) - \alpha(x_0)}{\nu_1 - \nu_0},
\]
and when $\nu_1$ tends towards $\nu_0$ the first ratio tends towards $f(x_0)$, while the second remains
contained between $-1$ and $+1$ and tends almost everywhere towards $A(\nu_0)$.
If the value of $\mathcal{F}(\nu_1)$ arises from the second part of the definition of $\mathcal{F}(\nu)$, it means
that $x(\nu_1)$ is the abscissa of a singular point of $\alpha(x)$ and that $\nu_1$ is included in
$\{V[x(\nu_1) - 0],\ V[x(\nu_1) + 0]\}$.
Let us assume, for example, that we have
Therefore:
The function .F(ν) has, except, possibly, at points of a countable set . D, all its derivative
numbers finite, and as a result .F(x) is an indefinite total;
The function .F(ν) has almost everywhere a definite and finite derivative equal to
. f [x(ν)]. A(ν), therefore .F(ν) is the indefinite total
Thus, the search for a function . F(x), whose finite derivative . f (x) taken with respect to
a given function .α(x) of bounded variation is known, can always be carried out by the
indefinite totalisation of . f (x) with respect to .α(x).
This result is a direct generalization of M. Denjoy’s result for the case where .α(x) ≡ x.
It would be very interesting to systematically revisit the reasoning which served us in this
particular case and extend it to the general case. The reader will not encounter any particular
difficulty in this study. The reader can also show that the method developed in Sect. 11.2 and
onwards, which allows for solving the problem of primitive functions without invoking the
notion of integration, still applies to any function $\alpha(x)$ of bounded variation. He can also
examine the problem of primitive functions of the derivative numbers, to which the method
of change of variable used here does not seem to provide a solution. We will not explore
this generalisation of the problem of primitive functions. However, there are others, much more
elementary and immediate, which remain unsolved; we will see them.
The case where $\alpha(x)$ is of bounded variation is, as explained, probably the only one of
physical interest. However, from a mathematical point of view, there is no reason to
consider the derivative of a function $F(x)$ with respect to a function $\alpha(x)$ only under the hypothesis
that $\alpha(x)$ is of bounded variation.
Now, if we abandon this hypothesis, almost none of our conclusions would survive. For
instance, let us show that if $\alpha(x)$ is of unbounded variation, there exist continuous functions
$f(x)$ which do not have a Stieltjes integral with respect to $\alpha(x)$. In other words, for these
functions, the sums $S = \sum f(\xi_i)[\alpha(x_{i+1}) - \alpha(x_i)]$ do not tend towards any definite and
finite limit when we vary the choices of $x_i$ and $\xi_i$ so that the maximum $\lambda$ of $x_{i+1} - x_i$ tends
towards zero.
Indeed, when $\alpha(x)$ is of unbounded variation, we can (Sect. 5.1) find an ordered sequence
of points $X_i$ such that the series
\[
\sum |\alpha(X_{i+1}) - \alpha(X_i)|
\]
is divergent. Then we can find a sequence of numbers .ρi tending towards zero and such that
the series
\[
\sum \rho_i\,[\alpha(X_{i+1}) - \alpha(X_i)]
\]
has positive terms and is divergent.
Let us suppose, to fix ideas, that our points $X_i$ succeed each other in the order
where $b_1$ is the limit of the $X_i$. Let us take for $f(x)$ a continuous function that is zero from $a$
to $X_1$, from $b_1$ to $b$, and at the points $X_i$, and that attains the value $\rho_i$ in $(X_i, X_{i+1})$.
We claim that, for any given maximum length .λ imposed on the intervals of subdivision
of .(a, b), we can choose these intervals and the points .ξi so that the corresponding sum . S
exceeds any prescribed bound.
Let $k$ be the value of the index $i$ from which the difference $X_{i+1} - X_i$ remains less than
$\lambda$. Let us arbitrarily divide $(a, X_k)$ into intervals of length $\lambda$ at most and let us choose in
each of them a point $\xi$; let us do the same for $(b_1, b)$. It remains to divide $(X_k, b_1)$. Let us take
the subdivision points $X_{k+1}, X_{k+2}, \ldots, X_{k+l}$, where $l$ is an arbitrary integer. Let us take
points $\xi$ in $(X_k, X_{k+1}), (X_{k+1}, X_{k+2}), \ldots, (X_{k+l-1}, X_{k+l})$ at which $f(x)$ has the values
$\rho_k, \rho_{k+1}, \ldots, \rho_{k+l-1}$, respectively. In $(X_{k+l}, b_1)$ we take $\xi$ at $X_{k+l}$. Then we have
\[
S = s + \sum_{i=k}^{k+l-1} \rho_i\,[\alpha(X_{i+1}) - \alpha(X_i)],
\]
where $s$ is the contribution of the intervals $(a, X_k)$ and $(b_1, b)$, which does not depend on the choice of
$l$. Now, for sufficiently large $l$, the second term on the right-hand side exceeds any given
bound. Therefore, $S$ is as large as we want. Hence, the definition of Stieltjes does not apply
to $f(x)$ and $\alpha(x)$.
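The construction can be tried on a concrete case. The sketch below is an illustration under assumptions of our own, none of which come from the text: $\alpha(x) = x\cos(\pi/x)$ on $(0, 1]$ with $\alpha(0) = 0$, the points $X_i = 1/i$ (which here decrease towards $0$ instead of increasing towards $b_1$), the numbers $\rho_i = (-1)^i/\log(i + 2)$, and a piecewise linear "tent" function for $f$. The function names are hypothetical.

```python
import math

def alpha(x):
    """x*cos(pi/x): continuous on [0, 1], of unbounded variation near 0 (assumed example)."""
    return 0.0 if x == 0 else x * math.cos(math.pi / x)

def X(i):
    # Assumed division points, accumulating at 0
    return 1.0 / i

def rho(i):
    # Tends to zero; the sign makes every term of tagged_sum positive
    return (-1) ** i / math.log(i + 2)

def f(x):
    """Continuous tent function: zero at every X(i), equal to rho(i) at the midpoint of [X(i+1), X(i)]."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    i = math.floor(1.0 / x)              # X(i+1) < x <= X(i)
    lo, hi = X(i + 1), X(i)
    mid = 0.5 * (lo + hi)
    return rho(i) * max(0.0, 1.0 - abs(x - mid) / (0.5 * (hi - lo)))

def tagged_sum(k, l):
    """Sum of f(xi) * [alpha(right) - alpha(left)] over the l intervals [X(i+1), X(i)], i = k, ..., k+l-1."""
    total = 0.0
    for i in range(k, k + l):
        xi = 0.5 * (X(i + 1) + X(i))     # tag where f takes the value rho(i)
        total += f(xi) * (alpha(X(i)) - alpha(X(i + 1)))
    return total

if __name__ == "__main__":
    for l in (10, 100, 1000, 10000):
        print(l, tagged_sum(100, l))
```

Starting at a large index $k$ keeps every interval short, yet the partial sums still grow without bound as $l$ increases, although only like $\log\log$ of the number of intervals. The printed values therefore increase slowly; the divergence itself rests on the comparison with $\sum 1/(i \log i)$, not on the numerics.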
Thus, we can no longer associate with each continuous function . f (x) an integral, with
respect to .α(x), when .α(x) is of unbounded variation.
Moreover, the problem of primitive functions will no longer arise for all continuous
functions . f (x).
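Let us take $\alpha(x) = x \sin\frac{1}{x}$; the function $\alpha(x)$ is increasing in the intervals
\[
p_k = \left( \frac{1}{\left(2k + \frac{1}{2}\right)\pi},\ \frac{1}{\left(2k + \frac{3}{2}\right)\pi} \right),
\]
and decreasing in the intervals
\[
n_k = \left( \frac{1}{\left(2k - \frac{1}{2}\right)\pi},\ \frac{1}{\left(2k + \frac{1}{2}\right)\pi} \right).
\]
Let us take $f(x)$ continuous in $(0, 1)$, zero in $n_k$, positive in $p_k$. That will be possible even if
we require that the integral $\int_{p_k} f(x)\, d[\alpha(x)]$ has a value $\pi_k$, provided that $\pi_k$ tends towards
zero faster than the growth of $\alpha(x)$ in $p_k$.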
Let us take
\[
\pi_k = \frac{1}{\left(2k + \frac{1}{2}\right)\pi \log k}.
\]
Then, it is clear that, in $(\varepsilon, 1)$, however small $\varepsilon > 0$ may be, $f(x)$ is the derivative, with
respect to $\alpha(x)$, of the function
\[
\int_\varepsilon^x f(x)\, d[\alpha(x)].
\]
But this integral increases indefinitely when .ε tends towards zero, because the series of .πk
is divergent. Therefore, in .(0, 1), the continuous function . f (x) is not the derivative of a
function . F(x) with respect to .α(x).
Thus, when we no longer assume that .α(x) is of bounded variation, the problem of
primitive functions appears completely different from the one we have solved.
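The two properties required of the numbers $\pi_k$ can be examined numerically. This is a minimal sketch, assuming that the growth of $\alpha(x) = x \sin\frac{1}{x}$ in $p_k$ is measured by the difference of its values at the two endpoints of $p_k$, and starting the series at $k = 2$ because $\log 1 = 0$. The function names are hypothetical.

```python
import math

def growth(k):
    """Increase of x*sin(1/x) across p_k: alpha is +x at one endpoint and -x at the other."""
    return 1.0 / ((2 * k + 0.5) * math.pi) + 1.0 / ((2 * k + 1.5) * math.pi)

def pi_k(k):
    """The prescribed value of the integral of f over p_k (k >= 2)."""
    return 1.0 / ((2 * k + 0.5) * math.pi * math.log(k))

def partial_sum(n):
    """Sum of pi_k for k = 2, ..., n."""
    return sum(pi_k(k) for k in range(2, n + 1))

def lower_bound(n):
    """Integral-test bound: pi_k >= 1/(2.5*pi*k*log k), and sum 1/(k log k) >= log log(n+1) - log log 2."""
    return (math.log(math.log(n + 1)) - math.log(math.log(2))) / (2.5 * math.pi)

if __name__ == "__main__":
    for k in (10, 1000, 10**6):
        print("ratio pi_k / growth at k =", k, ":", pi_k(k) / growth(k))
    for n in (10, 1000, 10**5):
        print("partial sum up to", n, ":", partial_sum(n), ">=", lower_bound(n))
```

The ratio $\pi_k / \text{growth}$ behaves like $1/(2\log k)$ and so tends to zero, while the partial sums exceed a multiple of $\log\log n$ and are therefore unbounded. The growth is very slow, so the printed partial sums remain small for any practical $n$.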
However, here is a category of functions .α(x) to which the previous considerations
immediately apply.
The function .α(x) satisfies the following two conditions:
Let us take for $E$ the interval $(a, b)$ itself. $\beta(x)$ is identical to $\alpha(x)$ in $i$; in $i$ we know the
derivative $f(x)$ of $F(x)$ with respect to the function of bounded variation $\alpha(x)$. Therefore,
$f(x)$ is totalisable with respect to $\alpha(x)$. Consequently, an interval $j$ exists in $i$ in which
$f(x)$ is summable with respect to $\alpha(x)$; the Stieltjes integral of $f(x)$ will yield $F(x)$.
Thus, . F(x) is known in intervals which cover the interior of intervals contiguous to a
certain closed set. From there, we deduce . F(x), first in the contiguous intervals considered
as open sets; and then, as we know the jumps of . F(x) at every point, in the closed contiguous
intervals.
Let us assume that, by this operation or any other, we have determined $F(x)$ in the closed
intervals contiguous to a closed set $E$. Let $G(x)$ be the function derived from $F(x)$ as $\beta(x)$
is derived from $\alpha(x)$.
If $(l, m)$ is an interval contiguous to $E$, $G(x)$ is linear in it and admits, with respect to
$\beta(x)$, a known derivative
\[
\frac{G(m) - G(l)}{\beta(m) - \beta(l)} = \frac{F(m) - F(l)}{\alpha(m) - \alpha(l)}.
\]
At the points of . E, we have . F = G, α = β, hence it easily follows that .G admits at these
points . f (x) as derivative with respect to .β(x).
This is, however, only true for the left-hand derivative at the points $l$ and the right-hand derivative at
the points $m$; the right-hand derivative at $l$ and the left-hand derivative at $m$ have been calculated
above.
Thus, in .i, we know the derivative of .G(x) with respect to the function of bounded
variation .β(x). This derivative is finite and definite, except at the points of a countable
set . Dx where there exists a finite and known right derivative and a finite and known left
derivative.
The conditions we are in are therefore a little more general than the ones previously
examined, but nothing essential will change. When the derivative $f(x)$ existed everywhere,
we deduced that the function $\mathcal{G}(\nu)$ arising from $G(x)$ has a finite right-hand superior derivative
number, except possibly at the points of a countable set $D$. We must now say:
except at the points of $D + D_\nu$, where $D_\nu$ is the transform of $D_x$. If all the points of
$D_x$ are points of continuity of $\alpha(x)$, $D_\nu$ is countable and nothing changes in our previous
conclusions. We would not even need to know that at the points of $D_x$ the derivative numbers
of $G(x)$ are finite. If $x_0$ is a point of discontinuity of $\beta(x)$ belonging to $D_x$, to this point
correspond, in $D_\nu$, two intervals²⁷ $[V(x_0 - 0), V(x_0)]$, $[V(x_0), V(x_0 + 0)]$; but thanks to
the right and left derivatives at $x_0$, $f_d(x_0)$ and $f_g(x_0)$, which are known, we know $\mathcal{G}(\nu)$ in these
two intervals, and we know that $\mathcal{G}(\nu)$ has at any point a finite and known right derivative.
Thus, nothing essential changes; $\mathcal{G}(\nu)$ has a finite right-hand superior derivative number at
every point, except at most at a countable set of points. $\mathcal{G}(\nu)$ is an indefinite total in $i$. However,
to calculate $G(x)$, it would be necessary to slightly modify the method of calculation of the
Stieltjes integral $\int \varphi(x)\, d[\beta(x)]$, because $\varphi(x)$ now represents the known derivative of $G$
with respect to $\beta(x)$, which is multivalued at a point of discontinuity $x_0$ if at this point
$G$ has a right derivative $f_d(x_0)$ and a left derivative $f_g(x_0)$. Then $x_0$ may appear in a set
$E[l\varepsilon \le \varphi < (l + 1)\varepsilon]$ either only because of the value $f_d(x_0)$, or only because of the value $f_g(x_0)$,
or because of both; depending on these cases, $x_0$ will appear in $m_{\beta(x)}\{E[l\varepsilon \le \varphi < (l + 1)\varepsilon]\}$ for
27 Here, . V (x) denotes the total variation of .β(x) and not .α(x); for the particular function .β(x) in
the text and the special set . Dx , one of the two considered intervals does not exist. Indeed, .β(x) is
continuous to the right at .l and to the left at .m.
By28 such elementary modifications, we will therefore find a function . F(x) knowing the
finite value . f d (x) of its right derivative with respect to a function of bounded variation .α(x)
at every point, and knowing, at the points of discontinuity of .α(x), the finite value . f g (x) of
its left derivative.29
But these modifications are unnecessary here because the points of $D_x$ are origins $l$ or
extremities $m$ of intervals contiguous to $E$; $\beta(x)$ is continuous to the right at $l$ and to the left at
$m$, so it is sufficient to take $\varphi = f$ at the points of $D_x$ without taking into account the values
As a result, in $i$, we can find an interval $j$ containing points of $E$ in its interior and in which
$\varphi$ is summable with respect to $\beta(x)$. The integral of $\varphi$ in $j$ is the sum of the contributions
from the intervals $(l, m)$, which are known, and the contribution from the subset $E_j$ of $E$
located in $j$. Therefore, we have
\[
A_{G(x)}(j) = \int_{E_j} f(x)\, d[\beta(x)] + \sum_{j} [G(m) - G(l)].
\]
the extremities of . j being assumed to be taken in . E, but only if, in the calculation of the
integral with respect to .α(x), we consider the points .l only for their left measure, and the
points .m for their right measure.
In one form or the other, we see that it is possible to determine $F$ in intervals
containing points of $E$; this is enough to be sure of being able to obtain $F$ in the whole of $(a, b)$
by transfinite induction. As a result, $F$ is determined up to an additive constant.
The new category of functions $\alpha(x)$ is very vast; however, we do not always know how to
find the continuous function $F(x)$ which admits, with respect to a given continuous function
$\alpha(x)$, a given continuous derivative $f(x)$. It is sufficient to take $\alpha(x)$ continuous and
of unbounded variation in any interval, and to take $f(x) \equiv 1$, for us to be in this case. Without
doubt, in this example, we know one of the primitive functions of $f(x)$, namely the function
$\alpha(x)$ itself; but we do not know whether there exist primitive functions which are not of
the form
\[
\alpha(x) + \text{const}.
\]
Indeed, we have not given here any method, either for finding the primitive functions of
$f(x) \equiv 1$ or for limiting the extent of their indetermination.
Note. On Transfinite Numbers
A
I. For any infinite set $E$ (that is, one containing an infinite number of points), $E^1$ exists; this
is the Bolzano–Weierstrass principle. To show this, let us group all the numbers into two
classes, $A$ and $B$. Class $A$ contains all numbers that are greater than only a finite number
of elements of $E$, while class $B$ contains the remaining numbers. The cut $A, B$ defines
a number that is obviously a limit point of $E$, and in fact, it is the smallest of its limit
points.
$E^1$ is obviously closed, meaning it contains its limit points, therefore it contains its
1 We can always assume that the set . E which appears in this statement is closed. Therefore, it would
be sufficient to study only closed sets, but this limitation would not lead to any notable simplification.
2 We are dealing with points on a straight line, and therefore with numbers. There would be little change
if we dealt with sets of points in a space of several dimensions. Furthermore, the use of curves
such as the curve of Peano allows us to limit ourselves to the study of the case of the straight line.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025
R. Jain, Lebesgue’s Theory of Integration,
https://doi.org/10.1007/978-981-96-1169-0
These sets $E^1, E^2, E^3, \ldots$ exist in certain cases. One straightforward case where their
existence is obvious is when $E^1$ is perfect, because then $E^1, E^2, E^3, \ldots$ are all identical.
In this case, the definition of $E^2, E^3, \ldots$ does not present any interest, and we will agree
never to discuss the derivative of any perfect set. However, these sets can all be distinct.
Here is the construction method we will use to see this:
Let there be sets $e_1, e_2, \ldots$ located in $(0, 1)$. Let us divide $(0, 1)$ into the partial intervals
$\left(\frac{1}{2}, 1\right), \left(\frac{1}{2^2}, \frac{1}{2}\right), \left(\frac{1}{2^3}, \frac{1}{2^2}\right), \ldots$. Let us perform on $e_i$ the homothetic transformation which
replaces $(0, 1)$ by $\left(\frac{1}{2^i}, \frac{1}{2^{i-1}}\right)$; $e_i$ becomes $\mathcal{E}_i$. The sum of these sets $\mathcal{E}_i$ and of the point $0$
will be denoted by $A(e_1, e_2, \ldots)$.
If $e_1, e_2, \ldots$ reduce to the point $0$,
\[
A_1 = A(e_1, e_2, \ldots)
\]
is a set for which $E^1$ reduces to the point $0$. If $e_1, e_2, \ldots$ are identical to $A_1$, we obtain
$A_2 = A(A_1, A_1, \ldots)$, for which $E^2$ reduces to the point $0$, and so on.
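The first steps of this construction can be imitated on finite truncations. The sketch below is only a structural illustration and rests on assumptions of our own: each set is truncated to its first `depth` blocks, and since limit points cannot be computed from a finite list, every point carries a "rank" assigned by the construction itself, namely the number of derivations it would survive in the untruncated set. The names `build` and `derived` are hypothetical.

```python
from fractions import Fraction

def build(order, depth):
    """Truncated A_order as a dict {point: rank}; order 0 stands for the single point 0."""
    if order == 0:
        return {Fraction(0): 0}
    inner = build(order - 1, depth)
    points = {Fraction(0): order}            # 0 is the limit of the successive blocks
    for i in range(1, depth + 1):
        scale = Fraction(1, 2 ** i)
        for y, rank in inner.items():
            # Homothety replacing (0, 1) by (1/2^i, 1/2^(i-1)): y -> (1 + y) / 2^i
            points[scale * (1 + y)] = rank
    return points

def derived(points):
    """Structural stand-in for the derived set: keep points of rank >= 1, lowering the rank by one."""
    return {x: r - 1 for x, r in points.items() if r >= 1}

if __name__ == "__main__":
    A1 = build(1, 4)
    A2 = build(2, 4)
    print(sorted(A1))                        # 0, 1/16, 1/8, 1/4, 1/2
    print(derived(A2) == A1)                 # the derived set of A_2 is A_1
    print(derived(derived(A2)))              # only the point 0 remains
```

With this bookkeeping, one derivation of the truncated $A_2$ gives back the truncated $A_1$, and a second derivation leaves only the point $0$. The ranks are imposed by the recursion, not discovered from the points, so this checks the bookkeeping of the construction and does not prove the statement.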
\[
A_{\omega+1} = A(A_\omega, A_\omega, A_\omega, \ldots)
\]
the derivative of $E^\omega$ reduces to the point $0$. The derivative of $E^\omega$ is denoted by $E^{\omega+1}$,
and more generally, the successive derivatives of $E^\omega$ are denoted $E^{\omega+1}, E^{\omega+2}, \ldots$.
It is not necessary to attach any importance to the particular form of the indices used
here; it is sufficient to imagine that different symbols are used to denote the different
derivatives.
We will say that one derivative of a set comes after another if it is contained within
the other. With this convention, the words before and after can be used as in ordinary
language.
So far, we have used the following definition: When a derivative contains an infinite
number of points and is not perfect, its derivative is by definition the first derivative
which comes after it. A second definition is necessary; to formulate it, let us first note
3 In summary, we have just proven that there are always points common to the sets $E_1, E_2, \ldots$ when
these sets are closed and each of them contains all those that follow it in the sequence $E_1, E_2, \ldots$. The
set of these points is obviously closed.
either as derivatives, considering the higher indices .λ, μ, and we will use the terms
before and after to state the result of this comparison; or as terms of the sequence
. E 1 , E 2 , . . ., considering the lower indices .i, j, and we will use the terms in front of and
behind.
Within the sequence $S$, let us mark the terms that are both after $E_1$ and behind $E_1$;
we obtain $E_1, F_2, F_3, \ldots$. Within this sequence, mark the terms that are both after
and behind $F_2$, and so on. Finally, we will obtain a sequence $E_1, F_2, G_3, \ldots$ for which
there is agreement between the terms before and in front of, as well as between the terms after and
behind. This implies that each set within this marked sequence contains all those that are
behind it. As these sets are closed, there exists a closed set of points that is common to
all the sets of the marked sequence.
Every point in this set $\mathcal{E}$ is obviously common to all the sets in $S$, and conversely, since
every set in $S$ either belongs to the marked sequence or comes before one of its sets. For the same reason,
$\mathcal{E}$ would not change if we replaced the sequence $S$ with any other sequence $S'$ formed
by derivatives of $E$ and such that every term in one of these sequences belongs to the
other or comes before a set in the other.
Because of these facts, and by analogy with the definition of . E ω , we state the following
proposition and definition:
III. When derivatives of a set, whether countable or finite in number, all contain points,
there exist points common to all these derivatives. The set of these points, which is
obviously closed, is by definition, the first derivative which does not come before the
given ones.4
For this definition to be legitimate, if there exists a set $e$ in the sequence $S$ that comes after all the others,
our definition must lead to that set. Indeed, this is the case because $e$, coming after the
others, is contained in all the others, and it indeed constitutes the set of points common
to all the sets of $S$. In the case we are currently examining, the marked sequence is bounded,
and conversely; $e$ is its last term. In other cases, the defined derivative is called
the first derivative that comes after the given derivatives.
By definition, there are at most a countably infinite number of derivatives before each
derivative.
4 The reasoning which has served us provides a general result which can be stated, using a notion
defined in the following section, in the form: For a countable well ordered set of closed sets, such
that each of them contains all those which follow it, there exist points common to all these sets, and
these points form a closed set.
Cantor states that a set of elements is ordered if a relationship is established between any two
elements of that set which can be expressed using the words before and after, following the
customary rules. In other words, if we say $\alpha$ is before $\beta$, it implies that $\beta$ is after $\alpha$. Similarly,
if $\alpha$ is before $\beta$, which is before $\gamma$, then $\alpha$ is before $\gamma$.
Cantor further defines a set . E as well ordered, if it is ordered in such a manner that, in
this set . E, and in each partial set . P obtained by removing elements from . E, there exists an
element that comes before all the others.
If we agree that before means smaller than, every set of real numbers is ordered, but
it is not necessarily well ordered. The set of real numbers contained in .(0, 1), as well as
the set of positive and negative integers are not well ordered. On the contrary, the set of
positive numbers, that can be written in the decimal system with a non-increasing sequence
of decimal digits is well ordered. That is, the set of numbers
\[
n + \frac{a}{10} + \frac{b}{100} + \frac{c}{1000} + \cdots
\]
where .n is an integer, and the digits .a, b, c, . . . satisfy the inequalities .9 ≥ a ≥ b ≥ c ≥ · · · ,
is well ordered.
The set of sets $A_1, A_2, \ldots, A_\omega, A_{\omega+1}$ is well ordered when arranged in the order in which
they were obtained, but merely ordered when arranged in the reverse order.
The set $\mathcal{E}$ of derivatives of a set is well ordered, with the word before having the indicated
meaning, that is, one derivative is before another if it contains the other. Indeed, $\mathcal{E}$
is ordered, and contains an element before all the others, the derivative $E^1$. Let $P$ be a partial
set obtained from $\mathcal{E}$. Let $\alpha$ be a derivative located in $P$; since there are at most countably
many derivatives before $\alpha$, the set $\mathcal{E}_1$ of derivatives located before all those in $P$ contains
at most a countably infinite number of elements. If $E^1$ is in $P$, $\mathcal{E}_1$ does not exist. In this case
$P$ includes the element $E^1$ that comes before all the others. If $E^1$ is not in $P$, $\mathcal{E}_1$ exists
and there exists a derivative of $P$ which is the first one coming after all those in $\mathcal{E}_1$. This
derivative $\lambda$ belongs to $P$, and there cannot be a derivative in $P$ that comes before this one;
$\lambda$ is therefore an element of $P$ that comes before all the others. Therefore, $\mathcal{E}$ is well ordered.
The sets of derivatives are such that before each of their elements, there are at most a
countably infinite number of elements. We will deal only with well ordered sets that possess
this property and reserve the term sequence for these sets.5 The transfinite numbers that
we are going to define are those which are used to represent the order of the elements in
these sequences. Let us consider one of these sequences $S$; it contains an element that comes
before all the others, which we will call the first and can denote with the index $1$: $u_1$. Then, if
$S$ is not reduced to $u_1$, the sequence $S - u_1$ will have a first element that we will call the
5 We will reserve the term simply infinite sequence for the sequences whose elements are denoted
using successive integers.
IV. The set of transfinite numbers is uncountable. In fact, if it were countable, we could
simply add a new symbol to the countable sequence $S_0$ of finite and transfinite numbers,
placing it after all the elements of $S_0$, to obtain a sequence $S_1$ in which the position of
the last element cannot be denoted using the symbols from $S_0$. Thus, $S_0$ is uncountable.
However, before any element of $S_0$ there are at most countably many numbers.
This fact is completely analogous to the following: the sequence $s_0$ of integers is such
that before any element of $s_0$ there are only finitely many integers. However, as adding
an element to any finite sequence results in another finite sequence, the sequence $s_0$ is
not finite.
Throughout these considerations, we have repeatedly assumed as obvious, that the
numbering of elements of an ordered sequence using the successive symbols of . S0 ,
could only be done in one way. For example, we have regarded it as clear that if elements
of . S0 are enumerated as terms of a sequence with the elements of . S0 as numbers, each
6 The transfinite numbers conceived in this way are known as ordinal transfinite numbers. Cantor
also considers transfinite cardinal numbers, which he uses to represent the cardinality of sets.
element is identical to the number which determines its position. We can, if not make
these facts clearer, at least present them in a form that we are more accustomed to, by
saying:
Let us establish a correspondence between the elements of two sequences . S and .T in
such a way that the first two elements of . S and .T correspond to each other, and if the
first elements in a finite or countable number of elements of . S, say .s, s , . . . correspond
to the first elements .t, t , . . . of .T , then the first element of . S after .s, s , . . . corresponds
to the first element of .T , after .t, t , . . .; if these two elements exist.
Let us show that this correspondence is well-determined. Indeed, there are elements of
$S$ for which the correspondence is determined; the first element of $S$, for example. Let us
consider the set of all the elements of $S$ for which the correspondence is determined,
as well as for all the preceding elements. Then, if there were still elements remaining in
$S$ and $T$, the correspondence, according to its very definition, would still be determined
for the first element of $S$ after this set. This would contradict the definition of the set. Therefore,
the correspondence is entirely determined and exhausts $S$ or $T$, or both. In this latter
case, we say that the sequences are similar.
In the case where $S$ is the sequence $S_0$ of transfinite numbers, it would be contradictory
to assume that the correspondence exhausts $S$ without exhausting $T$: every sequence is
similar either to $S_0$ or to a segment of $S_0$. (By a segment of a sequence, we mean all the
elements which precede a given element.)
All the non-finite sequences we encounter, except for $S_0$, will be countable; thus, they
will be similar to a segment of $S_0$. Furthermore, a segment of $S_0$ is entirely determined
by the number which follows it. To each countable sequence $S$ we can attach a transfinite
number $\alpha$ that tells us which finite and transfinite numbers are necessary to enumerate
the elements of $S$. This number $\alpha$ thus gives us extremely important information
on $S$, but naturally does not allow us to distinguish between $S$ and similar sequences
since, by definition, $\alpha$ is the same for all similar sequences. It immediately follows from
the fact that the correspondence between the successive elements of two sequences is
determined, that if $S$ is similar to $T$ and $T$ to $U$, then $S$ is similar to $U$ and the correspondence
established directly between the elements of $S$ and $U$ is consistent with what one could
establish through the intermediary of $T$.
This number $\alpha$ is called the order type of the countable sequence $S$. It provides us with
information on $S$ that is exactly analogous to what we know about a finite sequence
when we know that it contains 10 elements.7 G. Cantor introduced transfinite numbers
as order types.8
7 We would note that if we applied the previous definition of order type to finite sequences, it would
be the number 11 which would be the order type of a sequence of 10 elements.
8 His principal Memoirs on this subject can be found in the Math. Ann., Bd. 46 and Bd. 49. These
Memoirs were translated by F. Marotte (Sur les fondements de la théorie des ensembles transfinis,
Hermann, 1899).
We have just logically conceived a set $S_0$ of finite and transfinite numbers which is uncountable.
However, it is not certain that the whole of this set is useful. Let us, therefore, show by examples
that a segment of $S_0$ cannot suffice, either to enumerate all the derivatives of sets of points,
or to provide the order types of all sequences of points. These examples will not actually be
constructed; we will only show that assuming we must stop during their construction would
be absurd. G. Cantor frequently used this method of proof.
Therefore, let us suppose that we have constructed sets $A_1, A_2, \ldots, A_\omega, A_{\omega+1}, \ldots$ for
which the derivatives are enumerated respectively using segments of $S_0$ where the last term
is the number 1, the number 2, $\ldots$, the number $\omega$, the number $\omega + 1$, $\ldots$, and this for all
the numbers of a certain segment of $S_0$. If $\alpha$ is the first term of the sequence $S_0$ after this segment,
we aim to construct a set $A_\alpha$ whose enumeration of derivatives requires all the numbers of
$S_0$ up to and including $\alpha$.
If $\alpha$ is of the first kind, that is, if $\alpha - 1$ exists, we will take $A_\alpha = A[A_{\alpha-1}, A_{\alpha-1}, \ldots]$; if $\alpha$
is of the second kind, that is, if $\alpha - 1$ does not exist, we arrange all the numbers of the segment into a
simply infinite sequence, let $a_1, a_2, a_3, \ldots$ be this sequence, and we will take
We immediately verify that the set thus constructed answers the question. Indeed, in the first
case the derivative of order $\alpha - 1$ is composed of the points $\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \ldots$ and of the point 0. In the
second, if $\alpha_p$ is the largest of the numbers $a_1, a_2, a_3, \ldots, a_p$, there are no more points in
the derivative of order $\alpha_p + 1$ in $\left(\frac{1}{2^{p+1}}, 1\right)$, while some points remain in $\left(0, \frac{1}{2^{p+1}}\right)$; where
$\alpha$ is the first transfinite number which comes after all the numbers $\alpha_p + 1$, the derivative of
9 Note that we intend to construct a set whose last derivative has a given rank $\alpha$, rather than a set
whose first non-existent derivative has rank $\alpha$. According to proposition III, such a set could not
exist when $\alpha$ is of the second kind.
decreasing order. Each point $A$ of $E$ is the extremity of an interval whose origin is the point
$B$ of $E$ with an abscissa immediately lower than that of $A$. Therefore, we have as many of
these intervals $(A, B)$ as there are points $A$. These intervals are non-intersecting; therefore
there are at most $p$ of them with a length not less than $\frac{1}{p}$, and this is true for any $p$. Therefore,
the set of these intervals is finite or countable; so the proposed set is at most countable.
The other fact is much more hidden. It is the result of Cantor’s analysis of the properties
of derivatives. In this section, without following Cantor’s considerations step by step, we
will not digress significantly from the ideas that guided him.
V. The points of $E^1$ which do not belong to $E^\alpha$ $(\alpha > 1)$ form a countable set. Indeed,
the points of $E^1$ which do not belong to $E^2$ are isolated in $E^1$; each of them can
therefore be enclosed in an interval that does not contain any other point of $E^1$, and
these intervals can be taken with no common points. Therefore, these intervals form
at most a countable infinity; thus, the points of $E^1$ which do not belong to $E^2$
form a finite or countable set $B_1$.
Similarly, the points of $E^\beta$ that do not belong to $E^{\beta+1}$ form a finite or countable set $B_\beta$.
Now, the set mentioned in the statement is the sum of the $B_\beta$ for the values of $\beta$ less than $\alpha$,
which are finite or countable in number. Therefore, this set is itself finite or countable.
It can also be said that the points of $E$ which do not belong to $E^\alpha$ form a finite or
countable set, because the same reasoning which showed that the sets $B_\beta$ are at most
countable proves the same for the set of points of $E$ that do not belong to $E^1$. Hence, it
follows that any set of points one of whose derivatives does not contain any points
is at most countable. These sets are called reducible sets. They are special countable
sets: for example, the set of rational numbers is countable, but not reducible.
When a set is not reducible, two cases are possible a priori. Either, in the course of
the operations of derivation, we arrive at a derived set $E^\alpha$ which is perfect, and then, following
our conventions, we stop our operations of derivation at this set. In other
words, we only consider the distinct derivatives $E^1, E^2, \ldots, E^\alpha$. Or, the operations
of derivation will always yield new, distinct derivatives, resulting in an uncountable set
of such derivatives, similar to $S_0$. We will see that this case does not arise.
Let us consider the set $S$ of derivatives of $E$ which actually exist, meaning they
all contain points, and they are not perfect, except perhaps the last. They are all
different. Let $(a_1, b_1), (a_2, b_2), \ldots$ be the intervals contiguous to $E^1$, arranged in
a simply infinite sequence. Let $(a_i^\alpha, b_i^\alpha)$ be the interval contiguous to $E^\alpha$ which
contains $(a_i, b_i)$. Let $l_i^\alpha$ be the increment in length when we pass from $(a_i^\alpha, b_i^\alpha)$ to
$(a_i^{\alpha+1}, b_i^{\alpha+1})$. Some of the $l_i^\alpha$ are zero. However, for each value of $\alpha$, some of the $l_i^\alpha$ are
non-zero, since $E^{\alpha+1}$ is always different from $E^\alpha$. If $E^{\alpha+1}$ does not exist, we will take
$(a, b)$ as the interval $(a_i^{\alpha+1}, b_i^{\alpha+1})$.
Now for a given $i$, there can be at most a countable number of values of $l_i^\alpha$ that are
non-zero, because the length of $(a_i^\alpha, b_i^\alpha)$ cannot exceed the length of the
interval $(a, b)$ containing the given set $E$.13 Since this is true for each integer value
of $i$, the set of all the $l_i^\alpha$ different from zero is at most countable. The set of terms of
$S$ is therefore countable,14 therefore
VI. Every set of points either has a derivative which does not contain any point, or has
a perfect derivative.
VII. Any closed set is the sum of a finite or countable set and a perfect set; one or the
13 Therefore, we suppose that $E$ is entirely at a finite distance; this assumption is not at all essential.
We would treat the general case either by a transformation such as $x = \tan\frac{\pi x'}{2}$, or by decomposing
the set $E$ into sets $E_i$, where $E_i$ is formed of those points of $E$ which are in $(i, i + 1)$. In the case
where $E$ is not entirely at a finite distance, it is convenient to agree that the derivative of $E$ contains
a point at infinity.
14 In essence, we have just proved a general property of sequences of closed sets, which will be stated
in the text shortly.
15 See Acta mathematica, t. 2.
I mentioned that we would not deviate from the ideas of Cantor, but the form of the proof in the
text is very different from that of Cantor in the following respect. Cantor uses the concept of the total
set of distinct or non-distinct derivatives that can be associated with each number from $S_0$. He
obviously necessary to note that every perfect set is uncountable or, more
precisely, has the cardinality of the continuum. This is obvious if the set contains an
interval; therefore, let us assume that $E$ is a perfect set that is nowhere dense,
with $a$ and $b$ as its extreme points. Now we have associated (Sect. 2.2) with such a
set $E$ a continuous function $\varphi(x)$, which is constant in every interval contiguous to
$E$, increasing in every interval containing points of $E$, and varies from 0 to 1 when
$x$ varies from $a$ to $b$. The equation $\varphi(x) = m$ admits, for each value of $m$
between 0 and 1, either only one solution $x_0$, in which case $x_0$ is the abscissa of a point
of $E$, or an infinite number of solutions given by a double inequality $x_1 \le x \le x_2$, in
which case $x_1$ and $x_2$ are the abscissae of the two extremities of an interval contiguous
to $E$. Thus, to each value $m$ of $(0, 1)$, we associate either a point $x_0$ in $E$, or two
points $x_1$ and $x_2$ in $E$; therefore, $E$ has the cardinality of the continuum.
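As a concrete illustration of this argument, here is a minimal sketch under an assumption that is not in the text: we take $E$ to be the ternary Cantor set on $(0, 1)$, for which $\varphi$ is the classical Cantor function. To a value $m$ written in base 2 with digits $b_1 b_2 \ldots$ there corresponds the point of $E$ whose base 3 digits are $2b_1, 2b_2, \ldots$. The helper names `binary_value` and `cantor_point` are hypothetical, and only finite digit strings are handled.

```python
from fractions import Fraction
from itertools import product

def binary_value(bits):
    """The value m = 0.b1 b2 ... (base 2) of a finite string of bits."""
    return sum(Fraction(bit, 2 ** k) for k, bit in enumerate(bits, start=1))

def cantor_point(bits):
    """The point 0.(2*b1)(2*b2)... (base 3) of the ternary Cantor set.

    For the Cantor function phi, phi(cantor_point(bits)) equals
    binary_value(bits); this is the assumed special case of the text's phi.
    """
    return sum(Fraction(2 * bit, 3 ** k) for k, bit in enumerate(bits, start=1))

# Distinct values of m (here: all strings of 4 bits) give distinct points of E.
strings = list(product([0, 1], repeat=4))
points = {cantor_point(bits) for bits in strings}
```

For instance, `cantor_point([1])` is the point 2/3, the right extremity of the contiguous interval on which the Cantor function takes the value `binary_value([1])`, that is 1/2.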
Also note that the reasoning from Sect. A.3 proves that: Any sequence of closed sets,
that are all different, each of which contains those that come after it in the sequence,
must necessarily be finite or countable.
Throughout this book, we have made several references to the notion of the transfinite
numbers. Therefore, it is important to clarify this notion and if possible, to convince the
reader that they need only familiarise themselves with the sequence of transfinite numbers
by using it frequently to gain as clear a view of this sequence as they have of the sequence
of integers.16
When we first heard about the sequence $S_0$ of finite and transfinite numbers, we readily
believed that everything would become clear if we had a notation for transfinite numbers.
However, it is clear that we cannot have one; we can only manipulate a finite number of
symbols or sounds and combine them in a finite number of ways, which would always yield
only a finite number of notations. This number could be large, depending on the number of
symbols, their more or less fortunate choices, and the ingenuity of the prescribed combination
names $E^\Omega$ the set of points common to all these derivatives. This set $E^\Omega$ is a sort of derivative that
comes after all those related to the transfinite numbers of $S_0$. For Cantor, the symbol $\Omega$ is the first
transfinite number of the second class of transfinite numbers, and it comes after all the elements of $S_0$.
On the contrary, we have avoided reasoning about the set $S_0$ as it is currently conceived in its
entirety. Instead, we have only reasoned about the procedure of formation of $S_0$ and any of its segments
obtained in the course of the application of this procedure. In this way, we do not use an uncountable
well ordered set at any point.
16 My goal is thus clearly defined; I will therefore not address the questions, which are more
philosophical than mathematical, raised by the concept of the sequence of integers. I consider this notion
to be established and perfectly clear.
rules, but it will always be finite. Any enumeration applied to an infinite set of numbers is,
in certain respects, fictitious.
Let us examine, for example, the decimal enumeration applied to the numbers included
between 0 and 1; it claims to allow us to represent any number. For this purpose, it uses
an indefinite sequence of digits $0.abcd\ldots$. However, such a sequence can neither be
written nor pronounced. There are a few fortunate cases in which we can state the law of
succession of the decimal digits $a, b, c, \ldots$ of $x$, and this law determines $x$; most of the time, all
we can say is this: the sequence $0.abc\ldots$ is determined by the number $x$. It is obvious
how illusory the decimal notation of numbers in the interval $(0, 1)$ is.
Let us again examine the enumeration of integers. It is highly practical and easy for
everyday use and represents significant progress over the methods of the Greeks, although
those methods were already very powerful and somewhat akin to our current methods.
But, just like the methods of the Greeks, it does not allow us to denote all numbers. It only
convinces us, as it convinced the Greeks, that no matter how large a number is, we will succeed in
establishing conventions ingenious enough to represent this number and smaller numbers.
The enumeration would only allow us to represent all the numbers if we could repeat the
same act (to write a digit, to state a digit) an arbitrarily large number of times. But then, we
could draw as many bars, or pronounce the sound ‘one’ as many times, as there are objects
in the finite collection to be counted. Therefore, we have only solved the problem of
representing numbers by placing ourselves under an assumption where the problem does not
actually arise.17
It is quite clear that, similarly, the notation of transfinite numbers would be immediate
for those who could repeat the same action a well ordered and countably infinite number of
times of any type of order.
Despite everything, our habits are such that we desire a semblance of notation, even if
it is merely conceptual and practically unattainable, that is, entirely illusory. Being able
to denote the first transfinite numbers $\omega, \omega + 1, \ldots, 2\omega, 2\omega + 1, \ldots, 3\omega, \ldots, \omega^2, \ldots$ has
already helped us.
Let $x$ be a number between 0 and 1; let us write it in the decimal system. We assume this
expansion to be infinite, say $0.a_1 a_2 a_3 a_4 \ldots$. Let us set
$x_1 = 0.a_1 a_3 a_5 \ldots,$
$x_2 = 0.a_{2 \times 1} a_{2 \times 3} a_{2 \times 5} \ldots,$
$x_3 = 0.a_{2^2 \times 1} a_{2^2 \times 3} a_{2^2 \times 5} \ldots,$
$\ldots$
If all the numbers $x_i$ are different and if the set of these numbers is ordered in the direction
of increasing $x$, for example, let us agree to say that $x$ denotes the order type of this set.
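The distribution of the digits of $x$ among the numbers $x_1, x_2, x_3, \ldots$ can be sketched as follows. The digit $a_n$ goes to $x_k$ when $n = 2^{k-1} \times m$ with $m$ odd. This is only an illustration: the function name `split_digits` is hypothetical, and since only a finite prefix of the expansion can be handled, each $x_k$ is obtained only approximately.

```python
def split_digits(digits, count):
    """Distribute the digits a_1, a_2, ... of x among x_1, ..., x_count.

    The digit a_n, with n = 2**(k-1) * m and m odd, becomes a digit of x_k.
    `digits` is a finite prefix of the expansion, so every x_k is truncated.
    Digits belonging to x_k with k > count are ignored.
    """
    parts = [[] for _ in range(count)]
    for n, digit in enumerate(digits, start=1):
        k = 1
        while n % 2 == 0:      # strip the factors of 2 from n to find k
            n //= 2
            k += 1
        if k <= count:
            parts[k - 1].append(digit)
    return ["0." + "".join(map(str, part)) for part in parts]

# Digits a_1 .. a_12 of some x, split among x_1, x_2, x_3.
example = split_digits([1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2], 3)
# example == ["0.135791", "0.260", "0.42"]
```

Here $x_1$ receives the digits of odd rank, $x_2$ those of rank 2, 6, 10, and $x_3$ those of rank 4 and 12; the digit of rank 8 would belong to $x_4$.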
17 We note with interest what M. Borel says about a conversation he had with M. Baire (Leçons sur
la théorie des fonctions, 2nd edition, p. 178).
What we did in the previous section regarding the construction of sequences $S_\alpha$ of order
$\alpha$ shows that every transfinite number is denoted by some points $x$. However, this notation
is very imprecise: the same number $\alpha$ corresponds to an infinite number of points $x$. For this
notation to get close to the properties of ordinary notation, it would be necessary for each
transfinite number $\alpha$ to be associated with a well-determined18 sequence $S_\alpha$ and therefore with
a well-determined19 number $x$.
We are thus led to ask whether it is possible to name a set $E$ contained in $(0, 1)$ such that
there exists a bijective correspondence between the points of $E$ and the numbers of $S_0$. This
problem has preoccupied many mathematicians; it has been asserted several times that we
could take the interval $(0, 1)$ itself as $E$. However, the question is far from being resolved.20
Besides, for us, all this is secondary, because it is clear that this mode of representation of
transfinite numbers would hardly help us in understanding them better.
Yet, this pseudo-notation fulfils one of the essential conditions of notation. After all,
what is the purpose of the enumeration of integers? It never appears in any reasoning, except of
course in matters which deal with the enumeration itself. Still, it allows us to characterise
the number we are discussing, provided it is not very large. In other words, it enables us
to construct and determine a well ordered finite set similar to the one under consideration.
When someone talks to us about the number 3, we know that it is just about the sets similar
to this one: III, and when we argue about the number three, we make arguments that
apply to all well ordered sets similar to this set of strokes. Whenever we argue about an
integer, given in any manner whatsoever, we are in similar circumstances.
Therefore, it is clear that we could do away with the notion and the term ‘number’; the
more tangible notion of ordered and finite sequence of objects would suffice for us. In exactly
the same manner, to assume a given transfinite number is to assume a well ordered countable
set as determined. To reason about this number is equivalent to making arguments that apply
indiscriminately to all sets similar to this one. The notion and the term transfinite number
are therefore unnecessary for us; the notion of a well ordered countable set suffices for us.
Basically, the obscurities we believed to arise in the notion of transfinite number might
already be present in the notion of finite integer, when we want to see it as a metaphysical
entity, and therefore somewhat unclear.
G. Cantor says21: We call the power or the cardinal number of a set $M$ the general notion
that we deduce from $M$ using our faculty of thought, abstracting from the nature of the different
elements of $M$ and their order.
And, for him, a type of order is the general notion which results from $M$ when we abstract
from the nature of the elements of $M$, but not from their order of succession.
It appears difficult to find anything else in these philosophical definitions other than the
previous remarks, and as a result, these definitions reduce to a matter of language convention:
when we use the word ‘number’, we are merely reminding ourselves that we are reasoning
about a set, but we are doing so using reasonings that apply equally well to all sets similar
to the one under consideration.
Thus, the definition of the finite and transfinite numbers is emptied of its metaphysical content
and no longer presents any obscurity. It is true that we have shown simultaneously that the
use of the word ‘number’ is unnecessary; but we are accustomed to using the words ‘whole
numbers’, so it will be easy for us to use the words ‘transfinite numbers’ as well, which we
know we can do without any inconvenience.22
In summary, when we speak of a transfinite number, we mean that we are dealing with a
sequence—that is, a well ordered countable set—defined only up to similarity.
How can we obtain a property of transfinite numbers?23 To answer this question, let us
examine how we obtain a property valid for all integers. This is always done by using the
method of mathematical induction at some point.
This method of reasoning is perfectly convincing24; it is constantly used by mathematicians,
and there can be no question of our raising the slightest doubt about its validity.
From the fact, which we will recall, that the method of mathematical induction is not reducible
to syllogistic reasoning, we can therefore simply conclude that syllogistic reasoning is
not the only one that can be used in mathematics.
22 What is happening to us here always occurs when one clarifies a notion which was earlier obscure.
Once it was understood that reasoning about the imaginary numbers was, in fact, reasoning about
pairs of real numbers, the use of imaginary numbers became both clear and unnecessary, at least
from the logical point of view. However, practically, there were significant advantages to using the
imaginary numbers, as their use became both legitimate and advantageous.
23 We are only concerned with the reasonings applicable to any transfinite number; reasonings related
to a particular transfinite number, however, may be totally different from those we are going to study:
1
Let us consider the function. F(x) = f (x)−1 1
f (x) in the expression of which,. f (x) represents
any continuous monotonically increasing function which varies from $-\infty$ to $+\infty$ when $x$ varies from
$-\infty$ to $+\infty$. To reason about the set of values of $x$ in the neighbourhood of which $F(x)$ is infinite
is to reason about the number $\omega + 1$. However, it is clear that in doing so, we are not using the general
notion of a well ordered set.
24 I purposely used the word ‘convince’ to emphasise that in my opinion, the reasons to declare
oneself satisfied with an argument are of a psychological nature, in mathematics as elsewhere. The
logic gives us reasons for rejecting some of the reasonings; it cannot make us believe in a reasoning.
Let us assume that we have demonstrated: 1. a property $P$ holds for the number one; 2. if
$P$ holds for a number, then it also holds for its successor. Given this, let us demonstrate the
25 The question of the so-called ‘law of excluded middle’ arose in connection with the difficulties
posed by certain reductio ad absurdum arguments. In this regard, one can refer to the work of M.
Brouwer and of M. Weyl. I may be allowed to take advantage of this opportunity to recall that, long
before the question assumed this philosophical form, I wrote (Soc. math. de France, 1904): ‘. . . I do
not attribute any more value to the method by which we show that a non finite set contains a countable
set. Although I strongly doubt that we will ever name a set which is neither finite nor infinite, the
impossibility of such a set does not appear demonstrated to me.’
By this I wanted to say that, given that definitions had been established for the two words finite and
infinite, it was not certain that non finite meant infinite. This observation is not unrelated to some of
the ideas of M. Brouwer. However, my purpose in making it was quite the opposite of his. I, for my
part, do not by any means contest the value of traditional logic and the method of reasoning by
reductio ad absurdum; I merely point out that it must be used correctly.
The contemptuous manner in which M. Brouwer speaks of the Paris School highlights our
disagreement very strongly.
We can hope to explore the domain thus constructed only by considering the possibility of
repeating the same reasoning an arbitrarily large number of times, which precisely means
reasoning by induction.
Our initial attempt to legitimise the reasoning by induction using syllogisms, therefore,
gives us a result as favourable as we could hope for, since it provides us with a means
of verifying $P$ for each particular number using a finite number of syllogisms.26 Without
doubt, however fast we may go churning out these syllogisms, there will be numbers large
enough for which we cannot provide a demonstrative proof. However, it is easy for us to
conceive that, each time we manage to name a number $M$, which means that each time we
have succeeded in performing the addition of an element $M - 1$ times in a row to form a set
of $M$ objects, we can find a way to repeat the induction reasoning $M - 1$ times,
as is necessary to verify the property $P$ for the number $M$. Moreover, for a mathematician,
it is not the demonstration that creates the truth of a proposition; it merely allows one to
ascertain that truth. Hence, we may not have the same requirements here as we did when
dealing with notation, and we can be satisfied with a demonstration constructed using the
same method by which the sequence of integers itself is constructed.
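The verification of $P$ for a particular number $M$ by repeating the inductive step $M - 1$ times can be sketched as follows. The property chosen here, $1 + 2 + \cdots + n = n(n+1)/2$, and the function name `verify_sum_formula` are illustrative assumptions and do not come from the text.

```python
def verify_sum_formula(m):
    """Verify P(m), where P(n) states 1 + 2 + ... + n == n*(n+1)//2.

    Premise 1 establishes P(1); premise 2 (from P(n) deduce P(n+1)) is then
    applied once per loop turn. Returns the number of times premise 2 was
    used, or None if some step fails.
    """
    total = 1                              # the sum 1 + ... + n for n = 1
    if total != 1 * 2 // 2:                # premise 1: P(1)
        return None
    steps = 0
    for n in range(1, m):
        total += n + 1                     # uses P(n): total was n*(n+1)//2
        if total != (n + 1) * (n + 2) // 2:
            return None                    # P(n+1) would fail
        steps += 1                         # one more application of premise 2
    return steps

steps_for_ten = verify_sum_formula(10)     # 9 applications, that is M - 1
```

The count of steps grows with $M$, which is the point made above: each particular number is reached by a finite chain, but no single finite chain covers all numbers.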
In short, the construction of a finite segment of the sequence of integers only requires the
use of what can be called finite induction. The demonstration of a property for all the numbers
of this segment requires only a reasoning by finite induction, reducible to syllogisms. The
construction of the complete sequence of the integers requires a process of infinite induction,
and demonstrating a property for all integers requires reasoning by infinite induction, which
could only be replaced by an indefinite sequence of syllogisms.
Consequently, it is clear that, to prove a proposition for the sequence of transfinite numbers,
it is necessary to use a method of reasoning which allows us to follow step by step
the procedure of forming this sequence. This reasoning is called transfinite induction and
it requires proving: 1. that a property $P$ is true for the number one (or for the number $\omega$,
depending on whether we consider all finite and transfinite numbers or only transfinite numbers);
2. that if the property $P$ is true for all the numbers less than a number $\alpha$, it is true for
the number $\alpha$.
This second part of reasoning by transfinite induction is most often replaced by two
distinct demonstrations. We prove statement 2 above only for $\alpha$ of
the second kind and, for $\alpha$ of the first kind, we show that if the property is true for $\alpha - 1$, it
is true for $\alpha$.
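The case distinction between numbers of the first and of the second kind can be made concrete on a small segment of $S_0$. The sketch below assumes, purely for illustration, that we only handle the numbers below $\omega^2$, written $\omega a + b$ and stored as the pair `(a, b)`; the function name `induction_case` is hypothetical. Python compares pairs lexicographically, which agrees with the order of these numbers.

```python
def induction_case(alpha):
    """Say which part of a transfinite induction applies to alpha.

    alpha = omega*a + b is given as the pair (a, b), with alpha >= 1.
    Returns ("base", None) for the number one, ("first kind", alpha - 1)
    when alpha has a predecessor, and ("second kind", None) otherwise.
    """
    a, b = alpha
    if a < 0 or b < 0 or (a, b) == (0, 0):
        raise ValueError("expected a number omega*a + b that is at least 1")
    if (a, b) == (0, 1):
        return ("base", None)
    if b > 0:
        return ("first kind", (a, b - 1))   # alpha - 1 exists
    return ("second kind", None)            # alpha = omega*a has no predecessor

omega = (1, 0)
case_omega = induction_case(omega)           # ("second kind", None)
case_omega_plus_1 = induction_case((1, 1))   # ("first kind", (1, 0))
```

For a number of the second kind such as $\omega$, the proof must appeal to all the smaller numbers at once, here the infinitely many pairs `(0, b)`.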
In either form, the transfinite induction reasoning would lead to the syllogistic verification
of the proposition for any number $\alpha$, if we were to assume that we can repeat a reasoning a
well ordered and countable number of times.
26 The legitimisation through reductio ad absurdum of the procedure by induction is far less perfect
in this respect, since it requires an actual infinity of syllogisms.
The links between this mode of reasoning and syllogistic reasoning remain close; however,
we are still very far from ordinary reasoning since, at least for $\alpha$ of the second kind, the
second part of our reasoning is based on the knowledge of an actual infinity of premises.27
We can, as we have already done, explain why we regard a property as demonstrated
for all the transfinite numbers by transfinite induction reasoning by saying: if the property
$P$ were not true for all the transfinite numbers, there would exist numbers for which it would
not be true; the sequence of these numbers would contain an element $\alpha$ smaller than all the
others; property $P$ would be true for all the numbers smaller than $\alpha$ and false for $\alpha$, which
would be a contradiction.
However, we would not consider this as a justification for transfinite induction reasoning.
This mode of reasoning is new, irreducible to others, and it is precisely why it is powerful
and useful.
But is one obliged to accept this mode of demonstration? Absolutely not; it is up to each
individual to decide whether he is entirely satisfied with reasoning by transfinite induction.
Nevertheless, just as almost everything in mathematics has been written only for those who
admit the reasoning by ordinary induction, some passages of this book are written only for
those who admit the reasoning by transfinite induction.
Even if we feel convinced by the demonstrations involving transfinite induction, we may
not like them, may wish to do without them, or may want to determine what the domain of mathematics
would be without this mode of demonstration. This is why we will examine some means
of avoiding the use of transfinite induction.
Throughout this book, we have used transfinite numbers in three different
ways. First, by using the Cantor–Bendixson theorem; then, by using the chains of intervals;
and finally, in connection with the work of MM. Baire and Denjoy. Let us examine these
applications, beginning with the use of chains.
Let us start by assuming that a well-ordered set of points is given in $(a, b)$, ordered in
the direction of increasing $x$, for example. We will label these points as $x_1, x_2, x_3, \ldots, x_\omega$,
$x_{\omega+1}, \ldots$, and assume that each point with an index of the second kind is a limit point of
the set of points with smaller indices. Then we say that the division of $(a, b)$ using the
considered points is a chain of intervals.
We would use such a chain to evaluate the increment $F(b) - F(a)$ of a function $F(x)$
by the formula
$F(b) - F(a) = \sum [F(x_i) - F(x_{i-1})],$
27 This infinity is countable; it is to avoid having to consider an uncountable infinity of premises that I
previously refrained from discussing the uncountable sequence of distinct or non distinct derivatives
of a given set.
The $S_i$ are called the partial series of $S$. The $S_i$, as numbers, have a meaning for finite $i$; for
$i = \omega$ we agree that $S_i$, as a number, exists only if the ordinary series $\sum_{\lambda=1}^{\lambda<\omega} u_\lambda$ converges,
and in such a case, it is equal to the sum of this series. The numbers $S_i$ thus defined, for
$i < \omega$, are partial sums of $S$. To extend the definition of these numbers to any index
at most equal to $\alpha$, let us agree that we have, for any $\lambda$ of the first kind,
$S_\lambda = S_{\lambda-1} + u_{\lambda-1},$
and, for any $\lambda$ of the second kind, $S_\lambda$ will denote the limit, if it exists, towards which any simply
infinite sequence $S_{\lambda_1}, S_{\lambda_2}, \ldots$ converges, relative to increasing numbers $\lambda_1, \lambda_2, \ldots$ which define
$\lambda$ as the smallest number greater than them.
When these definitions apply to all values of $\lambda$ such that $\lambda \le \alpha$, the transfinite series is
said to be convergent, and its sum $S$ is the number $S_\alpha$, which has just been defined. The
convergence of a series of order type $\alpha$ thus requires the existence of a definite limit for each
number of the second kind at most equal to $\alpha$.
For the series
$\sum_{\lambda=1}^{\lambda<\alpha} [F(x_\lambda) - F(x_{\lambda-1})],$
relative to a continuous function $F(x)$, in which we have assumed $x_0 = a$, $x_\alpha = b$, and
where we replace $F(x_\lambda) - F(x_{\lambda-1})$ with zero when $\lambda$ is of the second kind, it is clear that:
1. We have
$S_1 = F(x_1) - F(a);$
2. If we have
28 This explanation could have been omitted in the main body of this work and postponed until now,
which indicates how natural and, in a way, necessary the following definitions are.
29 The presence of a term $u_0$ restores the agreement between the definition of order types for finite
and transfinite $\alpha$. See footnote 7 of this chapter.
3. If we have
$S_\lambda = F(x_\lambda) - F(a)$
for all the numbers $\lambda$ less than a number $\mu$ of the second kind, and if we take an increasing
sequence of numbers $\lambda_i$ defining $\mu$ as the smallest number greater than all of them, then, since
the point $x_{\lambda_i}$ increases towards $x_\mu$, we have, by the continuity of $F$,
$S_\mu = \lim S_{\lambda_i} = \lim [F(x_{\lambda_i}) - F(a)] = F(x_\mu) - F(a).$
Hence, by transfinite induction, we see that the series is convergent and has the sum
$S = F(b) - F(a).$
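A numerical sketch may help to follow this result on the simplest transfinite chain, of order type $\omega + 1$. Everything specific below is an assumption made for illustration: the interval $(0, 2)$, the points $x_n = 1 - 2^{-n}$ increasing towards $x_\omega = 1$, the final point $x_{\omega+1} = 2$, the function `F`, and the use of 60 terms as a stand-in for the limit at the index $\omega$.

```python
import math

def F(x):
    """An arbitrary continuous function; any other would serve as well."""
    return math.sin(x) + x * x

a, b = 0.0, 2.0

def x_finite(n):
    """Points of finite index: x_0 = a = 0, increasing towards x_omega = 1."""
    return 1.0 - 2.0 ** (-n)

def partial_sum(n):
    """Sum of the increments F(x_k) - F(x_{k-1}) for k = 1, ..., n."""
    return sum(F(x_finite(k)) - F(x_finite(k - 1)) for k in range(1, n + 1))

# Index of the second kind: S_omega is the limit of the finite partial sums,
# approximated here by a long partial sum.
s_omega = partial_sum(60)

# Index of the first kind: add the increment over the last interval (1, 2).
total = s_omega + (F(b) - F(1.0))
```

Up to rounding, `s_omega` agrees with `F(1.0) - F(a)` and `total` with `F(b) - F(a)`, as the statement above predicts; the continuity of `F` at the point 1 is what makes the limit step work.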
Here are two propositions concerning transfinite series, some particular cases of which are
frequently used in classical analysis, for example, in transforming an ordinary series into a
double-indexed series.
In a convergent transfinite series, we can group consecutive terms in any manner and
consider the resulting partial sums as performed. In other words, if the increasing indices $\lambda_\mu$ are such that
the $\mu$’s constitute a finite, simply infinite, or transfinite sequence of order type $\beta$, and if the first
series is convergent, then
\[ \sum_{\lambda=0}^{\lambda<\alpha} u_\lambda = \sum_{\mu=0}^{\mu<\beta} \left( \sum_{\lambda=\lambda_\mu}^{\lambda<\lambda_{\mu+1}} u_\lambda \right). \]
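A minimal worked instance in the ordinary case $\alpha = \beta = \omega$ may help. The choice $\lambda_\mu = 2\mu$ and the alternating harmonic series are assumptions made here purely for illustration.

```latex
% Grouping consecutive terms in pairs (\lambda_\mu = 2\mu):
\[
  \sum_{\lambda=0}^{\infty} \frac{(-1)^\lambda}{\lambda+1}
  = \sum_{\mu=0}^{\infty} \left( \frac{1}{2\mu+1} - \frac{1}{2\mu+2} \right)
  = \sum_{\mu=0}^{\infty} \frac{1}{(2\mu+1)(2\mu+2)}
  = \ln 2 .
\]
```

The hypothesis that the first series converges matters: for $1 - 1 + 1 - 1 + \cdots$ the grouped series $(1 - 1) + (1 - 1) + \cdots$ converges to $0$ although the original series does not converge.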
\[ S_\mu = \sum_{\lambda=1}^{\lambda<\mu} u_\lambda \]
exists and is equal to the sum of the series $V_\mu$ obtained by removing from $V_\alpha$ the terms
which do not appear in $S_\mu$. This certainly holds for $\mu = 1$; if it did not hold for every value
of $\mu$, there would be a first value $\mu_0$ for which the proposition would not be true. Suppose first
that $\mu_0$ is of the first kind. Then we have
\[ S_{\mu_0-1} = V_{\mu_0-1}. \]
But
\[ S_{\mu_0} = S_{\mu_0-1} + u_{\mu_0-1}, \qquad V_{\mu_0} = V_{\mu_0-1} + u_{\mu_0-1}, \]
hence
\[ S_{\mu_0} = V_{\mu_0}, \]
which is in contradiction with the definition of $\mu_0$.
Therefore, let us assume that $\mu_0$ is of the second kind. Then, for $\mu < \mu_0$, we have
\[ S_\mu = V_\mu; \]
furthermore, $V_\mu$ is a partial series deduced from $V_{\mu_0}$ by removing some terms, and every
term of $V_{\mu_0}$ belongs to $V_\mu$ as soon as $\mu < \mu_0$ is sufficiently large. Therefore, if the simply
infinite sequence
\[ \mu_1 < \mu_2 < \cdots < \mu_0 \]
defines $\mu_0$ as the first number greater than all the $\mu_i$, then the $V_{\mu_i}$ tend towards $V_{\mu_0}$, and as a
result we have
\[ S_{\mu_0} = \lim S_{\mu_i} = \lim V_{\mu_i} = V_{\mu_0}; \]
of M. Borel, which can be restated as: We have an infinite$^{30}$ number of intervals $(a_i, b_i)$
such that every point of $a \le x \le b$ is strictly interior to at least one of them. Let us choose
an interval $(a_{i_1}, b_{i_1})$ containing $a$ in its interior, an interval $(a_{i_2}, b_{i_2})$ containing $b_{i_1}$ in its
interior, an interval $(a_{i_3}, b_{i_3})$ containing $b_{i_2}$, etc. If we do not reach $b$ in this way, there is a
point, let us call it $b_{i_\omega}$, which is the limit of the points $b_{i_n}$. Let us choose an interval $(a_{i_{\omega+1}}, b_{i_{\omega+1}})$
containing $b_{i_\omega}$, etc. It is clear that, by using the $b_\lambda$, a chain of intervals covering $(a, b)$ is defined.
Moreover, it is clear that $(a, b_{i_n})$ can be covered with a finite number of intervals $(a_i, b_i)$;
and the same holds for $(a, b_{i_{\omega+1}})$, since this interval can be covered with $(a_{i_{\omega+1}}, b_{i_{\omega+1}})$ and
a finite number of the $(a_{i_n}, b_{i_n})$. By continuing in this transfinite manner, we
see that $(a, b)$ can be covered using a finite number of intervals $(a_i, b_i)$.
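To make the construction concrete, here is a small illustrative family on $(a, b) = (0, 1)$ in which the chain genuinely needs a step of index $\omega + 1$, and from which a finite cover is then read off. The intervals $A_n$ and $B$ are hypothetical, chosen only for this sketch.

```latex
% Hypothetical family of intervals covering 0 <= x <= 1:
\[
  A_n = \left( -1,\ \tfrac12 - \tfrac{1}{n+2} \right) \quad (n = 1, 2, \dots),
  \qquad
  B = \left( \tfrac14,\ 2 \right).
\]
% Chain: A_1 contains a = 0, and A_{n+1} contains the right end of A_n.
\[
  b_{i_n} = \tfrac12 - \tfrac{1}{n+2} \ \longrightarrow\ b_{i_\omega} = \tfrac12 ,
  \qquad
  (a_{i_{\omega+1}}, b_{i_{\omega+1}}) = B , \quad \tfrac14 < \tfrac12 < 1 < 2 .
\]
% Finite cover: B together with one A_n whose right end exceeds 1/4.
\[
  [0, 1] \subset A_3 \cup B,
  \qquad
  A_3 = \left( -1,\ \tfrac{3}{10} \right), \quad \tfrac14 < \tfrac{3}{10} .
\]
```

The interval chosen at step $\omega + 1$ reaches back to the left of the limit point $\tfrac12$, so all but finitely many of the $A_n$ become superfluous; this is the mechanism by which the transfinite chain collapses to a finite cover.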
Transforming this reasoning yields the conclusion that the interval $(a, b)$ can
be covered using a finite number of intervals $(a_i, b_i)$; otherwise, there would exist a value
of $x$ such that $(a, x + h)$ cannot be covered in this way, although $(a, x - h)$ could be,
however small $h > 0$ may be. However, this is impossible: there exists an interval $(a_{i_0}, b_{i_0})$
containing $x$, and consequently $(a, b_{i_0})$ can be covered by this interval along with those
which allow for the covering of $(a, a_{i_0})$; hence no such value $x$ can exist. This is
the reasoning from Sect. 8.2.
Examining this transformation of the reasoning highlights the advantages and disadvantages
of each form: the second form is quicker and appears familiar. It allows us to better
visualise the possible generalisations and the restrictive hypotheses that can be eliminated.
However, it provides only an existence theorem, while the first form offers an operational
procedure for choosing a finite number of $(a_i, b_i)$ fit to cover the entire $(a, b)$.
M. Borel, right from the beginning, emphasised that his reasoning provided a regular
procedure to effectively determine the intervals $(a_i, b_i)$. Certainly, we could quibble
over the word effectively and point out that one cannot effectively carry out an operation
which comprises infinitely many stages.$^{31}$ However, following this line of thinking, we
would not effectively determine a function when only a series expansion for it had been found.
We can, certainly, refuse to use the term ‘effectively’ in this case; but is it not already a
significant achievement to have solved a problem as well as one solves the problem of determining a function
by providing the law governing the successive terms of its series expansion?
The chains of intervals are introduced in the search for primitive functions in the following
way: Let $f'(x)$ be the derivative of a continuous function $f(x)$; there exist intervals
$(a_0, a_1), (a_1, a_2), \dots$ in each of which we have an inequality of the form
These intervals, and these inequalities, are the ones which we have considered in the proof
of existence of solutions of the equation $y' = f'(x)$. However, we generally put ourselves
in conditions where we can cover the entire considered interval using a finite number of
these intervals $(a_i, a_{i+1})$. If we assume nothing about $f'(x)$, then the intervals $(a_i, a_{i+1})$
are infinite in number and they form a chain of intervals which can always be used for the
approximation of $f(x)$ through the exact formula
\[ f(x) - f(a) = f(x) - f(a_\alpha) + \sum_{\lambda=0}^{\lambda<\alpha} [f(a_{\lambda+1}) - f(a_\lambda)]. \]
This result is obtained in various chapters of this book using arguments of transfinite induction.
However, it is clear, according to what has been discussed earlier, that we can deduce
it from a reductio ad absurdum argument whenever it is shown that no point .x could be the
last for which the previous inequality holds.
The reader could check the modifications that must be made to our exposition
to rid it of any use of transfinite numbers and transfinite induction. We will perform
this transformation of reasoning only for the first of the propositions related to derivatives
obtained using the method of chains, namely:$^{32}$
A function is determined up to an additive constant when its derivative is known at every
point. Indeed, let $f(x) = f_1(x) - f_2(x)$ be the difference of two functions $f_1$ and $f_2$ which
have the same finite derivative; thus, $f(x)$ has a zero derivative at every point. Our method of
chains yields
\[ |f(x) - f(a_0)| \le \varepsilon (x - a_0), \]
32 We refer to Sect. 6.3; at this point, the conditions envisaged are a little more general than those
stated in the text. Here, we are placing ourselves in the simpler conditions.
$(x_0, x_0 + h)$ we have
which shows the contradiction in assuming that the considered inequality does not hold for
$x > x_0$.
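A sketch of how this step can be written out, under the assumptions that $x_0$ denotes the last point up to which $|f(x) - f(a_0)| \le \varepsilon (x - a_0)$ holds (so that, by continuity, it holds at $x_0$ itself) and that $h > 0$ is furnished by the vanishing of the derivative at $x_0$:

```latex
% Zero derivative at x_0: for some h > 0,
\[
  |f(x) - f(x_0)| \le \varepsilon (x - x_0) \qquad \text{for } x_0 < x < x_0 + h .
\]
% Combine with the inequality assumed to hold at x_0 (triangle inequality):
\[
  |f(x) - f(a_0)|
  \le |f(x) - f(x_0)| + |f(x_0) - f(a_0)|
  \le \varepsilon (x - x_0) + \varepsilon (x_0 - a_0)
  = \varepsilon (x - a_0) .
\]
```

Under these assumptions the inequality persists beyond $x_0$, so no last point exists; since $\varepsilon$ is arbitrary, $f(x) = f(a_0)$ throughout, that is, $f_1$ and $f_2$ differ by a constant.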
33 (Jour. de Math., 1915, p. 176). M. Denjoy, however, does not obtain this proof in the same way
as we do here, by transforming a reasoning previously given in the first edition of this book. On the
contrary, he contrasts his method with those presented in the first edition. However, he only refers to
proofs using the theorem of finite increments. He obviously missed that the method of chains, since
it allowed for the search for primitive functions of a given derivative, thereby provided the proof that
these functions were determined up to an additive constant, and that I had used this method of proof
on page 79 of the first edition. However, I had not bothered to show, as I have done in this new edition
(Sect. 6.3), that the method of chains yielded previously established statements; I had primarily used
it to establish new results.
I take this opportunity to mention that I disagree with M. Denjoy and some other authors who
consider reasonings about derivatives as entirely different according to whether they use the notion of
integral or instead use the Mean Value theorem:
any theorem based on integration can be expressed using only the Mean Value theorem. Moreover,
aiming to give more uniformity to this work, I wanted, in Chap. 10, to deduce some theorems obtained
by M. Denjoy using totalisation without resorting to totalisation, just as I had deduced some
theorems of integration in Chap. 9. This was quite easy for me and I did not digress significantly from
the proofs of M. Denjoy, as there is very little contradiction between the ideas used in the reasonings
which involve integration and those which do not.
That, of course, does not mean that I ignore the interest of various forms of proofs, the grand
elegance of some of them, and the progress that comes from using the simplest methods. On the
contrary, it is the comparison of these forms of demonstrations which has led me to the reflections
that I express in the text.
34 See footnote 14 of Chap. 8.
One can argue that VII is the existence theorem of Cantor–Bendixson, while VI provides
a transfinite operational method for solving the Cantor–Bendixson problem: given a closed
set $F$, decompose it into a countable set $D$ and a perfect set $P$.
Similarly, in M. Baire’s research, a distinction must be made between M. Baire’s theorem:
any function of class one is point-wise discontinuous on every perfect set, and
conversely; and M. Baire’s problem: given a function that is point-wise discontinuous on every
perfect set, find a series of continuous functions whose sum it is. Nothing stops us from
hoping that we can prove M. Baire’s theorem without using transfinite methods. However,
the operational method given by M. Baire for solving his problem is inherently transfinite
and cannot be justified without transfinite methods.
In M. Denjoy’s research, the statements refer to the operation of totalisation, the very
definition of which is transfinite; we cannot hope to avoid transfinite methods here.
However, nothing stops us from hoping that we can eventually manage to replace the
operational method of Cantor–Bendixson (statement VI), the operational method of M.
Baire, and the totalisation of M. Denjoy with non-transfinite methods, while still being able
to solve the Cantor–Bendixson problem, the problem of M. Baire, and the problem of primitive
functions.
Indeed, it has been possible to demonstrate the Cantor–Bendixson theorem and Baire’s
theorem without transfinite methods, and the Cantor–Bendixson problem can be solved
without transfinite numbers. Let us see how.
In the three fields of research currently under consideration, transfinite numbers were used
to denote the successive elements of sequences. These elements are closed sets, all
different from each other, each of them containing those that come after it in the sequence.
We know (Sect. A.4) that such a sequence is either finite or countable. Now, in the questions
under study, the sequence can stop only when we reach a set that contains no isolated points,
that is, a perfect set, and the proof of the theorem boils down to showing that, under some conditions, the
sequence does not terminate at a perfect set. Thus everything comes down to characterising
the perfect set which could arise through a property of its points.
Here, I will focus only on the Cantor–Bendixson theorem;$^{35}$ in this case, it is obvious
that the points of the perfect set $P$ are characterised as those having an
uncountably infinite number of points of the set in every neighbourhood.
35 I have given several proofs of M. Baire’s theorem without using the transfinite (Bull. de la Soc.
math. de France, 1904; Note II of the Lessons of M. Borel, Sur les fonctions de variable réelle;
Journ. de Math., 1905). These proofs mainly relied, in accordance with what is mentioned in the text,
on the definition of a perfect set.
M. E. Lindelöf (Acta mathematica, t. 29) and myself (in the first edition of this book) almost
simultaneously showed that Cantor–Bendixson’s theorem could be derived from the notion that M.
Lindelöf called a condensation point. I believe that it is more commonly referred to as accumulation
point now, and I adopt this terminology.
In my presentation, as I arrived at theorems VI and VII simultaneously, it became unclear that
the demonstration of theorem VII was independent of transfinite numbers. Here, I am adopting the
exposition of M. Lindelöf, which is preferable to mine in many respects, meaning that I focus only
on theorem VII.
With this insight, let us define an accumulation point of a set $E$ as any point $x$ such that
in every interval containing $x$ in its interior, there exist uncountably many points of $E$.$^{36}$
The Bolzano–Weierstrass theorem suggests the following statement: Every set $E$ that
is uncountable in $(a, b)$ admits accumulation points. Indeed, let $x$ be the smallest value in
$(a, b)$ such that there are at most countably many points of $E$ in $(a, x - h)$
and uncountably many points of $E$ in $(a, x + h)$ for any $h > 0$. Such a point
indeed exists. Now, in $(x - h, x + h)$ there are uncountably many points of
$E$; therefore, $x$ is an accumulation point of $E$.
From this proof, it also follows that in $(a, x)$ there are at most countably many
points of $E$, since there are at most countably many points in
$\left(a, x - \frac{1}{n}\right)$ for any $n$, and $(a, x)$ is the union of these intervals. Therefore, $x$ cannot be at $b$. There is at least one accumulation point
in the interior of $(a, b)$.
According to this, an accumulation point cannot be isolated; for if $x$ were the only accumulation
point of $E$ in $(a, b)$, there would be at most countably many points
of $E$ in $(a, x)$ and in $(x, b)$, and hence in $(a, b)$; and $x$ would not be an accumulation point.
Now, the set of accumulation points is obviously closed; therefore the set of accumulation
points of an uncountable set is perfect.
Let us apply this to a closed set $F$. It contains its derived set $F'$ and, consequently, the
perfect set $P$ of its accumulation points. Therefore, $F$ is the sum of $P$ and the set of those
points which are contained in the various intervals $(l, m)$ contiguous to $P$. However, in
each $(l, m)$ there are at most countably many points of $F$, and the
$(l, m)$ intervals are finite or countable in number. Hence, the set $F - P$ is countable. Writing $D$ for this set, we obtain
\[ F = P + D. \]
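For a concrete picture of the decomposition, consider the following closed set. It is an example chosen here for illustration and does not appear in the text.

```latex
% A hypothetical closed set on the line:
\[
  F = \{0\} \cup \left\{ \tfrac1n : n = 1, 2, \dots \right\} \cup [2, 3] .
\]
% Accumulation points in the sense above
% (uncountably many points of F in every neighbourhood):
\[
  P = [2, 3], \qquad
  D = F - P = \{0\} \cup \left\{ \tfrac1n : n \ge 1 \right\}, \qquad
  F = P + D .
\]
```

The point $0$ is a limit point of $F$, so it belongs to the derived set $F'$, yet it is not an accumulation point in the present sense, because each of its neighbourhoods contains only countably many points of $F$. Here $P$ is therefore strictly smaller than $F'$, and $D$ is countable, as the argument requires.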
If $E$ had not been assumed to be closed, we would similarly observe that the set $D$ of points
of $E$ that are not accumulation points of $E$ is countable. As for the points of $E - D$, these
are those accumulation points of $E$ that belong to $E$. They form a set which is not only
everywhere dense over the set $P$ of accumulation points, but also everywhere accumulated
on $P$. By this, we mean that, in every interval $(\alpha, \beta)$ containing points of $P$ in its interior,
there are uncountably many points that are common to $E$ and $P$. Otherwise,
$E$ would indeed be countable in $(\alpha, \beta)$, which is absurd.
Therefore, this new method elegantly proves the Cantor–Bendixson theorem and even
generalises it. Moreover, it solves the Cantor–Bendixson problem if we adopt a convention
When I provided these methods, I indicated the interest of proofs derived from the transfinite
because they offer regular operational procedures, not only for proving the theorems, but also for
solving the problems (Journ. de Math., 1905, p. 183; C. R. Acad. Sc., 1903).
The terms theorem and problem, which so aptly concretise the necessary distinctions, are due to
M. de la Vallée Poussin.
36 I am, therefore, considering the case of sets of points on a straight line, as we have always done so
far. However, the reasoning extends to $n$-dimensional space.
that determining the accumulation points of a set is one of the operations we always consider
to be feasible in practice. And this is because we do, in fact, frequently know how to perform it.