Pedometrics in Brazil - Waldir de Carvalho Junior, Helena Saraiva, Koenow Pinheiro, - 2024 - Spring

The document discusses the progress and developments in pedometrics in Brazil, highlighting the growth of studies and the establishment of key initiatives since the 2000s. It outlines the II Pedometrics Brazil event, which gathered experts to share innovative research and methodologies in soil science, particularly focusing on digital soil mapping and soil sensing technologies. The book aims to enhance understanding of soil functioning and promote sustainable soil management practices for future generations.

Progress in Soil Science

Waldir de Carvalho Junior
Helena Saraiva Koenow Pinheiro
Marcos Bacis Ceddia
Gustavo Souza Valladares
Editors

Pedometrics in Brazil

Progress in Soil Science

Series Editors
Alfred E. Hartemink, Soil Science, University of Wisconsin, Madison, WI, USA
Alex B. McBratney, Sydney Institute of Agriculture, School of Life and
Environmental Sciences, The University of Sydney, Sydney, NSW, Australia
Progress in Soil Science series publishes books that contain novel approaches in soil
science in its broadest sense – books in the series should focus on true progress in
a particular area of the soil science discipline. The scope of the series is to publish
books that enhance the understanding of the functioning and diversity of soils in all
parts of the globe. The series includes multidisciplinary approaches to soil studies
and welcomes contributions of all soil science subdisciplines. Key themes: soil
science - soil genesis, geography and classification - soil chemistry, soil physics,
soil biology, soil mineralogy - soil fertility and plant nutrition - soil and water
conservation - pedometrics - digital soil mapping - proximal soil sensing - soils
and land use change - global soil change - natural resources and the environment.
Submit a proposal: Proposals for the series will be considered by the Series Editors.
An initial author/editor questionnaire and instructions for authors can be obtained
from the Publisher, Dr. Robert K. Doe ([email protected]).
Waldir de Carvalho Junior · Helena Saraiva Koenow Pinheiro ·
Marcos Bacis Ceddia · Gustavo Souza Valladares
Editors

Pedometrics in Brazil
Editors

Waldir de Carvalho Junior
Embrapa Solos
Brazilian Agricultural Research Corporation
Rio de Janeiro, Brazil

Helena Saraiva Koenow Pinheiro
Federal Rural University of Rio de Janeiro
Seropédica, Brazil

Marcos Bacis Ceddia
AgroTecnologies and Sustainability
Federal Rural University of Rio de Janeiro
Seropédica, Brazil

Gustavo Souza Valladares
Federal University of Piauí
Teresina, Brazil

ISSN 2352-4774 ISSN 2352-4782 (electronic)
Progress in Soil Science
ISBN 978-3-031-64578-5 ISBN 978-3-031-64579-2 (eBook)
https://doi.org/10.1007/978-3-031-64579-2

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2024

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

If disposing of this product, please recycle the paper.


Preface

Building and transferring knowledge require organization and appropriate
skills. Pedometrics is no different, and organizing knowledge is one of
the steps toward this purpose.
In Brazil, studies based on pedometrics concepts and tools have shown remarkable
growth since the 2000s. The Second Global Workshop on Digital Soil Mapping
(DSM), held in 2006, started this trajectory in the country. Then, in 2009, the
Brazilian Society of Soil Science created the Pedometric Commission and, in the
same period, the Brazilian Network for Digital Soil Mapping (DSM Network) was
also created. As a result of the resumption of systematic pedological surveys,
the National Soil Survey Program (PRONASOLOS) was created, which has among its
premises the use of DSM to create and propagate knowledge about Brazilian
soils.
More recently, to support the dissemination of scientific and technological
production on methodological procedures for pedometrics and digital soil
mapping in the tropics, the group of pedometricians from the Brazilian Society
of Soil Science proposed holding an event that would provide space to discuss
ideas and procedures, building knowledge and making it available.
This is how the II Pedometrics Brazil was conceived and held in November
2021. The event was attended by 307 participants of different nationalities,
including undergraduate and graduate students, professors, researchers and other
professionals linked to soil science.
The event brought together speakers and innovative research in four thematic
sessions, complementary to each other, namely: PEDOMETRICS: INNOVATION
IN TROPICS; LEGACY DATA: HOW TURN IT USEFUL?; ADVANCES IN
SOIL SENSING and PEDOMETRICS GUIDELINES TO SYSTEMATIC SOIL
SURVEY: TROPICAL STUDY CASES.
The thematic sessions gathered 51 unpublished abstracts, the best evaluated of
which became chapters of this book. The chapters are ordered to give a flow
of ideas and knowledge, leading the reader to better understand the techniques
and procedures used by pedometricians in Brazil.


The opening chapters present innovations in the use of machine learning
to model soil physical and chemical attributes, as well as soil-landscape
relationships and the use of innovative sensors. Soil-landscape relationships
based on geomorphometric concepts with a multiscale approach are also discussed
in the first chapter. Soil physical attributes such as texture and water content
are discussed in subsequent chapters. Issues related to spatial dependence are
also addressed, as well as the use of electrical conductivity sensors and
computerized 3D images. A digital platform for the soil science community, which
allows not only the search, upload and download of soil data but also the
creation and management of projects, is presented.
Next come chapters addressing the use of legacy data in pedometrics. They
discuss how to use, standardize and perform exploratory analysis of these
datasets, as well as the procedures to model key soil properties such as soil
total carbon.
The innovations in soil sensing, both proximal and remote, are presented
in the chapters “Predicting and Mapping of Soil Carbon and Nitrogen Stocks by
Diffuse Reflectance Spectroscopy and Magnetic Susceptibility in Western Plateau
of São Paulo”, “Iron Rods as Markers for Soil Horizon Depths and Point
Scatterers for Estimating Pulse Velocity in GPR Imagery”, “Random Forest-Based
Fusion of Proximal and Orbital Remote Sensor Data for Soil Salinity Mapping in
a Brazilian Semi-arid Region”, “The Particle Size Causes a Change in the
Determination of Soil Color Via the Nix Pro 2 Sensor”, “Mapping Soil Salinity:
A Case Study from Marajó Island, Brazilian Amazonia” and “Applied Morphometry
to Digital Soil Mapping in Detailed Scale”. These chapters cover proximal
sensors, sensors in the MID-IR range and their spectra, radar sensors, sensor
fusion (proximal and remote) for soil mapping and the measurement of soil color
with proximal sensors. They address technical and procedural innovations in the
use of soil sensing through case studies.
The last chapters bring examples and methodological procedures for DSM,
including the provision of databases and R scripts.
We hope that readers enjoy all the experiences contained in this book and that
they can use the presented examples wisely to better understand the world we
live in, with a view to the monitoring and sustainable use of our soils, for
food security and the preservation of natural resources, leaving future
generations a healthy and balanced world.

Rio de Janeiro, Brazil Waldir de Carvalho Junior
Seropédica, Brazil Helena Saraiva Koenow Pinheiro
Seropédica, Brazil Marcos Bacis Ceddia
Teresina, Brazil Gustavo Souza Valladares

Contents

MultiSoils: A Digital Platform for Information Search
and Project Management in Soil Science . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Marcos Bacis Ceddia, Erika Flávia Machado Pinheiro,
João Pedro Larangeira, Jorge de Abreu Soares, Renato Campos Mauro,
Diego Nunes Brandão, Lúcia Helena Cunha dos Anjos,
Anderson Nascimento Manhães, André Luiz Coutinho Merlo,
Edson Landim de Almeida, Frederico Santos Machado,
and Jorge Eduardo Santos Paes
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1 The Initiative, its Assumptions, and the architecture
of the Platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 The rationale of MultiSoils platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.3 The References and Manuals Adopted by the Platform . . . . . . . . . . . . . 5
3 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.1 The Utilities of the Platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Multiscalar Geomorphometric Generalization to Delineate Soil
Textural Patterns on Amazon Watersheds Landscapes . . . . . . . . . . . . . . . . . . . . . . 15
Cauan Ferreira Araújo, Raimundo Cosme de Oliveira Jr,
and Troy Patrick Beldini
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2 Material and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1 Study Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Environmental Covariates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.3 Multiscale Geomorphometric Generalization . . . . . . . . . . . . . . . . . . . . . . . . 19
2.4 Soil Sampling and Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.5 Modeling by Random Forest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Applying Machine Learning Techniques to Model and Map Soil
Surface Texture Using Limited Legacy Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Luís Flávio Pereira, Cássio Marques Moquedace,
Gabriel Phelipe Nascimento Rosolem, Maria da Conceição de Sousa,
Márcio Rocha Francelino, and Elpídio Inácio Fernandes-Filho
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2 Material and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.1 Study Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.2 Soil Texture Samples and Covariates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.3 Selecting Predictors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.4 Modeling and Mapping. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Predicting Soil Physical-Hydric Attributes Based on Pedotransfer
Functions and Algorithms for Quantitative Pedology . . . . . . . . . . . . . . . . . . . . . . . 47
Priscilla Azevedo dos Santos, Helena Saraiva Koenow Pinheiro, Waldir de
Carvalho Junior, Nilson Rendeiro Pereira, Silvio Barge Bhering, and Igor
Leite da Silva
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2 Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.1 Study Site and Hydrological Featuring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.2 Soil Database and Infiltration Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.3 Modeling Based on Pedotransfer Functions for Soil
Physical-Hydraulic Attributes Determination . . . . . . . . . . . . . . . . . . . . . . . . 50
2.4 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3 Results and Discussions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4 Conclusions and Suggestions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Spatial Dependence of Organic Carbon and Granulometry
in Archaeological Soils of Lagoa Grande das Queimadas,
Northeastern Brazil . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Rabech Grasiely Gomes Marques, Miguel Alvores Lima Neto,
Gustavo Souza Valladares, Demétrio da Silva Mützenberg,
and Aline Gonçalves de Freitas
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
2 Geological, Environmental and Archaeological Background . . . . . . . . . . . . . . . 64
3 Material and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Application of Electrical Conductivity Profiling for
the Characterization and Textural Discretization of a Technosol . . . . . . . . . . 75
Alexandre Muselli Barbosa and Camila Camolesi Guimarães
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
2 Study Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.1 Soil Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.2 Soil Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.3 Electrical Conductivity Profiling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.1 Soil Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.2 Electrical Conductivity Profiling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Soil Porosity Differences Among Grass-Covered and Exposed
Soils Measured by High Resolution X-Ray Computed
Microtomography (microCT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Marcelo Wermelinger Lemes, Alessandra Silveira Machado,
Gustavo Mattos Vasques, Hugo Machado Rodrigues,
Ricardo Tadeu Lopes, and Reiner Olíbano Rosas
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
2 Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
3 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
Using Legacy Soil Data to Plan New Data Collection: Study Case
of Rio de Janeiro State: Brazil . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Elias Mendes Costa, Hugo Machado Rodrigues, Ana Carolina de Ferreira,
Marcos Bacis Ceddia, and Douglath Alves Corrêa Fernandes
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
2 Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
2.1 Study Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
2.2 Soil Survey in the State of Rio de Janeiro . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
2.3 Environmental Covariates and Dissimilarity Index . . . . . . . . . . . . . . . . . . 103
3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
Exploratory Analysis from Harmonized Legacy Soil Data
to Support Digital Soil Mapping in Brazilian Midwest . . . . . . . . . . . . . . . . . . . . . . 115
Waldir de Carvalho Junior, Nilson Rendeiro Pereira,
Silvio Barge Bhering, Braz Calderano Filho, Cesar da Silva Chagas,
Helena Saraiva Koenow Pinheiro, José Ronaldo Pereira,
Carlos Henrique Lemos Lopes, and Renan Borges Leal
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
3 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
Soil Organic Carbon Stock Estimation Using Legacy Data: A
Case Study of North Fluminense Region—BR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
Marcos Bacis Ceddia, Hugo Machado Rodrigues,
Ana Carolina de Souza Ferreira, Elias Mendes Costa,
Érika Flávia Machado Pinheiro, and Douglath Alves Corrêa Fernandes
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
2 Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
2.1 Study Site . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
2.2 The PROJIR Soil Database. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
2.3 Calculation of Soil Organic Carbon Stock—SOCS . . . . . . . . . . . . . . . . . . 132
2.4 Soil Particle Size Fractions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
2.5 Covariates Tested to Predict SOCS Using ML Techniques . . . . . . . . . . 134
2.6 Digital Soil Mapping Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
2.7 Geostatistical Approach—Ordinary Kriging (OK) . . . . . . . . . . . . . . . . . . 139
2.8 The Semivariogram and Its Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
2.9 The Ordinary Kriging (OK) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
2.10 Random Forest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
2.11 Model 1—All covariates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
2.12 Model 2—Recursive Feature Elimination—RFE . . . . . . . . . . . . . . . . . . . . 142
2.13 Model 3—Expert Knowledge (EK) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
2.14 Evaluation of the Accuracy of the Maps. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
2.15 Evaluation of the Relative Improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
3 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
3.1 Descriptive Statistics of Organic Carbon Stock Data . . . . . . . . . . . . . . . . 144
3.2 Geostatistical Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
3.3 Random Forest Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
Aerogeophysical Data to Modeling Soil Properties: A Study Case
in Bom Jardim—RJ. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
Blenda Pereira Bastos, Helena Saraiva Koenow Pinheiro,
Waldir de Carvalho Junior, and Lúcia Helena Cunha dos Anjos
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
2 Site Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
3 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
4 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
Predicting and Mapping of Soil Carbon and Nitrogen Stocks
by Diffuse Reflectance Spectroscopy and Magnetic Susceptibility
in Western Plateau of São Paulo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
Angélica Santos Rabelo de Souza Bahia and José Marques Júnior
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
3 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
Iron Rods as Markers for Soil Horizon Depths and Point
Scatterers for Estimating Pulse Velocity in GPR Imagery . . . . . . . . . . . . . . . . . . 185
Carlos Wagner Rodrigues do Nascimento, Marcos Bacis Ceddia,
Gustavo Mattos Vasques, Hugo Machado Rodrigues,
Ronaldo Pereira de Oliveira, and Saulo Siqueira Martins
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
2 Material and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
3 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
Random Forest-Based Fusion of Proximal and Orbital Remote
Sensor Data for Soil Salinity Mapping in a Brazilian Semi-arid
Region . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
Silvio R. L. Tavares, Gustavo M. Vasques, Ronaldo P. Oliveira,
Marlon M. Dantas, and Hugo M. Rodrigues
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
2 Soil Salinization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
3 Monitoring Soil Salinity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
4 Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
5 Study Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
6 Proximal and Remote Sensor Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
7 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208

The Particle Size Causes a Change in the Determination of Soil
Color Via the Nix Pro 2 Sensor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
Viviane Flaviana Condé, Thays Vieira Bueno, Jéssica Ribeiro Oliveira,
Anifo Soares Mamudo Ibraimo, Marcio Rocha Francelino,
and Elpídio Inácio Fernandes-Filho
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
3 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
Mapping Soil Salinity: A Case Study from Marajó Island,
Brazilian Amazonia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
Renata Jordan Henriques, Fábio Soares de Oliveira,
Carlos Ernesto Gonçalves Reynaud Schaefer, Márcio Rocha Francelino,
Eduardo Osório Senra, Valéria Ramos Lourenço, David Lukas de Arruda,
and Paulo Roberto Canto Lopes
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
3 Results and discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
Applied Morphometry to Digital Soil Mapping in Detailed Scale . . . . . . . . . . 235
Gustavo Souza Valladares, Waldir de Carvalho Junior,
and Helena Saraiva Koenow Pinheiro
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
2 Material and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
3 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
Prediction of Soil Carbon Stock in the PIAUI State Coast
by Remote Sensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
Mirya G. T. Portela, Gustavo S. Valladares, Marcos G. Pereira,
Léya J. R. S. Cabral, João V. A. Amorim, and Giovana M. de Espindola
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
2 Material and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
2.1 Characterization of the Study Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
2.2 Soil Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
2.3 Determination of Total Soil Carbon Stocks . . . . . . . . . . . . . . . . . . . . . . . . . . 248
2.4 Remote Sensing Covariate Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
2.5 Predictive Methods Evaluated. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258

Methods and Challenges in Digital Soil Mapping: Applied
Modelling with R Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
Elpídio Inácio Fernandes-Filho, Cássio Marques Moquedace,
Luís Flávio Pereira, Gustavo Vieira Veloso, and Waldir de Carvalho Junior
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
2 Soil Sampling for DSM Purposes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
2.1 Legacy Data Used in This Chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
3 Data Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
3.1 Data Partition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
3.2 Defining Covariates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
3.3 An Example of Data Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
4 Selection of Predictors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
4.1 Variance-Based Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
4.2 Redundancy-Based Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
4.3 Importance-Based Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
4.4 An Example of Predictors Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
5 Modelling, Prediction and Predictors Importance . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
5.1 An Example of Model Fitting and Prediction . . . . . . . . . . . . . . . . . . . . . . . . 275
6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
MultiSoils: A Digital Platform for
Information Search and Project
Management in Soil Science

Marcos Bacis Ceddia, Erika Flávia Machado Pinheiro,
João Pedro Larangeira, Jorge de Abreu Soares, Renato Campos Mauro,
Diego Nunes Brandão, Lúcia Helena Cunha dos Anjos,
Anderson Nascimento Manhães, André Luiz Coutinho Merlo,
Edson Landim de Almeida, Frederico Santos Machado,
and Jorge Eduardo Santos Paes

1 Introduction

The Brazilian territory has continental dimensions (8,510,345.540 km²) and a
great diversity of biomes, which makes soil survey and mapping a significant
challenge. Despite the importance of soil information for territorial planning
and management, data and maps are mostly generated at low-detail scales
(1:250,000 or smaller). More detailed data and maps (scales larger than
1:100,000) exist only in some regions, associated with large urban centers or
with specific, localized demands. In addition, the little that exists is not
easily accessible to society and is not gathered in a single database management
system (DBMS) covering the wide range of soil information (general and
morphological description of soil observations, physical and chemical
characteristics, heavy metals, hydrocarbons, radionuclides, and hydropedological
analyses, among others). Also noteworthy is the lack of a platform for planning
and managing soil science projects that integrates all project stages (planning,
soil collection and analysis, and report generation).

M. B. Ceddia (✉) · E. F. M. Pinheiro
Department of AgroTechnologies and Sustainability (DATS), Institute of Agronomy, Federal
Rural University of Rio de Janeiro (UFRRJ), Seropédica, Brazil
J. P. Larangeira
Graduate Program in Agronomy-Soil Science (PPGA-CS – UFRRJ), Federal Rural University of
Rio de Janeiro (UFRRJ), Seropédica, Brazil
J. de Abreu Soares · R. C. Mauro · D. N. Brandão
Federal Center for Technological Education Celso Suckow da Fonseca (CEFET-RJ), R. Gen.
Canabarro, Rio de Janeiro, Brazil
L. H. C. dos Anjos
Soils Department, Agronomy Institute, Federal Rural University of Rio de Janeiro (UFRRJ),
Seropédica, RJ, Brazil
A. N. Manhães · A. L. C. Merlo · E. L. de Almeida
Graduate Program in Computer Science (PPCIC), CEFET-RJ, R. Gen. Canabarro, Rio de
Janeiro, RJ, Brazil
F. S. Machado · J. E. S. Paes
Petrobras Research Center (CENPES), Av. Horácio Macedo, 950 – Cidade Universitária, Rio de
Janeiro, RJ, Brazil

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
W. de Carvalho Junior et al. (eds.), Pedometrics in Brazil, Progress in Soil Science,
https://doi.org/10.1007/978-3-031-64579-2_1

Table 1 Types of databases developed by soil scientists and institutions in Brazil

Initiatives         Classification/Language               Available data
BDSolos (EMBRAPA)   DBMS (PostgreSQL)                     https://www.bdsolos.cnptia.embrapa.br/
BDiA (IBGE)         Repository                            https://bdiaweb.ibge.gov.br/#/consulta/pedologia
febr                Repository                            https://www.pedometria.org/soildata/
Hybras (CPRM)       DBMS (Microsoft Access 2007 format)   https://www.sgb.gov.br/en/Hydrology/HYBRAS

A data repository, often referred to as a data file or data library, is a generic term for
a segmented dataset used for reporting or analysis. A DBMS is a database management system, a
set of software for creating, editing, storing, and retrieving data in tables.

Table 1 presents the main initiatives by Brazilian soil scientists and institutions
to organize soil data. BDSolos is a relational DBMS that stands out for being
the oldest, gathering the greatest diversity of data, and allowing the consultation
of soil profile information (Simões, 2015). The IBGE’s Environmental Information
Database (BDiA) was launched in 2018 and brings together the thematic
cartographic databases produced by the IBGE over the last 20 years from the
fieldwork of its natural resources mapping project. It also incorporates
the database of the RADAMBRASIL project, which carried out an environmental
survey in the 1970s and 1980s. The BDiAWeb portal, a platform for viewing and
consulting the BDiA, presents these thematic databases of natural resources in
the national territory, adjusted to a scale of 1:250,000 and organized in four
thematic areas: Geology, Geomorphology, Pedology, and Vegetation (“BDIA – Banco
de Dados de Informações Ambientais,” n.d.). The Free Brazilian Repository for
Open Soil Data (febr) is a centralized repository of soil information that allows
both searching for data and downloading the results (Samuel-Rosa et al., 2018).
Hybras is a database focused on hydrophysical data of soils in Brazil (Ottoni et
al., 2018).
In this context, the MultiSoils platform (www.multisoils.org) is presented. The
term MultiSoils expresses the notion of a diversity of soils, their functions, and
dimensions. The purpose of this work is to present the MultiSoils Platform to the soil
science community, which allows not only the search, upload, and download of soil
data but also the creation and management of projects in the following areas: digital
agriculture, contaminant assessment and environmental risk, soil science education,
hydropedological studies, soil survey and mapping, soil moisture monitoring, and
radionuclide studies. The platform offers solutions for the following common issues
associated with soil data information:
1. Limited metadata descriptors, no public interfaces for inserting new data, and
poor interfaces for querying data;
2. Dispersed datasets organized in spreadsheets or in specific databases with few soil
attributes (data silos), which hampers a global view of soil information and of the
relationships among attributes and soil types, as well as with other areas
of knowledge;
3. Data provenance and curation: users must check not only whether
the data are duplicated but also whether they are valid for use;
4. Data that are not open: the available data fail to fulfill the eight Open
Data principles and the FAIR principles;
5. Lack of an efficient system for collecting new data in situ: there are no apps for
field data collection or for integrating it with new proximal soil sensors;
6. Lack of an efficient system for managing projects in the diverse areas of soil
science.


2 Methodology

2.1 The Initiative, Its Assumptions, and the Architecture of the Platform

The MultiSoils platform is an initiative of professors from the Federal Rural
University of Rio de Janeiro (Dept. of AgroTechnologies and Sustainability and Dept.
of Soils, Institute of Agronomy, UFRRJ) and from the Federal Center for
Technological Education Celso Suckow da Fonseca (CEFET-RJ), in partnership with the
Leopoldo Américo Miguez de Mello Research Center (CENPES-Petrobrás).
The fundamental assumption of the platform is to offer a public and collaborative
space for data consultation and project management in the field of soil science. The
MultiSoils platform is free for users with a collaborative purpose (that is, users
who provide data to be made available on the platform’s public panel). Once
registered on the platform, users can create and manage projects in the following
areas: (a) soil survey and mapping; (b) assessment of soil contaminants and
environmental risk; (c) digital agriculture; (d) hydropedological studies;
(e) radionuclide inventories; (f) real-time soil moisture monitoring; (g) education
in soil science; and (h) rural extension. The platform offers a robust tool to
search for soil data and to create and manage projects in real time (integrating
activities carried out with apps in the field, on the web, and in analysis
laboratories). To this end, the platform has so far been structured to gather in a
single environment a great diversity of soil data (general and morphological
description; chemical, physical, and hydraulic properties of the soil; heavy metals
and micronutrients; mineralogical analysis; saturated paste extract; spectral data;
hydrocarbons; radionuclides; and real-time soil moisture monitoring).

Fig. 1 An overview of the MultiSoils Platform architecture

The MultiSoils platform was developed in PHP and JavaScript using Laravel
Framework version 10.20.0. The database was implemented in PostgreSQL version
15.4. The system is hosted in the AWS Cloud platform. To help with activities
carried out in the field, a mobile version of the system has been developed for
Android and iOS-based operating systems. Figure 1 presents an overview of the
MultiSoils Platform architecture.

2.2 The Rationale of the MultiSoils Platform

The MultiSoils platform provides users with two interaction interfaces: the
web and an app. On the web, the user can: (a) register; (b) query data;
(c) create and manage projects; (d) plan field activities; (e) monitor field
activities and laboratory analyses; (f) generate reports; and (g) upload and
download data.
Through the app, the user can: (a) consult online the data of public projects on
the platform; (b) make the general and morphological description of soil
observations (offline work is supported); and (c) send the descriptions to the
project registered on the web.

2.3 The References and Manuals Adopted by the Platform

All criteria adopted on the platform (attributes of the general description,
morphology, and soil analyses) are adapted from manuals widely referenced in the
literature. The source documents are: (1) Soil Description and Collection Manual in
the Field (Santos et al., 2015); (2) Pedology Technical Manual (IBGE, 2007);
(3) Brazilian Soil Classification System (Santos, 2018); (4) Munsell Soil Color
Charts (Munsell Color (Firm), 2010); (5) Guidelines for Soil Profile Descriptions
(Guidelines for soil description, 2006); (6) Soil Survey Manual (“Soil Survey
Manual 2017,” n.d.); and (7) Soil Analysis Methods Handbook (Teixeira et al., 2017).

3 Results and Discussion

3.1 The Utilities of the Platform

Figure 2 shows the first screen of the web platform. Registration on the MultiSoils
platform is required. During registration, personal information is requested,
some of it mandatory (name; individual taxpayer registration number, CPF in
Portuguese; and e-mail) and some optional (function, education, ORCID, and gender).
This information is used by the MultiSoils platform to authenticate the user and
maintain the user’s activity history. It is not shared with any third-party
organization without the user’s express consent.
After registration, the user has access to the platform’s central panel
(DASHBOARD) (Fig. 3). At this stage, the user has an access level called Basic
User. A basic user is anyone who only searches for soil data and information.

Fig. 2 MultiSoils platform entry screens


Fig. 3 Searching soil information using the Public Project’s panel in the MultiSoils Web Portal

The platform provides registration and access interfaces that allow advanced
searches of soil data and information already available to the public. For the
Basic User, data considered to be in the public domain are freely available (Fig. 3).
For example, the basic user can download the field app (for consultation only) and
use the web platform to search for published information on soils. The search
options are: (1) soils and attributes by project registered on the platform;
(2) soils and attributes by country; (3) soils and attributes by State of the
Federation; (4) soils and attributes by municipality; and (5) soils within a search
radius (Fig. 3a). These functions are available on the web platform and in the field
app. For searches via the web, the user can see all soil observations of each
project (soil profiles and soil boreholes), generate reports, and export the
retrieved data as rtf, pdf, and Excel files.
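
The search by radius can be pictured as a great-circle distance filter over the coordinates of the soil observations. The Python sketch below is only an illustration under stated assumptions: the chapter does not describe how MultiSoils implements this query, and the `Observation` class, the function names, and any coordinates passed to them are hypothetical. Distances are computed with the haversine formula and a mean Earth radius of 6371 km.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius used by the haversine formula


@dataclass
class Observation:
    """Hypothetical stand-in for a soil observation (profile or borehole)."""
    code: str    # identification code of the observation
    lat: float   # latitude in decimal degrees
    lon: float   # longitude in decimal degrees


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def search_by_radius(observations, lat, lon, radius_km):
    """Return the observations within radius_km of (lat, lon), nearest first."""
    hits = [(haversine_km(lat, lon, o.lat, o.lon), o) for o in observations]
    hits.sort(key=lambda pair: pair[0])  # sort by distance only
    return [o for dist, o in hits if dist <= radius_km]
```

A production system would more likely delegate this filter to a spatial index in the database; the linear scan above is chosen only because it keeps the idea visible in a few lines.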
The MultiSoils platform also distinguishes two other user categories: advanced
and intermediate users. Advanced users are project coordinators. These users
(students, researchers, professors, and professionals in general, working for public
or private companies) use the platform to conduct projects and have access to
all of the platform’s functionalities. To become an advanced user, a user requests
a change of category and signs a specific disclaimer. The coordinator then gains
access to the “Projects” and “Control Room” tabs (Figs. 4 and 5).
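
The three access levels of the platform can be summarized as a permission table. The sketch below is a hypothetical illustration, not the platform's actual authorization code: the permission names are invented for the example, and the restrictions on the intermediate level follow the description of project team members given later in this section (they enter field and laboratory data but cannot delete project data or register new members).

```python
from enum import Enum


class Role(Enum):
    BASIC = "basic"                # anyone who only searches for data
    INTERMEDIATE = "intermediate"  # project team member
    ADVANCED = "advanced"          # project coordinator


# Hypothetical permission names; the grouping follows the text.
_SEARCH = {"search_public_data", "export_public_data"}
_DATA_ENTRY = {"enter_field_data", "enter_lab_data"}
_COORDINATION = {"create_project", "delete_project_data", "register_team_member"}

PERMISSIONS = {
    Role.BASIC: _SEARCH,
    Role.INTERMEDIATE: _SEARCH | _DATA_ENTRY,
    Role.ADVANCED: _SEARCH | _DATA_ENTRY | _COORDINATION,
}


def can(role, action):
    """Return True if the given role is allowed to perform the action."""
    return action in PERMISSIONS[role]
```

Building each level as a superset of the previous one mirrors the way the text presents the categories: every user can search, team members additionally enter data, and only coordinators manage the project itself.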
In the “Projects” panel, the project coordinator can create a research project (Fig.
4a). At this stage, the coordinator fills in information such as: (a) the characteristics
of the project (nature, title, objectives, scale, coverage area, start and end dates, and
whether it will be public or private); (b) participating institutions; (c) project
members; (d) project locations (state and municipalities covered); and (e) attribute
files. In the attribute files, the coordinator customizes the project by choosing
which analyses will be recorded in it, such as general and morphological
description, physical analyses, chemical analyses, micronutrients and heavy metals,
spectral data, sulfuric attack, saturated paste extract, hydraulic attributes (soil
water retention, parameters of models fitted to water retention curves, and hydraulic
conductivity), radionuclides, hydrocarbons, and soil moisture monitoring (Fig. 4b).
A coordinator can have more than one project and can include several team members
in each project. These team members are the intermediate users, who
can enter field and laboratory data (using the field app and the web platform).
Unlike advanced users, intermediate users are not allowed to delete project data or
register new project team members. After creating a project, the coordinator
has access to the “Control Room” panel, where all the projects he or she coordinates
can be viewed (Fig. 5a). In this panel, the coordinator and team can insert soil
observations. A project may be the digitization of a project carried out in the
past (legacy data) or a new project being started. In the second case, the
coordinator and team insert via the web only the identification code, the
coordinates, and part of the general description of each observation. These new
observations appear with the status “planned point” (green pins), as seen in Fig. 5b.
Planned points can be edited in the field (through the MultiSoils app) and/or on
the web. When editing of a planned point starts, the color of its pin changes to
orange. This functionality allows the coordinator to monitor the project’s progress
remotely and in real time (Fig. 5b). On the web, the coordinator and team can
generate reports for each observation (following the rules of the Brazilian Society
of Soil Science, in rtf and pdf format) and download data in rtf, pdf, and Excel
format (Fig. 5b).
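
The status tracking of planned points can be sketched as a small state machine. The example below is an illustration under assumptions, not the platform's code: the class and function names are hypothetical, the one-way order planned, in progress, finished is inferred from the statuses named in the caption of Fig. 5, and only the two pin colors stated in the text (green and orange) are encoded.

```python
from collections import Counter
from enum import Enum


class Status(Enum):
    PLANNED = "planned"
    IN_PROGRESS = "in progress"
    FINISHED = "finished"


# Pin colors stated in the text; the color of finished points is not given there.
PIN_COLOR = {Status.PLANNED: "green", Status.IN_PROGRESS: "orange"}

# Assumed one-way lifecycle: planned -> in progress -> finished.
NEXT_STATUS = {
    Status.PLANNED: Status.IN_PROGRESS,
    Status.IN_PROGRESS: Status.FINISHED,
}


class PlannedPoint:
    """Hypothetical planned observation: code, coordinates, and status."""

    def __init__(self, code, lat, lon):
        self.code = code
        self.lat = lat
        self.lon = lon
        self.status = Status.PLANNED  # new points start as planned (green pin)

    def advance(self):
        """Move the point to the next status in the assumed lifecycle."""
        if self.status not in NEXT_STATUS:
            raise ValueError(f"{self.code} is already finished")
        self.status = NEXT_STATUS[self.status]
        return self.status


def progress_summary(points):
    """Count points per status, as a coordinator's overview might do."""
    return Counter(p.status.value for p in points)
```

A summary such as `progress_summary(points)` is the kind of aggregate a coordinator would read off the map when following fieldwork remotely.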
Fig. 4 The Project panel to customize a research project using the MultiSoils Platform. (a)
Creating a project. (b) Selecting the attribute files

Figure 6 shows the screen containing the general description of a soil observation.
On this screen, the coordinator and his team can spatially visualize the observation
together with the chosen profile picture (editable – Fig. 6a). On this screen, all
fields of the general description of the soil are organized (according to Santos et
al., 2015; IBGE, 2007), offering bars for choosing predefined fields or editing. It is
also possible to insert photos and videos on this screen (Fig. 6b). It is essential to
highlight that the entire general description can be filled in in the field through the
MultiSoils app. In this case, all fields of a soil observation will already be filled in
and can be edited on the web.

Fig. 5 The “Control Room” panel for managing projects, editing data and tracking activities. (a)
Project visualization by a coordinator. (b) Viewing the status of soil observations (planned, in
progress, and finished)

Fig. 6 Viewing and editing the general description of soil observation. (a) Spatial visualization of
a soil observation. (b) Viewing all fields of the “general description” (including photos and videos)

Next (Fig. 7a), the user can view and edit the horizons and
layers of each observation (soil profile or borehole). By opening the icon of each
horizon/layer, the user can view and edit the morphological description and
all the analyses planned for that specific project. For projects with legacy data,
the morphological description can be edited on the web. For new projects, the
coordinator and team can enter data in the field using the MultiSoils app and via
the web. For the general description, the morphological description, and all the
soil analyses, the platform offers batch files for uploading data (Fig. 7b).
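
The nesting described in this section (a project holds observations, each observation holds horizons or layers, and each horizon carries a morphological description plus the analyses selected in the project's attribute files) can be sketched with a few data classes. This is a minimal illustration under assumptions: all class, field, and function names are hypothetical and do not reflect the platform's actual database schema.

```python
from dataclasses import dataclass, field


@dataclass
class Horizon:
    symbol: str                                      # e.g. "A" or "Bw"
    top_cm: float                                    # upper depth limit
    bottom_cm: float                                 # lower depth limit
    morphology: dict = field(default_factory=dict)   # morphological description
    analyses: dict = field(default_factory=dict)     # attribute file -> results


@dataclass
class Observation:
    code: str
    kind: str                                        # "profile" or "borehole"
    horizons: list = field(default_factory=list)


@dataclass
class Project:
    title: str
    attribute_files: tuple                           # analyses chosen by the coordinator
    observations: list = field(default_factory=list)


def missing_analyses(project):
    """Map (observation code, horizon symbol) to attribute files without data."""
    gaps = {}
    for obs in project.observations:
        for hz in obs.horizons:
            todo = [name for name in project.attribute_files
                    if name not in hz.analyses]
            if todo:
                gaps[(obs.code, hz.symbol)] = todo
    return gaps
```

A helper like `missing_analyses` shows why the attribute files are chosen at project level: they define what "complete" means for every horizon in that project.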
Figure 8 shows the basic screens of the MultiSoils app. The app was developed
to carry out three functions in the same environment: (1) searching and viewing soil
data in public projects registered on the MultiSoils platform; (2) viewing, editing,
and sending the general and morphological description of planned soil observations
to the MultiSoils platform database; and (3) inserting, editing, and sending the
general and morphological description of a new observation in a project registered
on the platform. Figure 8a, b shows the first screens of the app. When opening the
app, the user has access to the following icons: (a) public projects; (b) my
projects; (c) help; (d) who we are; and (e) terms of use. When accessing public
projects, the user can view their location over one of two background maps
(physiographic with relief or satellite image).
Fig. 7 Viewing and editing horizons and layers in the MultiSoils platform. (a) Viewing and editing
horizons and layers. (b) Viewing and editing morphological description and soil analysis

Fig. 8 Some screenshots of the MultiSoils app. (a) First Screen (part 1). (b) First Screen (part
2). (c) Public Projects Screen. (d) My Projects Screen. (e) Selecting a project. (f) Visualizing the
observations. (g) Selecting an observation. (h) General Description. (i) Morphological Description

On this screen, while connected to the internet, the user can tap the search
indicator and make queries using the following pre-established options: (a) by type
of project; (b) by project title; (c) by federation unit; (d) by municipality; (e) by
search radius; and (f) by soil order (Fig. 8c). In addition to searches, the app offers
explanatory material about the 13 soil orders of the Brazilian classification
system (order definitions, territorial expression, and photos). When accessing the
“My Projects” icon, the user can view the list of projects he or she coordinates or
participates in (Fig. 8d). It is important to note that this screen only allows viewing
and editing of projects registered on the platform under the “non-public projects”
category. After selecting a project, the user can view the soil observations planned
to be carried out in the field (Fig. 8e). The user can select a planned observation (by
touching the mobile device screen or choosing from the observations menu) or
create a new observation (extra observation) and proceed to editing
(Fig. 8e). The user can enlarge the view of the soil observations and choose the
point to edit, either on the mobile device screen (Fig. 8f) or from the
list of descriptions to be carried out (Fig. 8g). When editing starts, the app
leads the user through sequential screens for editing both the
general description of the observation (Fig. 8h) and the morphological description
of horizons and layers (Fig. 8i). In the final part of the general description, the user
can insert photos and videos of the profile and the landscape to improve
the understanding of the environment and the characteristics of the soil. At the end
of the morphological description, the user can check all the description screens and
send the observation to the web platform. It is important to note that all fieldwork on
the “My Projects” screen can be conducted without internet access (offline). The app
stores the observations described offline, so the user can work in the field without
a connection and send the observations once internet access is available.
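
The offline behavior described here follows a common store-and-forward pattern: observations are saved locally and sent when a connection exists. The Python sketch below illustrates that pattern only; it is not the app's implementation (the app runs on Android and iOS), and the `OfflineQueue` class, the JSON file used as local storage, and the `send` and `is_online` callables are all assumptions made for the example.

```python
import json
from pathlib import Path


class OfflineQueue:
    """Hypothetical local store for observations described without internet."""

    def __init__(self, path):
        self.path = Path(path)  # JSON file standing in for on-device storage

    def _load(self):
        if self.path.exists():
            return json.loads(self.path.read_text(encoding="utf-8"))
        return []

    def _save(self, items):
        self.path.write_text(json.dumps(items), encoding="utf-8")

    def add(self, observation):
        """Store one observation (a JSON-serializable dict) locally."""
        items = self._load()
        items.append(observation)
        self._save(items)

    def pending(self):
        """Return the observations not yet sent."""
        return self._load()

    def sync(self, send, is_online):
        """Send pending observations if online; keep the ones that fail."""
        if not is_online():
            return 0
        remaining, sent = [], 0
        for item in self._load():
            try:
                send(item)   # assumed to raise an exception on failure
                sent += 1
            except Exception:
                remaining.append(item)
        self._save(remaining)
        return sent
```

Keeping failed items in the queue means an interrupted upload loses nothing: the next call to `sync` simply retries what is left.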

4 Conclusions

The MultiSoils Platform is ready to be made available to the soil science community
in Brazil (www.multisoils.org). The developers (UFRRJ/CEFET-RJ/PETROBRAS) intend
to offer it free of charge, in cooperation with users and their institutions.
The field app is available for Android and iOS and can be downloaded
for free. Like every digital platform, the system is under continuous improvement.
Other modules are being developed, and we are open to cooperation with
national and international researchers and institutions.
The platform can be widely used to support the Pronasolos program.

Acknowledgments The authors are grateful for the financial support of Petrobras within the scope
of the Project entitled: Digital Mapping of Soils in Oil and Gas Exploration and Production Areas –
Case Studies of Fields in the North and Northeast of Brazil (Contract UFRRJ/Fapur/Petrobras,
contract number 5850.0105881.17.9).

References

BDIA. (n.d.). Banco de Dados de Informações Ambientais [WWW Document].
https://bdiaweb.ibge.gov.br/#/consulta/pedologia. Accessed 23 Aug 2024.
dos Santos, H.G. (2018). Sistema brasileiro de classificação de solos, 5a edição revista e ampliada.
ed. Embrapa, Brasília.
Guidelines for soil description, 4th ed. (2006). Food and Agriculture Organization of the United
Nations.
IBGE. (2007). Manual técnico de pedologia / IBGE. Coordenação de Recursos Naturais e Estudos
Ambientais (p. 320).
Munsell Color (Firm). (2010). Munsell soil color charts: with genuine Munsell color chips. 2009
year revised. Munsell Color.
Ottoni, M. V., Ottoni Filho, T. B., Schaap, M. G., Lopes-Assad, M. L. R. C., & Rotunno Filho, O.
C. (2018). Hydrophysical Database for Brazilian Soils (HYBRAS) and Pedotransfer functions
for water retention. Vadose Zone Journal, 17, 170095. https://doi.org/10.2136/vzj2017.05.0095
Samuel-Rosa, A., Gubiani, P., Ribeiro, E., Ottoni, M., Medeiros, P., Reichert, J. M., Silva Siqueira,
D., Dotto, A., Collier, L., Valladares, G., Pedron, F., Pedroso, J., Filippini, J., Oliveira, R.,
Caviglione, J. H., Miguel, P., Lepsch, I., Gris, D., Rosin, N., & Vasques, G. (2018). Bringing
together Brazilian soil scientists to share soil data.
Santos, R. D., et al. (2015). Manual de Descrição e Coleta de Solo no Campo. 7. Edição Revisada
e Ampliada: Sociedade Brasileira de Ciência do Solo (SBCS). 102 p.
Simões, M. G. (2015). Democratização da informação de solos do brasil: geoportal e banco de
dados de solos com acesso via web 32.
Soil Survey Manual. (n.d.). Soil Science Division Staff. United States Department of Agriculture.
Agriculture Handbook No. 18. Issued March 2017 and Minor Amendments February 2018.
Teixeira, P. C., Donagemma, G. K., Fontana, A., & Teixeira, W. G. (2017) Manual de Métodos de
Análise de Solo.
Multiscalar Geomorphometric
Generalization to Delineate Soil Textural
Patterns on Amazon Watersheds
Landscapes

Cauan Ferreira Araújo, Raimundo Cosme de Oliveira Jr, and Troy Patrick Beldini

1 Introduction

The soils of the Amazon basin are predominantly formed by strong chemical
weathering processes typical of a warm and moist climate acting on a sedimentary
geological basement over a long period of stable environmental conditions
(Schaefer et al., 2017). As a result, long-term pedological processes, especially
ferralitization, are the most widespread. However, other processes, such as
gleization, elutriation and clay translocation, can become prevalent or significant
under specific environmental conditions and determine important changes in a
given portion of soil (Kämpf & Curi, 2012). Therefore, in order to know
the patterns of soil class distribution, it is necessary to identify and delineate the
preferred environments for the occurrence of textural gradients.
To achieve the task of producing digital soil maps of the Amazon region at a high
level of resolution, the pedological processes in each pedoenvironment need to be
modeled at a level of detail that could be useful for a low-cost soil survey approach,
resulting in soil class maps at the subgroup categorical level. Furthermore, this
pathway of digital soil mapping contributes to a better pedogenetic understanding
of the soil in the landscape, which is a significant challenge to soil science in a
pedometric context (Arrouays et al., 2020; Wadoux et al., 2021).
Elements of the landscape control the processes acting on soils; therefore, the
soil-landscape approach constitutes one of the most powerful conceptual tools in
mapping activities, especially at scales with an intermediate or greater level of detail.

C. F. Araújo (✉) · T. P. Beldini
Universidade Federal do Oeste do Pará, Pará, Brazil

R. C. de Oliveira Jr
Embrapa, Pará, Brazil

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
W. de Carvalho Junior et al. (eds.), Pedometrics in Brazil, Progress in Soil Science,
https://doi.org/10.1007/978-3-031-64579-2_2

The soil-landscape relationship is related to the concept of the catena (Milne, 1935).
In a catena, variations in soils along a slope are attributed to the translocation of
soluble elements and to erosive and depositional processes, not excluding different
source materials. Subsequently, analyses of soil-landscape relations (Huggett,
1975) contemplated three-dimensional models of the slopes. In the context of
digital soil mapping and the scorpan model paradigm (McBratney et al., 2003),
soil-landscape process modeling can be described as an interdisciplinary subject at the
interface between pedometrics and geomorphometry (Ma et al., 2019).
The scale issues of soil-landscape relationships are related to the complex
interactions of both elements and how these processes occur and are perceived.
Several geomorphology studies report a time–space coupling between landform
size-scales and lifetime (Schmidt & Andrew, 2005; Wysocki & Schoeneberger,
2011). From a soil perspective, different pedological processes exert their influence
at short, medium or long time-scales (Targulian & Krasilnikov, 2007; Kämpf & Curi,
2012). In contrast, topography studies in soil physics demonstrate complex water
dynamics related to the nested geometry of slopes, considering relief and micro-relief
patterns, resulting in trends in the movement of particles and solutes and
changes in texture and chemical parameters of soils (Florinsky, 2016; Hu et al.,
2020). Therefore, multiscalar topography influences a particular soil distribution
in two general ways: through the overlay of pedological processes that occurred at
different times, and through present-day driving forces, determined by the sum of
forces best correlated with one, several, or many geomorphologic scales.
Some aspects of spatial scaling in digital soil mapping have been summarized
in non-exhaustive reviews (Malone et al., 2013; Pachepsky & Hill, 2017). The
hierarchical definition of scale can be used to understand the soil phenomena, from
the soil region, passing through watersheds, catena, pedon, horizons and finally
molecular interactions. The characteristics of measurement affect the results of
analysis, and in this sense, the modifiable areal unit problem (MAUP) represents a
key issue. Information transfer across scales can be classified as upscaling (toward
less detail) or downscaling (toward greater detail), and both must account for
bias.
With respect to the scale of covariables for predictive soil mapping, the highest
DEM resolutions do not necessarily produce the highest accuracy (Cavazzi et al.,
2013; Samuel-Rosa et al., 2015). Despite the potential of machine learning to
produce complex and nonlinear predictions, few studies have investigated a multiscale
perspective on covariables to account for physical processes that are not predictable
from finer-scale environmental information (Wadoux et al., 2020). Some studies
propose data-driven techniques for selecting the pixel size or neighborhood size
for a particular landscape (Hengl, 2006; Smith et al., 2006), but this approach
produces a variety of results in different geomorphic units, which complicates the
interpretation of scalar components in the soil-landscape relationship. To ease the
interpretation of scale relationships in soil-landscape models, this study proposes a
cartographic criterion to formalize the correspondence between scale and pixel size for
geomorphometric covariables.

The present study tested the hypothesis that a multiscale geomorphic representation,
obtained from cartographic generalization of a digital elevation model,
can improve digital soil mapping. To this end, the case study applied the
Random Forest algorithm to a multiscale geomorphometric database to predict soil
surface texture.

2 Material and Methods

The procedures described in this section were performed using the open-source
software packages QGIS 3.10, SAGA GIS 2.3, GRASS GIS 7.8 and R 3.5
(Conrad et al., 2015; GRASS, 2019; QGIS, 2019; R Core Team, 2019).

2.1 Study Area

The study was conducted in the Iripixi Lake (ILW) and Caipuru Lake (CLW)
watersheds, with an area of 27,137 ha and 28,315 ha respectively, located in the
Trombetas River basin in Oriximiná-Pará in the Eastern Amazon, as shown in Fig. 1.
The Alter do Chão Formation is a Cretaceous sedimentary deposit (CPRM,
2008). Locally, in western Pará, the formation process of Alter do Chão took place
in a depositional environment of a sinuous fluvial system, resulting in the record of
a succession of sandstones, conglomerates and mudstones (Mendes et al., 2012).
It was formed approximately 135 million years ago and has evolved in accord
with tectonic motion ever since (Somoza & Ghidella, 2012). Later, in the context
of the beginning of the Andean uplift, another depositional formation buried the
Alter do Chão Formation in the Paleocene, approximately 55 million years ago. The
movement resulting from the epeirogenesis of the South American continent and the
variations in the global level of the seas caused the dissection and almost complete
destruction of this Paleogenic Detritus-Lateritic Coverage in the Oligocene, 30
million years ago. In the final phase of geological evolution, from the Miocene,
approximately 10 million years ago, continental movements related to the advanced
stages of the Andean uplift and the variations in the global level of the ocean due to
glaciations reduced the base level and promoted the dissection of the Alter do Chão
Formation.
From the Quaternary onwards, 1.4 million years ago, the region went through
a period of geological stability (Somoza & Ghidella, 2012). In this sense, we can
infer that geomorphological and pedological contemporary evolutionary dynamics
have been developing since then. However, in this period there were at least three
major reductions in the global ocean level that may have initiated erosive processes,
soil rejuvenation and reorganization of drainage patterns (Bridgland, 2021). The
geomorphic units of these watersheds are classified as a homogeneous dissection
with coarse drainage density and weak incision depth (IBGE, 2008, 2009). The
pedoenvironments of the study area lie in the uplands of the lower Amazon basin,
and the most abundant soil classes in the area are Latossolo Amarelo, Latossolo
Vermelho-Amarelo, Argissolo Vermelho-Amarelo and Gleissolos Háplicos
(Schaefer et al., 2017).

Fig. 1 Study area location in the Eastern Amazon. (Source: author)

2.2 Environmental Covariates

In the proposed mapping scale, vegetation and topography factors are the main
sources of soil variation. To represent these factors, this study used Landsat 8
multispectral images and the 30 m Shuttle Radar Topography Mission Digital
Elevation Model (SRTM DEM 30 m), respectively. The Landsat
8 images are from September 11, 2017, corrected for surface reflectance with the
LaSRC algorithm by USGS (U.S. Geological Survey, 2019). The SRTM DEM is a
digital elevation model based on an interferometric radar survey, and has 30 m pixels
(Farr et al., 2007). The corrections made to the SRTM DEM were the filling of sinks
and the reduction of the deforestation effect by the estimated canopy addition method
(Brochado, 2015). The topography information was upscaled and organized into
generalized multiscale geomorphometric variable groups, as detailed in the next
section.

2.3 Multiscale Geomorphometric Generalization

The multiscale geomorphometric generalization (MGG) is an upscaling operation
based on cartographic concepts of generalization of digital maps (Li & Openshaw,
1993; Guilbert et al., 2019). This approach can be applied to any geomorphic
variable, including elevation models, landform units, and primary and secondary
derivatives.
This operation results in variables at different scales, arranged in groups accord-
ing to criteria required for the analysis. The framework of these upscaling methods
brings to the pedometric perspective the understanding that the soil-landscape
relationship occurs through complex and multiscale interactions. In this sense, the
formalization of the desirable scales of analysis and modeling occurs both in their
definition and in the group arrangements.
For the MGG operation, vector and raster representations demand different
approaches for upscaling, because each of them has specific scale transformation
problems due to their mathematical structures. Furthermore, it is necessary to have a
unique reference for the scales for compatible representation of geomorphic features
in both types of variables, thus allowing for joint interpretation. In this study, the
concept of minimum mappable area for soil surveys (IBGE, 2015) was considered
to define pixel sizes in relation to cartographic scale. The detailed descriptions for
each of the four scales used are in Table 1. The area equivalence between raster and
vector is calculated as a function of a 5 × 5 pixel grid, considered a conservative
parameter to determine a geomorphic feature.
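
The correspondence in Table 1 can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not the authors' code: the function names are hypothetical, and the minimum mappable map area of 0.4 cm² (40 mm²) is an assumption inferred from the values in Table 1, since the text does not state it.

```python
# Assumption inferred from Table 1 (not stated in the text): the minimum
# mappable area corresponds to 0.4 cm2 (40 mm2) on the printed map.
MIN_MAP_AREA_MM2 = 40
WINDOW = 5  # 5 x 5 pixel grid, as stated in the text


def minimal_mappable_area_m2(scale_denominator):
    """Ground area (m2) covered by the minimum mappable map area at a scale."""
    ground_mm2 = MIN_MAP_AREA_MM2 * scale_denominator ** 2
    return ground_mm2 / 1_000_000  # mm2 -> m2


def window_area_m2(pixel_size_m, window=WINDOW):
    """Ground area (m2) covered by a window x window block of pixels."""
    return (pixel_size_m * window) ** 2


# Scale denominator -> pixel size (m), as listed in Table 1.
scale_to_pixel = {25_000: 30, 50_000: 60, 75_000: 90, 100_000: 120}

rows = []
for scale, pixel in scale_to_pixel.items():
    mma = minimal_mappable_area_m2(scale)
    pa = window_area_m2(pixel)
    rows.append((scale, mma, pixel, pa, 100.0 * pa / mma))

for scale, mma, pixel, pa, ratio in rows:
    print(f"1:{scale}  mma={mma:,.0f} m2  pixel={pixel} m  pa={pa:,.0f} m2  pa/mma={ratio:.0f}%")
```

Under these assumptions, each 5 × 5 window covers 90% of the minimum mappable area, so a feature that fills the window stays just below the cartographic threshold at every scale.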
The MGG was applied to the following geomorphometric covariables: elevation
(Elev), slope, relative slope position (RSP), topographic wetness index (TWI), plan
curvature (PlanCurv), profile curvature (ProfCurv), topographic factor of water
erosion (LS) and geomorphons.

Table 1 Correspondence between scale and pixel size for Multiscale Geomorphometric
Generalization (MGG), using the concept of minimum mappable area
Scale       Minimal mappable area (m²)   Pixel size (m)   Pixel areaᵃ (m²)   pa/mmaᵇ (%)
1:25000     25,000                       30               22,500             90
1:50000     100,000                      60               90,000             90
1:75000     225,000                      90               202,500            90
1:100000    400,000                      120              360,000            90
ᵃ For a 5 × 5 window
ᵇ Ratio of pixel area (pa) to minimal mappable area (mma), in percentage

Fig. 2 Methodology flowchart of MGG for the topography covariables. General
geomorphometry generalization: digital elevation model, local averages, multiscale DEM,
derivatives calculations, multiscale derivatives. Specific geomorphometry generalization:
digital elevation model, landforms classification (geomorphons), feature exclusions,
multiscale geomorphons. (Source: author)
These geomorphic variables, at different scales, were obtained from the SRTM
DEM by two upscaling methods, as illustrated in Fig. 2. In the first, local averages
were applied to the elevation covariable in 2 × 2, 3 × 3 and 4 × 4 windows, for
resolutions of 60 m, 90 m and 120 m, respectively, followed by calculation of the
derivative covariables. In the second, the classification of geomorphons
(Jasiewicz & Stepinski, 2013) was followed by
the exclusion of polygons smaller than the minimum mappable area for each
scale. These methods correspond to cartographic generalization applied to general
geomorphometry and specific geomorphometry, respectively (Zinck, 2016).
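
The local-average upscaling of the elevation model can be sketched as follows. This is an illustrative Python example, not the workflow used in the study (which relied on GIS software): it assumes the averaging windows are non-overlapping blocks, which the text does not state explicitly, and the function name and toy elevation values are hypothetical.

```python
def upscale_by_local_average(dem, factor):
    """Average non-overlapping factor x factor blocks of a DEM (list of rows).

    Rows and columns that do not fill a complete block are dropped.
    """
    n_rows = len(dem) // factor
    n_cols = len(dem[0]) // factor
    coarse = []
    for r in range(n_rows):
        coarse_row = []
        for c in range(n_cols):
            block = [
                dem[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


# Toy 4 x 4 elevation grid standing in for a 30 m DEM (values in metres).
dem_30m = [
    [10.0, 12.0, 20.0, 22.0],
    [14.0, 16.0, 24.0, 26.0],
    [30.0, 32.0, 40.0, 42.0],
    [34.0, 36.0, 44.0, 46.0],
]

dem_60m = upscale_by_local_average(dem_30m, 2)   # 2 x 2 window -> 60 m pixels
dem_120m = upscale_by_local_average(dem_30m, 4)  # 4 x 4 window -> 120 m pixels
print(dem_60m)
print(dem_120m)
```

Derivatives such as slope or curvature would then be computed on each coarse grid rather than averaged from the 30 m derivatives, which is the order of operations the text describes.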
In this case study, a machine learning approach was used to identify and select
the optimum scales of variables for modeling. It was therefore necessary to provide
multiple datasets for training and evaluating each scale. To this end, the sets of
topography variables at the different scales were arranged
in all possible combinations.
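
One way to read this arrangement is that every non-empty combination of the four scales forms a candidate group of covariables. The sketch below enumerates such groups in Python; it is an interpretation, not the authors' script, and the variable naming scheme (for example `Elev_1:25000`) is hypothetical. Geomorphons, being categorical, are left out of this illustration.

```python
from itertools import combinations

scales = ["1:25000", "1:50000", "1:75000", "1:100000"]
# Continuous covariables named in the text.
variables = ["Elev", "Slope", "RSP", "TWI", "PlanCurv", "ProfCurv", "LS"]


def scale_groups(scale_list):
    """Return every non-empty combination of scales, smallest groups first."""
    groups = []
    for size in range(1, len(scale_list) + 1):
        groups.extend(combinations(scale_list, size))
    return groups


def covariables_for(group):
    """List the (hypothetical) layer names belonging to a group of scales."""
    return [f"{var}_{scale}" for scale in group for var in variables]


groups = scale_groups(scales)
print(len(groups), "candidate groups")
print(covariables_for(("1:75000", "1:100000")))
```

With four scales this gives 15 candidate groups, and the group containing all four scales holds 7 × 4 = 28 continuous layers, which matches the number of variables reported in the text.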
The application of MGG to the original DEM database resulted in 28 continuous
geomorphometric variables, the distributions of which are illustrated in Fig. 3. Some
distributions, like those of TWI and LS, changed little, with fewer
extreme values and some reduction of scatter in the upscaling direction. This
could be explained by the weaker representation of the smaller geomorphic features
that account for the upper and lower limits of water dynamics on
the slope. Other distributions changed markedly, such as those of slope
and the geomorphic curvatures, which showed a sharp and progressive reduction of
scatter related to the smoother generalized surface. For PlanCurv, the upscaled
derivative better describes the features of the valleys and spikes in
the study area, as illustrated in Fig. 4.

Fig. 3 Distributions of geomorphometry raster variables at original and generalized scales. (Source: author)

Fig. 4 3D View of covariables on original and generalized scales. (Source: author)

Some studies have tested upscaling effects on digital topographic information, with
comparable results. Contextual spatial modeling, which uses the Gaussian scale
space to produce a set of coarse-resolution DEMs, similarly resulted in smoothed
slope and geomorphic curvatures (Behrens et al., 2018). In contrast, other studies
have proposed sophisticated calculations for the generalization of DEMs that address
feature preservation, and these could be tested with the MGG framework. The
Feature Preserving DEM Smoothing (FPDEMS) method reduces the complexity of
the surface at the detailed spatial scales at which roughness dominates, while not
significantly altering the topographic complexity at larger spatial scales (Lindsay et
al., 2019). Other approaches use a multi-point algorithm to rapidly and accurately
retrieve the critical points, using a drainage-constrained TIN, to produce
coarser-resolution DEMs (Zhou & Chen, 2011; Wu et al., 2019). Other approaches to
the classification of landforms, like the topographic position index (de Reu et al.,
2013) and k-median clustering (Szypuła & Wieczorek, 2020), also have parameters
that could be adjusted for the proposed correspondence of scales. Therefore, these
techniques can be included and tested with the MGG framework in future research.

2.4 Soil Sampling and Analysis

Soil sampling was performed in 9 pilot areas, considering the covariables, to evaluate
the effect of topography on variation in soil distributions. Each pilot area
was sampled at 10 points, a sufficient density for semi-detailed soil surveys,
compatible with 1:25,000 scale soil maps (IBGE, 2015). The sample points were
distributed in a stratified random arrangement by the conditioned Latin
hypercube method (Minasny & McBratney, 2006; Biswas & Zhang, 2018), using the
raster topography covariates described in the previous section, at the 1:25,000 scale.
At all 90 sample points, the morphological description of the A horizon was
performed (dos Santos et al., 2015) and soil samples were collected at the 0–30 cm
depth for physical and chemical analyses (Kettler et al., 2001; EMBRAPA, 2017).
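
The idea behind conditioned Latin hypercube sampling can be illustrated with a strongly simplified sketch. The Python code below is not the published algorithm: it uses a greedy random search instead of simulated annealing, it ignores the correlation and categorical terms of the full objective function, and all names and toy covariate values are hypothetical. It only shows the core principle, which is to pick existing locations so that each quantile stratum of each covariate holds about one sample.

```python
import random


def quantile_edges(values, n_strata):
    """Approximate quantile boundaries that split values into n_strata classes."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return [ordered[round(k * last / n_strata)] for k in range(n_strata + 1)]


def stratum_counts(sample_values, edges):
    """Count sampled values per stratum (the last stratum includes its upper edge)."""
    counts = [0] * (len(edges) - 1)
    for value in sample_values:
        for k in range(len(counts)):
            is_last = k == len(counts) - 1
            below_upper = value <= edges[k + 1] if is_last else value < edges[k + 1]
            if edges[k] <= value and below_upper:
                counts[k] += 1
                break
    return counts


def objective(sample_idx, covariates, edges_per_cov):
    """Sum of deviations from the ideal of one sample per stratum and covariate."""
    total = 0
    for cov, edges in zip(covariates, edges_per_cov):
        counts = stratum_counts([cov[i] for i in sample_idx], edges)
        total += sum(abs(count - 1) for count in counts)
    return total


def simplified_clhs(covariates, n_samples, n_iter=2000, seed=1):
    """Greedy search: swap one sampled cell at a time, keep swaps that do not worsen the fit."""
    rng = random.Random(seed)
    n_cells = len(covariates[0])
    edges_per_cov = [quantile_edges(cov, n_samples) for cov in covariates]
    sample = rng.sample(range(n_cells), n_samples)
    best = objective(sample, covariates, edges_per_cov)
    for _ in range(n_iter):
        if best == 0:
            break
        replacement = rng.randrange(n_cells)
        if replacement in sample:
            continue
        candidate = list(sample)
        candidate[rng.randrange(n_samples)] = replacement
        score = objective(candidate, covariates, edges_per_cov)
        if score <= best:
            sample, best = candidate, score
    return sample, best


# Toy covariates for 200 raster cells of one pilot area (hypothetical values).
toy_rng = random.Random(0)
elev = [toy_rng.uniform(20, 120) for _ in range(200)]
slope = [toy_rng.uniform(0, 30) for _ in range(200)]

points, score = simplified_clhs([elev, slope], n_samples=10)
print(sorted(points), score)
```

A score of zero would mean that every stratum of every covariate contains exactly one sampled cell; in practice the search usually stops at a small positive value.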

2.5 Modeling by Random Forest

The modeling of the soil-landscape was done using Random Forest (Breiman,
2001), a machine learning algorithm frequently used to produce digital soil maps
(Lamichhane et al., 2019). Notable characteristics of this algorithm are that
it can handle categorical and continuous variables, it can perform
regression and classification, it is robust against overfitting, and it lends itself
to the interpretation of variable relationships, including linear and non-linear systems
(Malone et al., 2017).
First, the set of training cases and those intended for validation were defined.
The selection was made randomly, with proportions of 70% and 30%, respectively.
The training was done for every group of geomorphometric covariables, one group
at a time, to model the following soil attributes: A horizon thickness,
pH, and silt and sand content. Next, among the multiscale generalized
geomorphometric groups, those with the best adjustment were
selected based on the highest values of % variation explained by the model.
Finally, the modeling structure and results of the original-scale geomorphometric
group and the generalized geomorphometric groups were compared. These
predictions were evaluated by visual analysis of the digital maps and by multiway
plots of forest structure and of the effect of variables on prediction, calculated
with the randomForestExplainer R package (Paluszynska et al., 2019).
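
The random 70%/30% split and the two adjustment measures used to compare groups can be sketched as follows. This Python example is illustrative only: the function names and the seed are hypothetical, and the formula for the % variation explained, 100 × (1 − MSE / variance of the observations), is an assumption based on the pseudo R² commonly reported by Random Forest regression implementations, since the text does not give the formula.

```python
import random


def split_train_validation(n_samples, train_fraction=0.7, seed=42):
    """Randomly split sample indices into training and validation sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


def adjustment_measures(observed, predicted):
    """Return (% variation explained, mean of squared residuals).

    Assumed formula: 100 * (1 - MSE / variance), with the variance of the
    observations computed with an n denominator.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    variance = sum((o - mean_obs) ** 2 for o in observed) / n
    return 100.0 * (1.0 - mse / variance), mse


# The study sampled 90 points; the split gives 63 training and 27 validation cases.
train_idx, valid_idx = split_train_validation(90)
print(len(train_idx), len(valid_idx))

# Toy check: predicting the mean everywhere explains 0% of the variation.
observed = [1.0, 2.0, 3.0, 4.0]
print(adjustment_measures(observed, [2.5, 2.5, 2.5, 2.5]))
```

Under this definition the measure can be negative when a model predicts worse than the mean of the observations, so low values such as those reported for the original-scale models indicate predictions only slightly better than the mean.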
The prediction of soil texture was made by considering the silt and sand raster
layers using the Brazilian soil classification system (Santos et al., 2018). For
evaluation of this prediction, the confusion matrix was calculated with the Kappa
index, and the user’s and producer’s accuracy (Liu et al., 2007).
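
The accuracy measures named above follow directly from the confusion matrix. The sketch below computes them in Python under the usual convention that rows are mapped classes and columns are reference classes; the function name and the counts in the example matrix are hypothetical and are not the study's data.

```python
def accuracy_metrics(matrix, labels):
    """User's and producer's accuracy per class, plus the Kappa index.

    matrix[i][j] = number of samples mapped as labels[i] whose reference
    (observed) class is labels[j]. Every class needs at least one mapped
    and one reference sample, otherwise a division by zero occurs.
    """
    k = len(labels)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(matrix[i]) for i in range(k)]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    diagonal = [matrix[i][i] for i in range(k)]

    users = {labels[i]: diagonal[i] / row_totals[i] for i in range(k)}
    producers = {labels[j]: diagonal[j] / col_totals[j] for j in range(k)}

    observed_agreement = sum(diagonal) / n
    chance_agreement = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    kappa = (observed_agreement - chance_agreement) / (1 - chance_agreement)
    return users, producers, kappa


# Hypothetical counts for the three texture classes used in the chapter.
labels = ["MAr", "ArMe", "MeAr"]
matrix = [
    [8, 2, 0],
    [1, 5, 1],
    [0, 1, 2],
]

users, producers, kappa = accuracy_metrics(matrix, labels)
print(users)
print(producers)
print(round(kappa, 2))
```

User's accuracy answers how reliable a mapped class is for someone using the map, while producer's accuracy answers how much of a reference class the map captured; Kappa discounts the agreement expected by chance.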

3 Results and Discussion

This section discusses the Random Forest prediction of particle size, which can
produce soil textural classifications for land users and stakeholders. It focuses
on whether model adjustment can be improved by the MGG and
whether the resulting soil particle size maps yield a more accurate soil textural
classification. Additionally, some considerations are presented on the patterns of
occurrence of the pedogenetic processes that give rise to textural gradients in the
watersheds studied.
For prediction of the particle size of the surface layer, the sand and silt fractions were
used because of the low contribution of clays. This could be explained by local
characteristics of the Alter do Chão Formation lithology, with an overall fine to
medium grained sandstone content in the upper portion, and medium and coarse
sandstones with a small contribution of red claystones in the lower portion (Mendes
et al., 2012).
The Random Forest models for silt and sand, at original scale geomorphometrics,
had a poor adjustment, as shown in Table 2. For the best adjusted
MGG groups, despite the considerable portion of randomness related to the
heterogeneity of soils, the % variation explained is markedly higher and the mean
of squared residuals is markedly lower. The models' predictors have a principal
contribution from variables at 1:75,000 and 1:100,000 and can identify tendencies
at the watershed scale.
The most significant covariables of particle size prediction are shown in Fig.
5a, b: Elev and RSP at coarser scales, associated with stratigraphy and long-term
hillslope transportation; ProfCurv at coarser scales, and PlanCurv at original and
coarser scales, related to accumulation, transit and dissipation zones (Florinsky
et al., 2002).

Table 2 Model adjustment for silt and sand
Particle size   Predictors                 % Var explained   Mean of squared residuals
Silt            Original scale             3.82              0.0007345
Silt            MGG 1:75000 + 1:100000     31.73             0.0000674
Sand            Original scale             6.9               0.0023651
Sand            MGG 1:25000 + 1:75000      32.43             0.0002224

Fig. 5 Variable prediction importance: (a) model of sand by geomorphometric group 1:25000 plus
1:75000; (b) model of silt by geomorphometric group 1:75000 plus 1:100000. (Source: author)
The prediction of sand and silt content, at original and multiscale generalized
geomorphometrics, is illustrated in Fig. 6. For both variables, the MGG
produced maps with less noise and more recognizable patterns related to
geomorphic features. These results corroborate the hypothesis that topography
exerts its influence in a larger spatial context and dominates the prediction of
soil particle size contents in the tested basin. In contrast, a case study using Random
Forest with 30 m and 90 m DEMs did not find significant differences in prediction
(Bhering et al., 2016). Despite some similarity in covariable importance, as for
Elev and RSP, that modeling was done on single-scale datasets. This
underlines the importance of observing soil-landscape phenomena from a multiscale
perspective.
The MGG increased the accuracy of the surface layer soil texture
classification, as shown in Table 3. The most significant improvements occur in
the sandy loam (MeAr) and loamy sand (ArMe) soil texture classes, both of which
cover a smaller area at the mapping site than the dominant sand
(MAr) class. The user's accuracy is also considerably higher, so the MGG
increased the reliability of each mapped class. Likewise, the Kappa index
is higher for the MGG geomorphometric variables.
A case study on a farm in China, using machine learning with single-scale and
multiple-scale variables (Shi et al., 2018), also found better results with a range
of appropriate scales, even using only local derivatives. In this sense, the MGG
framework has greater potential because transforming the DEM before calculating the
derivatives allows both local and regional derivatives to be used in a multiscale group
arrangement.
Elutriation is the most relevant process for the reorganization of profiles in the
mapped basins, occurring primarily under high runoff conditions. This selective
erosion process, carrying mainly finer particles under conditions of forest vegetation
cover, occurs in the high and convex portions of the slope, due to the greater
potential energy of water in such positions. The elutriation process gradually
removes clay from the upper layers and from the profile, and eventually
produces a textural gradient. Clay translocation is a relevant but localized process for
the reorganization of profiles in the mapped basins, occurring primarily under high
percolation conditions. This process of vertical movement of clays in the profile has
a limited distribution due to the low clay content of the geological basement.
Therefore, the differentiation that results in textural gradients arising from the
accumulation of clays in the lower layers is restricted to portions with intense water
movement. In these places it was possible to notice the presence of weak clay films,
which is the main indication of the predominance of clay translocation in such
profiles. For both processes, the current stage of such evolution may or may not meet
the requirements for classification as an Argissolo. In this sense, the ARGISSOLO
AMARELO Distrófico típico and LATOSSOLO AMARELO Distrófico argissólico
soils could be mapped as an undifferentiated group (PAd + LAdarg).

Fig. 6 Predictive maps of silt (a, b) and sand (c, d) at original scale geomorphometrics and
multiscale generalized geomorphometrics, respectively. (Source: author)

Table 3 Accuracy evaluation for soil texture classification (user's / producer's accuracy
per class; Kappa index for all classes)
Geomorphometric variables   MAr          ArMe         MeAr          Kappa index
Original scale              75% / 84%    72% / 58%    20% / 20%     0.43
MGG                         81% / 88%    76% / 71%    100% / 67%    0.62
The elutriation process occurs on summits, ridges, shoulders, spurs and slopes.
It is most likely in flat and convex areas with low and
intermediate wetness, that is, under conditions of water content and energy
availability that promote surface runoff (Möller & Volk, 2015).
The clay translocation process occurs in valleys, slopes, footslopes and hollows.
In these segments, the process is more likely to occur where there are high
levels of humidity without saturation, in portions of flow convergence with
concave plan curvatures, and in portions of flow deceleration with concave profile
curvatures, that is, under conditions of significant water flow, converging laterally and
losing kinetic energy, which primarily promote vertical flow (Wysocki & Schoeneberger,
2011).

4 Conclusions

The MGG improved model adjustment for silt and sand particles and also improved
the accuracy metrics of the soil texture classification of the surface layer, especially for
the least frequent classes, with the Kappa Index rising from 0.43 to 0.62. Topography
exerts its influence at a coarser spatial scale and dominates the prediction of soil
particle size contents in the studied watersheds.
Future development of the MGG framework should address DEM generalization
with feature preservation and compare landform classifications
adaptable to multiple scales.

References

Arrouays, D., McBratney, A., Bouma, J., et al. (2020). Impressions of digital soil maps: The good,
the not so good, and making them ever better. Geoderma Regional, 20, e00255.
https://doi.org/10.1016/j.geodrs.2020.e00255
Behrens, T., Schmidt, K., MacMillan, R. A., & Viscarra Rossel, R. A. (2018). Multiscale contextual
spatial modelling with the Gaussian scale space. Geoderma, 310, 128–137.
https://doi.org/10.1016/j.geoderma.2017.09.015
Bhering, S. B., da Chagas, C. S., de Junior, W. C., et al. (2016). Mapeamento digital de areia, argila
e carbono orgânico por modelos Random Forest sob diferentes resoluções espaciais. Pesquisa
Agropecuária Brasileira, 51, 1359–1370. https://doi.org/10.1590/S0100-204X2016000900035
Biswas, A., & Zhang, Y. (2018). Sampling designs for validating digital soil maps: A review.
Pedosphere, 28, 1–15. https://doi.org/10.1016/S1002-0160(18)60001-3
Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32.
https://doi.org/10.1201/9780367816377-11

Bridgland, D. (2021). The role of geomorphology in the Quaternary. Geological Society, London,
Memoirs, 16, M58-2021-14. https://doi.org/10.1144/M58-2021-14
Brochado, G. T. (2015). Atenuação do efeito do desflorestamento em dados SRTM por meio de
diferentes técnicas de interpolação. Instituto Nacional de Pesquisas Espaciais.
Cavazzi, S., Corstanje, R., Mayr, T., et al. (2013). Are fine resolution digital elevation models
always the best choice in digital soil mapping? Geoderma, 195–196, 111–121.
https://doi.org/10.1016/j.geoderma.2012.11.020
Conrad, O., Bechtel, B., Bock, M., et al. (2015). System for Automated Geoscientific Analyses
(SAGA) v. 2.1.4. Geoscientific Model Development, 8, 1991–2007.
https://doi.org/10.5194/gmd-8-1991-2015
CPRM. (2008). Mapa geológico do estado do Pará.
de EMBRAPA CNPS. (2017). Manual de Métodos de Análise de Solo, 3a Edição. EMBRAPA-CNPS.
de Reu, J., Bourgeois, J., Bats, M., et al. (2013). Application of the topographic position
index to heterogeneous landscapes. Geomorphology, 186, 39–49.
https://doi.org/10.1016/j.geomorph.2012.12.015
dos Santos, R. D., dos Santos, H. G., Ker, J. C., et al. (2015). Manual de descrição e coleta de solo
no campo, 7a edição. Sociedade Brasileira de Ciência do Solo.
Farr, T. G., Rosen, P. A., Caro, E., et al. (2007). The Shuttle Radar Topography Mission. Reviews
of Geophysics, 45, RG2004. https://doi.org/10.1029/2005RG000183
Florinsky, I. V. (2016). Influence of topography on soil properties. In Digital Terrain analysis in
soil science and geology (2nd ed., p. 482). Elsevier.
Florinsky, I. V., Eilers, R. G., Manning, G. R., & Fuller, L. G. (2002). Prediction of soil properties
by digital terrain modelling. Environmental Modelling & Software, 17, 295–311.
https://doi.org/10.1016/S1364-8152(01)00067-6
GRASS DT. (2019). Geographic Resources Analysis Support System (GRASS) Software, Version
7.8.
Guilbert, E., Boguslawski, P., & Isikdag, U. (2019). Multidimensional and multiscale GIS. ISPRS
International Journal of Geo-Information, 8, 6–8. https://doi.org/10.3390/ijgi8120523
Hengl, T. (2006). Finding the right pixel size. Computers & Geosciences, 32, 1283–1298.
https://doi.org/10.1016/j.cageo.2005.11.008
Hu, G. R., Li, X. Y., & Yang, X. F. (2020). The impact of micro-topography on the interplay
of critical zone architecture and hydrological processes at the hillslope scale: Integrated
geophysical and hydrological experiments on the Qinghai-Tibet Plateau. Journal of Hydrology,
583, 124618. https://doi.org/10.1016/j.jhydrol.2020.124618
Huggett, R. J. (1975). Soil landscape systems: A model of soil Genesis. Geoderma, 13, 1–22.
https://doi.org/10.1016/0016-7061(75)90035-X
IBGE. (2008). Estado do Pará (p. 1). Mapa de Geomorfologia.
IBGE. (2009). Manual técnico de geomorfologia, 2a. IBGE.
IBGE. (2015). Manual Técnico de Pedologia, 3a. IBGE.
Jasiewicz, J., & Stepinski, T. F. (2013). Geomorphons-a pattern recognition approach to
classification and mapping of landforms. Geomorphology, 182, 147–156.
https://doi.org/10.1016/j.geomorph.2012.11.005
Kämpf, N., & Curi, N. (2012). Formação e evolução do solo (pedogênese). In J. C. Ker, N. Curi, C.
E. G. R. Schaefer, & P. Vidal-Torrado (Eds.), Pedologia: fundamentos (pp. 207–302). SBCS.
Kettler, T. A., Doran, J. W., & Gilbert, T. L. (2001). Simplified method for soil particle-size
determination to accompany soil-quality analyses. Soil Science Society of America Journal,
65, 849–852.
Lamichhane, S., Kumar, L., & Wilson, B. (2019). Digital soil mapping algorithms and covariates
for soil organic carbon mapping and their implications: A review. Geoderma, 352, 395–413.
https://doi.org/10.1016/j.geoderma.2019.05.031
Li, Z., & Openshaw, S. (1993). A Natural Principle for the Objective Generalization of Digital
Maps. Cartography and Geographic Information Systems, 20, 19–29.
https://doi.org/10.1559/152304093782616779

Lindsay, J. B., Francioni, A., & Cockburn, J. M. H. (2019). LiDAR DEM smoothing and
the preservation of drainage features. Remote Sensing, 11, 17–19. https://doi.org/10.3390/
rs11161926
Liu, C., Frazier, P., & Kumar, L. (2007). Comparative assessment of the measures of thematic
classification accuracy. Remote Sensing of Environment, 107, 606–616. https://doi.org/10.1016/
j.rse.2006.10.010
Ma, Y., Minasny, B., Malone, B. P., & McBratney, A. B. (2019). Pedology and digital soil mapping
(DSM). European Journal of Soil Science, 70, 216–235. https://doi.org/10.1111/ejss.12790
Malone, B. P., McBratney, A. B., & Minasny, B. (2013). Spatial scaling for digital soil mapping.
Soil Science Society of America Journal, 77, 890–902. https://doi.org/10.2136/sssaj2012.0419
Malone, B. P., Minasny, B., & McBratney, A. B. (2017). Use R for Digital Soil Mapping. Springer
International Publishing.
McBratney, A. B., Mendonça Santos, M. L., & Minasny, B. (2003). On digital soil mapping.
Geoderma, 117, 3–52. https://doi.org/10.1016/S0016-7061(03)00223-4
Mendes, A. C., Truckenbrod, W., & Rodrigues, A. C. R. N. (2012). Análise faciológica da
Formação Alter do Chão (Cretáceo, Bacia do Amazonas), próximo à cidade de Óbidos, Pará,
Brasil. Revista Brasileira de Geociencias, 42, 39–57.
Milne, G. (1935). Some suggested units of classification and mapping particularly for East African
soils. Soil Research, 4, 183–198.
Minasny, B., & McBratney, A. B. (2006). A conditioned Latin hypercube method for sampling
in the presence of ancillary information. Computational Geosciences, 32, 1378–1388. https://
doi.org/10.1016/j.cageo.2005.12.009
Möller, M., & Volk, M. (2015). Effective map scales for soil transport processes and related process
domains - Statistical and spatial characterization of their scale-specific inaccuracies. Geoderma,
247–248, 151–160. https://doi.org/10.1016/j.geoderma.2015.02.003
Pachepsky, Y., & Hill, R. L. (2017). Scale and scaling in soils. Geoderma, 287, 4–30. https://
doi.org/10.1016/j.geoderma.2016.08.017
Paluszynska, A., Biecek, P., & Jiang, Y. (2019). randomForestExplainer: Explaining and Visualizing.
QGIS Development Team. (2019). QGIS Geographic Information System.
R Core Team. (2019). R: A language and environment for statistical computing.
Samuel-Rosa, A., Heuvelink, G. B. M., Vasques, G. M., & Anjos, L. H. C. (2015). Do more
detailed environmental covariates deliver more accurate soil maps? Geoderma, 243–244, 214–
227. https://doi.org/10.1016/j.geoderma.2014.12.017
Santos, H. G., Jacomine, P., dos Anjos, L. H. C., et al. (2018). Sistema brasileiro de classificação
de solos (5th ed.). Sociedade Brasileira de Ciência do Solo.
Schaefer, C. E. G. R., de Lima, H. N., Teixeira, W. G., et al. (2017). Solos da região amazônica.
In N. Curi, J. C. Ker, R. F. Novais, et al. (Eds.), Pedologia - Solos dos Biomas Brasileiros (pp.
75–111). SBCS.
Schmidt, J., & Andrew, R. (2005). Multi-scale landform characterization. Area, 37, 341–350.
https://doi.org/10.1111/j.1475-4762.2005.00638.x
Shi, J., Yang, L., Zhu, A.-X., et al. (2018). Machine-learning variables at different scales vs.
knowledge-based variables for mapping multiple soil properties. Soil Science Society of
America Journal, 82, 645–656. https://doi.org/10.2136/sssaj2017.11.0392
Smith, M. P., Zhu, A. X., Burt, J. E., & Stiles, C. (2006). The effects of DEM resolution
and neighborhood size on digital soil survey. Geoderma, 137, 58–69. https://doi.org/10.1016/
j.geoderma.2006.07.002
Somoza, R., & Ghidella, M. E. (2012). Late Cretaceous to recent plate motions in western South
America revisited. Earth and Planetary Science Letters, 331–332, 152–163. https://doi.org/
10.1016/j.epsl.2012.03.003
Szypuła, B., & Wieczorek, M. (2020). Geomorphometric relief classification with the k-median
method in the Silesian Upland, southern Poland. Frontiers in Earth Science, 14, 152–170.
https://doi.org/10.1007/s11707-019-0765-9
Targulian, V. O., & Krasilnikov, P. V. (2007). Soil system and pedogenic processes: Self-
organization, time scales, and environmental significance. Catena (Amst), 71, 373–381. https:/
/doi.org/10.1016/j.catena.2007.03.007
U.S. Geological Survey. (2019). Landsat 8 Surface Reflectance Code (LASRC) Product Guide (No.
LSDS-1368 Version 2.0), p. 40.
Wadoux, A. M. J. C., Minasny, B., & McBratney, A. B. (2020). Machine learning for digital
soil mapping: Applications, challenges and suggested solutions. Earth-Science Reviews, 210,
103359. https://doi.org/10.1016/j.earscirev.2020.103359
Wadoux, A. M. J. C., Heuvelink, G. B. M., Lark, R. M., et al. (2021). Ten challenges for the future
of pedometrics. Geoderma, 401, 115155. https://doi.org/10.1016/j.geoderma.2021.115155
Wu, Q., Chen, Y., Wilson, J. P., et al. (2019). An effective parallelization algorithm for DEM
generalization based on CUDA. Environmental Modelling and Software, 114, 64–74. https://
doi.org/10.1016/j.envsoft.2019.01.002
Wysocki, D. A., & Schoeneberger, P. J. (2011). Geomorphology of Soil Landscapes. In P. M.
Huang, Y. Li, & M. E. Sumner (Eds.), Handbook of soil science: Properties and processes (pp.
1–26). Chemical Rubber Company Press.
Zhou, Q., & Chen, Y. (2011). Generalization of DEM for terrain analysis using a compound
method. ISPRS Journal of Photogrammetry and Remote Sensing, 66, 38–45. https://doi.org/
10.1016/j.isprsjprs.2010.08.005
Zinck, J. A. (2016). The Geomorphic Landscape: Criteria for Classifying Geoforms. In Geopedol-
ogy (pp. 77–99). Springer International Publishing.
Applying Machine Learning Techniques
to Model and Map Soil Surface Texture
Using Limited Legacy Data

Luís Flávio Pereira, Cássio Marques Moquedace,
Gabriel Phelipe Nascimento Rosolem, Maria da Conceição de Sousa,
Márcio Rocha Francelino, and Elpídio Inácio Fernandes-Filho

1 Introduction

Soils are essential, non-renewable natural resources that play important roles
in the main biogeochemical processes that sustain life on Earth (Adhikari &
Hartemink, 2016; Polidoro et al., 2016). These functions are reflected in a wide
spectrum of ecosystem services for humans, such as the provision of food, fiber
and fuel, water and climate regulation, nutrient cycling support, and aesthetic
and spiritual cultural services (Adhikari & Hartemink, 2016). The provision of
ecosystem services depends on soil attributes and their interactions, which are
mainly regulated by three basic soil properties: texture, mineralogy and soil organic
matter (Palm et al., 2007). Soil texture influences physical, chemical and ecological
processes, such as water storage, availability and movement, cation exchange capacity
(fertility) and biodiversity (Ribeiro et al., 2012; Román Dobarco et al., 2017).
Unlike organic matter, soil texture and mineralogy are stable in the short and
medium term under soil management. Among these three attributes, texture
is the most commonly characterized because it is comparatively simple and
cheap to measure. Thus, soil texture is a common input to pedotransfer functions
for deriving other hydrological soil properties (Vereecken et al., 1992), and to
meteorological, hydrological and precision agriculture modeling and assessment
(Laborczi et al., 2016). In this way, soil texture maps can provide fundamental
data for understanding the conditions and trends in soil functions, which affect the
provision of ecosystem services, helping farmers, policymakers and practitioners in
related decision-making.

L. F. Pereira (✉) · C. M. Moquedace · M. R. Francelino · E. I. Fernandes-Filho
Department of Soils, Federal University of Viçosa, Viçosa, Brazil
G. P. N. Rosolem
Department of Rural Engineering, Federal University of Santa Catarina, Florianópolis, Brazil
M. d. C. de Sousa
Department of Biology, Federal University of Ceará, Fortaleza, Brazil

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
W. de Carvalho Junior et al. (eds.), Pedometrics in Brazil, Progress in Soil Science,
https://doi.org/10.1007/978-3-031-64579-2_3
The entire Brazilian territory is covered by exploratory soil surveys (maps) at a scale
of 1:1,000,000, but less than 1.5% is mapped at a scale greater than or equal to 1:50,000 (dos
Santos et al., 2013). These maps were mostly produced by conventional surveying,
photointerpretation, and manual polygon delimitation, processes that are expensive
and time-consuming. Updating these maps using the conventional procedure would
require repeating the entire production process (Zhu et al., 2001). On the
other hand, digital soil mapping techniques allow a faster and cheaper production of
soil property maps, based on the combination of these legacy data collections with
environmental covariates related to soil formation factors (McBratney et al., 2003).
However, large standardized datasets are scarce, which hinders the performance
of digital soil mapping techniques for large and geodiverse regions, especially in
developing and underdeveloped countries. Due to this limitation, the production of
soil texture maps in Brazil has been carried out mainly at the local or microregional
level (Demattê et al., 2016; Sayão & Demattê, 2018; Pinheiro et al., 2018; Chagas
et al., 2016). Among the various techniques available, the use of machine learning
in soil science has grown in recent years to model different soil properties and
phenomena (Padarian et al., 2019). Different machine learning models have been
applied to map soil texture, such as Cubist (Castro-Franco et al., 2017), Decision
Trees (Laborczi et al., 2016), Multivariate Adaptive Regression Splines (Ballabio et
al., 2016), Multiple Linear Regressions (Mondejar & Tongco, 2019) and Random
Forest (Chagas et al., 2016).
In this sense, the main objective of this study was to map the soil surface
texture (0–20 cm) in Minas Gerais state, Brazil, using machine learning techniques
and limited legacy data. For this purpose, we specifically aimed to: (i) select the
main covariates useful to model and map the soil surface texture; (ii) evaluate the
performance of different machine learning algorithms for modeling and prediction;
and (iii) quantify the uncertainties of the predicted coarse sand, fine sand, silt and
clay contents.

2 Material and Methods

2.1 Study Area

Minas Gerais is a large and geodiverse state located in southeastern Brazil (Fig.
1). The state has an area of about 587,000 km², slightly smaller than the combined area
of Portugal and Spain, and presents high geodiversity and climate variability (Fig. 2).
The climate varies along a south-north aridity gradient, from tropical with
a dry summer in the north to subtropical without a dry season and with a temperate summer
in the south, according to Köppen’s classification (Alvares et al., 2014). Altitudes
range from less than 200 m in the erosional depressions of the Doce River to
more than 2000 m in the Mantiqueira Ridges. The altitudinal and climatic gradients,
associated with different rock types, led to regoliths and soils with textures that vary
in space and depth (Fig. 2). Deep and highly weathered soils (Oxisols/Latossolos
and Ultisols/Argissolos) cover about 65% of the state area, but soils in the northern
part tend to be shallow and less weathered (Entisols/Neossolos, Alfisols/Luvissolos
and Planossolos, and Inceptisols/Cambissolos) (UFV, 2010). The complex relations
between climate and geodiversity result in a mosaic of three main vegetation
domains: Cerrado, the “Brazilian savanna” (54%); Atlantic Forest, a blend of
tropical rain and dry forests (40%); and Caatinga, a blend of several types of dry
forests (6%) (IBGE, 2019).

Fig. 1 Study area and soil samples characterization

2.2 Soil Texture Samples and Covariates

To model and map surface texture for the whole area, we used 667 georeferenced
samples (Fig. 1) containing standardized measures of coarse sand (2.00–0.20 mm),
fine sand (0.20–0.05 mm), silt (0.05–0.002 mm), and clay (< 0.002 mm) contents
at 0–20 cm depth (de Souza et al., 2016), and 119 covariates related to soil
formation factors, based on the SCORPAN model (McBratney et al., 2003) (Table
1). Covariates included climate and bioclimatic data from WorldClim (Hijmans et
al., 2005), parent material types and regolith texture (CPRM, 2004), volumetric
water content in soil (Copernicus Climate Change Service, 2019), soil classes
(UFV, 2010), elevation (NASA, 2020), and morphometric variables derived from the
digital elevation model using the RSAGA (Brenning et al., 2018) and rgrass7
(Bivand, 2019) packages in the R software (R Core Team, 2020). All covariates
were harmonized to a spatial resolution of 500 m by cubic spline resampling using the
gdalUtilities package (O’Brien, 2020).

Fig. 2 Environmental characteristics of the Minas Gerais state. Regolith texture: MC Mainly
clayey; MS Mainly sandy; MSSC Mainly sandy-silty-clayey; MSC Mainly silty-clayey; VSSC
Variable from sandy to silty-clayey; VD Variable in depth. Soils: TG Texture Gradient (mostly
Ultisols); C Cambissolos (Inceptisols); R Neossolos (Entisols); G Gleissolos (mostly Aquents).
Latossolos = Oxisols and Nitossolos = Ultisols, Oxisols (Kandic) or Alfisols

2.3 Selecting Predictors

We applied a sequential predictor selection in two steps. First, for each pair of
covariates with a Spearman correlation above 0.95, we eliminated the covariate with the
largest overall correlation with the remaining covariates, using the findCorrelation
function of the caret package (Kuhn, 2020). Second, we applied the Recursive Feature
Elimination (RFE) algorithm to the set of covariates retained in the first step. RFE
selects the best subset of predictors based on their importance
to the model. Thus, the model is optimized by using the smallest subset without
significant loss in predictive performance (Kuhn & Johnson, 2013). The subsets are
built by increasing the number of variables following an importance ranking,
which was established by initially running the model with the entire set of
covariates. We evaluated subsets of 2 to 35 covariates, increasing the subset size by
one covariate at a time. Beyond 35 covariates, subset sizes increased in steps of
five up to the total number of covariates.

Table 1 Environmental covariates used to model the soil texture for the state of Minas Gerais
Relief | Temp. | Bioclim. | Prec. | Soil and Parent Material
Elevation | Tmean 1 | BIO1 | Prec 1 | Rock Type
Aspect | Tmean 2 | BIO2 | Prec 2 | Regolith Texture
Convergence Index | Tmean 3 | BIO3 | Prec 3 | Soil Classes
Cross-Sectional Curvature | Tmean 4 | BIO4 | Prec 4 | Soil Moisture 0–7 cm
Curvature Classification | Tmean 5 | BIO5 | Prec 5 | Soil Moisture 7–28 cm
Difference | Tmean 6 | BIO6 | Prec 6 | Soil Moisture 0–28 cm
Diffuse Insolation | Tmean 7 | BIO7 | Prec 7 | –
Direct Insolation | Tmean 8 | BIO8 | Prec 8 | –
Direct to Diffuse Ratio | Tmean 9 | BIO9 | Prec 9 | –
Diurnal Anisotropic Heating | Tmean 10 | BIO10 | Prec 10 | –
Duration of Insolation | Tmean 11 | BIO11 | Prec 11 | –
Flow Line Curvature | Tmean 12 | BIO12 | Prec 12 | –
General Curvature | Tmin 1 | BIO13 | – | –
Hill Height | Tmin 2 | BIO14 | – | –
Hill Index | Tmin 3 | BIO15 | – | –
Hillslope Index | Tmin 4 | BIO16 | – | –
Landforms | Tmin 5 | BIO17 | – | –
Longitudinal Curvature | Tmin 6 | BIO18 | – | –
Mass Balance Index | Tmin 7 | BIO19 | – | –
Maximal Curvature | Tmin 8 | – | – | –
Mid-Slope Position | Tmin 9 | – | – | –
Minimal Curvature | Tmin 10 | – | – | –
Multi-resolution Ridge Top Flatness (MrRTF) | Tmin 11 | – | – | –
Normalized Height | Tmin 12 | – | – | –
Plan Curvature | Tmax 1 | – | – | –
Profile Curvature | Tmax 2 | – | – | –
Protection Index | Tmax 3 | – | – | –
Real Surface Area | Tmax 4 | – | – | –
Slope | Tmax 5 | – | – | –
Slope Height | Tmax 6 | – | – | –
Standardized Height | Tmax 7 | – | – | –
Sunrise | Tmax 8 | – | – | –
Sunset | Tmax 9 | – | – | –
Surface Specific Points | Tmax 10 | – | – | –
Tangential Curvature | Tmax 11 | – | – | –
Terrain Ruggedness Index (TRI) | Tmax 12 | – | – | –
Terrain Surface Convexity | – | – | – | –
Terrain Surface Texture | – | – | – | –
Topographic Position Index | – | – | – | –
Topographic Wetness Index | – | – | – | –
Total Curvature | – | – | – | –
Total Insolation | – | – | – | –
Valley Depth | – | – | – | –
Valley Depth | – | – | – | –
Valley Index | – | – | – | –
Vector Terrain Ruggedness | – | – | – | –
Temp. Temperature: Tmean Mean Temperature; Tmin Minimum Temperature; Tmax Maximum
Temperature.
Bioclim. Bioclimatic: BIO1 Annual Mean Temperature; BIO2 Mean Diurnal Range (Mean of
monthly (max temp – min temp)); BIO3 Isothermality (BIO2/BIO7) (×100); BIO4 Temperature
Seasonality (standard deviation ×100); BIO5 Max Temperature of Warmest Month; BIO6
Min Temperature of Coldest Month; BIO7 Temperature Annual Range (BIO5-BIO6); BIO8
Mean Temperature of Wettest Quarter; BIO9 Mean Temperature of Driest Quarter; BIO10
Mean Temperature of Warmest Quarter; BIO11 Mean Temperature of Coldest Quarter; BIO12
Annual Precipitation; BIO13 Precipitation of Wettest Month; BIO14 Precipitation of Driest
Month; BIO15 Precipitation Seasonality (Coefficient of Variation); BIO16 Precipitation of Wettest
Quarter; BIO17 Precipitation of Driest Quarter; BIO18 Precipitation of Warmest Quarter; BIO19
Precipitation of Coldest Quarter.
Prec. Precipitation.
Numbers from 1 to 12 = months from January to December
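
The two selection steps described in this section can be illustrated with a minimal Python sketch. It is not the caret implementation used in the chapter: drop_correlated is a simplified, hypothetical stand-in for the correlation filter (it repeatedly drops, among covariates in pairs above the cutoff, the one with the largest mean absolute Spearman correlation with the others), and rfe_subset_sizes only reproduces the schedule of subset sizes described in the text. Function names and the toy data are assumptions.

```python
from itertools import combinations
from statistics import mean


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def drop_correlated(columns, cutoff=0.95):
    """columns: dict name -> values. Returns the covariate names kept."""
    names = list(columns)
    corr = {frozenset(p): abs(spearman(columns[p[0]], columns[p[1]]))
            for p in combinations(names, 2)}
    kept = set(names)
    while True:
        pairs = [p for p in corr if p <= kept and corr[p] > cutoff]
        if not pairs:
            break

        def mean_corr(name):
            return mean(corr[frozenset((name, m))] for m in kept if m != name)

        candidates = sorted({n for p in pairs for n in p})
        kept.discard(max(candidates, key=mean_corr))
    return [n for n in names if n in kept]


def rfe_subset_sizes(n_total, dense_until=35, step=5):
    """Subset sizes: 2..35 one by one, then steps of 5 up to n_total."""
    sizes = list(range(2, min(dense_until, n_total) + 1))
    if n_total > dense_until:
        sizes += list(range(dense_until + step, n_total, step))
        sizes.append(n_total)
    return sizes
```

In practice the schedule would be computed for the number of covariates remaining after the correlation filter; the value 119 used in any call is only the size of the initial covariate set.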

2.4 Modeling and Mapping

The predictive modeling for each particle size fraction (coarse sand, fine sand, silt and
clay) was done using five different algorithms: Random Forest (RF), Cubist,
Multivariate Adaptive Regression Spline (MARS), Support Vector Machines with a
Radial Basis Function (SVM Radial), and Stochastic Gradient Boosting (GBM).
We used five models to reduce the uncertainty in texture
prediction by selecting the most suitable one, since each algorithm applies a
different mathematical approach to pattern recognition and prediction. For each
model, the split of the data into training and holdout test sets, predictor selection,
modeling, and mapping were performed 100 times, using 75% of the samples for training and
cross-validation and 25% for holdout testing. The metrics adopted for validation
were the coefficient of determination (R²), mean absolute error (MAE) and root-mean-
square error (RMSE). The maps predicted by the best-performing model
were assessed regarding their consistency and uncertainties. For this evaluation, we
used the average of the predicted contents, the coefficient of variation (CV) across the 100
predictions, the sum of the mean contents of all modeled texture fractions, and the silt
content calculated by difference (100% minus the sum of the modeled coarse sand,
fine sand, and clay contents).
Below, we present a brief description of each model applied in this chapter. All
models were fitted and optimized using the caret and other associated packages
(Kuhn, 2020) in the R software (R Core Team, 2020). Additional hyperparameters
not cited below were kept at their defaults.
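
The validation metrics and consistency checks named above can be sketched in Python as follows. This is an illustration under stated assumptions, not the chapter's R code: R² is computed as 1 − SS_res/SS_tot (the chapter does not state which definition was used), the CV uses the population standard deviation, and all function names are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, pstdev


def split_indices(n, train_fraction=0.75, seed=0):
    """Random split of sample indices into training and holdout sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_fraction)
    return idx[:cut], idx[cut:]


def regression_metrics(observed, predicted):
    """Return (R2, MAE, RMSE) for paired observed and predicted values."""
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = mean(abs(e) for e in errors)
    rmse = sqrt(mean(e * e for e in errors))
    obs_mean = mean(observed)
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot, mae, rmse


def coefficient_of_variation(run_predictions):
    """CV (%) of the predictions for one pixel across repeated runs."""
    return 100.0 * pstdev(run_predictions) / mean(run_predictions)


def consistency_checks(coarse_sand, fine_sand, silt, clay):
    """Sum of modeled contents and silt by difference, both in %."""
    total = coarse_sand + fine_sand + silt + clay
    silt_by_difference = 100.0 - (coarse_sand + fine_sand + clay)
    return total, silt_by_difference
```

A sum of contents close to 100% and a silt-by-difference value close to the modeled silt content would indicate that the four independently modeled fractions are mutually consistent.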

2.4.1 Random Forest (RF)

The RF algorithm in the randomForest package (Liaw & Wiener, 2002) is based
on an ensemble of decision trees, in which the predicted value is the average
of the predictions from all trees. During training, the samples are bootstrapped
and the covariates are randomly sampled in order to reduce redundancy among
trees and the risk of overfitting (Wadoux et al., 2020). For RF, we
optimized the number of covariates randomly sampled as split candidates, the so-called mtry
hyperparameter.
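
The three ingredients described above (bootstrapped samples, random covariate sampling and averaging) can be sketched in a few lines of Python. This is a deliberately reduced illustration, not the randomForest implementation: each "tree" is a single-split regression stump, and the function names and toy settings are assumptions.

```python
import random
from statistics import mean


def fit_stump(rows, targets, candidate_features):
    """Best single split (feature, threshold) by squared error."""
    best = None
    for f in candidate_features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [t for r, t in zip(rows, targets) if r[f] <= thr]
            right = [t for r, t in zip(rows, targets) if r[f] > thr]
            ml, mr = mean(left), mean(right)
            sse = (sum((t - ml) ** 2 for t in left)
                   + sum((t - mr) ** 2 for t in right))
            if best is None or sse < best[0]:
                best = (sse, f, thr, ml, mr)
    if best is None:  # every candidate covariate is constant in this sample
        m = mean(targets)
        return (None, 0.0, m, m)
    return best[1:]


def predict_stump(stump, row):
    f, thr, ml, mr = stump
    if f is None:
        return ml
    return ml if row[f] <= thr else mr


def fit_forest(rows, targets, n_trees=50, mtry=1, seed=0):
    """Fit n_trees stumps, each on a bootstrap sample and mtry covariates."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]    # bootstrap (with replacement)
        feats = rng.sample(range(n_features), mtry)   # random covariate subset
        forest.append(fit_stump([rows[i] for i in idx],
                                [targets[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Ensemble prediction = average over all trees."""
    return mean(predict_stump(s, row) for s in forest)
```

Because each stump has a single split, sampling mtry covariates per stump here plays the role of sampling split candidates in a full tree.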

2.4.2 Cubist

The Cubist model in the Cubist package (Kuhn & Quinlan, 2021) works by creating
“if-then” rules. Each rule is associated with a multivariate linear model,
which calculates the predicted value when the rule is met. This approach allows
linear models to describe locally linear segments of the non-linear relationship
between predictors and response (Zeraatpisheh et al., 2019). For Cubist, the
hyperparameters committees and neighbors were optimized.
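
A minimal Python sketch of the rule-plus-linear-model idea follows. The rules, covariate names and coefficients are invented for illustration, and averaging the predictions of all matching rules is an assumption of this sketch; committees and neighbor-based corrections are not represented.

```python
def predict_with_rules(rules, row, default):
    """Apply every rule whose conditions hold and average their linear models.

    rules: list of (conditions, intercept, coefficients), where conditions is
    a list of functions row -> bool and coefficients maps covariate -> slope.
    """
    predictions = []
    for conditions, intercept, coefficients in rules:
        if all(condition(row) for condition in conditions):
            predictions.append(
                intercept + sum(c * row[name] for name, c in coefficients.items())
            )
    return sum(predictions) / len(predictions) if predictions else default


# Hypothetical rules for clay content (%); values are illustrative only.
example_rules = [
    ([lambda r: r["elevation"] <= 800], 20.0, {"precip": 0.01}),
    ([lambda r: r["elevation"] > 800], 35.0, {"precip": 0.005}),
]
```

Each rule covers one region of covariate space, so the overall response is piecewise linear even though every individual model is linear.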

2.4.3 Multivariate Adaptive Regression Spline (MARS)

The MARS model in the earth package (Milborrow, 2020) is a non-parametric
method based on partitioning a functional relationship into several linear
segments, described by basis functions and joined at knots. The model for each
segment and the total number of segments are determined automatically by initially
fitting a deliberately complex model. After that, the less important basis functions are
removed, reducing the model complexity without losses in the overall performance
(Conoscenti et al., 2015). The hyperparameters degree and nprune were optimized.
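
The piecewise-linear structure of a fitted MARS model can be illustrated with hinge basis functions. The sketch below only evaluates a model of this form; it does not perform the forward fitting or backward pruning, and the knot, coefficients and function names are illustrative assumptions.

```python
def hinge(x, knot, direction=1):
    """Hinge basis function: max(0, x - knot) or, for direction -1, max(0, knot - x)."""
    return max(0.0, direction * (x - knot))


def mars_predict(x, intercept, terms):
    """Evaluate intercept + sum of coef * hinge(x, knot, direction).

    terms: list of (coef, knot, direction) tuples for a single predictor.
    """
    return intercept + sum(coef * hinge(x, knot, direction)
                           for coef, knot, direction in terms)


# Hypothetical model with one knot at x = 10 and a different slope on each side.
example_terms = [(2.0, 10.0, 1), (-1.0, 10.0, -1)]
```

Pruning corresponds to deleting entries from the list of terms when their removal does not degrade performance.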
2.4.4 Support Vector Machines with a Radial Basis Function (SVM Radial)

The SVM Radial in the kernlab package (Karatzoglou et al., 2004; Meyer et
al., 2021) is based on statistical learning and the maximum margin principle,
maximizing the separation between the support vectors in a radial kernel (Cortes
& Vapnik, 1995). We optimized the hyperparameter C.
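
For reference, the radial basis function kernel that gives this model its name can be written as k(x, y) = exp(−σ‖x − y‖²). The short Python sketch below evaluates it; the value of sigma is illustrative, since the chapter reports tuning only C.

```python
from math import exp


def rbf_kernel(x, y, sigma=0.5):
    """Radial basis function kernel: exp(-sigma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-sigma * squared_distance)
```

The kernel equals 1 for identical covariate vectors and decays toward 0 as they move apart, so nearby samples in covariate space have the most influence on a prediction.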

2.4.5 Stochastic Gradient Boosting (GBM)

The GBM in the gbm package (Greenwell et al., 2020) combines boosting and
bagging to build sequential trees. After the first tree, each subsequent tree is built
considering only the residuals of the preceding trees. The stochastic component is
implemented by drawing a random subsample of the training data at each step, without
replacement (Friedman, 2002; Rahman et al., 2020). The hyperparameters
interaction.depth and n.trees were optimized.
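
The sequential fitting on residuals and the subsampling without replacement can be sketched as follows. This is a simplified illustration for a single predictor with one-split stumps as base learners, not the gbm implementation; the shrinkage and subsample fraction are illustrative values and the function names are assumptions.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Best single threshold split on one predictor, by squared error."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)
    if best is None:  # constant predictor in this subsample
        m = mean(ys)
        return (float("inf"), m, m)
    return best[1:]


def predict_stump(stump, x):
    thr, ml, mr = stump
    return ml if x <= thr else mr


def fit_gbm(xs, ys, n_trees=100, shrinkage=0.1, bag_fraction=0.5, seed=0):
    """Each stump is fitted to the current residuals on a random subsample."""
    rng = random.Random(seed)
    base = mean(ys)
    current = [base] * len(xs)
    stumps = []
    k = max(2, int(bag_fraction * len(xs)))
    for _ in range(n_trees):
        idx = rng.sample(range(len(xs)), k)            # without replacement
        residuals = [ys[i] - current[i] for i in idx]
        stump = fit_stump([xs[i] for i in idx], residuals)
        stumps.append(stump)
        current = [c + shrinkage * predict_stump(stump, x)
                   for c, x in zip(current, xs)]
    return base, shrinkage, stumps


def predict_gbm(model, x):
    base, shrinkage, stumps = model
    return base + shrinkage * sum(predict_stump(s, x) for s in stumps)
```

The shrinkage factor limits how much each new tree can change the prediction, which is why many small trees are combined rather than a few large ones.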

3 Results and Discussion

Considering the limited legacy data approach, our results showed a fair predictive
performance of the machine learning models (Fig. 3). No model showed
overfitting, with holdout test performance even superior to that observed for the
cross-validation in the training step. The Random Forest model had the best performance
for all soil particle sizes and metrics assessed, followed by Cubist, SVM Radial,
GBM and MARS, respectively. This finding is similar to other studies that also
observed Random Forest outperforming other algorithms and
geostatistical methods in modeling soil surface texture (Keshavarzi et al., 2022; Kaya
et al., 2022; Chagas et al., 2016). Therefore, we chose the Random Forest models
to map the final results. This model achieved an R² of 0.22, 0.34, 0.36 and 0.38 for
coarse sand, fine sand, silt and clay, respectively. Coarse and fine sand presented
the lowest R² among the particle size fractions, so we hypothesize that merging
both fractions into a single sand class might improve model performance. On the
other hand, the clay fraction presented the highest R², but also the largest RMSE and
MAE. However, these larger errors are related to the higher clay contents in the
study area, compared to the other texture fractions (Figs. 2 and 6).

Fig. 3 Model performance for the 100 runs for each particle size in the test and training steps
The observed range of 0.22 to 0.38 for the coefficient of determination indicates a
moderate agreement between the predicted and actual soil texture contents (Castro-
Franco et al., 2017). Although surface soil texture results cannot be directly
compared among studies because of differences in methods (for example, sample
density, environmental diversity, and training and validation approaches),
our R² values can be considered fair compared to studies using similar predictive and
validation methods (Keshavarzi et al., 2022; Castro-Franco et al., 2017; Ballabio
et al., 2016; Niang et al., 2014). Keshavarzi et al. (2022) modeled topsoil texture
(0–20 cm) in the Piedmont plain of Iran using the Random Forest algorithm with DEM
and remotely sensed covariates, and observed an R² of 0.07, −0.22 and 0.38 for
sand, silt and clay in the holdout test (30% of samples). Similarly, Castro-Franco
et al. (2017) mapped topsoil texture (0–20 cm) in the southern Argentine Pampas
using the Cubist algorithm with covariates derived from DEM, remote sensing and
WorldClim, and obtained an R² of 0.18, 0.18 and 0.07 for sand, silt and clay in the
holdout test (30% of samples). On the other hand, Ballabio et al. (2016) modeled
topsoil texture (0–20 cm) for 25 Member States of the European Union, achieving
an R² of 0.49, 0.47 and 0.50 for sand, silt and clay contents in a holdout test (25% of
samples). The authors used the MARS algorithm with covariates from remote sensing,
DEM and WorldClim data. Furthermore, Niang et al. (2014) modeled topsoil texture
in Rouville County, Canada, using Support Vector Regression and radar data, and
found R² values above 0.65 for all soil fractions in a holdout test (20% of
samples).
Regarding RMSE, our results presented higher values than those observed in other
studies (Keshavarzi et al., 2022; Shahriari et al., 2019; Castro-Franco et al., 2017;
Ballabio et al., 2016; Niang et al., 2014; Adhikari et al., 2013). Even though a lower
RMSE is associated with greater predictive capacity, this metric cannot be used to
compare different properties, since it depends on the range or magnitude of the input
data values (Henderson et al., 2005; Pinheiro et al., 2018). Thus, considering the scale
of the coarse sand, fine sand, silt and clay contents observed in the sample data (Fig. 2),
we consider the magnitude of the RMSE values satisfactory.
Regarding predictor importance, 7 to 15 of the 119 initial covariates
were needed to stabilize the performance of
the RF models, which highlights the importance of selecting predictors to build
well-fitted and parsimonious models (Fig. 4). Models for clay contents required
the lowest number of predictors to achieve the best performance, while sand and
silt were more responsive to the addition of less important predictors. In general,
the worse the performance, the greater the number of
predictors required to achieve a stable performance. This relationship appears to be
caused by the model’s difficulty in pattern recognition, in which low-importance
predictors are selected because they yield small improvements by fitting noise
(overfitting). The role of RFE in avoiding overfitting is confirmed by the decrease
in model performance with an excessive increase in the number of predictors for
coarse sand contents (Figs. 3 and 4).

Fig. 4 Mean performance of the RF model in relation to the number of variables selected during
the application of the RFE algorithm
The most important predictors and their ranking varied among particle sizes
(Fig. 5). Soil and parent material predictors were always at the top of the ranking,
followed by climate and bioclimate predictors. Regolith texture was
an important predictor for all fractions analyzed, especially for the sand fractions.
Machado and Silva (2010) described that this covariate was structured based on
the rock mineral compositions of Minas Gerais. A strong relationship between parent
material and soil surface texture was also observed by Samuel-Rosa et al. (2015)
and Pinheiro et al. (2018) in small watersheds in Southern and Southeastern Brazil,
respectively. Therefore, we believe that other covariates related to parent material,
such as airborne gamma-ray spectrometry and magnetometry data, might
improve model performance in further studies.
Soil classes had an influence similar to that of regolith texture. The reason can be
attributed to the fact that texture is a major attribute for differentiating classes at the first
and second categorical levels of the Brazilian Soil Classification System. Due to the
south-north climate trend across Minas Gerais, climate and bioclimate covariates
made a significant contribution for all particle sizes, as expected. On the other hand,
relief was the least important type of predictor, being more important for the prediction
of fine sand contents. This minor contribution of relief to texture prediction
differs from that observed in other studies. We believe that this might have happened
for two reasons. First, rescaling the DEM to a low resolution produces a smoother
surface (Riza et al., 2021), which might affect the derived covariates differently. Second,
relief effects were better captured through the climate and bioclimate
covariates, since climate and relief are strongly related in the study area (Fig. 2).

Fig. 5 Ranking of the 15 most important predictors considering the average importance of the 100
runs for the RF model
These results are also confirmed by a visual-spatial assessment (Figs. 6 and
7). In general, all particle sizes presented low CV values and few spatial and
quantitative inconsistencies in the sum-of-contents and silt-by-difference
metrics. Models had more stable predictions in the south, southeast and east parts of the
Minas Gerais state, a mainly sandy-silty-clayey region with Oxisols (Latossolos)
and Ultisols (Argissolos) (Fig. 2). Field knowledge and cartographic comparisons
also confirm a good spatial and quantitative consistency of the maps, which could be
improved by collecting more soil texture samples in areas of low sample density
(Fig. 6).
Fig. 6 Modeled maps of mean content from the 100 runs and spatial uncertainties for each particle
size using the Random Forest model

Fig. 7 Histograms of sum-of-contents and silt-by-difference maps

4 Conclusions

We conclude that machine learning techniques can produce fairly consistent maps
of soil surface texture and of the associated model uncertainties, even using limited
legacy data. Random Forest was the best model, and the most important predictors
were distributed across the SCORPAN factors. Soils are mostly clayey at the
surface, and fine sand particles are dominant in low-clay areas, which is mainly
related to parent materials and weathering conditions.

Acknowledgements This work was supported by the Conselho Nacional de Desenvolvimento
Científico e Tecnológico (CNPQ) and Coordenação de Aperfeiçoamento de Pessoal de Nível
Superior – Brasil (CAPES) – Financing code 001.

References

Adhikari, K., & Hartemink, A. E. (2016). Linking soils to ecosystem services — A global review.
Geoderma, 262, 101–111. https://doi.org/10.1016/j.geoderma.2015.08.009
Adhikari, K., Kheir, R. B., Greve, M. B., Bøcher, P. K., Malone, B. P., Minasny, B., et al. (2013).
High-resolution 3-D mapping of soil texture in Denmark. Soil Science Society of America
Journal, 77(3), 860–876. https://doi.org/10.2136/sssaj2012.0275
Alvares, C. A., Stape, J. L., Sentelhas, P. C., Gonçalves, J. L. M., & Sparovek, G. (2014). Köppen’s
climate classification map for Brazil. Meteorologische Zeitschrift, 22, 711–728. https://doi.org/
10.1127/0941-2948/2013/0507
Ballabio, C., Panagos, P., & Montanarella, L. (2016). Mapping topsoil physical properties at
European scale using the LUCAS database. Geoderma, 261, 110–123. https://doi.org/10.1016/
j.geoderma.2015.07.006
Bivand, R. (2019). rgrass7: Interface between GRASS 7 geographical information system and R.
Brenning, A., Bangs, D., & Becker, M. (2018). RSAGA: SAGA Geoprocessing and terrain
analysis.
Castro-Franco, M., Domenech, M. B., Borda, M. R., & Costa, J. (2017). A spatial dataset of topsoil
texture for the southern Argentine Pampas. Geoderma Regional, 12, 18–27.
https://doi.org/10.1016/j.geodrs.2017.11.003
Chagas, C., de Carvalho Junior, W., Bhering, S. B., & Calderano Filho, B. (2016). Spatial
prediction of soil surface texture in a semiarid region using random forest and multiple linear
regressions. Catena, 139, 232–240. https://doi.org/10.1016/j.catena.2016.01.001
Conoscenti, C., Ciaccio, M., Caraballo-Arias, N. A., Gómez-Gutiérrez, Á., Rotigliano, E., &
Agnesi, V. (2015). Assessment of susceptibility to earth-flow landslide using logistic regression
and multivariate adaptive regression splines: A case of the Belice River basin (western Sicily,
Italy). Geomorphology, 242, 49–64. https://doi.org/10.1016/j.geomorph.2014.09.020
Copernicus Climate Change Service. (2019). ERA5-Land hourly data from 2001 to present.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273–297.
https://doi.org/10.1007/BF00994018
CPRM. (2004). Carta Geológica do Brasil ao Milionésimo: sistema de informações geográficas–
SIG [Geological Map of Brazil 1:1,000,000 scale: Geographic Information System–GIS].
de Souza, J. J. L. L., et al. (2016). Geochemistry and spatial variability of metal(loid) concen-
trations in soils of the state of Minas Gerais, Brazil. Science of the Total Environment, 505,
338–349.
Demattê, J. A. M., Alves, M. R., da Terra, F. S., Bosquilia, R. W. D., Fongaro, C. T., & da
Barros, P. P. S. (2016). Is it possible to classify topsoil texture using a sensor located 800
km away from the surface? Revista Brasileira de Ciência do Solo, 40.
https://doi.org/10.1590/18069657rbcs20150335
dos Santos, H. G., Luiz Diamante Aglio, M., de Oliveira Dart, R., de Lourdes Breffin, M. M.,
& Silva de Souza, J. (2013). Distribuição Espacial dos Níveis de Levantamento de Solos no
Brasil. XXXIV Congresso Brasileiro De Ciência Do Solo, 1–4.
44 L. F. Pereira et al.

Friedman, J. H. (2002). Stochastic gradient boosting. Computational Statistics and Data Analysis,
38, 367–378. https://doi.org/10.1016/S0167-9473(01)00065-2
Greenwell, B., Boehmke, B., Cunningham, J., & GBM Developers. (2020). gbm: Generalized
boosted regression models.
Henderson, B. L., Bui, E. N., Moran, C. J., & Simon, D. A. P. (2005). Australia-wide predictions
of soil properties using decision trees. Geoderma, 124(3–4), 383–398.
https://doi.org/10.1016/j.geoderma.2004.06.007
Hijmans, R. J., Cameron, S. E., Parra, J. L., Jones, P. G., & Jarvis, A. (2005). Very high-resolution
interpolated climate surfaces for global land areas. International Journal of Climatology, 25,
1965–1978. https://doi.org/10.1002/joc.1276
IBGE. (2019). Biomas e Sistema Costeiro-Marinho do Brasil – 1:250 000. Available at
https://www.ibge.gov.br/geociencias/informacoes-ambientais/15842-biomas.html?=&t=sobre
Karatzoglou, A., Hornik, K., Smola, A., & Zeileis, A. (2004). kernlab – An S4 package for kernel
methods in R. Journal of Statistical Software, 11, 1–20. https://doi.org/10.18637/jss.v011.i09
Kaya, F., Başayiğit, L., Keshavarzi, A., & Francaviglia, R. (2022). Digital mapping for soil texture
class prediction in northwestern Türkiye by different machine learning algorithms. Geoderma
Regional, 31, e00584.
Keshavarzi, A., del Árbol, M. Á. S., Kaya, F., Gyasi-Agyei, Y., & Rodrigo-Comino, J. (2022).
Digital mapping of soil texture classes for efficient land management in the Piedmont plain of
Iran. Soil Use and Management, 38(4), 1705–1735.
Kuhn, M. (2020). caret: Classification and regression training.
Kuhn, M., & Johnson, K. (2013). Applied predictive modeling.
Springer New York. https://doi.org/10.1007/978-1-4614-6849-3
Kuhn, M., & Quinlan, R. (2021). Cubist: Rule- and instance-based regression modeling.
Laborczi, A., Szatmári, G., Takács, K., & Pásztor, L. (2016). Mapping of topsoil texture in
Hungary using classification trees. Journal of Maps, 12, 999–1009.
https://doi.org/10.1080/17445647.2015.1113896
Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News.
Machado, M. F., & Silva, S. F. (2010). Geodiversidade Do Estado De Minas Gerais: Programa
Geologia Do Brasil Levantamento Da Geodiversidade. Serviço Geológico Brasileiro–CPRM.
McBratney, A. B., Mendonça Santos, M. L., & Minasny, B. (2003). On digital soil mapping.
Geoderma, 117, 3–52. https://doi.org/10.1016/S0016-7061(03)00223-4
Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., & Leisch, F. (2021). e1071: Misc
Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071). TU
Wien.
Milborrow, S. (2020). earth: Multivariate adaptive regression splines.
Mondejar, J. P., & Tongco, A. F. (2019). Estimating topsoil texture fractions by digital soil
mapping – A response to the long-outdated soil map in The Philippines. Sustainable Environment
Research, 29, 31. https://doi.org/10.1186/s42834-019-0032-5
NASA JPL. (2020). NASADEM merged DEM global 1 arc second V001.
Niang, M. A., Nolin, M. C., Jégo, G., & Perron, I. (2014). Digital mapping of soil texture using
RADARSAT-2 polarimetric synthetic aperture radar data. Soil Science Society of America
Journal, 78(2), 673–684. https://doi.org/10.2136/sssaj2013.07.0307
O’Brien, J. (2020). gdalUtilities: Wrappers for “GDAL” utilities executables.
Padarian, J., Minasny, B., & McBratney, A. (2019). Machine learning and soil sciences: A review
aided by machine learning tools (pp. 1–29). https://doi.org/10.5194/soil-2019-57
Palm, C., Sanchez, P., Ahamed, S., & Awiti, A. (2007). Soils: A contemporary perspective.
Annual Review of Environment and Resources, 32, 99–129.
https://doi.org/10.1146/annurev.energy.31.020105.100307
Pinheiro, H. S. K., Carvalho Junior, W. D., Chagas, C. D. S., Anjos, L. H. C. D., & Owens, P. R.
(2018). Prediction of topsoil texture through regression trees and multiple linear regressions.
Revista Brasileira de Ciência do Solo, 42. https://doi.org/10.1590/18069657rbcs20170167
Polidoro, J. C., Mendonça-Santos, M. L., Lumbreras, J. F., Coelho, M. R., Carvalho Filho, A.,
Motta, P. E. F., Carvalho Junior, W., Araújo Filho, J. C., Curcio, G. R., Correia, J. R., Martins,
E. S., Spera, S. T., Oliveira, S. R. M., Bolfe, E. L., Manzatto, C. V., Tosto, S. G., Venturieri,
A., Sa, I. B., Oliveira, V. A., Shinzato, E., Anjos, L. H. C., Valladares, G. S., Ribeiro, J. L.,
Medeiros, P. S. C., Moreira, F. M. S., Silva, L. S. L., Sequinatto, L., Aglio, M. L. D., & Dart,
R. O. (2016). Programa Nacional de Solos do Brasil (PronaSolos). Embrapa Solos.
R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for
Statistical Computing.
Rahman, M. M., Zhang, X., Ahmed, I., Iqbal, Z., Zeraatpisheh, M., Kanzaki, M., & Xu, M. (2020).
Remote sensing-based mapping of senescent leaf C:N ratio in the Sundarbans reserved
forest using machine learning techniques. Remote Sensing, 12, 1375.
https://doi.org/10.3390/RS12091375
Ribeiro, M. R., de Oliveira, L. B., & de Araújo Filho, J. C. (2012). Caracterização morfológica do
solo. In Pedologia: Fundamentos (p. 343). SBCS.
Riza, S., Sekine, M., Kanno, A., Yamamoto, K., Imai, T., & Higuchi, T. (2021). Modeling soil
landscapes and soil textures using hyperscale terrain attributes. Geoderma, 402, 115177.
https://doi.org/10.1016/j.geoderma.2021.115177
Román Dobarco, M., Arrouays, D., Lagacherie, P., Ciampalini, R., & Saby, N. P. A. (2017).
Prediction of topsoil texture for Region Centre (France) applying model ensemble methods.
Geoderma, 298, 67–77. https://doi.org/10.1016/j.geoderma.2017.03.015
Samuel-Rosa, A., Heuvelink, G. B. M., Vasques, G. M., & Anjos, L. H. C. (2015). Do more detailed
environmental covariates deliver more accurate soil maps? Geoderma, 243, 214–227.
https://doi.org/10.1016/j.geoderma.2014.12.017
Sayão, V. M., & Demattê, J. A. M. (2018). Soil texture and organic carbon mapping using surface
temperature and reflectance spectra in Southeast Brazil. Geoderma Regional, 14, e00174.
https://doi.org/10.1016/j.geodrs.2018.e00174
Shahriari, M., Delbari, M., Afrasiab, P., & Pahlavan-Rad, M. R. (2019). Predicting regional spatial
distribution of soil texture in floodplains using remote sensing data: A case of southeastern Iran.
Catena, 182, 104149. https://doi.org/10.1016/j.catena.2019.104149
UFV et al. (2010). Mapa de solos do Estado de Minas Gerais. Universidade Federal de Viçosa;
Fundação Centro Tecnológico de Minas Gerais; Universidade Federal de Lavras; Fundação
Estadual do Meio Ambiente.
Vereecken, H., Diels, J., Van Orshoven, J., Feyen, J., & Bouma, J. (1992). Functional evaluation
of pedotransfer functions for the estimation of soil hydraulic properties. Soil Science Society of
America Journal, 56, 1371–1378. https://doi.org/10.2136/sssaj1992.03615995005600050007x
Wadoux, A. M. J. C., Samuel-Rosa, A., Poggio, L., & Mulder, V. L. (2020). A note on knowledge
discovery and machine learning in digital soil mapping. European Journal of Soil Science, 71,
133–136. https://doi.org/10.1111/ejss.12909
Zeraatpisheh, M., Ayoubi, S., Jafari, A., Tajik, S., & Finke, P. (2019). Digital mapping of soil
properties using multiple machine learning in a semi-arid region, central Iran. Geoderma, 338,
445–452. https://doi.org/10.1016/j.geoderma.2018.09.006
Zhu, A. X., Hudson, B., Burt, J., Lubich, K., & Simonson, D. (2001). Soil mapping using GIS,
expert knowledge, and fuzzy logic. Soil Science Society of America Journal, 65, 1463–1472.
https://doi.org/10.2136/sssaj2001.6551463x
Predicting Soil Physical-Hydric
Attributes Based on Pedotransfer
Functions and Algorithms for
Quantitative Pedology

Priscilla Azevedo dos Santos, Helena Saraiva Koenow Pinheiro,
Waldir de Carvalho Junior, Nilson Rendeiro Pereira,
Silvio Barge Bhering, and Igor Leite da Silva

1 Introduction

Constant changes in land use and land cover dynamics emphasize the need for a
better understanding of Earth’s surface phenomena, based on spatial data analysis and
continuous monitoring procedures that are enabled by digital mapping and modeling
with environmental covariates and field data (Santos, 2020). In-depth knowledge of
the environmental input variables introduced in predictive models makes the resulting
information more reliable and robust, and ensures that cartographic products are
representative of the investigated site’s reality (Kraemer, 2007).
García-Sinovas et al. (2001) investigated water dynamics in a hydrographic basin
and highlighted the basic infiltration rate and soil hydraulic conductivity as the

P. A. dos Santos (✉)
Modeling and Geological Evolution Graduate Program, Geoscience Institute (Petrology and
Geotectonics Department), Federal Rural University of Rio de Janeiro, Seropédica, Rio de
Janeiro, Brazil
e-mail: [email protected]
H. S. K. Pinheiro
Agronomy Institute (Soil Department), Federal Rural University of Rio de Janeiro, Seropédica,
Rio de Janeiro, Brazil
e-mail: [email protected]
W. de Carvalho Junior · N. R. Pereira · S. B. Bhering
Embrapa Soils (National Center for Soil Research), Jardim Botânico, Rio de Janeiro, Brazil
e-mail: [email protected]; [email protected]; [email protected]
I. L. da Silva
Applied Statistics Graduate Program, Pitágoras Institute (Mathematics Department), Federal
Rural University of Rio de Janeiro, Seropédica, Rio de Janeiro, Brazil
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
W. de Carvalho Junior et al. (eds.), Pedometrics in Brazil, Progress in Soil Science,
https://doi.org/10.1007/978-3-031-64579-2_4
48 P. A. dos Santos et al.

most relevant soil attributes for understanding infiltration processes and water
flow in soils. These parameters are required to predict water flow in soils; they
show high spatial variability and depend on soil structure, whose properties
condition the bir and Ksat variables. Thus, determining these parameters
is essential to enable hydropedological studies in hydrographic basins.
Brazil’s available soil databases hold little information about soil-water parameters,
such as basic infiltration rate (bir) and saturated hydraulic conductivity (Ksat),
because soil infiltration tests are not performed systematically during pedometric
surveys and because these parameters are difficult to measure in the deepest soil layers
(Santos, 2021).
The aim of the present study was to model soil physical-hydric attributes,
namely the basic infiltration rate (bir) of water in soils and the saturated hydraulic
conductivity (Ksat), in the Guapi-Macacu River basin, Rio de Janeiro State, through
the implementation of pedotransfer functions. To achieve this objective,
multiple linear regression models and tree-based machine learning algorithms were
implemented to predict these attributes in the basin’s surface and subsurface
soil layers. Furthermore, depth functions based on algorithms for quantitative
pedology (AQP) were applied to represent the complex information
inherent in soil profiles.

2 Materials and Methods

2.1 Study Site and Hydrological Featuring

The Guapi-Macacu River basin is located in the Rio de Janeiro metropolitan region; it covers
the area known as Guanabara Bay, which is defined by the State Environmental
Institute (INEA) as part of Hydrographic Region V (HR-V) (Hwa et al., 2010). The
basin lies in southeastern Rio de Janeiro State and connects the municipalities of
Guapimirim, Cachoeiras de Macacu and Itaboraí and the village of Guapiaçu. The basin has a
catchment area of approximately 1250 km² enclosed by a 199.2 km perimeter.
According to Pinheiro (2015) and Santos (2020), the hydrological characterization of
the basin showed that site geometry and morphometry affect the basic infiltration
rate (bir) and the saturated hydraulic conductivity (Ksat), mainly in association
with the presence or absence of soil hydromorphism.

2.2 Soil Database and Infiltration Tests

The pedological database comprised physical and chemical data about the
Guapimirim-Macacu River basin, which were collected during a soil survey carried out by
Embrapa Solos (Chagas et al., 2015). Physical-hydric measurements (bir and Ksat)
Predicting Soil Physical-Hydric Attributes Based on . . . 49

Fig. 1 Map depicting the distribution of bir and Ksat sampling points in the basin (Santos, 2021)

were associated with profiles described by Chagas et al. (2015), complementing
the region’s pedological database. In total, 36 soil profile points were subjected to
descriptive, morphological and physical-chemical analyses, as well as to soil infiltration
tests. The distribution of the analyzed soil profile points is shown in Fig. 1.
Bir data were collected with a Guelph permeameter (Moisture, 2012), with readings
taken at 1-to-5-minute intervals depending on the flow from the
reservoir where the measurement was monitored. Measured bir values were entered
in the conversion spreadsheet developed by the permeameter manufacturer in
order to calculate Ksat, based on the Richards equation (Richards, 1931). The water
column height, the radius of the holes drilled in the soil (different for surface and subsurface
measurements), the soil texture and the measured bir values (cm min⁻¹) were
the parameters required to calculate Ksat.
Two bir measurements per soil layer (surface layer: 0–20 cm; subsurface
layer: 20–40 cm) were performed at each sampling point, in order to obtain
representative bir values for the sampled soil layers and to enable reliable
modeling of the spatial variability of physical-hydric attributes throughout the soil
profiles. The average depth of the surface and subsurface soil layers was considered
for soil profile characterization and Ksat calculation purposes.

2.3 Modeling Based on Pedotransfer Functions for Soil Physical-Hydraulic Attributes Determination

Pedotransfer functions (PTFs) are used to overcome the lack of continuous information
and measurements of attributes along soil profiles. The rationale of PTFs is based
on calibrating mathematical equations and statistical methods capable of
estimating physical-hydric properties along soil profiles (estimates of vertical water
flow transfer) for which pedological surveys lack information (Wadoux
et al., 2020). Data harmonization is an important pre-processing step
in PTF application; it can be performed with Algorithms for Quantitative
Pedology (AQP) (Beaudette et al., 2013; Junior et al., 2015; Pinheiro et al., 2016,
2018; Xavier et al., 2019).
According to Wösten et al. (2001), parametric models based on Multiple Linear
Regression (MLR) are the simplest and most often used models in PTFs. Among
the models already implemented by the scientific community, Tranter et al.
(2007) mentioned tree-based models (Regression Trees and Random Forest)
as comprehensive non-parametric approaches that perform well in predicting
parameters such as soil bulk density and saturated hydraulic conductivity; these
approaches make significant contributions to soil mapping at depth.
The accuracy and precision of models based on regression and machine learning
are often evaluated through three basic metrics: RMSE, MAE and R². The
Root Mean Squared Error (RMSE) (Eq. (1)), as its name suggests, is the square
root of the mean squared difference between the observed and predicted
values of the investigated variable (Morettin & Bussab, 2017). Computing
this index under cross-validation enables an objective assessment of the models
and the selection of the best model to be used in the PTF.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{1}$$

where $\hat{y}_i$ is the predicted or estimated value recorded for the investigated
variable; $y_i$ is the real or observed value recorded for the investigated variable; $n$ is
the total number of observations in the sample; and $y_i - \hat{y}_i = \varepsilon_i$ is the model’s residual.
The Mean Absolute Error (MAE) (Eq. (2)) is the mean
absolute difference between the observed and predicted values
of the investigated variable (Morettin & Bussab,
2017). MAE can be used to assess a model’s performance when there are no outliers
or when their values do not directly influence the accuracy of the model. The lower the
MAE, the higher the accuracy of the fitted model.

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{2}$$
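
As an illustration of Eqs. (1) and (2), the two error metrics can be sketched in a few lines of Python. This is a minimal sketch, not the code used in the study; the function names and the sample values are hypothetical and chosen only so that one large residual shows how RMSE penalizes outliers more strongly than MAE.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, following Eq. (1)."""
    n = len(observed)
    squared = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return math.sqrt(squared / n)


def mae(observed, predicted):
    """Mean absolute error, following Eq. (2)."""
    n = len(observed)
    return sum(abs(y - y_hat) for y, y_hat in zip(observed, predicted)) / n


# Hypothetical values (not from the study): three exact predictions
# and one residual of -4.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 8.0]

print(rmse(observed, predicted))  # 2.0
print(mae(observed, predicted))   # 1.0
```

With a single large residual, RMSE is twice the MAE here, which is consistent with the remark that MAE is the less outlier-sensitive of the two metrics.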

The coefficient of determination (R²) indicates
the extent to which the variability of the dependent variable is
explained by the independent variables. In other words, it reflects the strength of
the association between the predicted variable and the model, on a scale ranging
from 0 to 1, based on the analysis of covariance of predictors (Morettin & Bussab,
2017). Therefore, a well-fitted model is expected to have a high R²
value (higher than 0.70, or 70%). The quality of the regression fit is measured through
R² by comparing the Regression Sum of Squares (RSS), which represents
the variability of the response variable ($\hat{y}_i$) explained by the regression line, to the
Total Sum of Squares (TSS), which represents the variability of the observed values ($y_i$)
around their mean. This leads to the expression in Eq. (3):
$$R^2 = \frac{RSS}{TSS} = \frac{\sum_{i=1}^{n}\left(\hat{y}_i - \bar{y}\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \tag{3}$$

The regression is perfect when observed and estimated values are equal, in
other words, when the Sum of Squared Errors (SSE) of the model is equal to
zero; in that case, R² equals 1 (one) (Morettin & Bussab, 2017). The model is
non-significant when the values estimated for the response variable
are equal to the mean of its observed values; thus, RSS is equal to zero and
the coefficient of determination of the model equals 0 (zero) (Morettin &
Bussab, 2017). The coefficient of determination provides a quality metric for the fit of
linear regression models (simple or multiple), and it is currently also adopted to evaluate
the quality of quantitative predictors in machine learning.
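
The two limiting cases described above can be checked numerically. The following is a minimal Python sketch under stated assumptions: the function names and data are hypothetical, and two forms of the coefficient are shown, the ratio RSS/TSS from the text and the residual form 1 − SSE/TSS, which coincide for ordinary least squares with an intercept but may differ for other models.

```python
def sums_of_squares(observed, predicted):
    """Return (RSS, SSE, TSS) for paired observed and predicted values."""
    mean_obs = sum(observed) / len(observed)
    rss = sum((y_hat - mean_obs) ** 2 for y_hat in predicted)
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    tss = sum((y - mean_obs) ** 2 for y in observed)
    return rss, sse, tss


def r2_explained(observed, predicted):
    """R² as the ratio RSS / TSS."""
    rss, _, tss = sums_of_squares(observed, predicted)
    return rss / tss


def r2_residual(observed, predicted):
    """R² as 1 - SSE / TSS (assumed alternative form)."""
    _, sse, tss = sums_of_squares(observed, predicted)
    return 1.0 - sse / tss


# Hypothetical observations with mean 5.0.
observed = [2.0, 4.0, 6.0, 8.0]
perfect = list(observed)           # SSE = 0
mean_only = [5.0, 5.0, 5.0, 5.0]   # RSS = 0

print(r2_explained(observed, perfect), r2_residual(observed, perfect))      # 1.0 1.0
print(r2_explained(observed, mean_only), r2_residual(observed, mean_only))  # 0.0 0.0
```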

2.4 Methodology

Descriptive statistical analysis was initially applied to the collected data to identify
soil attributes capable of acting as potential input variables (potential predictors)
in pedotransfer models, as well as to identify the input variables
intrinsically associated with physical-hydric soil attributes (Ksat and bir) through
hydromorphism-based separation.
The database comprised independent variables derived from Chagas et al. (2015),
covering the following physical-chemical soil attributes and information: granulometric
composition (fractions of earth, pebbles, gravel), texture (sand, silt, clay), clay
dispersed in water, porosity, particle density, soil density, organic carbon, soil color
(hue, chroma and value), T value, soil pH in water, profile identification (ID),
geolocation (UTM¹ X, Y and Z coordinates) and soil classification based on the
Brazilian Soil Classification System (SiBCS) (dos Santos et al., 2018).

1 UTM = Universal Transverse Mercator Coordinate System.


According to Chagas (2006) and Pinheiro (2012), texture (granulometric
composition), dispersed clay, porosity and organic matter content (organic carbon) in
soils are important attributes in hydrological studies, since they are directly
associated with water infiltration and flow in the soil. Other attributes, such as
soil density, pH, and T value, were analyzed as potentially important predictors
in the modeling process. These attributes were subjected to prior
exploratory analysis in order to assess their behavior in relation to the physical-hydric
variables. Variables presenting a Pearson’s correlation coefficient (R) higher than 0.5
were retained in the model.
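
The correlation screening described above can be sketched as follows. This is a hedged illustration in Python rather than the R workflow used in the study: the function names, the attribute values and the target values are hypothetical, and the filter uses the signed coefficient as written in the text (whether the absolute value was used is not stated, so that choice is an assumption).

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def select_predictors(candidates, target, threshold=0.5):
    """Keep candidate attributes whose correlation with the target exceeds the threshold.

    Assumption: the signed coefficient is compared, as stated in the text;
    comparing abs(r) instead would also keep strong negative correlations.
    """
    return [name for name, values in candidates.items()
            if pearson_r(values, target) > threshold]


# Hypothetical data (not from the study).
ksat = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "porosity": [40.0, 44.0, 47.0, 52.0, 55.0],  # increases with the target
    "ph": [5.0, 6.0, 5.0, 6.0, 5.0],             # no trend with the target
}

print(select_predictors(candidates, ksat))  # ['porosity']
```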
The predictors selected to compose the final database, which are theoretically
associated with water variables such as the basic water infiltration rate in the soil (bir) and
the saturated hydraulic conductivity (Ksat), comprised 10 (ten) soil attributes, namely:
sand, silt, clay, clay dispersed in water, T value, pH in water,
soil density (sd), particle density (pd), porosity, and organic carbon.
After the exploratory analysis was complete, AQP tools were applied for
data harmonization, slicing and homogenization, following Beaudette et al. (2013).
The following soil-depth functions were applied for this purpose: (a)
spline function, for data standardization; (b) slice-wise technique, for slicing
and harmonizing soil horizons (at 1-cm intervals); and (c) slab-wise function,
for aggregating data over specified depth intervals (two intervals
of 20 cm each) (Fig. 2). The harmonization process resulted in four tables used to
model pedotransfer functions (Fig. 3).
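
The slicing and aggregation steps can be illustrated with a small, self-contained Python sketch. It does not use or reproduce the aqp package; the function names, the horizon limits and the clay values are hypothetical, and the spline step is omitted. The sketch only shows the idea of cutting horizons into 1-cm slices and then averaging the slices over the 0–20 cm and 20–40 cm intervals.

```python
def slice_profile(horizons, max_depth):
    """Return one value per 1-cm slice (None where no horizon covers the slice).

    horizons: list of (top_cm, bottom_cm, value) tuples.
    """
    slices = []
    for depth in range(max_depth):
        value = None
        for top, bottom, horizon_value in horizons:
            if top <= depth < bottom:
                value = horizon_value
                break
        slices.append(value)
    return slices


def slab_means(slices, intervals):
    """Average the 1-cm slices over each (top_cm, bottom_cm) interval."""
    result = {}
    for top, bottom in intervals:
        values = [v for v in slices[top:bottom] if v is not None]
        result[(top, bottom)] = sum(values) / len(values) if values else None
    return result


# Hypothetical clay contents (g/kg) for a three-horizon profile.
horizons = [(0, 12, 200.0), (12, 30, 300.0), (30, 60, 400.0)]

slices = slice_profile(horizons, 40)
slabs = slab_means(slices, [(0, 20), (20, 40)])
print(slabs)  # {(0, 20): 240.0, (20, 40): 350.0}
```

The 0–20 cm slab mixes 12 cm of the first horizon with 8 cm of the second, so its mean is a thickness-weighted average of the two horizon values.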

Fig. 2 Research methodology flowchart (Santos, 2021)


Fig. 3 Procedures flow adopted to estimate pedotransfer functions for the dataset (Santos, 2021)

The database was separated according to the hydrological characterization of the basin,
based on hydromorphism. Thus, it was split into hydromorphic
(12 samples) and non-hydromorphic (24 samples) soils, as well as by
soil layer. Next, Recursive Feature Elimination (RFE) (Guyon et al., 2002) was
applied to select the soil attributes most strongly associated with the
physical-hydric attributes.
Pedotransfer functions were implemented and calibrated with regression
(Multiple Linear Regression, MLR) and tree models (Regression Trees, RT, and
Random Forest, RF), using bir and Ksat data separated by soil layer
(0–20 cm and 20–40 cm) and hydromorphism. The implemented models were used
to estimate target attribute values for the horizons of the remaining 83 points of
the basin’s total soil database, considering the two soil layers adopted in this
research (0–20 cm and 20–40 cm). The fitted models were evaluated through
cross-validation, based on statistical metrics of quality and accuracy (MAE, RMSE and
R²).
Validation results were used to select the final models to estimate the variability
of physical-hydric soil attributes. The adopted methodological process is presented
in Fig. 2. All procedures adopted for the processing, statistical analysis and elaboration
of pedotransfer functions were performed in the R and RStudio software (Team,
2020b,a).
Given the hydromorphic nature of the evaluated soils and the analysis applied
to different soil layers, the models
were trained separately, thus generating 4 (four) functions for each of the three
proposed models (MLR, RT and RF), as shown in detail in Fig. 3.
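
The stratified training design can be sketched as a simple bookkeeping exercise. The Python snippet below is illustrative only: the record fields and values are hypothetical, and it assumes that the four functions per model refer to one target attribute at a time (two hydromorphism conditions by two soil layers).

```python
from itertools import product

CONDITIONS = ("hydromorphic", "non-hydromorphic")
LAYERS = ("0-20 cm", "20-40 cm")
MODELS = ("MLR", "RT", "RF")


def split_by_condition_and_layer(records):
    """Group records into one subset per (condition, layer) combination."""
    subsets = {key: [] for key in product(CONDITIONS, LAYERS)}
    for record in records:
        subsets[(record["condition"], record["layer"])].append(record)
    return subsets


# Hypothetical records (not from the study).
records = [
    {"profile": "P1", "condition": "hydromorphic", "layer": "0-20 cm", "ksat": 0.010},
    {"profile": "P1", "condition": "hydromorphic", "layer": "20-40 cm", "ksat": 0.004},
    {"profile": "P2", "condition": "non-hydromorphic", "layer": "0-20 cm", "ksat": 0.030},
    {"profile": "P2", "condition": "non-hydromorphic", "layer": "20-40 cm", "ksat": 0.020},
]

subsets = split_by_condition_and_layer(records)

# One function per model and subset, for a given target attribute.
fits = [(model, condition, layer)
        for model in MODELS
        for (condition, layer) in subsets]

print(len(subsets))  # 4
print(len(fits))     # 12
```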

3 Results and Discussions

Residuals of the MLR models were tested at the 5% significance level ($\alpha = 0.05$),
and the results rejected the null hypothesis ($H_0$). This finding indicated
that these models cannot be applied to model the physical-hydric data, based on the
formulated hypotheses (Table 1). The Shapiro-Wilk (SW) test (Shapiro & Wilk,
1965) returned p-values lower than the adopted significance level ($\alpha = 0.05$)
for the MLR models of both Ksat and bir, which indicated that the
residuals did not follow a normal probability distribution. The Breusch-Pagan
(BP) test (Breusch, 1978; Breusch & Pagan, 1979) indicated heteroscedasticity
in the modeled data set, which makes the model estimators inefficient and
biased, since the tested models presented p-values lower than
$\alpha = 0.05$ and $H_0$ was therefore rejected. Based
on the Durbin-Watson (DW) test (Durbin & Watson, 1951), the
MLR models applied to Ksat and bir showed autocorrelation of the model residuals
($H_0$ rejected at $\alpha = 0.05$), which suggests omitted explanatory
variables and a deficient mathematical model specification.
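
For readers unfamiliar with the Durbin-Watson diagnostic, its test statistic can be sketched as below. The formula is the standard definition (sum of squared successive differences of the residuals divided by their sum of squares) and is not quoted from the chapter; the function name and the residual values are hypothetical. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation and values toward 4 suggest negative autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for an ordered sequence of residuals."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals (not from the study).
trending = [2.0, 1.0, -1.0, -2.0]      # neighbours are similar: positive autocorrelation
alternating = [1.0, -1.0, 1.0, -1.0]   # neighbours flip sign: negative autocorrelation

print(durbin_watson(trending))     # 0.6
print(durbin_watson(alternating))  # 3.0
```

The sketch returns only the statistic; the p-value used for the decision rule in Table 1 requires the test's reference distribution, which is not reproduced here.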
Test results indicated that the analyzed data are not recommended
for developing PTFs to model physical-hydric soil attributes such as bir
and Ksat based on MLR, since the theoretical requirements were not met.
According to Izbicki and Santos (2018), these requirements need not be met
when calibrating tree-based functions (Regression Trees and Random Forest).
To circumvent this problem, the best data transformation was sought with the
bestNormalize package (Peterson & Cavanaugh, 2019).
Results showed normality of model residuals for the transformed data, although
the heteroscedasticity and dependence of the residuals remained unchanged, based on
the tests listed in Table 1. Another alternative would be to remove likely outliers
from the model; however, this solution was ruled out due to the nature
of the analyzed data (high spatial variability). The results of these tests
indicated that, for reasons intrinsic to the phenomenon described by the data
(water movement in the soil), the variability of physical-hydraulic
soil attributes could not be explained through multiple regression models due to data
heterogeneity, even after statistical transformations were applied.
Functions estimated for Ksat with the Random Forest model reached
RMSE stabilization (minimization) at approximately 250 trees, whereas
bir functions reached RMSE stabilization at about 150 trees,
in all evaluated horizons. Models adopted to estimate Ksat used, on average,
6 input variables selected through RFE. Porosity, silt and
particle density were, in that order, the most important attributes in the variable ranking.
RFE selected 5 variables, on average, in the PTFs used to estimate bir with
the Random Forest model. The importance ranking for bir showed silt, porosity, and
particle density as the most explanatory variables of the models.
According to the analysis of PTFs based on regression trees, the model used
for Ksat prediction (Fig. 4 - A, B, C, D) presented different node progression

Table 1 Regressive model adequacy analysis (Santos, 2021)

| Test | Hypotheses^a |
|---|---|
| ANOVA (Fisher-Snedecor) | $H_0$: The adjusted model has no significance in the estimated regressors ($\beta_1 = \beta_2 = \cdots = \beta_n = 0$). |
| | $H_1$: The adjusted model has significance in its estimated regressors (in at least one of them) ($\beta_1 \neq \beta_2 \neq \cdots \neq \beta_n \neq 0$). |
| Shapiro-Wilk | $H_0$: Residuals are normally distributed. |
| | $H_1$: Residuals are not normally distributed. |
| Breusch-Pagan | $H_0$: equal variances (homoscedasticity). |
| | $H_1$: different variances (heteroscedasticity). |
| Durbin-Watson | $H_0$: autocorrelation between residuals is equal to zero (independence of residuals). |
| | $H_1$: autocorrelation between residuals is different from zero (dependence of residuals). |

Interpretation: If p-value $< \alpha$^b at confidence level $\beta$^c, the null hypothesis ($H_0$) is rejected;
consequently, the alternative hypothesis ($H_1$) is accepted. If p-value $> \alpha$, the null hypothesis
($H_0$) is accepted.
^a $H_0$ and $H_1$ are the null and alternative hypotheses of the tests, respectively
^b $\alpha$ is the significance level adopted in the tests
^c $\beta = 1 - \alpha$ is the confidence level for the respective $\alpha$ adopted in the tests; $\beta_1, \beta_2, \ldots, \beta_n$ are
the slope parameters of the adjusted regression model line ($Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n$)
Source: Snedecor (1934), Durbin and Watson (1951), Shapiro and Wilk (1965), Breusch (1978),
and Breusch and Pagan (1979)

and subdivision for each estimated layer type, reaching 1 node in Z = 0–20 cm
(hydromorphic) and ranging from 2 to 4 nodes in the other layers. The hydromorphic
layer Z = 0–20 cm stood out for its fast convergence in the associated PTF
estimate; the tree structure was based solely on porosity (64% of the data were separated
at porosity < 50 cm³/100 cm³, whereas 36% were separated at porosity >
50 cm³/100 cm³) due to the hyperparameters associated with the model; this outcome
indicated a poor fit of the model estimates.
Regarding bir (Fig. 4 - E, F, G, H), the functions mostly converged
between 2 and 4 nodes, starting from different attributes. PTFs were applied to
soil profile data in order to predict bir and Ksat values for 86 sampling points
distributed over the basin (legacy survey dataset), which resulted in a total
of 394 horizons with their respective associated layers. The predicted data set
was gathered in a validation table in order to check the quality of the models used to
estimate the research variables. Values were predicted with the pedotransfer
functions calibrated for each soil layer (0–20 cm and 20–40 cm), hydrological
condition (hydromorphic and non-hydromorphic) and target attribute (bir
and Ksat) (Fig. 3).
The model validation stage was based on Leave-One-Out Cross-Validation -
LOOCV (Webb et al., 2011). All three models were evaluated based on LOOCV

Fig. 4 Regression trees and iterations in the calibration of pedotransfer function models: A–D,
saturated hydraulic conductivity (Ksat); E–H, soil basic infiltration rate (bir); by
hydromorphic (Z = 0–20 cm) and non-hydromorphic (Z = 20–40 cm) soil layers (Santos, 2021)

by using data predicted for each adjusted pedotransfer function. Statistical quality
parameters of the models are shown in Tables 2 and 3.
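
Leave-one-out cross-validation can be sketched as follows: each observation is held out in turn, the model is refitted on the remaining observations, and the held-out value is predicted. The sketch below uses a simple one-predictor linear regression as a stand-in for the chapter's models; the variable names and the porosity and Ksat values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def loocv_predictions(xs, ys):
    """Predict each observation from a model trained on all the others."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        b0, b1 = fit_line(train_x, train_y)
        predictions.append(b0 + b1 * xs[i])
    return predictions


# Hypothetical data: porosity (cm3/100 cm3) and Ksat (cm/min), exactly linear.
porosity = [40.0, 45.0, 50.0, 55.0, 60.0]
ksat = [0.002, 0.003, 0.004, 0.005, 0.006]
predicted = loocv_predictions(porosity, ksat)
for observed, estimate in zip(ksat, predicted):
    print(observed, round(estimate, 6))
```

The list of held-out predictions is what the validation metrics (MAE, RMSE and R²) are then computed on.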
Based on the metrics obtained for the PTFs estimated for each soil condition
(hydromorphism and layer depth) (Table 2), the Random Forest model achieved the
best validation performance (R² > 0.80, or 80%) for both bir and Ksat. The
regression tree model showed median performance (R² > 0.50, or 50%) in estimating
Ksat for the non-hydromorphic surface (R² = 0.6344) and subsurface (R² = 0.5511)
soil layers; both trees were primarily determined by soil particle density (Pd).
For bir, on the other hand, only the hydromorphic subsurface soil layer
(R² = 0.5610) showed such performance, with porosity as its main attribute.

Table 2 Statistical quality parameters of trained models after cross-validation (Santos, 2021)

                                     Statistical metrics
Variable  Parametersᵃ         Model  MAEᵇ     RMSEᵇ    R²
Ksat      Z = 0–20 cm; H      MLR    0.0037   0.0052   0.6727
                              RT     0.0044   0.0080   0.2394
                              RF     0.0004   0.0006   0.9951
          Z = 20–40 cm; H     MLR    0.0028   0.0045   0.6512
                              RT     0.0033   0.0065   0.2722
                              RF     0.0015   0.0033   0.8139
          Z = 0–20 cm; NH     MLR    0.0096   0.0131   0.3749
                              RT     0.0069   0.0100   0.6344
                              RF     0.0019   0.0040   0.9416
          Z = 20–40 cm; NH    MLR    0.0068   0.0102   0.4169
                              RT     0.0048   0.0090   0.5511
                              RF     0.0020   0.0036   0.9260
bir       Z = 0–20 cm; H      MLR    0.2685   0.3423   0.4591
                              RT     0.2692   0.3924   0.2894
                              RF     0.0393   0.0668   0.9794
          Z = 20–40 cm; H     MLR    0.1171   0.2045   0.7624
                              RT     0.1379   0.2780   0.5610
                              RF     0.0818   0.1539   0.8655
          Z = 0–20 cm; NH     MLR    0.6007   0.7490   0.4469
                              RT     0.5292   0.7852   0.3921
                              RF     0.0796   0.1586   0.9752
          Z = 20–40 cm; NH    MLR    0.4703   0.5762   0.5677
                              RT     0.3948   0.6746   0.4074
                              RF     0.1521   0.3115   0.8737
ᵃ Z soil layer; H hydromorphic soils; NH non-hydromorphic soils
ᵇ Unit = cm min⁻¹. PTF pedotransfer function; MLR multiple linear regression; RT regression
trees; RF random forest; MAE mean absolute error; RMSE root mean squared error; R² coefficient
of determination

The other PTFs estimated through the regression tree model (Table 2) showed
low performance (R² < 0.50, or 50%) for their respective layers and hydrological
conditions. This outcome shows that these models generalize physical-hydric soil
attributes poorly: less than 50% of the variability observed in Ksat and bir can be
explained by the input variables of the analyzed models (soil attributes).
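
The three validation metrics reported in the tables can be computed as in the minimal sketch below. It assumes R² is defined as one minus the ratio of the residual to the total sum of squares; the chapter does not state which definition was used, and the observed and predicted values here are hypothetical.

```python
import math


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean squared error."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted values (same unit, e.g. cm/min).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
print(mae(observed, predicted))
print(rmse(observed, predicted))
print(r_squared(observed, predicted))
```

Under this definition, an R² below 0.50 means that the residual sum of squares exceeds half of the total sum of squares, which is the sense in which less than 50% of the variability is explained.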
Regarding MLR, the PTFs estimated for Ksat in the hydromorphic surface and
subsurface soil layers achieved R² values higher than 0.60, indicating that more
than 60% of the Ksat variability was explained by the evaluated soil attributes
(Table 2). For bir (Table 2), only the PTF estimated for the hydromorphic
subsurface soil layer (R² = 0.7624) showed medium-to-high quality
(0.70 < R² < 0.80), and also the smallest model validation

Table 3 Total data statistical parameters obtained in cross-validation (Santos, 2021)

                   Statistical metricsᵃ
Variable  Model    MAEᵇ     RMSEᵇ    R²
Ksat      MLR      0.0066   0.0101   0.4543
          RT       0.0055   0.0091   0.5578
          RF       0.0015   0.0033   0.9409
bir       MLR      0.4218   0.5750   0.5223
          RT       0.3796   0.6281   0.5300
          RF       0.0902   0.1922   0.9466
ᵃ Predicted vs. estimated
ᵇ Unit = cm min⁻¹. MLR multiple linear regression; RT regression trees; RF random forest; MAE
mean absolute error; RMSE root mean squared error; R² coefficient of determination

errors (MAE = 0.1171 and RMSE = 0.2045). The bir PTF estimated for the
non-hydromorphic subsurface layer presented median performance
(R² = 0.5677 > 0.50), although with high estimation errors (MAE = 0.4703 and
RMSE = 0.5762).
Overall, PTFs estimated for Ksat showed lower validation errors than those for
bir (Table 2), as evidenced by the MAE and RMSE metrics, which indicated that the
predicted values were precise, although not accurate, in comparison to field
observations. For bir, the MAE and RMSE metrics indicated a poor fit of the PTFs
estimated through all three methods (Table 2). The MLR model applied to the
non-hydromorphic surface soil layer presented high PTF prediction errors
(MAE = 0.6007 and RMSE = 0.7490).
Overall, Random Forest gave the most explanatory results for PTF modeling
based on the investigated variables, in comparison to the other models (Table 3).
It recorded high values for statistical prediction indicators (R² > 0.80, or 80%)
and reduced values for errors in residuals (MAE < 0.01, or 1%, and
RMSE < 0.02, or 2%).
Statistical validation indicators (Table 3) highlighted RF as the best-performing
model in calibrating PTFs for the predicted variables and respective layers, in
comparison to MLR and RT (observed versus predicted values). The low R² of the
MLR models can be explained by non-compliance with theoretical assumptions, which
reinforces the need for data evaluation prior to modeling. Based on the comparison
of results recorded for all three models (Table 3), MLR models used in association
with machine learning (tree-based models, in this case) presented improved
performance, since tree-based prediction models use a computational structure
capable of storing a regression model in each tree leaf and partition the data set
so that the final model reaches the best-quality estimate of physical-hydric
attributes from the preliminary input variables (soil attributes).

4 Conclusions and Suggestions

The use of AQP enabled an in-depth analysis of soil attribute variability and
made it possible to correlate soil features with the investigated physical-hydric
soil attributes. Furthermore, AQP tools provided significant information that
supported the decision-making process, mainly in selecting the models' input
variables and harmonizing input data. AQP used in association with machine
learning models, mainly Random Forest, proved to be a promising tool for
preliminary studies of pedotransfer functions, and its use in research on soil
hydric functions is recommended.
Tree-based models performed better than the multiple regression model in
validating predicted data. The results indicated that, although the attributes
observed in the hydrographic basin met the statistical assumptions of linear
behavior, regression models were incapable of estimating accurate values for bir
and Ksat because of their high variability in soil profiles. In addition, machine
learning models interacted and performed better with these attributes owing to
their fitting algorithms and their ability to make assertive predictions, memorize
data and reproduce patterns in a simultaneous, iterative manner, and to deliver
the most optimized response possible.
It is worth emphasizing that although PTFs are capable of vertically predicting
variation in soil attributes (in-depth analysis), such as granulometric composition,
physical-hydric parameters may not follow this logic due to their highly dynamic
nature.
Also, the incidence of flooded soils and floods can influence the values measured
for the investigated variables and generate incorrect information during data
surveying. Although the research team was careful to check the climate and weather
in the region during field campaigns, it is suggested that surveys carried out to
collect physical-hydraulic data be parameterized by season, soil conditions, depth
and other environmental factors considered relevant in soil hydrology studies.
Results have evidenced the relevance of associating physical-hydric variables
with other soil properties. It is worth conducting studies focused on interrelating the
aforementioned variables to properties such as porosity, particle density, soil density
and clay dispersed in water; in association with predictive modeling based on soil
moisture and retention curves for more detailed and in-depth analyses.

Acknowledgments The authors acknowledge the Coordenação de Aperfeiçoamento de Pessoal de
Nível Superior (CAPES), Brazil, Finance Code 001, for providing the Master’s scholarship
to the main author of this chapter, which made this research possible. The authors would also
like to thank the Federal Rural University of Rio de Janeiro (UFRRJ) for guaranteeing public,
free and quality education, and the PPGMEG Graduate Program and PETROBRAS (Petróleo
Brasileiro S.A.) for the financial support of the project. Furthermore, the authors are grateful
to the Embrapa Solos (CNPS) center for the partnership maintained with the UFRRJ Soils
Department and for providing equipment and transport for carrying out field surveys.

Authors Contributions Conceptualization: Priscilla Azevedo dos Santos, Helena Saraiva


Koenow Pinheiro, Waldir de Carvalho Júnior, and Igor Leite da Silva; Methodology: Priscilla

Azevedo dos Santos, Helena Saraiva Koenow Pinheiro, Waldir de Carvalho Júnior, Igor Leite da
Silva, Nilson Rendeiro Pereira, Silvio Barge Bhering; Software and processing: Priscilla Azevedo
dos Santos, and Igor Leite da Silva; Validation: Priscilla Azevedo dos Santos, Helena Saraiva
Koenow Pinheiro, Igor Leite da Silva, and Waldir de Carvalho Júnior; Formal Analysis: Priscilla
Azevedo dos Santos, and Igor Leite da Silva; Investigation: Priscilla Azevedo dos Santos, Helena
Saraiva Koenow Pinheiro, and Waldir de Carvalho Júnior; Resources: Helena Saraiva Koenow
Pinheiro, and Waldir de Carvalho Júnior; Data curation: Priscilla Azevedo dos Santos, Helena
Saraiva Koenow Pinheiro, Waldir de Carvalho Júnior, Nilson Rendeiro Pereira, and Silvio Barge
Bhering; Writing – original draft: Priscilla Azevedo dos Santos; Writing – review and editing:
Priscilla Azevedo dos Santos, Helena Saraiva Koenow Pinheiro, Waldir de Carvalho Júnior, Silvio
Barge Bhering, and Igor Leite da Silva; Visualization: Priscilla Azevedo dos Santos, Helena
Saraiva Koenow Pinheiro, and Igor Leite da Silva; Supervision: Helena Saraiva Koenow Pinheiro,
and Waldir de Carvalho Júnior; Project management: Helena Saraiva Koenow Pinheiro, Waldir de
Carvalho Júnior, Nilson Rendeiro Pereira, and Priscilla Azevedo dos Santos; Funding acquisition:
Helena Saraiva Koenow Pinheiro, and Waldir de Carvalho Júnior.

Resources This research relied on external funding provided by the PPGMEG
Graduate Program in partnership with the National Petroleum Agency (ANP) and Petrobras, through
the “Alkaline and tholeiitic mafic magmatism of the Cretaceous and Paleogene in the State of Rio
de Janeiro in the continental area adjacent to the Santos and Campos basins” project (proposal
code 2017/00353-1), in support of the Santos (2021) dissertation project and the preparation of
this chapter.

Conflicts Of Interest The authors declare that there are no personal or financial conflicts of
interest that could influence the development and consolidation of this work.

References

Beaudette, D., Roudier, P., & O’Geen, A. (2013). Algorithms for quantitative pedology: A toolkit for soil scientists. Computers and Geosciences, 52, 258–268. https://doi.org/10.1016/j.cageo.2012.10.020.
Breusch, T. S. (1978). Testing for autocorrelation in dynamic linear models. Australian Economic Papers, 17(31), 334–355. https://doi.org/10.1111/j.1467-8454.1978.tb00635.x.
Breusch, T. S., & Pagan, A. R. (1979). A simple test for heteroscedasticity and random coefficient variation. Econometrica, 47(5), 1287. https://doi.org/10.2307/1911963.
Chagas, C. D. S. (2006). Mapeamento digital de solos por correlação ambiental e redes neurais em uma bacia hidrográfica no Domínio de mar de morros. Ph.D. Tese (Doutorado), Universidade Federal de Viçosa, UFV, Viçosa, Minas Gerais, Brasil. https://locus.ufv.br//handle/123456789/1672, Accessed Dec 01, 2021.
Chagas, C. D. S., Carvalho, Jr., W. D., Pereira, N. R., Bhering, S. B., Calderano, F. B., Fonseca, O. O. M. D., Pinheiro, H. S. K., Muselli, A., & Jeune, W. (2015). Levantamento de reconhecimento de alta intensidade dos solos das bacias hidrográficas dos rios Guapi-Macacu e Caceribu. Boletim de Pesquisa e Desenvolvimento Empresa Brasileira de Pesquisa Agropecuária, Embrapa Solos, Ministério da Agricultura, Pecuária e Abastecimento 257(Boletim de pesquisa e desenvolvimento, Dados Internacionais de Catalogação na Publicação (CIP), Embrapa Solos), 151. https://www.embrapa.br/busca-de-publicacoes/-/publicacao/1039563/levantamento-de-reconhecimento-de-alta-intensidade-dos-solos-das-bacias-hidrograficas-dos-rios-guapi-macacu-e-caceribu, Accessed Dec 01, 2021.
Durbin, J., & Watson, G. S. (1951). Testing for serial correlation in least squares regression. II. Biometrika, 38(1–2), 159–178. https://doi.org/10.1093/biomet/38.1-2.159.

García-Sinovas, D., Regalado, C., Muñoz-Carpena, R., & Álvarez Benedí, J. (2001). Comparación de los permeámetros de guelph y philip-dunne para la estimación de la conductividad hidráulica saturada del suelo (pp. 31–36). https://abe.ufl.edu/faculty/carpena/files/pdf/zona_no_saturada/temas_de_investigacion_v5/9.pdf, Accessed Dec 01, 2021.
Guyon, I., Weston, J., Barnhill, S., & Vapnik, V. (2002). Gene selection for cancer classification using support vector machines. Machine Learning, 46(1/3), 389–422. https://doi.org/10.1023/A:1012487302797.
Hwa, C. S., Hora, M. A. G., & Hora, A. F. (2010). Projeto Macacu. Planejamento estratégico da região hidrográfica dos rios Guapi-Macacu e Caceribu-Macacu. Região Hidrográfica Baía de Guanabara. http://www.projetomacacu.uff.br/, Accessed Dec 01, 2021.
Izbicki, R., & Santos, T. M. (2018). Machine learning sob a ótica estatística: Uma abordagem preditivista para estatística com exemplos em R. Documento de Nota Técnica, Insper Instituto de Ensino e Pesquisa, Departamento de Estatística. Universidade Federal de Santa Catarina, UFSCar, 1(1), 212. http://est.ufmg.br/marcosop/est171-ML/MachineLearning_Izbicki.pdf, Accessed Dec 01, 2021.
Junior, W. D. C., Pereira, N. R., Chagas, C. D. S., Bhering, S. B., & Calderano, F. B. (2015). Pedologia Quantitativa: O Perfil mediano e o Perfil Médio. In Congresso Brasileiro de Ciência do Solo 35. O solo e suas múltiplas funções: anais (pp. 1–4). Natal: Sociedade Brasileira de Ciência do Solo. https://ainfo.cnptia.embrapa.br/digital/bitstream/item/137493/1/2015-138.pdf, Accessed Dec 01, 2021.
Kraemer, G. B. (2007). Variabilidade espacial dos atributos do solo na delimitação das unidades de mapeamento. 2007. 87p. Ph.D. Thesis, Dissertação (Mestrado) - Programa de Pós-Graduação em Ciências do Solo, Universidade Federal do Paraná, Curitiba, PR, Universidade Federal do Paraná, Curitiba, PR. http://hdl.handle.net/1884/13764, Accessed Dec 01, 2021.
Moisture, S. (2012). Operating instructions: Guelph permeameter. https://www.soilmoisture.com/pdfs/Resource_Instructions_0898-2800_2800K1%20Guelph%20Permeameter%20.pdf, Accessed Dec 01, 2021.
Morettin, P. A., & Bussab, W. O. (2017). Estatística Básica, 8th edn. Saraiva Educação S.A., São Paulo, Brasil, Google Book. Accessed Dec 01, 2021.
Peterson, R. A., & Cavanaugh, J. E. (2019). Ordered quantile normalization: A semiparametric transformation built for the cross-validation era. Journal of Applied Statistics, 47(13–15), 1–17 (2312–2327). https://doi.org/10.1080/02664763.2019.1630372.
Pinheiro, H. S. K. (2012). Mapeamento digital de solos por redes neurais artificiais da bacia hidrográfica do rio Guapi-Macacu, RJ. Dissertação (Mestrado) - Mestrado em Agronomia, Ciência do Solo, Universidade Federal Rural do Rio de Janeiro, Instituto de Agronomia, Universidade Federal Rural do Rio de Janeiro, Seropédica, Rio de Janeiro. https://tede.ufrrj.br/jspui/handle/jspui/3644, Accessed Dec 01, 2021.
Pinheiro, H. S. K. (2015). Métodos de mapeamento digital aplicados na predição de classes e atributos dos solos da Bacia Hidrográfica do Rio Guapi Macacu, RJ. Ph.D. Thesis - Doutorado em Agronomia, Ciência do Solo, Tese (Doutorado) - Universidade Federal do Rio de Janeiro, Rio de Janeiro, Instituto de Agronomia, Universidade Federal do Rio de Janeiro, Seropédica, Rio de Janeiro. https://tede.ufrrj.br/jspui/handle/jspui/1887, Accessed Dec 01, 2021.
Pinheiro, H. S. K., Chagas, C. D. S., Carvalho, Jr., W. D., & Anjos, L. H. C. D. (2016). Ferramentas de pedometria para caracterização da composição granulométrica de perfis de solos hidromórficos. Pesquisa Agropecuária Brasileira, 51(9), 1326–1338. https://doi.org/10.1590/s0100-204x2016000900032.
Pinheiro, H. S., dos Anjos, L. H. C., Xavier, P. A., Chagas, C. S., & de Carvalho, Jr., W. (2018). Quantitative pedology to evaluate a soil profile collection from the Brazilian semi-arid region. South African Journal of Plant and Soil, 35(4), 269–279. https://doi.org/10.1080/02571862.2017.1419385.
Richards, L. A. (1931). Capillary conduction of liquids through porous mediums. Physics, 1(5), 318–333. https://doi.org/10.1063/1.1745010.
Santos, P. A. (2020). Aplicação de ferramentas SIG nas análises geométrica e morfométricas para caracterização hidrológica das bacias hidrográficas do Rio Guapi-Macacu, RJ. In: Anais da V Jornada de Geotecnologias do Estado do Rio de Janeiro (V JGEOTEC 2020), Anais [recurso eletrônico] / V Jornada de Geotecnologias do Estado do Rio de Janeiro, vol 1, 1st edn, Geopartners, Niterói, Rio de Janeiro, Brasil, p 1079. https://jgeotec.uff.br/wp-content/uploads/sites/74/2021/08/Anais_JGEOTEC_2020_UFF_v005.pdf, Accessed Dec 01, 2021.
Santos, P. A. (2021). Mapeamento e modelagem digital da variabilidade tridimensional de atributos físico-hídricos dos solos da bacia do rio Guapi-macacu - RJ, por estatística multivariada e algoritmos. 156 f. Dissertação (Mestrado em Modelagem e Evolução Geológica) - Instituto de Agronomia, Universidade Federal Rural do Rio de Janeiro, Seropédica. https://tede.ufrrj.br/jspui/handle/jspui/6870, Accessed Dec 01, 2021.
dos Santos, H., Jacomine, P., dos Anjos, L., de Oliveira, V., Lumbreras, J., Coelho, M., de Almeida, J., de Araujo, F. J., de Oliveira, J., & Cunha, T. (2018). Sistema Brasileiro de Classificação de Solos (Vol. 5, 5th ed.). Ministério da Agricultura, Pecuária e Abastecimento, Empresa Brasileira de Pesquisa Agropecuária, Embrapa Solos, Embrapa, Brasília, Distrito Federal - DF. https://www.embrapa.br/solos/busca-de-publicacoes/-/publicacao/1094003/sistema-brasileiro-de-classificacao-de-solos, Accessed Dec 01, 2021.
Shapiro, S. S., & Wilk, M. B. (1965). An analysis of variance test for normality (complete samples). Biometrika, 52(3–4), 591–611.
Snedecor, G. W. (1934). Calculation and interpretation of analysis of variance and covariance. Iowa State College Division of Industrial Science Monographs (Vol. 1). Ames: Collegiate Press. https://doi.org/10.1037/13308-000.
Team, R. (2020a). Rstudio: Integrated Development Environment for R. https://www.R-project.org/.
Team, R. C. (2020b). A language and environment for statistical computing. https://www.R-project.org/.
Tranter, G., Minasny, B., Mcbratney, A. B., Murphy, B., Mckenzie, N. J., Grundy, M., & Brough, D. (2007). Building and testing conceptual and empirical models for predicting soil bulk density. Soil Use and Management, 23(4), 437–443. https://doi.org/10.1111/j.1475-2743.2007.00092.x.
Wadoux, A. M. C., Minasny, B., & McBratney, A. B. (2020). Machine learning for digital soil mapping: Applications, challenges and suggested solutions. Earth-Science Reviews, 210(103359), 1–17. https://doi.org/10.1016/j.earscirev.2020.103359.
Webb, G. I., Sammut, C., Perlich, C., Horváth, T., Wrobel, S., Korb, K. B., Noble, W. S., Leslie, C., Lagoudakis, M. G., Quadrianto, N., Buntine, W. L., Quadrianto, N., Buntine, W. L., Getoor, L., Namata, G., Getoor, L., Han, X. J., Ting, J. A., Vijayakumar, S., Schaal, S., & Raedt, L. D. (2011). Leave-one-out cross-validation. In: C. Sammut, & G. I. Webb (Eds.), Encyclopedia of Machine Learning (pp. 600–601). Boston: Springer. https://doi.org/10.1007/978-0-387-30164-8_469.
Wösten, J., Pachepsky, Y., & Rawls, W. (2001). Pedotransfer functions: Bridging the gap between available basic soil data and missing soil hydraulic characteristics. Journal of Hydrology, 251(3–4), 123–150. https://doi.org/10.1016/S0022-1694(01)00464-4.
Xavier, P., dos Anjos, L., Pinheiro, H., Chagas, C. D. S., & Carvalho, Jr. W. D. (2019). Usage of pedometrics for data evaluation and harmonization in soil profiles from Cerrado region, Mato Grosso do Sul. In: World congress of soil science 2018, Rio de Janeiro (Vol. 21). Soil science: Beyond food and fuel. In: Proceedings of the 21st WCSS 2018 (Vol. 2, p. 75) SBCS, 2019.
Spatial Dependence of Organic Carbon
and Granulometry in Archaeological
Soils of Lagoa Grande das Queimadas,
Northeastern Brazil

Rabech Grasiely Gomes Marques, Miguel Alvores Lima Neto,


Gustavo Souza Valladares , Demétrio da Silva Mützenberg,
and Aline Gonçalves de Freitas

1 Introduction

Evidence of ancient human occupation in the Northeast Region of Brazil
dates back to the Late Pleistocene and the Early and Late Holocene (Peyre, 1993;
Martin, 1998; Guerín et al., 1999; Lahaye et al., 2013; Guidon et al., 2018), in a
scenario of intense climatic oscillations and possible interactions between humans
and megafauna mammals, and at a stage of intense technological activity, such as
lithic industries and ceramic production. These natural and anthropic changes
reflect settlement patterns, choices of and search for raw materials, socioeconomic
processes and the dispersion of archaeological sites in the landscape.
The formation processes of the archaeological record involve natural and cultural
transformations that affect the deposition of archaeological remains over time
(Schiffer, 1972; Stein, 1987). The geoarchaeological and contextual approach
integrates geoscience methods and paleoecological reconstruction from the
perspective of human ecology, in which the climate, landscape, fauna, flora and the
chemical and physical elements that make up the archaeological record, including
soil formation processes, are relevant to its cultural interpretation in space and
time (Butzer, 1989; Waters, 1992). From a systemic perspective, the study of soils
(Renfrew, 1976) and of soil geomorphology for paleoenvironmental
reconstructions (Gladfelter, 1997) helps in understanding the cultural processes
of deposition, abandonment and reoccupation of archaeological sites, as well as
natural depositional and post-depositional processes. In other words, occupancy deposits

R. G. G. Marques (✉) · M. A. Lima Neto · G. S. Valladares · A. G. d. Freitas


Federal University of Piaui, Teresina, Piauí, Brazil
e-mail: [email protected]; [email protected]
D. d. S. Mützenberg
Federal University of Pernambuco, Recife, Pernambuco, Brazil
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
W. de Carvalho Junior et al. (eds.), Pedometrics in Brazil, Progress in Soil Science,
https://doi.org/10.1007/978-3-031-64579-2_5

are artifacts of human activity, and interdisciplinary geoarchaeological,
ethnoarchaeological, biological and chemical studies may result in globally
applicable models, for any period of time, culture and environment (Shahack-Gross, 2017).
Geostatistics is widely used across several branches of science as a methodology
for analyzing data in space or time; it seeks to determine the spatial dependence
among observations of a variable. In soil science, geostatistics provides
information about the spatial variability of soil properties in a sampled area, and
this knowledge allows for proper management planning. Many works use the technique
to assess horizontal variability (Vieira, 1977; Valladares et al., 2019; Marques Jr
et al., 2014; Melo et al., 2016; Oliveira et al., 2018), but few address vertical
variability in the soil profile (Grego et al., 2011; Sampietro & da Silva Lopes,
2016). Geostatistical analysis is a well-known and efficient method for the spatial
analysis of soil attributes (Isaaks & Srivastava, 1989; Goovaerts, 1977; Vieira et
al., 2008), as it takes spatial dependence into account (Vieira, 2000).
Soil classification systems are based and structured on morphogenesis: they
consider morphological and genetic attributes, associated with physical and
chemical analyses of soils (Embrapa, 2018; Soil Survey Staff, 2014;
IUSS Working Group WRB, 2015). Lithological or sedimentary discontinuities
in the soil’s parent materials are directly related to soil formation and can also
be evidenced by morphological, physical and chemical attributes (IUSS Working
Group WRB, 2015).
Knowledge of total organic carbon (TOC) and of the physical properties of soils
related to particle size distribution is fundamental for understanding the active
processes, potential and weaknesses of a soil. Soil sampling for characterization
is carried out in layers, and the comparison between these layers is useful in
classifying soils and inferring their properties.
The depth distribution of soil attributes such as TOC and granulometry can
reveal lithological (or sedimentary) discontinuities. Evaluating the spatial
distribution of attributes across the different horizontal layers is therefore
important in elucidating this issue.
The objective of this work was to evaluate the spatial dependence with depth
of TOC contents and granulometric attributes, in order to assess lithological
discontinuity and indications of climate change, in a hydromorphic soil from an
open trench at the Lagoa Grande das Queimadas archaeological site, using
descriptive statistics and geostatistical methods.

2 Geological, Environmental and Archaeological Background

The study area comprises the Lagoa Grande das Queimadas archaeological
site (LGQ) (Fig. 1) (23L 0726128 8980968), located in the municipality of Várzea
Branca, between two conservation units, the Serra da Capivara National Park and
the Serra das Confusões National Park, both in Piauí State, Northeastern Brazil.

Fig. 1 Location of the study area, Lagoa Grande das Queimadas, Várzea Branca, Piauí State, Brazil

The regional geology comprises granites, gneisses, schists, limestones and
quartzites belonging to the Precambrian Sobradinho-Remanso Complex, inserted
in the peripheral depression of the middle São Francisco, and tertio-quaternary
dendrite-lateritic layers (CPRM, 2004). A Fluvic Gleysol occurs in the study area.
Precipitation in the area is 500 mm, with a dry season of 9 months a year,
average temperatures of 36 °C in the warmer months and minimum temperatures of
19 °C in the colder months (CPRM, 2004; Emperaire, 1989).
The regional hydrography is characterized by a set of temporary lakes and
intermittent watercourses belonging to the Canindé-Piauí Sub-Basin, whose main
watercourse is the Piauí River. The rivers follow a dendritic drainage pattern.
Neosols, Argisols and Luvisols occur in the study area under a semi-arid climate;
the predominant biome is the caatinga with patches of arboreal cerrado, shallow
soils and plant taxa such as angico (Anadenanthera), umbuzeiro (Spondias
tuberosa), favela (Cnidoscolus quercifolius), juremas (Fabaceae-Mimosoideae),
jatobás (Fabaceae-Caesalpinioideae), marmeleiro (Croton sp.), aroeira (Schinus
terebinthifolia), pau-ferro (Libidibia ferrea), mororó (Bauhinia cheilantha), juazeiro
(Ziziphus joazeiro), mandacaru (Cereus jamacaru), caroá (Bromeliaceae), among
others (Ferraz et al., 1998; Lemos & Rodal, 2002).
The region has high archaeological relevance, due to the findings of lithic
artifacts (stone tools in quartz, quartzite and flint), and rock art attesting to ancient
human occupations. The region is also home to fossiliferous deposits of the
Quaternary megafauna (Guidon et al., 2007).

3 Material and Methods

The research presents the TOC and granulometric analysis of 38 samples collected
down to a depth of 200 cm inside the Lagoa das Queimadas. Samples were collected
every 10 cm for the first 3 depths and every 5 cm for the others. For the particle
size analysis, up to 31 samples were collected, with the first 3 layers being 10 cm
thick and the others 5 cm.
TOC was quantified following the Embrapa Soil Analysis Methods Manual
(Teixeira et al., 2017). The soil sample is placed in an acidic medium and the
organic carbon is oxidized with potassium dichromate; at the end, the remaining
dichromate is titrated with ferrous ion. The granulometric analysis followed
separation by sieves for the sand fraction and the pipette method for the clay
fraction (Teixeira et al., 2017).
After this procedure, the data were tabulated and descriptive statistical
analysis was performed. To understand their spatial dependence, the data were
analyzed using semivariograms and kriging (Vieira, 2000). The depth of each layer
was used as the geographic coordinate, starting from the surface, which was
assigned the value zero. The presence of spatial dependence was verified by fitting
the experimental semivariogram (Vieira, 2000): the theoretical model that best
represented the data was fitted, described by the range (a), nugget effect (C0),
structural variance (C1), and sill (C0 + C1). The relationships among these
parameters can be used to determine whether the analyzed attribute is spatially
structured. If C0 = C0 + C1 (that is, C1 = 0, a pure nugget effect), the variation
is random, and the range indicates the strength of variability (the lower the
range, the greater the spatial variability). Thus, a geostatistical sampling scheme
can be used to optimize sampling, so that fewer samples are collected in areas with
little variability and more in areas with greater variability
(Zanão Júnior et al., 2010).
The semivariogram was estimated using the following equation:

$$\gamma^{*}(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[ Z(x_i) - Z(x_i + h) \right]^2 \tag{1}$$

where
γ*(h) = the semivariance of pairs of values separated by the distance h;
h = distance between measured values;
N(h) = number of pairs of measured points Z(xi), Z(xi + h);
Z = attribute value; and
xi = position at which the attribute was determined.
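
As an illustration of Eq. (1), the following minimal sketch computes an experimental semivariogram for a one-dimensional depth profile. The function name, the half-open lag classes and the four input values are assumptions made for the example; they are not the software or the data used in the study.

```python
import math
from itertools import combinations

def experimental_semivariogram(depths, values, lag, n_lags):
    """Classical estimator of Eq. (1) for a 1-D (depth) profile.

    Returns a list of (h, N(h), gamma*(h)) tuples, one per lag class.
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    # Visit every pair of samples once.
    for (d1, z1), (d2, z2) in combinations(zip(depths, values), 2):
        separation = abs(d1 - d2)
        # Lag class k covers (k*lag - lag/2, k*lag + lag/2]: an assumed binning rule.
        k = math.ceil(separation / lag - 0.5)
        if 1 <= k <= n_lags:
            sums[k - 1] += (z1 - z2) ** 2
            counts[k - 1] += 1
    return [
        ((k + 1) * lag,
         counts[k],
         sums[k] / (2 * counts[k]) if counts[k] else float("nan"))
        for k in range(n_lags)
    ]

# Hypothetical TOC values (g/kg) at four depths (cm); not data from the study.
for h, n_pairs, gamma in experimental_semivariogram(
        [0, 5, 10, 15], [1.0, 2.0, 4.0, 7.0], lag=5, n_lags=2):
    print(h, n_pairs, gamma)
```

Lag classes with no pairs return NaN rather than zero, so that empty classes are not mistaken for zero semivariance when a model is fitted.
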
The spatial dependence index (IDE), proposed by Zimback (2001), was calculated
to determine the degree of spatial dependence. IDE values of <25%, 25–75%,
and >75% represent weak, moderate, and strong spatial dependence, respectively.
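
The classification can be sketched as follows, assuming that the IDE is the structural variance expressed as a percentage of the sill, C1 / (C0 + C1) × 100. This assumption reproduces the 62% reported for TOC from C0 = 0.15 and C1 = 0.24; the function names are illustrative only.

```python
def spatial_dependence_index(nugget, structural_variance):
    """IDE in percent, assumed here to be 100 * C1 / (C0 + C1)."""
    sill = nugget + structural_variance
    if sill <= 0:
        raise ValueError("the sill (C0 + C1) must be positive")
    return 100.0 * structural_variance / sill

def classify_ide(ide_percent):
    """Classes quoted in the text: <25% weak, 25-75% moderate, >75% strong."""
    if ide_percent < 25:
        return "weak"
    if ide_percent <= 75:
        return "moderate"
    return "strong"

# Parameters reported for the TOC semivariogram: C0 = 0.15, C1 = 0.24.
ide_toc = spatial_dependence_index(0.15, 0.24)
print(round(ide_toc, 1), classify_ide(ide_toc))
```
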

4 Results and Discussion

The main horizons of the soil profile were identified (A, Cg, 2Agb and 2Cg; Fig. 2).
The entire profile had a hue of 5Y (yellow), with value ranging from 3 to 5
and chroma from 1 to 3, which characterizes a gley diagnostic horizon and a
reducing, hydromorphic environment. The higher chroma at the bottom of the
profile may reflect low TOC.
The presence of gravel in the uppermost meter of the soil may mark the
moment at which the environment became semi-arid, with low-biomass vegetation
cover and torrential rains, under conditions of lower weathering rates and
accelerated erosion.
Descriptive statistics (Table 1) showed that the soil has a mean TOC of
1.50 g kg−1, a standard deviation and variance of 1, a minimum of 0.18
and a maximum of 3.00, showing that the data have moderate variability.

Fig. 2 Soil profile of the Lagoa Grande das Queimadas archaeological site

Table 1 Descriptive statistics for TOC and granulometry

Variable  Unit     Mean     Min.    Max.    S        S2         A      Kurt.
TOC       g kg−1   1.50     0.18    3.00    1        1          −0.1   −1.6
Sand      g kg−1   149.07   42.5    425.5   122.26   14948.54   1.2    0.1
Silt      g kg−1   501.85   321.7   687.7   86.72    7519.79    0.4    −0.1
Clay      g kg−1   348.52   69.4    630.8   174.44   30427.88   −0.3   −1.4

TOC total organic carbon, Min. minimum, Max. maximum, S standard deviation, S2 variance,
A asymmetry, Kurt. kurtosis

Fig. 3 TOC semivariogram of Lagoa Grande das Queimadas, Fluvic Gleysol profile

In addition, asymmetry was −0.1 and kurtosis −1.6, indicating that the values
are skewed to the left and follow a platykurtic distribution.
The spatial dependence of TOC was verified by fitting the semivariogram
(Fig. 3). The model fitted the data well and indicated moderate spatial
dependence.
The fitted model had a nugget effect (C0) of 0.15 and a sill (C0 + C1) of 0.39,
indicating spatial dependence, and the range showed that the samples are
correlated over a depth interval of up to 11.58 cm (Fig. 3).
The structural variance (C1) was 0.24, the coefficient of determination (R2)
0.71, and the degree of spatial dependence 62%. The Gaussian model was the
best suited to the observed data.
Figure 4 shows the depth distribution of TOC observed in the samples and
estimated by kriging. Down to a depth of approximately 85 cm, TOC decreases and
the observed and estimated values are close, that is, deviations are small.
Between 85 and 110 cm, the TOC distribution is erratic, with greater deviations
between measured and estimated values. From 110 cm onwards, TOC contents are
markedly lower than in the upper 85 cm and the estimates again show small
deviations. The results indicate a lithological (or sedimentary) discontinuity,
consistent with the fluvic qualifier (IUSS Working Group WRB, 2015), in the
profile from 85 cm depth.
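
The comparison between measured and kriged values can be illustrated with a minimal sketch of ordinary kriging along a single depth axis. It assumes a Gaussian semivariogram in its practical-range form (factor 3 in the exponent) with the TOC parameters reported above; the depths, TOC values and function names are hypothetical and do not reproduce the software or the data used in the study.

```python
import math

def gaussian(h, c0, c1, a):
    """Gaussian semivariogram (assumed practical-range form), with gamma(0) = 0."""
    return 0.0 if h <= 0 else c0 + c1 * (1 - math.exp(-3 * (h / a) ** 2))

def solve(matrix, rhs):
    """Solve a small dense system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def ordinary_kriging(depths, values, target, model):
    """Return (estimate, kriging variance) at the target depth."""
    n = len(depths)
    # Kriging system in semivariogram form; the last row and column carry
    # the Lagrange multiplier that forces the weights to sum to one.
    system = [[model(abs(di - dj)) for dj in depths] + [1.0] for di in depths]
    system.append([1.0] * n + [0.0])
    rhs = [model(abs(di - target)) for di in depths] + [1.0]
    solution = solve(system, rhs)
    weights, lagrange = solution[:n], solution[n]
    estimate = sum(w * z for w, z in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, rhs[:n])) + lagrange
    return estimate, variance

# TOC model parameters from the text: C0 = 0.15, C1 = 0.24, range = 11.58 cm.
def toc_model(h):
    return gaussian(h, 0.15, 0.24, 11.58)

# Hypothetical depths (cm) and TOC values (g/kg); not data from the study.
depths = [0.0, 5.0, 10.0, 15.0]
toc = [2.8, 2.5, 2.1, 1.9]
print(ordinary_kriging(depths, toc, 7.5, toc_model))
```

Because the semivariogram is set to zero at zero distance, the estimate at a sampled depth reproduces the measured value, so differences between the two series arise only where a depth is predicted from its neighbours, as in cross-validation.
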

Fig. 4 Depth distribution of observed and estimated organic carbon contents in soil by kriging
[plot of total organic carbon (g/kg), from 0.00 to 3.00, against depth (cm), from 0 to −200; series: Measured and Predicted]

In the 85–90 cm layer, there is a clear increase in TOC relative to the
80–85 cm layer, indicating that the 85–90 cm layer was possibly a paleosurface
with a probable buried A horizon. Carbon-14 dating of the 80–85 cm layer
gave 3934 years BP, a period that coincides with
significant reductions in rainfall volumes in the Parnaíba River basin, with alteration
of the vegetation cover and an increase in the processes of transport and deposition
of sediments (Mendes et al., 2019; Campos et al., 2022). Therefore, the carbon
contents likely reflect the sedimentary discontinuity and the climatic setting
of the study area. It should be noted that the age of approximately 4000 years BP for
a depth of 80 cm coincides with samplings carried out in other similar landforms in
the Parnaíba River basin, indicating a period in which rainfall declined to volumes
similar to current ones (Mendes, 2016).
Descriptive statistics for soil granulometry (Table 1) showed mean values of
149.07 g kg−1 for sand, 501.85 g kg−1 for silt and 348.52 g kg−1 for clay, with
variances of 14948.54, 7519.79 and 30427.88, respectively. The standard deviation
was smallest for silt (86.72), indicating less variability in the silt values. The
results indicate a predominance of silt and clay in the studied soil and a low
degree of weathering, with a high silt:clay ratio (Valladares et al., 2017, 2020).
The coefficients of variation were 19.8% for silt, 33.0% for clay and 47.4% for
sand, representing moderate to high variability in the studied profile. In addition
to descriptive statistics, the data were analyzed using geostatistics by ordinary
kriging, with fitted semivariograms (Fig. 5).
The semivariogram for the sand fraction had a coefficient of determination (R2)
of 0.76 and a nugget effect of zero (C0 = 0), so the sill (C0 + C1) coincided
with the structural variance (C1), both being 5242.6. The range was 220 cm, and
the circular model best represented the data for this variable.

Fig. 5 Semivariograms of soil particle size fractions, Lagoa Grande das Queimadas, Fluvic
Gleysol profile. Sand (a), silt (b) and clay (c)

[Fig. 6 panels: (a) Sand (g/kg), (b) Silt (g/kg), (c) Clay (g/kg), each plotted against depth (cm); series: Measured and Predicted]

Fig. 6 Depth distribution of sand (a), silt (b) and clay (c) contents observed and estimated by
kriging

For silt, the semivariogram showed a nugget effect (C0) of 1308.2, a structural
variance (C1) of 13964, a sill (C0 + C1) of 15272 and a range of 88.62 cm. The
model that best fit the data was the Gaussian. For clay, as for the sand
fraction, the nugget effect was zero (C0 = 0), so the structural variance and
the sill were identical (C1 = C0 + C1 = 45219). The range was 180 cm, and the
spherical model best represented the data.
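
The three theoretical models named above can be written as short functions. This is a sketch under stated assumptions: the Gaussian model is given in its practical-range form (factor 3 in the exponent, so that about 95% of C1 is reached at the range), and all models return zero at zero distance. The parameterization actually used by the fitting software is not stated in the text and may differ.

```python
import math

def spherical(h, c0, c1, a):
    """Spherical model: reaches the sill C0 + C1 exactly at the range a."""
    if h <= 0:
        return 0.0
    r = min(h / a, 1.0)  # beyond the range the model stays at the sill
    return c0 + c1 * (1.5 * r - 0.5 * r ** 3)

def circular(h, c0, c1, a):
    """Circular model: also reaches the sill exactly at the range a."""
    if h <= 0:
        return 0.0
    r = min(h / a, 1.0)
    shape = (1 - (2 / math.pi) * math.acos(r)
             + (2 * r / math.pi) * math.sqrt(1 - r * r))
    return c0 + c1 * shape

def gaussian(h, c0, c1, a):
    """Gaussian model (assumed practical-range form): approaches the sill asymptotically."""
    if h <= 0:
        return 0.0
    return c0 + c1 * (1 - math.exp(-3 * (h / a) ** 2))

# Parameters reported in the text for sand, silt and clay.
for h in (10.0, 50.0, 100.0, 200.0):
    print(h,
          circular(h, 0.0, 5242.6, 220.0),      # sand
          gaussian(h, 1308.2, 13964.0, 88.62),  # silt
          spherical(h, 0.0, 45219.0, 180.0))    # clay
```
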
The granulometry reveals a lithological discontinuity at around 90 to 110 cm
depth, where the sand and silt contents increase and the clay contents decrease
markedly (Fig. 6); in the transition layer, the distribution of particle
contents is erratic.
Total organic carbon records from archaeological soils directly imply the con-
tribution of plant biomass incorporated into these systems through biogeochemical
processes, but it can also reflect land use and management and agricultural practices
(Melo & Schaefer, 2009; Oliveira et al., 2015). According to Kern (2009), the
accumulation of carbon in depth indicates that the A horizon of the soil was formed
in a short period of time resulting from intense human activities, being later modified
by pedogenetic processes.

5 Conclusions

The TOC contents in the studied profile were well fitted by the semivariogram
and well estimated by kriging, so the method can be employed for interpolation.
In view of the observed results, it was possible to make inferences
about the origin of the parent material of the studied soil.
The different semivariogram models are useful tools for understanding the spatial
distribution of granulometric data, which emphasizes the need to fit and choose
appropriate models according to the specificity of the data.
The results indicated a lithological/sedimentary discontinuity at about 80–110 cm
depth in the soil profile, shown by the differentiation of soil color, TOC and
granulometry. This is corroborated by the inferred human occupation layer, in
which a lithic tool was found at a depth of 82 cm and whose organic sediments
date back to about 4000 years BP.
The multi-proxy micro-archaeobotanical, geoarchaeological and archaeometric
research at the Lagoa Grande das Queimadas archaeological site is part of a project
that aims to understand the processes of human occupation, subsistence practices
based on plant resources and the insertion of archaeological sites in the
landscape over the last four millennia.

References

Butzer, K. (1989). Archaeology of human ecology. Cambridge University Press.


Campos, M. C., Chiessi, C. M., Novello, V. F., Crivellari, S., Campos, J. L., Albuquerque, A. L.
S., et al. (2022). South American precipitation dipole forced by interhemispheric temperature
gradient. Scientific Reports, 12(1), 1–9.
Cprm. (2004). Projeto cadastro de fontes de abastecimento por água subterrânea, estado do
Piauí: diagnóstico do município de Várzea Branca - Fortaleza. http://rigeo.cprm.gov.br/xmlui/
bitstream/handle/doc/16459/Rel_VarzeaBranca.pdf?sequence=1
Embrapa. (2018). Centro Nacional de Pesquisas de Solos. Sistema brasileiro de classificação de
solos.
Emperaire, L. (1989). Vegétation et gestión des ressources naturells dans la caatinga du Sud-Est
du Piauí-Brésil. Travaux et Documents microédités. ORSTOM.
Ferraz, E. M. N., Rodal, M. J. N., Sampaio, E. V. S. B., & Pereira, R. C. A. (1998). Composição
florística em trechos de vegetação de caatinga e brejo de altitude na região do Vale do Pajeú.
Pernambuco. Revista Brasileira de Botânica, 21(1), 7–15.
Gladfelter, B. G. (1997). Geoarchaeology: the geomorphologist and archaeology. American
Antiquity, 41(4), 519–538.
Goovaerts, P. (1977). Geostatistics for natural resources evaluation. Oxford University Press.
483p.
Grego, C. R., Coelho, R. M., & Vieira, S. R. (2011). Critérios morfológicos e taxonômicos de
Latossolo e Nitossolo validados por propriedades físicas mensuráveis analisadas em parte pela
geoestatística. Revista Brasileira de Ciência do Solo, 35, 337–350.
Guerín, C., Faure, M., Simões, P. R., Hugueney, M., & Mourer-Chauvire, C. (1999). The
Pleistocene Palaeontological Site of Toca da Janela da Barra do Antonião (São Raimundo
Nonato, Piauí state). In C. Schobbenhaus, D. A. Campos, E. T. Queiroz, M. Winge, & M.
Berbert-Born (Eds.), Sítios Geológicos e Paleontológicos do Brasil. http://sigep.cprm.gov.br/
sitio069/sitio069english.htm
Guidon, N., Felice, G., & de Lima, C. F. (2007). Salvamento arqueológico na área da adutora do
Garrincho. Fumdhamentos, 1(VI), 125–167.
Guidon, N., Felice, G. D., Lourdeau, A., Macedo, A. O., De Luz, M. F., Da Valls, M. P., & de
Aquino, C. C. (2018). A Lagoa dos Porcos: Escavações Arqueológicas e Paleontológicas no
Sudeste do Piauí - Brasil. FUNDHAMentos, XV(2), 03–31.
Isaaks, E. H., & Srivastava, M. R. (1989). Applied geostatistics (No. 551.72 ISA).
IUSS Working Group WRB. (2015). World reference base for soil resources 2014, update 2015.
International soil classification system for naming.
Kern, D. C. (2009). Analyses and interpretation of the soils and/or sediments in the archeological
researches. Revista do Museu de Arqueologia e Etnologia, São Paulo, Suplemento, 8, 21–35.
Lahaye, C., Hernandez, M., Böeda, E., Felice, G. D., Guidon, N., Hoeltz, S., Lourdeau, A., Pagli,
M., Pessis, A. M., Rasse, M., & Viana, S. (2013). Human occupation in South America by
20,000 BC: The Toca da Tira Peia site, Piauí, Brazil. Journal of Archaeological Science, 40(6),
2840–2847.
Lemos, J. R., & Rodal, M. J. N. (2002). Fitossociologia do componente lenhoso de um trecho da
vegetação de caatinga. Acta Botanica Brasilica, 16(1), 23–42.
Marques, J., Jr., Siqueira, D. S., Camargo, L. A., Teixeira, D. D. B., Barrón, V., & Torrent, J.
(2014). Magnetic susceptibility and diffuse reflectance spectroscopy to characterize the spatial
variability of soil properties in a Brazilian Haplustalf. Geoderma, 219, 63–71.
Martin, G. (1998). Pré-História Do Nordeste Do Brasil (p. 445p). EDUFPE.
Melo, V. F., & Schaefer, C. E. G. R. (2009). Matéria orgânica em solos desenvolvidos de rochas
máficas no nordeste de Roraima. Acta Amazonica, 39, 53–60.
Melo, A. A. B., Valladares, G. S., Ceddia, M. B., Pereira, M. G., & Soares, I. (2016). Spatial dis-
tribution of organic carbon and humic substances in irrigated soils under different management
systems in a semi-arid zone in Ceará, Brazil. Semina: Ciências Agrárias, 37(4), 1845–1855.
Mendes, V. R. (2016). Registro sedimentar quaternário na Bacia do Rio Parnaíba, Piauí: um
estudo multi-indicadores voltado à investigação de mudanças climáticas (Doctoral dissertation,
Universidade de São Paulo).
Mendes, R. M., et al. (2019). Thermoluminescence and optically stimulated luminescence
measured in marine sediments indicate precipitation changes over northeastern Brazil.
Paleoceanography and Paleoclimatology, 34, 1476–1486. https://doi.org/10.1029/2019PA003691
Oliveira, I. A., Campos, M. C. C., Freitas, L., & Soares, M. D. R. (2015). Caracterização de solos
sob diferentes usos na região sul do Amazonas. Acta Amazonica, 45(1), 1–12.
Oliveira, M. L. J., Valladares, G. S., Vieira, J. S., & Coelho, R. M. (2018). Availability and spatial
variability of copper, iron, manganese and zinc in soils of the State of Ceará, Brazil. Revista
Ciência Agronômica, 49, 371–380.
Peyre, E. (1993). Nouvelle découverte d’un homme préhistorique américain: une femme de 9700
ans au Brasil. C.R. Acad. Sci. Paris, sér. II, 316, 839–842.
Renfrew, C. (1976). Archaeology and the earth sciences. In D. A. Davidson & M. L. Shackley
(Eds.), Geoarchaeology: Earth science and the past. Duckworth.
Sampietro, J. A., & da Silva Lopes, E. (2016). Compactação de um cambissolo causada por
máquinas de colheita florestal espacializada com geoestatística. Floresta, 46(3), 307–314.
Schiffer, M. B. (1972). Archaeological context and systemic context. American Antiquity, 37, 156–
165.
Shahack-Gross, R. (2017). Archaeological formation theory and geoarchaeology: State-of-the-art
in 2016. Journal of Archaeological Science, 79, 3–43.
Soil Survey Staff. (2014). Keys to soil taxonomy (12th ed.). NRCS.
Stein, J. (1987). Deposits for archaeologists. M. Schiffer (ed.). Advances in Archaeological Method
and Theory, 11, 337–395.
Teixeira, P. C., Donagema, G. K., Fontana, A., & Teixeira, W. G. (Eds.). (2017). Manual de
métodos de análise de solo, 3. ed. rev. e ampl. Embrapa, 575p.
Valladares, G. S., de Aquino, C. M. S., de Aquino, R. P., & Beirigo, R. M. (2017). Solos frágeis do
Parque Nacional da Serra da Capivara, Piauí. GEOgraphia, 19(41), 123–134.
Valladares, G. S., Azevedo, E. C. D., Camargo, O. A. D., Grego, C. R., & Rastoldo, A. M. C.
S. (2019). Spatial variability and copper and zinc availability in vineyards and nearby soils.
Bragantia, 68, 733–742.
Valladares, G. S., Júnior, A. F. R., & De Aquino, C. M. S. (2020). Caracterização de solos no núcleo
de desertificação de Gilbués, Piauí, Brasil, e sua relação com os processos de degradação.
Physis Terrae-Revista Ibero-Afro-Americana de Geografia Física e Ambiente, 2(1), 115–135.
Vieira, S. R. (1977). Variabilidade espacial de argila, silte e atributos químicos em uma parcela
experimental de um Latossolo Roxo de Campinas (SP). Bragantia, 56, 181–119.
Vieira, S. R. (2000). Geoestatística em estudos de variabilidade espacial do solo. In R. F. Novais,
V. H. Alvares, & C. E. G. R. Schaefer (Eds.), Tópicos em ciência do solo (pp. 1–54). Viçosa,
MG.
Vieira, S. R., Xavier, M. A., & Grego, C. R. (2008). Aplicações de geoestatística em pesquisas
com cana-de-açúcar. In L. L. Dinardo-Miranda, A. C. M. de Vasconcelos, & M. G. de A. Landell
(Eds.), Cana-de-açúcar. Ribeirão Preto: Instituto Agronômico, 1(1), 839–852.
Waters, M. R. (1992). Principles of Geoarchaeology: A north american perspective. University of
Arizona Press.
Zanão Júnior, L. A., Lana, R. M. Q., Guimarães, E. C., & Pereira, J. M. A. (2010). Variabilidade
espacial dos teores de macronutrientes em Latossolos sob sistema de plantio direto. Revista
Brasileira de Ciência do Solo, Viçosa, MG, 34(2), 389–400.
Zimback, C. R. L. (2001). Análise espacial de atributos químicos de solos para fins de mapeamento
da fertilidade. 2001. Tese (Livre Docência em Levantamento do Solo e Fotopedologia)—
Faculdade de Ciências Agronômicas, Universidade Estadual Paulista.
Application of Electrical Conductivity
Profiling for the Characterization
and Textural Discretization
of a Technosol

Alexandre Muselli Barbosa and Camila Camolesi Guimarães

1 Introduction

The characterization of the behavior of subsurface contaminants is directly related to
the soil and its properties. The storage and transport of contaminants are associated
with the stratigraphic and hydrogeological characteristics of the environment since
the speed and direction of the contamination flow are determined by the hydraulic
conductivity, the water level, and the lithology of the media. One of the great
challenges of environmental studies in urban centers is the access to soil profiles
to characterize their properties. Due to the presence of intense technological
interventions in cities, a large part of the landscape is modified, either by soil
addition, removal, or sealing, hampering the access to and collection of samples.
Limited access directly constrains the acquisition of information and, with it, the
ability to properly understand the dynamics between contaminants and the physical
environment. This becomes worse at sites of great heterogeneity,
where parameters such as contaminant retention and transport vary along
the spatial distribution of the soil. Most soil matrices are significantly
heterogeneous and anisotropic, which can be attributed to the inherent spatial
variability of the pedogenetic formation process, and scaling-up is critically affected
by the vertical and horizontal variability of the media (Kardanpour et al., 2014).
Factors such as climate and geology, in addition to chemical, physical and biological
processes, control the processes of local interactions with the contaminant’s source,
occurrence, and transport (Engelmann et al., 2019), turning this type of study into a
challenge for many knowledge areas.

A. M. Barbosa (✉)
Institute for Technological Research of the State of São Paulo (IPT), São Paulo, São Paulo State,
Brazil
Department of Chemical Engineering, University of São Paulo (USP), São Paulo, São Paulo
State, Brazil
e-mail: [email protected]
C. C. Guimarães
Institute for Technological Research of the State of São Paulo (IPT), São Paulo, São Paulo State,
Brazil
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
W. de Carvalho Junior et al. (eds.), Pedometrics in Brazil, Progress in Soil Science,
https://doi.org/10.1007/978-3-031-64579-2_6

According to Wadoux et al. (2021), one of the challenges for the future
of pedometrics is the development of methods capable of predicting soil
characteristics at scales appropriate for modeling and decision-making while
reducing observations in space and time. Studies of the determination and movement
of pollutants in the soil are more accurate when carried out in situ. Conklin (2013)
demonstrates the importance of a detailed description of subsoil heterogeneity
in field studies and in modeling contaminant transport.
There are several techniques based on indirect and non-destructive determination
methods applied in chemical, physical and biological characterization of the soil,
such as X-ray diffraction, X-ray fluorescence, broadband infrared spectroscopy,
visible microscopy, fluorescence microscopy, electron microscopy, electrical con-
ductivity and electromagnetism (Conklin, 2013). However, one of the disadvantages
of some of these methods is the need to acquire and prepare the samples for the
analyses, which is a limitation in studies in urban centers.
The constant advance in electronic and computing technology makes miniature
components increasingly available, and the need to solve problems such as the
mobility of equipment motivates and creates opportunities for developing new tools.
In this scenario, penetrometers, which are probes developed to acquire data
along a subsurface profile, aim to overcome the challenge of acquiring data at
high vertical resolution (at the centimeter scale), quickly and simply, without
mobilizing large analytical equipment in the field.
They also eliminate the need for sampling, as they provide real-time analysis of the
subsurface, resulting in lower costs by reducing the number of activities, materials,
and labor needed to obtain the same parameters as laboratory analysis.
Among the methods applied for continuous and detailed data collection are
the indirect acquisition methods, which can be invasive or non-invasive. They are
based on sensors that convert subsurface variations into an electrical signal
that correlates with soil properties and their potential changes
(Ko et al., 2010; McCall et al., 2018; Guirelli Netto et al., 2020), and can thus
provide estimates of subsurface soil conditions (Stenberg et al., 2010).
Penetrometer methods provide great vertical detail of the subsurface and allow
data to be obtained with minimal disturbance, thus reducing the creation of
preferential paths for mobile contaminants. These methods help make
geotechnical and environmental studies less invasive while supporting the
geological, hydrogeological, and environmental characterization of the study area.
The intrinsic characteristic of penetrometer methods is their high specialization,
in which different tools are developed for specific purposes. There are methods
developed for the characterization of soil textural distribution, such as electrical con-
ductivity (EC) profiling (Christy et al., 1994; Beck et al., 2000; Schulmeister et al.,
2004). Electrical conductivity is a property of materials that describes how well they
allow electrical current to flow through their structures. According to Ohm’s law,
for a conductor at a given temperature, the resistance is the constant ratio
between the potential difference and the current that crosses it; resistivity is
this resistance normalized by the geometry of the conductor, and conductivity is
the reciprocal of resistivity. The electrical conductivity profiling method applies
this principle to measure the electrical conductivity of the subsurface at a single
point, with high vertical resolution (reading interval of 1.6 cm), enabling the
identification of lenses and areas of preferential flow, as well as low-permeability
zones (Schulmeister et al., 2003). However, this technique has low horizontal
spatial representativeness, with an influence radius between 5 cm and 10 cm
(Beck et al., 2000), so multiple loggings are required to obtain a horizontally
representative section of the study area.
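
The chain from Ohm's law to conductivity can be sketched numerically. The geometric factor, the readings and the function names below are hypothetical and serve only to show the arithmetic; they do not describe the calibration of the probe used in this study.

```python
def resistance_ohm(voltage_v, current_a):
    """Ohm's law: resistance is the ratio of potential difference to current."""
    return voltage_v / current_a

def conductivity_ms_per_m(voltage_v, current_a, geometric_factor_m):
    """Apparent electrical conductivity in mS/m.

    geometric_factor_m is a hypothetical probe constant K (in metres) such
    that resistivity = K * resistance; real probes are calibrated by the maker.
    """
    resistivity_ohm_m = geometric_factor_m * resistance_ohm(voltage_v, current_a)
    conductivity_s_per_m = 1.0 / resistivity_ohm_m  # reciprocal of resistivity
    return 1000.0 * conductivity_s_per_m

# Hypothetical reading: 0.5 V across the electrodes at 10 mA, with K = 0.2 m.
print(conductivity_ms_per_m(0.5, 0.010, 0.2))
```

For the same electrode geometry, a lower measured resistance gives a higher conductivity, which is the behaviour expected in finer, clay-rich layers.
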
Stratigraphic characterization is a critical activity, especially in areas
contaminated by hydrocarbons. According to Kueper et al. (2003), the mobility and
physical retention of these contaminants are controlled by environmental
characteristics; residual contamination can represent from 5% to 20% of the pore
volume and can reach 70% when free phase is present. Where the physical
environment is composed of intercalated layers of fine and coarser fractions, such as
soils in an alluvial environment, the preferential flow will follow the
structure of the soil. Due to the high viscosity of hydrocarbons, their displacement
will be preferentially lateral rather than vertical in layers with higher porosity
(Poulsen & Kueper, 1992). Thus, a technique capable of obtaining
information about the distribution of soil fractions with a high sampling density
and of identifying preferential layers and lenses, without major disturbances
in the environment, is an interesting strategy for subsoil characterization.
However, pedogenic characteristics such as mineralogical composition, moisture
content, presence of organic matter, salts, and pH can affect the electrical properties
of soils (Olhoeft, 1984; Corwin & Lesch, 2005; Mertens et al., 2008). Thus, studies
are needed for a better understanding of the relationship between the properties
of soils in tropical environments and the acquisition of indirect data by electrical
conductivity profiling, for its application as an indicator of the displacement
behavior of subsurface contaminations.
This research presents the results of geoenvironmental investigations developed
in a former wood treatment plant, located near one of the main rivers of the city
of São Paulo, Brazil. The aim was to apply electrical conductivity profiling for the
discretization of textural variations in a Technosol, to evaluate the applicability of
a minimally invasive method in the stratigraphic characterization of a hydrocarbon-
contaminated site.

2 Study Area

The Wood Treatment Unit of Jaguaré (WTU Jaguaré) was the third wood preser-
vation plant established in Brazil (Fig. 1). It was initially set up in the Institute
for Technological Research of the State of São Paulo in 1947, as a pilot plant for
wood impregnation under vacuum, resulting from a partnership with the former
Sorocabana Railroad. In 1974, the activities were transferred to an area in the region
of Avenida Jaguaré, in the western zone of the city of São Paulo. WTU Jaguaré was
one of the main centers for the development of wood treatment techniques, aiming
to prevent rotting and reduce maintenance costs for poles and railway lines. The
operations were supplied by freight trains, and the ground was covered with gravel
beds resulting from the installation of railroad tracks.

Fig. 1 Wood Treatment Unit of Jaguaré location and regional drainage

The chemical treatment of wood used the method of applying preservatives under
pressure, in autoclave systems, in which all the preservative not absorbed by the
wood was transferred to a storage tank and reused in the process. The main chemical
products used for wood preservation were creosote, chromated copper arsenate
(CCA), and pentachlorophenol (PCP). The activities resulted in environmental
contamination mainly due to the presence of creosote in the soil and hydrocarbon
plumes in groundwater.
WTU Jaguaré is in the Pinheiros River floodplain region, which influences
the lithology of the area, presenting geological features typical of Pleistocene
and Holocene quaternary deposits, superimposed on tertiary sediments of the
Itaquaquecetuba Formation (Riccomini & Coimbra, 1992). The Holocene deposits
are represented by colluviums and alluviums, with a thickness of less than 10 m.
The colluviums are discontinuous and may contain gravel pits (stone lines) at the
base, while the alluviums are deposited in floodplains and low terraces, consisting
of sandy and clay layers rich in organic matter, usually presenting gravel at the base
(Monteiro et al., 2012).
According to Boettinger (2005), soils formed in alluvial regions are highly
heterogeneous and anisotropic, with spatial variability of properties resulting from
the historical conditions of depositions and pedogenetic processes (Elkateb et al.,
2003). Due to periodic disturbances by floods, soils in floodplains usually develop
organic horizons (Monteiro et al., 2012), a typical formation of the alluvial plains
of Pinheiros River. Another determining factor in the alteration of soil properties is
anthropic action in urban centers; in regions such as the floodplain of Pinheiros
River, earth filling and surface sealing are common, due to the swampy
terrain. According to Morel et al. (2005), urban soils are a class of anthropic
soils influenced by human activities such as construction, import and export of
material, waste disposal, and contamination. These actions alter the natural
conditions and expected behavior of soils, influencing edaphological dynamics and
making the study and characterization of urban soils a topic of great relevance
within the field of pedology.

3 Methods

The research was based on the information collected in surveys of the history of use
of the area, and in the application of non-invasive geoenvironmental investigation
tools. The data were compared and interpreted to obtain a detailed stratigraphic
characterization of the study area and validated by sample collections and soil
characterization laboratory tests. Figure 2 illustrates the location of samplings and
electrical profiling tests. In Fig. 2, the aerial image of WTU Jaguaré from 2004
(available at http://geosampa.prefeitura.sp.gov.br/) shows dark spots in the area
resulting from the spillage of chemical products on the soil, as well as the
aboveground tanks that were used to store wood preservatives.

3.1 Soil Sampling

The selection of the control sample (B-01) was based on the site’s historical use, and
it was collected upstream of the old plant structures, in a non-contaminated area.
The deformed samples were collected using a direct-push percussive drilling
system (ASTM, 2014), with 1.4 m jacketed rods, to a depth of up to 8 m. The
selection of deformed samples was based on a visual-manual description method
(EMBRAPA, 2011), with samples collected at each stratigraphic variation, totaling
17 samples sent for physical tests. The collection of undeformed soil samples
used the method described by ABNT (2018), using a stainless-steel sampler. The
undeformed samples were collected in soil layers that measured at least 10 cm,
resulting in the collection of 9 samples for laboratory tests.

3.2 Soil Characterization

The grain size analysis was performed using the sedimentation method to determine
the fine fraction, and sieving to determine the coarse fraction, with particles
classified into seven classes, according to their dimensions (clay, silt, sand, and
particles larger than 2 mm), using method ABNT (2016). The aim was to evaluate
texture distribution, as well as validate the visual-manual description and the
differentiation of stratigraphic layers, which were discretized by the results of this
test.

Fig. 2 Location of soil sampling and electrical conductivity profiling points, background sample
B-01 (green), electrical conductivity profile log (yellow) and transect (red)

Table 1 Soil physical characterization tests and respective methodologies

Test                                      Method
Soil sampling: deformed samples           ASTM D6282 (ASTM, 2014)
Soil sampling: undeformed samples         NBR 9820 (ABNT, 1997)
Grain size analysis                       NBR 7181 (ABNT, 2016)
Electrical conductivity                   EMBRAPA (2011)
Bulk density                              NBR 9813 (ABNT, 2016)
Porosity                                  Klute (1986)
Hydraulic conductivity for rigid wall     Klute (1986)
Hydraulic conductivity for flexible wall  ASTM D5084 (ASTM, 2016)
Fraction of organic carbon (FOC)          EMBRAPA (2011)

The collected samples were sent to the laboratory for the following tests: grain
size analysis, electrical conductivity, and fraction of organic carbon for the deformed
samples; and bulk density, porosity (microporosity, total and effective porosity), and
rigid and flexible wall hydraulic conductivity for the undeformed samples (Table 1).

3.3 Electrical Conductivity Profiling

The electrical conductivity profiling (ECP) data were obtained using a Geoprobe®
MH6534, and a Geoprobe® 6620DT direct push machine, with a horizontally
positioned dipole-type electrode arrangement. Before the tests, the drilling site was
prepared by removing the cobblestones and the gravel layer in the first 50 cm below
the soil surface. Initial and final response tests were performed to verify the
response patterns and ensure data integrity. The tests were performed according to
ASTM (2018).
The rods were driven percussively to a depth of 8 m, with reading intervals
of 0.016 m, totaling approximately 490 readings per profile. A control test was
carried out at point B-01, and its results were compared with the soil description
and characterization obtained in the laboratory, aiming to evaluate the electrical
variation and its relationship with the clay content and potential soil contamination.
To evaluate the capacity for spatial characterization of the soil stratigraphy, tests
were carried out at five points aligned in a transect (ECP-01, ECP-02, ECP-03,
ECP-04, and ECP-05), spaced 5 m apart and positioned in the main
equipment operation area in WTU Jaguaré (Fig. 2). The results were interpolated
using the inverse distance weighted (IDW) method, which assumes that neighboring
points are correlated and similar. The influence of each sampled point on an
estimate decreases with its distance from the estimated point, so the estimate is
defined as an inverse-distance-weighted function of the neighboring points
(Sentianto & Triandini, 2013).
The IDW method is described as:

$$Z_0 = \frac{\sum_{i=1}^{N} z_i \, d_i^{-n}}{\sum_{i=1}^{N} d_i^{-n}}$$

Where:
Z_0 = The estimated value of variable z at the estimation point.
z_i = The sample value at point i.
d_i = The distance from sample point i to the estimation point.
N = The total number of sample points used in the estimate.
n = The exponent that determines the weight based on distance.
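
The IDW estimate above can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the study: the function name `idw_estimate`, the `(x, depth)` coordinate layout, the default exponent of 2 and the sample readings are all assumptions introduced here.

```python
from math import hypot


def idw_estimate(samples, target, power=2.0):
    """Estimate a value at `target` by inverse distance weighting.

    samples: list of ((x, depth), value) pairs, e.g. EC readings in mS/m
    target:  (x, depth) of the point to estimate
    power:   exponent n in the formula (2.0 is an assumed, commonly used value)
    """
    if not samples:
        raise ValueError("at least one sample is required")
    numerator = 0.0
    denominator = 0.0
    for (x, depth), value in samples:
        distance = hypot(x - target[0], depth - target[1])
        if distance == 0.0:
            # The target coincides with a sample: return the measured value.
            return value
        weight = distance ** -power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Hypothetical readings from two profiles 5 m apart, both at 1 m depth.
readings = [((0.0, 1.0), 10.0), ((5.0, 1.0), 30.0)]
midpoint = idw_estimate(readings, (2.5, 1.0))    # equal distances: plain mean
near_first = idw_estimate(readings, (1.0, 1.0))  # dominated by the closer profile
```

Note that this sketch treats horizontal and vertical distances on the same scale. With readings every 0.016 m in depth but profiles 5 m apart, a practical implementation would probably need to rescale one axis or restrict the search neighborhood.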

4 Results and Discussion

The investigated area was studied by applying geoenvironmental investigation
techniques. The results are first presented individually, followed by the
interpretation of the observed physical properties. The investigation provided data
to a depth of 8 m.

4.1 Soil Characterization

According to the World Reference Base for Soil Resources (IUSS Working Group
WRB, 2014), the soil in the study area is classified in the Technosols group. The
properties and pedogenesis of these soils are dominated by anthropic influence,
presenting technic hard materials starting at 5 cm from the soil surface, evidenced
by the presence of exogenous materials such as cobblestones and gravel, resulting
from the industrial operations in the former WTU Jaguaré. Another characteristic
is the presence of clear stratification caused by fluvial deposition processes, as well
as the presence of an organic carbon horizon underlying the technological material,
thus classifying the soil in the area as an Ekranic Fluvic Technosol.
The results of the laboratory soil characterization are presented in Tables 2 and
3. No samples were collected from U horizon due to the presence of technic hard
materials up to 0.5 m from the soil surface, and its description was used exclusively
for the pedological classification of the soil profile.

Table 2 Results of soil characterization tests for deformed samples collected at control point B-01

| Horizon | Top (m) | Base (m) | Clay (%) | Silt (%) | Sand (%) | >2 mm (%) | Texture | Electrical conductivity (μS/cm) | FOC (g/kg) |
|---|---|---|---|---|---|---|---|---|---|
| U | 0.00 | 0.50 | – | – | – | 100 | Coarse | – | – |
| A | 0.50 | 1.20 | 51.2 | 3.2 | 45.6 | 0.0 | Sandy clay | 50.3 | 3.05 |
| H | 1.20 | 1.50 | – | – | – | – | – | 85.07 | 385.95 |
| B | 1.50 | 3.40 | 73.0 | 18.3 | 8.7 | 0.0 | Clay | 53.29 | 19.5 |
| BC | 3.40 | 3.87 | 6.5 | 19.5 | 67.5 | 0.0 | Loam | 50.15 | 0.91 |
| | 3.87 | 4.30 | 23.0 | 42.0 | 35.0 | 0.0 | | 52.32 | 1.22 |
| 2C | 4.30 | 4.40 | 0.0 | 2.0 | 95.0 | 3.0 | Sand | 49.85 | 1.20 |
| | 4.40 | 4.90 | 0.0 | 0.0 | 98.0 | 2.0 | | 48.62 | 1.76 |
| | 4.90 | 5.00 | 6.0 | 20.0 | 74.0 | 0.0 | Loamy sand | 48.05 | 1.16 |
| | 5.00 | 5.27 | 0.0 | 23.0 | 77.0 | 0.0 | | 48.23 | 1.15 |
| | 5.27 | 5.40 | 8.0 | 16.0 | 76.0 | 0.0 | | – | 1.82 |
| | 5.40 | 5.68 | 5.0 | 2.0 | 92.0 | 1.0 | Sand | 50.4 | 1.82 |
| | 5.68 | 5.85 | 0.0 | 0.0 | 98.0 | 2.0 | | 52.81 | 0.70 |
| | 5.87 | 6.00 | 57.0 | 14.0 | 24.0 | 5.1 | Clay | 54.6 | 1.96 |
| | 6.00 | 6.18 | 0.0 | 1.4 | 41.4 | 58.2 | Silt loam | 50.52 | 1.65 |
| | 6.18 | 7.05 | 3.0 | 2.0 | 93.0 | 2.0 | Sand | 51.78 | 2.08 |
| | 7.05 | 7.20 | 39.4 | 34.5 | 26.1 | 0.0 | Clay loam | – | – |
| | 7.20 | 8.00 | 0.0 | 2.0 | 94.5 | 3.5 | Sand | 57.63 | 14.66 |

Table 3 Results of soil characterization tests for undeformed samples collected at control point B-01

| Horizon | Top (m) | Base (m) | Natural bulk density (g/cm³) | Dry bulk density (g/cm³) | Total porosity (%) | Microporosity (%) | Effective porosity (%) | Hydraulic conductivity (cm/s) |
|---|---|---|---|---|---|---|---|---|
| U | 0.00 | 0.50 | – | – | – | – | – | – |
| A | 0.50 | 1.20 | 1.77 | 1.44 | 40.8 | 33.9 | 6.9 | 2.13E-08 |
| H | 1.20 | 1.50 | 1.10 | 0.53 | 65.2 | 57.2 | 8.0 | 6.66E-07 |
| B | 1.60 | 1.70 | 1.61 | 0.97 | 71.7 | 64.7 | 7.0 | – |
| | 1.70 | 3.00 | – | – | – | – | – | 1.06E-07 |
| BC | 3.00 | 3.50 | 1.94 | 1.66 | 43.5 | 29.3 | 14.1 | 1.22E-06 |
| | 3.50 | 3.90 | 1.99 | 1.61 | 43.7 | 37.1 | 6.6 | 1.93E-07 |
| 2C | 3.90 | 4.20 | 1.94 | 1.61 | 43.3 | 28.4 | 14.9 | 1.99E-08 |
| | 5.00 | 5.50 | 2.09 | 1.77 | 37.9 | 30.3 | 7.7 | 2.15E-06 |
| | 5.50 | 5.80 | 1.82 | 1.57 | 40.9 | 15.3 | 25.6 | 4.01E-07 |
| | 7.15 | 8.00 | 1.79 | 1.51 | 43.0 | 23.0 | 19.9 | 8.88E-03 |

The integrated evaluation of the stratigraphic variation of the soil profile and the
results of the laboratory characterization indicated that the material deposited in
the A horizon (0.5 m to 1.2 m) has different characteristics from the rest of the
profile, with lower FOC and porosity values, because it originated from
subsurface horizons and was transported to the study area to fill the swampy ground.
The FOC values are highest in the upper layers, especially in horizon H
(385.95 g/kg), an organic horizon. Elevated FOC (14.66 g/kg) was also observed
between 7.2 m and 8 m, evidencing past deposition of organic matter.
Horizon B presents high silt and clay contents, which decrease sharply below
3.4 m, where the profile becomes primarily sandy and the effective porosity
increases, facilitating the vertical percolation of solutes. The
profile shows several abrupt textural transitions, typical of alluvial bodies, which
result in a large variation of the hydraulic conductivity values along the profile.
In this sense, the identification of these layers has great relevance, as clay layers
in the subsurface can act as a natural barrier, delaying the potential distribution
of contaminants. In horizon 2C (6.0 m to 6.18 m) a layer of pebbles (stone line)
was observed, with the presence of 58.2% of materials with a diameter greater than
2 mm, typical of the Holocene deposits of the alluvial plain of Pinheiros River.
It was observed that profile B-01 samples represent the expected characteristics
for soils formed in the floodplains of Pinheiros River, corroborating the descriptions
of Rodriguez (1998), Carvalho (2006) and Monteiro et al. (2012). The integrated
data analysis indicates that the soil profile presents a great stratigraphic complexity,
with an abrupt variation of the grain sizes between horizons. This characterizes
an anisotropic profile with typical features of a fluvial regime, with the presence
of technogenic deposits on the surface, in addition to the presence of an organic
horizon resulting from the alluvial depositions of Quaternary sediments in the
metropolitan region of São Paulo.

4.2 Electrical Conductivity Profiling

For a better interpretation of the electrical conductivity profiling test, the results
of soil electrical conductivity were analysed in integration with the laboratory soil
characterization data (Fig. 3).

Fig. 3 Vertical representation of the distribution of the parameters obtained for control point B-01,
in horizons U (blue), A (red), H (black), B (dark gray), BC (light gray), and 2C (yellow)

The electrical conductivity profile obtained in the field corresponds to the textural
variations of the clay fraction, showing no correspondence with the other soil
fractions. The response was sensitive even to thin layers and correlated directly
with the clay content, as also observed by McCall et al. (2014, 2017). A plot of the
electrical conductivity results against clay content showed a strong relationship
when the sampling points were analysed individually (R² = 0.9164).
When evaluated by horizon textural classification, the data are distributed
in three groups: coarser textures, with a predominance of the sand fraction (sand,
loamy sand, and silt loam), with results below 15 mS/m; medium textures (sandy clay,
clay loam, and loam), with values between 15 mS/m and 25 mS/m; and fine texture
(clay), with values above 25 mS/m (Figs. 4 and 5). Thus, it was possible to assign a
generic classification of horizons relating the electrical behavior of the soil to its
textural composition.
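
The three-group classification described above can be written as a simple threshold rule. The sketch below is only illustrative: the function name `texture_group`, the group labels and the example readings are assumptions, and assigning the boundary values 15 and 25 mS/m to the medium group is one possible reading of "between 15 mS/m and 25 mS/m".

```python
def texture_group(ec_ms_per_m):
    """Map a field EC reading (mS/m) to the broad texture group reported for B-01."""
    if ec_ms_per_m < 0:
        raise ValueError("electrical conductivity cannot be negative")
    if ec_ms_per_m < 15.0:
        return "coarse"   # sand, loamy sand, silt loam
    if ec_ms_per_m <= 25.0:
        return "medium"   # sandy clay, clay loam, loam
    return "fine"         # clay


# Hypothetical readings, not measured data from the study.
profile = [8.2, 17.5, 31.0]
groups = [texture_group(value) for value in profile]
```

A rule of this kind cannot separate organic layers from sandy ones, since both gave low field readings at B-01, so it would need to be combined with sampling before being used for interpretation.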
Regarding the organic horizon, low electrical conductivity values were observed in
the field, similar to those of the sandy horizons. However, in the laboratory
results the behavior is the opposite, with the organic layer
being much more conductive than the clay horizons. In the laboratory method, the
electrical conductivity of a solid-liquid mixture is evaluated via the exchange of
cations associated with clay minerals and organic matter, whereas in the field the
electrical conductivity is measured by a sensor in a solid medium, in direct contact
with the soil solid particles and their moisture, where the organic mass presents
resistive behavior.

Fig. 4 Correlation of clay content and electrical conductivity profiling values, for different depths
of control point B-01

Fig. 5 Distribution of electrical conductivity values in B-01 point as a function of textural
composition classification

This result is closely linked to the form of data acquisition. As the samples
collected for grain size analysis are discrete, they cannot describe the entire
spatial variation of grain distribution. On the other hand, continuous data collection
methods, such as penetrometric systems, record information at small intervals
and are sensitive to small variations in the soil profile. The electrical behavior of
the soil results from several factors (Engelmann et al., 2019; Kardanpour et al.,
2014; Schulmeister et al., 2003), and it is necessary to explore prediction techniques
integrated with other variables to increase the accuracy of the discretization of soil
horizons.
Langmuir (1997) shows that the electrical behavior of the soil is controlled by
the soil’s fine fractions, such as phyllosilicates, humic substances, and oxides and
oxyhydroxides of iron and manganese, which tend to be highly conductive due to
their size, surface area, and electric charge. Thus, electrical methods are sensitive
to textural and mineralogical alterations in the soil, enabling the differentiation of
regions with different source materials.

Fig. 6 Interpretation and classification of the horizons of transect A through the responses of the
electrical conductivity profile, in horizons U (blue), A (red), H (black), B (dark gray), BC (light
gray), and 2C (yellow)

The electrical conductivity profiling test performed at point B-01 discretized thin
layers, with results comparable to those presented by Christy et al. (1994) and
McCall et al. (2017), who observed that the dipole-array sensor was able to identify
thin soil layers in the field. Sensors in the horizontal arrangement are sensitive
to small stratigraphic variations; however, they are also susceptible to the lack of
lateral contact during the test, in addition to having little spatial representation, with
a radius of influence varying between 0.05 m and 0.10 m (Beck et al., 2000).
The advantage of the electrical profiling method is the possibility of correlation
with other investigation techniques, whether focused on physical, chemical,
continuous, or discrete parameters, using depth as the common variable. According to
Guirelli Netto et al. (2020), the level of detail obtained by the probe is useful
for indirect stratigraphic characterization in environmental and geotechnical studies
and for understanding the flow behavior of contamination plumes. However, in indirect
systems for the determination of textural features, this behavior might lead to
interpretation errors, so soil sampling is necessary for building
calibration models.
The distribution model obtained with the IDW interpolator, using a grid spacing of
2 cm in height and 50 cm in width for the electrical conductivity values of the five
tests performed in transect A, is presented in Figs. 6 and 7. It was observed that the
technogenic superficial layer presents irregular values, which are linked to the
origin of the material that composes this layer. The figures also show that the clay
layer, identified in profile B-01, is typical of the entire area, presenting a
continuous behavior along the transect.
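
The interpolation grid described above can be laid out as follows. This is a sketch under stated assumptions: the function name `make_grid` is hypothetical, the 20 m transect length is inferred from five points spaced 5 m apart, and the 0 m to 8 m depth range is taken from the profiling depth. The chapter does not state the exact grid extent.

```python
def make_grid(length_m=20.0, depth_m=8.0, dx=0.5, dz=0.02):
    """Return the x positions and depths of the grid nodes for a vertical section.

    dx: horizontal node spacing in metres (50 cm in the text)
    dz: vertical node spacing in metres (2 cm in the text)
    """
    # round() guards against floating-point error in the divisions.
    n_columns = round(length_m / dx) + 1
    n_rows = round(depth_m / dz) + 1
    xs = [i * dx for i in range(n_columns)]
    depths = [j * dz for j in range(n_rows)]
    return xs, depths


xs, depths = make_grid()
node_count = len(xs) * len(depths)  # one IDW estimate would be computed per node
```

Under these assumptions the section has 41 columns and 401 rows, so the vertical resolution of the grid is close to the 0.016 m reading interval while the horizontal resolution is ten times finer than the 5 m spacing between profiles.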
The model shows a low electrical conductivity anomaly between ECP-02 and
ECP-03, with values below 10 mS/m, and a greater spread below 4.5 m
(saturated zone); these values are lower than those observed at control point
B-01. This result suggests that the presence of PAH contamination in local soils
is influencing the electrical behavior of the subsurface, generating low conductivity
anomalies. The observed horizontal distribution of the low conductivity
anomaly is consistent with Poulsen and Kueper (1992), showing that the presence of
clay lenses in the subsurface creates layers of low permeability, so that displacement
is preferentially lateral in layers with higher porosity (5.5 m to 8 m).

Fig. 7 Interpolation of the electrical conductivity profiling of transect A by IDW method

Kress and Teeple (2003) obtained equivalent results in a study of creosote
contamination at a wood treatment unit in Texas. Other studies have also shown that the
presence of PAH compounds generates low conductivity anomalies when compared
to natural soil values (Cardarelli et al., 2009; Arato et al., 2014). In contaminated
sites, resistivity changes are mainly caused by changes in groundwater and soil
chemistry, water saturation, porosity, and temperature, and can be described by a
modified version of Archie’s law (Samouelian et al., 2005). Soil contamination by
hydrocarbon-derived compounds changes the geochemical balance and the soil
response signatures; Moraes et al. (2022)
identified changes in magnetic mineralogy and grain size related to the contaminant
impact on the geochemical environment.
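
For orientation, the classical form of Archie's law for a partially saturated medium is commonly written as below. This is the textbook relation, not necessarily the modified version referred to above, and the symbols are introduced here for illustration only (the exponent n in this expression is unrelated to the IDW exponent).

```latex
% Classical Archie's law (illustrative; symbols are assumptions, not from the chapter)
% \rho   : bulk resistivity of the soil
% \rho_w : resistivity of the pore water
% \phi   : porosity
% S      : degree of water saturation
% a, m, n: empirical constants
\[
  \rho = a \, \rho_w \, \phi^{-m} \, S^{-n}
\]
```

Under this relation, a resistive hydrocarbon phase that displaces conductive pore water lowers S and therefore raises the bulk resistivity, which is in line with the low conductivity anomalies discussed above.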
The results obtained by the IDW interpolation show that the electrical
conductivity responses in the area are strongly similar to the grain size results for
control point B-01, presenting spatial correlation and indicating that this method is
applicable for the visualization of stratigraphic data. The method was sensitive to
data variations along the soil profile, allowing the identification and spatialization
of the anomalies caused by the presence of contaminants in the soil.
The application of an indirect method for the characterization and discretization
of a Technosol presented satisfactory results, being able to differentiate the textural
layers in the soil profile and soil sections. However, more studies are necessary
to understand the anthropic influence on soil behavior, especially in places with
intense technological interventions. Anthropic disturbances can cause extremely
complex spatial patterns for electrical conductivity assessments, and local sampling
is required to validate data and calibrate their interpretations.

5 Conclusion

The stratigraphic characteristics of the soil influence the internal dynamics of flow
in the soil profile, and the identification and differentiation of these layers are
extremely important, especially in a scenario of exposure to contamination. The
electrical conductivity profiling method was sensitive to the textural variations of
the profile, allowing the identification of thin layers. However, organic and sandy
layers showed similar electrical behavior, an ambiguity that may lead
to errors in data interpretation and classification. This highlights the need for
field sampling and laboratory analysis to validate and
calibrate the data obtained by the indirect method applied.
Electrical conductivity profiling was able to identify low conductivity anomalies,
suggesting the presence of PAH in the soil, both in the vadose and saturated zones,
resulting from the operations of the former WTU Jaguaré. The non-invasive
data acquisition method was efficient for application in urban areas, where
access for collecting samples in the field is difficult. Indirect data acquisition
methods are interesting alternatives in these areas, and further studies are needed to
better understand and expand their potential to characterize the pedogenic
processes in tropical environments, especially in areas under intense anthropic
technological influence.

References

ABNT. (1997). NBR 9820: Coleta de amostras indeformadas de solos de baixa consistência em
furos de sondagem—Procedimento. ABNT - Associação Brasileira de Normas Técnicas, 5 p.
ABNT. (2016). NBR 9813—Soil - “in situ” determination of the apparent specific gravity using a
core cutter. ABNT - Associação Brasileira de Normas Técnicas, 5 p.
ABNT. (2018). NBR 7181—Soil- grain size analysis. ABNT - Associação Brasileira de Normas
Técnicas, Org. 12 p.
ASTM. (2014). D6282/D6282M-14 standard guide for direct push soil sampling for environmental
site characterizations. ASTM International. https://doi.org/10.1520/D6282_D6282M-14
ASTM. (2016). D5084-16a. Standard test methods for measurement of hydraulic conductivity of
saturated porous materials using a Flexible Wall Permeameter. ASTM International.
ASTM. (2018). D7352–18 standard practice for volatile contaminant logging using a Membrane
Interface Probe (MIP) in unconsolidated formations with direct push methods. ASTM Interna-
tional.
Arato, A., Wehrer, M., Biró, B., & Godio, A. (2014). Integration of geophysical, geochemical
and microbiological data for a comprehensive small-scale characterization of an aged
LNAPL-contaminated site. Environmental Science and Pollution Research, 21(15), 8948–8963.
https://doi.org/10.1007/s11356-013-2171-2
Beck, F. P., Clark, P. J., & Puls, R. W. (2000). Location and characterization of subsurface
anomalies using a soil conductivity probe. Ground Water Monitoring and Remediation, 20(2),
55–59. https://doi.org/10.1111/j.1745-6592.2000.tb00265.x
Boettinger, J. L. (2005). Alluvium and alluvial soils. Encyclopedia of Soils in the Environment,
45–46.

Cardarelli, E., & Di Filippo, G. (2009). Electrical resistivity and induced polarization
tomography in identifying the plume of chlorinated hydrocarbons in sedimentary formation: A case
study in Rho (Milan - Italy). Waste Management and Research, 27(6), 595–602.
https://doi.org/10.1177/0734242X09102524
Carvalho, D. L. R. de. (2006). Indicadores geomorfológicos de mudanças ambientais no sistema
fluvial do Alto Tietê (município de São Paulo): pesquisa documental. Dissertação (mestrado em
Geografia Física) - Dept. Geografia - FFLCH - USP. São Paulo, 2006.
Christy, C. D., Christy, T. M., & Wittig, V. (1994). A percussion probing tool for the direct sensing
of soil conductivity. Technical Paper n◦ 94–100. Geoprobe Systems. 16 p.
Conklin, A. (2013). Introduction to soil chemistry: Analysis and instrumentation (2nd ed.). Wiley.
Corwin, D. L., & Lesch, S. M. (2005). Characterizing soil spatial variability with apparent soil
electrical conductivity: I. Survey protocols. Computers and Electronics in Agriculture, 46(1–3),
103–133. https://doi.org/10.1016/j.compag.2004.11.002
Elkateb, T., Chalaturnyk, R., & Robertson, P. K. (2003). An overview of soil heterogeneity:
Quantification and implications on geotechnical field problems. Canadian Geotechnical Journal,
40(1), 1–15. https://doi.org/10.1139/t02-090
EMBRAPA. (2011). Manual de métodos de análise de solo (2nd ed., rev.). Rio de Janeiro. Available
online at: http://www.cnps.embrapa.br/publicacoes
Engelmann, C., Handel, F., Binder, M., Yadav, P. K., Dietrich, P., Liedl, R., & Walther, M. (2019).
The fate of DNAPL contaminants in non-consolidated subsurface systems—Discussion on the
relevance of effective source zone geometries for plume propagation. Journal of Hazardous
Materials, 375, 233–240. https://doi.org/10.1016/j.jhazmat.2019.04.083
Guirelli Netto, L. G., Barbosa, A. M., Galli, V. L., Pereira, J. P. S., Gandolfo, O. C. B., &
Birelli, C. A. (2020). Application of invasive and non-invasive methods of geo-environmental
investigation for determination of the contamination behavior by organic compounds. Journal
of Applied Geophysics, 178, 104049. https://doi.org/10.1016/j.jappgeo.2020.104049
IUSS Working Group WRB. (2014). World Reference Base for Soil Resources 2014. International
soil classification system for naming soils and creating legends for soil maps (4th ed.). World
Soil Resources Reports, FAO.
Kardanpour, Z., Jacobsen, O. S., & Esbensen, K. H. (2014). Soil heterogeneity characterization
using PCA (Xvariogram) - multivariate analysis of spatial signatures for optimal sampling
purposes. Chemometrics and Intelligent Laboratory Systems, 136, 24–35.
https://doi.org/10.1016/j.chemolab.2014.04.020
Keuper, B. H., Wealthall, G. P., Smith, J. W. N., Leharne, S. A., & Lerner, D. N. (2003). An
illustrated handbook of DNAPL transport and fate in the subsurface. Environment Agency
R&D Publication.
Klute, A. (1986). Methods of soil analysis, part 1, physical and mineralogical methods (2nd
ed.). American Society of Agronomy, Agronomy Monographs.
https://doi.org/10.1002/gea.3340050110
Ko, E. J., Kim, K. W., Park, K., Kim, J. Y., Kim, J., Hamm, S. Y., Lee, J. H., & Wachsmuth, U.
(2010). Spectroscopic interpretation of PAH-spectra in minerals and its possible application to
soil monitoring. Sensors, 10(4), 3868–3881. https://doi.org/10.3390/s100403868
Kress, W. H., & Teeple, A. P. (2003). Two-dimensional resistivity investigation of the north
cavalcade street site. Scientific Investigations Report, 2005–5205, 28.
Langmuir, D. (1997). Aqueous environmental geochemistry (600 p). Upper Saddle River: Prentice-
Hall.
McCall, W., Christy, T. M., Pipp, D., Terkelsen, M., Christensen, A., Weber, K., & Engelsen, P.
(2014). Field application of the combined membrane-interface probe and hydraulic profiling
tool (MiHPT). Groundwater Monitoring & Remediation, 34(2), 85–95. https://doi.org/10.1111/gwmr.12051
McCall, W., Christy, T. M., & Evald, M. K. (2017). Applying the HPT-GWS for hydrostratigraphy,
water quality and aquifer recharge investigations. Groundwater Monitoring & Remediation, 37, 78–91.
https://doi.org/10.1111/gwmr.12193
McCall, W., Christy, T. M., Pipp, D. A., Jaster, B., White, J., Goodrich, J., Fontana, J., & Doxtader,
S. (2018). Evaluation and application of the optical image profiler (OIP) a direct push probe
for photo-logging UV-induced fluorescence of petroleum hydrocarbons. Environmental Earth
Sciences, 77(10), 1–15. https://doi.org/10.1007/s12665-018-7442-2
Mertens, F. M., Patzold, S., & Welp, G. (2008). Spatial heterogeneity of soil properties and its
mapping with apparent electrical conductivity. Journal of Plant Nutrition and Soil Science,
171(2), 146–154. https://doi.org/10.1002/jpln.200625130
Monteiro, M. D., Gurgueira, M. D., & Rocha, H. C. (2012). Geologia da Região Metropolitana de
São Paulo. In Twin Cities: Solos das regiões metropolitanas de São Paulo e Curitiba. NEGRO,
Ars ed (p. 512). São Paulo.
Moraes, C. S., Ustra, A. T., Barbosa, A. M., Imbernon, R. A. L., & Tengan, C. M. U. (2022).
Magnetic signatures of a creosote oil contaminated site: Case study in São Paulo, Brazil.
Scientific Reports, 12, 21853. https://doi.org/10.1038/s41598-022-23493-2
Morel, J. L., Schwartz, C., & Florentin, L. (2005). Urban Soils. Encyclopedia of Soils in the
Environment, 202–208.
Olhoeft, G. R. (1984). Clay-organic reactions measured with complex resistivity. SEG technical
program expanded abstracts. Society of Exploration Geophysicists, 356–358.
https://doi.org/10.1190/1.1894009
Poulsen, M. M., & Kueper, B. H. (1992). A field experiment to study the behavior of tetra-
chloroethylene in unsaturated porous media. Environmental Science and Technology, 26(5),
889–895. https://doi.org/10.1021/es00029a003
Riccomini, C., & Coimbra, A. M. (1992). Geologia da bacia sedimentar. In A. A. Ferreira, U. R.
Alonso, & P. L. Luz (Eds.), Solos da cidade de São Paulo (pp. 37–94). São Paulo.
Rodriguez, S. K. (1998). Geologia Urbana da Região Metropolitana de São Paulo. 171 f. Tese
(Doutorado) - Programa de Geologia Sedimentar, Instituto de Geociências, Universidade de
São Paulo, São Paulo, 1998.
Samouelian, A., Cousin, I., Tabbagh, A., Bruand, A., & Richard, G. (2005). Electrical resistivity
survey in soil science: A review. Soil and Tillage Research, 83(2), 173–193.
https://doi.org/10.1016/j.still.2004.10.004
Schulmeister, M. K., Butler, J. J., Healey, J. M., Zheng, L., Wysocki, D. A., & McCall, G. W.
(2003). Direct-push electrical conductivity logging for high-resolution hydrostratigraphic
characterization. Groundwater Monitoring & Remediation, 23(3), 52–62.
https://doi.org/10.1111/j.1745-6592.2003.tb00683.x
Schulmeister, M. K., Butler, J. J., Franseen, E. K., Wysocki, D. A., & Doolittle, J. A. (2004). High-
resolution stratigraphic characterization of unconsolidated deposits using direct-push electrical
conductivity logging: A floodplain-margin example. In J. S. Bridge, & D. W. Hyndman (Eds.),
Aquifer Characterization (pp. 67–78). Oklahoma: Society for Sedimentary Geology.
https://doi.org/10.2110/pec.04.80.0067
Sentianto, A., & Triandini, T. (2013). Comparison of kriging and inverse distance weighted (IDW)
interpolation methods in lineament extraction and analysis. Journal of Southeast Asian Applied
Geology, 5(1), 21–29.
Stenberg, B., Rossel, R. A. V., Mouazen, A. M., & Wetterlind, J. (2010). Visible and near infrared
spectroscopy in soil science. Advances in Agronomy, 107, 163–215.
https://doi.org/10.1016/S0065-2113(10)07005-7
Wadoux, A. M. J.-C., Heuvelink, G. B. M., Lark, R. M., Lagacherie, P., Bouma, J., Mulder,
V. L., Libohova, Z., Yang, L., & McBratney, A. B. (2021). Ten challenges for the future of
pedometrics. Geoderma, 401, 115155. https://doi.org/10.1016/j.geoderma.2021.115155
Soil Porosity Differences Among
Grass-Covered and Exposed Soils
Measured by High Resolution X-Ray
Computed Microtomography (microCT)

Marcelo Wermelinger Lemes, Alessandra Silveira Machado,
Gustavo Mattos Vasques, Hugo Machado Rodrigues, Ricardo Tadeu Lopes,
and Reiner Olíbano Rosas

1 Introduction

The pores of the soil are represented by cavities with different sizes and shapes,
determined by the arrangement of solid particles. They constitute a volumetric
fraction of the soil and can be filled with air, water and nutrient solution (Hillel,
1972). The soil porosity controls the soil aeration, water retention, root development
and branching and, consequently, the transport and availability of water and
nutrients for plant growth (Tavares Filho & Tessier, 2009; Stumpf et al., 2014).
Several techniques can be used to measure soil porosity. The most common
method is to calculate the soil total porosity from soil bulk density and soil particle
density values using Eq. (1). Soil bulk density is usually measured from undisturbed
soil samples using the core method (Blake & Hartge, 1986), whereas the soil
particle density is usually measured by the excluded volume method (Flint & Flint,
2002). These and other methods for measuring bulk density and particle density
provide a single value for these properties and for soil porosity, but do not allow
visualizing the pore structure and connectivity. High resolution X-ray computed
microtomography (microCT) appears as a non-destructive analytical technique to
visualize and measure soil structure (Luo et al., 2008; Carducci et al., 2017), pore
network and connectivity (Katuwal et al., 2015; Hamamoto et al., 2016; Singh et
al., 2020, 2021), and measure other soil properties including bulk density (Petrovic
et al., 1982) and water content (Crestana et al., 1985).

M. W. Lemes (✉)
Estácio de Sá University, Rio de Janeiro, Brazil
A. S. Machado · R. T. Lopes
Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
G. M. Vasques
Embrapa Soils, Rio de Janeiro, Brazil
H. M. Rodrigues
Federal Rural University of Rio de Janeiro, Rio de Janeiro, Brazil
R. O. Rosas
Federal Fluminense University, Rio de Janeiro, Brazil

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
W. de Carvalho Junior et al. (eds.), Pedometrics in Brazil, Progress in Soil Science,
https://doi.org/10.1007/978-3-031-64579-2_7

$$p = 1 - \frac{\mathrm{bd}}{\mathrm{pd}} \qquad (1)$$

where: p is total porosity, bd is bulk density, and pd is particle density.
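
As a quick worked instance of Eq. (1), the sketch below computes total porosity from bulk and particle density. The function name `total_porosity` and the example values (a bulk density of 1.30 g/cm³ and a particle density of 2.65 g/cm³, a value often assumed for quartz-dominated mineral soils) are illustrative assumptions, not data from this study.

```python
def total_porosity(bulk_density, particle_density):
    """Total porosity p = 1 - bd/pd, as a fraction between 0 and 1.

    Both densities must use the same unit (e.g. g/cm3).
    """
    if particle_density <= 0:
        raise ValueError("particle density must be positive")
    if not 0 <= bulk_density <= particle_density:
        raise ValueError("bulk density must lie between 0 and the particle density")
    return 1.0 - bulk_density / particle_density


# Hypothetical example values (g/cm3), chosen only to illustrate the formula.
p = total_porosity(1.30, 2.65)
p_percent = 100.0 * p
```

With these assumed inputs the result is a single number of about 51%, which illustrates the limitation noted in the text: the calculation says nothing about pore size, shape or connectivity.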


The microCT provides high resolution images of the soil sample in the form
of 3D volume data that can be processed to obtain the volume, distribution and
connectivity of pores and aggregates, with little sample preparation. Its physical
principle is based on the attenuation of the X-rays when they interact with the
different sample constituents. The attenuation of the X-rays depends on physical
characteristics of the constituents. To obtain microCT 3D images, it is necessary
to acquire many images at different projections at constant angular steps, and then
integrate the images in 3D using appropriate algorithms.
The objective of this work is to use microCT to compare the pore size distribution
and connectivity density between a soil without vegetation cover and a soil with
grass cover.

2 Materials and Methods

The work was conducted in the municipality of Silva Jardim, Rio de Janeiro state,
southeastern Brazil. The study was done on Ferralsols on a slope within the limits
of the Córrego do Espinho catchment, with a total area of 9.14 km².
Two erosion plots (Wischmeier & Smith, 1978) were installed at the midslope
position of a dissected hill. At one plot, the soil was covered with grass, and at
the other the grass was removed leaving the soil exposed to erosion. At each plot,
four undisturbed soil samples were collected using acrylic tubes measuring 50 mm
in height and 32 mm in diameter, and transported to the laboratory for microCT
scanning.
The samples were scanned using the SkyScan 1173 High-Energy Spiral Scan
microCT system coupled with the CTVox software for 3D volume rendering (Bruker
Corp., Billerica, USA). The system operates with voltage and current of 130 kV and
61 μA, respectively. A flat panel detector with 2240 × 2240 pixels, with a pixel size
of 15.3 μm and 12-bit resolution, was used to register the transmission of the X-ray
beam and make 3D images of the soil samples. For that, a rotation step of 0.5° and an
averaging of 5 frames per scan were used. After acquisition, the 3D representation of
the sample was produced using the cone-beam reconstruction algorithm
from the 2D images taken at different projections (Feldkamp et al., 1984). The
acquisition and image processing parameters were adjusted in order to obtain a
clear view and 3D representation of the soil pore structure and include as much
as possible the micropores with diameters smaller than 0.05 mm.
An adaptive segmentation method was used to produce binary 3D images of
the soil samples. By this method, for each voxel a threshold is calculated as the
mean of all voxel values (grayscales) within a selected radius. A binary image is
obtained by classifying each voxel based on the threshold, with the solid objects
(soil matrix) plotted in white and the background (pores) plotted in black. From the
black and white binary 3D images, the soil total, macro- and microporosities and the
pore connectivity density were calculated using the CTAn software (Bruker Corp.,
Billerica, USA).
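The local-mean thresholding described above can be illustrated with a minimal sketch. It is not the CTAn implementation: the cubic neighborhood (instead of a spherical radius), the `offset` parameter and the synthetic test volume are assumptions introduced here for illustration only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def adaptive_mean_threshold(volume, radius, offset=1.0):
    """Binarize a 3D grayscale volume with a local-mean threshold.

    Assumptions (not from the chapter): the neighborhood is a cube of side
    2 * radius + 1 voxels, and `offset` is a small tolerance subtracted from
    the local mean so that homogeneous solid regions are not misclassified.
    Returns a boolean array: True = solid (soil matrix), False = pore.
    """
    volume = np.asarray(volume, dtype=float)
    width = 2 * radius + 1
    # Replicate the border voxels so every voxel has a full neighborhood.
    padded = np.pad(volume, radius, mode="edge")
    windows = sliding_window_view(padded, (width, width, width))
    local_mean = windows.mean(axis=(-3, -2, -1))
    return volume > (local_mean - offset)


def porosity_from_mask(solid_mask):
    """Fraction of voxels classified as pore."""
    return 1.0 - solid_mask.mean()


# Hypothetical volume: bright matrix (200) with one dark pore voxel (50).
volume = np.full((9, 9, 9), 200.0)
volume[4, 4, 4] = 50.0
solid = adaptive_mean_threshold(volume, radius=2)
print(porosity_from_mask(solid))
```

The sliding-window view is memory-hungry, so for full-size scans a dedicated box filter would be a more practical way to obtain the local mean.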
The samples were then taken to the laboratory and analyzed for water retention
at 6 kPa, bulk density and particle density by conventional laboratory methods,
described in Teixeira et al. (2017). Soil total porosity was calculated using Eq.
1. Soil microporosity was estimated as the fraction of pores that retain water at
6 kPa tension, supposedly corresponding to pores with diameters smaller than
0.05 mm. Macroporosity was calculated as the difference between total porosity and
microporosity, and represents pores with diameters larger than 0.05 mm (Teixeira et
al., 2017).
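As a worked illustration of the conventional calculation, the sketch below applies Eq. 1 and the difference rule for macroporosity. The input values are hypothetical and chosen only to make the arithmetic easy to follow; they are not measurements from this study.

```python
def total_porosity(bulk_density, particle_density):
    """Eq. 1: p = 1 - bd / pd, returned as a volume fraction."""
    return 1.0 - bulk_density / particle_density


def macroporosity(total, micro):
    """Macroporosity as the difference between total and microporosity."""
    return total - micro


# Hypothetical inputs (assumptions for illustration only).
bd = 1.30     # bulk density, g cm^-3
pd = 2.60     # particle density, g cm^-3
micro = 0.40  # volumetric water content retained at 6 kPa, m^3 m^-3

p = total_porosity(bd, pd)       # about 0.50
macro = macroporosity(p, micro)  # about 0.10
print(f"total={p:.3f}, micro={micro:.3f}, macro={macro:.3f}")
```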

3 Results and Discussion

From microCT analysis, grass-covered soils (CS) had total porosities of 30.9,
24.7, 27.4 and 36.7%, with a mean of 29.9% (Table 1). Exposed soils (ES) had
total porosities of 13.6, 20.6, 21.9 and 21.1%, with a mean of 19.3%. The pore
connectivity densities in CS were 511.1, 76.8, 155.4 and 508.6 mm⁻³ (connections
per mm³) (mean of 313.0 mm⁻³), while in ES they were 19.9, 45.8, 76.8 and
46.0 mm⁻³ (mean of 47.1 mm⁻³). Total soil porosity was higher in CS than in ES,
and the same was true for pore connectivity density (Table 1). The main difference
in porosity between CS and ES was in the micropores (mean of 18.0% in CS versus
10.4% in ES), although the volume of macropores was also higher in CS (mean of
12.0%) than in ES (mean of 8.9%). The denser pore connectivity in CS, especially
in samples 1 and 4 with pore connectivity densities over 500 mm⁻³, lies in the
micropores of these samples. Accordingly, the marked reduction in microporosity
in ES is accompanied by a marked reduction in pore connectivity density.
Visually, the presence of macropores is evident in sample 2 with the largest
macroporosity among all samples (Fig. 1a), whereas the micropores dominate the
pore structure in sample 4 with the largest microporosity (Fig. 1b). In contrast, few
micropores are seen in sample 5 with the smallest microporosity (Fig. 1c) and few
macropores are observed in sample 8 with the smallest macroporosity (Fig. 1d),
though the 3D images are not very clear when printed in 2D. The microporosity of
sample 2 (Fig. 1a) is the smallest among CS samples (Table 1), which explains
its similarity with sample 5 (Fig. 1c), as both samples have more macro- than
micropores (Table 1). Along the same lines, samples 4 (Fig. 1b) and 8 (Fig. 1d)
have more micro- than macropores (Table 1). The predominance of micro- over
macropores is clear in their 3D images (Figs. 1b, d) compared to the other samples
with predominantly macropores (Figs. 1a, c).

Table 1 Soil properties measured using X-ray computed microtomography

Treatment            Sample  Total porosity (%)  Macropores (%)  Micropores (%)  Pore connectivity density (mm⁻³)
Grass-covered soils  1       30.9                12.5            18.4            511.1
                     2       24.7                13.8            10.9            76.8
                     3       27.4                9.0             18.4            155.4
                     4       36.7                12.5            24.2            508.6
                     Mean    29.9                12.0            18.0            313.0
Exposed soils        5       13.6                7.7             5.9             19.9
                     6       20.6                7.5             13.1            45.8
                     7       21.9                13.2            8.7             76.8
                     8       21.1                7.4             13.8            46.0
                     Mean    19.3                8.9             10.4            47.1

Fig. 1 High resolution X-ray computed microtomography 3D images of soil samples 2 (a), 4 (b),
5 (c) and 8 (d) showing the soil pore structure in blue and the soil particles (soil matrix) as white
background

From conventional laboratory analysis, on average the CS total porosity was
51.7%, the macroporosity was 12.1% and the microporosity was 40.0%. The ES
porosity values were lower than those from CS, with averages of 45.4, 5.4 and
39.6% for total, macro- and microporosity, respectively. In this case, the difference
in porosity between CS and ES occurred in the volume of macropores, while
the microporosity was similar between them (~40% on average). The discrepancy
between microCT and conventional laboratory results was anticipated for two
main reasons. First, the measurement approaches are considerably different. While
microCT gives a direct measure of the porosity by counting the observed soil pores
with different sizes in the 3D images, the conventional laboratory method calculates
the total porosity indirectly from the bulk density and particle density values, and
estimates the macro- and microporosity by associating them with the water retention
capacity at certain tensions. Second, by definition soil micropores have diameters
smaller than 0.05 mm and the resolution of the microCT limits the observation
of very small, microscopic pores in the 3D images, underestimating the soil total
volume of micropores from microCT. Nonetheless, from both methods (microCT
and conventional laboratory), the soil total porosity was higher in CS than ES.
As such, the capacity of microCT to identify pores, especially those larger than
0.05 mm (macropores), is evident (Table 2).

Table 2 Soil properties measured using conventional laboratory methods

Treatment            Sample  Total porosity (%)  Macropores (%)  Micropores (%)
Grass-covered soils  1       49.5                9.0             40.9
                     2       52.9                15.2            42.1
                     3       50.8                10.0            35.8
                     4       53.7                14.4            41.1
                     Mean    51.7                12.1            40.0
Exposed soils        5       46.2                5.3             40.5
                     6       45.1                3.0             37.7
                     7       47.0                11.2            40.8
                     8       43.2                2.1             39.3
                     Mean    45.4                5.4             39.6

The higher porosity in CS compared to ES is explained by the fact that the
vegetation cover keeps the root system in the soil, increases organic matter and
assists in the stability of soil aggregates and soil structure, increasing the number
and connectivity of pores. In Paraná, Brazil, this effect was observed when native
forest soils were compared to soils under pasture, cassava and sugar cane fields
(Viana et al., 2011). Soils under forest vegetation, with a more diverse and complex
root system, had higher organic matter content, lower bulk density and higher
porosity compared to the cultivated soils. Accordingly, Singh et al. (2021) observed
an increase in microCT-measured porosity mediated by an increase in soil organic
matter from manure application in soils under corn and soybean rotation in South
Dakota, USA. Similarly, Singh et al. (2020) observed higher CT-measured porosity,
saturated hydraulic conductivity and water infiltration rate, and lower bulk density
in soils under no-till crop rotations including winter cover crops compared to soils
under fallow also in South Dakota.

4 Conclusions

Vegetation cover increases the porosity of the soil compared to an exposed soil,
as observed and measured in high resolution X-ray computed microtomography
(microCT) images and confirmed by conventional laboratory methods. The lack of
vegetation cover exposes the soil surface to erosion. The direct impact of raindrops
disaggregates the soil matrix and clogs the pores, while erosion removes soil
nutrients and organic matter and thus reduces the capacity of the soil to sustain production
and provide ecosystem services. The microCT non-destructively characterizes the
soil pore structure, allowing 3D visualization and quantification of pores with
different shapes and sizes and pore connectivity density.

References

Blake, G. R., & Hartge, K. H. (1986). Bulk density. In A. Klute (Ed.), Methods of soil analysis:
Part 1. Physical and mineralogical methods (2nd ed.). American Society of Agronomy, Soil
Science Society of America.
Carducci, C. E., Zinn, Y. L., Rossoni, D. F., Heck, R. J., & Oliveira, G. C. (2017). Visual analysis
and X-ray computed tomography for assessing the spatial variability of soil structure in a
cultivated Oxisol. Soil and Tillage Research, 173, 15–23.
Crestana, S., Mascarenhas, S., & Pozzi-Mucelli, R. S. (1985). Static and dynamic three-
dimensional studies of water in soil using computed tomographic scanning. Soil Science, 140,
326–332.
Feldkamp, L. A., Davis, L. C., & Kress, J. C. (1984). Practical cone-beam algorithm. Journal of
the Optical Society of America, 1, 612–619.
Flint, A. L., & Flint, L. E. (2002). Particle density. In J. H. Dane & G. C. Topp (Eds.), Methods of
soil analysis. Part 4. Physical methods. Soil Science Society of America.
Hamamoto, S., Moldrup, P., Kawamoto, K., Sakaki, T., Nishimura, T., & Komatsu, T. (2016). Pore
network structure linked by X-ray CT to particle characteristics and transport parameters. Soils
and Foundations, 56, 676–690.
Hillel, D. (1972). Soil and water: Physical principles and processes (3rd ed.). Academic.
Katuwal, S., Arthur, E., Tuller, M., Moldrup, P., & Jonge, L. (2015). Quantification of soil pore
network complexity with x-ray computed tomography and gas transport measurements. Soil
Science Society of America Journal, 79, 1577–1589.
Luo, L., Lin, H., & Halleck, P. (2008). Quantifying soil structure and preferential flow in intact soil
using x-ray computed tomography. Soil Science Society of America Journal, 72, 1058–1069.
Petrovic, A. M., Siebert, J. E., & Rieke, P. E. (1982). Soil bulk density analysis in three dimensions
by computed tomographic scanning. Soil Science Society of America Journal, 46, 445–450.
Singh, J., Singh, N., & Kumar, S. (2020). X-ray computed tomography–measured soil pore
parameters as influenced by crop rotations and cover crops. Soil Science Society of America
Journal, 84, 1267–1279.
Singh, N., Kumar, S., Udawatta, R. P., Anderson, S. H., Jonge, L. W., & Katuwal, S. (2021). X-
ray micro-computed tomography characterized soil pore network as influenced by long-term
application of manure and fertilizer. Geoderma, 385, 114872.
Stumpf, L., Pauletto, E. A., Fernandes, F. F., Suzuki, L. E. A. S., Silva, T. S., Pinto, L. F. S., & Lima,
C. L. R. (2014). Perennial grasses for recovery of the aggregation capacity of a reconstructed
soil in a coal mining area in Southern Brazil. Revista Brasileira de Ciência do Solo, 38, 327–
335.
Tavares Filho, J., & Tessier, D. (2009). Characterization of soil structure and porosity under long-
term conventional tillage and no-tillage systems. Revista Brasileira de Ciência do Solo, 33,
1837–1844.
Teixeira, P. C., Donagemma, G. K., Fontana, A., & Teixeira, W. G. (Eds.). (2017). Manual de
métodos de análise de solo (3rd ed.). Embrapa.
Viana, E. T., Batista, M. A., Tormena, C. A., Costa, A. C. S., & Inoue, T. T. (2011). Atributos
físicos e carbono orgânico em Latossolo Vermelho sob diferentes sistemas de uso e manejo.
Revista Brasileira de Ciência do Solo, 35, 2105–2114.
Wischmeier, W. H., & Smith, D. D. (1978). Predicting rainfall erosion losses—A guide to
conservation planning. Agriculture Handbook 537. United States Department of Agriculture.
Using Legacy Soil Data to Plan New Data
Collection: Study Case of Rio de Janeiro
State: Brazil

Elias Mendes Costa, Hugo Machado Rodrigues, Ana Carolina de Ferreira,
Marcos Bacis Ceddia, and Douglath Alves Corrêa Fernandes

1 Introduction

There is an increasing demand from scientists and decision-makers for spatial
information on soil types and their properties, to better understand the effects of a
growing population and increased demand for food in a changing climate.
In many places in the world, soil information is difficult to obtain or non-existent,
and Brazil is no exception. When no detailed map or soil observation
is available in a region of interest, we have to extrapolate from other regions with
similar characteristics (Mallavan et al., 2010). Because funding for
new soil surveys is very scarce in Brazil, we have to optimize it by organizing the
available existing data (legacy data) and planning soil surveys in representative
areas, from which a transferability model can be used to map other regions
(Grunwald et al., 2018).
A large quantity of soil data has already been produced in Brazil as part of
soil surveys and research projects on the various aspects of soil science
(Samuel-Rosa et al., 2020). However, this collection is not standardized, and there
is no single database that can be consulted for a quick answer about the level of survey,

E. M. Costa (✉)
Federal Institute of Tocantins (IFTO), Pedro Afonso, TO, Brazil
H. M. Rodrigues · D. A. C. Fernandes
Laboratory of Water and Soils in Agroecosystem (LASA), Federal Rural University of Rio de
Janeiro (UFRRJ), Seropédica, Brazil
A. C. d. Ferreira
Laboratory of Soil Organic Matter (LMOS), Federal Rural University of Rio de Janeiro (UFRRJ),
Seropédica, Brazil
M. B. Ceddia
Department of Agro Technologies and Sustainability (DATS), Institute of Agronomy, Federal
Rural University of Rio de Janeiro (UFRRJ), Seropédica, Brazil

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
W. de Carvalho Junior et al. (eds.), Pedometrics in Brazil, Progress in Soil Science,
https://doi.org/10.1007/978-3-031-64579-2_8

scale, format, title, authorship, and where to find such information (Santos et al.,
2013). In response to this need, the National Soil Program of Brazil (PronaSolos) was
created, an ambitious project to survey Brazilian soils that will consolidate
data integration and contribute to advancing knowledge of the land in
Brazil.
There will be several initiatives across the country with partnerships between
the states and the Union. One of the states that have already started the activities
of planning and executing PronaSolos is the state of Rio de Janeiro (RJ), the state
where the program was born. Seeking to optimize time, money and
personnel, this study aimed to analyze and discuss how the use of legacy soil data and
landscape similarity analysis can help in planning the allocation of priority areas
for detailed soil surveys within the scope of the PronaSolos-RJ project.
To meet the objectives of this study, we used two approaches. The first,
landscape similarity analysis by the Gower index, analyzes the similarity
between the landscapes of a reference area and an area of interest, as done by Ferreira
et al. (2022). The second, area of applicability, analyzes the ability of a
predictive model trained in one region to be applied/extrapolated to the area of interest
using a dissimilarity index based on the distance between points, as done by Meyer
and Pebesma (2021).

2 Materials and Methods

2.1 Study Area

The study area is the State of Rio de Janeiro (RJ), located between the geographical
coordinates 41° and 45° W and 20°30' and 23°30' S, and covers about 44,000 km²
in the Southeast of Brazil (Fig. 1). The area is divided into six geopolitical
mesoregions known as Baixadas, Centro Fluminense, Metropolitana do Rio de
Janeiro, Noroeste Fluminense, Norte Fluminense and Sul Fluminense. The state
is also characterized by eight large landscape types known as Serra da Bocaina,
Coastal Plains, Mountainous Area, North-Northwest Fluminense, Paraíba do Sul
River (Middle Valley), Serra da Mantiqueira, Serra dos Órgãos, and Upper Itabapoana
River (Plateau), as described by Mendonça-Santos et al. (2008).

2.2 Soil Survey in the State of Rio de Janeiro

The state of Rio de Janeiro, like the rest of Brazil, lacks detailed soil information
with satisfactory coverage to meet the demand. Most surveys that cover a large area,
such as Mapa semidetalhado de solos do município do Rio de Janeiro, Mapa de
reconhecimento de alta intensidade dos solos—quadrículas de Silva Jardim e Rio
das Ostras, estado do Rio de Janeiro and Levantamento de reconhecimento de alta
intensidade dos solos das bacias hidrográficas dos rios Guapi-Macacu e Caceribu,
have a scale of 1:50,000 or coarser (Table 1 and Fig. 2a). An exception is the Projir
survey (in the northern region), which covers an area of 2500 km² at a general scale
of 1:10,000 and can be even more detailed depending on the thematic map, but
the dataset is not available online, as can be seen in Fig. 2b.

Fig. 1 The study area location and elevation map, extracted from the SRTM DEM

Of the surveys at a scale of 1:10,000 or more detailed, few cover an area larger
than 10 km²; one exception is the survey Mapa de solos do assentamento e acampamento
Sebastião Lan I e II, which covers an area of 20 km² (Table 1).

2.3 Environmental Covariates and Dissimilarity Index

Covariates that represent the soil formation factors were compiled for the entire state.
They were: relief (elevation, slope, multiresolution index of valley bottom flatness),
derived from the Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM)
at 30 m spatial resolution; climate (precipitation and
average temperature), obtained from https://www.worldclim.org/ with an
original spatial resolution of 1 km (Fick & Hijmans, 2017); and organisms (soil-adjusted
vegetation index, SAVI), derived from bands 4 (red) and 8 (NIR) of the Sentinel-2 satellite.
Geology and soil maps, despite being available, were not used because not all of their
classes occur in the reference areas or where soil data are available.
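As an illustration of the organism covariate, the sketch below computes SAVI from red and near-infrared reflectance using the standard formulation. The chapter does not report the soil-brightness factor, so L = 0.5 (a commonly used default) and reflectances scaled to 0-1 are assumptions, and the pixel values are hypothetical.

```python
import numpy as np


def savi(nir, red, soil_factor=0.5):
    """Soil-adjusted vegetation index.

    SAVI = (1 + L) * (NIR - Red) / (NIR + Red + L)
    Assumption: inputs are surface reflectances scaled to 0-1 and
    L = 0.5; the chapter does not state the value that was used.
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


# Hypothetical reflectances: band 8 (NIR) and band 4 (red) for three pixels.
band8 = np.array([0.50, 0.30, 0.10])
band4 = np.array([0.10, 0.20, 0.10])
print(savi(band8, band4))
```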

Table 1 Main soil surveys in the state of Rio de Janeiro, year of publication, scale, covered area
and mesoregion

Name | Year | Scale | Area (km²) | Mesoregion
Mapa de reconhecimento de baixa intensidade dos solos do estado do Rio de Janeiro | 2003 | 1:250,000 | 43,797.5 | –
Mapa semidetalhado de solos do município do Rio de Janeiro | 1980 | 1:75,000 | 1220.32 | Metropolitana do RJ
Mapa de reconhecimento de alta intensidade dos solos—quadrículas de Silva Jardim e Rio das Ostras, estado do Rio de Janeiro | 2001 | 1:100,000 | 2660 | Baixadas
Levantamento de reconhecimento de alta intensidade dos solos das bacias hidrográficas dos rios Guapi-Macacu e Caceribu | 2015 | 1:50,000 | 2072 | Metropolitana do RJ
Mapa de solos do assentamento e acampamento Sebastião Lan I e II | 2010 | 1:10,000 | 20 | Baixadas
Mapa semidetalhado de solos do municipio de Paty do Alferes -RJ | 1998 | 1:20,000 | 359.45 | Metropolitana do RJ
Mapa semidetalhado de solos da microbacia do ribeirão barro branco município de São José do Ubá—RJ | 2012 | 1:10,000 | 5.5 | Noroeste Fluminense
Mapa de solos do médio alto curso do rio grande, região serrana do Estado do Rio de Janeiro | 2012 | 1:100,000 | 484.69 | Centro Fluminense
Mapa semidetalhado dos solos da microbacia do pito aceso, município de Bom Jardim—RJ | 2012 | 1:10,000 | 5 | Centro Fluminense
Mapa semidetalhado de solos da microbacia do córrego da Tábua—município de São Fidelis, Rio de Janeiro | 2004 | 1:10,000 | 6.65 | Norte Fluminense
Mapa semidetalhado de solos da microbacia nossa Senhora das Graças município de Itaperuna, Rio de Janeiro** | 2008 | 1:10,000 | – | Noroeste Fluminense
Mapa semidetalhado de solos da microbacia Pau-Ferro município de Itaperuna, Rio de Janeiro** | 2008 | 1:10,000 | – | Noroeste Fluminense
Mapa semidetalhado de solos da microbacia Santa Rosa, município de Miracema, RJ** | 2008 | 1:10,000 | – | Noroeste Fluminense
Mapa de reconhecimento de média intensidade dos solos da bacia hidrográfica do Rio São Domingos, RJ | 2011 | 1:50,000 | 289.22 | Noroeste Fluminense
Levantamento semidetalhado dos solos da antiga Fazenda São José do Pinheiro, Pinheiral-RJ | 2015 | 1:25,000 | 14.62 | Sul Fluminense
Classes de solos e de aptidão das terras para irrigação- PROJIR** | 1983 | 1:10,000 | 2500 | Norte Fluminense
Levantamento semidetalhado dos solos da área do Sistema Integrado de Produção Agroecológica (SIPA) Km 47 Seropédica-RJ | 1999 | 1:10,000 | 0.70 | Metropolitana do RJ

2.3.1 Gower Similarity Index (GI)

The GI proposed by Gower (1971), as outlined by Mallavan et al. (2010), was
employed to measure the similarity among fields (legacy data and reference area):

$$S_{ij} = \frac{1}{p} \sum_{k=1}^{p} \left( 1 - \frac{|x_{ik} - x_{jk}|}{\mathrm{range}_k} \right)$$

where $S_{ij}$ is the GI between sites i and j; k indexes the covariates; p is the number
of covariates; and $\mathrm{range}_k$ is the value range of covariate k over the whole study area. Thus,
$S_{ij}$ ranges between 0 and 1; a value of 1 means that the two sites do not differ in
any covariate, whereas 0 means that they differ maximally in all covariates. In our
case, we report the complement $1 - S_{ij}$, so the interpretation is reversed: a value
of 0 means that the two sites do not differ in any covariate, whereas
1 means that they differ maximally in all covariates. All legacy soil surveys from
RJ state were downloaded from the Geoinfo platform of the Brazilian Agricultural
Research Corporation (Geoinfo-Embrapa). They were used as reference areas (RA)
for computing the GI, one at a time. The final map of dissimilarity classes
is an average of all maps, classified following the GI criterion: values greater
than 0.12 are “Dissimilar” and values less than or equal to 0.12 are “Similar”. The 0.12 criterion
was chosen because it is equivalent to the 75th percentile. A detailed scheme of the
methodology can be seen in Fig. 3.
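The Gower calculation and the 0.12 classification rule can be sketched as follows. This is an illustrative reading, not the authors' workflow: summarizing each target site by its most similar reference site is an assumption made here, and the covariate values and ranges are hypothetical.

```python
import numpy as np


def gower_similarity(target, reference, ranges):
    """Gower similarity S_ij between every target and reference site.

    target: (n_target, p) covariates; reference: (n_reference, p);
    ranges: (p,) value range of each covariate over the study area.
    Returns an (n_target, n_reference) array of values in [0, 1].
    """
    target = np.atleast_2d(np.asarray(target, dtype=float))
    reference = np.atleast_2d(np.asarray(reference, dtype=float))
    ranges = np.asarray(ranges, dtype=float)
    scaled_diff = np.abs(target[:, None, :] - reference[None, :, :]) / ranges
    return (1.0 - scaled_diff).mean(axis=2)


def classify_dissimilarity(similarity, threshold=0.12):
    """Classify targets by 1 - S using the 0.12 criterion from the text.

    Assumption: each target site is represented by its most similar
    reference site (the maximum S over the reference area).
    """
    dissimilarity = 1.0 - similarity.max(axis=1)
    labels = np.where(dissimilarity > threshold, "Dissimilar", "Similar")
    return dissimilarity, labels


# Hypothetical example with two covariates (e.g. elevation and slope).
ranges = np.array([10.0, 100.0])
reference_sites = np.array([[0.0, 0.0]])
target_sites = np.array([[0.0, 0.0], [10.0, 100.0], [1.0, 10.0]])

similarity = gower_similarity(target_sites, reference_sites, ranges)
dissimilarity, labels = classify_dissimilarity(similarity)
print(dissimilarity, labels)
```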

2.3.2 Area of Applicability of Spatial Prediction Models

In the “area of applicability” (AOA) methodology, a machine learning model is
trained to learn the relationships between the covariates and the target variable, in
this case clay content (n = 652).
In our case, Random Forest was used as the machine learning algorithm (others can
be used as well, as long as variable importance is returned). The model was validated
by cross-validation to estimate the model error. The estimation of the AOA
requires the importance of the individual predictor variables (Meyer & Pebesma,
2021). The AOA calculation takes the model as input and extracts the importance
of the covariates, which are used as weights in the multidimensional distance
calculation of the dissimilarity index (DI) (Fig. 4).

Fig. 2 (a) Soil surveys available on the Geoinfo portal for the state of Rio de Janeiro; (b) Soil
samples available in the Febr project database for the state of Rio de Janeiro

Fig. 3 Methodology flowchart for selection of priority areas for soil survey in the state of Rio de
Janeiro using Gower index approach

Fig. 4 Methodology flowchart for selection of priority areas for soil survey in the state of Rio de
Janeiro using AOA approach

To ensure that all covariates are treated equally, the predictor variables are scaled
by dividing mean-centred values by their respective standard deviations:

$$X_{ij}^{s} = \frac{X_{ij} - \bar{X}_{\cdot j}}{\sigma_j}$$

where $X_{ij}^{s}$ refers to the scaled value of the jth covariate corresponding to the ith
observation, $\bar{X}_{\cdot j}$ to the mean and $\sigma_j$ to the standard deviation of the jth covariate.
When calculating distances based on standardized covariates, all variables are
given equal importance. However, this approach does not account for the varying
relevance of different variables within the predictor space. In machine learning
models, some covariates are more important than others in shaping prediction
patterns (Meyer & Pebesma, 2021). To reflect variable importance in the
computation of distances in the covariate space, the authors propose multiplying the
scaled variables by the unstandardized importance estimate $w_j$ of each variable j
before calculating the distance:

$$X_{ij}^{sw} = w_j X_{ij}^{s}$$

The multivariate distance is calculated as the Euclidean distance. For example,
the Euclidean distance between two arbitrary points a and b in the covariate space
is calculated as:

$$d(a, b) = \sqrt{\sum_{j=1}^{p} \left( X_{aj}^{sw} - X_{bj}^{sw} \right)^2}$$

For a new prediction location k, the distance to the nearest training data point i is
$d_k = \min_i d(k, i)$.
To facilitate interpretation and enable model comparison, distances in the
predictor space for new prediction locations (k) are standardized by dividing the
minimum distance to the nearest training data point ($d_k$) by the average distance in
the training data ($\bar{d}$). This quotient is referred to as the dissimilarity index ($DI_k$):

$$DI_k = \frac{d_k}{\bar{d}}$$

where $\bar{d}$ is the average of all pairwise distances between the n training data points.
The AOA is derived from the DI by using a threshold. The threshold is the
(outlier-removed) maximum DI observed in the training data, where the DI of
the training data is calculated by considering the cross-validation folds (Meyer
& Pebesma, 2021). The parameters for the case study were as follows. Covariate weights:
precipitation (30.75); temperature (31.31); SAVI (19.40); DEM (32.15); slope
(20.35) and MRVBF (7.82). AOA threshold: 0.34.
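The DI steps described in this section (scaling, weighting, nearest-neighbour distance and division by the mean pairwise training distance) can be sketched in a few lines. This is a simplified illustration, not the authors' code: the sample standard deviation, the toy data and the unit weights are assumptions, and the 0.34 threshold is simply taken from the text rather than derived from cross-validation folds.

```python
import numpy as np


def dissimilarity_index(train, new, weights):
    """Dissimilarity index (DI) of new locations relative to training data.

    train: (n, p) covariates at training points; new: (m, p) covariates at
    prediction locations; weights: (p,) variable importances.
    Assumption: scaling uses the sample standard deviation (ddof=1).
    """
    train = np.asarray(train, dtype=float)
    new = np.asarray(new, dtype=float)
    weights = np.asarray(weights, dtype=float)

    mean = train.mean(axis=0)
    sd = train.std(axis=0, ddof=1)
    # Scale with the training statistics, then weight by importance.
    train_sw = weights * (train - mean) / sd
    new_sw = weights * (new - mean) / sd

    # Mean of all pairwise Euclidean distances among training points.
    diff = train_sw[:, None, :] - train_sw[None, :, :]
    pairwise = np.sqrt((diff ** 2).sum(axis=2))
    upper = np.triu_indices(len(train_sw), k=1)
    mean_train_distance = pairwise[upper].mean()

    # Distance from each new location to its nearest training point.
    diff_new = new_sw[:, None, :] - train_sw[None, :, :]
    nearest = np.sqrt((diff_new ** 2).sum(axis=2)).min(axis=1)
    return nearest / mean_train_distance


# Toy data (hypothetical): four training points and two new locations.
train_points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_points = np.array([[0.0, 0.0], [10.0, 10.0]])
di = dissimilarity_index(train_points, new_points, weights=[1.0, 1.0])

inside_aoa = di <= 0.34  # threshold reported for the case study
print(di, inside_aoa)
```

In the case study, the weights would be the six covariate importances listed above instead of the unit weights used here.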

3 Results

As can be seen, none of the areas taken as a reference represents the entire state
of Rio de Janeiro well (Fig. 5) and, as expected, each represents its surroundings,
where the environmental characteristics are similar. For example,
the Região Serrana RA represents the entire Serrana region of Rio very well, with
GI less than 0.1, but does not represent the coastal regions or the north
and northwest Fluminense regions well.

Fig. 5 Gower index maps using as reference different regions of Rio de Janeiro where soil surveys
were carried out

Likewise, the Projir region is representative of the northern region and some
parts of the coast, with a GI less than 0.1, but it represents neither the
Serrana region nor the mountainous region of the southern Fluminense in the Serra
do Mar and Serra da Mantiqueira regions well.
Following the same pattern, when we analyze the soil data, which are spread across
all regions of the state, with the AOA approach, the result is similar: there are
still regions that are poorly represented, as shown by the DI (Fig. 6).
When combining all GI maps, it is possible to see that the areas considered
dissimilar are mainly the areas with rugged terrain in the Serra dos Órgãos, Serra
da Mantiqueira and Bocaina regions, and some parts of the Coastal Plains region
(Fig. 7).
These were also the areas with the greatest dissimilarity when a threshold of
0.34 was applied to the DI in the AOA approach. However, as there are more sites with soil
data than soil surveys (some data come from theses, dissertations, or isolated works), the
dissimilarity by the AOA approach was slightly smaller (Fig. 8).
Following the idea of using the areas where soil surveys have already been
carried out as representative RA, it is possible to define model extrapolation limits
(transferability) for a given region. Combining this information, it is possible
to define areas that have little similarity to any existing RA (GI approach) or
where the transferability (applicability) of a predictive model is limited
(AOA approach), that is, priority areas for soil survey following the demands of
PronaSolos.

Fig. 6 (a) Clay content of the topsoil layer for the state of Rio de Janeiro; (b) The Dissimilarity
Index (DI) calculated based on the importance of the covariates used in the predictive model and
the distance between the points

Fig. 7 Dissimilarity class map based on average Gower index

Fig. 8 Dissimilarity class map based on AOA

4 Discussion

In the context of planning new soil surveys in Brazil, our results provide an initial
approximation for the efficient use of legacy soil data. According to Hendriks et
al. (2019), many of the reviewed studies could have reduced their sampling effort
by making better use of available data, tools, and techniques. We believe that areas
with less soil information or representation in existing data (legacy data) should be
given priority unless there is a specific objective. By focusing our efforts on a small
but representative area with similar environmental characteristics, we can optimize
time, financial, and human resources. This approach can serve as a reference for a
larger area with similar characteristics, as suggested by Lagacherie et al. (2001).
For instance, if we carry out a soil survey in Serra dos Órgãos, can we use
it as a basis for Serra da Mantiqueira or Bocaina? GI can help simulate this
scenario based on the transferability model, which maps other regions based on
the similarity of their environmental characteristics, as proposed by Grunwald et al.
(2018). Additionally, the area of applicability (AOA) of spatial predictive models
can be used to validate this result, as proposed by Meyer and Pebesma (2021). The
use of GI to analyze the similarity of landscapes to determine the transferability of
predictive models is discussed in detail by Ferreira et al. (2023).

5 Conclusions

In conclusion, the territory of Rio de Janeiro currently lacks satisfactory pedological
maps to meet present and future demands. To address this issue, a reference area
approach can be adopted, whereby a small but representative area with similar
environmental characteristics is surveyed in detail and used as a reference for a
larger region. This approach can save time, money, and personnel resources, as
demonstrated by the Gower index similarity analysis and validated by the area of
applicability approach.
By considering the already mapped areas as a reference and the soil data
available, the areas with the greatest dissimilarity were identified as I-Bocaina,
VI-Serra da Mantiqueira, VII-Serra dos Órgãos, and II-Coastal Plains (especially
the northern part of Rio de Janeiro). Therefore, future soil survey efforts should
prioritize these regions to improve the coverage of pedological maps in the territory
of Rio de Janeiro.

References

Ferreira, A. C. de S., Ceddia, M. B., Costa, E. M., Pinheiro, É. F. M., do Nascimento, M. M., &
Vasques, G. M. (2022). Use of airborne radar images and machine learning algorithms to map
soil clay, silt, and sand contents in remote areas under the Amazon rainforest. Remote Sensing,
14, 5711. https://doi.org/10.3390/rs14225711
Ferreira, A. C. S., Pinheiro, É. F. M., Costa, E. M., & Ceddia, M. B. (2023). Predicting soil
carbon stock in remote areas of the Central Amazon region using machine learning techniques.
Geoderma Regional, 32, e00614. https://doi.org/10.1016/j.geodrs.2023.e00614
Fick, S. E., & Hijmans, R. J. (2017). WorldClim 2: New 1-km spatial resolution climate surfaces
for global land areas. International Journal of Climatology, 37, 4302–4315.
https://doi.org/10.1002/joc.5086
Gower, J. C. (1971). A general coefficient of similarity and some of its properties. Biometrics, 27,
857. https://doi.org/10.2307/2528823
Grunwald, S., Yu, C., & Xiong, X. (2018). Transferability and scalability of soil total carbon
prediction models in Florida, USA. Pedosphere, 28, 856–872.
https://doi.org/10.1016/S1002-0160(18)60048-7
Hendriks, C. M. J., Stoorvogel, J. J., Lutz, F., & Claessens, L. (2019). When can legacy soil data
be used, and when should new data be collected instead? Geoderma, 348, 181–188.
https://doi.org/10.1016/j.geoderma.2019.04.026
Lagacherie, P., Robbez-Masson, J. M., Nguyen-The, N., & Barthès, J. P. (2001). Mapping of
reference area representativity using a mathematical soilscape distance. Geoderma, 101,
105–118. https://doi.org/10.1016/S0016-7061(00)00101-4
Mallavan, B. P., Minasny, B., & McBratney, A. B. (2010). Homosoil, a methodology for
quantitative extrapolation of soil information across the globe. In Digital soil mapping (pp.
137–150). Springer.
Mendonça-Santos, M. L., Santos, H. G., Dart, R. O., & Pares, J. G. (2008). Digital mapping of soil
classes in Rio de Janeiro State, Brazil: Data, modelling and prediction. In A. E. Hartemink, A.
McBratney, & M. L. de Mendonça-Santos (Eds.), Digital soil mapping with limited data (pp.
381–396). Springer Netherlands. https://doi.org/10.1007/978-1-4020-8592-5_34
Meyer, H., & Pebesma, E. (2021). Predicting into unknown space? Estimating the area of
applicability of spatial prediction models. Methods in Ecology and Evolution, 12, 1620–1633.
https://doi.org/10.1111/2041-210X.13650
Samuel-Rosa, A., Dalmolin, R. S. D., Moura-Bueno, J. M., Teixeira, W. G., & Alba, J. M. F. (2020).
Open legacy soil survey data in Brazil: Geospatial data quality and how to improve it. Scientia
Agricola (Piracicaba, Braz.), 77, e20170430. https://doi.org/10.1590/1678-992x-2017-0430
Santos, H. G., Aglio, M. L. D., de Dart, R. O., Mendonça-de Santos, M. L., Souza, J. S., &
Mendonça, L. R. (2013). Distribuição espacial dos níveis de levantamento de solos no Brasil.
In XXXIV Congresso Brasileiro de Ciência Do Solo. Florianópolis, Santa Catarina (pp. 1–4).
Exploratory Analysis from Harmonized
Legacy Soil Data to Support Digital Soil
Mapping in Brazilian Midwest

Waldir de Carvalho Junior, Nilson Rendeiro Pereira, Silvio Barge Bhering,
Braz Calderano Filho, Cesar da Silva Chagas, Helena Saraiva Koenow Pinheiro,
José Ronaldo Pereira, Carlos Henrique Lemos Lopes, and Renan Borges Leal

1 Introduction

Legacy soil data are pedological information obtained during past soil surveys in a
given country or region (Waltner et al., 2014). Because these data were often produced
with different methodologies, they must be filtered, analyzed, and harmonized before
they can support new interpretations. Legacy data are useful sources of information on
the spatial variation of soil properties and classes, but their use poses well-known
problems (Mayr et al., 2010). Lagacherie and McBratney (2007) pointed out that legacy
data are the input of the spatial soil information system (SSINFO), on which reliable
spatial soil inference systems (SSINFERSs) can be established; both are required to
produce digital soil maps (DSMs). Meeting the input requirements of the SSINFO
demands a large budget and intense fieldwork to obtain sufficient soil samples
(Minasny & McBratney, 2015), which highlights the importance of legacy data. Using
legacy data in soil surveys reduces primary data collection and puts to use data that
could otherwise be neglected. Soil legacy data are basic input data for digital soil
mapping (Sulaeman et al., 2013).

W. de Carvalho Junior (✉) · N. R. Pereira · S. B. Bhering · B. Calderano Filho · C. d. S. Chagas ·
J. R. Pereira
Embrapa Solos, Rio de Janeiro, RJ, Brazil
H. S. K. Pinheiro
Soil Science Department, Rio de Janeiro Federal Rural University, Seropédica, Brazil
C. H. L. Lopes · R. B. Leal
Secretaria de Estado de Meio Ambiente, Desenvolvimento, Ciência, Tecnologia e Inovação –
SEMADESC, Campo Grande, Brazil

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
W. de Carvalho Junior et al. (eds.), Pedometrics in Brazil, Progress in Soil Science,
https://doi.org/10.1007/978-3-031-64579-2_9

The accuracy of DSMs is influenced by the quality of the input data. Legacy data
from past soil surveys have been considered a potential means of improving DSM
input data. However, soil classification systems have changed in many countries, so
legacy data are often inappropriate or not directly applicable to the generation of
up-to-date DSMs (Yang et al., 2022; Dobos et al., 2010). According to Mayr et al.
(2010), numerous problems can arise when using legacy data. For example, the original
sampling designs may not be appropriate for new goals. Other practical issues are the
mixture of quantity, quality, and types of data (ordinal, continuous, and categorical)
and mapping problems such as noncontiguous coverages and differences in taxonomic
systems, scales, and others.
Over the last few decades, digital soil mapping (DSM) approaches have been
developed to provide soil information and inferences in a continuous manner
(McBratney et al., 2003; Malone et al., 2009; Minasny et al., 2013), mainly using
available harmonized legacy data. Several soil databases are available worldwide and
have been used to map soil attributes and classes (Sorenson et al., 2021; Reddy &
Das, 2023; Reddy et al., 2021; Rasaei & Bogaert, 2019; and others). One of them is
the World Soil Information Service (WoSIS), which provides quality-assessed and
standardized soil profile data to support digital soil mapping and environmental
applications at broad scales (Batjes et al., 2020).
This paper proposes a methodology to assess the suitability of legacy data and some
predictor covariates through an exploratory spatial analysis of harmonized legacy data
over the Paraguay river basin, in the state of Mato Grosso do Sul, Brazil (Fig. 1). The
objective is to verify the adequacy of the spatial distribution of the legacy data as
a function of covariates such as altimetry, slope, lithology, biome types, vegetation
cover, and soil map in the study area.

2 Methodology

The dataset comprises 1430 soil profiles collected in previous works by Embrapa
and IBGE without the use of statistical sampling techniques, an important step in
digital soil mapping. These data come from different sources, were used in the
environmental planning project of Embrapa Solos, and will soon be available in the
institution’s database. Numerical altimetry and slope covariates were obtained from
NASA JPL (2020). Thematic covariates representing biomes, lithology, soils, and
vegetation were obtained from BDiA – IBGE (2021).
To make them suitable for digital soil mapping and for analysis, the legacy soil
profile data were harmonized into a single taxonomic system. We used the WRB (IUSS
Working Group, 2006) because it makes it possible to reclassify any taxonomic order
from systems around the world. The legend of the legacy soil map was also harmonized
so that it could be compared with the soil profiles.
The study area is the Paraguay River watershed in the state of Mato Grosso do Sul.
It represents approximately 53% of the state’s area, or 187,443 km². Figure 1 shows
the study area and the thematic covariates used in this study, namely, lithology,
soils, and vegetation. The continuous covariates analyzed were the digital elevation
model (DEM) and slope (in percentage).
For the spatial analyses, overlay operations between the points and the polygons
(thematic covariates) and rasters (continuous covariates) were performed to obtain the
covariate values at each soil point. The analyses compare the soil classes of the
samples against the soil map, biome map, vegetation map, lithology map, DEM, and slope.
All spatial analyses were executed in ArcGIS Desktop 10.5, with the support of R
(R Core Team, 2022) and spreadsheets.
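The overlay step can be pictured with a small, library-free sketch: a point-in-polygon test assigns each profile to a thematic map unit, and a grid lookup reads the raster value at the same location. This is not the ArcGIS workflow used in the study; the function names, geometries, and values are hypothetical, and the raster is assumed to be a north-up grid with square cells.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray starting at the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sample_raster(x, y, grid, x_min, y_max, cell_size):
    """Return the value of the raster cell containing (x, y); row 0 is the top row."""
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    return grid[row][col]


# Hypothetical soil map units (square polygons) and elevation grid, for illustration only.
soil_units = {
    "Planosols": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "Arenosols": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
elevation = [
    [120, 125, 130, 135],  # top row (y from 5 to 10)
    [100, 105, 110, 115],  # bottom row (y from 0 to 5)
]

profiles = [("P1", 4.0, 7.0), ("P2", 15.0, 2.0)]
table = []
for profile_id, x, y in profiles:
    unit = next(
        (name for name, ring in soil_units.items() if point_in_polygon(x, y, ring)),
        None,
    )
    dem = sample_raster(x, y, elevation, x_min=0, y_max=10, cell_size=5)
    table.append((profile_id, unit, dem))
print(table)
```

Each row of the resulting table pairs a profile with the map unit and raster value at its location, which is the kind of table the comparisons in the next section rely on.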

Fig. 1 (a) Study area and soil samples distribution; (b) Soil map; (c) Vegetation map; (d)
Lithology map

3 Results and Discussion

The summary statistics of DEM and slope for the soil samples and for the study area
show the same trend (Fig. 2). The DEM values of the study area and of the soil samples
are close, with minimal differences. With the exception of the maximum values, the
slope values of the soil samples and of the study area were very similar. Areas with
slope above 45% cover approximately 01% of the study area (Fig. 2a), so we consider
that they follow the same trend. There were no significant differences between the
study area and the soil samples in DEM or slope.
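A comparison of this kind can be sketched by computing the same descriptive statistics for the covariate values of the whole study area and for the values at the sample locations. The slope values below are hypothetical and serve only to show the layout of the comparison; the text does not specify which statistics or tests were used beyond those summarized in Fig. 2.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "q1": q1,
        "median": statistics.median(values),
        "q3": q3,
        "max": max(values),
        "mean": statistics.mean(values),
    }


# Hypothetical slope values (%), for illustration only; not data from this study.
area_slope = [0.5, 1.0, 2.0, 3.0, 5.0, 8.0, 12.0, 60.0]
sample_slope = [0.6, 1.1, 2.2, 2.9, 4.8, 7.5, 11.0, 25.0]

area_stats = summarize(area_slope)
sample_stats = summarize(sample_slope)
for key in area_stats:
    print(f"{key:>6}: area = {area_stats[key]:6.2f}   samples = {sample_stats[key]:6.2f}")
```

In this made-up example the two sets agree closely except for the maximum, the same pattern the text describes for slope.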
Table 1 presents the 1430 soil profiles grouped by class according to FAO (2021). The
distribution of soil classes is unbalanced, with a predominance of Ferralsols,
Arenosols, and Acrisols. These three soil classes represent 68% of the soil samples,
and Ferralsols alone represent 37.9% of the total. This distribution was compared with
the distribution of the soil map units (Table 2).
The spatial join of the soil samples and the soil map shows that the distribution of
the soil samples is related to the distribution of the soil map units (Table 2). Only
the Cambisol and Histosol map units have no soil samples, and together they cover less
than 0.5% of the study area.
The largest soil map unit comprises the Planosols, with 27.9% of the study area
and 7.1% (101) of the soil samples. The second largest comprises the Arenosols, with
24.7% of the study area and 21.5% of the soil samples. The third is the Ferralsol
unit, with 13.6% of the area and 34.5% of the soil samples. The comparison for the
other soil units is shown in Fig. 3. The large difference between the area of the
Ferralsol map unit and its share of soil samples probably reflects the importance of
this soil class for agricultural use and the need to describe and sample it better to
serve end users.
In terms of area, the main soil map unit is the Planosols (52,228 km²). Of the 101
soil samples located in this unit, 46 are Planosols and 14 are Gleysols; the other
classes occur with 7 samples or fewer (as in the case of Plinthosols and Podzols)
(Fig. 4). The soil samples other than Planosols may correspond to the second component
of the soil map unit, but this information is not accessible from BDiA (IBGE, 2021).
Most of the soil sample classes in this map unit are poorly drained soils, which is
consistent with the Planosol environment.
The Arenosol soil map unit covers 46,293 km² and contains 307 soil samples. Of
these, 148 are classified as Arenosols, 89 as Ferralsols, 27 as Acrisols, and the
remainder as other classes (Fig. 5). In this map unit, Arenosols and Ferralsols show
some similarity in terms of depth, granulometry, and drainage conditions. We can
therefore consider that about 50% of the soil samples have the same class as the soil
map unit, and this share increases if the Ferralsol samples are considered the second
member of the soil map unit.
The third soil map unit in area, the Ferralsol unit, covers 25,501 km² and contains
493 soil samples, or 34.5% of the total. In this unit, most samples were classified as
Ferralsols (65%) or Arenosols (17%). The samples of other soil classes are distributed
as shown in Fig. 6. Again, most soil samples in this unit belong to the same soil class
as the map unit, which can be considered a correct classification with respect to the
soil map.

Fig. 2 (a) Distribution of DEM (meters); (b) Slope (%) Statistics values of study area and legacy
soil samples

Table 1 Class distribution of soil samples according to WRB soil taxonomy (FAO, 2021)
WRB class      Samples   %
Acrisols       163       11.4
Arenosols      271       19.0
Cambisols      54        3.8
Chernozems     43        3.0
Ferralsols     542       37.9
Fluvisols      1         0.1
Gleysols       49        3.4
Leptosols      68        4.8
Luvisols       7         0.5
Nitisols       34        2.4
Planosols      80        5.6
Plinthosols    47        3.3
Podzols        20        1.4
Regosols       32        2.2
Vertisols      19        1.3

Table 2 Soil map units, area (km² and %) and soil samples within each soil unit
Soil map legend (WRB)   km²      % of area   Soil samples   % of soil samples
Acrisols                13,098   6.99        218            15.2
Arenosols               46,293   24.70       307            21.5
Cambisols               755      0.40        0              0.0
Chernozems              3931     2.10        26             1.8
Ferralsols              25,501   13.61       493            34.5
Gleysols                9702     5.18        27             1.9
Histosols               8        0.00        0              0.0
Leptosols               10,752   5.74        71             5.0
Nitisols                3366     1.80        62             4.3
Planosols               52,228   27.87       101            7.1
Plinthosols             5775     3.08        28             2.0
Regosols                8853     4.72        66             4.6
Rock outcrops           672      0.36        3              0.2
Vertisols               5466     2.92        28             2.0

The spatial relation between the vegetation covariate and the soil samples was also
analyzed. The largest vegetation map unit is Savannah, with 64% of the total area,
followed by the Contact unit, with 24%. Other vegetation map units occur but are less
extensive (Fig. 7).
The distribution of the soil samples in relation to the vegetation covariate is
shown in Table 3. The largest vegetation map unit (Savannah) is also the one with the
largest share of soil samples (69.9%). The second largest unit (Contact) holds 27% of
the soil samples. Together, these two types comprise 97% of the soil samples.
Considering the lithology covariate, the largest mapping unit is Clay Deposits, with
47% of the total area. This unit contains 23.5% of the soil samples (Fig. 8). The
second largest unit is Sandstone, which contains the largest share of soil samples
(37%). Basic Rocks and Sand Deposits together contain 17% of the soil samples. The
distribution of the soil samples across the lithology map units is more spread out
than for the other covariates but still follows the area distribution.

Fig. 3 Graphical comparison between % of soil map units and of soil samples

Fig. 4 Distribution of soil sample classes within the Planosol map unit

Fig. 5 Distribution of soil sample classes within the Arenosol map unit

Fig. 6 Distribution of soil sample classes within the Ferralsol map unit

4 Conclusions

Using the 1430 analyzed soil samples, which come from different studies, requires
prior harmonization. In general, the spatial analysis showed that the soil samples and
the covariates have similar distributions.

Fig. 7 Area distribution of vegetation map unit

Table 3 Distribution of vegetation map units and their soil samples
Vegetation class                 km²          % of area   Soil samples   % of soil samples
Contact area                     45,213.16    24          388            27.1
Deciduous Seasonal Forest        5332.44      3           19             1.3
Seasonal Semideciduous Forest    1785.59      1           7              0.5
Savannah                         120,749.72   64          999            69.9
Savannah-steppe                  13,036.91    7           17             1.2

The comparison of elevation and slope between the soil samples and the study area
showed no significant difference in distribution: the soil samples follow the same
trend as the altimetry and slope values of the study area.
The soil samples also follow the distribution of the thematic covariates, since the
largest map units of each covariate generally hold the greatest numbers of soil
samples.
The analysis attested to the quality of both datasets (soil data and covariates).
These results show that, even though the soil data come from different sources, they
present a distribution similar to that of the covariates. The spatial distribution of
the soil samples is believed to correspond to the covariates of the entire study area,
and the samples can be used to produce digital soil maps.

Fig. 8 Graphical comparison between % of Lithology map units and soil samples

References

Batjes, N. H., Ribeiro, E., & Van Oostrum, A. (2020). Standardised soil profile data to support
global mapping and modelling (WoSIS snapshot 2019). Earth System Science Data, 12,
299–320.
BDiA – IBGE. (2021). Banco de Informações Ambientais. From https://bdiaweb.ibge.gov.br/#/home.
Accessed 18 Aug 2021.
Dobos, E., Bialkó, T., Micheli, E., & Kobza, J. (2010). Legacy soil data harmonization and database
development. In J. L. Boettinger, D. W. Howell, A. C. Moore, A. E. Hartemink, & S.
Kienast-Brown (Eds.), Digital soil mapping. Progress in soil science (Vol. 2). Springer.
https://doi.org/10.1007/978-90-481-8863-5_25
FAO. (2021). World reference base for soil resources 2014.
http://www.fao.org/soils-portal/data-hub/soil-classification/world-reference-base.
Accessed 19 Aug 2021.
IUSS Working Group WRB. (2006). World reference base for soil resources 2014 (World soil
resources reports no. 106). FAO.
Lagacherie, P., & McBratney, A. B. (2007). Spatial soil information systems and spatial soil
inference systems: Perspectives for digital soil mapping. Developments in Soil Science, 31,
3–22.
Malone, B. P., McBratney, A. B., Minasny, B., & Laslett, G. M. (2009). Mapping continuous depth
functions of soil carbon storage and available water capacity. Geoderma, 154(1–2), 138–152.
Mayr, T., Rivas-Casado, M., Bellamy, P., Palmer, R., Zawadzka, J., & Corstanje, R. (2010). Two
methods for using legacy data in digital soil mapping. In J. L. Boettinger, D. W. Howell, A.
C. Moore, A. E. Hartemink, & S. Kienast-Brown (Eds.), Digital soil mapping. Progress in soil
science (Vol. 2). Springer. https://doi.org/10.1007/978-90-481-8863-5_16
McBratney, A. B., Mendonça Santos, M. L., & Minasny, B. (2003). On digital soil mapping.
Geoderma, 117(1–2), 3–52.
Minasny, B., & McBratney, A. B. (2015). Digital soil mapping: A brief history and some lessons.
Geoderma, 264, 301–311.
Minasny, B., McBratney, A. B., Malone, B. P., & Wheeler, I. (2013). Digital mapping of soil
carbon. Advances in Agronomy, 118, 1–47.
NASA JPL. (2020). NASADEM merged DEM global 1 arc second nc V001. NASA EOSDIS Land
Processes DAAC. https://doi.org/10.5067/MEaSUREs/NASADEM/NASADEM_NC.001.
Accessed 28 Aug 2020.
R Core Team. (2022). R: A language and environment for statistical computing. R Foundation for
Statistical Computing. https://www.R-project.org/
Rasaei, Z., & Bogaert, P. (2019). Spatial filtering and Bayesian data fusion for mapping soil
properties: A case study combining legacy and remotely sensed data in Iran. Geoderma, 344,
50–62. https://doi.org/10.1016/j.geoderma.2019.02.031
Reddy, N. N., & Das, B. S. (2023). Digital soil mapping of key secondary soil properties using
pedotransfer functions and Indian legacy soil data. Geoderma, 429, 116265.
https://doi.org/10.1016/j.geoderma.2022.116265
Reddy, N. N., Chakraborty, P., Roy, S., Singh, K., Minasny, B., McBratney, A. B., Biswas, A., &
Das, B. S. (2021). Legacy data-based national-scale digital mapping of key soil properties in
India. Geoderma, 381, 114684. https://doi.org/10.1016/j.geoderma.2020.114684
Sorenson, P. T., Shirtliffe, S. J., & Bedard-Haughn, A. K. (2021). Predictive soil mapping using
historic bare soil composite imagery and legacy soil survey data. Geoderma, 401, 115316.
https://doi.org/10.1016/j.geoderma.2021.115316
Sulaeman, Y., Minasny, B., McBratney, A. B., Sarwani, M., & Sutandi, A. (2013). Harmonizing
legacy soil data for digital soil mapping in Indonesia. Geoderma, 192, 77–85.
https://doi.org/10.1016/j.geoderma.2012.08.005
Waltner, I., Csakiné Michéli, E., Fuchs, M., Láng, V., Pásztor, L., Bakacsi, Z., Laborczi, A., &
Szabo, J. (2014). Digital mapping of selected WRB units based on vast and diverse legacy data.
In D. Arrouays, N. McKenzie, J. Hempel, A. C. R. DeForges, & A. McBratney (Eds.), Global
soil map (pp. 313–317). CRC Press/Taylor and Francis Group.
Yang, J., Guan, X., Luo, M., & Wang, T. (2022). Cross-system legacy data applied to digital soil
mapping: A case study of second National Soil Survey data in China. Geoderma Regional, 28.
https://doi.org/10.1016/j.geodrs.2022.e00489
Soil Organic Carbon Stock Estimation
Using Legacy Data: A Case Study
of North Fluminense Region—BR

Marcos Bacis Ceddia, Hugo Machado Rodrigues, Ana Carolina de Souza Ferreira,
Elias Mendes Costa, Érika Flávia Machado Pinheiro,
and Douglath Alves Corrêa Fernandes

1 Introduction

Concern about global warming grows every year in the world community because of
the increasing CO2 emissions associated with human activities. According to Smith et
al. (2020), the soil organic carbon stock (SOCS) in the top meter of soils globally
amounts to around 1500–2400 Gt of carbon (C) (~5500–8800 Gt CO2). The lower estimate
in the range is approximately three times the C in vegetation and twice the stock of
C in the atmosphere (Smith et al., 2020). Therefore, a slight change in the SOCS can
significantly affect the atmosphere and the global climate. There is also growing
interest in estimating and mapping the SOCS pool, and its potential to change and
sequester carbon, at finer spatial resolutions and over larger geographic extents.
However, estimating and mapping the carbon stock is difficult because it requires
data on organic carbon and soil bulk density (BD), which demands undisturbed samples
at different soil depths. Besides, SOCS and BD vary in space and time, demanding
intensive fieldwork and laboratory analysis. Moreover, BD is an attribute that is
generally lacking in soil databases, preventing accurate calculation. Therefore, many
studies seek to overcome the limited availability of BD data by using pedotransfer
functions to estimate BD from other more readily available data (e.g., sand, silt,
clay, and carbon, among others). On the other hand, pedotransfer functions must be
chosen and used with care. For example, Gomes et al. (2017), working with soils in
the Amazon region, demonstrated that using different pedotransfer functions to
estimate BD that have not been validated for a given location can introduce an error
in the final SOCS estimate of between 1.06 and 1.23 Mg C ha−1 (15 and 17%,
respectively).

M. B. Ceddia (✉) · É. F. M. Pinheiro
Department of Agro Technologies and Sustainability (DATS), Institute of Agronomy, Federal
Rural University of Rio de Janeiro (UFRRJ), Seropédica, Brazil
H. M. Rodrigues · D. A. C. Fernandes
Laboratory of Water and Soils in Agroecosystem (LASA), Federal Rural University of Rio de
Janeiro (UFRRJ), Seropédica, Brazil
A. C. de Souza Ferreira
Laboratory of Soil Organic Matter (LMOS), Federal Rural University of Rio de Janeiro (UFRRJ),
Seropédica, Brazil
E. M. Costa
Federal Institute of Tocantins (IFTO), Pedro Afonso, TO, Brazil

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
W. de Carvalho Junior et al. (eds.), Pedometrics in Brazil, Progress in Soil Science,
https://doi.org/10.1007/978-3-031-64579-2_10

Legacy data are essential to generate maps of SOCS, especially for mapping large
areas. Therefore, these data must be digitized and stored in a database following the
FAIR principles (Findable, Accessible, Interoperable, and Reusable). For example,
the North region of Rio de Janeiro State has a rich collection of soil data covering
different soil attributes. This region is traditionally associated with sugar cane
production and has witnessed a rapid change in economic activities and, consequently,
in land use and cover (reduction of sugar cane plantations). Due to the region's
strategic importance, between 1981 and 1983 the now-extinct Institute of Sugar and
Alcohol developed a project to survey the physical environment, called Project of
Irrigation and Drainage of the North Fluminense (PROJIR).
Throughout the project, a detailed soil map (scale 1:10,000) was produced
covering an area of 250,000 ha, where 218 complete profiles were surveyed. In
161 of the 218 soil profiles, it was possible to calculate the carbon stock up to 1
meter (m) of soil depth (using measured data of total organic carbon and BD). These
legacy data are being organized in a database system (platform www.multisoils.org),
allowing the assessment of the soil carbon stock for that period (1981–1983), which
can be used as a baseline for future studies on the variation of soil carbon stock
according to changes in land use and cover.
From SOCS data with geographic coordinates, it is possible to develop algo-
rithms to map the spatial variability of SOCS using different digital soil mapping
(DSM) techniques. DSM approaches range from simple linear statistical models,
geostatistical, and hybrid techniques to advanced and complex machine learning
(ML) ones (Lamichhane et al., 2019). Traditionally, geostatistical interpolators
(univariate—Ordinary Kriging, bivariate—Ordinary Cokriging, and multivariate—
Regression Kriging) have been widely used to generate spatial variability maps of
soil attributes, including SOCS.
Geostatistical interpolation methods assume that the spatial variation of any
continuous attribute is often too irregular to be modeled by a simple and smooth
mathematical function, so the variable is better described by a stochastic surface
(Burrough et al., 1998). Besides, applying geostatistical methods requires the
stationarity of the random function. The experimental semivariogram, which relies on
the intrinsic stationarity assumption, is a central tool in geostatistics (Webster &
Oliver, 2007). Its reliability or accuracy depends on the density and the distribution
of the dataset across the study site. In large areas, the dataset commonly presents
low density and irregular distribution, which hinders the use of geostatistical
methods.
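To make the role of data density concrete, the sketch below computes an experimental semivariogram with the classical estimator, in which the semivariance of a lag class is half the mean squared difference between all pairs of observations separated by a distance in that class. The function name, the lag settings, and the four observations are hypothetical; in practice each lag class needs many pairs, which is what sparse and irregular datasets fail to provide.

```python
import math
from itertools import combinations


def experimental_semivariogram(points, lag_width, n_lags):
    """Classical experimental semivariogram.

    points : list of (x, y, z) observations
    Returns a list of (mean_distance, semivariance, n_pairs), one entry per
    non-empty lag class [k * lag_width, (k + 1) * lag_width).
    """
    squared_diffs = [0.0] * n_lags
    distances = [0.0] * n_lags
    counts = [0] * n_lags
    for (x1, y1, z1), (x2, y2, z2) in combinations(points, 2):
        d = math.hypot(x2 - x1, y2 - y1)
        k = int(d // lag_width)
        if k < n_lags:
            squared_diffs[k] += (z1 - z2) ** 2
            distances[k] += d
            counts[k] += 1
    result = []
    for k in range(n_lags):
        if counts[k] > 0:
            gamma = squared_diffs[k] / (2 * counts[k])
            result.append((distances[k] / counts[k], gamma, counts[k]))
    return result


# Hypothetical observations along a transect, for illustration only.
observations = [(0, 0, 1.0), (1, 0, 2.0), (2, 0, 4.0), (3, 0, 3.0)]
variogram = experimental_semivariogram(observations, lag_width=1.0, n_lags=3)
for mean_distance, gamma, n_pairs in variogram:
    print(f"h = {mean_distance:.1f}  gamma = {gamma:.2f}  pairs = {n_pairs}")
```

The number of pairs per lag class is returned alongside the semivariance because it is the quantity that signals whether an estimate can be trusted.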

More recently, Machine Learning (ML) techniques have been increasingly used to map
soils and their attributes. ML techniques are a large class of data-driven algorithms
employed primarily for data mining and pattern recognition. The advantage of using ML
is that it is not conditioned on any statistical assumptions. The algorithms can also
handle many cross-correlated covariates (collinearity) as predictors (Wadoux et al.,
2020).
Among ML algorithms, the most used and best performing for mapping soil carbon is
Random Forest (RF). However, although RF was better than other algorithms in most
comparative studies, no single model was the strongest in all circumstances
(Lamichhane et al., 2019). Besides, according to Wadoux et al. (2020), the choice of
the best ML prediction algorithm should not be based on accuracy alone but should also
consider the principles of plausibility (being valid in light of current knowledge and
scientific theories), interpretability (the translation of an abstract model or model
output into terms understandable by humans), and explainability (models that both
predict and explain).
This work aimed to map the spatial variability of the SOCS of soils up to 1 m depth
in an area of 250,000 ha (≈ 30% of the North Fluminense region) using the PROJIR
dataset. As specific objectives, two approaches were compared: geostatistical
(Ordinary Kriging, OK) and ML (Random Forest, RF). The RF algorithms were developed
using three digital soil mapping approaches: (1) all covariates; (2) covariate
selection by Recursive Feature Elimination (RFE); and (3) covariate selection by
Expert Knowledge (EK).

2 Materials and Methods

2.1 Study Site

The study area is located in the North Fluminense region of the Rio de Janeiro
state (Brazil), between latitudes 21°18' S and 22°12' S and longitudes 41°42' W and
40°48' W (Fig. 1). The area covers approximately 250,000 hectares (~30%) of the
northern region of Rio de Janeiro. The climate is classified as Aw, according to
Köppen, with an average temperature of 23.9 °C and average annual rainfall of
1112 mm.

2.2 The PROJIR Soil Database

To carry out this work, all detailed soil maps (125 soil charts at 1:10,000) and the
respective soil reports were digitized. The soil charts were georeferenced, allowing
the extraction of the coordinates of each soil observation (soil profiles and
trenches). The dataset identifies each soil observation by its UTM coordinates and
provides the respective values for the 218 soil profiles with morphological
descriptions and physical and chemical analyses. In 161 soil profiles, soil organic
carbon (SOC) and soil bulk density (BD) data were found for horizons/layers up to 1 m
of soil depth. Besides, the sand, silt, and clay contents of 660 soil observations
(soil profiles and trenches) were digitized, considering the same soil depth (1 m).
This information was stored on the MultiSoils platform (www.multisoils.org).

Fig. 1 Study site and the spatial distribution of SOCS (161 points) and Texture data (660 points)

2.3 Calculation of Soil Organic Carbon Stock—SOCS

For each of the 161 soil profiles, the SOCS was calculated for the 0–100 cm depth.
The classical way of calculating the SOC stock (kg C m−2) for a given depth consists
of summing the C stocks of the individual horizons, each determined as the product of
BD, SOC content, and horizon thickness (Eq. 1), according to Bernoux et al. (2002):

$$\text{SOC stock} = \mathrm{SOC} \times \mathrm{BD} \times T \qquad (1)$$

Where:
SOC stock (SOCS) is the soil organic carbon stock (kg C m−2);
SOC is the soil organic carbon content (g kg−1);
BD is the soil bulk density (Mg m−3); and
T is the horizon thickness (m).
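The calculation in Eq. 1 can be sketched as follows, summing the product SOC × BD × T over the horizons of a profile. With SOC in g kg−1, BD in Mg m−3, and T in m, the product is numerically equal to kg C m−2, so no conversion factor is needed. The function name, the truncation of the deepest horizon at 1 m, and the horizon values are assumptions made for illustration, not details taken from the PROJIR data.

```python
def soc_stock(horizons, max_depth_m=1.0):
    """Soil organic carbon stock (kg C m-2) of a profile down to max_depth_m.

    horizons : list of (top_m, bottom_m, soc_g_per_kg, bd_mg_per_m3)
    """
    total = 0.0
    for top, bottom, soc, bd in horizons:
        # Only the part of the horizon above max_depth_m contributes (assumption).
        thickness = max(0.0, min(bottom, max_depth_m) - top)
        # g/kg * Mg/m3 * m = kg C m-2
        total += soc * bd * thickness
    return total


# Hypothetical profile, for illustration only.
profile = [
    (0.0, 0.2, 20.0, 1.2),  # A horizon
    (0.2, 0.6, 10.0, 1.4),  # AB horizon
    (0.6, 1.5, 5.0, 1.5),   # B horizon, only 0.6-1.0 m is counted
]
stock = soc_stock(profile)
print(f"{stock:.1f} kg C m-2")
```

For this made-up profile the three horizons contribute 4.8, 5.6, and 3.0 kg C m−2, respectively.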

2.4 Soil Particle Size Fractions

To assist in the mapping of SOCS, the sand, silt, and clay contents of the 660 soil
observations were used. The distribution of the 660 texture points is shown in Fig. 2.
As the particle size fractions refer to the horizons/layers (A, AB, BA, B, C, AC, and
CB, for example) that were separated and morphologically described during the soil
survey, it was necessary to calculate the weighted average of these fractions for a
soil depth of 100 cm (Eq. 2).

$$ P_{100\,\mathrm{cm}} = \frac{\sum_{i=1}^{n} \mathrm{PSF}_i \, T_i}{\sum_{i=1}^{n} T_i} \tag{2} $$

Where: P100cm is the particle size fraction (clay, sand, or silt content) for the soil depth up to
100 cm, in g kg−1; PSFi is the PSF of horizon i, in g kg−1; Ti is the thickness,
in meters, of the portion of horizon i that lies within the desired layer; and n is
the number of horizons that have a portion within the desired layer.

Fig. 2 (a–e) Maps of Sand, Silt, and Clay in classification approach and the geomorphological
features maps
This dataset of 660 georeferenced P100cm points was used to map sand, clay,
and silt by ordinary kriging. The final maps were exported at a resolution of
30 meters. These attributes represent the soil (SL) factor and were incorporated as
covariates.
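
The thickness-weighted average of Eq. 2 can be sketched as follows. The function name, the tuple layout, and the clay values are hypothetical and serve only to illustrate how each horizon is clipped to the 0–100 cm layer before weighting.

```python
def weighted_fraction(horizons, depth_m=1.0):
    """Thickness-weighted mean of a particle size fraction down to depth_m.

    horizons: list of (top_m, bottom_m, fraction_g_per_kg).
    """
    numerator = 0.0
    denominator = 0.0
    for top, bottom, psf in horizons:
        # T_i: portion of the horizon that lies within the target layer
        t_i = max(0.0, min(bottom, depth_m) - max(top, 0.0))
        numerator += psf * t_i
        denominator += t_i
    if denominator == 0.0:
        raise ValueError("no horizon overlaps the target layer")
    return numerator / denominator


# Hypothetical clay contents (g/kg) for three horizons
clay = [(0.00, 0.25, 200.0), (0.25, 0.70, 350.0), (0.70, 1.50, 500.0)]
p100 = weighted_fraction(clay)
print(round(p100, 1))
```

Dividing by the summed thickness rather than by a fixed 1 m keeps the average meaningful for profiles described to less than 100 cm.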

2.5 Covariates Tested to Predict SOCS Using ML Techniques

For the development of prediction models using the RF algorithm, 21 covariates
were evaluated, which are listed in Table 1. These covariates are organized
according to the SCORPAN model (Eq. 3).

$$ S = f(S_L, C, O, R, P, A, N) \tag{3} $$

Where:
S = soil classes or attributes (to be modeled);
SL = soil, other, previously or legacy measured properties of the soil at a point;
C = climate, climatic properties of the environment at a point;
O = organisms, including land cover and natural vegetation or fauna or human
activity;
R = relief, topography, landscape attributes;
P = parent material, lithology;
A = age, the time factor;
N = spatial or geographic position.
Thirteen covariates refer to the relief factor. The first twelve in Table 1 are
continuous variables derived from a digital elevation model with a spatial
resolution of 30 meters (Shuttle Radar Topography Mission, SRTM). The thirteenth
and fourteenth covariates are categorical and represent relief and parent
material (geomorphological and geological classes, respectively); they are presented
in Fig. 2d, e. Tables 2 and 3 describe the characteristics of the geomorphology
and geology classes, respectively. The fifteenth and sixteenth covariates represent
the climate factor (temperature and precipitation, continuous covariates).
Two covariates (the seventeenth and eighteenth) represent the spatial position factor (N;
latitude and longitude coordinates). Finally, the last three covariates
represent previously measured soil properties, the SL factor (discrete
covariates: sand, silt, and clay classes). The maps of sand, silt, and clay (Fig. 2a–c),
with a resolution of 30 meters, present four classes. These classes were defined
by the intervals between the minimum value and the first quartile
(class 1), the first quartile and the mean (class 2), the mean and the third quartile
(class 3), and the third quartile and the maximum value (class 4). Despite the importance of
Table 1 Covariates used by the RF model to predict SOCS

| SCORPAN factor | Covariate | Description | Unit | Reference |
|---|---|---|---|---|
| R | 1. Digital Elevation Model (DEM) | Represents each cell's elevation in the model. | m | Hutchinson and Gallant (2000) |
| R | 2. Convergence Index (CI) | The convergence/divergence index concerning the second derivative. | d | Conrad et al. (2018) |
| R | 3. Topographic Wetness Index (TWI) | The function of slope and contributing area per unit width orthogonal towards the flow direction. | d | Boehner et al. (2002) |
| R | 4. Relative Slope Position (RSP) | The position of a point relative to the summit and valley of a slope, with a value of 0 for the bottom of the valley and 1 for the top ^a | m | Boehner and Selige (2006) |
| R | 5. Channel Network Distance (CND) | Distance from the channel level of the local drainage network to the ground ^b | m | Grimaldi et al. (2007) |
| R | 6. Channel Network Base Level (CNBL) | Vertical distance to the base level of the channel network ^c | m | Grimaldi et al. (2007) |
| R | 7. LS factor (LSf) | Equivalent to the topographic factor of the Revised Universal Soil Loss Equation (RUSLE) | d | Boehner and Selige (2006) |
| R | 8. Multiresolution Index of Valley Bottom Flatness (MRVBF) | Identifies valley bottoms using a slope classification restricted to convergent areas ^d | d | Gallant and Dowling (2003) |
| R | 9. Multiresolution Index of Ridge Top Flatness (MRRTF) | Indicates flat positions in high-elevation areas. | d | Gallant and Dowling (2003) |
| R | 10. Slope (S) | The gradient of elevation changes between neighboring cells | % | Thompson et al. (2001) |
| R | 11. Generalized Surface (GS) | Topology of a cell partition through its set of Vertex Views and their associated involutions | | Mallet (2002) |
| R | 12. Terrain Ruggedness Index (TRI) | An index that can quantify surface roughness by considering absolute elevations in the surroundings of a given raster cell | d | Conrad et al. (2018) |
| R | 13. Geomorphology map class (GEOM) | Geomorphological classes ^e | d | IBGE (2019) |
| PM | 14. Geology map class (GEOL) | Geology classes | | CPRM (2001) |
| Cl | 15. Average Annual Temperature (AAT) | Observation data (from 1950 to 2000), interpolated to 30 arc seconds (~1 km) resolution | °C × 10 | Global Climate Data (2008) |
| Cl | 16. Average Annual Precipitation (AAP) | Observation data (from 1950 to 2000), interpolated to 30 arc seconds (~1 km) resolution | mm | Global Climate Data (2008) |
| N | 17. Latitude (LAT) | Latitude (y) | m | – |
| N | 18. Longitude (LONG) | Longitude (x) | m | – |
| Sa | 19. Clay map class (CLAY) | Clay spatial variability map generated by ordinary kriging ^f | g kg−1 | www.multisoils.org |
| Sa | 20. Sand map class (SAND) | Sand spatial variability map generated by ordinary kriging ^f | g kg−1 | www.multisoils.org |
| Sa | 21. Silt map class (SILT) | Silt spatial variability map generated by ordinary kriging ^f | g kg−1 | www.multisoils.org |

a Function of slope and contributing area per unit width orthogonal towards the flow direction
b Similar to elevation and is defined as the difference in elevation between the cell and the nearest channel network
c The value of each cell is the spatially interpolated elevation of the channel networks
d Indicates flat surfaces at the bottom of the valley
e The mapping considers the ordering of geomorphological facts in a hierarchical taxonomy, identifying, according to the order of magnitude, subsets that include: Morphostructural Domains, Geomorphological Regions, Geomorphological Units, Modeled, and Symbolized Relief Forms. 1:250,000
f Using 660 points collected from the PROJIR study area

d dimensionless, R Relief factor, Cl Climate factor, PM Parent Material factor, N Spatial or Geographic position, Sa Soil attribute

Table 2 Geomorphological features, their descriptions, and symbols

| Geomorphological feature | Description |
|---|---|
| Homogeneous Sharp (H.S) | It is based on a set of landforms of narrow and elongated tops carved into sediments, denoting structural control, defined by embedded valleys |
| Homogeneous Convex (H.C) | It generates landforms of convex tops, sometimes denoting structural control, defined by shallow valleys and slopes of a gentle slope carved by grooves and channels of the first order |
| Homogeneous Tabular (H.T) | It generates landforms of tabular tops, forming features of gently inclined ramps and humps carved in unconsolidated sedimentary covers, denoting eventual structural control |
| River Plain (R.P) | Flat areas resulting from fluvial accumulation are subject to periodic flooding, including the current floodplains. Consequently, they may contain meandering lakes, boreholes, and alluvial dikes parallel to the current riverbed. In addition, they occur in valleys with alluvial filling |
| Lacustrine Fluvial Plain (L.R.P) | A cluster of geomorphological typologies linked to the river and lake influence formations |
| Fluviomarine Plain (F.P) | They are formed by the transport of sediments through marine waters |
| Lagoon Plain (L.P) | The flat area resulting from the combination of various processes forms the lagoon bodies associated with coastal barriers |
| River Terrace (R.T) | Fluvial accumulation of a flat shape, slightly inclined, presenting rupture of slope concerning the riverbed and the recent floodplains located at the lower level, carved due to flow conditions and consequent resumption of erosion |
| Marine Terrace (M.T) | Marine accumulation of a flat shape, slightly inclined to the sea, presenting a rupture of the slope with the recent marine plain, carved as a result of variation in the sea level, by erosive processes, or even by neotectonics |

the organism factor in predicting SOCS, neither satellite images nor maps of soil
use and coverage were found that could represent this factor at the time of the soil
survey (1982–1984).
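
The four-class scheme described above for the sand, silt, and clay maps (minimum to first quartile, first quartile to mean, mean to third quartile, third quartile to maximum) can be sketched as below. The function names, the inclusive upper limits, the quartile method, and the clay values are assumptions made for illustration; the source does not state how class boundaries or quartiles were handled.

```python
import statistics


def class_breaks(values):
    """Return (min, Q1, mean, Q3, max), the limits of the four classes."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, statistics.fmean(values), q3, max(values))


def assign_class(value, breaks):
    """Map a value to class 1-4; upper limits are treated as inclusive (assumption)."""
    _vmin, q1, mean, q3, _vmax = breaks
    if value <= q1:
        return 1
    if value <= mean:
        return 2
    if value <= q3:
        return 3
    return 4


# Hypothetical clay contents (g/kg), for illustration only
clay = [100, 150, 200, 250, 320, 400, 680]
breaks = class_breaks(clay)
classes = [assign_class(v, breaks) for v in clay]
print(breaks)
print(classes)
```

The scheme presupposes that the mean lies between the first and third quartiles; for very skewed data this should be checked before the classes are built.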

2.6 Digital Soil Mapping Approaches

In this work, two common digital soil mapping approaches were compared:
geostatistics (ordinary kriging, OK) and machine learning (Random Forest, RF). The flowchart with the
steps adopted is shown in Fig. 3. The geostatistical approach is traditionally used to
generate spatial variability maps of soil attributes, while RF algorithms have stood
out more recently for their good performance. The first (OK) is relatively
simple to develop, as it works only with the target variable dataset (in this case,
SOCS).

Table 3 Geological features, their descriptions, and symbols

| Geological feature | Description |
|---|---|
| Barreiras (B) | Designates the sandy-clay layers of variegated colors that emerge on the cliffs along the Brazilian coast |
| Bela Joana (B.J) | Comprises a domain of rocks with hypersthene, locally exhibiting plutonic characteristics. It occurs as elongated bodies with SW-NE direction in the northwest and north region of Rio de Janeiro |
| Cordeiro (C) | It comprises migmatite and various gneisses and can be interpreted as a result of the fusion of the metasediments of the São Fidélis group |
| Continental Water Bodies (C.W.B) | Inland waters present on the surface of the Earth, in rivers, lakes, and glaciers |
| Holocene Alluvial Deposits (A.D) | They present characteristics associated with lithological material of crystalline origin and ephemeral flows, such as spatial discontinuity along the drainage channel, depositional architecture indicative of lateral migration and bed grating by laminar flows |
| Holocene Fluviolagunar Plains Deposits (F.P.D) | Sands and muds overlying layers of biodetritic sands and muddy sediments with a lagoon bottom and occurrences of peats |
| Holocene Coastal Deposits (C.D) | Transitional deposits arranged in areas near the shoreline or the paleo-coastlines: essentially sandy sediments typical of beach and wind-influenced marine environments, as well as clayey and silty sediments, with varying levels of organic matter |
| Desengano (D) | It is characterized by a porphyritic granite gneiss with recrystallized microcline megacrystals, orthoclase, and plagioclase |
| Itaoca Granite (G) | It is part of the Nova Friburgo suite and is located in Campos dos Goytacazes, north of the State of Rio de Janeiro. It is a plutonic body, approximately elliptical, 5 kilometers in diameter, intruded into the metasedimentary rocks of the São Fidélis Group (Eastern Terrain) |
| Paraíba do Sul (P.S) | A set of rocks containing lenticular layers of magnesian limestones |
| São Fidélis (S.F) | It is constituted by granatiferous biotite gneiss, with sillimanite and cordierite, and may present lenses of calcisilicatic rocks, amphibolites, and feldspathic quartzites |
| São Pedro Hills (S.P) | Erosive origin from the excavation of the Peripheral Depression of São Paulo by the hydrographic network installed in the large old structural lines reactivated with the Gondwana fragmentation |

The second (RF), in turn, requires more processing time, especially for
preparing and organizing the maps of the various covariates used as predictors. Thus, in this
work, the generated maps are evaluated using not only the conventional
metrics (R², MAE, MSE, RMSE, and Bias) but also the relative improvement in
these metrics obtained with the more laborious technique (RF) over the
relatively more straightforward one (OK). Specifically, in the case of the RF algorithm,
three methods of covariate selection were also evaluated: (1) All covariates; (2)
Recursive Feature Elimination (RFE); and (3) Expert selection.

Fig. 3 Flowchart of the mapping methodology; T Training; V Validation; Database Partition 75%
Training and 25% Validation; OK Ordinary Kriging; RF Random Forest

Finally, the results are analyzed considering the principles of plausibility
(being valid in light of current knowledge and scientific theories), interpretability
(the translation of an abstract model or model output into terms understandable by
humans), and explainability (models that both predict and explain).

2.7 Geostatistical Approach—Ordinary Kriging (OK)

Calculating and modeling the semivariogram is fundamental to the use of
geostatistical techniques. A semivariogram that shows spatial dependence (structured
semivariance with a range and a sill) allows interpolation by kriging
to proceed. The procedures for generating the SOCS map (calculating and modeling
the semivariogram and ordinary kriging) are detailed below. The entire process
of modeling and interpolation by kriging was done using the R software (gstat
package).

2.8 The Semivariogram and Its Modeling

Using the 121 SOCS observations, experimental semivariograms were
calculated to evaluate spatial dependence and to fit the theoretical model that best
represented the data variability. The experimental semivariogram, γ(h), of n spatial
observations Zxi, i = 1, ..., n, was calculated using Eq. 4:

$$ \gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[ Z_{x_i} - Z_{x_i + h} \right]^2 \tag{4} $$

Where:
N(h) is the number of pairs of observations separated by a distance h;
Zxi is the soil attribute (SOCS) value measured at a specific point (xi) of the grid;
Zxi+h is the soil attribute (SOCS) value measured at a neighboring point (xi + h),
separated from xi by a distance h.


The best semivariogram model (Exponential, Gaussian, Spherical) was chosen by
comparing the metrics (R², MAE, RMSE, MSE, and Bias) of the maps generated
with these models.
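
As a rough illustration of Eq. 4, the sketch below computes an omnidirectional experimental semivariogram by grouping point pairs into distance bins. It is a plain-Python stand-in written for this text, not the gstat routine used by the authors; the function name, the binning rule, and the four-point transect are assumptions.

```python
import math


def experimental_semivariogram(points, lag_width, n_lags):
    """points: list of (x, y, z). Returns a list of (mean_distance, gamma, n_pairs)."""
    sq_sums = [0.0] * n_lags
    dist_sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            h = math.hypot(xi - xj, yi - yj)
            k = int(h // lag_width)  # bin k covers [k * lag_width, (k + 1) * lag_width)
            if k < n_lags:
                sq_sums[k] += (zi - zj) ** 2
                dist_sums[k] += h
                counts[k] += 1
    return [
        (dist_sums[k] / counts[k], sq_sums[k] / (2 * counts[k]), counts[k])
        for k in range(n_lags)
        if counts[k] > 0
    ]


# Hypothetical transect: four points 1 unit apart with values 1, 2, 4, 7
transect = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 0.0, 4.0), (3.0, 0.0, 7.0)]
result = experimental_semivariogram(transect, lag_width=1.5, n_lags=2)
for mean_h, gamma, n_pairs in result:
    print(mean_h, round(gamma, 3), n_pairs)
```

Each bin reports the number of pairs N(h), which is worth inspecting because bins with few pairs give unstable semivariance estimates.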

2.9 The Ordinary Kriging (OK)

The OK interpolation method uses only primary data measured at sampled
locations to estimate the same primary variable at unsampled locations. For the study
site, SOCS is the primary variable Z(uα), measured at the sampled locations uα and used
to estimate SOCS at an unsampled location u (Z*OK(u)). The stationarity of the mean
is assumed only within a local neighborhood W(u), centered at the location u being
estimated.
The mean is deemed a constant but unknown value, i.e., m(u') = constant but
unknown, ∀u' ∈ W(u) (Wackernagel, 2003). The OK estimator (Eq. 5) is written
as a linear combination of the n(u) data Z(uα) with a single unbiasedness constraint
(Eq. 6) as below:

$$ Z^{*}_{OK}(u) = \sum_{\alpha=1}^{n(u)} \lambda^{OK}_{\alpha}(u)\, Z(u_{\alpha}) \tag{5} $$

$$ \text{with} \quad \sum_{\alpha=1}^{n(u)} \lambda^{OK}_{\alpha}(u) = 1 \tag{6} $$

The unknown local mean m(u) is filtered from the linear estimator by forcing the
kriging weights (λ) to sum to 1 (Eq. 6). The weights λ are chosen so that the estimate
Z*OK(u) is unbiased and that the estimation variance σ²OK(u0) (Eqs. 7 and 8) is less
than for any other linear combination of the observed values. The minimum variance
of Z*OK(u) is given by:

$$ \sigma^{2}_{OK}(u_0) = \sum_{i=1}^{N} \lambda_i \, \gamma(u_i, u_0) + \mu \tag{7} $$

This minimum is obtained when:

$$ \sum_{i=1}^{N} \lambda_i \, \gamma(u_i, u_j) + \mu = \gamma(u_j, u_0) \quad \text{for all } j \tag{8} $$

Where:
λi is the kriging weight of the sampling point ui;
γ(ui, uj) is the semivariance of z between the sampling points ui and uj;
γ(ui, u0) is the semivariance of z between the sampling point ui and the unvisited
point u0.
Both quantities, γ(ui, uj) and γ(ui, u0), are obtained from the theoretical
model fitted to the experimental semivariogram, and μ is the Lagrange
multiplier required for the minimization.
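
To make the ordinary kriging system concrete, the sketch below builds the semivariance matrix, adds the unbiasedness row, and solves for the weights and the Lagrange multiplier with a small Gauss-Jordan routine. It is an illustrative plain-Python version, not the gstat implementation; the function names, the Gaussian model parameterization, the parameter values, and the four sample points are assumptions.

```python
import math


def gaussian_model(h, nugget=0.051, psill=0.030, a=6000.0):
    """Gaussian semivariogram, one common parameterization (values are illustrative)."""
    if h == 0.0:
        return 0.0
    return nugget + psill * (1.0 - math.exp(-((h / a) ** 2)))


def solve(matrix, rhs):
    """Solve a small dense linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def ordinary_kriging(samples, target, gamma=gaussian_model):
    """samples: list of (x, y, z); target: (x, y). Returns (estimate, variance, weights)."""
    n = len(samples)
    # Semivariances between samples, bordered by the sum-to-one constraint
    a_mat = [[gamma(distance(samples[i], samples[j])) for j in range(n)] + [1.0]
             for i in range(n)]
    a_mat.append([1.0] * n + [0.0])
    b_vec = [gamma(distance(s, target)) for s in samples] + [1.0]
    solution = solve(a_mat, b_vec)
    weights, mu = solution[:n], solution[n]
    estimate = sum(w * s[2] for w, s in zip(weights, samples))
    variance = sum(w * g for w, g in zip(weights, b_vec[:n])) + mu
    return estimate, variance, weights


# Hypothetical samples on the corners of a 1000 m square, target at the center
samples = [(0.0, 0.0, 10.0), (1000.0, 0.0, 20.0), (0.0, 1000.0, 30.0), (1000.0, 1000.0, 40.0)]
estimate, variance, weights = ordinary_kriging(samples, (500.0, 500.0))
print(round(estimate, 3), [round(w, 3) for w in weights])
```

Because the target is equidistant from the four samples, the weights come out equal and the estimate reduces to the sample mean, which is a convenient sanity check for the system.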

2.10 Random Forest

The Random Forest algorithm relates the target variable to the different
information layers (covariates) by combining the predictions of many decision
trees, each grown independently. The performance of these models is influenced by the parameters
shown in Table 4. Three groups of environmental covariates were tested in the
SOCS prediction models: (1) All covariates; (2) Selection via Recursive Feature
Elimination (RFE); and (3) Selection of covariates by expert knowledge.

Table 4 Parameters used in Random Forest models

| Algorithm | Parameter | Definition | Setting |
|---|---|---|---|
| Random Forest | mtry | The number of variables used to produce each tree. | 2 |
| | ntree | The number of trees. | 500 |
| | nodesize | The minimum number of data points in each terminal node. | Default |

mtry the number of variables randomly sampled as candidates for each split; ntree the number of
trees to grow; nodesize the minimum size of terminal nodes

2.11 Model 1—All covariates

In this case, no selection process was applied, and the 21 covariates were
processed by the RF algorithm using the caret package (Kuhn & Dormann, 2012)
in the R software.

2.12 Model 2—Recursive Feature Elimination—RFE

RFE is a backward selection algorithm that iteratively eliminates the least important
predictors from the model based on an initial measure of predictor importance
(Kuhn & Dormann, 2012). When the complete model is created, a measure of
variable importance is calculated and the predictors are ranked in descending
order (Kuhn & Johnson, 2013). For internal evaluation of the models created during
the RFE process, 10-fold cross-validation repeated five times was used.
Of the 21 covariates used, the combination with the
lowest root mean square error (RMSE) was chosen using the rfe function
provided by the caret package (Kuhn & Dormann, 2012).

2.13 Model 3—Expert Knowledge (EK)

In the EK method, the 21 covariates were reduced to a smaller set before being
made available to the algorithm. The choice of covariates was based on the
relationships between the covariates and the primary/target variable (SOCS).
For continuous variables, the covariates were chosen based on Pearson's
correlation analysis (5% significance). For categorical and discrete variables, the
distribution of SOCS was analyzed in two steps. Step 1: evaluation of boxplots of
SOCS data according to the classes present in each categorical variable (maps of
geology, geomorphology, sand, clay, and silt). Step 2: to verify that the differences
among classes were significant, an analysis of variance (ANOVA) model
was fitted to the SOCS data as a function of each categorical covariate.
The categorical covariates chosen for the EK model were those that presented significant
differences at the 10% level of significance in the ANOVA.

2.14 Evaluation of the Accuracy of the Maps

The coefficient of determination (R²; Eq. 9) was used to evaluate the goodness-of-
fit of the RF models for SOCS, and the mean absolute error (MAE; Eq. 10),
the root mean square error (RMSE; Eq. 11), and the mean squared error (MSE; Eq. 12)
were used to assess their prediction accuracy. In addition, the Bias (Eq. 13) was calculated
as the average amount by which the observed value is higher or lower than
the predicted value.
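$$ R^2 = 1 - \frac{\sum_{i=1}^{n} (O_i - P_i)^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2} \tag{9} $$

$$ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |O_i - P_i| \tag{10} $$

$$ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (O_i - P_i)^2} \tag{11} $$

$$ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (O_i - P_i)^2 \tag{12} $$

$$ \mathrm{Bias} = \frac{1}{n} \sum_{i=1}^{n} (O_i - P_i) \tag{13} $$

Where n is the number of observations, Oi and Pi are the observed and predicted
values, respectively, and Ō is the mean of the observed values.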
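
The five metrics of Eqs. 9 to 13 can be computed together, as in the sketch below. The function name and the observed and predicted values are hypothetical and chosen only so that the arithmetic is easy to follow by hand.

```python
import math


def validation_metrics(observed, predicted):
    """Return R2, MAE, MSE, RMSE, and Bias for paired observed/predicted values."""
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    mse = ss_res / n
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "Bias": sum(errors) / n,  # positive when observations exceed predictions
    }


# Hypothetical validation values (kg C per m2)
obs = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 33.0, 37.0]
metrics = validation_metrics(obs, pred)
print(metrics)
```

In this example the errors cancel, so the Bias is zero even though MAE and RMSE are not, which is why the Bias is reported alongside the other metrics rather than instead of them.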

2.15 Evaluation of the Relative Improvement

To evaluate how much the maps generated by the machine learning (RF) algorithm
improved on those generated by the OK technique, the relative improvement
index was calculated (Eq. 14).

$$ RI = \frac{\mathrm{Accuracy}_{RF} - \mathrm{Accuracy}_{OK}}{\mathrm{Accuracy}_{OK}} \times 100 \tag{14} $$

Where: RI is the relative improvement, in %; Accuracy is, in turn, the R², MAE, MSE,
RMSE, or Bias; the subscript RF denotes the metric value obtained with Random Forest, and
the subscript OK denotes the metric value obtained with ordinary kriging.
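
A short sketch of Eq. 14 follows. The OK values (RMSE 17.92, R² 0.41) are those of the Gaussian model in Table 6; the RF values are hypothetical placeholders used only to show how the sign of RI should be read.

```python
def relative_improvement(metric_rf, metric_ok):
    """Relative improvement (%) of the RF metric over the OK metric (Eq. 14)."""
    return (metric_rf - metric_ok) / metric_ok * 100.0


# OK values from Table 6 (Gaussian model); RF values are hypothetical
ri_rmse = relative_improvement(15.0, 17.92)
ri_r2 = relative_improvement(0.50, 0.41)
print(round(ri_rmse, 2), round(ri_r2, 2))
```

For error metrics such as RMSE, MAE, and MSE, a negative RI means that RF reduced the error, whereas for R² a positive RI indicates the improvement.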

3 Results and Discussion

3.1 Descriptive Statistics of Organic Carbon Stock Data

Descriptive statistics of SOCS are organized in Table 5. The SOCS values
up to 1 meter of depth span a wide range (minimum of 2.18 kg C m−2 and
maximum of 104.47 kg C m−2). This wide range explains the high
coefficient of variation (CV of almost 100%), which is expected given the great
diversity of soils (organic and mineral soils, with clay activity
ranging from high to low) across the 250,000 hectares of the study area.
The SOCS data present a positively skewed distribution, with
both the mode (8.63 kg C m−2) and the median (12.73 kg C m−2) lower than the mean
(17.73 kg C m−2). The skewness is very high (2.88) compared to the expected
value for a normal distribution (0.0). Furthermore, the excess kurtosis
(9.27) indicates that the frequency distribution is not only approximately
log-normal but also leptokurtic (SOCS values are concentrated
in the interval between 0 and 20 kg C m−2). Finally, the frequency
distribution (Fig. 4a) shows some values ≥80 kg C m−2 that could be
outliers. However, after checking the data, this hypothesis was
rejected because these values are associated with organic soils in the region.
Table 5 and Fig. 4b show that the base-10 log
transformation was enough to bring the distribution close to a normal
distribution. The transformed data were then used to calculate
the semivariance and generate the SOCS map through ordinary kriging. Some
authors suggest that there is no need to transform the data. However, the transformation
is important to reduce the effect that these extreme, low-frequency values can
have on the interpolation process (Yamamoto & Landim, 2015).
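
The statistics discussed above (CV, skewness, excess kurtosis) and the effect of the base-10 log transformation can be sketched as follows. The moment-based estimators used here are an assumption, since the source does not state which formulas produced Table 5, and the SOCS values are hypothetical.

```python
import math


def describe(values):
    """Mean, sample SD, CV (%), moment-based skewness and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    sd = math.sqrt(m2 * n / (n - 1))
    return {
        "mean": mean,
        "sd": sd,
        "cv_percent": 100.0 * sd / mean,
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }


# Hypothetical SOCS values (kg C per m2) with one organic-soil extreme
socs = [2.0, 5.0, 8.0, 10.0, 15.0, 90.0]
raw_stats = describe(socs)
log_stats = describe([math.log10(v) for v in socs])
print(round(raw_stats["skewness"], 2), round(log_stats["skewness"], 2))
```

The single high value dominates the skewness of the raw data, and the log transformation pulls it much closer to zero, which mirrors the pattern reported in Table 5.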

Table 5 Descriptive statistics of the SOCS (1 meter soil depth)

| Variable | Set | n | Min | Max | Mean | Median | Mode | S.D. | Sk | K | CV% |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SOCS (kg C m−2) | T | 121 | 2.18 | 104.47 | 17.73 | 12.73 | 8.63 | 16.87 | 2.88 | 9.27 | 95.12 |
| SOCS (kg C m−2) | V | 40 | 3.06 | 91.62 | 20.63 | 16.13 | 4.67 | 19.66 | 2.28 | 5.17 | 95.31 |
| log SOCS (kg C m−2) | T | 121 | 0.34 | 2.02 | 1.13 | 1.10 | 0.94 | 0.29 | 0.70 | 0.84 | 25.77 |
| log SOCS (kg C m−2) | V | 40 | 0.49 | 1.96 | 1.18 | 1.21 | 0.67 | 0.33 | 0.34 | −0.24 | 28.33 |

T Training, V Validation, n Number of observations, Min minimum, Max maximum, S.D. standard
deviation, Sk Skewness, K Excess kurtosis, CV% Coefficient of variation, log SOCS logarithm at
base 10 for the training and validation sets

Fig. 4 Histograms of soil organic carbon stock data, (a) original format; (b) format transformed
to logarithm in base 10

Table 6 Validation metrics of semivariograms using the 40 points of external validation (Ordinary Kriging)

| Model | R2 | MAE | RMSE | MSE | Bias |
|---|---|---|---|---|---|
| Gaussian | 0.41 | 10.34 | 17.92 | 321.22 | 7.01 |
| Spherical | 0.38 | 10.38 | 18.03 | 325.12 | 7.06 |
| Exponential | 0.35 | 10.46 | 18.18 | 330.34 | 6.99 |

3.2 Geostatistical Approach

The calculation and modeling of the SOCS semivariogram are summarized
in Table 6 and Fig. 5. Among the models tested (Gaussian, Spherical, and Exponential),
the Gaussian showed the best performance (highest R2 and lowest values
of MAE, MSE, and RMSE). The Gaussian model differs from the other models by
presenting a parabolic behavior of the semivariance at small distances. Thus,
for the SOCS variable in the region, the semivariance increases little within a radius
of ~250 meters (Fig. 5).
The parameters of the fitted Gaussian model indicate structured semivariance.
From an initial value of 0.051 (C0, nugget effect), the semivariance
increases by 0.030 (C1) until reaching the sill of 0.081 (C0 + C1) at a distance
of 6000 meters (range, or spatial dependence distance). It is also noteworthy that
the nugget effect represents approximately 63% [C0/(C0 + C1)] of the total semivariance,
so the structured component described by the Gaussian model accounts for the remaining 37%.

Fig. 5 SOC semivariogram for Ordinary Kriging mapping

The relatively high nugget effect value means that the semivariance of SOCS is
predominantly random.
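
The nugget and structured shares quoted above follow directly from the fitted parameters (C0 = 0.051, C1 = 0.030); the arithmetic is written out here as a worked check:

```latex
\frac{C_0}{C_0 + C_1} = \frac{0.051}{0.051 + 0.030} = \frac{0.051}{0.081} \approx 0.63,
\qquad
\frac{C_1}{C_0 + C_1} = \frac{0.030}{0.081} \approx 0.37
```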
The estimated nugget effect represents all the noise associated with the
semivariance of SOCS at distances shorter than the spacing between observations
(sampling grid). The nugget effect is fitted rather than measured at zero distance.
Consequently, the density of the available legacy data influences its estimation.
Another component of the nugget effect is the error associated with sample
collection and with the analytical determination of soil carbon content and soil bulk density.
The variography analysis shows that SOCS presents spatial dependence
and that the Gaussian model provides the best fit, with the structured component
accounting for 37% of the total semivariance. Furthermore, according to the range of
the model, pairs of points within a radius of 6000 meters are spatially correlated and can be used to
interpolate SOCS values at unsampled locations. Next, the
map of the spatial variability of SOCS generated by ordinary kriging and the respective
kriging variance are presented (Fig. 6).

3.2.1 SOCS Ordinary Kriging map

Figure 6a, b shows the ordinary kriging map and its kriging variance,
respectively. The values estimated by ordinary kriging vary from a
minimum of 6 kg C m−2 to a maximum of 81 kg C m−2, a narrower range
than that of the observed SOCS data (minimum of 2.18 and maximum of
104.47 kg C m−2).

Fig. 6 (a) SOCS map via Ordinary Kriging; (b) Variance of ordinary kriging

This smoothing of the SOCS values in the kriged map is an expected result since
kriging minimizes the estimation variance, with the Lagrange multiplier enforcing the
unbiasedness constraint, and therefore behaves like a low-pass filter. Figure 6a also
shows that the highest kriged SOCS values are found in the southwest part. Higher and lower SOCS values are
interleaved in the other parts of the map. This behavior reflects the moving weighted
average that ordinary kriging performs (greater weight for values closest to the
point to be estimated).
The results shown in Fig. 6b (kriging variance) are essential to identify the
regions where the SOCS estimates may have larger errors. The highest
kriging variance values are found in the regions with the lowest density of points
(edges and gaps between SOCS points). These regions with higher kriging variance could
be used to decide where new soil sampling should be done to improve the SOCS
map.

3.3 Random Forest Approach

The use of ML tools such as the Random Forest (RF) algorithm (a data-
oriented method), as highlighted by Wadoux et al. (2020), requires evaluating the
process of choosing the covariates for the prediction. This work tested three ways of
supplying covariates to the algorithm (RF Model 1, All covariates; RF Model 2, RFE; and
RF Model 3, EK). Thus, some statistical analyses of the covariates are presented
before evaluating the metrics of the maps generated by these three procedures.

Notably, ML tools do not rely on statistical assumptions such as data normality
and the absence of multicollinearity. Therefore, the statistical analysis of the covariates aims
to support the understanding of how they behave in the study area, providing
information on their association with the target variable (SOCS) and supporting the
discussion on the plausibility, interpretability, and explainability of
the models. This evaluation of the covariates
is crucial for the selection procedure via EK (Model 3), since the RFE (Model 2)
procedure selects covariates automatically, as shown in the next section, and the
All covariates (Model 1) method does not perform a selection.

3.3.1 Variables Selected by Recursive Feature Elimination—Model 2

In the selection method via RFE, the 21 covariates were tested for automatic
selection. Following the criterion of selecting the subset of covariates with the lowest RMSE,
the selected subset contained four covariates: GS, Longitude (LONG), Precipitation, and CNBL
(Table 7).

Table 7 Results of the RFE method using the 21 covariates

Outer resampling method: Cross-Validated (10 fold, repeated five times)
Resampling performance over subset size:

| Variables | RMSE | R2 | MAE | RMSESD | R2SD | MAESD | Selected |
|---|---|---|---|---|---|---|---|
| 1 | 16.97 | 0.1832 | 11.366 | 5.818 | 0.2209 | 3.427 | |
| 2 | 12.71 | 0.3659 | 8.606 | 3.929 | 0.3191 | 2.337 | |
| 3 | 12.70 | 0.3706 | 8.626 | 4.094 | 0.3247 | 2.425 | |
| 4 | 12.40 | 0.3846 | 8.500 | 3.797 | 0.3360 | 2.282 | * |
| 5 | 12.77 | 0.3638 | 8.776 | 3.725 | 0.3328 | 2.273 | |
| 6 | 13.17 | 0.3319 | 8.970 | 3.626 | 0.3217 | 2.311 | |
| 7 | 13.17 | 0.3336 | 9.072 | 3.717 | 0.3094 | 2.265 | |
| 8 | 13.17 | 0.3363 | 9.079 | 3.669 | 0.3103 | 2.173 | |
| 9 | 13.26 | 0.3273 | 9.166 | 3.748 | 0.3109 | 2.230 | |
| 10 | 13.08 | 0.3376 | 9.085 | 3.707 | 0.3137 | 2.181 | |
| 11 | 12.90 | 0.3480 | 9.002 | 3.814 | 0.3229 | 2.175 | |
| 12 | 13.04 | 0.3376 | 9.106 | 3.893 | 0.3229 | 2.212 | |
| 13 | 12.89 | 0.3475 | 9.017 | 3.882 | 0.3230 | 2.196 | |
| 14 | 13.00 | 0.3396 | 9.118 | 3.785 | 0.3185 | 2.130 | |
| 15 | 13.13 | 0.3339 | 9.185 | 3.851 | 0.3157 | 2.124 | |
| 16 | 13.16 | 0.3344 | 9.222 | 3.841 | 0.3187 | 2.118 | |
| 17 | 13.10 | 0.3387 | 9.175 | 3.976 | 0.3169 | 2.242 | |
| 18 | 13.25 | 0.3272 | 9.305 | 3.964 | 0.3146 | 2.287 | |
| 19 | 13.17 | 0.3308 | 9.305 | 3.995 | 0.3171 | 2.261 | |
| 20 | 13.18 | 0.3332 | 9.294 | 3.955 | 0.3200 | 2.179 | |
| 21 | 13.33 | 0.3222 | 9.424 | 4.128 | 0.3132 | 2.329 | |

Note: Variables = number of covariates tested in each step of the RFE
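
The selection rule applied to Table 7 (choose the subset size with the lowest cross-validated RMSE) can be reproduced from the tabulated values. The snippet below is only a sketch of that final step, not of the RFE procedure itself; the variable names are illustrative.

```python
# Cross-validated RMSE per subset size, copied from Table 7
rfe_rmse = {
    1: 16.97, 2: 12.71, 3: 12.70, 4: 12.40, 5: 12.77, 6: 13.17, 7: 13.17,
    8: 13.17, 9: 13.26, 10: 13.08, 11: 12.90, 12: 13.04, 13: 12.89, 14: 13.00,
    15: 13.13, 16: 13.16, 17: 13.10, 18: 13.25, 19: 13.17, 20: 13.18, 21: 13.33,
}

# Subset size with the lowest RMSE
best_size = min(rfe_rmse, key=rfe_rmse.get)

# RMSE reduction of the selected subset relative to using all 21 covariates
reduction_percent = (rfe_rmse[21] - rfe_rmse[best_size]) / rfe_rmse[21] * 100.0
print(best_size, round(reduction_percent, 2))
```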

Table 8 Pearson's correlation table between continuous covariates and SOCS target variable

| Covariates | SOCS |
|---|---|
| TWI | 0.30 a |
| LAT | −0.25 a |
| LONG | −0.23 a |
| Precipitation | 0.21 a |
| TRI | −0.20 a |
| CND | −0.19 a |
| DEM | −0.18 a |
| Slope | −0.18 a |
| GS | −0.18 a |
| RSP | −0.16 |
| LS factor | −0.15 |
| CNBL | −0.15 |
| Temperature | −0.13 |
| CI | −0.12 |
| MRRTF | −0.11 |
| CLAY | −0.09 |
| SILT | −0.03 |
| SAND | 0.02 |
| MRVBF | 0.02 |

a The covariates present significant correlation with the target variable (SOCS)

3.3.2 Variables Selected by EK—Model 3

Since the covariates are of different types (continuous, categorical, and discrete), their
association with the target variable (SOCS) was evaluated with different statistical analyses.
In the case of continuous covariates, Pearson's correlation was used (Table 8).
The correlation coefficients and their significance are presented in
descending order of absolute value. Among the 16 covariates, the nine that correlated significantly with
SOCS (TWI, Latitude, Longitude, Precipitation, TRI, CND, Elevation, Slope, and
GS) were selected to be made available to the Random Forest algorithm in the
EK method. Two correlate positively with SOCS (Topographic Wetness Index, TWI,
and Precipitation). This correlation is expected since, in general, more humid
regions tend to promote greater development of plants and of the biomass that
is deposited on the soil surface (leaves, trunks, and branches of plants) and in
the subsurface (roots). The SOCS values decrease as the values of the covariates
Latitude, Longitude, TRI, CND, Elevation, Slope, and GS increase. The soil texture
fractions (sand, silt, and clay) show the weakest correlations with SOCS. However, their use
as possible covariates in the RF algorithm is evaluated as discrete covariates (class
maps of sand, silt, and clay), together with the geological and geomorphological categorical
covariates.
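
The 5% significance cut-off in Table 8 can be checked approximately from the relation between Pearson's r and the t statistic, r = t / sqrt(t² + df). The sketch below assumes that the correlations were computed on the 121 training observations and uses 1.98 as the two-sided critical t for 119 degrees of freedom; both are assumptions, since the source does not state them.

```python
import math


def critical_r(t_crit, n):
    """Smallest |r| that is significant for n pairs, given the critical t for n - 2 df."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


# Correlations with SOCS, copied from Table 8
table8 = {
    "TWI": 0.30, "LAT": -0.25, "LONG": -0.23, "Precipitation": 0.21, "TRI": -0.20,
    "CND": -0.19, "DEM": -0.18, "Slope": -0.18, "GS": -0.18, "RSP": -0.16,
    "LS factor": -0.15, "CNBL": -0.15, "Temperature": -0.13, "CI": -0.12,
    "MRRTF": -0.11, "CLAY": -0.09, "SILT": -0.03, "SAND": 0.02, "MRVBF": 0.02,
}

r_crit = critical_r(1.98, 121)  # assumed n = 121 and t = 1.98
significant = [name for name, r in table8.items() if abs(r) >= r_crit]
print(round(r_crit, 3), significant)
```

Under these assumptions the threshold is about 0.18, which is consistent with the nine covariates flagged as significant in Table 8.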

Fig. 7 Boxplot of SOCS values as a function of geomorphological classes

Fig. 8 Boxplot of SOCS values as a function of geological classes

The importance of categorical and discrete variables was based on the Boxplot
visualization (Figs. 7, 8, 9, 10 and 11). A Boxplot shows the data distribution of the
continuous variable for each category. If the distribution for each category is similar,
which means the boxes are aligned, then it indicates no correlation. When the data
distribution is different for each category, which means the boxes are far from each
other, then it indicates that there is a correlation between the two variables.
From the Boxplot presented in Fig. 7, it is possible to observe that the boxes
related to the geomorphology classes River Terraces (R.T) and Lagoon Plains (L.P)
are not aligned with the others. These geomorphological classes (R.T and L.P) are
associated with regions with higher values of TWI and SOCS. Furthermore, the ANOVA
indicated a significant difference among the geomorphological classes tested, which
justifies the insertion of this covariate in the EK model.
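The ANOVA check that complements the Boxplot inspection can be illustrated with a one-way F statistic. The sketch below is only an assumption about the form of the test (the chapter does not detail the ANOVA settings). The group values are invented, and the resulting F would still have to be compared with the F distribution in a statistics package to obtain a p-value.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for SOCS values grouped by class."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of class means around the grand mean (misaligned boxes -> large).
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of samples around their own class mean (spread inside each box).
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Invented SOCS values for three hypothetical classes; the third box sits far above.
class_a = [10.0, 12.0, 14.0]
class_b = [11.0, 13.0, 15.0]
class_c = [30.0, 32.0, 34.0]
f_stat, df_b, df_w = one_way_anova_f([class_a, class_b, class_c])
```

A large F means the class means differ by much more than the spread inside each class, the numerical counterpart of boxes that are "far from each other".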

Fig. 9 Boxplot of SOCS values as a function of Sand classes

The Boxplot of the geology variable is shown in Fig. 8. The SOCS values are
concentrated in the Barreiras (B), Alluvial Deposits (A.D), Fluviolagunar Plains
Deposits (F.P.D), Coastal Deposits (C.D) and Granite (G) classes, and the boxes are
relatively aligned. The ANOVA did not indicate a significant difference among the
geological classes; therefore, this covariate was not selected to compose model 3
(EK model).
For the sand classes (Fig. 9), the boxes seem relatively aligned. However, the
ANOVA indicated a significant difference between the tested classes, justifying
the insertion of this covariate in the EK model.
Figure 10 shows the distribution of SOCS values as a function of silt classes.
Similar to the case of sand, the boxes of silt classes seem to be relatively aligned
with a slightly higher box related to class 3 (between the mean and upper third).
Moreover, the ANOVA indicated a significant difference between the silt classes
tested, justifying its insertion into the EK model.
The Boxplot of SOCS values as a function of clay classes is presented in Fig. 11.
The visual pattern indicates a homogeneous distribution of SOCS data across the
defined categories. In this case, both the aligned boxes and the ANOVA indicate
that there was no significant difference between the tested clay classes. Therefore,
the clay map was excluded from the EK model.

Fig. 10 Boxplot of SOCS values as a function of Silt classes

Fig. 11 Boxplot of SOCS values as a function of Clay classes

Considering the analyzed covariates (continuous and categorical), the EK method
selected 12 covariates to apply to the RF algorithm, namely: DEM, S, TWI, CNBL,
TRI, GS, LONG, LAT, AAP, GEOM, SAND, and SILT.

Fig. 12 Variable Importance in each RF algorithm approach

3.3.3 Variable Importance in Each RF Algorithm Approach

Based on the analysis of the importance of the variables (Fig. 12), it is possible to
observe that some covariates, such as AAP and Longitude, appear among the top five
in all three models. The covariate LAT stands out in Models 1 and 3. On the other
hand, the covariate RSP stands out in Model 1 (without previous selection), and TWI
and GS stand out in Model 3 (EK) and Model 2 (RFE), respectively.
No categorical covariate made up the group of the 50% most important. In model
1 (All covariates), the texture classes Silt3 (165–250 g.kg−1 of silt) and Sand2
(220–372 g.kg−1 of sand), and geomorphological F.P. (Fluviomarine Plain) showed
importance between 12.5% and 20%.
In model 3 (EK), the classes that stood out in importance were: geomorphology
L.P. (Lagoon Plain, ≈46%), Silt3 (165–250 g.kg−1 of silt, ≈15%) and Sand2
(220–372 g.kg−1 of sand, ≈12.5%). It should be noted that the L.P. and Silt3 classes
are those that, in the Boxplot analysis, were most clearly misaligned (an indication
of correlation with SOCS) with respect to the other geomorphological and silt classes,
respectively.

3.3.4 The Evaluation of the Maps

Figure 13 shows the three SOC stock maps predicted using the three RF Models
(model 1—All Covariates; model 2—RFE; and model 3—EK). The metrics of
the accuracy evaluation of these maps, including the OK map, are found in
Table 9. The map generated using RF model 2 (RFE) presented the best metrics
for the spatialization of SOCS (R2 equal to 0.32, MAE = 10.71, RMSE of
16.59, MSE equal to 275.16, and Bias = 4.07). Among those from the group of
RF algorithms, the map with the worst accuracy was model 1 (All covariates),
presenting the metrics: R2 = 0.24, MAE = 10.96, RMSE of 17.58, MSE = 309.20,
and Bias = 3.37. The RF model 3 (EK) was considered intermediate with R2 =0.26,
MAE = 10.60, RMSE = 17.10, MSE = 292.56, and Bias = 3.37.
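For reference, the accuracy metrics reported in Table 9 can be computed from paired observed and predicted values as sketched below. The chapter does not give the formulas, so two conventions are assumed here and should be checked against the authors' workflow: R2 is taken as the squared Pearson correlation between observations and predictions, and Bias as the mean of (predicted minus observed). The sample values are invented.

```python
import math

def validation_metrics(observed, predicted):
    """R2, MAE, RMSE, MSE and Bias for paired observed/predicted values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    bias = sum(errors) / n  # assumed sign convention: predicted minus observed
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r2 = cov ** 2 / (var_o * var_p)  # assumed definition: squared correlation
    return {"R2": r2, "MAE": mae, "RMSE": math.sqrt(mse), "MSE": mse, "Bias": bias}

# Invented SOCS values (kg C.m-2) for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 41.0]
metrics = validation_metrics(observed, predicted)
```

Since RMSE is the square root of MSE, the two columns of an accuracy table can be cross-checked against each other, which is a quick way to spot transcription errors.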

Fig. 13 SOCS maps via Random Forest using the three approaches

When we compare these results with those obtained by the OK method, some metrics
were better for kriging and others worse. For example, OK was better considering
R2 (0.42) and MAE (10.28). However, the OK map presented the worst values of
RMSE (17.83), MSE (317.94), and Bias (7.05) (Table 9). Regarding the relative
improvement (RI %), the most significant gain of the RF models in relation to OK
was found for Bias (42 to 52%). On the other hand, regarding R2, the RF models
showed a loss (negative RI) of ~43, 24, and 38% for Models 1, 2, and 3,
respectively.
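The relative improvement values can be reproduced from the metrics in Table 9. The formula below is inferred from the tabulated numbers rather than stated in the chapter: for error metrics, RI is the reduction relative to the OK value, and for R2 it is the increase relative to the OK value. The function name is hypothetical.

```python
def relative_improvement(reference, model, higher_is_better=False):
    """Relative improvement (%) of a model metric over the reference (OK) metric."""
    if higher_is_better:
        return (model - reference) / reference * 100.0
    return (reference - model) / reference * 100.0

# Metrics of ordinary kriging and RF Model 2 (RFE) as listed in Table 9.
ok = {"R2": 0.42, "MAE": 10.28, "RMSE": 17.83, "MSE": 317.94, "Bias": 7.05}
rf_rfe = {"R2": 0.32, "MAE": 10.71, "RMSE": 16.59, "MSE": 275.16, "Bias": 4.07}

ri = {
    name: round(relative_improvement(ok[name], rf_rfe[name],
                                     higher_is_better=(name == "R2")), 2)
    for name in ok
}
```

Positive values indicate that the RF model improves on kriging for that metric; negative values indicate a loss.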
The spatial distribution of SOCS values is relatively similar among the three
RF models, but Model 2 (RFE) has more prominent areas of high and low
values (greater amplitude). This approach also produced artifacts (vertical lines),
probably associated with the covariate longitude (LONG), which presents a higher
importance in this predictive model (Fig. 12). The artifacts (vertical lines) were not
noted prominently in RF Model 1 and 3, although the longitude covariate also had
high importance (first and second place, respectively). The OK map, as presented in
Fig. 6a, tended to smooth the higher values of SOCS, especially in the southwestern
part of the study area.
When the criteria of plausibility (being valid in light of current knowledge and
scientific theories), interpretability (the translation of an abstract model or
model output into terms understandable by humans) and explainability (models that
both predict and explain) are also evaluated, it becomes clear that validation
metrics alone are insufficient to choose among maps generated by ML algorithms.
For example, the RF model 1 (All covariates) presents not only the worst
accuracy in relation to the other RF approaches but also fails in plausibility and
explainability. The question that arises is, “What is the reason for listing the
covariates CI, CNBL, RSP, MRRTF, and MRVBF with significant importance for
the model (Fig. 12) if their Pearson’s correlation is not significant with SOCS (Table
8)?” Even knowing that Pearson’s linear correlation may not be sufficient or unique
to explain associations between such diverse covariates and the target variable (in
this case, SOCS), the fact is that the absence of a covariate selection procedure
makes mapping a process that does not allow mapmakers to explain how the map is
made (lack of transparency in map development).

Table 9 Accuracy of prediction and interpolation models using external validation data
(RI = relative improvement with respect to ordinary kriging; MAE and RMSE in Kg C.m−2)
Approach                       R2    R2 RI (%)  MAE    MAE RI (%)  RMSE   RMSE RI (%)  MSE     MSE RI (%)  Bias  Bias RI (%)
OK (Gaussian)                  0.42  Reference  10.28  Reference   17.83  Reference    317.94  Reference   7.05  Reference
RF Model 1 (All covariates)    0.24  −42.86     10.96  −6.61       17.58  1.40         309.20  2.75        3.37  52.20
RF Model 2 (RFE)               0.32  −23.81     10.71  −4.18       16.59  6.95         275.16  13.46       4.07  42.27
RF Model 3 (EK)                0.26  −38.10     10.60  −3.11       17.10  4.09         292.56  7.98        3.37  52.20
The RF model 2 (using RFE as a covariate selection procedure) presented the
best metrics and the highest relative gain in relation to the map generated by OK
(Table 9). However, it failed in the interpretability and explainability criteria,
since the vertical lines systematically found in the map (Fig. 13) do not reflect the
reality of the distribution of SOCS values across the area. This behavior was reported
by Meyer et al. (2019). The authors show that highly autocorrelated predictors
(such as geolocation variables, e.g., latitude and longitude) can lead to considerable
overfitting and result in models that can reproduce the training data but fail in
making spatial predictions. The problem becomes apparent in the visual assessment
of the spatial predictions, which show clear artifacts that can be traced back to
a misinterpretation of the spatially autocorrelated predictors by the algorithm.
The authors also conclude that, in addition to spatial validation, a spatial variable
selection must be considered in spatial prediction models of ecological data to
produce reliable results.
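To make the idea of spatial validation concrete, the sketch below assigns samples to cross-validation folds by spatial block, so that neighboring (autocorrelated) samples are never split between training and testing. It is a generic illustration under assumed names and parameters (`block_size`, `n_folds`, invented coordinates), not the procedure of Meyer et al. (2019) nor the validation used in this chapter.

```python
def spatial_block_folds(coords, block_size, n_folds):
    """Assign sample indices to folds so that each spatial block stays in one fold."""
    blocks = {}
    for index, (x, y) in enumerate(coords):
        # Samples falling in the same grid cell belong to the same block.
        key = (int(x // block_size), int(y // block_size))
        blocks.setdefault(key, []).append(index)
    folds = [[] for _ in range(n_folds)]
    # Distribute whole blocks over folds in a fixed (sorted) order.
    for position, key in enumerate(sorted(blocks)):
        folds[position % n_folds].extend(blocks[key])
    return folds

# Invented coordinates in arbitrary map units.
coords = [(0.5, 0.5), (0.7, 0.2), (1.5, 0.5), (1.2, 0.9), (0.3, 1.6), (1.8, 1.7)]
folds = spatial_block_folds(coords, block_size=1.0, n_folds=2)
```

A model that relies on coordinates to memorize local clusters tends to score worse under this kind of split than under a random one, which helps reveal the overfitting described above.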
The RF model 3 (EK) presented better plausibility, interpretability, and
explainability, since it controls the supply of covariates to the RF algorithm
through an explicit method of association between covariates and SOCS. It did not
surpass the RF model 2 in terms of accuracy metrics but still achieved a good gain
relative to the map generated by OK.
Finally, comparing Model 2 (RFE) with Model 3 (EK), it is noted that the
advantages in metrics are relatively small. For example, model 2 (RFE) presents an
R2 18.8% higher than that of model 3 (EK) but loses practically the same magnitude
in terms of Bias (−17.2%). For RMSE and MSE, model 2 is better than model 3 by
3.1 and 6.3%, respectively, whereas for MAE model 3 is about 1.0% better (Table 9).
Analyzing the metrics jointly with the interpretability and explainability criteria,
the map generated by model 3 seems more consistent. Therefore, model 3 (EK)
is the best candidate for choosing a map generated using the RF approach. These
results corroborate those found by Ferreira et al. (2022, 2023), who developed
predictive models for texture and SOCS attributes in the Central Amazon region
(Fig. 13).

4 Conclusion

I. The development of this work allowed the recovery and availability of an
important collection of legacy soil data (PROJIR), which can be used both by the
academic community in several studies and as a SOCS baseline in a
strategic area of the Rio de Janeiro State (approximately 30% of the northern
region). The SOCS ranged from a minimum of 2.18 to a maximum of 104.47 kg
C.m−2, with most soils having SOCS of up to 30 kg C.m−2.
II. Among the evaluated approaches to map SOCS, considering only the MSE,
RMSE, and Bias metrics, the RF algorithm with model 2 (covariate selection
using Recursive Feature Elimination, RFE) stood out, followed by RF model
3 (EK) and RF model 1 (without selection of covariates). Although Ordinary
Kriging was the worst concerning some metrics, it was superior regarding R2
and MAE.
III. Finally, our results reinforce the importance of evaluating the results not only
based on traditional metrics (R2, MAE, MSE, RMSE, and Bias, for example):
criteria such as plausibility, interpretability, explainability, and visual
assessment of the spatial predictions can contribute to the mapping process and
to choosing a consistent map. In this context, model 3 (Expert) was the one that
best balanced the relative improvement in the metrics over the OK map and best
met the criteria of plausibility, interpretability, and explainability.

References

Bernoux, M., Carvalho, M. S. C., Volkoff, B., et al. (2002). Brazil’s soil carbon stocks. Soil Science
Society of America Journal, 66, 888–896.
Boehner, J., & Selige, T. (2006). Spatial prediction of soil attributes using terrain analysis and
climate regionalization. In J. Boehner, K. R. McCloy, & J. Strobl (Eds.), SAGA – Analysis and
modelling applications (pp. 13–28). Göttinger Geographische Abhandlungen.
Boehner, J., Koethe, R., Conrad, O., et al. (2002). Soil regionalisation by means of terrain analysis
and process parameterisation. In E. Micheli, F. Nachtergaele, & L. Montanarella (Eds.), Soil
classification (European soil bureau, research report 7, Luxembourg) (pp. 213–222).
Burrough, P. A., McDonnell, R. A., & Burrough, P. A. (1998). Principles of geographical
information systems (2nd ed.). Oxford University Press.
Conrad, O., Bechtel, B., Bock, M., et al. (2018). System for automated geoscientific analyses
(SAGA) v. 2.1.4. Geoscientific Model Development, 8, 1991–2007.
CPRM – Serviço Geológico do Brasil. (2001). Mapa geológico do Brasil. Belo Horizonte: CPRM.
Escala 1:5.000.000. Programa Levantamentos Geológicos Básicos do Brasil.
Ferreira, A. C. S., Ceddia, M. B., Costa, E. M., et al. (2022). Use of airborne radar images and
machine learning algorithms to map soil clay, silt, and sand contents in remote areas under the
amazon rainforest. Remote Sensing, 14(5711), 3814.
Ferreira, A. C. S., Pinheiro, É. F. M., Costa, E. M., et al. (2023). Predicting soil carbon stock
in remote areas of the central amazon region using machine learning techniques. Geoderma
Regional, 32(614), e00614.
Gallant, J. C., & Dowling, T. I. (2003). A multiresolution index of valley bottom flatness for
mapping depositional areas. Water Resources Research, 39, 1347.
Global Climate Data. (2008). Global climate change: Evidence. NASA global climate change and
global warming: Vital signs of the planet. Global Climate Data.
Gomes, A. S., Ferreira, A. C. S., Pinheiro, É. F. M., et al. (2017). The use of Pedotransfer functions
and the estimation of carbon stock in the central amazon region. Scientia Agricola, 74, 450–460.
Grimaldi, S., Nardi, F., Benedetto, F. D., et al. (2007). A physically-based method for removing
pits in digital elevation models. Advances in Water Resources, 30(10), 2151–2158.
Hutchinson, M. F., & Gallant, J. C. (2000). Digital elevation models and representation of terrain
shape. In J. P. Wilson & J. C. Gallant (Eds.), Terrain analysis: Principles and applications.
Wiley.
IBGE – Instituto Brasileiro de Geografia e Estatística. (2019). Mapa de compartimentos de relevo
do Brasil (1:250.000). Diretoria de Geociências, Coordenação de Recursos Naturais e Estudos
Ambientais.
Kuhn, I., & Dormann, C. F. (2012). Less than eight (and a half) misconceptions of spatial analysis.
Journal of Biogeography, 39, 995–998.
Kuhn, M., & Johnson, K. (2013). Applied predictive modeling (600p). Springer Science+Business
Media New York.
Lamichhane, S., Kumar, L., & Wilson, B. (2019). Digital soil mapping algorithms and covariates
for soil organic carbon mapping and their implications: A review. Geoderma, 352, 395–413.
Mallet, J. L. (2002). Geomodeling. New York.
Meyer, H., Reudenbach, C., Wöllauer, S., et al. (2019). Importance of spatial predictor variable
selection in machine learning applications—Moving from data reproduction to spatial
prediction. Ecological Modeling, 411(108815), 108815.
Smith, P., Soussana, J., Angers, D., et al. (2020). How to measure, report and verify soil carbon
change to realize the potential of soil carbon sequestration for atmospheric greenhouse gas
removal. Global Change Biology, 26, 219–241.
Thompson, J. A., Bell, J. C., & Butler, C. A. (2001). Digital elevation model resolution: effects
on terrain attribute calculation and quantitative soil-landscape modeling. Geoderma, 100(1),
67–89.
Wackernagel, H. (2003). Ordinary Kriging. In H. Wackernagel (Ed.), Multivariate Geostatistics:
An introduction with applications. Springer.
Wadoux, A. M. J.-C., Minasny, B., & McBratney, A. B. (2020). Machine learning for digital soil
mapping: Applications, challenges and suggested solutions. Earth-Science Reviews, 210.
Webster, R., & Oliver, M. A. (2007). Geostatistics for environmental scientists (Statistics in
practice) (2nd ed.). Wiley.
Yamamoto, J. K., & Landim, P. M. B. (2015). Geoestatística: conceitos e aplicações. Oficina de
Textos.
Aerogeophysical Data to Modeling Soil
Properties: A Study Case in Bom
Jardim—RJ

Blenda Pereira Bastos, Helena Saraiva Koenow Pinheiro,
Waldir de Carvalho Junior, and Lúcia Helena Cunha dos Anjos

1 Introduction

Soils are products of the interaction of pedogenetic processes (Simonson, 1959) and
soil forming factors (Jenny, 1994), which results in a wide variability of physical
and chemical properties. Due to this variability and the high cost/time required
for conventional soil surveys, the application of digital soil mapping techniques
becomes an alternative to optimize the collection and spatialization of data.
The use of geophysical data, mainly gamma-ray spectrometry, has become
increasingly frequent in digital soil mapping with various applications (Reinhardt
& Herrmann, 2019). This method measures the concentrations of K (potassium),
U (uranium), and Th (thorium) from their radioactive decay series (⁴⁰K, ²³⁸U and
²³²Th). After calibration and correction, these data can be associated with the
parent material (mainly acid rocks such as granites and gneisses with
high silica), relief denudation processes, and the relative rates of soil formation and
erosion (Wilford et al., 1997).
Magnetometric data, on the other hand, are less frequently used in DSM but have
shown potential for lateritic regolith studies (Iza et al., 2018), for ultramafic
rocks and associated soils (McCafferty & Van Gosen, 2009), and, with magnetic
susceptibility as a covariate in machine learning algorithms, for understanding
soil attributes (de Mello et al., 2021a). This method measures spatial
variations in fields associated with the magnetism of the local rocks, controlled by a
physical property called magnetic susceptibility. In general, the magnetic response
is associated with iron-containing minerals, which are more common in basic rocks
(such as basalt) (Dentith & Mudge, 2014). Magnetometry may therefore be useful
because it contrasts with gamma-ray data in the type of parent material that each
method highlights.

B. P. Bastos (✉) · H. S. K. Pinheiro · L. H. C. dos Anjos
Federal Rural University of Rio de Janeiro (UFRRJ), Seropédica, RJ, Brazil
W. de Carvalho Junior
Embrapa Soils, Rio de Janeiro, Brazil

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
W. de Carvalho Junior et al. (eds.), Pedometrics in Brazil, Progress in Soil Science,
https://doi.org/10.1007/978-3-031-64579-2_11
Therefore, the research goal is to predict soil properties using aerogeophysical
data (AGD), terrain covariates derived from the Digital Elevation Model (DEM),
and Sentinel-2 images through the Random Forest (RF) model, and to evaluate the
importance of those covariates for modeling soil attributes and landscape dynamics
in Bom Jardim County.

2 Site Description

According to the Brazilian Territorial Division from IBGE (2021), Bom Jardim
County is in the Centro Fluminense mesoregion of the State of Rio de Janeiro
(Fig. 1). The county has an area of 385.04 km² and, according to Köppen’s criteria,
its climate is classified as Cwa: subtropical, with dry winter (temperatures
below 18 °C) and hot summer (temperatures above 22 °C).
According to Calderano Filho et al. (2012), the region has soil associations that
include Cambisols, Ferralsols, Acrisols and Fluvisols (Fig. 2a). Geologically (Fig.
2b), it is inserted in the Oriental Terrane of the Ribeira Belt, which includes
pre-collisional to late-collisional deformed plutonic rocks (Rio Negro and Trajano
de Moraes Complexes, Cordeiro and Serra dos Órgãos Suites), paragneisses and
metasediments of high metamorphic grade (São Fidélis Group), and post-collisional
non-deformed granitic bodies of the Nova Friburgo Suite (Conselheiro Paulino, Sana
and São José do Ribeirão granites) (Tupinanbá et al., 2013).

Fig. 1 Location of the study area and soil samples. On the lower left, the Digital Elevation Model
(DEM) derived from the Rio de Janeiro cartographic database (original scale 1:25.000) (IBGE, 2018)

Fig. 2 (a) Soil map adapted from Calderano Filho et al. (2012) (original scale 1:100.000), color
standardization according to Sistema Brasileiro de Classificação de Solos, SiBCS (Santos et al.,
2018); (b) Geological map (original scale 1:400.000) adapted from CPRM (2016)
The soil dataset consists of 208 soil samples collected between 2009 and 2011,
which includes 74 soil profiles, 44 complementary soil profiles, and 90 surface
horizon-only samples. The procedures used to describe and analyze these samples
are detailed by Calderano Filho et al. (2012). In this study, the attributes chosen
were Base Sum (BS), Base Saturation (BSat), Aluminum content (Al), and Aluminum
Saturation (AlSat). Basic statistics of the soil attributes chosen for spatial
prediction are presented in Table 1.

Table 1 Statistics of soil attributes used in the prediction
                     Base sum     Base saturation   Al3+         Aluminum saturation
                     (cmolc/kg)   (%)               (cmolc/kg)   (%)
Minimum              0.20         2.00              0            0
Maximum              12.30        100.00            2.50         89.00
Median               2.38         26.00             0.60         20.25
Mean                 3.01         30.25             0.72         30.21
Standard deviation   2.48         20.97             0.63         28.44

3 Methodology

The DEM was generated from the cartographic base of the State of Rio de Janeiro
(IBGE, 2018), with a spatial resolution of 20 m, using the Topo to Raster tool in
ArcGIS Desktop software (v. 10.6). Interpolation errors (spurious depressions, or
sinks) were filled with the Hydrology toolbox to improve the DEM. From the DEM,
the terrain covariates were generated through the Morphometry and Hydrology modules
of the Terrain Analysis tool in the SAGA-GIS software (saga-gis.org).
The Sentinel-2 mission consists of two satellites (Sentinel-2A and 2B), launched in
June 2015 and March 2017, carrying the Multispectral Instrument (MSI) sensor. The
images are provided by ESA via the Copernicus Open Access Hub website
(https://scihub.copernicus.eu/) (Drusch et al., 2012). In this study, we used
cloud-free images collected on 07/12/2021. The treatment consisted of processing
the images from Level 1C to Level 2A (top-of-atmosphere to bottom-of-atmosphere
reflectance, TOA to BOA) using the Sen2Cor processor (Mueller-Wilm et al., 2017).
The indices were generated by combining spectral bands with a resolution of 20 m
in QGIS software (v. 3.24.1).
The AGD were processed in Oasis Montaj software (v. 9.8) adopting 100 m of
spatial resolution. For the gamma-ray data, the interpolation was performed with the
minimum curvature method (Briggs, 1974) to generate the Total Count (TC),
K (potassium), eU (uranium equivalent), and eTh (thorium equivalent) grids. Then,
by combining these channels, further covariates were generated: ratios between
the elements, Factor F, anomalous potassium (Kd), and anomalous uranium (Ud).
The magnetic data were gridded with the bidirectional interpolation method
(Geosoft, 2010) to generate the Anomalous Magnetic Field (AMF), which reflects
the magnetic susceptibility of the rocks in the region. From the AMF, the following
covariates were generated: Vertical derivative (GZ) and Horizontal derivatives (GX
and GY), Analytic Signal Amplitude (ASA), and the Mafic Index
(MI = ASA/(K × eU × eTh)), which combines both data types. Furthermore, all
aerogeophysical products were resampled to 20 m in the Oasis Montaj software using
the ‘regrid’ tool. Table 2 lists all the covariates used with their respective
references.
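As an illustration of how the derived gamma-ray covariates combine the channels, the sketch below computes the element ratios and the Mafic Index for a single grid cell. It assumes that the index is ASA divided by the product of K, eU and eTh, which is one reading of the expression given in the text. The function names and cell values are hypothetical, and Factor F, Kd and Ud are not reproduced because their formulas are not given in the chapter.

```python
import math

def gamma_ray_ratios(k, eu, eth):
    """Element ratios for one grid cell; NaN where a denominator is zero."""
    def ratio(numerator, denominator):
        return numerator / denominator if denominator != 0 else math.nan
    return {
        "eTh/K": ratio(eth, k),
        "eU/K": ratio(eu, k),
        "eU/eTh": ratio(eu, eth),
    }

def mafic_index(asa, k, eu, eth):
    """Assumed form of the Mafic Index: ASA / (K * eU * eTh)."""
    denominator = k * eu * eth
    return asa / denominator if denominator != 0 else math.nan

# Invented cell values for illustration only.
cell = {"K": 2.0, "eU": 3.0, "eTh": 12.0, "ASA": 0.36}
ratios = gamma_ray_ratios(cell["K"], cell["eU"], cell["eTh"])
mi = mafic_index(cell["ASA"], cell["K"], cell["eU"], cell["eTh"])
```

In practice the same arithmetic would be applied cell by cell to co-registered grids, after all products share the same resolution.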
The methodology developed for modeling was divided into five steps:
(1) Interpolation of the soil properties to a fixed depth interval (0–30 cm) with the
spline function from the aqp package (Beaudette et al., 2013);
(2) Elimination of covariates that do not significantly contribute to the construction
of predictive models;
(3) Training of the predictive models with the set of covariates defined in step 2;
(4) Validation with an unknown set of samples and analysis of the efficiency of the
models; and finally,
(5) Analysis of the final maps and discussion of the spatial variability of the
predicted attributes.

Table 2 Covariates used for prediction and their respective references
Covariates                                       References
DEM                                              –
Convergence index                                Köthe and Lehmeier (1996)
Gradient                                         Hjerdt et al. (2004)
Morphometric features                            Wood (1996) and Wood (2009)
Topographic Position Index (TPI)                 Guisan et al. (1999), Weiss (2000) and Wilson and Gallant (2000)
Relative Heights and Slope Positions             Böhner and Selige (2006)
MRVBF and MRRTF                                  Gallant and Dowling (2003)
Terrain Ruggedness Index (TRI)                   Riley et al. (1999)
Vector Ruggedness Measure (VRM)                  Sappington et al. (2007)
Topographic wetness index                        Böhner and Selige (2006)
TC (μR/h)                                        CPRM (2012)
K, eU and eTh
eTh/K, eU/K, eU/eTh
Factor F                                         Gnojek and Prichystal (1985)
Kd and Ud                                        Pires (1995)
AMF                                              CPRM (2012)
ASA                                              Li (2006)
GZ, GX and GY                                    Blum (1999)
MI                                               Pires and Moraes (2006) cited by Iza et al. (2018)
Grain Size Index (GSI)                           Perera et al. (2005); Xiao et al. (2006)
Normalized Difference Vegetation Index (NDVI)    Rouse et al. (1974)
Alteration                                       Van der Meer et al. (2014)
Ferric iron                                      Van der Meer et al. (2014)

To eliminate covariates that do not contribute significantly to the construction of
the predictive models, and based on works such as Ferreira et al. (2021), Gomes et
al. (2019), and de Mello et al. (2021a), three techniques were used: nearZeroVar,
findCorrelation and rfeControl. All are available in the caret package (Kuhn, 2022)
for R software (available on r-project.org). These procedures aim to simplify the
predictive set and optimize the computational time required to build the models
(Kuhn & Johnson, 2013).
The first procedure used the nearZeroVar function to eliminate variables with
variance equal or close to zero. The next step applied the findCorrelation function,
which analyzes the correlation between all available pairs of predictor covariates.
For a given threshold of pairwise correlation, the predictor that presented the
highest correlation with the other predictors was removed (Kuhn & Johnson, 2013).
In this study, the threshold was a Spearman’s correlation of 0.95.
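The logic of the pairwise-correlation filter can be sketched in a few lines. The code below is a simplified illustration of the idea described above, written in Python rather than R, and is not caret's findCorrelation implementation: while any pair of predictors exceeds the cutoff, it drops the member of the most correlated pair that has the larger mean absolute Spearman correlation with the remaining predictors. Function names and data are hypothetical, and constant covariates are assumed to have been removed beforehand.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho as Pearson's correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def drop_highly_correlated(data, cutoff=0.95):
    """Greedy filter over a dict {covariate name: list of values}."""
    names = list(data)
    corr = {(a, b): abs(spearman(data[a], data[b]))
            for a in names for b in names if a != b}
    kept, removed = list(names), []

    def mean_corr(name):
        others = [m for m in kept if m != name]
        return sum(corr[(name, m)] for m in others) / len(others)

    while len(kept) > 1:
        worst, a, b = max((corr[(a, b)], a, b)
                          for i, a in enumerate(kept) for b in kept[i + 1:])
        if worst <= cutoff:
            break
        drop = a if mean_corr(a) >= mean_corr(b) else b
        kept.remove(drop)
        removed.append(drop)
    return kept, removed

# Invented values: "Slope" is a monotonic function of "TRI", "NDVI" is not.
data = {
    "TRI": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "Slope": [2.0, 4.0, 5.0, 8.0, 10.0, 13.0],
    "NDVI": [0.3, 0.1, 0.5, 0.2, 0.6, 0.4],
}
kept, removed = drop_highly_correlated(data, cutoff=0.95)
```

Because Spearman's correlation works on ranks, any monotonic relationship between two covariates, linear or not, is treated as redundancy.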
Before performing the Recursive Feature Elimination (RFE), configured through the
rfeControl function, the samples were divided into 75% for training and 25% for
validation. The RFE was performed with the training data, including the total set of
covariates resulting from the Spearman’s correlation step. Then, 32 covariate subset
sizes were tested (1–32 predictors) using the model-based selection function rfFuncs
(compatible with the RF model), the leave-one-out cross-validation method (LOOCV),
and R-squared as the accuracy metric. In the end, the optimal set of covariates for
each soil property was obtained, eliminating the less important covariates for the
predictive model.
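The data partitioning behind this step can be sketched as follows. The example assumes a simple random split with a fixed seed and that all 208 samples of the dataset enter the split. The chapter does not say whether the split was random or stratified, so these choices, like the function names, are illustrative only.

```python
import random

def train_validation_split(n_samples, train_fraction=0.75, seed=42):
    """Randomly split sample indices into training and validation sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return sorted(indices[:n_train]), sorted(indices[n_train:])

def leave_one_out(train_indices):
    """Yield (fit_indices, held_out_index) pairs, one per training sample."""
    for held_out in train_indices:
        yield [i for i in train_indices if i != held_out], held_out

train, validation = train_validation_split(208)
loocv_folds = list(leave_one_out(train))
```

Within RFE, each candidate subset size would be scored by fitting the model on every `fit_indices` set and predicting the single held-out sample. The validation indices stay untouched until the final assessment.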
Finally, the hyperparameters of the RF algorithm were automatically optimized
by the caret package for each response variable. The models were assessed with the
validation dataset (25% of the samples), using R-squared, root mean square error
(RMSE), and mean absolute error (MAE) as accuracy indices.

4 Results and Discussion

No environmental covariate was excluded by the nearZeroVar function, and six were
excluded by the Spearman’s correlation filter (MRVBF, TRI, eTh, eU/eTh, Kd and GZ),
resulting in a total of 32 covariates. From the RFE, the optimal numbers of
covariates were: 26 for BS, 16 for BSat, 8 for Al content and 8 for AlSat.
Figure 3 shows the scatter plots of observed and estimated values for each
attribute and the respective R2, RMSE, and MAE values of the validation stage.
All soil attribute predictions exhibited an R2 greater than 0.2, which is
considered adequate for machine learning algorithms (de Mello et al., 2021a). Such
values are common in digital soil mapping because of the limitations of small
datasets and the complex variability of soil properties. The statistical parameters
are sensitive to the number and the locations of the soil observations (Lagacherie
et al., 2019, 2020), which can explain the performance, since the study area has a
limited dataset and a concentration of samples in the SE of the map (Fig. 1).
The importance rankings generated by the models are represented in Fig. 4 for
BS and BSat and Fig. 5 for Al content and AlSat.
The eTh/K ratio stood out as the most important covariate to predict BS, BSat and
AlSat (Figs. 4 and 5). For Al content prediction, eU/K showed 54.82% of importance,
while eTh/K showed only 2.65%. The eU/K ratio contributed 65% for BS, 32% for
BSat, and 8.54% for AlSat. Table 3 shows that the correlation of eTh/K is negative
with BS and BSat and positive with Al content and AlSat, and that the eU/K ratio
shows similar behavior.

Fig. 3 Scatter plots of observed and estimated values pertinent to the predictions. (a) Base Sum,
(b) Base Saturation, (c) Al and (d) Aluminum Saturation

Fig. 4 Covariates importance ranking from RF method: (a) Base Sum and (b) Base Saturation.
For Base sum only 20 most important variables shown

High values of eTh/K and eU/K indicate eU and eTh accumulation in relation
to K content. Knowing that K is more mobile during weathering, especially in
tropical and subtropical climates and the relatively high eTh and eU are associated
with resistant minerals (Ulbrich et al., 2009), we can assume a relationship between
these covariates and the stage of soil evolution. Less weathered soils have relatively
higher values of BS and BSat (low values of these ratios) while more evolved soils
are depleted in bases and more acidic, and consequently have higher values of Al
content and AlSat (high values of these ratios).
de Mello et al. (2021b) also demonstrated this behavior in soils through gamma-
ray data, where K has high mobility and is easily leached in environments with
highly weathered and acid soils. The authors also pointed out that eU and eTh have
greater stability and are commonly associated with secondary Fe-oxides, Al-oxides,
Ti-oxides and clays.

Fig. 5 Covariates importance ranking from RF method: (a) Al and (b) Aluminum Saturation

Table 3 Spearman correlation values between selected covariates and soil attributes
Covariate        BS      BSat    Al      AlSat
DEM              −0.05   −0.19   0.20    0.16
Convergence      −0.03   −0.03   0.03    0.04
Gradient         −0.17   −0.18   0.16    0.17
PlanCurvature    0.25    0.23    −0.21   −0.24
MaximumCurv      −0.27   −0.24   0.24    0.26
MRRTF            0.16    0.20    −0.16   −0.16
ValleyDepth      0.32    0.32    −0.27   −0.27
NormalizedH      −0.22   −0.24   0.22    0.24
StandardizedH    −0.21   −0.27   0.24    0.25
MidSlope         0.21    0.22    −0.19   −0.20
TopographicWI    0.12    0.18    −0.16   −0.15
CatchmentA       0.20    0.19    −0.15   −0.19
CatchmentS       0.15    0.11    −0.09   −0.14
TPI              −0.07   −0.04   0.07    0.08
TC               −0.14   −0.19   0.08    0.10
K                0.17    0.09    −0.16   −0.18
eTh/K            −0.37   −0.36   0.29    0.33
eU/K             −0.26   −0.20   0.23    0.26
FactorF          0       −0.07   −0.03   −0.03
Ud               0.19    0.22    −0.10   −0.14
MI               0.08    0.08    −0.03   −0.01
AMF              0.11    0.07    −0.08   −0.09
GX               −0.22   −0.18   0.12    0.18
FerricIron       −0.01   0.01    −0.01   0.01
Alteration       −0.08   −0.10   0.07    0.07
GSI              −0.19   −0.18   0.14    0.20
NDVI             −0.02   −0.04   0.04    0.03

High importance was also observed for other AGD covariates: Ud for BS
(58.42%) and for BSat (34.21%), AMF for BS (33.82%), MI for BS (25.98%), GX
for BSat (40.88%), K for BS (41.80%) and BSat (30.10%), and TC for BS (43.39%).
The BS and BSat maps show similar distribution as well as Al content and AlSat
maps, as can be observed in Fig. 6. The contrast between the maps demonstrates the
Aerogeophysical Data to Modeling Soil Properties: A Study Case in Bom Jardim—RJ 167

Fig. 6 Final maps of the predictive models. (a) Base Sum (cmolc/kg), (b) Base Saturation (%),
(c) Al (cmolc/kg) and (d) Aluminum Saturation (%)

expected behavior: areas with high BS and BSat values present low Al content and
AlSat values.
The BS and BSat maps show high values in the NW and SE of the mapped area,
where Cambisols and Ferralsols (CX and LVA in SiBCS) prevail, mainly in areas
where the Rio Negro Complex, Serra dos Órgãos Suite and Cordeiro Suite occur.
The opposite holds for Al and AlSat, whose high values are concentrated in the
center of the mapped area. In this region, associations of Acrisols and Ferralsols
(PV and LA in SiBCS) and units of the São Fidélis Group predominate. The São
Fidélis Group is the oldest lithology, followed by the Rio Negro Complex, which
may have influenced the high Al values predicted for this region.

5 Conclusions

The geophysical covariates were important for predicting the selected soil
properties: Base Sum, Base Saturation, Aluminum content, and Aluminum
Saturation. Simplifying and optimizing the input dataset by eliminating the less
significant covariates proved effective, yielding satisfactory results and easily
interpreted models. AGD can be used in predictive modeling of soil properties to
support understanding of the origin of soil spatial variability.

References

Beaudette, D. E., Roudier, P., & O’Geen, A. T. (2013). Algorithms for quantitative pedology:
A toolkit for soil scientists. Computers & Geosciences, 52, 258–268.
https://doi.org/10.1016/j.cageo.2012.10.020
Blum, M. L. B. (1999). Processamento e interpretação de dados de geofísica aérea no Brasil
Central e sua aplicação a Geologia Regional e a Prospecção Mineral (Doctoral thesis in
Geology). Brasília University.
Böhner, J., & Selige, T. (2006). Spatial prediction of soil attributes using terrain analysis and
climate regionalisation. In J. Böhner, K. R. McCloy, & J. Strobl (Eds.), SAGA - Analyses and
Modelling Applications. Göttinger Geogr. Abh., 115.
Briggs, I. C. (1974). Machine contouring using minimum curvature. Geophysics, 39, 39–48.
https://doi.org/10.1190/1.1440410
Calderano Filho, B., Polivanov, H., Chagas, C. D. S., Carvalho Junior, W. D., Calderano, S., Guerra,
A., Donagemma, G. K., Bhering, S. B., & Aglio, M. (2012). Solos do médio alto curso do Rio
Grande, região serrana do Estado do Rio de Janeiro. Embrapa.
CPRM Serviço Geológico do Brasil. (2012). Relatório final Projeto Aerogeofísico Rio de Janeiro
(Projeto 1.117).
CPRM Serviço Geológico do Brasil. (2016). Mapa Geológico e de recursos minerais do Estado
do Rio de Janeiro. Belo Horizonte. Escala 1:400.000. Programa geologia do Brasil. Mapas
geológicos estaduais.
de Mello, D. C., Veloso, G. V., de Lana, M. G., de Mello, F. A. O., Poppiel, R. R., Cabrero,
D. R. O., Di Raimo, L. A. D. L., Schaefer, C. E. G. R., Filho, E. I. F., Leite, E. P., & Demattê,
J. A. M. (2021a). A new methodological framework by geophysical sensors combinations
associated with machine learning algorithms to understand soil attributes. Earth and Space
Science Informatics.
de Mello, D. C., Alexandre Melo Demattê, J., de Oliveira, A., Mello, F., Roberto Poppiel, R.,
ElizabetQuiñonez Silvero, N., Lucas Safanelli, J., Barros e Souza, A., Augusto Di Loreto Di
Raimo, L., Rizzo, R., Eduarda Bispo Resende, M., & Ernesto Gonçalves Reynaud Schaefer, C.
(2021b). Applied gamma-ray spectrometry for evaluating tropical soil processes and attributes.
Geoderma, 381, 114736. https://doi.org/10.1016/j.geoderma.2020.114736
Dentith, M. C., & Mudge, S. T. (2014). Geophysics for the mineral exploration geoscientist.
Cambridge University Press.
Drusch, M., Del Bello, U., Carlier, S., Colin, O., Fernandez, V., Gascon, F., Hoersch, B., Isola,
C., Laberinti, P., Martimort, P., Meygret, A., Spoto, F., Sy, O., Marchese, F., & Bargellini,
P. (2012). Sentinel-2: ESA’s optical high-resolution Mission for GMES operational services.
Remote Sensing of Environment, 120, 25–36. https://doi.org/10.1016/j.rse.2011.11.026
Ferreira, R. G., da Silva, D. D., Elesbon, A. A. A., Fernandes-Filho, E. I., Veloso, G. V., de
Fraga, M. S., & Ferreira, L. B. (2021). Machine learning models for streamflow regionalization
in a tropical watershed. Journal of Environmental Management, 280, 111713.
https://doi.org/10.1016/j.jenvman.2020.111713
Gallant, J. C., & Dowling, T. I. (2003). A multiresolution index of valley bottom flatness for
mapping depositional areas. Multiresolution Valley Bottom Flatness.
https://doi.org/10.1029/2002WR001426
GEOSOFT INC. (2010). Filtragem montaj MAGMAP. Processamento de dados de campos
potenciais no domínio da frequência. Extensão para o Oasis Montaj, v. 7.1. Tutorial e guia
do usuário, Toronto, ON Canadá.
Gnojek, I., & Prichystal, A. (1985). A new zinc mineralization detected by airborne gamma-ray
spectrometry in Northern Moravia (Czechoslovakia). Geoexploration, 23(4), 491–502.
Gomes, L. C., Faria, R. M., de Souza, E., Veloso, G. V., Schaefer, C. E. G. R., & Filho, E. I. F.
(2019). Modelling and mapping soil organic carbon stocks in Brazil. Geoderma, 340, 337–350.
https://doi.org/10.1016/j.geoderma.2019.01.007
Guisan, A., Weiss, S. B., & Weiss, A. D. (1999). GLM versus CCA spatial modeling of plant
species distribution. Plant Ecology, 143, 107–122.
Hjerdt, K. N., McDonnell, J. J., Seibert, J., & Rodhe, A. (2004). A new topographic index to
quantify downslope controls on local drainage. Water Resources Research, 40(5).
https://doi.org/10.1029/2004WR003130
IBGE Instituto Brasileiro de Geografia e Estatística. (2018). Base cartográfica vetorial contínua
do Estado do Rio de Janeiro na escala 1:25.000. Diretoria de Geociências. Departamento de
Cartografia. Projeto RJ-25.
IBGE Instituto Brasileiro de Geografia e Estatística. (2021). Divisão Territorial Brasileira.
Available at: https://www.ibge.gov.br
Iza, E. R. H. F., Horbe, A. M. C., Castro, C. C., & Herrera, I. L. I. E. (2018). Integration of
geochemical and geophysical data to characterize and map lateritic regolith: An example in
the Brazilian Amazon. Geochemistry, Geophysics, Geosystems, 19, 3254–3271.
https://doi.org/10.1029/2017GC007352
Jenny, H. (1994). Factors of soil formation: A system of quantitative pedology. Dover.
Köthe, R., & Lehmeier, F. (1996). SARA-system zur automatischen relief-analyse. User Manual.
Kuhn, M. (2022). Package caret: Classification and regression training (R package version
6.0-92). Available at: https://CRAN.R-project.org/package=caret
Kuhn, M., & Johnson, K. (2013). Applied predictive modeling. Springer.
Lagacherie, P., Arrouays, D., Bourennane, H., Gomez, C., Martin, M., & Saby, N. P. A. (2019).
How far can the uncertainty on a Digital Soil Map be known?: A numerical experiment using
pseudo values of clay content obtained from Vis-SWIR hyperspectral imagery. Geoderma, 337,
1320–1328. https://doi.org/10.1016/j.geoderma.2018.08.024
Lagacherie, P., Arrouays, D., Bourennane, H., Gomez, C., & Nkuba-Kasanda, L. (2020). Analysing
the impact of soil spatial sampling on the performances of Digital Soil Mapping models and
their evaluation: A numerical experiment on Quantile Random Forest using clay contents
obtained from Vis-NIR-SWIR hyperspectral imagery. Geoderma, 375, 114503.
https://doi.org/10.1016/j.geoderma.2020.114503
Li, X. (2006). Understanding 3D analytic signal amplitude. Geophysics, 71, L13–L16.
https://doi.org/10.1190/1.2184367
McCafferty, A. E., & Van Gosen, B. S. (2009). Airborne gamma-ray and magnetic anomaly
signatures of serpentinite in relation to soil geochemistry, northern California. Applied
Geochemistry, 24, 1524–1537. https://doi.org/10.1016/j.apgeochem.2009.04.007
Mueller-Wilm, U., Devignot, O., & Pessiot, L. (2017). S2 MPC Sen2Cor configuration and user
manual. European Space Agency.
Perera, Y. Y., Zapata, C. E., Houston, W. N., & Houston, S. L. (2005). Prediction of the soil-
water characteristic curve based on grain-size-distribution and index properties. In Advances in
pavement engineering (pp. 1–12). American Society of Civil Engineers.
Pires, A. C. B. (1995). Identificação geofísica de áreas de alteração hidrotermal, Crixás-Guarinos,
Goiás. Revista Brasileira de Geociências, São Paulo, 25, 61–68.
https://doi.org/10.25249/0375-7536.19956168
Reinhardt, N., & Herrmann, L. (2019). Gamma-ray spectrometry as versatile tool in soil science: A
critical review. Journal of Plant Nutrition and Soil Science, 182, 9–27.
https://doi.org/10.1002/jpln.201700447
Riley, S. J., De Gloria, S. D., & Elliot, R. (1999). A terrain ruggedness that quantifies topographic
heterogeneity. Intermountain Journal of Science, 5, 23–27.
Rouse, J. W., Haas, R. H., Schell, J. A., & Deering, D. W. (1974). Monitoring vegetation Systems
in the Great Plains with ERTS. In Third ERTS symposium (Vol. 351, p. 309). NASA Special
Publication, NASA.
Santos, H. G., Jacomine, P. K. T., Anjos, L. H. C., de Oliveira, V. A., Lumbreras, J. F., Coelho,
M. R., de Almeida, J. A., de Araujo Filho, J. C., de Oliveira, J. B., & Cunha, T. J. F. (2018).
Sistema brasileiro de classificação de solos. Embrapa.
Sappington, J. M., Longshore, K. M., & Thompson, D. B. (2007). Quantifying landscape
ruggedness for animal habitat analysis: A case study using Bighorn sheep in the Mojave Desert.
Journal of Wildlife Management, 71, 1419–1426. https://doi.org/10.2193/2005-723
Simonson, R. W. (1959). Outline of generalized theory of soil genesis. Soil Science Society
Proceedings, 23, 152–156.
Tupinanbá, M., Texeira, W., & Heilbron, M. (2013). Evolução Tectônica e Magmática da
Faixa Ribeira entre o Neoproterozoico e o Paleozoico Inferior na Região Serrana do Estado
do Rio de Janeiro, Brasil. Anuário IGEO-UFRJ, 35(2), 140–151.
https://doi.org/10.11137/2012_2_140_151
Ulbrich, H. H. G. J., Ulbrich, M. N. C., Ferreira, F. J. F., Alves, L. S., Guimarães, G. B.,
& Fruchting, A. (2009). Levantamentos gamaespectrométricos em granitos diferenciados. I:
revisão da metodologia e do comportamento geoquímico dos elementos K, Th e U. Geologia
USP. Série Científica, 9(1), 33–53.
Van der Meer, F. D., van der Werff, H. M. A., & van Ruitenbeek, F. J. A. (2014). Potential of ESA’s
Sentinel-2 for geological applications. Remote Sensing of Environment, 148, 124–133.
https://doi.org/10.1016/j.rse.2014.03.022
Weiss, A. D. (2000). Topographic position and landforms analysis. 2006. In Poster presented at
the ESRI User Conference, San Diego, July 9th (Vol. 76).
Wilford, R., Bierwirth, P. N., & Craig, M. A. (1997). Application of airborne gamma-ray
spectrometry in soil/regolith mapping and applied geomorphology. AGSO Journal of Australian
Geology and Geophysics, 17, 201–216.
Wilson, J. P., & Gallant, J. C. (2000). Primary topographic attributes. In J. P. Wilson & J. C. Gallant
(Eds.), Terrain analysis: Principles and applications (pp. 51–85). Wiley.
Wood, J. (1996). The geomorphological characterisation of digital elevation models (Dissertation
Department of Geography). University of Leicester.
Wood, J. (2009). Geomorphometry in LandSerf. In T. Hengl & H. I. Reuter (Eds.), Geomorphom-
etry: Concepts, software, applications. Developments in soil science (Vol. 33, pp. 333–349).
Elsevier.
Xiao, J., Shen, Y., Tateishi, R., & Bayaer, W. (2006). Development of topsoil grain size index for
monitoring desertification in arid land using remote sensing. International Journal of Remote
Sensing, 27, 2411–2422. https://doi.org/10.1080/01431160600554363
Predicting and Mapping of Soil Carbon
and Nitrogen Stocks by Diffuse
Reflectance Spectroscopy and Magnetic
Susceptibility in Western Plateau of São
Paulo

Angélica Santos Rabelo de Souza Bahia and José Marques Júnior

1 Introduction

Soil C and N play essential roles in soil health and ecosystem dynamics. They
are key components of the global biogeochemical cycles and are central to
understanding the dynamics of greenhouse gas (GHG) emissions and climate change
in all terrestrial ecosystems (Sorensen et al., 2018). Soil total C is the driving force
of biological activity: it serves as a primary source of energy and nutrients for many
soil organisms and is an important factor affecting N mineralization and
immobilization in soils. Total N, an essential macronutrient for plant growth, is also
one of the main determinants and indicators of soil fertility and quality. Thus, the
prediction of these stocks is necessary for a range of agricultural and environmental
applications. The objectives of this work were therefore to investigate whether diffuse
reflectance spectroscopy (DRS) and magnetic susceptibility (MS) can be applied to
predict soil C and N stocks in a sandstone-basalt transition region and to characterize
the spatial distribution of these attributes.
Soil is a heterogeneous, three-dimensional system whose formation factors and
pedogenetic processes are complex. The scientific understanding of this system
and the evaluation of its unique qualities and functions have been gained through
long and arduous soil surveys complemented by chemical, physical, mineralogical,
and biological laboratory analyses (Viscarra Rossel et al., 2011). However, many
of these conventional analytical techniques are neither simple nor cheap to perform,
even though they are useful in research. Thus, the development of
fast, accurate and low-cost methods to quantify soil attributes is of paramount
importance to enable detailed mapping, mainly in tropical regions where there is

A. S. R. d. S. Bahia (✉) · J. Marques Júnior


São Paulo State University – UNESP, Faculty of Agricultural and Veterinary Sciences,
Jaboticabal, SP, Brazil

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 171
W. de Carvalho Junior et al. (eds.), Pedometrics in Brazil, Progress in Soil Science,
https://doi.org/10.1007/978-3-031-64579-2_12

[Figure: the electromagnetic spectrum, from γ rays through X rays, UV, visible (400–700 nm), IR and microwaves to radio waves (FM, AM, long radio waves); frequency ν from 10^24 to 10^0 Hz, wavelength λ from 10^−16 to 10^8 m.]

Fig. 1 The electromagnetic spectrum

great variation of the soil attributes (Bahia et al., 2017). This is a necessity in
pedometrics, given the large number of samples required to understand the
spatial behavior of soil attributes.
Pedometrics is concerned with the application of mathematical and statistical
methods to the study of the distribution and genesis of soils (Minasny et al.,
2014). This field of soil science has advanced greatly thanks to discoveries and
developments in new technologies, such as sensors that measure soil properties and
complement, or replace, more conventional laboratory techniques by exploiting
frequencies across the electromagnetic (EM) spectrum (Fig. 1). Proximal
sensors provide quantitative results and can be faster and more cost-effective than
conventional laboratory analysis (Viscarra Rossel et al., 2011; Bahia et al., 2015;
Molin & Tavares, 2019). Furthermore, they are becoming smaller, faster, more
accurate, more energy efficient, wireless and smarter.
In the farming sector, integrated monitoring and management systems have
been using proximal sensors to measure different soil and forage properties.
Because they rapidly generate large volumes of varied data (Big Data), these
technologies can help to understand the dynamics of the soil and landscape in real
time. This knowledge is essential for identifying agricultural potential and,
consequently, for adopting economically and environmentally sustainable
management practices. Proximal sensors can thus help producers reduce costs and
manage more efficiently, since information collected in the field allows, for example,
variable-rate fertilization according to local needs. They also assist in environmental
monitoring, in the evaluation of soil quality and, finally, in the adoption of precision
farming.
The main proximal sensing techniques include geophysical methods,
proximal soil detection methods and diffuse spectroscopy. Geophysical methods,
such as ground-penetrating radar, electromagnetic induction and electrical resistivity,
have been used to document the variability of soil properties in different landforms
(Coutinho et al., 2017), which can be used to prepare site-specific management
maps. Proximal soil detection methods include gamma-ray spectroscopy,
ion-selective potentiometry, X-ray fluorescence and magnetic susceptibility. Although
used in a limited way so far, these technologies have potential for use in soil
surveys (Viscarra Rossel et al., 2010). Diffuse spectroscopy, on the other hand,
offers high potential for quantifying various soil properties and, therefore,
its implementation is already underway in several laboratories across the country
(Bahia, 2016; Bahia et al., 2015, 2017; Demattê et al., 2019). Among the proximal
sensors discussed in more detail below, DRS and MS stand out as promising
methods for estimating soil attributes efficiently (McBratney et al., 2006; Bahia et
al., 2015, 2017; Camargo et al., 2016, 2018; Demattê et al., 2019; Fernandes et al.,
2020; Filla et al., 2021) and are important for the advancement of pedometrics.
DRS is a technique that measures the reflectance or absorbance of a sample,
arising from electronic transitions and from overtones and combinations of
fundamental vibrations, when it is excited by a primary source of energy.
Reflectance or absorbance spectra are obtained from the interaction of the soil with
electromagnetic radiation. The spectra are then related to existing soil information
through empirical functions, and these functions are used to predict attributes of new
samples for which only the spectral curve is needed. This technique has already
been used to quantify soil carbon content and to predict sand and clay contents, as
well as to help track sediment sources and predict potentially toxic elements in
tropical soils (Bellon-Maurel & McBratney, 2011; Bahia et al., 2017; Tiecher et al.,
2017; Camargo et al., 2018; Moura-Bueno et al., 2020). DRS has become one of the
most promising methods for data generation. Many studies have shown its
feasibility, and the dissemination of spectral libraries has resulted in greater scientific
development than for other proximal sensing techniques (Stevens et al., 2013;
Demattê et al., 2016, 2019; Viscarra Rossel et al., 2016; Moura-Bueno et al., 2020).
MS is a measure of the degree to which a material can be magnetized when
subjected to an applied magnetic field. A soil can be characterized by its
magnetic behavior: the interaction between a magnet and the magnetically
expressive minerals present in the soil sample generates a weight force on a scale.
This force is then converted into magnetic susceptibility using a standard
curve. MS has stood out in recent years as a predictor of attributes related to
the mineralogical composition of soils, such as iron oxides, and it correlates
strongly with clay content, soil organic matter and adsorbed phosphate,
attributes linked to soil mineralogy (Torrent et al., 2007; Yang et al., 2015;
Camargo et al., 2016; Bahia, 2016; Bahia et al., 2017; Fernandes et al., 2020; Filla
et al., 2021). It can also be used to infer the degree of pedogenesis for grouping of
geochemical transects in landscapes (Cervi et al., 2019).
The development of predictive models for soil attributes using DRS and
MS allows a larger number of samples to be analyzed (Bahia et al., 2017). This
knowledge is relevant because it fills a gap regarding alternative techniques for the
quantification of soil attributes. One example is the prediction of soil carbon and
nitrogen contents by DRS and MS (Bahia et al., 2017). Another is the use
of statistical models to predict potentially toxic elements in soils using
DRS and MS (Camargo et al., 2018). Portable sensors, which make it possible to take
the analyzer to the sample rather than bring the sample to the laboratory, are
particularly useful, but their prices vary according to their complexity. Many of these
sensors are currently in a developmental phase and are used primarily in research.
Therefore, if satisfactory calibrations are achieved with cheaper equipment, the
evaluation of spatial distribution and the detailed mapping of large areas can be
performed more easily.
The quantification of the carbon content in the soil is essential to identify areas
where soils will become a source or sink of carbon under future
scenarios of global climate change (Lal et al., 2018). Soil organic carbon content
can be predicted by models calibrated with spectral data organized in libraries at
different scales, both global (Viscarra Rossel et al., 2016) and national (Demattê et
al., 2019). However, the development of accurate spectroscopic models remains a
challenging task, especially at regional and local scales, where more detailed and
accurate information is required. In addition, developing models that also use MS is
essential, since it greatly facilitates field work and the mapping of management areas.
Several protocols are emerging to facilitate monitoring in the context of low-
carbon agriculture, with a view to carbon credits, but challenges remain regarding
the use of robust methods with scientific support for this purpose. To be used
effectively for carbon credits derived from soil carbon stocks, quantification meth-
ods need to be broadly applicable, compliant with carbon credit requirements, and
strike a balance between cost and benefit. Thus, research in this area is increasingly
important and necessary.

2 Methodology

The study area is located in the municipality of Guatapará, São Paulo State, Brazil
(Fig. 2). Altitude varies from 519 to 649 m and, according to the Thornthwaite (1948)
classification, the local climate is B1rB’4a’, humid mesothermal
with a small water deficiency and summer evapotranspiration 70% lower than the
annual total. The natural vegetation was sub-deciduous tropical forest;
however, the area has been under sugarcane with mechanized harvesting
for more than 15 years. The area lies within the geomorphological unit of the
Western Plateau of São Paulo, close to the limit of the Basaltic Cuestas, on the
Sandstone-Basalt lithostratigraphic boundary (IPT, 1981). It features three
parent materials, related to the transition between basalt of the São Bento Group,
Serra Geral (SG) Formation, the Eluvial-Colluvial Deposit (ECD), and the Alluvial
Deposit (AD) (IPT, 1981; Geobank, 2022). This type of geological transition covers
about 44,000 ha in the central region of São Paulo state (IPT, 1981), so the results
may be extrapolated to other similar regions.

Fig. 2 Characterization of the study area. Location of the sampling grid (a) with spatial distribu-
tion of points for calibration (Dataset 1) and prediction (Dataset 2) of the models (b). Soil map at
scale 1:12,000 (LVef, Typic Eutrudox; LVd, Typic Hapludox; LVdf, Typic Hapludox; LVAd, Typic
Hapludox; and RQod, Typic Quartzipsamment) (c). Geological map at scale 1:500,000 (SG Serra
Geral; ECD Eluvial-Colluvial Deposit; AD Alluvial Deposit) (d). (Adapted from Bahia, 2016)

The SG Formation of the Paraná Basin consists of successive flows of basic
lavas interbedded with aeolian and fluvial sandstones (corresponding to the Botucatu
Formation), along with diabase dikes, sills, and andesites (White, 1908). It
comprises 97% basic to intermediate rocks (basalts and andesites) and 3% acid
rocks (rhyodacites and rhyolites), with the latter mainly distributed in the central
and southern parts of the basin and showing geochemical and petrographic
differences that allow their identification (Bellieni et al., 1986). ECD corresponds to
deposits of thick sediments, composed of blocks and grains of various dimensions,
transported by gravity and accumulated in concave areas at the foot of slopes, a
short distance from steeper slopes, or on rocky escarpments (Lacerda & Sandroni,
1985). AD corresponds to deposits of unconsolidated, fine to medium,
sandy-clayey alluvial sediments with variegated colors, remains of organic matter,
pebbles, fine to coarse sands with gravel levels, and silty-clay material, related to
floodplains, margins, and current fluvial channels (Geobank, 2022).
The soil map (scale of 1:12,000) records the following mapping units, classified
according to the Brazilian Soil Classification System (SiBCS) (Santos et al., 2018)
and Soil Taxonomy (Soil Survey Staff, 2014): LVef
(SiBCS: Latossolo Vermelho eutroférrico; Soil Taxonomy: Typic Eutrudox); LVdf
(SiBCS: Latossolo Vermelho distroférrico; Soil Taxonomy: Typic Hapludox); LVd
(SiBCS: Latossolo Vermelho distrófico; Soil Taxonomy: Typic Hapludox); LVAd
(SiBCS: Latossolo Vermelho-Amarelo distrófico; Soil Taxonomy: Typic Hapludox);
and RQod (SiBCS: Neossolo Quartzarênico órtico distrófico; Soil Taxonomy: Typic
Quartzipsamment).
A sampling grid with regular spacing and a density of approximately 1 point
every 2.5 ha was established, totaling 372 sample points (Dataset 2) over an area of
about 930 ha, with samples collected at a depth of 0.00–0.25 m.
These points were used to predict unknown values. For model calibration,
74 sample points (Dataset 1) were collected at a depth of 0.00–0.25 m, every 60 m
along a transect (approximately 4470 m) representative of the area, because it covers
the full range of soils and Fe2O3 contents of the study area (3–204 g kg−1) (Fig. 2b).
With the aid of the digital elevation model, surface water flow simulation and field
observations, this transect was delimited from the top to the footslope, following the
gentlest side of the slope.
Total soil carbon and nitrogen contents were measured by dry combustion in a
LECO TruSpec analyzer (LECO Corporation, St. Joseph, MI). Soil samples were
sieved and oxidized at high temperature (1350 °C) in the presence of 2.8 ultrapure
oxygen, which allows total carbon and nitrogen concentrations to be measured. Bulk
density was determined by taking samples of known volume and drying them to
constant weight. The C and N stocks (in t ha−1) were calculated from the C and N
contents and the bulk density for the sampled depth (0.25 m) at each sample point.
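
The stock calculation can be illustrated with a short Python sketch. The chapter does not state the units of the contents or the exact expression used, so this sketch assumes contents in g kg−1 and bulk density in g cm−3 and applies the usual mass-per-area conversion; the function name and the example values are hypothetical.

```python
def stock_t_per_ha(content_g_per_kg, bulk_density_g_per_cm3, depth_m=0.25):
    """Element stock (t ha-1) of a soil layer.

    Assumes content in g kg-1 and bulk density in g cm-3 (numerically equal
    to t m-3). One hectare is 10,000 m2, so the soil mass of the layer is
    bulk_density * depth * 10,000 tonnes per hectare.
    """
    soil_mass_t_per_ha = bulk_density_g_per_cm3 * depth_m * 10_000
    # g kg-1 is the same ratio as kg t-1; dividing by 1000 gives t per t of soil.
    return soil_mass_t_per_ha * content_g_per_kg / 1000


# Hypothetical sample: 10 g kg-1 of C, bulk density 1.3 g cm-3, 0-0.25 m layer.
print(round(stock_t_per_ha(10.0, 1.3), 1))
```

Because the stock scales linearly with both content and bulk density, errors in either measurement propagate directly to the stock estimate.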
Reflectance values were recorded on a Lambda 950 UV/VIS/NIR spectrophotometer
equipped with an integrating sphere 150 mm in diameter, at 1 nm intervals
across the visible (VIS) and near-infrared (NIR) range (380–2300 nm), using Halon
(PTFE) powder as the white reference standard. The NIR region contains useful
information on C and total N contents, which derives from several important chemical
bonds, such as C-C, C=C, C-H, C-N, N-H, and O-H. MS was measured in soil
samples with a Bartington MS2 meter coupled to a Bartington MS2B
magnetic susceptibility sensor. Measurements were made at low frequency
(0.47 kHz) (Dearing, 1994).
Calibration models based on VIS-NIR spectroscopy and MS were developed
separately for carbon and nitrogen stocks. Models relating soil spectra to
laboratory data were developed by partial least-squares regression (PLSR) (Geladi
& Kowalski, 1986), performed in the ParLeS software (Viscarra Rossel, 2008). The
MS-based models, in turn, were calibrated by linear regression between magnetic
and laboratory data. Calibration and validation were performed with Dataset 1
(N = 74), and prediction was performed on samples from Dataset 2 (N = 372).
Figure 3 shows the flowchart of the processes used to obtain spatial variability maps
of data obtained in the laboratory and predicted by VIS-NIR spectroscopy and MS.
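
The MS-based calibration step can be sketched in Python as an ordinary least-squares fit of a straight line, followed by prediction for new samples. This is only an illustration under assumptions: the function names and the numbers are hypothetical, the chapter does not say which software fitted the linear models, and the spectral PLSR models fitted in ParLeS are not reproduced here.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x_new):
    """Apply a fitted (intercept, slope) model to new predictor values."""
    intercept, slope = model
    return [intercept + slope * value for value in x_new]


# Hypothetical calibration set: MS readings and laboratory C stocks (t ha-1).
ms_calibration = [500.0, 1500.0, 3000.0, 4500.0, 6000.0]
c_stock_lab = [22.0, 27.5, 33.0, 41.0, 47.5]
model = fit_line(ms_calibration, c_stock_lab)

# Prediction for hypothetical grid samples where only MS was measured.
print([round(v, 1) for v in predict(model, [1000.0, 5000.0])])
```

Once calibrated on the transect samples, such a model only needs the sensor reading at each grid point, which is what makes dense mapping affordable.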
Data were submitted to descriptive statistics using the SAS software. To verify
statistical differences between the mean values of the attributes studied
across the compartments, the Tukey test at 5% probability was applied to the

Fig. 3 Simplified flowchart with the processes used to obtain spatial variability maps of data
obtained in the laboratory (observed) and predicted by VIS-NIR spectroscopy and magnetic
susceptibility. PLSR partial least-squares regression

data. The pedotransfer functions (PTFs) based on MS were calibrated by linear
regression between magnetic and laboratory data. The precision of the PTFs was
evaluated through the coefficients of determination (R2) and correlation (r), the
root mean square error (RMSE), the residual prediction deviation (RPD) and the
Willmott agreement index (d).
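
A minimal Python sketch of these evaluation statistics is given below. The definitions are assumptions, since the chapter does not spell them out: R2 is taken as the square of r (which agrees with the values reported in the results), RPD as the sample standard deviation of the observed values divided by the RMSE, and d as Willmott's index of agreement in its original form. Function names and example values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    o_bar, p_bar = mean(observed), mean(predicted)
    sop = sum((o - o_bar) * (p - p_bar) for o, p in zip(observed, predicted))
    soo = sum((o - o_bar) ** 2 for o in observed)
    spp = sum((p - p_bar) ** 2 for p in predicted)
    return sop / sqrt(soo * spp)


def rmse(observed, predicted):
    """Root mean square error of the predictions."""
    return sqrt(mean((p - o) ** 2 for o, p in zip(observed, predicted)))


def rpd(observed, predicted):
    """Residual prediction deviation: SD of observed values over RMSE."""
    return stdev(observed) / rmse(observed, predicted)


def willmott_d(observed, predicted):
    """Willmott index of agreement (1 means perfect agreement)."""
    o_bar = mean(observed)
    num = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    den = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2
              for o, p in zip(observed, predicted))
    return 1 - num / den


# Hypothetical observed and predicted C stocks (t ha-1).
obs = [22.0, 27.5, 33.0, 41.0, 47.5]
pred = [23.1, 26.0, 34.2, 39.5, 48.0]
r = pearson_r(obs, pred)
print(round(r ** 2, 2), round(rmse(obs, pred), 2),
      round(rpd(obs, pred), 2), round(willmott_d(obs, pred), 2))
```

Reporting r or R2 together with RMSE, RPD and d is useful because a high correlation alone does not reveal systematic bias in the predictions.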
The spatial variability was characterized through geostatistical analysis, with
modeling of experimental variograms and subsequent interpolation
by ordinary kriging. The accuracy of the variability maps was tested by the mean
error (ME), mean absolute error (MAE), mean square error (MSE) and RMSE, as
defined in the equations below (Andersen & Bro, 2010). ME and MAE quantify
the accuracy of a prediction: ME is the average difference between the predicted and
observed values, and MAE is the average of the absolute differences. ME < 0
indicates that the predictions underestimate the observed values, and ME > 0
indicates that they overestimate them. MSE is the average of the squared
differences between the predicted and observed values, and RMSE is the square root
of MSE.

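\[ \mathrm{ME} = \frac{1}{N} \sum_{i=1}^{N} \left( P_i - O_i \right) \]

\[ \mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| P_i - O_i \right| \]

\[ \mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( P_i - O_i \right)^2 \]

\[ \mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( P_i - O_i \right)^2 } \]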

where N is the number of estimated values, Pi is the predicted value, and Oi is the
observed value.
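
The four map accuracy measures defined above can be computed with a few lines of Python. This is a minimal sketch; the function name and the example values are hypothetical and serve only to show the sign convention (predicted minus observed).

```python
from math import sqrt


def map_accuracy(predicted, observed):
    """Return ME, MAE, MSE and RMSE, with errors taken as P_i - O_i."""
    n = len(observed)
    errors = [p - o for p, o in zip(predicted, observed)]
    me = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    return {"ME": me, "MAE": mae, "MSE": mse, "RMSE": sqrt(mse)}


# Hypothetical kriged predictions and observed stocks at validation points.
predicted = [3.0, 3.0, 8.0]
observed = [2.0, 4.0, 6.0]
print(map_accuracy(predicted, observed))
```

In this example the positive ME indicates that, on average, the predictions exceed the observations, while MAE and RMSE describe the size of the errors regardless of sign.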

3 Results and Discussion

Soil C and N stocks ranged from 19.5 to 52.6 t ha−1 (mean of 36.6 t ha−1)
and from 2.9 to 4.4 t ha−1 (mean of 3.7 t ha−1), respectively. Positive correlations
were found for C stock with DRS (R2 = 0.76; r = 0.87; d = 0.96) and MS (R2 = 0.77;
r = 0.88; d = 0.92), as well as for N stock with DRS (R2 = 0.67; r = 0.82; d = 0.73)
and MS (R2 = 0.61; r = 0.78; d = 0.71). The PTFs showed good precision and
accuracy (high R2, RPD > 2, and low RMSE), mainly
for the prediction of C (Table 1).
The good performance of the PTFs based on DRS is due to the fact that the NIR
region of the spectrum contains information related to these elements, arising from
various chemical bonds (C–C, C=C, C–H, C–N, N–H) (Fig. 4). Three main regions in
which the signals are strongest can be seen in the spectra: 1430, 1920, and 2200 nm.
These strong absorption bands are caused by the bending and stretching vibrations of
the O–H bonds of water molecules and phenolic compounds (O–H), amine (N–H),
aliphatic (C–H) and amide (N–H) groups in minerals around the 2209-nm region (Xie
et al., 2011).
Zheng et al. (2008), when estimating total soil nitrogen, selected the wavelengths
of 940, 1050, 1100, 1200, 1300, and 1550 nm as the estimation model factors.

Table 1 Summary of the results obtained by calibration of models using diffuse reflectance
spectroscopy (DRS) and magnetic susceptibility (MS)

Prediction model               R2     RMSE   RRMSE   SDE    RPD    PLSR factors
C = 1.15 + 0.7779 × VIS-NIR    0.76   1.24   0.21    1.17   2.52   5
N = 0.26 + 0.5487 × VIS-NIR    0.67   2.10   0.18    0.15   2.48   5
C = 4.04 + 0.0044 × MS         0.77   1.98   0.11    1.25   2.59   –
N = 0.70 + 0.0009 × MS         0.61   2.41   0.14    0.16   2.11   –

N = 74. R2 coefficient of determination, RMSE root mean square error, RRMSE relative root mean
squared error, SDE standard deviation of error distribution, RPD residual prediction deviation,
PLSR factors number of factors used in the PLSR model, C carbon stock, N nitrogen stock

Fig. 4 Visible-infrared spectra of the selected soils studied in the area: RQod, Typic
Quartzipsamment; LVAd, Typic Hapludox; LVef, Typic Eutrudox; LVd, Typic Hapludox; and LVdf,
Typic Hapludox
On the other hand, MS is directly related to C and N because these attributes are
related to the dynamics of microbial activity in the soil (Bayer et al., 2000). MS is
also influenced by soil organic matter and clay content (Yang et al., 2015), with
magnetite, maghemite and ferrihydrite being the main minerals responsible for the
magnetic expression of soils (Mullins, 1977). Maher (1998) reports a significant
correlation between MS and organic carbon, cation exchange capacity and clay
content.
All attributes analyzed showed spatial dependence, expressed by the autocorrelation
distance (range), through the variogram fits (Table 2). The greater the range, the
greater the distance over which values remain spatially correlated. In the variograms
presented, the range varied from about 1550 to 1850 m. This is the distance (radius)
within which neighbors will be used to interpolate a value at a location. The spherical
model was fitted to all attributes. This model suits attributes that present abrupt
variations across the landscape (Cambardella et al., 1994). These variations may be
related to the types of parent material (geology), relief and soils found in the study
area, showing the relationship between these factors and the detailed characterization
of spatial variability and the definition of mapping units.

Table 2 Spherical variogram parameters of best fit to soil C and N stocks in the prediction dataset
(observed laboratory data and predictions made by DRS and MS)

Attribute   Data set   C0     C0 + C1   RD (%)   A (m)   R2     SSR       ME     MAE    MSE     RMSE
C stock     Observed   3.02   16.50     21       1562    0.90   2.65      –      –      –       –
C stock     DRS        3.05   16.11     22       1548    0.90   2.15      0.84   1.90   6.39    2.53
C stock     MS         2.70   15.28     13       1721    0.90   2.95      2.51   3.01   14.45   3.80
N stock     Observed   0.01   0.047     30       1640    0.99   1.7E-05   –      –      –       –
N stock     DRS        0.01   0.045     29       1718    0.99   2.0E-05   0.05   0.15   0.04    0.19
N stock     MS         0.02   0.052     33       1855    0.99   3.2E-06   0.09   0.22   0.09    0.29

N = 372. C0 nugget effect, C0 + C1 sill, RD random degree, A range, R2 coefficient of
determination of the adjusted model, SSR sum of squares of the residuals, ME mean error, MAE
mean absolute error, MSE mean square error, RMSE root mean square error
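
The spherical model fitted to the variograms can be sketched as a short function. This is only an illustration of the standard spherical semivariance equation, not the software used in the study; the function name `spherical_semivariance` is hypothetical, and the parameters in the example are those reported in Table 2 for the observed C stock (C0 = 3.02, C0 + C1 = 16.50, A = 1562 m).

```python
def spherical_semivariance(h, nugget, sill, rng):
    """Semivariance of the spherical model at lag distance h.

    nugget is C0, sill is C0 + C1 and rng is the range A (same length unit as h).
    """
    if h < 0 or rng <= 0:
        raise ValueError("h must be non-negative and rng must be positive")
    if h == 0:
        return 0.0              # by convention, the semivariance at zero lag is zero
    if h >= rng:
        return sill             # beyond the range the model stays at the sill
    ratio = h / rng
    partial_sill = sill - nugget   # C1
    return nugget + partial_sill * (1.5 * ratio - 0.5 * ratio ** 3)


# Parameters of the observed C stock variogram (Table 2)
gamma_half_range = spherical_semivariance(781.0, 3.02, 16.50, 1562.0)
gamma_at_range = spherical_semivariance(1562.0, 3.02, 16.50, 1562.0)
print(gamma_half_range, gamma_at_range)
```

At half the range the modeled semivariance has already reached about three quarters of the sill, which illustrates how quickly the spherical model rises before leveling off at the range.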
The C stock showed a low random degree (RD ≤ 25%), while the N stock showed a
moderate random degree (25% < RD ≤ 75%). All RD values were at or below 33%, and
they were similar for the observed and predicted data. The spatial pattern of the
mapped attributes was similar, as shown in Fig. 5. The map predicted by DRS was
the one that most resembled the map obtained with the observed data, which is
confirmed by the accuracy parameters of the models (Table 2). All maps overestimated
the observed values according to the ME index (ME > 0 indicates overestimated
values), and the carbon stock was more overestimated than the nitrogen stock. Even
so, DRS was the tool that gave the best carbon stock prediction, with the lowest MAE,
MSE and RMSE.
Estimates of carbon and nitrogen stocks are unavoidably associated with errors.
According to Minasny et al. (2017), to assess a change in soil carbon content, e.g., in
the context of carbon farming for carbon credits, it is crucial to quantify the errors.
Here, we evaluate the error attributed to field measurements of carbon and nitrogen
stocks with DRS and MS. The map of the carbon stock predicted by DRS is associated
with an error of 28%, that is, the error follows a normal distribution with a standard
deviation of 28%. MS, in turn, presented an error of 46%. DRS overestimated the
carbon stock by around 5.6 t ha−1 and MS by around 9.2 t ha−1. For nitrogen stocks,
the errors in the estimates were smaller. The nitrogen stock map predicted by DRS
showed an error of 6% (overestimating by 0.2 t ha−1), while for MS the error was 12%
(overestimating by 0.4 t ha−1).
Fig. 5 Spatial distribution maps of soil C and N stock in the prediction dataset (observed
laboratory data and predicted by diffuse reflectance spectroscopy, DRS and magnetic susceptibility,
MS)

The effect of method error is in line with current carbon credit certification
protocols (Verra’s VM0042 methodology). Equipment based on Vis-NIR is indicated
for several analyses, including carbon stock, because it allows high throughput;
however, its predictions carry considerable error (Bellon-Maurel & McBratney, 2011).
The same holds for MS, an extremely useful tool in pedometrics. More accurate
protocols for these tools therefore need to be investigated so that these errors can
be reduced.
It is known that the determination of soil attributes for mapping purposes requires
a large number of samples, which is costly and makes it difficult to carry out
detailed mapping of large areas (Bahia et al., 2017). For this reason, the calibration
of mathematical models using DRS and MS is a promising way to predict these
attributes in unmeasured samples, representing an advance in soil sensing.
On the maps, the areas with the highest stocks of C and N are exactly the
areas with more clayey soils (Oxisols) and, consequently, oxidic soils, rich in
hematite and goethite (Bahia et al., 2017). Soil N dynamics are closely associated
with those of C (Bayer et al., 2000) because the N content in soil is almost entirely
dependent on the organic matter content (Martin et al., 2002). Soil C and N are
also related to iron oxides and soil structure because organic matter produces a
hydrophobic coating on the surfaces of these minerals, which allows them to aggregate
into particles. This happens due to the action of microbial polymers (especially
polysaccharides and glomalin) produced by microorganisms during organic matter
decomposition (Tisdall & Oades, 1982; Jarvis et al., 2012). This explains the role of
organic matter in protecting soil carbon and nitrogen and, consequently, soil
structure and clay minerals.
Proximal sensors are an effective tool for obtaining soil information on physical,
chemical, biological and mineralogical properties, allowing their characterization
and discrimination. These results corroborate Bahia et al. (2015). Areas with higher
or lower carbon emissions can be delimited, since sandier soils (such as RQod) are
more porous and thus increase the gas flow from the soil to the atmosphere. This
knowledge is therefore important to identify areas with different gas emission
potentials, which may help in mapping and decision making to reduce greenhouse gas
emissions. In addition, these tools are of great importance for the detailed-scale
mapping of large areas, representing an advance in pedometrics.

4 Conclusions

DRS and MS can be applied to predict soil C and N stocks in a sandstone-basalt
transition region. The interpolated maps based on the attributes predicted by
DRS and MS show a spatial variability pattern similar to that of the maps based on
observed data, which is confirmed by the accuracy indexes. However, the carbon and
nitrogen stock maps showed interpolation errors. In the carbon stock maps, the error
was 28% for DRS and 46% for MS, overestimating by 5.6 t ha−1 and 9.2 t ha−1,
respectively. The error of the nitrogen stock maps was 6% for DRS, overestimating
by 0.2 t ha−1, and 12% for MS, overestimating by 0.4 t ha−1. These errors are in line
with current carbon credit certification protocols. However, predictive improvements
to these methods will be essential so that they can be fully used in certified soil
carbon sequestration offset protocols.

Acknowledgements This work was supported by the São Paulo Research Foundation (FAPESP)
[grant numbers 13/13978-9, 13/17552-6]. The authors are grateful to the Coordination for the
Improvement of Higher Education Personnel (CAPES) and to the National Council for Research
and Development (CNPq) for research support, and to the São Martinho Group for granting
access to the study area.

References

Andersen, C. M., & Bro, R. (2010). Variable selection in regression – a tutorial. Journal of
Chemometrics, 24, 728–737.
Bahia, A. S. R. S. (2016). Estimation of soil attributes by diffuse reflectance spectroscopy
and magnetic susceptibility in the landscape context. Doctoral thesis in agronomy (soil
science)—Faculty of Agricultural and Veterinary Sciences, São Paulo State University (Unesp),
Jaboticabal.
Bahia, A. S. R. S., Marques, J., & Siqueira, D. S. (2015). Procedures using diffuse reflectance
spectroscopy for estimating hematite and goethite in Oxisols of São Paulo, Brazil. Geoderma
Regional, 5, 150–156.
Bahia, A. S. R. S., Marques Júnior, J., La Scala, N., Cerri, C. E. P., & Camargo, L. A. (2017).
Prediction and mapping of soil attributes using diffuse reflectance spectroscopy and magnetic
susceptibility. Soil Science Society of America Journal, 81, 1450–1462.
Bayer, C., Mielniczuk, J., Amado, T. J. C., Martin-Neto, L., & Fernandes, S. (2000). Organic matter
storage in a sandy clay loam Acrisol affected by tillage and cropping systems in southern Brazil.
Soil and Tillage Research, 54, 101–109.
Bellieni, G., Comin-Chiaramonti, P., Marques, L. S., Melfi, A. J., Nardy, A. J. R., Papatrechas,
C., Piccirillo, E. M., Roisenberg, A., & Stolfa, D. (1986). Petrogenetic aspects of acid and
basaltic lavas from the Paraná plateau (Brazil): Geological, mineralogical and petrochemical
relationships. Journal of Petrology, 27(4), 915–944.
Bellon-Maurel, V., & McBratney, A. (2011). Near-infrared (NIR) and mid-infrared (MIR) spectro-
scopic techniques for assessing the amount of carbon stock in soils–critical review and research
perspectives. Soil Biology and Biochemistry, 43(7), 1398–1410.
Camargo, L. A., Marques Júnior, J., Pereira, G. T., Alleoni, L. R. F., Bahia, A. S. R. S., & Teixeira,
D. D. B. (2016). Pedotransfer functions to assess adsorbed phosphate using iron oxide content
and magnetic susceptibility in an Oxisol. Soil Use and Management, 32(2), 172–182.
Camargo, L. A., Marques Júnior, J., Barrón, V., Alleoni, L. R. F., Pereira, G. T., Teixeira, D. D. B.,
& Bahia, A. S. R. S. (2018). Predicting potentially toxic elements in tropical soils from iron
oxides, magnetic susceptibility and diffuse reflectance spectra. Catena, 165, 503–515.
Cambardella, C. A., Moorman, T. B., Novak, J. M., Parkin, T. B., Karlen, D. L., Turco, R. F.,
& Konopka, A. E. (1994). Field-scale variability of soil properties in Central Iowa soils. Soil
Science Society of America Journal, 58, 1501–1511.
Cervi, E. C., Maher, B., Poliseli, P. C., De Souza Junior, I. G., & Da Costa, A. C. S. (2019).
Magnetic susceptibility as a pedogenic proxy for grouping of geochemical transects in
landscapes. Journal of Applied Geophysics, 169, 109–117.
Coutinho, F. S., Pereira, M. G., Tostes, J. D. O., Francelino, M. R., & Gaia-Gomes, J. H. (2017).
Application of Georadar in areas with different vegetation cover. FLORAM, 24, e20160011.
Dearing, J. A. (1994). Environmental magnetic susceptibility. Using the Bartington MS2 system.
British Library.
Demattê, J. A., Bellinaso, H., Araújo, S. R., Rizzo, R., & Souza, A. B. (2016). Spectral
regionalization of tropical soils in the estimation of soil attributes. Revista Ciência Agronômica,
47, 589–598.
Demattê, J. A. M., et al. (2019). (65 authors) The Brazilian soil spectral library (BSSL): A general
view, application and challenges. Geoderma, 354, 113793.
Fernandes, K., Marques Júnior, J., Bahia, A. S. R. S., Demattê, J. A. M., & Ribon, A. A.
(2020). Landscape-scale spatial variability of kaolinite-gibbsite ratio in tropical soils detected
by diffuse reflectance spectroscopy. Catena, 195, 104795.
Filla, V. A., Coelho, A. P., Ferroni, A. D., Bahia, A. S. R. S., & Marques Júnior, J. (2021).
Estimation of clay content by magnetic susceptibility in tropical soils using linear and nonlinear
models. Geoderma, 403, 115371.
Geladi, P., & Kowalski, B. R. (1986). Partial least-squares regression: A tutorial. Analytica Chimica
Acta, 185, 1–17.
Geobank. (2022). Serviço Geológico do Brasil. http://geosgb.cprm.gov.br/. Accessed 23 Sept 2022.
IPT – Instituto de Pesquisas Tecnológicas do Estado de São Paulo. (1981). Geomorphological map
of the state of São Paulo. IPT/DMGA.
Jarvis, S., Tisdall, J., Oades, M., Six, J., Gregorich, E., & Kögel-Knabner, I. (2012). Landmark
papers. European Journal of Soil Science, 63, 1–21.
Lacerda, W. A., & Sandroni, S. S. (1985). Movimentos de Massas Coluviais. ABMS.
Lal, R., Smith, P., Jungkunst, H. F., Mitsch, W. J., Lehmann, J., Nair, P. K. R., McBratney, A. B.,
Sá, J. C. M., Schneider, J., Zinn, Y. L., Skorupa, L. A., Zhang, H., Minasny, B., Srinivasrao,
C., & Ravindranath, N. H. (2018). The carbon sequestration potential of terrestrial ecosystems.
Journal of Soil and Water Conservation, 73(6), 145A–152A.
Maher, B. A. (1998). Magnetic properties of modern soils and quaternary loessic paleosols:
Paleoclimate implications. Palaeogeography, Palaeoclimatology, Palaeoecology, 137, 25–54.
Martin, P. D., Malley, D. F., Manning, G., & Fuller, L. (2002). Determination of soil organic
carbon and nitrogen at the field level using near-infrared spectroscopy. Canadian Journal of
Soil Science, 82, 413–422.
McBratney, A. B., Minasny, B., & Viscarra Rossel, R. A. (2006). Spectral soil analysis and
inference systems: A powerful combination for solving the soil data crisis. Geoderma, 136,
272–278.
Minasny, B., Malone, B. P., Stockmann, U., Odgers, N., & McBratney, A. B. (2014). Pedometrics.
Reference Module in Earth Systems and Environmental Sciences.
https://doi.org/10.1016/B978-0-12-409548-9.09163-6
Minasny, B., Malone, B. P., McBratney, A. B., et al. (2017). Soil carbon 4 per mille. Geoderma,
292, 59–86.
Molin, J. P., & Tavares, T. R. (2019). Sensor systems for mapping soil fertility attributes: Challenges,
advances, and perspectives in Brazilian tropical soils. Engenharia Agrícola, 39, 126–147.
Moura-Bueno, J. M., Dalmolin, R. S. D., Horst-Heinen, T. Z., Ten Caten, A., Vasques, G. M.,
Dotto, A. C., & Grunwald, S. (2020). When does stratification of a subtropical soil spectral
library improve predictions of soil organic carbon content? Science of the Total Environment,
1, 139895.
Mullins, C. E. (1977). Magnetic susceptibility of the soil and its significance in soil science: A
review. Journal of Soil Science, 28, 223–246.
Santos, H. G., Jacomine, P. K. T., Anjos, L. H. C., Oliveira, V. A., Lumbreras, J. F., Coelho, M. R.,
& Cunha, T. J. F. (2018). Sistema brasileiro de classificação de solos (5th ed., rev. and expanded).
Embrapa.
Soil Survey Staff. (2014). Soil taxonomy (12th ed.). USDA.
Sorensen, C., Murray, V., Lemery, J., & Balbus, J. (2018). Climate change and women’s health:
Impacts and policy directions. PLoS Medicine, 15(7), e1002603.
Stevens, A., Nocita, M., Tóth, G., Montanarella, L., & van Wesemael, B. (2013). Prediction of
soil organic carbon at the European scale by visible and near infrared reflectance spectroscopy.
PLoS One, 8(6), e66409.
Thornthwaite, C. W. (1948). An approach towards a rational classification of climate. Geographical
Review, 38, 55–94.
Tiecher, T., Caner, L., Minella, J. P. G., Evrard, O., Mondamert, L., Labanowski, J., &
Rheinheimer, D. D. S. (2017). Tracing sediment sources using mid-infrared spectroscopy in
Arvorezinha catchment, Southern Brazil. Land Degradation & Development, 28, 1603–1614.
Tisdall, J. M., & Oades, J. M. (1982). Organic matter and water-stable aggregates in soils. European
Journal of Soil Science, 33, 141–163.
Torrent, J., Liu, Q. S., Bloemendal, J., & Barrón, V. (2007). Magnetic enhancement and iron oxides
in the upper Luochuan loess-paleosol sequence on the Chinese Loess Plateau. Soil Science
Society of America Journal, 71, 1–9.
Viscarra Rossel, R. A. (2008). ParLeS: Software for chemometric analysis of spectroscopic data.
Chemometrics and Intelligent Laboratory Systems, 90, 72–83.
Viscarra Rossel, R. A., McBratney, A. B., & Minasny, B. (2010). Proximal soil sensing (446p).
Springer science.
Viscarra Rossel, R. A., Adamchuk, V. I., Sudduth, K. A., McKenzie, N. J., & Lobsey, C.
(2011). Proximal soil sensing: An effective approach for soil measurements in space and time.
Advances in Agronomy, 113, 243–291.
Viscarra Rossel, R. A., et al. (2016). (39 authors) a global spectral library to characterize the
world’s soil. Earth-Science Reviews, 155, 198–230.
White, I. C. (1908). Relatório final da comissão de estudos das minas de carvão de pedra do Brasil.
DNPM.
Xie, H. T., Yang, X. M., Drury, C. F., Yang, J. Y., & Zhang, X. D. (2011). Predicting soil organic
carbon and total nitrogen using mid- and near-infrared spectra for Brookston clay loam soil in
southwestern Ontario, Canada. Canadian Journal of Soil Science, 91, 53–63.
Yang, P. G., Yang, M., Mao, R. Z., & Byrne, J. M. (2015). Impact of long-term irrigation with
treated sewage on soil magnetic susceptibility and organic matter content in North China.
Bulletin of Environmental Contamination and Toxicology, 95, 102–107.
Zheng, L., Li, M., Pan, L., Sun, J., & Tang, N. (2008). Estimation of soil organic matter and soil
total nitrogen based on NIR spectroscopy and BP neural network. Spectroscopy and Spectral
Analysis, 28, 1160–1164.
Iron Rods as Markers for Soil Horizon
Depths and Point Scatterers for
Estimating Pulse Velocity in GPR
Imagery

Carlos Wagner Rodrigues do Nascimento, Marcos Bacis Ceddia,
Gustavo Mattos Vasques, Hugo Machado Rodrigues,
Ronaldo Pereira de Oliveira, and Saulo Siqueira Martins

1 Introduction

Among the available geophysical sensors, the ground penetrating radar (GPR) has
shown potential to identify and map soil features both horizontally and vertically
(Peng et al., 2020). The GPR uses time-domain reflectometry to image shallow
terrain, including soils, using antennas with frequencies ranging from 10 MHz to
1 GHz (Annan, 2009). The data collected by the GPR are stored in an image called a
radargram. The sharpness of the radargram depends on the relative permittivity,
or dielectric constant (k), of the soil constituents. The relative permittivity is the
extent to which a material concentrates electric flux, which depends on the physical
properties of the material.
One of the requisites for interpreting the radargram is determining the depth of
the features of interest in the soil. For that, it is essential to know the velocity of the
electromagnetic pulse in the soil to convert the Y-axis of the radargram from time to
distance (depth) units (Annan, 2009). The pulse velocity depends on the physical
and electromagnetic properties of the soil, e.g., on its moisture and mineralogy
(Iqbal et al., 2021).
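
The link between pulse velocity and relative permittivity can be illustrated with the usual low-loss approximation v = c / sqrt(k), where c is the speed of light in vacuum. This relation is not stated in the chapter and is used here as an assumption that holds only for non-magnetic, low-conductivity materials; the function names are hypothetical.

```python
import math

C_M_PER_NS = 0.299792458  # speed of light in vacuum, in m/ns


def velocity_from_permittivity(k):
    """Approximate pulse velocity (m/ns) for relative permittivity k (low-loss assumption)."""
    if k < 1.0:
        raise ValueError("relative permittivity must be >= 1")
    return C_M_PER_NS / math.sqrt(k)


def permittivity_from_velocity(v):
    """Approximate relative permittivity for a pulse velocity v (m/ns), same assumption."""
    if v <= 0.0 or v > C_M_PER_NS:
        raise ValueError("velocity must be in (0, c]")
    return (C_M_PER_NS / v) ** 2


# A higher permittivity (e.g., a wetter soil) gives a slower pulse
v_dry = velocity_from_permittivity(4.0)
v_wet = velocity_from_permittivity(25.0)
print(v_dry, v_wet)
```

Under this approximation, a fivefold difference in pulse velocity corresponds to a 25-fold difference in relative permittivity, which is why moisture has such a strong effect on the time-to-depth conversion.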
A proper method for measuring the GPR pulse velocity is the Common Midpoint
(CMP) method, which requires a GPR with a bistatic antenna, i.e., an antenna in which
the transmitter coil is physically separated from the receiver coil. However, the
CMP cannot be executed using GPR models with monostatic antennas (with the
transmitter and receiver coils together in one device). An alternative is estimating
the velocity of the electromagnetic pulse from hyperbolas created in the radargram
from the interaction of the pulse with “point scatterers”, which are underground
features with very distinct relative permittivity compared to the soil matrix (Jacob
& Urban, 2015).

C. W. R. do Nascimento (✉) · M. B. Ceddia · H. M. Rodrigues
Department de Agrotecnologias e Sustentabilidade, Universidade Federal Rural do Rio de
Janeiro, Seropédica, RJ, Brazil
G. M. Vasques · R. P. de Oliveira
Embrapa Solos. Rua Jardim Botânico, Rio de Janeiro, RJ, Brazil
e-mail: [email protected]; [email protected]
S. S. Martins
Faculdade de Geofísica, Universidade Federal do Pará, Belém, PA, Brazil
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
W. de Carvalho Junior et al. (eds.), Pedometrics in Brazil, Progress in Soil Science,
https://doi.org/10.1007/978-3-031-64579-2_13
This method has been used to find landmines (Garcia-Fernandez et al., 2019),
metallic targets (Quarto et al., 2007; Juliano et al., 2013), artifacts in archeology
(Correia et al., 2019), and others. The signature of a buried object in the radargram
is composed of the reflections of the waves emitted by the coil that interact with the
point scatterers underground. According to Qiao et al. (2015), when the antenna
scans linearly over an object, the antenna position x and the corresponding echo time
delay t approximately satisfy a three-parameter hyperbola equation, as follows:

$$\frac{t^2}{t_0^2} - \frac{4\left(x - x_0\right)^2}{v^2 t_0^2} = 1$$

where t0 is the echo time delay at the apex of the hyperbola; x0 is the horizontal
position of the object; and v is the velocity of wave propagation underground, assumed
to be constant in a small region. The coordinates (x0, t0) indicate the position of the
hyperbola apex, whereas v, together with (x0, t0), determines its shape. More
information can be found in Qiao et al. (2015).
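
Rearranging the hyperbola equation gives t² = t0² + 4(x − x0)²/v², which is linear in (x − x0)². The sketch below uses this to estimate v and t0 by ordinary least squares when the apex position x0 is known. It is only an illustration under that assumption, with hypothetical function names and synthetic picks, and it is not the manual fitting procedure of any particular GPR software.

```python
import math


def hyperbola_time(x, x0, t0, v):
    """Two-way echo time (ns) at antenna position x (m) for a point scatterer."""
    return math.sqrt(t0 ** 2 + 4.0 * (x - x0) ** 2 / v ** 2)


def fit_velocity(xs, ts, x0):
    """Estimate (v, t0) by regressing t^2 on (x - x0)^2; the slope equals 4 / v^2."""
    us = [(x - x0) ** 2 for x in xs]
    ys = [t ** 2 for t in ts]
    n = len(us)
    mean_u = sum(us) / n
    mean_y = sum(ys) / n
    sxx = sum((u - mean_u) ** 2 for u in us)
    sxy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_u
    if slope <= 0.0 or intercept < 0.0:
        raise ValueError("picks are not consistent with a diffraction hyperbola")
    return 2.0 / math.sqrt(slope), math.sqrt(intercept)


# Synthetic picks: scatterer at x0 = 1.0 m, apex time 12 ns, velocity 0.115 m/ns
true_x0, true_t0, true_v = 1.0, 12.0, 0.115
xs = [0.5 + 0.05 * i for i in range(21)]   # one trace every 5 cm
ts = [hyperbola_time(x, true_x0, true_t0, true_v) for x in xs]
v_fit, t0_fit = fit_velocity(xs, ts, true_x0)
print(v_fit, t0_fit)
```

With real picks the apex position is also uncertain, so x0 would have to be picked carefully or estimated together with v and t0.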
Identifying point scatterers in the radargram depends on the resolution and
sharpness of the radargram, which depend on the soil constituents and the frequency
of the GPR antenna. The soil constituents that present conductive properties, such
as clays, soil water (Cezar et al., 2012), ions, and salts in high concentrations
(Correia et al., 2019), cause signal attenuation and, thus, a less sharp radargram.
Consequently, in more conductive soils with more water and clay, the radargrams
have lower resolution and higher noise than radargrams of more resistive (drier,
sandier) soils (Campos et al., 2019; Peng et al., 2020).
This work aimed to assess the feasibility of using iron rods as point scatterers to:
a) identify the depths of soil horizon transitions in 450 MHz GPR images; and b)
estimate the pulse velocity from the derived hyperbolas and convert the Y-axis of
the radargram from time to depth units.

2 Material and Methods

The work was carried out in the Seropédica municipality, Rio de Janeiro state,
southeastern Brazil, with central latitude 22°45' S and longitude 43°41' W. Five
trenches (2 m long by 1.5 m wide by 1.20–1.72 m deep) were opened at different
positions in the study area (Fig. 1), where soil profiles were described according
to Santos et al. (2015). The soils were classified as Planossolos (Planosols),
according to the Brazilian Soil Classification System (Santos et al., 2018), including
two profiles of PLANOSSOLO HÁPLICO Distrófico arênico (profiles P1 and
P5), two PLANOSSOLO HÁPLICO Distrófico gleissólico (P2 and P4), and one
PLANOSSOLO HÁPLICO Distrófico espessarênico (P3). Disturbed soil samples
were collected at each horizon and analyzed in the laboratory to measure physical
and chemical attributes, including particle size fractions, exchangeable bases (Ca,
Mg, K, and Na), and exchangeable acidity (H+Al), according to Teixeira et al.
(2017). The sum of exchangeable bases (SB = Na + K + Ca + Mg) and the cation
exchange capacity (CEC = SB + H + Al) were calculated from the measured values.

Fig. 1 Study area in Seropédica, Rio de Janeiro state, Brazil, showing the location of the five soil
profiles and the elevation (m) in the background
After the delineation of the horizons in the soil profiles, iron rods 80 cm long
and 0.8 cm in diameter were inserted at the transitions between soil horizons.
Then, the five soil profiles were imaged using the GPR
MALÅ GroundExplorer (Guideline Geo AB, Sundbyberg, Sweden) by taking one
measurement every 5 cm along transects close to the soil trenches. Radargrams
from the five profiles were acquired using a monostatic shielded 450 MHz antenna.
The spatial positions of the soil profiles were recorded in the field, as well as the
distances of the iron rods from the top and the side of the trenches.
The radargrams obtained in the field were pre-processed using the ReflexW
software (Sandmeier, 2009) by applying two procedures in sequence: static correction
and dewow. Using other filters available in the software degraded the quality of the
radargrams, making it more difficult to identify the underground features of interest,
i.e., the iron rod hyperbolas and soil horizon transitions.
After pre-processing, the hyperbolas from the buried iron rods were identified
in the radargrams, and velocity-estimation hyperbolas were fitted manually to them
in ReflexW. The GPR electromagnetic pulse velocities in the soil were estimated at
each hyperbola, followed by the conversion of the Y-axis of the radargram from
time (ns) to depth units (m) (Sandmeier, 2009). The accuracy of the estimated
pulse velocity and, consequently, of the conversion of the Y-axis to depth units,
was checked by comparing the depths of the iron rods (horizon transitions) measured
in the field (ground truth) with the estimated depths of the corresponding hyperbolas
in the radargrams after the Y-axis conversion.
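
The time-to-depth conversion and the comparison with the field depths can be sketched as follows, assuming a constant pulse velocity and the usual two-way travel time relation, depth = v t / 2. The function names and the apex times are hypothetical and serve only as an illustration; they are not readings from the radargrams of this study.

```python
def time_to_depth(two_way_time_ns, velocity_m_per_ns):
    """Convert a two-way travel time (ns) to depth (m), assuming constant velocity."""
    if two_way_time_ns < 0.0 or velocity_m_per_ns <= 0.0:
        raise ValueError("time must be non-negative and velocity positive")
    return velocity_m_per_ns * two_way_time_ns / 2.0


def depth_errors_cm(apex_times_ns, field_depths_cm, velocity_m_per_ns):
    """Estimated minus field depth (cm) for each iron rod."""
    estimated_cm = [100.0 * time_to_depth(t, velocity_m_per_ns) for t in apex_times_ns]
    return [est - obs for est, obs in zip(estimated_cm, field_depths_cm)]


# Hypothetical apex times (ns) for rods placed at 8, 22 and 69 cm in the field
apex_times_ns = [1.4, 3.8, 12.0]
field_depths_cm = [8.0, 22.0, 69.0]
errors = depth_errors_cm(apex_times_ns, field_depths_cm, 0.115)
print(errors)
```

Errors close to zero for all rods would indicate that a single velocity is adequate for the whole profile, whereas a trend with depth would suggest that the velocity changes between horizons.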

3 Results and Discussion

In the P1 soil profile (PLANOSSOLO HÁPLICO Distrófico arênico), the sandy
texture reaches down to 69 cm depth, where there is an increase of about 4.2 times in
clay in the Bt1 horizon in relation to the E horizon above (from 92 to 388 g kg−1),
associated with a decrease of about 270 g kg−1 in sand (Table 1; Fig. 2). The chemical
attributes SB, H + Al, and CEC also increased substantially in the Bt1 horizon
relative to the upper horizons.
The iron rod hyperbolas were clearly visible in the radargrams from P1 at the
first three horizon transitions at 8, 22, and 69 cm, respectively (Table 1; Fig. 2),
especially at the E horizon transitions at 22 and 69 cm, because this horizon has a
much lower k than the conductive iron rods. The top of the hyperbola indicates the
iron rod position, and its shape depends on the horizontal step and the pulse velocity
in the soil (Artagan & Borecky, 2016). Similar results were found by Juliano et al.
(2013), who observed hyperbolas in the radargrams caused by buried pipes in a sandy
soil. The hyperbolas were visible in the radargrams due to the great dielectric
contrast between materials in both cases.
The hyperbolas from the iron rods inserted in the subdivisions of the Bt horizon
(Bt1, Bt2, and Bt3) were not identified in P1 (Fig. 2). This may have been caused by
their similar physical characteristics, with higher clay contents, and similar chemical
characteristics. In addition, the GPR pulse gets weaker as the soil depth increases
due to the pulse interacting with the soil constituents, weakening the pulse scatter
(hyperbolas) from the iron rods. Since the buried objects were not visible in the
deeper layers, more robust methods would be needed to detect them (Sezgin et al.,
2004; Yurt et al., 2023).

Table 1 Physical and chemical attributes of the soil profiles

                                 Sand   Clay     H + Al   SB      CEC
Profile   Horizon   Depth (cm)   (g kg−1)        (cmolc dm−3)
P1        A         0–8          809    99       5.6      1.47    7.07
P1        AE        8–22         876    43       3.0      1.90    4.90
P1        E         22–69        831    92       1.5      1.16    2.66
P1        Bt1       69–92        564    388      4.7      2.50    7.20
P1        Bt2       92–133       441    443      5.6      2.10    7.70
P1        Bt3       133–155+     513    367      4.7      3.20    7.90
P2        A1        0–15         794    58       7.9      2.66    10.56
P2        A2        15–35        830    60       4.7      1.46    6.16
P2        Eg        35–57        866    54       3.7      1.35    5.05
P2        Btg1      57–80        530    370      12.7     4.33    17.03
P2        Btg2      80–110       563    311      9.9      6.22    16.12
P2        Btg3      110–123      559    278      9.9      8.69    18.59
P2        Cg        123–140+     613    265      8.6      11.60   20.20
P3        A         0–9          913    31       2.8      2.37    5.17
P3        E1        9–24         910    18       1.5      1.57    3.07
P3        E2        24–38        922    17       1.1      2.09    3.19
P3        E3        38–116       930    17       0.6      1.44    2.04
P3        E4        116–145      875    40       0.7      1.36    2.06
P3        E5        145–158      717    20       4.7      2.02    6.72
P3        Bt        158–172+     621    267      6.2      2.83    9.03
P4        A         0–13         762    50       0.2      4.28    12.68
P4        AE        13–32        884    49       0.4      2.42    5.76
P4        E         32–44        878    45       0.5      2.15    4.25
P4        Btg1      44–66        703    216      1.7      2.56    8.36
P4        Btg2      66–110       376    398      5.4      10.13   24.53
P4        Cg        110–120+     508    365      6.2      13.81   28.01
P5        A1        0–7          807    74       6.9      5.14    12.04
P5        A2        7–16         865    45       6.0      2.20    8.20
P5        AE        16–26        859    34       4.9      1.88    6.78
P5        E1        26–41        851    41       2.4      1.75    4.15
P5        E2        41–52        857    43       1.9      1.75    3.65
P5        E3        52–67        819    85       2.4      1.84    4.24
P5        BE        67–97        759    118      4.7      2.56    7.26
P5        Bt1       97–127       675    247      5.1      2.18    7.28
P5        Bt2       127–160+     693    261      4.3      2.37    6.67

H+Al Exchangeable acidity, SB Sum of exchangeable bases, CEC Cation exchange capacity
The “+” in H+Al means that the analysis represents the value of hydrogen plus aluminium
Fig. 2 On the left, picture of the P1 Planosol profile (PLANOSSOLO HÁPLICO Distrófico
arênico) with the arrows indicating the positions where the iron rods were inserted and the numbers
representing the horizon transitions. On the right, the portion of the radargram close to the profile.
The fitted hyperbolas are shown along with the corresponding estimated pulse velocity. The vertical
lines indicate the approximate side limits of the soil trench. The horizontal line indicates the E-to-
Bt1 horizon transition

The pulse velocity estimated from the iron rod hyperbolas in P1 was
0.1150 m ns−1 (Fig. 2). In the E-to-Bt1 horizon transition, at 69 cm, an
approximately horizontal feature appears in the radargram, corresponding to the
bottom of the E horizon (Fig. 2, marked with a horizontal line).
After time-to-depth conversion of the radargram Y-axis, the depth of the bottom
of the E horizon (third transition) accurately matched the corresponding depth of
69 cm observed in the field (Table 1). This shows the potential of using GPR to
map the depth to the argillic horizon, i.e., the Bt horizon right below the E horizon,
as described by Weimin et al. (2023). For instance, the horizontal feature marking
the top of the argillic (Bt) horizon in Fig. 2 can be extrapolated beyond the limits
of the soil trench, virtually along the entire transect covered by the radargram, if the
feature persists, and then used to trace the limits among soil types, expediting soil
survey.
In the P2 soil profile (PLANOSSOLO HÁPLICO Distrófico gleissólico), an
increase of about seven times in clay occurs from the E to the Btg1 horizon at
57 cm (from 54 to 370 g kg−1 ), with a corresponding decrease in the sand content
of 336 g kg−1 (Table 1). The chemical attributes also increase considerably from E
to Btg1.
The radargram from P2 shows two hyperbolas, marking the first and second
horizon transitions at 15 and 35 cm, respectively (Table 1; Fig. 3). The Eg-to-Btg1
transition also appeared in the radargram at 35 cm. Similarly to P1, there is no clear
evidence of the iron rods in the lower horizons, possibly due to the higher clay and
exchangeable base contents.
In the P2 radargram, the low clarity of the hyperbolas can be related to the higher
ion concentrations in the sandy horizons (compared to P1) and the presence of iron

Fig. 3 On the left, picture of the P2 Planosol profile (PLANOSSOLO HÁPLICO Distrófico
gleissólico) with the arrows indicating the positions where the iron rods were inserted and the
numbers representing the horizon transitions. On the right, the portion of the radargram close to the
profile. The fitted hyperbolas are shown along with the corresponding estimated pulse velocity. The
vertical lines indicate the approximate side limits of the soil trench. The horizontal line indicates
the Eg-to-Btg1 horizon transition

oxides in mottles (Bortolin & Malagutti Filho, 2012). Mottles have minerals of low
crystallinity (Kämpf et al., 2015) and more charge sites, leading to a higher k, which
decreases the difference in k between the iron rods and the surrounding soil matrix,
and thus, the sharpness of the hyperbolas (De Benedetto et al., 2010; Sagnard
& Tarel, 2016). This radargram had estimated pulse velocities of 0.1300 m ns−1
(Fig. 3).
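
The link between pulse velocity and k can be made explicit with a short sketch. It assumes that k denotes the relative dielectric permittivity and that the soil behaves as a low-loss, non-magnetic medium, in which case v = c / √k. The function name is illustrative.

```python
C_M_PER_NS = 0.299792458  # speed of light in vacuum, m/ns

def permittivity_from_velocity(velocity_m_per_ns):
    """Relative permittivity implied by a pulse velocity, assuming v = c / sqrt(k)."""
    return (C_M_PER_NS / velocity_m_per_ns) ** 2

# Velocities quoted in the text for P1 and P2.
k_p1 = permittivity_from_velocity(0.1150)
k_p2 = permittivity_from_velocity(0.1300)
print(round(k_p1, 2), round(k_p2, 2))
```

Under this assumption a lower velocity corresponds to a higher k.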
The P3 soil profile (PLANOSSOLO HÁPLICO Distrófico espessarênico) was
described down to 1.70 m depth (Table 1; Fig. 4). This soil shows a marked
increase in clay content of about 13 times in the Bt horizon compared to the E5
horizon, at 158 cm (from 20 to 267 g kg−1). The chemical attributes increased mainly
in E5 and Bt in relation to the upper horizons.
In the P3 radargram, the iron rod hyperbolas could be identified only in the
first four horizon transitions at 9, 24, 38, and 116 cm, respectively (Table 1;
Fig. 4). This suggests that the maximum depth achieved with the 450 MHz antenna
lies between 116 (last visible hyperbola) and 145 cm (next transition, where the
hyperbola is not visible), even in conditions of sandy textured horizons compared
to the other profiles. The estimated pulse velocities were 0.1300, 0.1140, 0.1140,
and 0.1180 m ns−1. The values differ because the pulse velocity changes along the soil
profile as the electromagnetic pulse interacts with the soil constituents (Forte et al.,
2014). Considering the horizontal and vertical variations within the soil, estimating
the pulse velocity may be a complex task, and often results in a range of different
estimated velocities for the same soil, as was the case for Artagan and Borecky
(2016) and Qiao et al. (2015).
For the P3 site, the first pulse velocity was chosen to create the depth model,
which led to errors at the depth transitions where the pulse velocities were different

Fig. 4 On the left, picture of the P3 Planosol profile (PLANOSSOLO HÁPLICO Distrófico
espessarênico) with the arrows indicating the positions where the iron rods were inserted and the
numbers representing the horizon transitions. On the right, the portion of the radargram close to
the profile. The fitted hyperbolas are shown along with the corresponding estimated pulse velocity.
The vertical lines indicate the approximate side limits of the soil trench

from the chosen one, agreeing with Artagan and Borecky (2016). Nonetheless, most
horizon transitions were placed at the correct depths in most soil profiles,
which indicates that the time-to-depth conversions were adequate overall. This was
made possible by proper identification and fitting of the iron rod hyperbolas in the
radargrams.
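
The effect of building the depth model from a single velocity can be illustrated with the P3 values quoted above. The sketch assumes, for illustration only, that the velocity fitted at each rod represents the average velocity between the surface and that rod. The printed offsets are arithmetic consequences of that assumption, not results reported in this chapter.

```python
MODEL_VELOCITY = 0.1300  # m/ns, first P3 velocity, used for the depth model

# (field depth in m, velocity fitted at that rod in m/ns) for the four visible rods
picks = [(0.09, 0.1300), (0.24, 0.1140), (0.38, 0.1140), (1.16, 0.1180)]

def modelled_depth(field_depth_m, local_velocity, model_velocity):
    """Depth assigned by the model to a rod buried at field_depth_m."""
    two_way_time_ns = 2.0 * field_depth_m / local_velocity
    return model_velocity * two_way_time_ns / 2.0

errors_cm = []
for depth_m, v_local in picks:
    d_model = modelled_depth(depth_m, v_local, MODEL_VELOCITY)
    errors_cm.append((d_model - depth_m) * 100.0)
    print(f"field {depth_m * 100:.0f} cm -> model {d_model * 100:.1f} cm")
```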
In the P4 soil profile (PLANOSSOLO HÁPLICO Distrófico gleissólico), an
increase of five times in clay occurs from the E to the Btg1 horizon at 44 cm (from
45 to 216 g kg−1 ), with a corresponding decrease in the sand content of 175 g kg−1
(Table 1; Fig. 5). The exchangeable bases increase mainly in the Btg2 horizon.
The P4 radargram captured the first, third, and fourth iron rods (with velocities of
0.1050, 0.1050, and 0.1000 m ns−1), including the iron rod that marks the E-to-Btg1
horizon transition at 44 cm (third iron rod). This is the transition where the greatest
difference in k values among adjacent horizons is observed in the profile (Table 1;
Fig. 5). The time-to-depth conversion model used a pulse velocity of 0.1050 m ns−1,
which allowed the radargram Y-axis to be accurately converted from time to depth units,
positioning the E-to-Btg1 horizon transition at approximately 44 cm (Fig. 5).
In the P5 soil profile (PLANOSSOLO HÁPLICO Distrófico arênico), unlike in
the previous Planosols, there is no significant change in particle size fractions
from the E to the Bt horizon at 67 cm (from 819 to 759 g kg−1 in the sand content,
and from 85 to 118 g kg−1 in the clay content) (Table 1; Fig. 6). The chemical
attributes are higher in the BE, Bt1, and Bt2 horizons.
The iron rod hyperbolas are visible at the upper four horizon transitions down
to 41 cm. The estimated pulse velocities at these transitions were 0.1100, 0.1100,
0.0950, and 0.1060 m ns−1 , respectively (Fig. 6). The sandy horizons in P5 have
mottles, which should reduce the sharpness of the hyperbolas, like in P2 and P4,
but this was not the case. In addition, despite the physical similarities in sandy

Fig. 5 On the left, picture of the P4 Planosol profile (PLANOSSOLO HÁPLICO Distrófico
gleissólico) with the arrows indicating the positions where the iron rods were inserted and the
numbers representing the horizon transitions. On the right, the portion of the radargram close to the
profile. The fitted hyperbolas are shown along with the corresponding estimated pulse velocity. The
vertical lines indicate the approximate side limits of the soil trench. The horizontal line indicates
the E-to-Btg1 horizon transition

Fig. 6 On the left, picture of the P5 Planosol profile (PLANOSSOLO HÁPLICO Distrófico
arênico) with the arrows indicating the positions where the iron rods were inserted and the numbers
representing the horizon transitions. On the right, the portion of the radargram close to the profile.
The fitted hyperbolas are shown along with the corresponding estimated pulse velocity. The vertical
lines indicate the approximate side limits of the soil trench; the horizontal line indicates the E3-to-
BE horizon transition

horizons among all soil profiles, most of the sandy horizons in P5 have higher
chemical attribute values in relation to the sandy and some clayey horizons in the
other profiles.
The hyperbola detection and clarity in the radargram depend on the material, size,
shape, depth and position angle of the buried objects (Alshamy et al., 2021). However, the
buried rods had the same material, diameter and insertion angle in all soil profiles.
Thus, it is not clear why most hyperbolas were visible in P5 but not in the other
profiles. Nevertheless, the 450 MHz antenna provided at least one clear hyperbola
in all radargrams that could be used to accurately convert their Y-axis from time to
depth units.

4 Conclusions

1. The use of iron rods to mark soil horizon transitions is recommended and works
best in sandy horizons. In more clayey horizons, and in deeper transitions, the
iron rod hyperbolas are not clear or not visible in the radargram;
2. The sharpness of the iron rod hyperbolas in radargrams depends on the depth
and sand content of the soil horizons. However, the presence of mottles and the
chemical properties of the horizons do not have a clear role in the visibility or
sharpness of the hyperbolas in the radargrams;
3. Buried iron rods generate hyperbola features in the radargram that are used for
estimating the GPR pulse velocity for converting the radargram Y-axis from
time to depth units, which allows the soil horizon transitions to be properly
identified and visualized along the depth.

Acknowledgments To the Laboratory of Water and Soil in Agroecosystems (LASA) at the
Department of Agrotechnologies and Sustainability of the Federal Rural University of Rio de
Janeiro (UFRRJ) for their technical and logistic support. To Embrapa Agrobiology, UFRRJ, and
the Agricultural Research Corporation of the State of Rio de Janeiro (Pesagro) for providing
the study area for this work. Funding was provided by the Brazilian Coordination for the
Improvement of Higher Education Personnel (Coordenação de Aperfeiçoamento de Pessoal de
Nível Superior; CAPES), and by the Brazilian Agricultural Research Corporation (Embrapa) grants
05.13.15.003.00.00 and 03.12.10.002.00.00.

References

Alshamy, H. M., Abdul Sadah, J. W., Saeed, T. R., Mohammed, S. A., Hatem, G. M., & Gatan, A.
H. (2021). Evaluation of GPR detection for buried objects material with different depths and
scanning angles. IOP Conference Series: Materials Science and Engineering, 1090, 012042.
Annan, A. P. (2009). Electromagnetic principles of ground penetrating radar. In H. M. Jol (Ed.),
Ground penetrating radar: Theory and applications (1st ed., pp. 3–40). Elsevier.
Artagan, S. S., & Borecky, V. (2016). Estimation methods for obtaining GPR signal velocity.
International Journal of Structural and Civil Engineering, 3, 59–63.
Bortolin, J. R. M., & Malagutti Filho, W. (2012). Monitoramento temporal da pluma de
contaminação no aterro de resíduos urbanos de Rio Claro (SP) por meio do método geofísico
da eletrorresistividade. Geologia USP. Série Científica, 12, 99–113.
Campos, J. R. R., Vidal-Torrado, P., & Modolo, A. (2019). Use of ground penetrating radar to study
spatial variability and soil stratigraphy. Engenharia Agrícola, 39, 358–364.
Cezar, E., Nanni, M. R., Chicati, M. L., & Oliveira, R. B. (2012). Emprego de GPR no estudo de
solos e sua correlação com métodos laboratoriais. Semina Ciencias Agrarias, 33, 979–988.
Correia, K. A., Silva, M. W. C., Mendes, A. C., Miranda, A. G. O., Luczynsky, E., & Cunha,
I. R. V. (2019). A utilização do Ground Penetrating Radar (GPR) na definição de penetração
de cunha salina e no monitoramento do nível freático em praia estuarina amazônica. Águas
Subterrâneas, 33, 87–101.
De Benedetto, D., Castrignanò, A., Sollitto, D., & Modugno, F. (2010). Spatial relationship
between clay content and geophysical data. Clay Minerals, 45, 197–207.
Forte, E., Dossi, M., Pipan, M., & Colucci, R. R. (2014). Velocity analysis from common offset
GPR data inversion: Theory and application to synthetic and real data. Geophysical Journal
International, 197, 1471–1483.
Garcia-Fernandez, M., Morgenthaler, A., Alvarez-Lopez, Y., Las Heras, F., & Rappaport, C.
(2019). Bistatic landmine and IED detection combining vehicle and drone mounted GPR
sensors. Remote Sensing, 11, 1–14.
Iqbal, I., Bin, X., Tian, G., Wang, H., Sanxi, P., Yang, Y., Masood, Z., & Hanwu, S. (2021).
Near surface velocity estimation using GPR data: Investigations by numerical simulation, and
experimental approach with AVO response. Remote Sensing, 13, 1–24.
Jacob, R. W., & Urban, T. M. (2015). Ground-penetrating radar velocity determination and
precision estimates using common-mid-point (CMP) collection with hand-picking, semblance
analysis, and cross-correlation analysis: A case study and tutorial for archaeologists. Faculty
Journal Articles, 1, 1–18.
Juliano, T., Meegoda, J., & Watts, D. (2013). Acoustic emission leak detection on a metal pipeline
buried in sandy soil. Journal of Pipeline Systems Engineering and Practice, 4, 149–155.
Kämpf, N., Marques, J. J., & Curi, N. (2015). Mineralogia dos solos brasileiros. In J. C. Ker,
N. Curi, C. E. G. R. Schaefer, & P. V. Torrado (Eds.), Pedologia: Fundamentos (1st ed., pp.
81–146). Sociedade Brasileira de Ciência do Solo.
Peng, G., Ruiyan, W., Gengxing, Z., & Yuhuan, L. (2020). The application of GPR to the detection
of soil wetted bodies formed by drip irrigation. PLoS One, 15, 1–15.
Qiao, L., Qin, Y., Ren, X., & Wang, Q. (2015). Identification of buried objects in GPR
using amplitude modulated signals extracted from multiresolution monogenic signal analysis.
Sensors, 15, 30340–30350.
Quarto, R., Schiavone, D., & Diaferia, I. (2007). Ground penetrating radar of a prehistoric site in
southern Italy. Journal of Archaeological Science, 34, 2071–2080.
Sagnard, F., & Tarel, J. P. (2016). Template-matching based detection of hyperbolas in ground-
penetrating radargrams for buried utilities. Geophysical Engineering, 13, 491–504.
Sandmeier, K. J. (2009). ReflexW Version 8.5: program for processing of seismic, acoustic or
electromagnetic reflection, refraction and transmission data. Sandmeier.
Santos, R. D., Santos, H. G., Ker, J. C., Anjos, L. H. C., & Shimizu, S. H. (2015). Manual de
descrição e coletas de solos no campo. Sociedade Brasileira de Ciência do Solo.
Santos, H. G., Jacomine, P. K. T., Anjos, L. H. C., Oliveira, V. A., Lumbreras, J. F., Coelho, M. R.,
Almeida, J. A., Cunha, T. J. F., & Oliveira, J. B. (2018). Sistema Brasileiro de Classificação de
Solos. Embrapa.
Sezgin, M., Kurugöllü, F., Taşdelen, I., & Öztürk, S. (2004). Real time detection of buried objects
by using GPR. Proceedings of SPIE, 5415, 447–455.
Teixeira, P. C., Donagemma, G. K., Fontana, A., & Teixeira, W. G. (2017). Manual de métodos de
análise de solo. Embrapa.
Weimin, R., Baojiang, L., Huanjun, L., Hang, D., & Yueyu, S. (2023). Ground penetrating radar
(GPR) identification method for agricultural soil stratification in a typical Mollisols area of
Northeast China. Chinese Geographical Science, 33, 664–678.
Yurt, R., Torpi, H., Kizilay, A., Koziel, S., Pietrenko-Dabrowska, A., & Mahouti, P. (2023). Buried
object characterization by data-driven surrogates and regression-enabled hyperbolic signature
extraction. Scientific Reports, 13, 5717.
Random Forest-Based Fusion
of Proximal and Orbital Remote Sensor
Data for Soil Salinity Mapping
in a Brazilian Semi-arid Region

Silvio R. L. Tavares, Gustavo M. Vasques, Ronaldo P. Oliveira,
Marlon M. Dantas, and Hugo M. Rodrigues

1 Introduction

Salinization and erosion are the two main causes of soil degradation
worldwide. It is estimated that approximately 7.0% of the entire Earth’s surface
is affected by salts, either due to natural processes intrinsic to the soil itself or
because of poor soil-water-plant management in areas affected by problems of salt
accumulation in the soil (Medeiros et al., 2016). Soil salinity is a problem that affects
about 45 million (19.5%) of the 230 million hectares under irrigated cropping in the
globe (Metternicht & Zinck, 2003). The excess of salts severely limits agricultural
production, mainly in arid and semi-arid regions, where about 25% of the irrigated
areas is affected by salts (FAO, 2011). In Brazil, it is estimated that approximately
nine million hectares of soil are affected by the presence of salts (FAO, 2011). Saline
soils occur mainly in Rio Grande do Sul, in the Pantanal region of Mato Grosso and,
predominantly, in the semi-arid region of the Northeast (Ribeiro et al., 2003).
According to Barros et al. (2004), salinity in irrigated areas in the Northeast
region of Brazil is a consequence of poor internal soil drainage together with
evaporation greater than precipitation, where excessive evaporation produces the
accumulation of soluble salts and the increase of exchangeable sodium on the
surface of the soil. However, the salinization process can be avoided or slowed

S. R. L. Tavares (✉) · G. M. Vasques · R. P. Oliveira


Embrapa Solos, Rio de Janeiro, Brazil
M. Dantas
Instituto Federal do Rio Grande do Norte, Ipanguaçú, Rio Grande do Norte, Brazil
H. M. Rodrigues
Universidade Federal Rural do Rio de Janeiro, Rio de Janeiro, Brazil

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 197
W. de Carvalho Junior et al. (eds.), Pedometrics in Brazil, Progress in Soil Science,
https://doi.org/10.1007/978-3-031-64579-2_14

down if rainfall is concentrated in sufficient quantities, associated with good soil
permeability and/or an efficient drainage system, thus promoting a natural washing
of the profile (Medeiros, 1998).
In areas of high temperature, soils become salt-affected through evaporation of
water from the soil, transpiration from plants and the transport of salts, which are
deposited on the surface. Some agricultural techniques have been proposed to mitigate
these effects, such as the use of water with lower electrical conductivity and the
installation of drainage systems to remove salts. However, diagnosing and
monitoring salinization after mitigation measures are applied is challenging.
Identifying soil salinization requires soil samples to be collected and analyzed
by conventional laboratory methods that use saturated soil paste, which is laborious
and expensive and hinders the coverage of large areas.
Proximal sensors have been proposed as an alternative to monitor areas where
salinization processes occur. Apparent electrical conductivity and apparent mag-
netic susceptibility data have been reported as potential attributes that allow
identifying and mapping the advance or retreat of salinization, since these sensor
attributes are closely related to clay content, moisture, cation exchange capacity
and pH (Lopes & Montenegro, 2019). As an aid to mapping the occurrence of
salinization in irrigated fields, it is also possible to combine data from proximal
sensors with other data sources, such as radar data obtained by satellites (Huang et
al., 2016) that are freely available.
Electrical conductivity measured by electromagnetic induction devices, together
with remote sensing methods, has been promoted as a tool to estimate soil salinity
at different scales, to diagnose and monitor irrigated saline areas in arid and semi-
arid regions (Ding & Yu, 2014; Taghizadeh-Mehrjardi et al., 2014; Yao et al., 2015).
The objective of this work was to map the occurrence of soil salinization by
predicting salinization levels, derived from electrical conductivity
measured in the laboratory, as a function of data from proximal and orbital remote
sensors using a random forest model.

2 Soil Salinization

Soil salinization can originate from natural processes (primary salinization)
or from human activities (secondary salinization). Before the practice of irrigated
agriculture, the accumulation of salts in the soil profile was essentially the result
of natural processes such as floods, poor natural drainage and evaporation of saline
groundwater. Currently, vast areas are affected by salts due to anthropic actions,
such as irrigation without a drainage system, application of insufficient replacement
irrigation water, use of saline water or a combination of these factors.
The soluble salts normally present in the soil solution of arid and semi-arid
regions include sodium (Na+), calcium (Ca2+), magnesium (Mg2+), potassium (K+),
chloride (Cl−), sulfate ([SO4]2−), bicarbonate ([HCO3]−), carbonate

Table 1 Classification of salt-affected soils

                                 Soil property (a)
Classes (Richards, 1954)         EC (dS m−1)    ESP (%)    pH
Saline soils                     >4             <15        <8.5
Alkaline soils                   <4             >15        >8.5
Saline-alkaline soils            >4             >4         >8.5
Classes (Santos et al., 2018)
Not saline soils                 <4
Saline soils                     >4
Salic soils                      >7
(a) EC electrical conductivity, ESP exchangeable sodium percent

([CO3]2−), borate ([BO3]3−) and nitrate ([NO3]−). Soils affected by salts can be
classified as: saline, sodic and saline-sodic.
A gradual increase in salt accumulation (salinization) in irrigated agricultural
soils can be influenced by several factors, among which:
• Climate: The climatic conditions of the area, such as temperature, evaporation
and precipitation, can affect the salinization rate. The higher the temperatures
and evaporation and the lower the precipitation, the higher the salinization.
• Soil type and landscape position: Some soils are more prone to salinization than
others due to their permeability and ability to retain salts. The position of the soil
in the landscape is another important factor, as soils in lower positions tend
to accumulate more salts.
• Irrigation water: The chemical quality of the irrigation water affects soil salinity.
Irrigation waters with higher salt contents lead to higher salinization.
• Crop type and management: The type of crop grown and its management
practices can affect soil salinization. For example, salt-tolerant crops can help
reduce soil salinity levels, while intensive cultivation can lead to increased soil
salinity.
• Land use: Changes in land use, such as converting land from natural vegetation
to agriculture, can also increase soil salinization.
According to Richards (1954), saline, alkaline and saline-alkaline soils are
classified according to their electrical conductivity, exchangeable sodium
percentage and pH (Table 1). The Brazilian Soil Classification
System (Santos et al., 2018) proposes an extra class of salic soils with electrical
conductivity greater than 7 dS m−1 .
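
The salinity classes of the Brazilian system listed in Table 1 can be expressed as a simple rule, sketched below. The function name is illustrative, and the handling of values exactly equal to 4 or 7 dS m−1 is an assumption, since the table only gives strict inequalities.

```python
def salinity_class(ec_ds_per_m):
    """Salinity class from electrical conductivity (dS/m), after Table 1.

    Assumption: values exactly equal to 4 or 7 dS/m are assigned to the
    'Saline' class, because the source thresholds do not cover the boundaries.
    """
    if ec_ds_per_m < 4.0:
        return "Not saline"
    if ec_ds_per_m > 7.0:
        return "Salic"
    return "Saline"

# Hypothetical readings, for illustration only.
for ec in (1.2, 5.5, 9.8):
    print(ec, salinity_class(ec))
```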

3 Monitoring Soil Salinity

There are several laboratory methods for determining soil salinity. In general,
the choice of method will depend on the availability of equipment and financial
resources, as well as the desired precision of the analysis. The most common
methods used in the laboratory are:
• Electrical conductivity in the soil saturation paste: The electrical conductivity of
the soil extract can be measured using a conductivity meter. This is a simple and
widely used technique for assessing soil salinity worldwide.
• Silver chloride titration: This method determines the concentration of chloride
ions in the soil. It is more accurate than measuring electrical conductivity but
requires much more time to perform the analysis.
• Refractometry: Refractometry is a method used to measure the concentration of
soluble salts in a soil extract. It is based on measuring the refractive index of a
solution.
• Chemical analysis for ion determination: Chemical analysis can be used to
measure the concentration of specific ions that cause soil salinization, such as
sodium, calcium, magnesium, potassium, and chloride. This can be done using
atomic absorption spectroscopy, ion chromatography or acid-base titration.
In addition to collecting soil samples in the field and subsequently determining
the salts and/or salinity of the soil, some instruments indirectly determine the soil
salinity in situ quickly and efficiently. There are several field devices that can be
used to indirectly determine or to infer soil salinity, including:
• Portable soil conductivity meters: These devices measure the electrical conductivity
of the soil, which is directly related to soil salinity.
• Portable pH meters: Measuring soil pH can provide useful information about soil
acidity or alkalinity, which can affect nutrient availability and soil salinity.
• Soil test kits: These kits typically include chemical reagents to test the soil for
salts and often include a correspondence table to determine the approximate
concentration of salts in the soil.
Besides field sensors, remote sensors can be used to infer soil salinity, among
which is the C-band radar from the Sentinel-1 satellite. C-band radar is particularly
useful for detecting and mapping soil moisture and salinity because it can penetrate
vegetation and upper soil layers. The Sentinel-1 satellite is equipped with a C-band
Synthetic Aperture Radar (SAR) sensor that provides high-resolution images over
large areas.
To map salt-affected irrigated agricultural soils using remote sensor (for instance,
Sentinel-1) data, researchers typically use machine learning algorithms to predict
soil salinity from remote sensor data and identify areas with high soil salinity. This
approach has been successfully used in several studies and has the advantage of
being economical and efficient, as large areas can be surveyed quickly (Mohammed
et al., 2022). However, it is important to note that interpretation of remote sensing
data requires careful consideration of environmental and management factors that
influence soil salinity, as well as validation with actual field data.

4 Materials and Methods

Mapping salt-affected irrigated agricultural soils using data from the proximal
sensor EM38-MK2 and the Sentinel-1 satellite C-band included the following
steps:
1. Data collection: Collect field data along transects covering the study area using
the EM38-MK2 conductivity meter and download Sentinel-1 satellite C-band imagery for
the study area.
2. Data pre-processing: Remove outliers, fill in missing data, and transform the data
into a format suitable for analysis.
3. Data fusion: Combine EM38-MK2 proximal sensor data and Sentinel-1 remote
sensor data into a single dataset. This was done by interpolating the EM38-MK2
variables across the study area, followed by merging the resulting EM38-MK2
rasters with the Sentinel-1 C-band raster.
4. Feature extraction: Extract soil electrical conductivity and magnetic susceptibil-
ity data from EM38-MK2, and radar data from Sentinel-1 C-band to the 35 soil
sampling points by spatial overlay.
5. Model selection: Choose an appropriate machine learning model for the data and
problem at hand. In this paper, random forest (Breiman, 2001) was used.
6. Model training: Train the random forest model using the extracted features as
covariates and corresponding soil data, that is, the measured soil salinity levels,
as target variable.
7. Model evaluation: Evaluate the performance of the trained model using appropri-
ate evaluation metrics, including the overall accuracy and Kappa index (Cohen,
1960).
8. Model deployment: Apply the trained model across the study area to generate a
soil salinity level distribution map.
9. Validation: Validate soil salinity level predictions by comparing them to the
actual soil salinity level data measured in the field.
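
Steps 3 and 4 (data fusion and feature extraction by spatial overlay) can be sketched as a lookup of raster cell values at the sampling points. The sketch assumes that all rasters share the same grid, with the origin at the upper-left corner and square cells. The function names, raster names and values are hypothetical and do not reproduce the software workflow used in this study.

```python
def sample_raster(grid, x_min, y_max, cell_size, x, y):
    """Value of the raster cell that contains point (x, y).

    grid is a list of rows, with row 0 at the top (northern) edge.
    """
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        raise ValueError("point outside raster extent")
    return grid[row][col]

def extract_covariates(rasters, x_min, y_max, cell_size, points):
    """Overlay sampling points on co-registered rasters; one dict per point."""
    return [
        {name: sample_raster(grid, x_min, y_max, cell_size, x, y)
         for name, grid in rasters.items()}
        for x, y in points
    ]

# Hypothetical 2 x 2 rasters with 10 m cells, upper-left corner at (0, 20).
rasters = {
    "aEC_1m": [[10.0, 20.0], [30.0, 40.0]],
    "VV": [[-12.0, -11.0], [-10.0, -9.0]],
}
points = [(5.0, 15.0), (15.0, 5.0)]
table = extract_covariates(rasters, 0.0, 20.0, 10.0, points)
print(table)
```

Each resulting row holds the covariate values for one sampling point and can be paired with the measured salinity level for model training.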

5 Study Area

The study was conducted on an 11-ha smallholder farm at the Baixo Açu Irrigated
Perimeter in the municipality of Alto do Rodrigues, Rio Grande do Norte state,
Northeast Brazil (Fig. 1). The climate type is Aw, according to Köppen-Geiger, with
an average annual rainfall of 400 mm and average annual maximum temperature of
34 °C. Due to its semi-arid conditions, the area is subject to natural soil salinization
processes that have been intensified by irrigated cropping.
202 S. R. L. Tavares et al.

Fig. 1 (a) Location of the study area, EM38-MK2 survey path, and laboratory electrical conductivity
(EClab) sampling points; (b) Picture of the study area showing salt-affected soils. (Source:
Gustavo M Vasques)

6 Proximal and Remote Sensor Data

The EM38-MK2 (Geonics Limited, Mississauga, Canada) was used for continuous
apparent electrical conductivity (aEC) and apparent magnetic susceptibility (aMS)
readings. Readings were taken along parallel transects covering the study area in
“1 m” (aEC and aMS 1 m) and “0.5 m” (aEC and aMS 0.5 m) coil separation modes
and in vertical dipole orientation, totaling 5168 readings (Fig. 1, black dots). These
four variables were interpolated using variogram analysis followed by ordinary
kriging with 10 m resolution using 70% of the dataset (3617 points), and validated
using 30% of the dataset (1551 points).
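
The 70/30 partition of the EM38-MK2 readings can be reproduced with a simple random split, sketched below. The seed and function name are assumptions, and the source does not state how the readings were assigned to each subset.

```python
import random

def split_readings(readings, train_fraction=0.7, seed=42):
    """Randomly split readings into interpolation and validation subsets."""
    indices = list(range(len(readings)))
    random.Random(seed).shuffle(indices)
    n_train = int(train_fraction * len(readings))
    train = [readings[i] for i in indices[:n_train]]
    valid = [readings[i] for i in indices[n_train:]]
    return train, valid

# Placeholder identifiers standing in for the 5168 sensor readings.
readings = list(range(5168))
train, valid = split_readings(readings)
print(len(train), len(valid))
```

The subset sizes match the 3617 and 1551 points reported above.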
Sentinel-1 satellite imagery for the study area was downloaded from NASA's
Earth Observing System Data and Information System (https://search.earthdata.nasa.gov)
website as a Level-1 High Resolution Interferometric Wide Swath Ground
Range Detected product, with 10 m spatial resolution and vertical-vertical (VV) and
vertical-horizontal (VH) polarizations.
For measuring field soil salinity in the laboratory, soil core samples were
collected on a uniform grid of 35 points (50 × 50 m) (Fig. 1, red triangles), at 0–10,
10–30 and 30–50 cm depths. Soil samples were analyzed for pH, and electrical
conductivity (EClab) by the saturated paste method, according to Teixeira et al.
(2017). Soil pH was interpolated with 10 m resolution across the study area by
inverse distance weighting, and the maps were used as covariates for soil salinity
level mapping. The soil salinity values (EClab) for the three depths were classified
according to Santos et al. (2018) in the following salinity level classes: Not saline—
EClab <4 dS m−1 ; Saline—4 dS m−1 < EClab <7 dS m−1 ; and Salic—EClab >7 dS
m−1 .
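
Inverse distance weighting, used above for the pH maps, can be sketched as follows. The power of 2 is an assumption, since the exponent used in this study is not stated, and the sample coordinates and pH values are hypothetical.

```python
import math

def idw(x, y, samples, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    samples is a list of (x, y, value) tuples; power = 2 is an assumption.
    """
    numerator = 0.0
    denominator = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value  # the estimate honours the measurement at a sample location
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator

# Hypothetical pH values on a 50 m grid.
ph_samples = [(0.0, 0.0, 7.2), (50.0, 0.0, 8.0), (0.0, 50.0, 7.6), (50.0, 50.0, 8.4)]
ph_centre = idw(25.0, 25.0, ph_samples)
print(round(ph_centre, 2))
```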
The salinity level derived from EClab was modeled by random forest (Breiman,
2001) and mapped at 10 m resolution, at each depth, as a function of the four EM38-MK2
proximal sensor variables (aEC 1 m, aEC 0.5 m, aMS 1 m and aMS 0.5 m),
the two Sentinel-1 C-band polarizations (VV and VH) and three pH maps (at 0–10, 10–30 and
30–50 cm). Leave-one-out cross-validation was implemented using the caret package
(Kuhn, 2022) in the R software (R Core Team, 2021). The overall accuracy and the
Kappa index (Cohen, 1960) were used to evaluate the classification results.
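
The evaluation scheme can be illustrated with a minimal sketch of leave-one-out cross-validation, overall accuracy and Cohen's Kappa. A nearest-neighbour rule is used as a stand-in classifier only to keep the example self-contained. It is not the random forest model fitted with caret in this study, and the toy covariates and labels are hypothetical.

```python
from collections import Counter

def overall_accuracy(observed, predicted):
    """Fraction of samples whose predicted class equals the observed class."""
    hits = sum(o == p for o, p in zip(observed, predicted))
    return hits / len(observed)

def cohen_kappa(observed, predicted):
    """Cohen's Kappa: agreement corrected for chance agreement."""
    n = len(observed)
    p_o = overall_accuracy(observed, predicted)
    obs_counts = Counter(observed)
    pred_counts = Counter(predicted)
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined when only one class occurs
    return (p_o - p_e) / (1.0 - p_e)

def nearest_neighbour(train_x, train_y, query):
    """Stand-in classifier (NOT random forest): label of the closest sample."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return train_y[best]

def leave_one_out(features, labels, predict):
    """Predict each sample with a model built from all the other samples."""
    predictions = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        predictions.append(predict(train_x, train_y, features[i]))
    return predictions

# Hypothetical covariates (aEC, pH) and salinity levels.
features = [(20.0, 7.1), (25.0, 7.3), (60.0, 7.9),
            (65.0, 8.0), (120.0, 8.6), (130.0, 8.8)]
labels = ["Not saline", "Not saline", "Saline", "Saline", "Salic", "Salic"]
predicted = leave_one_out(features, labels, nearest_neighbour)
print(overall_accuracy(labels, predicted), cohen_kappa(labels, predicted))
```

With only 35 sampling points, leaving out one sample at a time keeps nearly all data available for training in every fold.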

7 Results and Discussion

The aEC and aMS maps showed higher values in the central and central-west
portions of the area (Fig. 2a, b, c and d), and had interpolation validation errors
lower than 80 mS m−1 and 0.2‰ (parts per thousand), respectively. Higher aMS
values also appear in the southwest (Fig. 2c and d). The radar C-band maps do not
show a clear pattern in the area (Fig. 2e and f). The pH maps show high values (>7)
for the entire study area, except for pH at 10–30 and 30–50 cm that had lower values
in the southwest due to a couple of low-pH measurements (Fig. 2g, h and i). High
pH values agree with the presence of salts in the soil, especially Na and Ca in this
region. Overall, the pH increases at all depths from the southwest to the northeast.
In the predicted soil salinity level maps at the three depths (Fig. 3), the not saline
and saline soils occur in the center and northeast parts of the area, while salic soils
occur in the northwest and south-southwest parts. In the center, higher aEC and aMS
values are found (Fig. 2a, b, c and d), whereas in the northeast higher pH values
occur (Fig. 2g, h and i). The central-northeast portion of the area has the highest
elevation, which decreases in both the northwest and south-southwest directions.
As such, the irrigation water that is applied in the upper terrain (central-northeast)
carries the salts present in the soil towards the lower terrain (northwest and
south-southwest), explaining the spatial trends of the not saline to salic soils in the
area.

Fig. 2 EM38-MK2 derived apparent electrical conductivity (aEC) with 1 m (a) and 0.5 m (b) coil
separation, and apparent magnetic susceptibility (aMS) with 1 m (c) and 0.5 m (d) coil separation;
Sentinel-1 Level-1 High Resolution Interferometric Wide Swath Ground Range Detected image
with vertical-vertical (VV) (e) and vertical-horizontal (VH) (f) polarization; Interpolated soil pH
maps at 0–10 (g), 10–30 (h) and 30–50 cm (i)

The overall accuracies of the random forest model predictions at the three layers,
derived from leave-one-out cross-validation, were around 0.7, while Kappa values
were close to 0.5 (Table 2). The pH covariates for all depths were the most important
covariates in all random forest models, while aMS was the second most important
for the 10–30 and 30–50 cm layers.
In the Baixo Açu Irrigated Perimeter, the same region as the present study, Barreto et
al. (2023) identified the best spectral indices to predict soil salinity in a 1500 ha area.
They compared Landsat 8 Operational Land Imager (OLI) (30 m spatial resolution)
and Sentinel-2 MultiSpectral Instrument (10 m resolution) images, and found that
the derived Salinity Index 1, when applied in areas with Normalized Difference
Vegetation Index <0.33 (bare soils), had the highest correlation (0.80) with soil
salinity. The soils of this region have a saline horizon at the surface and a saline-
sodic horizon at the subsurface, and the areas with most salt-affected soils are the
concave ones with deficient natural drainage (Justo et al., 2021).
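The bare-soil criterion mentioned above (Normalized Difference Vegetation Index below 0.33) can be sketched as a simple pixel mask. The reflectance values are hypothetical, the function names are ours, and the salinity index itself is not reproduced here because its formula is not given in this chapter.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for a single pixel."""
    total = nir + red
    # Guard against division by zero for pixels with no signal.
    return (nir - red) / total if total else 0.0


def bare_soil_mask(nir_band, red_band, threshold=0.33):
    """Return True for pixels whose NDVI is below the bare-soil threshold."""
    return [ndvi(n, r) < threshold for n, r in zip(nir_band, red_band)]


# Hypothetical near-infrared and red reflectances for four pixels.
nir = [0.30, 0.45, 0.25, 0.50]
red = [0.22, 0.08, 0.20, 0.10]

mask = bare_soil_mask(nir, red)
print(mask)
```

Only the pixels flagged True would be passed on to a salinity index calculation under this assumption.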
206 S. R. L. Tavares et al.


In a similar study, Gharsallah et al. (2022) used EM38-MK2 electrical conductivity
and Landsat 8 OLI spectral indices to map soil salinity in a 665 ha olive
orchard in Tunisia. The in situ EM38-MK2 measurements were adjusted to electrical
conductivity values measured by saturated paste at six depths (0–20, 20–40, 40–60,
60–80, 80–100 and 0–100 cm). They kriged soil salinity and obtained R² of
0.86 and 0.89 for the 0–20 and 0–100 cm depths, respectively, concluding that the
combination of proximal and remote sensors is an effective approach for monitoring
the soil salinity in semi-arid conditions. Hoa et al. (2019) mapped soil salinity in
Vietnam using C-band data from the Sentinel-1 satellite and compared five machine
learning models: multilayer perceptron neural networks, radial basis function neural
networks, Gaussian processes, support vector machines, and random forest. The
Sentinel-1 images were able to identify saline soils using machine learning methods,
and the authors argue that Gaussian processes combined with Sentinel-1 data can be used to
monitor soil salinity at 10 m spatial resolution every 6 days, showing the
potential of remote sensing for soil salinity mapping. These studies agree with our

Fig. 3 Soil salinity level maps predicted using random forest at 0–10 (a), 10–30 (b) and 30–50 cm
(c)

study, showing that EM38-MK2 combined with Sentinel-1 imagery can be used to
efficiently map soil salinity.

8 Conclusions

The aEC and aMS data from EM38-MK2, combined with Sentinel-1 C-band radar
data, made it possible to map soil salinity levels in an irrigated cropland with good (70%)
accuracy using random forest.
The pH maps were essential for mapping soil salinity levels at the three layers
down to 50 cm.

Table 2 Variable importance, overall accuracy and Kappa index of the random forest soil salinity
level models
0–10 cm 10–30 cm 30–50 cm
Variablea Importance Variablea Importance Variablea Importance
pH 0–10 cm 100.0 pH 0–10 cm 100.0 pH 30–50 cm 100.0
C-band VH 10.9 pH 10–30 cm 62.6 pH 0–10 cm 57.4
pH 10–30 cm 9.2 aMS 1 m 48.3 aMS 1 m 52.1
aMS 0.5 m 5.3 aEC 0.5 m 44.6 C-band VV 27.4
aMS 1 m 5.0 pH 30–50 cm 43.5 aEC 0.5 m 25.4
aEC 0.5 m 3.8 aEC 1 m 32.9 pH 10–30 cm 19.9
pH 30–50 cm 3.5 C-band VH 30.4 aEC 1 m 13.1
aEC 1 m 3.0 aMS 0.5 m 20.2 aMS 0.5 m 12.0
C-band VV 0 C-band VV 0 C-band VH 0
Error metric Value Error metric Value Error metric Value
Accuracy 0.74 Accuracy 0.71 Accuracy 0.66
Kappa index 0.59 Kappa index 0.52 Kappa index 0.43
a aEC: apparent electrical conductivity from EM38-MK2; aMS: apparent magnetic susceptibility
from EM38-MK2; C-band: Sentinel-1 Level-1 High Resolution Interferometric Wide Swath
Ground Range Detected band; VH: vertical-horizontal polarization; VV: vertical-vertical
polarization

References

Barreto, A. C., Neto, M. F., Oliveira, R. P., Moreira, L. C. J., Medeiros, J. F., & Sá, F. V. S. (2023).
Comparative analysis of spectral indexes for soil salinity mapping in irrigated areas in a semi-
arid region, Brazil. Journal of Arid Environments, 209, 104888.
Barros, M. F. C., Fontes, M. P. F., & Alvarez, V. H. (2004). Recuperação de solos afetados por
sais pela aplicação de gesso de jazida e calcário no Nordeste do Brasil. Revista Brasileira
de Engenharia Agrícola e Ambiental, 8, 59–64.
Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological
Measurement, 20, 37–46.
Ding, J., & Yu, D. (2014). Monitoring and evaluating spatial variability of soil salinity in dry
and wet seasons in the Werigan-Kuqa Oasis, China, using remote sensing and electromagnetic
induction instruments. Geoderma, 235–236, 316–322.
FAO (Food and Agriculture Organization of the United Nations). (2011). The state of the world’s
land and water resources for food and agriculture. FAO.
Gharsallah, M. E., Aichi, H., Stambouli, T., Rabah, Z. B., & Hassine, H. B. (2022). Assessment
and mapping of soil salinity using electromagnetic induction and Landsat 8 OLI remote sensing
data in an irrigated olive orchard under semi-arid conditions. Soil and Water Research, 17, 15–
28.
Hoa, P. V., Giang, N. V., Binh, N. A., Hai, L. V. H., Pham, T., Hasanlou, M., & Bui, D. T. (2019).
Soil salinity mapping using SAR Sentinel-1 data and advanced machine learning algorithms:
A case study at Ben Tre Province of the Mekong River Delta (Vietnam). Remote Sensing, 11,
128.
Huang, J., Prochazka, M. J., & Triantafilis, J. (2016). Irrigation salinity hazard assessment and risk
mapping in the lower Macintyre Valley, Australia. Science of the Total Environment, 551–552,
460–473.

Justo, J. F. A., Barreto, A. C., Silva, J. F., Neto, M. F., Sá, F. V. S., & Oliveira, R. P. (2021).
Identification and diagnosis of salt-affected soils in the Baixo-Açu irrigated perimeter, RN,
Brazil. Revista Brasileira de Engenharia Agrícola e Ambiental, 25, 480–484.
Kuhn, M. (2022). caret: Classification and regression training. R package version 6.0-92.
https://CRAN.R-project.org/package=caret
Lopes, I., & Montenegro, A. A. D. A. (2019). Spatialization of electrical conductivity and physical
hydraulic parameters of soils under different uses in an alluvial valley. Revista Caatinga, 32,
222–233.
Medeiros, J. F. (1998). Manejo da água de irrigação salina em estufa cultivada com pimentão.
Piracicaba: ESALQ/USP. Doctoral Dissertation.
Medeiros, J. F., Gheyi, H. R., Costa, A. R. F. C., & Tomaz, H. V. Q. (2016). Manejo do sistema
solo-água-planta em solos afetados por sais. In H. R. Gheyi, N. S. Dias, C. F. Lacerda, & E.
G. Filho (Eds.), Manejo da salinidade na agricultura: Estudo básico e aplicados (2nd ed., pp.
319–335). INCTSal.
Metternicht, G. I., & Zinck, J. A. (2003). Remote sensing of soil salinity: Potentials and constraints.
Remote Sensing of Environment, 85, 1–20.
Mohammed, M., Mahmoud, A. E., & Almolhem, Y. (2022). Applications of electromagnetic
induction and electrical resistivity tomography for digital monitoring and assessment of the
soil: A case study of Al-Ahsa Oasis, Saudi Arabia. Applied Sciences, 12, 1–17.
R Core Team. (2021). R: A language and environment for statistical computing. R Foundation for
Statistical Computing.
Ribeiro, M. R., Freire, F. J., & Montenegro, A. A. A. (2003). In N. Curi, J. J. Marques, L.
R. G. Guilherme, J. M. Lima, A. S. Lopes, & V. H. Alvarez (Eds.), Solos halomórficos no
Brasil: Ocorrência, gênese, classificação, uso e manejo sustentável (pp. 165–208). Sociedade
Brasileira de Ciência do Solo.
Richards, L. A. (Ed.). (1954). Diagnosis and improvement of saline and alkali soils. United
States Department of Agriculture. USDA Agriculture Handbook 60.
Santos, H. G., Jacomine, P. K. T., Anjos, L. H. C., Oliveira, V. A., Lumbreras, J. F., Coelho, M. R.,
Almeida, J. A., Filho, J. C. A., Oliveira, J. B., & Cunha, T. J. F. (2018). Sistema Brasileiro de
Classificação de Solos (5th ed.). Embrapa.
Taghizadeh-Mehrjardi, R., Minasny, B., Sarmadian, F., & Malone, B. P. (2014). Digital mapping
of soil salinity in Ardakan region, Central Iran. Geoderma, 213, 15–28.
Teixeira, P. C., Donagemma, G. K., Fontana, A., & Teixeira, W. G. (2017). Manual de métodos de
análise de solo (3rd rev. ed.). Embrapa.
Yao, R. J., Yang, J. S., Wu, D. H., Xie, W. P., Cui, S. Y., Wang, X. P., Yu, S. P., & Zhang, X.
(2015). Determining soil salinity and plant biomass response for a farmed cropland using the
electromagnetic induction method. Computers and Electronics in Agriculture, 19, 241–253.
The Particle Size Causes a Change in the
Determination of Soil Color Via the Nix
Pro 2 Sensor

Viviane Flaviana Condé, Thays Vieira Bueno, Jéssica Ribeiro Oliveira,
Anifo Soares Mamudo Ibraimo, Marcio Rocha Francelino,
and Elpídio Inácio Fernandes-Filho

1 Introduction

Soil color is used for soil identification and classification, as well as for estimating
various soil properties such as the presence of iron, organic matter, drainage
conditions, and texture (Demattê et al., 2011; Moritsuka et al., 2014; Santos et
al., 2018). It is used as a differentiating property in the classification of soils at
the suborder categorical level for Oxisols, Argisols, and Nitosols in the Brazilian
Soil Classification System (SiBCS) (Santos et al., 2018) due to the morphological
character of this system. During the grading of a soil profile, the color is determined
in the field by visual comparison of fluorescent soil samples using the Munsell
color system as a standard (Munsell Soil Color Charts, MSCC) (Munsell, 1905; Santos et
al., 2018).
Defining soil color visually is a subjective analysis and is subject to uncontrolled
effects such as ambient light conditions, the angle of incidence of sunlight, the
soil surface being evaluated, sample moisture, and differences in the perception of
observers. These conditions can compromise the results, leading to divergences in
the classification of soils (Mancini et al., 2020; Moritsuka et al., 2014, 2019).
Several studies have been conducted to compare the visual method of color
determination with the use of spectral measurement instruments, which allow for
precise color readings and eliminate uncontrolled variables, minimizing errors and
making the process more efficient, simple, and fast. This highlights the importance
of developing field and laboratory instruments and analytical methods that allow
for objective and precise determinations (Demattê et al., 2011; Holman et al., 2018;
Mancini et al., 2020; Moritsuka et al., 2014, 2019; Silva et al., 2018).

V. F. Condé (✉) · T. V. Bueno · J. R. Oliveira · A. S. M. Ibraimo · M. R. Francelino ·
E. I. Fernandes-Filho
Universidade Federal de Viçosa (UFV), Viçosa, Minas Gerais, Brazil

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
W. de Carvalho Junior et al. (eds.), Pedometrics in Brazil, Progress in Soil Science,
https://doi.org/10.1007/978-3-031-64579-2_15
212 V. F. Condé et al.

Spectral measuring instruments make it possible to determine colors accurately,
and many color sensors are available. The Nix Pro 2 is a low-cost, proximal, active color
sensor that emits a light beam with a spectrum in the visible range (“Nix Sensor”,
2021). Techniques for sensor-based color determination in tropical soils, including
sample preparation, still need to be improved. The objective of this work was to evaluate
the influence of sample particle size on the determination of soil color using the Nix
Pro 2 sensor.

2 Methodology

Eighteen samples of soil profiles were collected in the states of Goiás and Tocantins,
and each sampling point was classified according to the Brazilian Soil Classification
System (SiBCS) (Santos et al., 2018) (Table 1).
Each soil sample was air-dried, ground in an agate mortar, and sieved through stainless
steel sieves with openings of 2 mm (10 mesh), 0.297 mm (50 mesh), 0.149 mm (100
mesh), and 0.074 mm (200 mesh). The entire volume of soil was crushed until it
had completely passed through the sieves, in order to obtain a gradient of the mean
diameter of the fractions for the 18 samples under study. Nine grams of each sample
were put in a bottle with a height of 23 millimeters and a diameter of 25 millimeters
in order to allow a minimum depth of two centimeters of soil for analysis. The
surface was covered with a plastic PVC food film (15 μm). The experiment was
conducted in an 18 × 4 factorial (18 soil samples and 4 particle sizes) with 15
replications, totaling 1080 analyses.
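The experimental layout described above can be enumerated to confirm the total number of readings. This is a minimal sketch; the dictionary keys and variable names are ours and do not come from the study.

```python
from itertools import product

N_SAMPLES = 18
# Mesh number mapped to sieve opening in millimeters, as listed in the text.
MESH_OPENINGS_MM = {10: 2.0, 50: 0.297, 100: 0.149, 200: 0.074}
N_REPLICATES = 15

# One record per sensor reading: every sample x particle size x replicate.
readings = [
    {"sample": s, "mesh": m, "opening_mm": MESH_OPENINGS_MM[m], "replicate": r}
    for s, m, r in product(
        range(1, N_SAMPLES + 1), MESH_OPENINGS_MM, range(1, N_REPLICATES + 1)
    )
]

print(len(readings))
```

The number of treatment combinations (sample by particle size) is the count of distinct `(sample, mesh)` pairs, each read 15 times.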
The Nix™ Pro 2 Color Sensor is an active-type proximal sensor with a light-
emitting diode (LED) semiconductor. When energized, it emits a beam of light
with a spectrum in the visible region. In the lower part of the equipment, there
is a cavity 10 mm deep and 15 mm in diameter with 2.5 mm edges that come
into contact with the sample to avoid interference from external light sources
and also prevent the light from the sensor from dissipating beyond the analysis
perimeter. Calibration was performed by the manufacturer, and the operator set
the observer angle and illuminant (“Nix Sensor”, 2021). The
sensor was configured so that all samples were analyzed under the same
conditions of illuminant (D50), observer angle (10°), and humidity, with 15 repeated
readings.
Colors are formed by combining primary and secondary colors. There are several
color systems, and the RGB system is composed of three visible base colors: red
(R), green (G), and blue (B). Each one occupies an axis in a three-dimensional
space, along which the intensity of that color component varies. With the three axes, the origin
expresses the color black (0,0,0), and the opposite vertex (1,1,1) represents the color white.
Each point in space expresses a distinct color that is formed from the combination of
the three base colors. In this study, RGB values were used to determine soil colors
in the Munsell system.
Table 1 Texture classification and soil classes of soil profiles in the Brazilian Soil Classification System (SiBCS) and the World Reference Base for Soil Resources
(WRB)
Sample Soil profile Depth (cm) Sand (g kg⁻¹) Silt (g kg⁻¹) Clay (g kg⁻¹) Textural class SiBCS WRB
1 Btf2 78–120+ 547 251 202 Sandy loam PLINTOSSOLO HÁPLICO Distrófico Plinthosols
2 Btcf 70–122 113 211 676 Clay PLINTOSSOLO PÉTRICO Concrecionário Plinthosols
3 Bwc3 100–140 157 210 633 Clay PLINTOSSOLO PÉTRICO Concrecionário Plinthosols
4 Bi 10–23 476 342 182 Sandy loam CAMBISSOLO HÁPLICO Tb Distrófico Cambisols
5 C3 105–160+ 948 12 40 Sand NEOSSOLO QUARTZARÊNICO Órtico Arenosols
6 Bw 33–57 752 91 157 Silt loam LATOSSOLO AMARELO Distrófico Ferralsols
7 Bwc2 105–140 607 116 277 Sandy loam LATOSSOLO AMARELO Distrófico Ferralsols
8 Bt 49–93 158 233 609 Clay ARGISSOLO VERMELHO Eutrófico Acrisols
9 Bif1 62–100 497 261 242 Loam CAMBISSOLO HÁPLICO Tb Distrófico Cambisols
10 Bt2 69–115 120 207 673 Clay NITOSSOLO VERMELHO Eutroférrico Nitisols
11 Bw2 86–132 330 181 489 Clay LATOSSOLO VERMELHO Acriférrico Ferralsols
12 CA 65–90 935 23 42 Sand NEOSSOLO QUARTZARÊNICO Hidromórfico Arenosols
13 2Ab 90–110 871 89 40 Sand NEOSSOLO QUARTZARÊNICO Hidromórfico Arenosols
14 Bi1 22–56 183 250 567 Clay CAMBISSOLO HÁPLICO Tb Distrófico Cambisols
15 Bi2 56–80 217 274 509 Clay CAMBISSOLO HÁPLICO Tb Distrófico Cambisols
16 Bic 7–38 405 454 141 Loam PLINTOSSOLO PÉTRICO Concrecionário Plinthosols
17 Btg1 41–53 71 447 482 Silty clay ARGISSOLO ACINZENTADO Ta Distrófico Acrisols
18 3Big4 76–88 181 360 459 Clay ARGISSOLO ACINZENTADO Ta Distrófico Acrisols

The RGB values obtained with the Nix Pro 2 were used to calculate hue, value, and
chroma. From these three variables, it was possible to derive the corresponding
notation in the Munsell color system. These analyses were conducted using the R
software (R Core Team, 2020) and the packages: aqp (Beaudette et al., 2013), dplyr
(Wickham et al., 2021), modest (Pallmann, 2017), patchwork (Pedersen, 2020),
readr (Wickham et al., 2020b), WriteXLS (Schwartz, 2021).
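To give a sense of how RGB readings relate to a lightness scale, the sketch below converts 8-bit sRGB to CIELAB and uses L*/10 as a rough stand-in for Munsell value. This is not the conversion used in the study (which relied on R packages), and it rests on two stated assumptions: the readings are treated as standard sRGB with a D65 white point, although the sensor was set to D50, and Munsell value is approximated by L*/10. The function names are ours.

```python
def srgb_to_lab(r, g, b):
    """Convert 8-bit sRGB to CIELAB, assuming the standard D65 white point."""

    def linearize(c):
        # Undo the sRGB transfer curve to obtain linear light in 0..1.
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)
    # Linear sRGB to CIE XYZ (D65).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        # CIELAB companding function with its linear segment near zero.
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def approx_munsell_value(r, g, b):
    """Rough Munsell value estimate, assuming value is about L*/10."""
    return srgb_to_lab(r, g, b)[0] / 10


# Mean R, G, B of sample 12 at 10 and 200 mesh, taken from Tables 2, 3 and 4.
coarse = approx_munsell_value(79.66, 64.46, 58.00)
fine = approx_munsell_value(147.26, 128.06, 115.33)
print(round(coarse, 2), round(fine, 2))
```

The study reports mean values of 2.83 and 5.39 for these two fractions (Table 6); under the assumptions above the approximation should land close to, though not exactly on, those figures.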
Descriptive statistical analysis was performed using the Scott-Knott mean group-
ing test at a 0.05 significance level, with the Sisvar software (Ferreira, 2019) for
the variables R, G, B, Hue, Value, and Chroma. Graphs were plotted using the R
software (R Core Team, 2020) and the packages: dplyr (Wickham et al., 2021),
ggplot2 (Wickham et al., 2020a), readxl (Wickham & Bryan, 2019).

3 Results and Discussion

Color in the RGB system is a combination of three variables (red, green, and blue),
and Munsell color, which in this study is the result of mathematical models based
on RGB data, is an arrangement of three other variables (hue, value, and chroma).
The two-factor factorial analysis (18 soil samples × 4 particle sizes) showed
a significant interaction between the soil sample and particle size factors for the red (R), green
(G), and blue (B) variables. That is, the effect of particle size depended on the soil sample.
The means that differed according to the Scott-Knott test
at 0.05 significance are shown in Tables 2, 3, and 4.
For the red band (R) (Table 2), in the breakdown of soil samples (that is, within
each soil sample, the particle size factor was analyzed), it was observed that, in
sample 3, the averages of 10 and 50 mesh formed one group. In sample 11, the
averages of 100 and 200 mesh were grouped (0.05 of significance by Scott-Knott)
regarding the influence of particle size in soil samples. The other soil samples
showed different averages, with no groupings within particle size.
Table 2 shows that the mean value of R tended to increase as particle size decreased.
The lowest amplitude (difference between the highest and lowest mean) observed between the
means of the four analyzed particle sizes was 9.26 (in soil sample 16), and the
highest amplitude was 67.6 (in soil sample 12). Samples 12 and 13 showed the most
significant discrepancy (amplitude >59) between the 10 and 200 mesh values. Soil
sample 12 showed an R average of 79.66 in 10 mesh and 147.26 in 200 mesh. Soil
sample 13 presented an R average of 80.13 in 10 mesh and 139.33 in 200 mesh.
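The amplitudes quoted above can be reproduced directly from the means in Table 2. The sketch below does so for three samples, taking amplitude as the difference between the highest and lowest of the four particle size means; the variable names are ours.

```python
# Mean R values from Table 2, ordered as 10, 50, 100 and 200 mesh.
mean_r = {
    12: [79.66, 89.33, 122.53, 147.26],
    13: [80.13, 89.60, 114.33, 139.33],
    16: [178.80, 175.60, 177.26, 184.86],
}


def amplitude(means):
    """Range of the particle size means for one soil sample."""
    return max(means) - min(means)


# Round to two decimals to match the precision reported in the tables.
amplitudes = {sample: round(amplitude(m), 2) for sample, m in mean_r.items()}
print(amplitudes)
```

The same calculation applies to the G, B, hue, value, and chroma tables.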
In the green band (G), the averages of the soil samples differed among themselves
in terms of particle size, except in soil sample 10, where the averages of 50 and
100 mesh were grouped (0.05 significance by Scott-Knott) (Table 3). The smallest
amplitude between the averages of 10, 50, 100, and 200 mesh was 6.46 (in soil
sample 1), and the highest amplitude was 63.60 (in soil sample 12). For G averages,
we verified that soil samples 5, 12, and 13 (NEOSSOLOS QUARTZARÊNICOS)
had greater amplitude (>55) between the means of 10 and 200 mesh (Table 3).
Sample 5 had an average of G of 120.86 in 10 mesh and 175.93 in 200 mesh, sample

Table 2 Means of red (R) values, obtained using the Nix™ Pro 2 color sensor from soil samples
in four particle sizes
Particle size (Granulometry)
Sample 2 mm (10 mesh) 0.297 mm (50 mesh) 0.149 mm (100 mesh) 0.074 mm (200 mesh)
1 180.60 N a 182.46 M b 184.40 M c 190.80 M d
2 167.40 K a 172.20 K b 177.86 K c 176.60 I d
3 154.73 I a 154.40 H a 170.73 I b 172.00 G c
4 185.53 O a 188.00 O b 192.00 N c 196.33 N d
5 161.80 J a 184.93 N b 197.00 O c 207.26 P d
6 140.06 G a 144.60 E b 161.60 H c 180.80 K d
7 174.73 L a 170.53 J b 180.13 L c 191.40 M d
8 134.53 F a 147.60 F b 156.86 F c 165.33 F d
9 193.26 P a 197.60 P b 201.20 P c 206.33 O d
10 110.26 B a 129.66 B b 136.80 C c 148.26 C d
11 124.80 D a 133.93 C b 147.93 D c 147.40 B c
12 79.66 A a 89.33 A b 122.53 B c 147.26 B d
13 80.13 A a 89.60 A b 114.33 A c 139.33 A d
14 122.53 C a 144.86 E b 152.53 E c 162.60 D d
15 128.46 E a 136.66 D b 157.93 G c 163.33 E d
16 178.80 M a 175.60 L b 177.26 K c 184.86 L d
17 154.86 I a 161.20 I b 175.93 J c 179.60 J d
18 150.53 H a 153.46 G b 162.13 H c 174.20 H d
Means followed by the same letter do not differ from each other (Scott-Knott test, 5%).

12 had an average of G of 64.46 in 10 mesh and 128.06 in 200 mesh, and sample 13
had an average of 64.53 in 10 mesh and 120.73 in 200 mesh.
In blue (B), there was a grouping of means of 10 and 200 mesh for soil sample
2 and a grouping of 10 and 50 mesh for soil sample 4 (Table 4). The other soil
samples obtained averages of the analyzed particle sizes distinct from each other
at 0.05 significance by Scott-Knott. In variable B, the lowest amplitude observed
between the averages of the studied particle sizes was 5.74 (in soil sample 1),
and the highest amplitude was 57.33 (in soil sample 12). For the averages of B, we
verified that soil samples 5, 12, and 13 also presented a more significant discrepancy
(amplitude >56) between the averages of 10 and 200 mesh (Table 4). Sample 5
obtained an average of B of 91.06 in 10 mesh and 147.80 in 200 mesh, and sample
12 presented an average of B of 58.00 in 10 mesh and 115.33 in 200 mesh.
The R, G, and B data show that particle size changed
the spectral behavior of the soil samples in the visible region.
These RGB data were then used to determine hue, value, and chroma,
the three variables that make up the Munsell color system.
Hue refers to the dominant spectral color. Value refers to the lightness of
the color, that is, how much black or white it contains, ranging from zero (absolute
black) to ten (absolute white). Chroma is an attribute referring to the relative purity
of the color, and it defines the difference between a pure hue and a pure gray. So

Table 3 Means of green (G) values, obtained using the Nix™ Pro 2 color sensor from soil samples
in four particle sizes
Particle size (Granulometry)
Sample 2 mm (10 mesh) 0.297 mm (50 mesh) 0.149 mm (100 mesh) 0.074 mm (200 mesh)
1 134.40 L a 136.80 M c 134.86 K b 140.86 K d
2 108.06 G a 112.93 I c 121.46 G d 111.66 C b
3 97.73 F a 104.26 H b 121.86 G c 123.33 H d
4 146.80 N a 147.93 O b 152.00 N c 157.06 P d
5 120.86 I a 147.73 O b 161.46 O c 175.93 Q d
6 109.73 H a 114.46 J b 129.46 I c 151.53 O d
7 126.53 K b 122.46 L a 132.40 J c 145.46 N d
8 94.73 E a 103.20 G b 110.53 F c 120.00 F d
9 164.60 O a 166.73 P b 172.46 P c 178.33 R d
10 64.26 A a 77.26 C b 77.40 A b 92.46 A c
11 73.26 B a 81.00 D b 95.53 B c 95.00 B d
12 64.46 A a 73.60 B b 103.93 D c 128.06 I d
13 64.53 A a 73.00 A b 96.26 C c 120.73 G d
14 77.86 C a 98.00 F b 105.73 E c 115.33 E d
15 80.80 D a 88.40 E b 110.93 F c 112.66 D d
16 145.40 M d 137.66 N a 138.13 L b 143.53 L c
17 121.93 J a 122.86 L b 139.20 M c 144.60 M d
18 121.00 I b 120.33 K a 127.60 H c 140.33 J d
Means followed by the same letter do not differ from each other (Scott-Knott test, 5%).

a color with a chroma of 1 would be very close to gray, whereas higher chroma values
approach the pure hue (Munsell, 1905; “Munsell Color Copyright”, 2021; Santos et al.,
2015).
We verified that particle size affected the variable hue (H) in the breakdown of
soil samples; only the averages of soil samples 2, 4, and 8 were grouped regardless
of particle size (Table 5).
Soil samples 1, 3, and 12 formed one group with 10 and 50 mesh averages.
Samples 14 and 15 grouped the particle size averages into two groups: a group for
10 and 50 mesh and a group containing 100 and 200 mesh averages. Soil samples
7 and 9 grouped the 50, 100, and 200 mesh averages.
Samples 11, 13, and 17 grouped the 100 and 200 mesh averages. Soil samples
5, 6, and 16 did not group the means of the different particle sizes, so the analyzed
particle sizes differed. For soil sample 18, only the averages of 10 and 200 mesh
were grouped (Table 5).
In variable H, the lowest amplitude between the averages of the four particle sizes
was observed at 0.10 (soil sample 2), while the highest amplitude was 2.75 (in soil
sample 16).
In value (V), there was a similarity in the averages in sample 18 for the
particle sizes of 10 and 50 mesh and in sample 16 for the averages of 10 and
200 mesh (Table 6). The other soil samples showed particle size averages differing

Table 4 Means of blue (B) values, obtained using the Nix™ Pro 2 color sensor from soil samples
in four particle sizes
Particle size (Granulometry)
Sample 2 mm (10 mesh) 0.297 mm (50 mesh) 0.149 mm (100 mesh) 0.074 mm (200 mesh)
1 109.60 M b 112.00 O c 106.26 M a 109.73 J b
2 75.86 H a 80.93 H b 90.86 G c 75.93 C a
3 63.80 F a 76.20 G b 90.66 G c 93.26 G d
4 111.53 N a 111.13 N a 114.40 O b 119.53 L c
5 91.06 K a 119.33 P b 132.13 P c 147.80 N d
6 87.33 I a 90.40 L b 102.26 L c 124.60 M d
7 89.66 J b 87.46 J a 95.06 J c 108.00 I d
8 71.60 G a 76.00 G b 82.00 E c 91.73 F d
9 140.46 P a 141.80 Q b 149.40 Q c 155.00 O d
10 49.66 A a 58.33 A c 54.86 A b 67.80 A d
11 55.86 D a 59.60 B b 70.53 B c 69.20 B d
12 58.00 E a 66.60 E b 92.86 H c 115.33 K d
13 58.06 E a 65.13 D b 86.00 F c 109.40 J d
14 52.53 B a 69.53 F b 75.66 C c 84.73 E d
15 55.00 C a 61.46 C b 79.73 D c 78.66 D d
16 114.20 O c 108.86 M a 110.00 N b 115.40 K d
17 87.46 I b 84.13 I a 97.80 K c 104.73 H d
18 92.26 L b 89.40 K a 93.60 I c 104.66 H d
Means followed by the same letter do not differ from each other (Scott-Knott test, 5%).

in all mesh measurements analyzed. Soil samples 5, 12, and 13 obtained the
most discrepant values (amplitude >2.00) between 10 and 200 mesh averages. The
smallest amplitude between the averages of 10, 50, 100, and 200 mesh was 0.26 (in
soil sample 1), and the highest amplitude was 2.56 (in soil sample 12). In sample 5,
we observed an average of V in 10 mesh of 5.30 and 200 mesh of 7.30. Sample 12
presented an average of V in 10 mesh equal to 2.83 and 200 mesh of 5.39. Sample
13 had an average V of 10 mesh at 2.84 and 200 mesh at 5.10.
In chroma (C), soil samples 4, 7, 11, and 17 formed one group with 50 and 100
mesh averages (Table 7). In soil samples 9 and 12, the averages of 10 and 50 mesh
were grouped; in soil sample 6, the averages of 10 and 200 mesh were grouped.
In soil sample 15, the averages of 10, 50, and 100 mesh formed one group; in
soil sample 18, only 100 and 200 mesh averages were grouped. The means of nine
soil samples (1, 2, 3, 5, 8, 10, 13, 14, and 16) differed in all particle sizes (Table 7).
In variable C, the lowest amplitude observed between the averages of the
different analyzed particle sizes was 0.18 (in soil sample 4), while the highest
amplitude was 1.49 (in soil sample 10).
The satisfactory performance of the color sensors is related to the uniformity of
the analyzed material. Passing the entire sample through a sieve is an alternative to
homogenizing. However, in the grinding process, aggregates, quartz grains from

Table 5 Means of hue values (H) obtained by mathematical conversions from RGB values of soil
samples in four particle sizes
Particle size (Granulometry)
Sample 2 mm (10 mesh) 0.297 mm (50 mesh) 0.149 mm (100 mesh) 0.074 mm (200 mesh)
1 13.46 B a 13.53 D a 13.83 D b 14.26 C c
2 13.34 B a 13.28 C a 13.35 C a 13.25 B a
3 13.97 C a 13.82 E a 14.52 E c 14.31 C b
4 17.52 I a 17.51 L a 17.64 L a 17.74 K a
5 15.76 F a 16.24 I b 16.82 J c 17.36 J d
6 16.57 G a 17.15 K b 17.38 K c 17.91 K d
7 15.82 F b 15.45 H a 16.00 I a 16.36 H a
8 14.54 E a 14.56 G a 14.44 E a 14.48 C a
9 17.18 H b 16.75 J a 16.67 J a 16.85 I a
10 11.60 A a 11.77 A a 11.78 A a 12.46 A b
11 11.54 A a 12.25 B b 12.95 B c 13.13 B c
12 14.39 D a 14.35 F a 15.59 H b 16.05 G c
13 14.21 D a 14.67 G b 15.38 G c 15.47 F c
14 14.35 D a 14.47 F a 14.74 F b 14.68 D b
15 13.95 C a 14.02 E a 14.89 F b 14.82 D b
16 17.90 J d 16.24 I c 15.72 H b 15.15 E a
17 18.71 L b 18.27 N a 18.90 N c 18.98 M c
18 18.44 K c 18.03 M a 18.26 M b 18.62 L c
Means followed by the same letter do not differ from each other (Scott-Knott test, 5%).

sand particles, and other coarse minerals from the soil will reduce particle size
and may alter the perception of colors. This fact is evident in samples 12 and 13
(Fig. 1), which were collected from a profile of NEOSSOLO QUARTZARÊNICO
Hidromórfico (Table 8). At first, these samples 12 and 13 should have a darkened
color on the surface horizon and show light colors with the grinding of the grains.
Changes in particle size lead to changes mainly in the chroma variable (Table 8).
The influence of soil granulometry on color is related to the specific surface
of the soil components and the spectral response. The uniformity of the analyzed
material influences the performance of the sensors. Sifting the soil is an alternative to
homogenizing. However, in the breaking process, quartz grains from sand particles
and other soil minerals will have their particle size reduced, which may change the
perception of colors. Studies indicate that in sandy-textured soils, minor variations
in organic matter can lead to significant variations in color (Demattê et al., 2011;
Moritsuka et al., 2014).
The Munsell color system is composed of discrete intervals, so significant
changes in chroma can lead to a change in hue (Munsell, 1905; Munsell Color
Copyright, 2021). The hue of the soils has its variation mainly related to the hematite
and goethite contents. The influence of soil particle size on its color is associated
with the specific surface of the soil components and the spectral response. Studies
indicate that in soils with a sandy texture, minor variations in organic matter can

Table 6 Means of value (V) obtained by mathematical conversions from RGB values of soil
samples in four particle sizes
Particle size (Granulometry)
Sample 2 mm (10 mesh) 0.297 mm (50 mesh) 0.149 mm (100 mesh) 0.074 mm (200 mesh)
1 5.88 N a 5.97 M c 5.93 L b 6.16 K d
2 5.02 I a 5.21 I b 5.50 I c 5.23 E d
3 4.60 G a 4.78 G b 5.43 G c 5.48 H d
4 6.25 P a 6.31 O b 6.47 M c 6.65 N d
5 5.30 L a 6.29 N b 6.79 N c 7.30 O d
6 4.77 H a 4.95 H b 5.54 J c 6.36 M d
7 5.58 M b 5.43 K a 5.80 K c 6.28 L d
8 4.29 F a 4.67 F b 4.97 F c 5.32 F d
9 6.85 Q a 6.95 P b 7.16 O c 7.37 P d
10 3.19 B a 3.78 B b 3.88 A c 4.40 A d
11 3.62 C a 3.93 C b 4.47 D d 4.45 B c
12 2.83 A a 3.22 A b 4.45 C c 5.39 G d
13 2.84 A a 3.20 A b 4.15 B c 5.10 C d
14 3.69 D a 4.49 E b 4.78 E c 5.16 D d
15 3.84 E a 4.15 D b 4.98 F c 5.09 C d
16 6.15 O c 5.91 L a 5.94 K b 6.17 K c
17 5.24 K a 5.33 J b 5.93 K c 6.12 J d
18 5.18 I a 5.19 I a 5.48 H b 5.96 I c
Means followed by the same letter do not differ from each other (Scott-Knott test, 5%).

lead to significant variations in color (Demattê et al., 2011; Moritsuka et al., 2014;
Santos et al., 2015, 2018). When changing the particle size of the samples, there was
a difference in the color results expressed in all variables, as was evident in samples
12 and 13.
Of the 18 samples under study, seven are distinguished by color at the second
categorical level. Despite the variability caused by particle size, only two samples
classified with color at the second categorical level (samples 7 and 8) showed hue
variation. Red soils have 10R and 2.5YR hues, red-yellow soils have 5YR hues, and
yellow soils have 7.5YR and 10YR hues (Santos et al., 2018). Sample
8 (ARGISSOLO VERMELHO Eutrófico) was classified by the sensor method as
red-yellow (5YR hue) regardless of particle size. Sample 7 (LATOSSOLO AMARELO
Distrófico) was classified as yellow (7.5YR hue) for the 200 mesh particle size
and red-yellow (5YR hue) for the other particle sizes. It is worth noting that
colors are verified in the field on samples with varying degrees
of moisture, whereas the Nix™ Pro 2 readings in this experiment were made on
air-dried soil. When the soil is wetted, color perception is altered because water
molecules absorb light energy (Barbosa et al., 2019; Novo & Ponzoni, 2001;
Santos et al., 2015).
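The hue limits quoted above can be written as a small lookup that assigns a color class to a Munsell notation. This is a minimal sketch covering only the hues listed in the text; the function name and the "unclassified" fallback are ours, and the full SiBCS criteria are not reproduced here.

```python
# Hue-to-class mapping as stated in the text (after Santos et al., 2018).
HUE_CLASS = {
    "10R": "red",
    "2.5YR": "red",
    "5YR": "red-yellow",
    "7.5YR": "yellow",
    "10YR": "yellow",
}


def color_class(munsell):
    """Return the color class for a Munsell notation such as '5YR 5/6'."""
    parts = munsell.split()
    hue = parts[0].upper() if parts else ""
    # Hues outside the list above are left unclassified in this sketch.
    return HUE_CLASS.get(hue, "unclassified")


# Sample 7 as described in the text: 7.5YR at 200 mesh, 5YR at other sizes.
# The value/chroma parts below are placeholders, not values from the study.
print(color_class("7.5YR 6/5"), color_class("5YR 6/5"))
```

A sample whose hue shifts from 5YR to 7.5YR between particle sizes therefore changes class in this lookup, which is the situation described for sample 7.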
The averages of hue, value, and chroma of the 15 repetitions of each soil
sample were converted to colors in the Munsell system (Table 8). We observed

Table 7 Means of chroma values (C) obtained by mathematical conversions from RGB values of
soil samples in four particle sizes
Particle size (Granulometry)
Sample 2 mm (10 mesh) 0.297 mm (50 mesh) 0.149 mm (100 mesh) 0.074 mm (200 mesh)
1 4.49 I b 4.43 H a 4.90 J c 5.01 M d
2 6.01 M c 5.96 Q b 5.61 N a 6.60 Q d
3 5.99 M d 5.08 O c 5.03 L b 4.96 L a
4 4.34 H a 4.50 I c 4.52 G c 4.43 I b
5 4.36 H d 3.87 F c 3.72 E b 3.32 E a
6 3.17 C a 3.25 D b 3.52 D c 3.19 D a
7 5.21 L c 5.13 P b 5.16 M b 4.97 L a
8 3.94 F a 4.55 J b 4.77 I d 4.64 J c
9 2.96 B d 3.15 C d 2.91 C b 2.86 C a
10 4.37 H a 5.01 N b 5.86 O d 5.49 P c
11 4.89 K a 5.15 P b 5.17 M b 5.21 N c
12 1.36 A a 1.36 A a 1.72 B b 1.86 B c
13 1.39 A a 1.47 B b 1.63 A c 1.77 A d
14 4.65 J a 4.83 L b 4.88 J c 4.91 K d
15 4.89 K a 4.93 M a 4.93 K a 5.34 O b
16 3.75 E a 3.99 G b 4.05 F c 4.20 G d
17 4.01 G a 4.61 K c 4.58 H c 4.36 H b
18 3.45 D a 3.83 E b 4.07 F c 4.06 F c
Means followed by the same letter do not differ from each other (Scott-Knott test, 5%).

Fig. 1 Soil samples in four particle sizes (granulometry)

that there were minor variations in value and chroma for most of the soil samples
in the four particle sizes. However, soil samples 1 (PLINTOSSOLO HÁPLICO
Distrófico, sandy clay loam textural class), 5 (NEOSSOLO QUARTZARÊNICO
Órtico, sand textural class), 7 (LATOSSOLO AMARELO Distrófico, sandy loam

Table 8 Colors of the 18 soil samples in Munsell Soil Color Charts (MSCC) values in four particle
sizes (granulometry). MSCC colors were calculated from RGB color data obtained with the Nix™
Pro 2 color sensor
Soil Granulometry (mm) MSCC* Color name
1 2 2.5YR 6/4 Light yellowish brown
0.297 2.5YR 6/4 Light yellowish brown
0.149 5YR 6/5 Light reddish brown
0.074 5YR 6/5 Light reddish brown
2 2 2.5YR 5/6 Light olive brown
0.297 2.5YR 5/6 Light olive brown
0.149 2.5YR 6/6 Olive yellow
0.074 2.5YR 5/7 Light olive brown
3 2 5YR 5/6 Yellowish red
0.297 5YR 5/5 Reddish brown
0.149 5YR 5/5 Reddish brown
0.074 5YR 6/5 Light reddish brown
4 2 7.5YR 6/4 Light brown
0.297 7.5YR 6/4 Light brown
0.149 7.5YR 6/4 Light brown
0.074 7.5YR 7/4 Pink
5 2 5YR 5/4 Reddish brown
0.297 7.5YR 6/4 Light brown
0.149 7.5YR 7/4 Pink
0.074 7.5YR 7/3 Pink
6 2 7.5YR 5/3 Brown
0.297 7.5YR 5/3 Brown
0.149 7.5YR 6/4 Light brown
0.074 7.5YR 6/3 Light brown
7 2 5YR 6/5 Light reddish brown
0.297 5YR 5/5 Reddish brown
0.149 5YR 6/5 Light reddish brown
0.074 7.5YR 6/5 Light brown
8 2 5YR 4/4 Reddish brown
0.297 5YR 5/5 Reddish brown
0.149 5YR 5/5 Reddish brown
0.074 5YR 5/5 Reddish brown
9 2 7.5YR 7/3 Pink
0.297 7.5YR 7/3 Pink
0.149 7.5YR 7/3 Pink
0.074 7.5YR 7/3 Pink
10 2 2.5YR 3/4 Dark olive brown
0.297 2.5YR 4/5 Olive brown
0.149 2.5YR 4/6 Olive brown
0.074 2.5YR 4/6 Olive brown
11 2 2.5YR 4/5 Olive brown
0.297 2.5YR 4/5 Olive brown
0.149 2.5YR 4/5 Olive brown
0.074 2.5YR 4/5 Olive brown
12 2 5YR 3/1 Very dark gray
0.297 5YR 3/1 Very dark gray
0.149 5YR 4/2 Dark reddish gray
0.074 5YR 4/2 Dark reddish gray
13 2 5YR 3/1 Very dark gray
0.297 5YR 3/2 Dark reddish brown
0.149 5YR 4/2 Dark reddish gray
0.074 5YR 5/2 Reddish gray
14 2 5YR 4/5 Reddish brown
0.297 5YR 4/5 Reddish brown
0.149 5YR 5/5 Reddish brown
0.074 5YR 5/5 Reddish brown
15 2 5YR 4/5 Reddish brown
0.297 5YR 4/5 Reddish brown
0.149 5YR 5/5 Reddish brown
0.074 5YR 5/5 Reddish brown
16 2 7.5YR 6/4 Light brown
0.297 5YR 6/4 Light reddish brown
0.149 5YR 6/4 Light reddish brown
0.074 5YR 6/4 Light reddish brown
17 2 7.5YR 5/4 Brown
0.297 7.5YR 5/5 Brown
0.149 10YR 6/5 Light yellowish brown
0.074 10YR 6/4 Light yellowish brown
18 2 7.5YR 5/4 Brown
0.297 7.5YR 5/4 Brown
0.149 7.5YR 6/4 Light brown
0.074 7.5YR 6/4 Light brown

textural class), 16 (PLINTOSSOLO PÉTRICO Concrecionário, loam textural
class), and 17 (ARGISSOLO ACINZENTADO Ta Distrófico, clayey textural class)
showed variation in hue, which is an essential change in color nomenclature
because the particle sizes diverged between yellow and red-yellow hues.
At 10 and 50 mesh, samples 1 and 17 had hues 2.5YR and 7.5YR, respectively;
at 100 and 200 mesh, sample 1 had 5YR and sample 17 had 10YR. At 10 mesh,
samples 5 and 16 had hues 5YR and 7.5YR, respectively, while at the other
particle sizes the hue was 7.5YR in
sample 5 and 5YR in sample 16. Sample 7 showed a 7.5YR hue at 200 mesh and
5YR at the other particle sizes.
When converting to the Munsell system, there were minor variations in value
and chroma for practically all soil samples in the four particle sizes. However, 11 of
the 18 samples showed similar results in the 2 mm and 0.297 mm particle sizes. In
addition, five soil samples (1, 5, 7, 16, and 17) had a hue variation, representing
an essential change in color nomenclature (Table 1). In general, the 2 mm and
0.297 mm particle sizes presented a performance similar to the readings taken in
the field.
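The hue comparison described above can be reproduced directly from the Munsell notations in Table 8. The following is a minimal Python sketch, not the authors' procedure: the helper names (`munsell_hue`, `samples_with_hue_change`) are hypothetical, and only a subset of the Table 8 rows is transcribed for illustration.

```python
def munsell_hue(notation):
    """Return the hue part of a Munsell notation such as '7.5YR 6/4'."""
    return notation.split()[0]


# Subset of Table 8; each list follows the order 2, 0.297, 0.149, 0.074 mm.
table8_subset = {
    1: ["2.5YR 6/4", "2.5YR 6/4", "5YR 6/5", "5YR 6/5"],
    5: ["5YR 5/4", "7.5YR 6/4", "7.5YR 7/4", "7.5YR 7/3"],
    7: ["5YR 6/5", "5YR 5/5", "5YR 6/5", "7.5YR 6/5"],
    9: ["7.5YR 7/3", "7.5YR 7/3", "7.5YR 7/3", "7.5YR 7/3"],
    16: ["7.5YR 6/4", "5YR 6/4", "5YR 6/4", "5YR 6/4"],
    17: ["7.5YR 5/4", "7.5YR 5/5", "10YR 6/5", "10YR 6/4"],
}


def samples_with_hue_change(table):
    """List the samples whose hue is not the same at every particle size."""
    changed = []
    for sample, colors in table.items():
        hues = {munsell_hue(color) for color in colors}
        if len(hues) > 1:
            changed.append(sample)
    return sorted(changed)


print(samples_with_hue_change(table8_subset))
```

For the rows transcribed here, the function flags samples 1, 5, 7, 16, and 17 and leaves sample 9 out, which matches the five samples named in the text.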
Red soils have 10R and 2.5YR hues, red-yellow soils have 5YR hues, and
yellow soils have 7.5YR and 10YR hues (Santos et al., 2018). Of the seven
soil samples classified with color in the second categorical level, only two did
not match the determination made in the field with Munsell Soil Color Charts
(MSCC) when analyzed using a Nix Pro 2 color sensor. Sample 8 (ARGISSOLO
VERMELHO) was classified as red-yellow, 5YR hue regardless of particle size.
Sample 7 (LATOSSOLO AMARELO) was classified as yellow (hue 7.5YR) at
the 0.074 mm particle size and red-yellow (hue 5YR) at the other particle sizes.
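The hue rule quoted above (10R and 2.5YR red, 5YR red-yellow, 7.5YR and 10YR yellow) can be written as a small lookup. This is an illustrative Python sketch under that stated rule only; the names `HUE_CLASS` and `color_class` are hypothetical, and the notations for samples 7 and 8 are copied from Table 8.

```python
# Hue-to-class rule as stated in the text (after Santos et al., 2018).
HUE_CLASS = {
    "10R": "red",
    "2.5YR": "red",
    "5YR": "red-yellow",
    "7.5YR": "yellow",
    "10YR": "yellow",
}


def color_class(notation):
    """Map a Munsell notation such as '5YR 5/5' to a color class by hue."""
    hue = notation.split()[0]
    # Hues outside the rule are reported explicitly instead of guessed.
    return HUE_CLASS.get(hue, "unclassified")


# Sensor-derived colors from Table 8 at 2, 0.297, 0.149, and 0.074 mm.
sample7 = ["5YR 6/5", "5YR 5/5", "5YR 6/5", "7.5YR 6/5"]
sample8 = ["5YR 4/4", "5YR 5/5", "5YR 5/5", "5YR 5/5"]

print([color_class(c) for c in sample7])
print([color_class(c) for c in sample8])
```

With these inputs, sample 7 falls in the yellow class only at 0.074 mm, and sample 8 stays red-yellow at every particle size, in line with the mismatches reported for these two samples.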
It is essential to determine methodologies for using sensors in the field to
optimize the work. For example, the analysis of soil color in the field is performed
with soil in wet conditions (Santos et al., 2015). In the experiments of this study, we
used air-dried soil samples; however, new tests must be carried out with wet samples
to define field methods.

4 Conclusions

The Nix™ Pro 2 color sensor could potentially optimize soil color determination.
Particle size alters the spectral behavior of soil samples for frequencies in the visible
region. Grain sizes of 2 mm and 0.297 mm performed similarly to field readings.
Although readings vary with particle size in a
color system such as RGB, after conversion to the Munsell color system the
differences are observed predominantly in sandy samples. The Nix™ Pro 2 color sensor
can optimize soil color determination; however, analytical procedures require a
careful methodology to ensure reproducibility and repeatability.

References

Barbosa, C. C. F., Novo, E. M. L. M., & Martins, V. S. (2019). Introdução ao Sensoriamento
Remoto de sistemas aquáticos.
Beaudette, D. E., Roudier, P., & O’Geen, A. T. (2013). Algorithms for quantitative pedology:
A toolkit for soil scientists. Comput Geosci, 52, 258–268.
https://doi.org/10.1016/j.cageo.2012.10.020
Novo, E. M. L. M., & Ponzoni, F. J. (2001). Introdução ao sensoriamento remoto. São José dos
Campos.
Demattê, J. A. M., Bortoletto, M. A. M., Vasques, G. M., & Rizzo, R. (2011). Quantificação
de matéria orgânica do solo através de modelos matemáticos utilizando colorimetria
no sistema Munsell de cores. Bragantia, 70, 590–597.
https://doi.org/10.1590/S0006-87052011005000006
dos Santos, R. D., dos Santos, H. G., Ker, J. C., et al. (2015). Manual de descrição e coleta de
solos no campo (7a ed.). Sociedade Brasileira de Ciência do Solo.
Ferreira, D. F. (2019). SISVAR: A computer analysis system to fixed effects split plot type designs.
Revista Brasileira de Biometria, 37(4), 529, 20 dez.
Holman, B. W. B., Collins, D., Kilgannon, A. K., & Hopkins, D. L. (2018). The effect of technical
replicate (repeats) on Nix Pro Color Sensor™ measurement precision for meat: A case-study on
aged beef colour stability. Meat Sci, 135, 42–45. https://doi.org/10.1016/j.meatsci.2017.09.001
Mancini, M., Weindorf, D. C., Monteiro, M. E. C., et al. (2020). From sensor data to Munsell color
system: Machine learning algorithm applied to tropical soil color classification via Nix™ Pro
sensor. Geoderma, 375, 114471. https://doi.org/10.1016/j.geoderma.2020.114471
Moritsuka, N., Matsuoka, K., Katsura, K., et al. (2014). Soil color analysis for statistically
estimating total carbon, total nitrogen and active iron contents in Japanese agricultural soils.
Soil Sci Plant Nutr, 60, 475–485. https://doi.org/10.1080/00380768.2014.906295
Moritsuka, N., Kawamura, K., Tsujimoto, Y., et al. (2019). Comparison of visual and instrumental
measurements of soil color with different low-cost colorimeters. Soil Sci Plant Nutr, 65, 605–
615. https://doi.org/10.1080/00380768.2019.1676624
Munsell, A. H. (1905). A color notation, Copyright. Boston.
Munsell Color Copyright. (2021). https://munsell.com/about-munsell-color/. Accessed 1 Apr
2021.
Nix Sensor. (2021). https://www.nixsensor.com/. Accessed 2 Apr 2021.
Pallmann, P. (2017). Package: modest (Model-Based Dose-Escalation Trials).
Pedersen, T. L. (2020) Package ‘ patchwork ’: The Composer of Plots.
R Core Team. (2020). R: A Language and Environment for Statistical Computing.
Santos, R. D. dos., et al. (2015). Manual de descrição e coleta de solos no campo. 7a ed. Viçosa,
MG: Sociedade Brasileira de Ciência do Solo.
Santos, H. G., et al. (2018). Sistema Brasileiro de Classificação de Solos (5th ed.). Embrapa.
Schwartz, M. (2021). WriteXLS: Cross-Platform Perl Based R Function to Create Excel 2003
(XLS) and Excel 2007 (XLSX) Files.
Silva, S. H. G., Hartemink, A. E., Teixeira AF dos, S., et al. (2018). Soil weathering analysis using
a portable X-ray fluorescence (PXRF) spectrometer in an Inceptisol from the Brazilian Cerrado.
Appl Clay Sci, 162, 27–37. https://doi.org/10.1016/j.clay.2018.05.028
Wickham, H., & Bryan, J. (2019). Readxl: Read Excel Files.
Wickham, H., Chang, W., Henry, L., et al. (2020a). Package ‘ggplot2’: Create Elegant Data
Visualisations Using the Grammar of Graphics.
Wickham, H., Hester, J., & Francois, R. (2020b). Package ‘ readr ’: Read Rectangular Text Data.
Wickham, H., François, R, Henry, L., & Müller, K. (2021). Package ‘ dplyr ’.
Mapping Soil Salinity: A Case Study
from Marajó Island, Brazilian Amazonia
Thematic Session: Advances in Soil Sensing

Renata Jordan Henriques, Fábio Soares de Oliveira,
Carlos Ernesto Gonçalves Reynaud Schaefer, Márcio Rocha Francelino,
Eduardo Osório Senra, Valéria Ramos Lourenço, David Lukas de Arruda,
and Paulo Roberto Canto Lopes

1 Introduction

The Marajó archipelago, at the mouth of the Amazon River in northern Brazil (between Pará
and Amapá States), covers more than 30,000 km² (Lisboa, 2012), with large areas of
well-drained uplands (10–35 m a.s.l.) under rainforest and waterlogged lowlands
(1–9 m a.s.l.) (Henriques et al., 2022). Marajó Island comprises a complex
geoenvironmental ecosystem whose lowlands include extensive saline
herbaceous plains (IBGE, 2003; Cohen et al., 2008; Henriques, 2022; Henriques
et al., 2022). The aim of this study is to better understand the largest salt-plains
of Marajó Island and their ecosystem dynamics. This study is aligned with the 2030
Agenda for Sustainable Development (United Nations, 2018). At the heart of this
agenda are the 17 Sustainable Development Goals (SDGs), which encourage all
countries to discuss actions on, above all, climate change (United Nations, 2018,
2020).
Salt accumulation on the earth’s surface results from physicochemical and
human processes and represents a considerable environmental hazard in some places
(Gorji et al., 2017). Remote sensing can identify surface salinity and its
particularities through the analysis of spectral signatures and band algebra on
sensor images (Wu et al., 2018). Marajó Island is one of the largest fluvio-marine
archipelagos worldwide, with extensive areas of salt-affected soils in Brazil. Most
salt-affected areas are in arid and semi-arid regions (Marazuela et al., 2019), but they also occupy

R. J. Henriques (✉) · F. S. de Oliveira
Department of Geography, Federal University of Minas Gerais, Belo Horizonte, MG, Brazil
C. E. G. R. Schaefer · M. R. Francelino · E. O. Senra · V. R. Lourenço · D. L. de Arruda
Department of Soil Science, Federal University of Viçosa, Viçosa, MG, Brazil
P. R. C. Lopes
University Campus of Castanhal, Federal University of Pará, Castanhal, PA, Brazil

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
W. de Carvalho Junior et al. (eds.), Pedometrics in Brazil, Progress in Soil Science,
https://doi.org/10.1007/978-3-031-64579-2_16

Fig. 1 Marajó Island in the Amazon River mouth, northern Brazil

humid regions (wetlands), such as the Pantanal biome in Brazil (Furquim et al.,
2017), and coastlines around the world (Yu et al., 2014; Wu et al., 2018). An excess
of soluble salts (saline soils), the dominance of exchangeable sodium in the soil
exchange complex (sodic soils), or both combined (saline-sodic soils) can affect
several chemical and physical soil properties over time, besides
inducing degradation processes such as erosion, vegetation stress, and economic
damage (Metternicht & Zinck, 2003; Gorji et al., 2017).
Soil salinity is influenced mainly by soil texture, groundwater fluctuations, and
capillary processes that move salts upward through the soil profile as water evaporates
at the surface (Li et al., 2014). Saline soils on coastlines worldwide are commonly
associated with seawater fluctuation. On Marajó Island, there is evidence of
seawater intrusion through the eastern marine platform, possibly with tectonic influence,
associated with strong local evapotranspiration (Rossetti et al., 2013a, b) and zones
of improper soil and water management (Wu et al., 2018). The salt-plains were
characterized and observed in the field as extensive plains subject to waterlogging
(Henriques et al., 2022). In this context, the aim of this work is to present
the applicability of spectral indices for recognizing land with surface salinity
using Sentinel-2 images. The study responds to the need to increase
knowledge of soil salinity in Brazil, especially its particularities in the
fluvio-marine ecosystem, in support of the PronaSolos Brazil Soil Project. As the main
guiding hypothesis, the study area (Fig. 1) has the potential to reveal major cycles
of marine transgression and regression in past geological periods.

2 Methodology

The study area on Marajó Island was visited in December 2018, during summer
(Fig. 1). The study was developed in three main stages: (i) field work for soil
sampling and in situ landscape observations; (ii) laboratory procedures to obtain
the electrical conductivity (ECs) of samples from the most superficial soil horizon; and
(iii) digital processing of Sentinel-2A images to obtain the spectral signature values
of the surface. In the first stage, the most superficial soil horizon was sampled
in different environments: P1 in non-saline land, and P2, P3, and
P4 in a domain of apicums and salt-plains, with P4 in a mangrove
ecosystem domain (Henriques et al., 2022; Henriques, 2022). Point P5 was identified
on the Sentinel-2 image, which has a spatial resolution of 10 m. The study area covers
59.1 km² in the easternmost part of Marajó Island, near Pará State (Fig. 1), and is
considered an important ecosystem that preserves Holocene indicators of the regional
evolution of the Amazon River drainage basin on the South American plate.
The second stage consisted of obtaining the electrical conductivity by the saturated
paste extract method (ECs) (USDA, 2014) for the superficial soil
horizons sampled at P1, P2, P3, and P4. The samples were air-dried and passed
through a 2 mm sieve to obtain the fine-earth fraction, known as air-dried fine earth
(ADFE) (Schoeneberger et al., 2012; Soil Survey Staff, 2014), or terra fina seca
ao ar (TFSA) in Portuguese (Teixeira et al., 2017). For each sample, 100 g of ADFE
were weighed, and deionized water was added until a homogeneous paste formed that
slid freely off the spatula (USDA, 2014). The saturated pastes of P1 to P4
were left overnight, and ECs was measured the next day. The values
are expressed in mS cm⁻¹.
In the last stage, salinity indices were applied following Rouse et
al. (1973), Khan et al. (2001), Fourati (2015), and Wang et al. (2019). Before
applying these indices, the Sentinel-2 image was obtained from the Copernicus
database (ESA, 2015) and atmospherically corrected using the Sen2Cor
plugin according to Louis et al. (2016) and Main Knor (2017). From the corrected
Sentinel-2 image, we computed the Normalized Difference Vegetation Index (NDVI)
(Rouse et al., 1973) and the Normalized Difference Salinity Index (NDSI re1)
(Wang et al., 2019), which are suited to identifying spectral information
on vegetation cover and potential saline occurrence. The Intensity Index
(Int1) (Fourati, 2015), Salinity Index 1 (SI1) (Khan et al., 2001), and Salinity
Index red-edge 1 (SI1 re1) (Wang et al., 2019), in turn, are more specific remote
sensing indices for assessing how much salt the observed environment contains
(Table 1).
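As an illustration of how the indices in Table 1 are evaluated, the sketch below computes four of them for a single pixel from surface reflectance values. It is a minimal Python example, not the processing chain used in the study: the function names and the reflectance values are hypothetical, the band assignment (green = B3, red = B4, red-edge 1 = B5, NIR = B8A) follows Table 1, and SI1 re1 is left out because the general and Sentinel-2 forms given for it in Table 1 differ.

```python
import math


def normalized_difference(a, b):
    """Return (a - b) / (a + b), or NaN when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else float("nan")


def ndvi(red, nir):
    """Normalized Difference Vegetation Index: (NIR - R) / (NIR + R)."""
    return normalized_difference(nir, red)


def ndsi_re1(red_edge1, nir):
    """NDSI red-edge 1: (red-edge1 - NIR) / (red-edge1 + NIR)."""
    return normalized_difference(red_edge1, nir)


def int1(green, red):
    """Intensity Index 1: (G + R) / 2."""
    return (green + red) / 2


def si1(green, red):
    """Salinity Index 1: square root of G x R."""
    return math.sqrt(green * red)


# Hypothetical reflectances for one pixel (assumed values, 0 to 1 scale).
pixel = {"B3": 0.10, "B4": 0.20, "B5": 0.30, "B8A": 0.50}

print("NDVI    ", round(ndvi(pixel["B4"], pixel["B8A"]), 4))
print("NDSI re1", round(ndsi_re1(pixel["B5"], pixel["B8A"]), 4))
print("Int1    ", round(int1(pixel["B3"], pixel["B4"]), 4))
print("SI1     ", round(si1(pixel["B3"], pixel["B4"]), 4))
```

The same expressions apply element by element when the inputs are whole raster bands instead of single values.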
The five spectral indices applied to P1 to P5 were represented in separate
maps (Fig. 2) to better compare the results of each index with the
respective environment recognized during the 2018 field work. The maps were standardized
to the delimited study area (Fig. 1) and classified into four classes: (i)
flooded areas with presence of water; (ii) areas of significant vegetation cover, which

Table 1 Spectral indices of salt-affected soils
Index | Equation | Sentinel-2 equation | Reference
NDSI re1 (Normalized Difference Salinity Index red edge1) | (red-edge1 − NIR)/(red-edge1 + NIR) | (B5 − B8A)/(B5 + B8A) | Wang et al. (2019)
Int1 (Intensity Index1) | (G + R)/2 | (B3 + B4)/2 | Fourati (2015)
SI1 (Salinity Index1) | (G × R)^0.5 | (B3 × B4)^0.5 | Khan et al. (2001)
SI1 re1 (Salinity Index red edge1) | (B + red-edge1)^0.5 | (B3 × B5)^0.5 | Wang et al. (2019)
NDVI (Normalized Difference Vegetation Index) | (NIR − R)/(NIR + R) | (B8A − B4)/(B8A + B4) | Rouse et al. (1973)

Fig. 2 Salt-affected soils in the eastern of Marajó Island, Brazil

makes it difficult to obtain surface salinity data; and (iii, iv) two classes of surface
salinity defined according to the spectral indices, visual analysis, and
the ECs results of the sampled soils.

3 Results and discussion

The P1 soil sample had an electrical conductivity (ECs) of 0.09 mS cm⁻¹, while
P2 had 16.2 mS cm⁻¹, P3 had 6.1 mS cm⁻¹, and P4 had 5.03 mS cm⁻¹. Point P5,
identified on the Sentinel-2 image, presented 4.6 mS cm⁻¹. The Normalized Difference
Vegetation Index (NDVI) gave the best results for vegetation cover at the
surface, with strong surface reflectance in the northern zone and in smaller areas in the
south (Fig. 2). The stressed vegetation class in the NDVI map (Fig. 2) corresponds
to the largest areas of herbaceous plains with halophytes, which are
salt-tolerant plants adapted to saline semi-deserts and coastal environments with
mangroves (Ben Hamed, 2022). The NDVI, a traditional spectral index used to diagnose
vegetation health, also highlights the most exposed parts of the study
area. Compared with NDVI, the Normalized Difference Salinity Index (NDSI re1)
is a better spectral diagnostic of salinity and, for eastern Marajó Island, reveals
significant values in the northern portions (Fig. 2).
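To make the contrast between P1 and the other points explicit, the reported ECs values can be screened against a salinity cutoff. The sketch below is illustrative only: the chapter does not state a threshold, so the conventional cutoff of 4 dS m⁻¹ for saturated-paste extracts is an assumption introduced here (1 mS cm⁻¹ equals 1 dS m⁻¹), and the names `SALINE_THRESHOLD` and `salinity_label` are hypothetical.

```python
# ECs values reported in the text, in mS/cm (numerically equal to dS/m).
ecs = {"P1": 0.09, "P2": 16.2, "P3": 6.1, "P4": 5.03, "P5": 4.6}

# Assumed cutoff: the conventional 4 dS/m limit for saline soils.
# This value is not given in the chapter.
SALINE_THRESHOLD = 4.0


def salinity_label(ec_value, threshold=SALINE_THRESHOLD):
    """Label a saturated-paste EC value as 'saline' or 'non-saline'."""
    return "saline" if ec_value >= threshold else "non-saline"


labels = {point: salinity_label(value) for point, value in ecs.items()}

for point in sorted(labels):
    print(point, ecs[point], labels[point])
```

Under this assumed cutoff, only P1 is labelled non-saline, which agrees with its description as non-saline land. The screen uses conductivity alone and does not separate saline from saline-sodic soils, which would also require exchangeable sodium data.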
Visual analysis of the Sentinel-2 image corroborated the NDSI re1 results,
which delineated the most extensive plains with sparse vegetation. While
the interpretation of NDVI and NDSI re1 also depends on
patterns of vegetation occurrence, the Intensity Index 1 (Int1), Salinity Index 1 (SI1),
and Salinity Index red-edge 1 (SI1 re1) gave their best results, and could be validated,
only inside the delimited area outlined in red in Fig. 2.
Int1 and SI1 showed more areas flooded
with water in the south of the study area, near P1 (Fig. 2). SI1 re1 produced
the most generalized values for the study area and was also the index
most easily confused with the salinity of sea water. In conclusion,
the five indices combined revealed extensive saline areas
in eastern Marajó Island, with important salt-plains in the north of the study area
(Fig. 2).
The P1 and P2 soil samples and landscape observations allow a comparison
between a non-saline area (P1) and an area affected by salt (P2) (Fig. 3).
The spectral signature of each point (P1 to P5) showed considerable vegetation
influence, with strong absorption in Sentinel-2 bands B2, B3, and
B4 (blue, 490 nm; green, 560 nm; red, 665 nm) and
high reflectance in B7 (vegetation red edge, 783 nm), B8 (NIR, 842 nm), and B8A
(vegetation red edge, 865 nm), a pattern that commonly characterizes NDVI results for
healthy vegetation under little water stress. P2 and P3 showed a peak at B11
(SWIR, 1610 nm), a specific spectral signature of the most saline-sodic
soils analysed, with 16.2 and 6.1 mS cm⁻¹, respectively. P4 (5.03 mS cm⁻¹)
and P5 (4.6 mS cm⁻¹) showed lower reflectance at B12 (SWIR, 2190 nm), with high
values at B8 (NIR, 842 nm) (Fig. 3).
The spectral indices indicated elevated salt concentrations in almost all of the
59.1 km² of the study area (Figs. 2 and 3) and could potentially be applied to the
entire lowlands (salt-plains and apicums) of Marajó Island, which total more than
5,000 km² (Henriques et al., 2022). In eastern Marajó Island, we identified

Fig. 3 Spectral signature and the example of non-saline (P1) and salt-affected soils and plains
(P2)

Fig. 4 Marajó Island salt-plains and apicums, with the halophyte Sesuvium portulacastrum

surface salt distribution patterns, with a predominance of herbaceous plains in
the north and a possible Holocene connection with the waters of Marajó Bay. As a
possible path of landscape evolution, the southern zone of the study area shows
a predominance of salt dispersed within the soils and less waterlogging
than the north.
The main indicators of salinity observed in the field are the extensive saline plains
associated with halophyte plants, such as Sesuvium portulacastrum, and with plants
common in semi-arid regions, such as the palm Copernicia prunifera (Miller) H.E.
Moore (Arecaceae) (carnaúba) (Fig. 4). In general, the area has extensive apicums
associated with Marajó’s mangrove ecosystem (Cohen et al., 2008; França et al.,
2019), which makes the landscape very fragile in the face of climate change and the
sea-level rise predicted for the coming decades (Vousdoukas et al., 2020). The spectral image
signatures and the saline indices applied attained better results where the salt is most
concentrated near the surface.
P3 (Fig. 4) is representative of the landscapes sampled. The salt-plains and
apicums show extensive zones of water accumulation, together with the dynamics of
the adjacent river, which is bordered by mangrove with Avicennia. The river
margins are influenced by the tides of Marajó Bay and show
geomorphological evidence of this influence (Fig. 4). The halophytes (Sesuvium
portulacastrum) found during field work are plants specifically adapted to saline areas
(Ben Hamed, 2022) and may indicate a long period of stability on the island. The north of
the study area was analysed mainly by remote sensing because the river’s dynamic
processes make access difficult. Nevertheless, the salinity
indices, NDVI, and NDSI re1 indicate a larger salt-plain there than
the salt-plains and apicums of the south (Fig. 4). Visually, the north is
characterized by an extensive plain with the potential for more complex fluvio-marine
dynamics.
The spatial distribution of the salt-plains suggests a closer connection with the
waters of Marajó Bay, with signs of an extensive transgression that could have covered
the entire study area. This interpretation is corroborated by field
observations, mainly Soil Profile P8 in Henriques et al.
(2022), in which a marked stratigraphic discontinuity, represented by a buried
subsurface mangrove layer followed by a sandy horizon and an organic topsoil horizon
near the surface, may be evidence of an ancient period (Holocene
or earlier) when the area was covered by the ocean and associated with a complex
sand-dune ecosystem. Considering the time of soil formation in tropical zones, this
transgression could have occurred in the Holocene-Pleistocene (Quaternary) or in earlier
Cenozoic epochs (Pliocene or Miocene, more than 5 Ma). As Marajó Island is part of
the Equatorial Atlantic Margin, formed in the context of the fragmentation of Pangea in the
Mesozoic (ca. 130 Ma) (Zalán, 2007, 2012), the evolution of the island produced a
complex ecosystem that needs more landscape studies in diverse fields,
such as tectonic dynamics (Rossetti et al., 2007, 2013b; Castro et al., 2010), local
biomes such as mangroves, apicums, and the herbaceous salt-plains (Cohen & Lara,
2003; Cohen et al., 2005), and climate change (Marengo & Souza Jr.,
2018; Viola & Franchini, 2018; Prasad & Pietrzykowski, 2020).

4 Conclusions

The salinity indices were satisfactory for analysing the salt distribution patterns in
the eastern plains of Marajó Island. The results indicate that the Normalized Difference
Vegetation Index (NDVI) and the Normalized Difference Salinity Index red-edge 1
(NDSI re1) are important satellite image indicators of vegetation
health and of the reflectance of the salt-plains. The more
specific salinity indices (Intensity Index, Int1; Salinity Index 1, SI1; and Salinity
Index red-edge 1, SI1 re1) performed well in identifying the herbaceous salt-plains of
the study area with presence of Sesuvium portulacastrum. The spectral signature
of each point (P1 to P5) corroborates the strong presence of salt in the soils.
The field work was important for observing the main landscape characteristics, which
helped identify the complex ecosystem formed by the interactions between the
waters of the Amazon River and the Atlantic Ocean, associated with the evolution
of the Equatorial Atlantic Margin of Brazil. In conclusion, the study showed
the great potential of remote sensing for recognizing saline and saline-sodic areas in
Brazil, especially in a complex fluvio-marine ecosystem such as the Marajó archipelago
at the mouth of the Amazon River, in contribution to the PronaSolos Brazil Soil
Project. In this context, eastern Marajó Island shows great
geodiversity, with signs of extensive marine water incursion, tidal effects, rainfall
accumulation, tectonic dynamics, and intensive evapotranspiration, and is exposed to
the sea-level rise predicted for the coming decades.

Acknowledgements National Council for Scientific and Technological Development (CNPq,
Brazil) Project Number 27/2017, and Coordination for the Improvement of Higher Education
Personnel (CAPES, Brazil).

References

Ben Hamed, K. (2022). Responses of halophytes to nitric oxide (NO). In V. P. Singh, S. Singh,
D. K. Tripathi, & M. C. Romero-Puertas (Eds.), Nitric Oxide in Plant Biology (pp. 391–406).
Elsevier.
Castro, D. F., de Fátima Rossetti, D., & Ruiz Pessenda, L. C. (2010). Facies, δ13C, δ15N and C/N
analyses in a late Quaternary compound estuarine fill, northern Brazil and relation to sea level.
Marine Geology, 274(1–4), 135–150. https://doi.org/10.1016/j.margeo.2010.03.011
Cohen, M. C. L., Behling, H., & Lara, R. J. (2005). Amazonian mangrove dynamics during the
last millennium: The relative sea-level and the Little Ice Age. Review of Palaeobotany and
Palynology, 136(1–2), 93–108. https://doi.org/10.1016/j.revpalbo.2005.05.002
Cohen, M. C. L., & Lara, J. (2003). Temporal changes of mangrove vegetation boundaries
in Amazonia: Application of GIS and remote sensing techniques. Wetlands Ecology and
Management, 11, 223–231.
Cohen, M. C. L., Lara, R. J., Smith, C. B., Angélica, R. S., Dias, B. S., & Pequeno, T. (2008).
Wetland dynamics of Marajó Island, northern Brazil, during the last 1000 years. Catena (Amst),
76(1), 70–77. https://doi.org/10.1016/j.catena.2008.09.009
ESA. (2015). ESA Introducing Sentinel-2.
Fourati, H. (2015). Modeling of soil salinity within a semi-arid region using spectral analysis.
Arabian Journal of Geosciences, 8, 1–8.
França, M. C., Cohen, M. C. L., Pessenda, L. C. R., Francisquini, M. I., de Ribeiro, C. M. J., &
de Oliveira, T. R. (2019). Tannin as a New Indicator of Paleomangrove Occurrence within an
Amazonian Coastal Region. Journal of Coastal Research, 35(1), 82.
https://doi.org/10.2112/jcoastres-d-17-00023.1
Furquim, S. A. C., Santos, M. A., Vidoca, T. T., de Almeida Balbino, M., & Cardoso, E. L. (2017).
Salt-affected soils evolution and fluvial dynamics in the Pantanal wetland, Brazil. Geoderma,
286, 139–152. https://doi.org/10.1016/j.geoderma.2016.10.030
Gorji, T., Sertel, E., & Tanik, A. (2017). Monitoring soil salinity via remote sensing technology
under data scarce conditions: A case study from Turkey. Ecological Indicators, 74, 384–391.
https://doi.org/10.1016/j.ecolind.2016.11.043
Henriques, R. J. (2022). Geoambientes, geoarqueologia e cenários de mudanças climáticas na Ilha
de Marajó. Universidade Federal de Minas Gerais.
Henriques, R. J., Oliveira, F. S., Schaefer, C. E. G. R., Francelino, M. R., Lopes, P. R. C.,
Senra, E. O., & Lourenço, V. R. (2022). Soils and landscapes of Marajó Island, Brazilian
Amazonia: Holocene evolution, geoarchaeology and climatic vulnerability. Environmental
Earth Sciences, 81(9), 25. https://doi.org/10.1007/s12665-022-10310-2
IBGE. (2003). Mapas da Amazônia Legal escala 1:250.000. Instituto Brasileiro de Geografia e
Estatística.
Khan, N. N., Rastoskuev, V. V., Shalina, E. V., & Sato, Y. (2001). Mapping salt-affected soils using
remote sensing indicators: A simple approach with the use of GIS IDRISI. In 22nd Asian
conference on remote sensing (pp. 25–29).
Li, X., Chang, S. X., & Salifu, K. F. (2014). Soil texture and layering effects on water and salt
dynamics in the presence of a water table: A review. Environmental Reviews, 22, 41–50.
https://doi.org/10.1139/er-2013-0035
Lisboa, P. L. B. (2012). A terra dos Aruãs: Uma História Ecológica Do Arquipélago Do Marajó.
Museu Paraense Emílio Goeldi.
Louis, J., Debaecker, V., Pflug, B., Main-Knorn, M., Bieniarz, J., Mueller-Wilm, U., Cadau, E.,
& Gascon, F. (2016). Sentinel-2 Sen2Cor: L2A processor for users. European Space Agency,
(Special Publication) ESA SP SP-740(May):9–13.
Main Knor, M. (2017). Sen2Cor for Sentinel-2. In Conference paper (pp. 1–12).
Marazuela, M. A., Vázquez-Suñé, E., Ayora, C., García-Gil, A., & Palma, T. (2019). Hydrodynam-
ics of salt flat basins: The Salar de Atacama example. Science of the Total Environment, 651,
668–683. https://doi.org/10.1016/j.scitotenv.2018.09.190
Marengo, J. A., & Souza, Jr. C. (2018). Climate Change: Impacts and scenarios for the Amazon
(1st ed.). São Paulo.
Metternicht, G. I., & Zinck, J. A. (2003). Remote sensing of soil salinity: Potentials and constraints.
Remote Sensing of Environment, 85(1), 1–20. https://doi.org/10.1016/S0034-4257(02)00188-8
Prasad, M. N. V., & Pietrzykowski, M. (2020). Climate change and soil interactions (1st ed.).
Elsevier.
Rossetti, D. F., Valeriano, M. M., & Thales, M. (2007). An abandoned estuary within Marajó
Island: Implications for late quaternary paleogeography of northern Brazil. Estuaries and
Coasts, 30(5), 813–826. https://doi.org/10.1007/BF02841336
Rossetti, D. F., Rocca, R. R., & Tatumi, S. H. (2013a). Evolução dos Sedimentos Pós-Barreiras na
zona costeira da Bacia São Luís, Maranhão, Brasil. Boletim do Museu Paraense Emílio Goeldi
Ciências Naturais, 8(1), 11–25.
Rossetti, D. F., Bezerra, F. H. R., & Dominguez, J. M. L. (2013b). Late oligocene-miocene
transgressions along the equatorial and eastern margins of Brazil. Earth-Science Reviews, 123,
87–112.
Rouse, J., Haas, R., Schell, J., & Deering, D. (1973). Monitoring vegetation systems in the great
plains with ERTS.
Schoeneberger, P. J., Wysocki, E. C., Benham, E. C., & Staff SS. (2012). Field book for describing
and sampling soils. National Soil Survey Center.
Soil Survey Staff. (2014). Keys to soil taxonomy, 12th ed.
Teixeira, P. C., Donagemma, G. K., Fontana, A., & Teixeira, W. G. (2017). Manual de métodos de
análise de solo.
United Nations. (2018). The 2030 agenda and the sustainable development goals: An opportunity
for Latin America and the Caribbean, LC/G,2681. Santiago.
United Nations. (2020). Agenda 2030 para o desenvolvimento sustentável. sustainabledevelop-
ment.un.org
USDA. (2014). Soil survey field and laboratory methods manual.
Viola, E., & Franchini, M. (2018). Brazil and climate change: Beyond the Amazon. Routledge.
Vousdoukas, M. I., Ranasinghe, R., Mentaschi, L., Plomaritis, T. A., Athanasiou, P., Luijendijk,
A., & Feyen, L. (2020). Sandy coastlines under threat of erosion. Nature Climate Change, 10,
260–263.
Wang, J., Ding, J., Yu, D., Ma, X., Zhang, Z., Ge, X., Teng, D., Li, X., Liang, J., Lizaga, I., Chen,
X., Yuan, L., & Guo, Y. (2019). Capability of Sentinel-2 MSI data for monitoring and mapping
of soil salinity in dry and wet seasons in the Ebinur Lake region, Xinjiang, China. Geoderma,
353(May), 172–187. https://doi.org/10.1016/j.geoderma.2019.06.040
Wu, S., Wang, C., Liu, Y., Li, Y., Liu, J., Xu, A., Pan, K., Li, Y., & Pan, X. (2018). Mapping the salt
content in soil profiles using Vis-NIR hyperspectral imaging. Soil Science Society of America
Journal, 82(5), 1259–1269. https://doi.org/10.2136/sssaj2018.02.0074
Yu, J., Li, Y., Han, G., Zhou, D., Fu, Y., Guan, B., & Wang, G. (2014). The spatial distribution
characteristics of soil salinity in coastal zone of the Yellow River Delta. Environment and Earth
Science, 72, 589–599. https://doi.org/10.1007/s12665-013-2980-0
Zalán, P. V. (2007). Evolução Fanerozóica das Bacias Sedimentares Brasileiras. In Geologia da
Plataforma Sul-Americana (pp. 595–613).
Zalán, P. V. (2012). Bacias sedimentares da margem equatorial. In Y. Hasui, C. D. R. Carneiro, F.
F. M. de Almeida, & A. Bartorelli (Eds.), Geologia do Brasil (pp. 497–501). Beca.
Applied Morphometry to Digital Soil
Mapping in Detailed Scale
Thematic Session: Pedometrics Guidelines to
Systematic Soil

Gustavo Souza Valladares, Waldir de Carvalho Junior,
and Helena Saraiva Koenow Pinheiro

1 Introduction

In recent decades, advances in computer science and technology have made a large
number of tools available for the analysis and interpretation of data in different
areas of knowledge. Soil science has taken part in this trend through data analysis
techniques and tools such as data mining and machine learning, geographic
information systems (GIS), global positioning systems (GPS), remote and proximal
sensors, digital terrain models (DTM), and digital elevation models (DEM). The
global advance of this technology has driven the development of digital soil
mapping, with research and products that have improved over the last two to three
decades (McBratney et al., 2003).
In soil surveys, topography is an important environmental characteristic for
identifying and mapping soil units at different scales, and for understanding
soil-landscape relationships. Pereira et al. (2022) cite several pedological and
geomorphological studies that relate soil types to the landscape at different
scales. Depending on the researcher’s interest, the terminology used is Soil
geomorphology or Pedogeomorphology (Zinck et al., 2015; Daniels et al., 1971;
Conacher & Dalrymple, 1977). If the interest is in soil genesis, evolution,
spatial distribution, or cartography, the terms Morphopedology,
Geomorphopedology, or Geopedology can be used (Tricart & Kilian, 1979; Zinck, 2013; Zinck

G. S. Valladares (✉)
Federal University of Piaui, Teresina, Piauí, Brazil
e-mail: [email protected]
W. de Carvalho Junior
Embrapa Soils, Rio de Janeiro, Brazil
H. S. K. Pinheiro
Federal Rural University of Rio de Janeiro, Rio de Janeiro, Brazil

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 235
W. de Carvalho Junior et al. (eds.), Pedometrics in Brazil, Progress in Soil Science,
https://doi.org/10.1007/978-3-031-64579-2_17

et al., 2015). Geopedology is in the first instance a methodological approach to
soil survey, while at the same time providing a framework for geographic analysis
of soil distribution patterns (Zinck et al., 2015).
GIS techniques based on a DTM can be used to create primary and secondary
topographic attributes, such as slope and plan curvature, which Silveira et al.
(2013) integrated into an artificial neural network to map soils at a detailed
scale (1:10,000). Meier et al. (2018) reported satisfactory performance in digital
soil mapping with machine learning models in a tropical mountainous area, based
on topographic and remote sensing attributes. Terrain attributes and a machine
learning algorithm (randomForest) were applied to predict soil classes in
southeastern Brazil (Sena et al., 2020). Carvalho Júnior et al. (2020) found that
randomForest was the most efficient of the machine learning models compared for
mapping soils in the mountainous region of the state of Rio de Janeiro.
This work applied a digital soil mapping methodology at the experimental center
of “APTA-Frutas” in Jundiaí, SP, using morphometric parameters and machine
learning (randomForest). The aim was to develop a methodology for obtaining a
preliminary (digital) soil map that guides pedologists in the field when
delineating soil map units and supports soil sampling. This map was compared with
a map produced by conventional pedological methods at a detailed scale.

2 Material and Methods

The study area covers 59 hectares and is located in Jundiaí, approximately 75 km
northwest of São Paulo, Brazil, in mountainous relief on the Atlantic Plateau (Fig.
1). Annual precipitation is 1409 mm, with the rainiest months between October and
March. Land use is predominantly apple, vineyard, peach, and citrus, together with
natural vegetation (Atlantic Forest).
The original soil map of the area was made at a 1:10,000 scale (Valadares et al.,
1971). It was digitized in a GIS, and the corresponding map legend adopted the
World Reference Base for Soil Resources (WRB) classification (ISSS 2022).
A digital terrain model (DTM) with 4 m spatial resolution (Fig. 2a) was generated
with the Topo to Raster tool in ArcGIS 10.3, based on the 1:10,000 vector map
(Melo & Lombardi Neto, 1999) with contour lines at 5 m intervals. From the DTM,
the covariates curvature, slope, and distance from the streams (Fig. 2b, c and d)
were derived in ArcGIS.
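As a rough illustration of how a slope covariate can be derived from a gridded DTM, the sketch below computes slope in percent with simple central differences. This is not the algorithm used by ArcGIS in the study; the function name `slope_percent` and the small elevation grid are hypothetical, and only the 4 m cell size comes from the text.

```python
import math

def slope_percent(dtm, cell_size):
    """Slope (%) for the interior cells of a row-major elevation grid.

    Uses plain central differences; edge cells are left as None.
    """
    rows, cols = len(dtm), len(dtm[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Elevation gradients along the x (columns) and y (rows) directions
            dz_dx = (dtm[r][c + 1] - dtm[r][c - 1]) / (2 * cell_size)
            dz_dy = (dtm[r + 1][c] - dtm[r - 1][c]) / (2 * cell_size)
            out[r][c] = 100.0 * math.hypot(dz_dx, dz_dy)
    return out

# Hypothetical 3 x 3 window (elevations in m) on a 4 m grid:
# a plane rising 1 m per cell toward the east.
dtm = [
    [700.0, 701.0, 702.0],
    [700.0, 701.0, 702.0],
    [700.0, 701.0, 702.0],
]
slope = slope_percent(dtm, cell_size=4.0)
print(slope[1][1])  # 25.0
```

A rise of 1 m over each 4 m cell corresponds to a 25% slope, which is what the central cell returns.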
Soil orders were identified at 104 sampled points (Fig. 1a and b), and a 10 m
buffer was created around the sample points to obtain a database of 2329 points
labeled with the defined soil classes. The database was split into training (70%)
and testing (30%) sets; the training set was used to calibrate the model and the
testing set to validate it and obtain the accuracy indexes. The morphometric
information was related to the soil classes, which made it possible to build

Fig. 1 Location of the study area at CAPTA-Frutas, Jundiaí, São Paulo State, Brazil, and sample points

a randomForest model (Breiman, 2002; Liaw & Wiener, 2002). Machine learning
modeling and database processing were performed in R (R Core Team, 2022).
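The 70/30 partition described above can be sketched as a seeded random split. The study did this in R; the Python function below, its name `split_train_test`, and the seed are illustrative assumptions, and the integer identifiers simply stand in for the 2329 database points.

```python
import random

def split_train_test(records, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records and split it into training and testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Placeholder identifiers for the 2329 points in the database
points = list(range(2329))
train, test = split_train_test(points)
print(len(train), len(test))  # 1630 699
```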
Random forest is a classifier consisting of a collection of decision trees, each
built from an independent, identically distributed random vector, in which every
tree casts a unit vote for the most popular class at input x. It is also
user-friendly, since only two parameters need to be set to fit the models (the
number of variables in the random subset at each node and the number of trees in
the forest), and it is usually not very sensitive to outliers (Carvalho Júnior et
al., 2020).
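The voting step can be illustrated in isolation. The sketch below assumes the individual trees have already produced a class for one location and only shows how the unit votes are tallied; the vote counts are invented for illustration, and `forest_vote` is a hypothetical helper, not part of the randomForest package.

```python
from collections import Counter

def forest_vote(tree_predictions):
    """Return the majority class and its share of the unit votes."""
    counts = Counter(tree_predictions)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(tree_predictions)

# Hypothetical votes from a forest of 500 trees for a single location
votes = ["Acrisol"] * 290 + ["Cambisol"] * 160 + ["Gleysol"] * 50
label, share = forest_vote(votes)
print(label, share)  # Acrisol 0.58
```

The vote share gives a simple indication of how strongly the forest agrees on the predicted class at that location.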
The digital map and the model accuracy were validated with the Kappa index (K)
and overall accuracy. The digital map was also compared with the traditional soil
map, and the Kappa index of their agreement was obtained.
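For reference, overall accuracy and the Kappa index can both be computed from a confusion matrix, as in the minimal sketch below. The two-class matrix is hypothetical and is not taken from the study; the formula is the standard Cohen's kappa, with chance agreement estimated from the row and column totals.

```python
def accuracy_and_kappa(matrix):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    Rows are reference classes and columns are mapped (predicted) classes.
    """
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance, from the marginal totals
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical confusion matrix for two soil classes
confusion = [
    [40, 10],
    [5, 45],
]
accuracy, kappa = accuracy_and_kappa(confusion)
print(round(accuracy, 2), round(kappa, 2))  # 0.85 0.7
```

Kappa is lower than overall accuracy here because half of the agreement in this example would be expected by chance alone.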

3 Results and Discussion

Elevation ranges from 690 to 757 m (Fig. 2a), and Fig. 3a shows its frequency
distribution. All the Dystric Gleysols and part of the Orthic Acrisols occur
within the 690–703 m elevation range, while the Dystric Cambisols and the Xanthic
Ferralsols occur above 730 m. Between 704 and 730 m, the Orthic Acrisols, Dystric
Cambisols, and Xanthic Ferralsols are all common. The soil types therefore cannot
be differentiated exactly using only the altimetry map.

Fig. 2 Maps derived from the DTM of CAPTA-Frutas, Jundiaí, SP, Brazil. (a) altimetry and
hydrography; (b) curvature; (c) slope; (d) distance from the streams

Fig. 3 Frequency distribution of morphometric attributes of CAPTA-Frutas, Jundiaí, SP, Brazil. (a)
elevation; (b) curvature; (c) slope; and (d) distance from the streams

Table 1 Descriptive statistics for soil classes and covariates

Parameter                   Dystric Cambisols   Dystric Gleysols   Xanthic Ferralsols   Orthic Acrisols
Altimetry (m)
  Minimum                   696                 691                709                  691
  Average                   728                 699                729                  709
  Maximum                   757                 718                754                  735
  sd                        13                  6                  13                   11
Slope (%)
  Minimum                   0                   0                  3                    0
  Average                   22                  10                 12                   17
  Maximum                   73                  50                 34                   51
  sd                        12                  10                 5                    9
Curvature
  Minimum                   −14.39              −7.24              −6.17                −10.52
  Average                   0.18                −0.89              0.12                 −0.37
  Maximum                   13.06               2.96               7.97                 3.42
  sd                        1.19                1.22               0.99                 0.98
Distance from streams (m)
  Minimum                   7                   0                  165                  1
  Average                   163                 13                 281                  106
  Maximum                   361                 114                356                  303
  sd                        72                  21                 39                   78
sd standard deviation

The curvature map (Fig. 2b) varies from −14 (concave areas, <0) to 13 (convex
areas, >0). This wide range reflects the irregular land surface typical of a
mountainous area. The frequency distribution of curvature is shown in Fig. 3b.
Dystric Gleysols and Orthic Acrisols are commonly found in concave areas, while
Dystric Cambisols predominate in convex areas, where Xanthic Ferralsols are less
common. The Xanthic Ferralsols also occur in flattened areas, where curvature
values are close to zero.
In the study area, slopes vary from 0% to 73% (Fig. 2c). Soil occurrence is
related to the slope gradient (Fig. 3c): from the lowest to the highest slope, the
soils occur in the order Dystric Gleysols, Xanthic Ferralsols, Orthic Acrisols,
and Dystric Cambisols. Dystric Cambisols can also occur on gently sloping
hilltops, but this is a minority occurrence. The frequency distribution of the
distance from the streams is shown in Fig. 3d and has an approximately linear
form. Statistics for each soil class are presented in Table 1.
The randomForest model (ntree = 500 and mtry = 3) presented high accuracy
indexes. For the digital map evaluated with the training dataset, the accuracy was
98% and the K index was 0.97, which is considered an “almost perfect” result
according to Landis and Koch (1977). For the traditional map (Fig. 4a) and the sampling

Fig. 4 Soil map elaborated by traditional soil mapping (a); preliminary digital soil map derived
from morphometric attributes (b) of CAPTA-Frutas, Jundiaí, SP, Brazil

points used for training, 97% were classified correctly, with a K of 0.95,
a slightly worse result than that of the randomForest model.
The class with the worst classification in the traditional map was the Orthic
Acrisol (90%). This results from a very narrow polygon, which underestimates the
area occupied by this soil class. Assessing each class individually, the
traditional map was more successful for Dystric Cambisols and Xanthic Ferralsols,
and the digital map for Dystric Gleysols and Orthic Acrisols, with accuracy >97%
for these soil classes.
The conventional pedological map was overlaid on the digital soil map (Fig. 4),
giving an overall agreement of 84% and a K of 0.65. The agreement in area was 93%
for the Dystric Cambisol, 88% for the Xanthic Ferralsol, 71% for the Dystric
Gleysol, and 62% for the Orthic Acrisol. The Orthic Acrisol was confused with the
Dystric Cambisol, and the Dystric Gleysol with the Orthic Acrisol.

4 Conclusions

The proposed methodology was adequate for the preliminary mapping of some soil
types, namely Dystric Cambisol, Xanthic Ferralsol, and Dystric Gleysol, with area
equivalence greater than 70%. The methodology could be combined with
conventional soil survey techniques to obtain better results. The digital soil class
map is a useful guide for further soil sampling and for creating preliminary soil maps.

Acknowledgements The authors thank the Brazilian National Council for Scientific and
Technological Development (CNPq) and the São Paulo Research Foundation (FAPESP) for funding.

References

Breiman, L. (2002). RandomForests. Machine Learning, 45, 5–32.


Carvalho Junior, W. D., Pereira, N. R., Fernandes Filho, E. I., et al. (2020). Sample design effects
on soil unit prediction with machine: Randomness, uncertainty, and majority map. Revista
Brasileira de Ciência do Solo, 44. https://doi.org/10.36783/18069657rbcs20190120
Conacher, A. J., & Dalrymple, J. B. (1977). The nine unit landsurface model: An approach to
pedogeomorphic research. Geoderma Special Issue, 18, 1–2.
Daniels, R. B., Gamble, E. E., & Cady, J. G. (1971). The relation between geomorphology and soil
morphology and genesis. Advances in Agronomy, 23, 51–87.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data.
Biometrics, 33, 159–174.
Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2, 18–22.
McBratney, A. B., Mendonça-Santos, M. L., & Minasny, B. (2003). On digital soil mapping.
Geoderma, 117, 3–52.
Meier, M., Souza, E. D., Francelino, M. R., Fernandes Filho, E. I., & Schaefer, C. E. G. R. (2018).
Digital soil mapping using machine learning algorithms in a tropical mountainous area. Revista
Brasileira de Ciência do Solo, 42. https://doi.org/10.1590/18069657rbcs20170421
Melo, A. R., & Lombardi Neto, F. (1999). Planejamento Agroambiental do Centro Avançado de
Pesquisa do Agronegócio de Frutas. Campinas: IAC/APTA. (CD-ROM).
Pereira, M. G., Silva, R. C., Pinheiro Junior, C. R., et al. (2022). Soil genesis on the soft slopes
of ancient coastal plains, southeastern Brazil. Catena, 210, 105894. https://doi.org/10.1016/
j.catena.2021.105894
R Core Team. (2022). R: A language and environment for statistical computing. R Foundation for
Statistical Computing. https://www.R-project.org/
Sena, N. C., Veloso, G. V., Fernandes-Filho, E. I., et al. (2020). Analysis of terrain attributes
in different spatial resolutions for digital soil mapping application in southeastern Brazil.
Geoderma Regional, 21, e00268. https://doi.org/10.1016/j.geodrs.2020.e00268
Silveira, C. T., Oka-Fiori, C., Santos, L. J. C., et al. (2013). Soil prediction using artificial
neural networks and topographic attributes. Geoderma, 195, 165–172. https://doi.org/10.1016/
j.geoderma.2012.11.016
Tricart, J., & Kilian, J. (1979). L’éco-géographie et l’aménagement du milieu naturel. Maspero.
Valadares, J. M. A. S., Lepsch, I. F., & Küpper, A. (1971). Levantamento pedológico detalhado da
Estação Experimental de Jundiaí, SP. Bragantia, 30, 337–385.
Zinck, J. A. (2013). Geopedology, elements of geomorphology for soil and geohazard studies
(ITC Special Lecture Notes Series). ITC (Faculty of Geo-Information Science and Earth
Observation).
Zinck, J. A., Metternicht, G., Bocco, G., & Del Valle, H. F. (Eds.). (2015). Geopedology: An
integration of geomorphology and pedology for soil and landscape studies. Springer.
Prediction of Soil Carbon Stock in the
PIAUI State Coast by Remote Sensing

Mirya G. T. Portela, Gustavo S. Valladares, Marcos G. Pereira,
Léya J. R. S. Cabral, João V. A. Amorim, and Giovana M. de Espindola

1 Introduction

Soil organic carbon (SOC) is a dynamic soil property and an essential component
of forest ecosystems, which are considered potential carbon stores
(Kumar et al., 2016).
Coastal environments support biodiverse habitats of conservation interest and
provide other essential benefits, such as carbon sequestration, due to the high soil
carbon accumulation rates. This carbon, called blue carbon, plays an essential role
in climate change mitigation strategies (Nehren & Wicaksono, 2018) and presents
variability depending on the region and factors such as soil and existing vegetation
(Portela et al., 2020).
Some studies have evaluated soil carbon in coastal regions (Nehren & Wicaksono,
2018; Vilhena et al., 2018; Marchand, 2017; Bai et al., 2016; Barreto et al.,
2016). However, measurements of this attribute in soils of coastal and delta
regions remain scarce, and the existing research mainly concerns mangroves.

M. G. T. Portela (✉) · L. J. R. S. Cabral


Federal University of Piaui, Teresina, Piauí, Brazil
e-mail: [email protected]
G. S. Valladares
Department of Geography, Federal University of Piaui, Teresina, Piauí, Brazil
M. G. Pereira
Department of Soils, Federal Rural University of Rio de Janeiro, Rio de Janeiro, RJ, Brazil
J. V. A. Amorim
Bom Jesus Technical College- Federal University of Piaui, Bom Jesus, Piauí, Brazil
G. M. de Espindola
Transport Department, Federal University of Piaui, Teresina, Piauí, Brazil

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 245
W. de Carvalho Junior et al. (eds.), Pedometrics in Brazil, Progress in Soil Science,
https://doi.org/10.1007/978-3-031-64579-2_18

Among the main difficulties in soil studies is that SOC measurements need soil
sampling, which is expensive and time-consuming. Consequently, the number of
samples available in a given area is generally scarce and does not reflect the actual
level of variation that may be present in the study sites. Therefore, the precise
interpolation of carbon concentrations in unsampled locations is necessary for better
planning and management of these areas.
Techniques ranging from simple linear models (Yang et al., 2015) to more
sophisticated approaches (Somarathna et al., 2016) have been used to estimate soil
organic carbon. Bonfatti et al. (2016) pointed out that multiple linear regression
performed better than other methods in predicting organic carbon in soils in
southern Brazil. Melo et al. (2016) reported that kriging was a satisfactory method
for predicting organic carbon in soils under different uses in the semiarid region
of Ceará, while Ceddia et al. (2017) indicated cokriging and regression kriging as
the most appropriate methods for predicting carbon stocks up to 100 cm in soils in
the Central Amazon. Mondal et al. (2017) reported that regression kriging gave
satisfactory results in predicting soil carbon stocks in Central India.
This study aimed to determine the concentrations and carbon stocks in the
Parnaíba River Delta (PRD) soils, Piauí, employing digital soil mapping techniques.

2 Material and Methods

2.1 Characterization of the Study Area

The study area is located in the state of Piauí, in the western portion of the
northeast region of Brazil, and comprises part of the Parnaíba River Delta
Environmental Protection Area (APA) and a portion of the Parnaíba Delta Marine
Extractive Reserve (Resex). More precisely, it occupies the region bounded by the
Igaraçu River on the southeast, the Parnaíba River on the west, and the Atlantic
Ocean, covering the municipality of Ilha Grande and part of the municipality of
Parnaíba. The area is approximately 282 km², of which around 8 km² lie within the
Parnaíba Delta Marine Extractive Reserve (Fig. 1).
The APA was created by the Brazilian government in 1996 to protect the deltas
of the Parnaíba, Timonha, and Ubatuba rivers, covering an area of approximately
3138 km². It is characterized by a mosaic of ecosystems intersected by bays and
estuaries, and it is a very dynamic fluvial-marine region formed by the ecological
tension between Caatinga, Cerrado, and marine systems (Guzzi, 2012).
The Resex was also created by the Brazilian government, through the Decree of
November 16th 2000, to guarantee the exploration and conservation of renewable
natural resources traditionally used by the extractive population of the area. It
covers an area of approximately 270.21 km².
The study was carried out from December 2016 to February 2017 in five areas
with different vegetation types, classified according to Fernandes et al. (1996):

Fig. 1 Location of the study area, with the location of the sample points

Psammophile pioneer vegetation (PPV), Dune subevergreen vegetation (DSV),
Mangrove evergreen vegetation (MEV), Floodplain vegetation (FV), and vegetation
associated with Carnaubals (VC).

2.2 Soil Sampling

The soil survey points were defined in advance, targeting areas that were highly
representative of the vegetation structure. The selection drew on prior knowledge
of the area, on photo interpretation (to capture greater variability), and on the
Normalized Difference Vegetation Index (NDVI) and its variations. This index made
it possible to assess the density of photosynthetically active vegetation and, in
this way, to position the sample points in the study area.
Because the area is covered by native vegetation, access is difficult.
Collection points with greater accessibility for the work team were therefore
chosen, while staying at least 100 m away from road edges and as close as possible
to the previously defined points.

Table 1 The number of profiles (N) and frequency of soil suborders in the visited sites

SiBCS¹                        Soil Taxonomy        N     Frequency (%)
ORGANOSSOLO TIOMÓRFICO        Histosols            1     2.5
GLEISSOLO TIOMÓRFICO          Entisol              7     17.5
GLEISSOLO HÁPLICO             Udorthent            6     15.0
GLEISSOLO MELÂNICO            Typic Fluvaquent     1     2.5
GLEISSOLO SÁLICO              Aridisol             1     2.5
VERTISSOLO HÁPLICO            Udert                2     5.0
CAMBISSOLO FLÚVICO            Typic Dystrochrept   4     10.0
NEOSSOLO FLÚVICO              Udifluvents          3     7.5
NEOSSOLO QUARTZARÊNICO        Quartzipsamment      12    30.0
PLANOSSOLO NÁTRICO            Udalf                1     2.5
ESPODOSSOLO HUMILÚVICO        Humod                2     5.0
Total                         –                    40    100
¹ SiBCS, Brazilian Soil Classification System (Santos et al., 2018)

The soils were collected from 40 sample points of representative areas of the
different vegetation types, making a total of 242 samples of layers or horizons of
soil profiles up to 100 cm deep. Table 1 features the classification of the sampled
soils.

2.3 Determination of Total Soil Carbon Stocks

SOC contents were quantified by wet oxidation of organic matter, which is based
on the oxidation of organic carbon by dichromate ions in a sulfuric medium
(Teixeira et al., 2017). Bulk density (BD) was determined by the volumetric ring
method described in the Soil Analysis Methods Manual (Donagemma et al., 2011).
After determining the soil density, carbon stocks were calculated following
Batjes (2000), Eq. 1:


$$CS = \sum_{i=1}^{n} OC_i \times BD_i \times T_i \tag{1}$$

Where:
CS is the OC stock (kg m−2) in the desired layer (0–30 or 0–100 cm);
OCi is the OC content (g kg−1) of horizon i;
BDi is the bulk density (Mg m−3) of horizon i;
Ti is the thickness (m) of the portion of horizon i that lies within the desired
layer; and
n is the number of horizons that have a portion within the desired layer.
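Equation 1 can be sketched in a few lines of code. With OC in g kg−1, BD in Mg m−3, and thickness in m, the product is numerically in kg m−2, and multiplying by 10 converts it to Mg ha−1. The horizon values below are hypothetical and only illustrate how a horizon is clipped to the target layer; `carbon_stock` is an illustrative name, not a function from the study.

```python
def carbon_stock(horizons, layer_bottom_m):
    """Carbon stock (kg m^-2) from the surface down to layer_bottom_m.

    horizons: list of (top_m, bottom_m, oc_g_per_kg, bd_mg_per_m3) tuples.
    """
    stock = 0.0
    for top, bottom, oc, bd in horizons:
        # Thickness of the part of this horizon that lies inside the layer
        thickness = max(0.0, min(bottom, layer_bottom_m) - top)
        stock += oc * bd * thickness
    return stock

# Hypothetical profile: (top, bottom, OC in g kg^-1, BD in Mg m^-3)
profile = [
    (0.0, 0.2, 20.0, 1.2),
    (0.2, 0.6, 10.0, 1.4),
    (0.6, 1.5, 5.0, 1.5),
]
cs30 = carbon_stock(profile, 0.3)
cs100 = carbon_stock(profile, 1.0)
print(round(cs30, 2), round(cs100, 2))  # 6.2 13.4 (kg m^-2)
print(round(cs100 * 10, 1))             # 134.0 (Mg ha^-1)
```

Only the top 0.4 m of the deepest horizon contributes to the 0–100 cm stock, which is the role of the thickness term Ti in Eq. 1.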

2.4 Remote Sensing Covariate Data

The spectral variables used to estimate the CS30 and CS100 stocks were obtained
from images of the Operational Land Imager (OLI) sensor on Landsat 8, path/row
219/062. The sensor product has a spatial resolution of 30 m, a radiometric
resolution of 16 bits, and a temporal resolution of 16 days. OLI images consist of
nine multispectral bands, but only six (bands 2 to 7), which cover the visible and
infrared ranges, were used in this study.
The image, acquired on June 21st, 2016, was obtained from the United States
Geological Survey (USGS), with cloud coverage of 5.7%, a solar elevation angle of
52.20°, and an azimuth angle of 44.36°.
From the bands, eight indexes were generated: RVI (Pearson & Miller, 1972),
NDVI (Rouse et al., 1974), SAVI (Huete, 1988), EVI (Huete et al., 1997), NDWI
(Gao, 1996), GNDVI (Gitelson et al., 1996), MNDWI (Xu, 2006), CTVI (Perry &
Lautenschlager, 1984).
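Several of these indexes are simple band ratios. The sketch below shows commonly cited forms of five of them, assuming the standard Landsat 8 OLI band assignments (band 3 green, band 4 red, band 5 near infrared, band 6 shortwave infrared 1). The reflectance values are hypothetical, and the exact formulations and SWIR band used in the study are not stated in the text, so this is an illustration rather than the authors' procedure.

```python
def normalized_difference(a, b):
    """Generic normalized difference (a - b) / (a + b)."""
    return (a - b) / (a + b)

def spectral_indices(green, red, nir, swir1, soil_factor=0.5):
    """Commonly cited forms of five indexes from surface reflectance values."""
    return {
        "NDVI": normalized_difference(nir, red),
        "GNDVI": normalized_difference(nir, green),
        "NDWI": normalized_difference(nir, swir1),     # Gao (1996) form
        "MNDWI": normalized_difference(green, swir1),  # Xu (2006) form
        # SAVI with the usual soil adjustment factor L = 0.5
        "SAVI": (1 + soil_factor) * (nir - red) / (nir + red + soil_factor),
    }

# Hypothetical reflectances for a vegetated pixel
indices = spectral_indices(green=0.08, red=0.06, nir=0.30, swir1=0.15)
for name, value in indices.items():
    print(name, round(value, 3))
```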
All remote sensing covariate rasters were assembled in a Geographic Information
System (GIS), and their values were extracted at the field soil sampling points in
ArcGIS, producing the database used to build the prediction models.

2.5 Predictive Methods Evaluated

Three methods were used to predict the CS100, including multiple linear regression
(MLR), ordinary kriging (OK) and regression kriging (RK).
MLR consists of fitting equations with the soil variable as the dependent
variable and all bands and vegetation indexes as the independent variables. A
backward stepwise algorithm was used to select the most significant independent
variables in the regression. The variable selection (p < 0.05) for CS100 was run
in XLSTAT, an add-in for Microsoft Excel.
OK is a univariate method that uses the primary variable (SOC or CS) measured at
sample points to predict it at non-sampled locations. In the process, the mean is
taken as a constant but unknown value, and its stationarity is assumed only
within a local neighborhood centered on the prediction location.
RK is a hybrid geostatistical method that combines two approaches: first, it
uses regression to predict a variable, and then it uses simple kriging to
interpolate the residuals of the regression model (Hengl et al., 2004), thereby
improving the estimates. The RK used in this study was type C, which involves an
ordinary regression model followed by kriging of the regression residuals (Odeh
et al., 1995).
The models of organic carbon concentration and carbon stocks were validated
with 20% of the data, using three statistical parameters: RMSE, MAE, and R². R²
represents the coefficient of determination of the linear regression between
observed and predicted values. MAE corresponds to the mean absolute error and
RMSE to the root mean squared error, obtained by Eqs. 2 and 3:

$$MAE = \frac{1}{n} \sum_{i=1}^{n} (O_i - P_i) \tag{2}$$

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (O_i - P_i)^2} \tag{3}$$

Where, Oi and Pi are the observed and predicted values, respectively.
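The sketch below computes these validation statistics for a small set of hypothetical observed and predicted values. Equation 2 as printed sums signed differences, which gives a mean error that can be negative; the sketch therefore returns both that signed quantity and the absolute-value form implied by the name MAE, so the two can be compared. The function name `validation_metrics` is illustrative.

```python
import math

def validation_metrics(observed, predicted):
    """Return (mean signed error, mean absolute error, RMSE)."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    n = len(residuals)
    mean_error = sum(residuals) / n            # signed form, as in Eq. 2 as printed
    mae = sum(abs(r) for r in residuals) / n   # absolute-value form
    rmse = math.sqrt(sum(r * r for r in residuals) / n)  # Eq. 3
    return mean_error, mae, rmse

# Hypothetical observed and predicted values
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 41.0]
mean_error, mae, rmse = validation_metrics(observed, predicted)
print(mean_error, mae, round(rmse, 3))  # -1.0 2.0 2.121
```

A negative mean error indicates that the predictions are, on average, higher than the observations, whereas the absolute-value form can never be negative.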

3 Results

The SOC under vegetation ranged from 0.03 to 92.76 g kg−1 of soil, with the
highest average levels observed in soils under mangrove evergreen vegetation and
the lowest average levels associated with soils under dune subevergreen vegetation.
Among the vegetation types, organic carbon ranged from 18.02 to 92.76 g kg−1 in
MEV, from 1.16 to 66.73 g kg−1 in FV, from 0.25 to 46.59 g kg−1 in VC, from 0.03
to 3.67 g kg−1 in DSV, and from 0.13 to 6.70 g kg−1 in PPV.
For CS100, analysis of variance indicated significant differences among
vegetation covers (p < 0.01). Regardless of vegetation, the CS100 values ranged
from 5.83 to 466.63 Mg ha−1. The highest average soil carbon stock was associated
with the mangrove evergreen vegetation, and the lowest averages with psammophile
pioneer vegetation and dune subevergreen vegetation (p < 0.01) (Fig. 2), as also
observed for carbon content.
The CS100 values under mangrove evergreen vegetation ranged from 232.48
to 457.3 Mg ha−1 . In soils under floodplain vegetation, stocks ranged from 59.6
to 466.63 Mg ha−1 , under vegetation associated with Carnaubals they went from
31.74 to 220.97 Mg ha−1 , under dune subevergreen vegetation ranged from 5.94
to 19.66 Mg ha−1 , and under psammophile pioneer vegetation ranged from 5.83 to
43.18 Mg ha−1 .
The model developed to predict CS100 is shown in Table 2.
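Applying the regression model reported in Table 2 is a direct evaluation of the fitted equation. In the sketch below the coefficients are those given in Table 2, while the band values are hypothetical, since the text does not state the units or scaling of the band values used in the fit. The second example shows that a linear model of this kind can return negative stocks for some band combinations, which is consistent with the negative MLR minimum reported in Table 4.

```python
def predict_cs100(band3, band6, band7):
    """CS100 (Mg ha^-1) from the regression equation reported in Table 2."""
    return 466.12 - 0.25 * band3 - 0.32 * band6 + 0.42 * band7

# Hypothetical band values (units and scaling are an assumption)
typical = predict_cs100(band3=400, band6=900, band7=500)
extreme = predict_cs100(band3=900, band6=1300, band7=300)
print(round(typical, 2), round(extreme, 2))  # 288.12 -48.88
```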
In the CS100 prediction model, bands 3, 6, and 7 were negatively correlated with
this property (correlations of 44.7%, 54.4%, and 47.6%, respectively). Regarding
the prediction of CS100 using OK, the results of the spatial dependence analysis
of the soil variables are shown in Table 3. CS100 showed high spatial dependence,
while the regression residuals for CS100 showed moderate spatial dependence. The
RK requires

Fig. 2 Average soil carbon stocks in the different vegetation types in the Parnaíba River Delta, at
0–100 cm depth. MEV Mangrove evergreen vegetation, FV Floodplain vegetation, VC Vegetation
associated with Carnaubals, PPV Psammophile pioneer vegetation, DSV Dune subevergreen
vegetation. Means followed by the same letter do not differ statistically. CV% (0–100 cm) = 18.10%

Table 2 Linear regression models using remote sensing variables

Attribute | Covariates | Regression model | R2
CS100 | All variables | CS100 (Mg ha−1) = 466.12 − 0.25*Band3 − 0.32*Band6 + 0.42*Band7 | 0.53

CS100 Carbon stock at 0–100 cm. Bands 3, 5, 6 and 7 were derived from the Landsat 8 OLI
sensor
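
As an illustration of how the Table 2 model can be applied, the short Python sketch below evaluates the regression equation for a single pixel. The function name and the example band values are hypothetical, and the sketch assumes the band values are expressed in the same units that were used to fit the model, which are not stated here.

```python
def predict_cs100(band3, band6, band7):
    """Evaluate the Table 2 regression for one pixel (CS100 in Mg ha-1).

    Coefficients are copied from Table 2; the scale of the band values
    is an assumption and must match the data used to fit the model.
    """
    return 466.12 - 0.25 * band3 - 0.32 * band6 + 0.42 * band7


# Hypothetical band values, for illustration only
example = predict_cs100(band3=100.0, band6=100.0, band7=100.0)
print(f"Predicted CS100: {example:.2f} Mg ha-1")
```

Because the model is linear and unbounded, it can return negative stocks for some band combinations, which is consistent with the negative minimum reported for MLR in Table 4.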

Table 3 Variograms adjusted for soil parameters

Variable | C0 | C | C1 | Range (m) | C/(C0 + C) × 100 | R2
Original variables
CS100a | 4,230 | 23,490 | 27,720 | 9300 | 84.74 | 0.68
Regression residuals
CS100a | 4,250,000,000 | 5,764,000,000 | 10,014,000,000 | 8620 | 57.56 | 0.62

C0 Nugget variance, C1 Sill variance, C Contribution, CS100 Carbon stock at 0–100 cm; a Gaussian
model
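
The degree of spatial dependence reported in Table 3 is the share of the sill explained by the structured (contribution) component, C/(C0 + C) × 100. A minimal Python sketch of this calculation follows; the function name is hypothetical, and the nugget values are taken as the sill minus the contribution from Table 3.

```python
def spatial_dependence(nugget, contribution):
    """Return C / (C0 + C) * 100, the structured share of the sill (%)."""
    return 100.0 * contribution / (nugget + contribution)


# Nugget assumed equal to sill minus contribution (27,720 - 23,490)
original = spatial_dependence(nugget=4230.0, contribution=23490.0)
# Nugget assumed equal to 10,014,000,000 - 5,764,000,000
residual = spatial_dependence(nugget=4.25e9, contribution=5.764e9)

print(f"Original variable: {original:.2f}%")
print(f"Regression residuals: {residual:.2f}%")
```

Higher percentages indicate that more of the total variance is spatially structured, which matches the stronger dependence described for the original variable than for the residuals.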

that OK be used to interpolate the regression residuals. For this, the regression
residuals must be spatially autocorrelated. In this case, all regression models showed
autocorrelated residuals, which can therefore be estimated by RK (type C).
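
The logic described above (fit a regression, krige its residuals, and add the two parts) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the data are synthetic, the variogram parameters are arbitrary rather than fitted, the Gaussian model uses the practical-range convention, and all function names are hypothetical.

```python
import numpy as np


def gaussian_variogram(h, nugget, contribution, practical_range):
    """Gaussian semivariance; gamma(0) = 0 so kriging honours the data."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + contribution * (1.0 - np.exp(-3.0 * (h / practical_range) ** 2))
    return np.where(h > 0.0, gamma, 0.0)


def ordinary_kriging(xy_obs, values, xy_new, **vgm):
    """Ordinary kriging of `values` observed at xy_obs, predicted at xy_new."""
    n = len(values)
    d_obs = np.linalg.norm(xy_obs[:, None, :] - xy_obs[None, :, :], axis=2)
    system = np.ones((n + 1, n + 1))          # last row/column: unbiasedness
    system[:n, :n] = gaussian_variogram(d_obs, **vgm)
    system[n, n] = 0.0
    d_new = np.linalg.norm(xy_obs[:, None, :] - xy_new[None, :, :], axis=2)
    rhs = np.ones((n + 1, len(xy_new)))
    rhs[:n, :] = gaussian_variogram(d_new, **vgm)
    weights = np.linalg.solve(system, rhs)[:n, :]
    return weights.T @ values


def regression_kriging(xy_obs, X_obs, y_obs, xy_new, X_new, **vgm):
    """Linear trend from covariates plus ordinary kriging of its residuals."""
    A_obs = np.column_stack([np.ones(len(y_obs)), X_obs])
    beta, *_ = np.linalg.lstsq(A_obs, y_obs, rcond=None)
    residuals = y_obs - A_obs @ beta
    trend_new = np.column_stack([np.ones(len(X_new)), X_new]) @ beta
    return trend_new + ordinary_kriging(xy_obs, residuals, xy_new, **vgm)


# Synthetic data (assumption): 30 sites, 3 covariates standing in for bands
rng = np.random.default_rng(0)
xy = rng.uniform(0, 10_000, size=(30, 2))
bands = rng.uniform(0, 1, size=(30, 3))
cs100 = 200 - 150 * bands[:, 0] + rng.normal(0, 20, size=30)

# Illustrative variogram parameters, not fitted to any data
vgm = dict(nugget=100.0, contribution=300.0, practical_range=3000.0)
pred = regression_kriging(xy, bands, cs100, xy, bands, **vgm)
new_pred = regression_kriging(xy, bands, cs100,
                              np.array([[5000.0, 5000.0]]),
                              np.array([[0.5, 0.5, 0.5]]), **vgm)
```

In practice the residual variogram would be fitted to the empirical semivariances before kriging, and predictions would be evaluated on independent validation sites rather than at the training locations.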
The descriptive statistics of the training data, validation, and estimates by the
three methods are shown in Table 4.
For CS100, the mean of the validation data was lower than that of the training data.
Observing the average values for CS100 predicted by the studied methods, it
252 M. G. T. Portela et al.

Table 4 Descriptive statistics of training, validation, and estimation of the CS100 by Multiple
linear regression (MLR), Ordinary kriging (OK), and Regression kriging (RK)

CS100 | Training data | Validation data | MLR | OK | RK
Mean | 112.92 | 103.57 | 117.22 | 111.51 | 122.82
Min. | 5.83 | 5.94 | −16.79 | 14.97 | 5.39
Max. | 466.63 | 466.63 | 294.01 | 384.37 | 393.12

CS100 Carbon stock at 0–100 cm depth

Table 5 Validation of multiple linear regression, ordinary kriging, and regression kriging for
predicting soil organic carbon (g kg−1), and carbon stock of 0–100 cm (Mg ha−1)

Variable | Method | MAE | RMSE | R2
CS100 | Multiple linear regression | 11.11 | 152.16 | 0.53
CS100 | Ordinary kriging | −15.11 | 49.33 | 0.67
CS100 | Regression kriging | 2.15 | 38.35 | 0.95

CS100 Carbon stock at 0–100 cm, R2 Determination coefficient, RMSE Root mean square error,
MAE Mean absolute error
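
The validation statistics named in Table 5 can be computed as in the Python sketch below. The observed and predicted values are hypothetical, and R2 is computed here as 1 − SSres/SStot, which is an assumption because the exact definition used is not stated. A signed mean error is included as well: the mean absolute error is non-negative by definition, so a signed statistic is what indicates over- or underestimation.

```python
import math


def mean_error(observed, predicted):
    """Signed mean error (bias); positive values indicate overestimation."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


def mean_absolute_error(observed, predicted):
    """Mean of the absolute prediction errors; never negative."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error."""
    squared = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed form)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical validation data (Mg ha-1), for illustration only
observed = [10.0, 50.0, 120.0, 300.0]
predicted = [20.0, 40.0, 130.0, 280.0]

print(mean_error(observed, predicted))
print(mean_absolute_error(observed, predicted))
print(rmse(observed, predicted))
print(r_squared(observed, predicted))
```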

is indicated that the MLR and the RK overestimate predictions, while the OK
underestimates them.
The validation for the MLR, OK, and RK models is shown in Table 5.
Comparing the validation of the data estimated by the three predictive methods,
the highest coefficient of determination and the lowest RMSE and MAE values
were obtained with RK. For organic carbon prediction, OK presented the lowest
R2 but a lower RMSE than MLR.
The final maps with the CS100 predicted are shown in Fig. 3. Comparing the
validation results, it is highlighted that the mapping of CS100 developed by the RK
method was better than that performed by other methods.

4 Discussion

The observed SOC values were highest in the surface layers and, in general,
decreased with depth, a pattern also observed by Bai et al. (2016) in soil profiles
of the Yellow River Delta, China. The same authors found some profiles with higher
organic carbon values in the 40–60 cm layer, similar to some profiles in this
study, notably those under mangrove evergreen vegetation, which has high
vegetation biomass in the study area (Portela et al., 2020).
Similar results were also reported by Barreto et al. (2016) in soils under
mangroves on the Caribbean coast. This pattern may be associated with higher
clay content in the deeper layers, since these materials are active particles
in sequestering carbon (Barreto et al., 2016). Another reason is the
translocation of soluble organic matter compounds, which can also influence

Fig. 3 Prediction maps of 0–100 cm carbon stocks derived from the prediction methods: (a)
Ordinary kriging; (b) Multiple linear regression; (c) Regression kriging

these carbon values in the deeper layers. According to Wang et al. (2013), carbon
values at greater depths can be explained by two factors: the distribution of
roots with intense biological activity in deeper layers, and the downward
migration of organic carbon driven by water from precipitation or other sources.
In this sense, bioturbation also stands out, associated with crabs
that live above and below the sediment surface. By mixing sediments during
excavation, these bioturbators alter the redistribution of carbon dioxide and
dissolved carbon, increasing the availability of various compounds for deeper
microbial communities and the levels of organic carbon (Coverdale et al., 2014;
Macreadie et al., 2017).
Likewise, when this property is compared with the soils and the vegetation
types, it appears that the highest levels of organic carbon reported in this study
are associated with the levels of clay and silt observed in some soil profiles under
mangrove evergreen vegetation (up to 30% clay and 69% silt on average), floodplain
vegetation (up to 35% clay and 66% silt on average), and vegetation associated with
carnaubals (up to 35% clay and 44% silt on average).
Bai et al. (2016), working on soils in coastal areas, highlighted that water
content, pH, and clay and silt contents explain up to 80% of the SOC variation
at depths up to 20 cm. According to the authors, this pattern is
observed because an increase in these fractions stimulates plant growth and
water retention and increases soil carbon contents and stocks. pH, in turn,
limits the binding capacity of clay compounds, leading to
decreased adsorption of organic matter in the soil (Morrissey et al., 2014).
In this sense, the soils under psammophile pioneer vegetation and dune
subevergreen vegetation stand out. Despite having similar granulometric
characteristics (both have a sandy texture), they showed different levels of
organic carbon, demonstrating that the herbaceous vegetation favored a higher
concentration of organic carbon.
This result may be a consequence of the high incorporation of organic matter on
the surface, considering that the primary source of carbon in soils under herbaceous
plants comes from the roots, which accumulate in the superficial layers of the soil,
concentrate more biomass and stabilize the soil organic matter (Cerri et al., 1991;
Schmidt et al., 2011).
The CS100 values under mangrove evergreen vegetation are higher than the
carbon stock range presented by Nehren and Wicaksono (2018), who, studying
mangrove soils in Indonesia, observed CS100
under mangroves ranging from 3.3 to 366.7 Mg ha−1.
Evaluating the CS100 in areas of mangroves and sandbanks in the Brazilian
Amazon, Kauffman et al. (2018a) found values of around 341 Mg ha−1, this
being close to the value quantified for CS100 in the mangroves of this study,
309.54 Mg ha−1.
The average observed in this study is high but does not exceed the averages
observed by Kauffman et al. (2018b) for CS100 of mangroves in Ceará, in the
Brazilian semiarid region (413 Mg ha−1), or by Kauffman and Bhomia (2017) for
Gabon (392 Mg ha−1) and Liberia (901 Mg ha−1). On the other hand, this average
exceeds those observed by Sasmito et al. (2020) in Papua (179 Mg ha−1), Marchand
(2017) in French Guiana (107 Mg ha−1), and Sahu et al. (2016) in mangroves in
India (54.3 Mg ha−1). It should also be noted that the mangrove soils of the
Parnaíba River Delta showed higher CS100 than Organossolos (Histosols) in Brazil
(Valladares et al., 2016), which had stocks of 203.59 Mg ha−1.
Texture can influence the accumulation of organic matter, as it can reduce
microbial decomposition, stabilize soil carbon, and decrease leaching,
thereby favoring accumulation. According to Mondal et al. (2017), clayey soils
generally have higher levels of organic matter and, therefore, high carbon stock
values. This pattern gradually weakens in sandy clay, sandy loam, and sandy soils.
Thus, according to the authors, areas with sandy-textured soils have lower carbon
stocks.
A study by Hontoria et al. (1999) in soils in Spain reported an absence of
correlation between organic carbon content and soil texture. In Brazil, there was
no correlation between organic carbon content and soil granulometry in the order of
Organossolos (Histosols) (Valladares, 2003), indicating that hydromorphic soils may
present a different pattern.
This explains the low carbon stocks in soils under dune subevergreen vegetation
and psammophile pioneer vegetation, which are characterized as sandy soils (98%
and 97% sand on average, respectively). Organic matter is more susceptible
to decomposition in soils with a coarser texture, a condition that hinders
carbon sequestration and subsequent accumulation compared to finer-textured soils.
Another factor that contributes to the reduced carbon stocks is the absence of
soil anoxia, which would facilitate the accumulation of carbon in the soil (Kauffman
et al., 2018a; Nóbrega et al., 2016). Anoxia is established in reducing
microenvironments with low oxygen circulation, where it contributes to the
preservation of soil organic matter by inhibiting its oxidation or alteration
by aerobic organisms, regardless of its chemical structure (Schmidt et al., 2011;
Barreto et al., 2016). This condition is absent in sandy soils.
Adame et al. (2013) also emphasize that high carbon stocks in wetlands can
be associated with the type of vegetation present, since sites with tall,
vigorous forests can accumulate larger amounts of carbon.
However, this is not a general rule, particularly for mangrove
vegetation. A recovering mangrove, for example, can grow tall and
vigorous and still not have a high carbon stock, as carbon takes much
longer to accumulate in the soil than the vegetation takes to grow.
Likewise, in soils under dune subevergreen vegetation, the predominant species
is Anacardium occidentale, represented by medium to large individuals
whose roots could exert some influence on carbon storage. This hypothesis
is not confirmed in this work: if it held, the carbon stocks in the soils under this
vegetation would be high, which was not the case.
It is also noteworthy that soil density influences the carbon levels and
stocks in the soil. The density values in this study ranged from 0.38 to 1.60 Mg m−3.
This range directly affects carbon levels and stocks (Valladares et
al., 2016) and is related to the different sources of organic matter and soil cover
conditions. Studies report that soils with lower density values have higher SOC
and CS values (Cipriano-Silva et al., 2014; Barreto et al., 2016; Beutler et al., 2017),
with the opposite effect observed in soils with higher density.
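
To make the role of density explicit, the Python sketch below uses a commonly applied layer formula, CS (Mg ha−1) = C (g kg−1) × density (Mg m−3) × thickness (cm) / 10, summed over layers. That this is the exact formula used in this study is an assumption, and the layer values and function names are hypothetical.

```python
def layer_carbon_stock(soc_g_kg, density_mg_m3, thickness_cm):
    """Carbon stock of one layer in Mg ha-1.

    g kg-1 * Mg m-3 * cm / 10 gives Mg ha-1 (unit conversion only).
    """
    return soc_g_kg * density_mg_m3 * thickness_cm / 10.0


def profile_carbon_stock(layers):
    """Sum layer stocks; `layers` holds (SOC, density, thickness) tuples."""
    return sum(layer_carbon_stock(*layer) for layer in layers)


# Hypothetical 0-100 cm profile: organic-rich topsoil over denser subsoil
profile = [(20.0, 0.5, 20.0), (10.0, 1.2, 80.0)]
total = profile_carbon_stock(profile)
print(f"CS100: {total:.1f} Mg ha-1")
```

The formula shows why both terms matter: organic-rich layers tend to have low density, so a high carbon content does not translate proportionally into a high stock.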
The patterns observed in this study corroborate Valladares et al. (2016), who
consider that variations in carbon levels and stocks are directly
related to the types of vegetation present, relief, drainage, and mineral substrate,
besides being controlled by the vegetation cover and by changes in weather
conditions, water, and humidity.
This study offers a perspective for using covariables derived from remote sensing
to predict soil attributes, using geostatistical methods in areas of difficult access on
the Piauí coast.
Considering the models developed by multiple linear regression, the selected
covariables derived from the spectral images explained a moderate proportion of
the variance of the soil attributes (between 50 and 65%). This means that these
models approximate the data collected in the field and are considered
satisfactory.
In the predictions of CS100, the negative correlations indicate that low
reflectance values in these bands are related to high CS100. This result is associated
with the spectral pattern of the soil, in which the reflectance curve shows
absorption peaks of electromagnetic energy at points with high carbon levels, as
indicated by Carneiro et al. (2019). Yang et al. (2015) observed
negative correlations between SOC in soils of an alpine region and the spectral
band with wavelengths between 630 and 690 nm. Some studies indicate that at
wavelengths between 500 and 700 nm, the lower the reflectance, the
higher the carbon content, as more of the energy incident on the soil is absorbed
(Bellinaso et al., 2010; Carneiro et al., 2019).
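
The relationship described above can be checked with a Pearson correlation between band reflectance and carbon stock. The Python sketch below uses invented values purely for illustration; the variable names are hypothetical and the numbers are not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Invented example: darker (less reflective) pixels hold more carbon
reflectance = [0.30, 0.25, 0.20, 0.15, 0.10]
cs100 = [20.0, 60.0, 110.0, 150.0, 210.0]

r = pearson_r(reflectance, cs100)
print(f"r = {r:.3f}")  # negative: reflectance falls as carbon stock rises
```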
Also noteworthy are the conditions of the study region, characterized by
areas strongly influenced by water bodies. Barsi et al. (2014) state that band
5, for example, responds well to the identification of coastlines. The same authors
also suggest that band 6 can potentially discriminate soil moisture content and
that band 7 can better identify moisture content in soil and vegetation.
Likewise, the excellent correlation with the MNDWI is also linked to the
conditions of the study site, which includes some flooded areas. In addition, this index
improves the efficiency of identifying water bodies and reduces the noise
produced by areas with dense vegetation or exposed soil.
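
For reference, the MNDWI of Xu (2006) is the normalized difference between the green and shortwave-infrared bands, which for Landsat 8 OLI correspond to bands 3 and 6. A minimal Python sketch follows; the reflectance values are hypothetical, and returning zero when both bands are zero is an illustrative choice rather than part of the index definition.

```python
def mndwi(green, swir1):
    """Modified normalized difference water index: (G - SWIR1) / (G + SWIR1)."""
    total = green + swir1
    if total == 0:
        return 0.0  # illustrative handling of empty pixels (assumption)
    return (green - swir1) / total


# Hypothetical surface reflectances (OLI band 3 = green, band 6 = SWIR1)
water_pixel = mndwi(green=0.10, swir1=0.02)
soil_pixel = mndwi(green=0.08, swir1=0.25)

print(f"water: {water_pixel:.2f}, exposed soil: {soil_pixel:.2f}")
```

Open water absorbs strongly in the shortwave infrared, so it yields positive values, while vegetation and exposed soil generally yield negative ones.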
In comparison with the other predictive methods, MLR generally had lower
performance than OK and RK. The covariables used in the model developed by
OK explained the variance in a proportion above 50%. However, when comparing
statistical errors for the prediction of CS100, RK presented the best results.
Similar results were obtained by Bhunia et al. (2018), who tested methods
of SOC prediction in soils in West Bengal and obtained the smallest errors in the
prediction by OK. According to the authors, OK generally offers better interpolation
to estimate values at unsampled locations.
In the predictions of carbon stocks, regression kriging was the most appropriate
method, based on the lowest values of statistical errors and the highest coefficients
of determination.
The RK method has given satisfactory results in some studies (Ceddia et al.,
2017; Mondal et al., 2017) because it allows a larger number of predictors to be
selected for the predictive models. One possible explanation for RK having
performed better than OK is its additional capacity to incorporate
a larger amount of auxiliary information (Piccini et al., 2014; Mondal et
al., 2017).
It is worth mentioning that the errors obtained may have several causes,
among them the limited number of training observations and the spatial configuration
of the observations. Nevertheless, with the data collected in this research, it was
possible to carry out the predictions and identify RK as the best method.

5 Conclusions

Soils under mangrove evergreen vegetation have higher SOC and carbon stocks
than soils under the other vegetation types. The lowest SOC and CS values were
found in the sandy soils of this environment. Regardless of vegetation, carbon
concentrations decreased with depth in all soils.
Among the prediction methods, RK was the most suitable for predicting CS, and
the independent variables that presented the best responses were band 3, band 6, and
band 7 (for CS100).
The models used were satisfactory in the digital mapping of CS in the soils of
the Parnaíba River Delta.

References

Adame, M. F., Kauffman, J. B., Medina, I., Gamboa, J. N., Torres, O., Caamal, J. P., Reza,
M., & Herrera-Silveira, J. A. (2013). Carbon stocks of tropical coastal wetlands within
the karstic landscape of the Mexican Caribbean. PLoS One, 8(2). https://doi.org/10.1371/journal.pone.0056569
Bai, J., Zhang, G., Zhao, Q., Lu, Q., Jia, J., Cui, B., & Liu, X. (2016). Depth-distribution patterns
and control of soil organic carbon in coastal salt marshes with different plant covers. Scientific
Reports, 6(34835). https://doi.org/10.1038/srep34835
Barreto, M. B., Mónaco, S. L., Díaz, R., Barreto-Pittol, E., López, L., & Peralba, M.
C. R. (2016). Soil organic carbon of mangrove forests (Rhizophora and Avicennia)
of the Venezuelan Caribbean coast. Organic Geochemistry, 100. https://doi.org/10.1016/j.orggeochem.2016.08.002
Barsi, J. A., Lee, K., Kvaran, G., Markham, B. L., & Pedelty, J. A. (2014). The spectral response of
the Landsat-8 operational land imager. Remote Sensing, 6(10), 10232–10251. https://doi.org/10.3390/rs61010232
Batjes, N. H. (2000). Effects of mapped variation in soil conditions on estimates of soil carbon
and nitrogen stocks for South America. Geoderma, 97(1–2), 135–144. https://doi.org/10.1016/S0016-7061(00)00031-8
Bellinaso, H., Demattê, J. A. M., & Romeiro, S. A. (2010). Soil spectral library and its use in soil
classification. Revista Brasileira de Ciência do Solo, 34(3), 861–870. https://doi.org/10.1590/S0100-06832010000300027
Beutler, S. J., Pereira, M. G., Tassinari, W. D. S., Menezes, M. D. D., Valladares, G. S., &
Anjos, L. H. C. D. (2017). Bulk density prediction for Histosols and soil horizons with high
organic matter content. Revista Brasileira de Ciência do Solo, 41. https://doi.org/10.1590/18069657rbcs20160158
Bhunia, G. S., Shit, P. K., & Maiti, R. (2018). Comparison of GIS-based interpolation methods for
spatial distribution of soil organic carbon (SOC). Journal of the Saudi Society of Agricultural
Sciences, 17(2), 114–126. https://doi.org/10.1016/j.jssas.2016.02.001
Bonfatti, B. R., Hartemink, A. E., Giasson, E., Tornquist, C. G., & Adhikari, K. (2016). Digital
mapping of soil carbon in a viticultural region of Southern Brazil. Geoderma, 261, 204–221.
https://doi.org/10.1016/j.geoderma.2015.07.016
Carneiro, A. S. R., Jesus, T. B., Santos, E. P., & Santos, R. L. (2019). Utilização da
espectrorradiometria na caracterização do teor de matéria orgânica presente no solo. Revista
Eletrônica de Gestão e Tecnologias Ambientais, 7(1), 86–95. https://doi.org/10.9771/gesta.v7i7.28086

Ceddia, M. B., Gomes, A. S., Vasques, M. G., & Pinheiro, E. F. M. (2017). Soil carbon stock
and particle size fractions in the central amazon predicted from remotely sensed relief. Remote
Sensing, 9(2), 9–19. https://doi.org/10.3390/rs9020124
Cerri, C. C., Volkoff, B., & Andreaux, F. (1991). Nature and behaviour of organic matter in
soils under natural forest, and after deforestation, burning and cultivation, near Manaus. Forest
Ecology and Management, 38(3–4), 247–257. https://doi.org/10.1016/0378-1127(91)90146-M
Cipriano-Silva, R., Valladares, G. S., Pereira, M. G., & Anjos, L. H. C. (2014). Caracterização de
Organossolos em ambientes de várzea do nordeste do Brasil. Revista Brasileira de Ciência do
Solo, 38, 26–38. https://doi.org/10.1590/S0100-06832014000100003
Coverdale, T. C., Brisson, C. P., Young, E. W., Yin, S. F., Donnelly, J. P., & Bertness, M. D. (2014).
Indirect human impacts reverse centuries of carbon sequestration and salt marsh accretion.
PLoS One, 9(3), 1–7. https://doi.org/10.1371/journal.pone.0093296
Donagemma, G. K., de Campos, D. V. B., Calderano, S. B., Teixeira, W. G., & Viana, J. H. M.
(Org.). (2011). Manual de métodos de análise de solos. Embrapa Solos.
Fernandes, A. G., Lopes, A. S., Silva, E. V., Conceição, G. M., & Araújo, M. F. V. (1996). IV -
Componentes biológicos: Vegetação. In CEPRO, Macrozoneamento Costeiro do Estado do
Piauí: relatório geo-ambiental e sócio-econômico (pp. 43–72). Fundação CEPRO.
Gao, B. (1996). A normalized difference water index for remote sensing of vegetation liquid water
from space. Remote Sensing of Environment, 58(3), 257–266. https://doi.org/10.1016/S0034-4257(96)00067-3
Gitelson, A. A., Kaufman, Y. J., & Merzlyak, M. N. (1996). Use of a green channel in remote
sensing of global vegetation from EOS-MODIS. Remote Sensing of Environment, 58(3), 289–
298. https://doi.org/10.1016/S0034-4257(96)00072-7
Guzzi, A. (2012). Biodiversidade do Delta do Parnaíba: litoral piauiense. EDUFPI, 466p.
Hengl, T., Heuvelink, G. B. M., & Stein, A. (2004). A generic framework for spatial prediction
of soil variables based on regression-kriging. Geoderma, 120(1–2), 75–93. https://doi.org/10.1016/j.geoderma.2003.08.018
Hontoria, C., Saa, A., & Rodriguez-Murillo, J. C. (1999). Relationships Between Soil Organic
Carbon and Site Characteristics in Peninsular Spain. Soil Science Society of America Journal,
63(3). https://doi.org/10.2136/sssaj1999.03615995006300030026x
Huete, A. R. (1988). A soil adjusted vegetation index (SAVI). Remote Sensing of Environment,
25(3), 295–309. https://doi.org/10.1016/0034-4257(88)90106-X
Huete, A. R., Liu, H. Q., Batchily, K., & Leeuwen, W. (1997). A comparison of vegetation indices
over a global set of TM images for EOS-MODIS. Remote Sensing of Environment, 59(3), 440–
451. https://doi.org/10.1016/S0034-4257(96)00112-5
Kauffman, J. B., & Bhomia, R. K. (2017). Ecosystem carbon gradients in West-Central Africa:
Global and regional comparisons. PLoS ONE, 12(11), e0187749. http://dx.doi.org/10.1371/journal.pone.0187749
Kauffman, J. B., Bernardino, A. F., Ferreira, T. O., Giovannoni, L. R., O. de Gomes, L. E., Romero,
D. J., Jimenez, L. C. Z., & Ruiz, F. (2018a). Carbon stocks of mangroves and salt marshes of
the Amazon region, Brazil. Biology Letters, 14(9). https://doi.org/10.1098/rsbl.2018.0208
Kauffman, J. B., Bernardino, A. F., Ferreira, T. O., Bolton, N. W., Gomes, L. E. O., & Nobrega,
G. N. (2018b). Shrimp ponds lead to massive loss of soil carbon and greenhouse gas emissions
in Northeastern Brazilian mangroves. Ecology and Evolution, 8(1–11). https://doi.org/10.1002/ece3.4079
Kumar, P., Pandey, P. C., Singh, B. K., Katiyar, S., Mandal, V. P., Rani, M., Tomar, V., &
Patairiya, S. (2016). Estimation of accumulated soil organic carbon stock in tropical forest
using geospatial strategy. The Egyptian Journal of Remote Sensing and Space Science, 19(1),
109–123. https://doi.org/10.1016/j.ejrs.2015.12.003
Macreadie, P. I., Nielsen, D. A., Kelleway, J. J., Atwood, T. B., Seymour, J. R., Petrou, K.,
Connolly, R. M., Thomson, A. C. G., Trevathan-Tackett, S. M., & Ralph, P. J. (2017). Can
we manage coastal ecosystems to sequester more blue carbon? Frontiers in Ecology and the
Environment, 15(4), 206–213. https://doi.org/10.1002/fee.1484

Marchand, C. (2017). Soil carbon stocks and burial rates along a mangrove forest chronosequence
(French Guiana). Forest Ecology and Management, 384, 92–99. https://doi.org/10.1016/j.foreco.2016.10.030
Melo, A. A. B., Valladares, G. S., Ceddia, M. B., Pereira, M. G., & Soares, I. (2016). Spatial
distribution of organic carbon and humic substances in irrigated soils under different management
systems in a semiarid zone in Ceará, Brazil. Semina: Ciências Agrárias, 37(4), 1845–1856.
https://doi.org/10.5433/1679-0359.2016v37n4p1845
Mondal, A., Khare, D., Kundu, S., Mondal, S., Mukherjee, S., & Mukhopadhyay, A. (2017).
Spatial soil organic carbon (SOC) prediction by regression kriging using remote sensing data.
The Egyptian Journal of Remote Sensing and Space Sciences, 20(1), 61–70. https://doi.org/10.1016/j.ejrs.2016.06.004
Morrissey, E. M., Gillespie, J. L., Morina, J. C., & Franklin, R. B. (2014). Salinity affects microbial
activity and soil organic matter content in tidal wetlands. Global Change Biology, 20(4), 1351–
1362. https://doi.org/10.1111/gcb.12431
Nehren, U., & Wicaksono, P. (2018). Mapping soil carbon stocks in an oceanic mangrove
ecosystem in Karimunjawa Islands, Indonesia. Estuarine, Coastal and Shelf Science, 214.
https://doi.org/10.1016/j.ecss.2018.09.022
Nóbrega, G. N., Ferreira, T. O., Siqueira Neto, M., Queiroz, H. M., Artur, A. G., Mendonça, E.
D. S., Silva, E. D., & Otero, X. L. (2016). Edaphic factors controlling summer (rainy season)
greenhouse gas emissions (CO2 and CH4) from semiarid mangrove soils (NE-Brazil). Science
of the Total Environment, 542, 685–693. https://doi.org/10.1016/j.scitotenv.2015.10.108
Odeh, I. O. A., McBratney, A. B., & Chittleborough, D. J. (1995). Further results on prediction of
soil properties from terrain attributes: Heterotopic cokriging and regression-kriging. Geoderma,
67(3–4). https://doi.org/10.1016/0016-7061(95)00007-B
Pearson, R. L., & Miller, L. D. (1972). Remote mapping of standing crop biomass for estimation of
the productivity of shortgrass prairie, Pawnee National Grasslands, Colorado. In Proceedings
of the 8th International Symposium on Remote Sensing of the Environment, Ann Arbor, MI, p. 2.
Perry, C. R., & Lautenschlager, L. F. (1984). Functional equivalence of spectral vegetation
indices. Remote Sensing of Environment, 14(1–3), 169–182. https://doi.org/10.1016/0034-4257(84)90013-0
Piccini, C., Marchetti, A., & Francaviglia, R. (2014). Estimation of soil organic matter by
geostatistical methods: use of auxiliary information in agricultural and environmental assessment.
Ecological Indicators, 36, 301–314. https://doi.org/10.1016/j.ecolind.2013.08.009
Portela, M. G. T., De Espindola, G. M., Valladares, G. S., Amorim, J. V. A., & Frota, J. C. O.
(2020). Vegetation biomass and carbon stocks in the Parnaíba River Delta, NE Brazil. Wetlands
Ecology and Management, 1–16. https://doi.org/10.1007/s11273-020-09735-y
Rouse, J. W., Haas, R. H., Deering, D. W., Schell, J. A., & Harlan, J. C. (1974). Monitoring the
vernal advancement and retrogradation (greenwave effect) of natural vegetation. NASA/GSFC,
Type III, Final Report, Greenbelt, MD, 371p.
Sahu, S. C., Kumar, M., & Ravindranath, N. H. (2016). Carbon stocks in natural and planted
mangrove forests of Mahanadi Mangrove Wetland, East Coast of India. Current Science,
100(12), 2253–2260. https://doi.org/10.18520/cs/v110/i12/2334-2341
Santos, H. G., Jacomine, P. K. T., Anjos, L. H. C., Oliveira, V. A., Lumbreras, J. F., Coelho, M. R.,
Almeida, J. A., Araújo Filho, J. C., Oliveira, J. B., & Cunha, T. J. F. (2018). Sistema Brasileiro
de Classificação de Solos (5. ed. rev. e ampl., 590p). Embrapa.
Sasmito, D. S., Kuzyakov, L., Lubis, A. A., Murdiyarso, D., Hutley, L. B., Bachri, S., Friess,
D. A., Martius, C., & Borchard, N. (2020). Organic carbon burial and sources in soils of
coastal mudflat and mangrove ecosystems. Catena, 187(1–11), 2020. https://doi.org/10.1016/j.catena.2019.104414
Schmidt, M. W., Torn, M. S., Abiven, S., Dittmar, T., Guggenberger, G., Janssens, I. A., &
Nannipieri, P. (2011). Persistence of soil organic matter as an ecosystem property. Nature, 478,
49–56. https://doi.org/10.1038/nature10386

Somarathna, P. D. S. N., Malone, B. P., & Minasny, B. (2016). Mapping soil organic carbon content
over New South Wales, Australia using local regression kriging. Geoderma Regional, 7(1), 38–
48. https://doi.org/10.1016/j.geodrs.2015.12.002
Teixeira, P. C., Donagemma, G. K., Fontana, A., & Teixeira, W. G. (2017). Manual de métodos de
análise de solo (3. ed. rev. e ampl.). Embrapa. 575p.
Valladares, G. S. (2003). Caracterização de organossolos, auxílio à sua classificação. 129f.
Tese (Doutorado em Ciências), Instituto de Agronomia da UFRRJ, Seropédica.
Valladares, G. S., Pereira, M. G., Benites, V. M., Anjos, L. H. C., Ebeling, A. G., & Guareschi, R.
F. (2016). Carbon and Nitrogen Stocks and Humic Fractions in Brazilian Organosols. Revista
Brasileira de Ciência do Solo, 40, 1–16. https://doi.org/10.1590/18069657rbcs20151317
Vilhena, M. P. S. P., Costa, M. L., Berredo, J. F., Paiva, R. S., & Moreira, M. Z. (2018). The
sources and accumulation of sedimentary organic matter in two estuaries in the Brazilian
Northern coast. Regional Studies in Marine Science, 18, 188–196. https://doi.org/10.1016/j.rsma.2017.10.007
Wang, G., Guan, D., Peart, M. R., Chen, Y., & Peng, Y. (2013). Ecosystem carbon stocks of
mangrove forest in Yingluo Bay, Guangdong Province of South China. Forest Ecology and
Management, 310, 539–546. https://doi.org/10.1016/j.foreco.2013.08.045
Xu, H. (2006). Modification of normalised difference water index (NDWI) to enhance open water
features in remotely sensed imagery. International Journal of Remote Sensing, 27(14), 2025–
2033. https://doi.org/10.1080/01431160600589179
Yang, R., Rossiter, D. G., Liu, F., Lu, Y., Yang, F., Yang, F., Zhao, Y., Decheng, L., & Zhang,
G. (2015). Predictive mapping of topsoil organic carbon in an alpine environment aided by
Landsat TM. PLoS One, 10(10), 1–20. https://doi.org/10.1371/journal.pone.0139042
Methods and Challenges in Digital Soil Mapping: Applied Modelling with R Examples

Elpídio Inácio Fernandes-Filho, Cássio Marques Moquedace, Luís Flávio Pereira, Gustavo Vieira Veloso, and Waldir de Carvalho Junior

1 Introduction

The way we do science has undergone transformations since the late twentieth
century, leading to a new era in data analysis and modeling. These transformations
have been boosted by the development and dissemination of new computational
techniques, such as machine learning, open-source programming languages (R
and Python, for instance), cloud computing platforms, geoprocessing, and remote
sensing software. In pedology, digital soil mapping (DSM) has been recognized as
part of a new subdiscipline of soil science: pedometrics (McBratney et al., 2003).
Machine learning algorithms are often used in DSM because they do not usually
require the assumptions of classical statistics, which aids in the use of soil data
without transformations that could impact model performance and interpretability.
Furthermore, the diversity and flexibility of machine learning models provide a
useful tool for the recognition of nonlinear, hierarchical, and complex patterns that
are very common in Earth systems.
Modeling the spatial distribution of soil classes or attributes is difficult for
several reasons. First, most spatial covariates used to model subsurface
soil properties and classes describe features at or above the ground surface. For
example, in optical remotely sensed data, sensors can only capture the spectral response
of the land cover. When these types of data are used, machine learning algorithms
can only find indirect patterns, which leads to models with greater uncertainty and
larger errors. Another example is the use of lithology as a proxy for the parent material.

E. I. Fernandes-Filho · C. M. Moquedace · L. F. Pereira (✉) · G. V. Veloso
Department of Soils, Federal University of Viçosa, Viçosa, MG, Brazil
W. de Carvalho Junior
Embrapa Soils, Brazilian Agricultural Research Corporation, Rio de Janeiro, RJ, Brazil

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
W. de Carvalho Junior et al. (eds.), Pedometrics in Brazil, Progress in Soil Science,
https://doi.org/10.1007/978-3-031-64579-2_19

This can lead to incorrect predictions in places where soils were not formed from
in situ materials (allochthonous soils). Moreover, it is important to highlight the
difficulty of including the time factor in these models. Soils are usually the products of
millennia of formation, during which different dynamic formation factors modify their
properties. However, we use covariates that represent only the current soil formation factors,
which cannot represent the complete evolutionary history of many soils. All of these
problems introduce noise into the models, which is reflected in the generally modest
prediction performances and high uncertainties observed in digital soil mapping
studies (Gomes et al., 2019; Ballabio et al., 2016; Demattê et al., 2016).
We aim to systematically describe the steps to be followed in a digital soil
mapping project using machine learning techniques, presenting a practical example with
open data. In addition, we discuss some aspects related to machine learning
modeling in the natural sciences.

2 Soil Sampling for DSM Purposes

Field sampling is the first, most important, and most expensive step in conventional
soil surveying and DSM (Webster & Oliver, 1990; Santos et al., 1995). The
sampling density is defined by considering the mapping scale: the more detailed
the scale, the higher the sampling density. It is also important to keep in mind
that areas with high environmental diversity generally present high pedodiversity,
requiring higher sampling densities. Therefore, sample allocation commonly starts
by selecting covariates that represent the entire environmental diversity across
the study area. These covariates must be related to soil formation factors such
as lithology and geological structures, elevation and slope, precipitation and air
temperature, and vegetation type. Legacy soil maps may also be included. In DSM,
statistical approaches are usually applied to define the best sampling distribution
for the selected covariates (Brus, 2019; Carvalho Junior et al., 2022; Minasny &
McBratney, 2006).
Statistical sampling techniques can be primarily divided into probability sampling (every
unit of the population has a positive and known probability of selection) and
non-probability sampling (any design that does not satisfy the previous rule). DSM
is mostly focused on stochastic model-based estimation (e.g., linear regression,
random forests, and kriging), which does not require a probability approach
and offers the opportunity to optimize sampling (Brus, 2019). There are many
approaches to optimizing sampling. The most suitable one must be selected according
to the chosen model, sampling costs, and project aims.
The datasets obtained by any sampling method may be very unbalanced and
need to be preprocessed before modelling to avoid model inconsistencies and poor
performance. The main task is merging rare classes, because keeping them during
modelling may cause problems. For example, rare classes may be left out of
the training set by the data partition; they will then not be mapped, and their samples
will be predicted as other, more frequent classes. Even when included in training, the few
samples available will not allow adequate pattern recognition, leading to the same problem.
Another common problem is that rare classes are often associated with rare levels
in categorical predictors (factors). In this case, when a level is not included in the
training set (for example, a very specific lithology), the model will not be able to make
predictions for samples with that level, because the level was absent when
the model was built. There are several approaches to merging soil classes, such as
taxonomic proximity or statistical clustering based on quantitative soil attributes.
These pedologically based approaches are recommended, since they tend to provide
more consistent maps with soil class associations correctly grouped.
In this chapter, we use legacy data and therefore give no practical example of soil
sample allocation. However, Brus (2019) provided an extensive overview and open data
tutorials for sampling designs in DSM, addressing the most suitable ones for each
model-based approach. It includes important techniques such as conditioned Latin
hypercube sampling (cLHS) (Minasny & McBratney, 2007), commonly used in sampling for
mapping soil attributes, and approaches for supplementary sampling in areas with legacy data.

2.1 Legacy Data Used in This Chapter

We used legacy data (447 soil profiles) collected for the ecological and
socioeconomic zoning of the Vale do Jamari region in the state of Rondônia, Brazil
(Cochrane & Cochrane, 2006) (Fig. 1).

Fig. 1 Soil profiles used in this study



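[Figure 2: panel (a) frequency distribution of soil water pH (x-axis: pH; y-axis: Frequency);
panel (b) number of samples per soil class (classes: Udult, Ustult, Udalf, Other; x-axis: Samples)]

Fig. 2 Sample distribution of soil water-pH (a) and soil classes (b)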

Soil classes were harmonized at the second level (suborder) of Soil Taxonomy (1999), and
soil water pH data were harmonized to the 0–20 cm depth interval through mass-preserving
spline functions using the mpspline2 package in R (O’Brien, 2022). We adopted
a minimum of 50 samples per class. Classes with fewer than 50 samples were
grouped into a class called “Other” (Fig. 2). A vignette with the code for this
step is available at https://github.com/moquedace/pedometrics_1_spline_class_grouping.
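The grouping rule described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the class labels and their frequencies are simulated, and the object names (soil_class, min_n, grouped) are hypothetical rather than taken from the authors' vignette.

```r
# Hypothetical class labels: a skewed sample of soil suborders
set.seed(42)
soil_class <- sample(
  c("Udult", "Ustult", "Udalf", "Aquept", "Orthent"),
  size = 400, replace = TRUE,
  prob = c(0.50, 0.25, 0.15, 0.06, 0.04)
)

min_n  <- 50                      # minimum number of samples per class (as in the text)
counts <- table(soil_class)       # samples per class
rare   <- names(counts)[counts < min_n]

# Relabel rare classes as "Other" and rebuild the factor
grouped <- factor(ifelse(soil_class %in% rare, "Other", soil_class))

table(grouped)
```

A purely frequency-based merge like this one ignores taxonomic proximity; it only shows the mechanics of the relabelling step.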

3 Data Preparation

3.1 Data Partition

As discussed previously, soil is a complex system in both its constitution and
evolutionary history. DSM therefore requires a large number of samples to train
and tune complex models capable of detecting multiple interactions between soils and
explanatory variables. However, the number of available samples is usually very
limited owing to the high cost of soil sampling and laboratory analysis.
Complex models trained with few observations tend to present high variance in their
results because even weak perturbations (e.g., different random splits of samples
into training and test sets) may cause marked changes in pattern recognition
and model building. This high model variance results in a high variance of
predictions for the same location, that is, high uncertainty in the final map.
Splitting the samples into training and test sets is critical. Large training sets
tend to enhance model building through improved pattern recognition. However, the
resulting small test sets often imply a less reliable measure of model performance
(Chen et al., 2021). This trade-off must be optimized using the most appropriate
splitting method for each case. Below, we present some methods suitable for DSM
with few samples.

3.1.1 Holdout

This is the best approach for independently testing the performance of the models.
The dataset is randomly separated into training and test sets. The majority of the
samples (usually 60–80%) are allocated to the training set to provide satisfactory
pattern recognition for model building. The method is simple, fast, and flexible but,
as discussed previously, may result in high variance in the models’ performance
and predictions when applied to small datasets. In addition, holding out a test set
from a very small dataset can leave a poor training set, severely reducing the overall
performance of the models.
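A minimal holdout split might look like the base-R sketch below. The data are simulated, the variable names (elev, ph) are hypothetical, and a linear model is used only as a stand-in for whatever learner a project adopts.

```r
set.seed(1)
n   <- 100                                  # hypothetical number of soil samples
dat <- data.frame(elev = runif(n, 100, 500))
dat$ph <- 4 + 0.004 * dat$elev + rnorm(n, sd = 0.3)

train_frac <- 0.8                           # 60-80% is typical for training
train_idx  <- sample(n, size = round(train_frac * n))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]                  # samples never seen during fitting

fit <- lm(ph ~ elev, data = train)          # simple stand-in model
mae_test <- mean(abs(test$ph - predict(fit, newdata = test)))
mae_test
```

Rerunning the block with a different seed changes both the split and the resulting error, which is the variance issue the text refers to.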

3.1.2 Bootstrap

Bootstrapping is a resampling method with replacement. Some samples may be
selected more than once for the training set, whereas others will not be selected at all
and form an independent (out-of-bag) test set. Thus, it is possible to generate a training set
of the same size as the original dataset. On average, about one-third of the samples
(approximately 37%) are left out of each resample and compose the test set. The
disadvantage for small datasets is that the redundancy in the training set increases,
resulting in modest gains in pattern recognition.
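The mechanics of a single bootstrap resample can be sketched in base R as follows; the sample size is arbitrary and the object names (boot_idx, oob_idx) are hypothetical.

```r
set.seed(1)
n <- 200
boot_idx <- sample(n, size = n, replace = TRUE)  # training set, same size as the data
oob_idx  <- setdiff(seq_len(n), boot_idx)        # never drawn: independent test set

length(boot_idx)                 # equals n
length(unique(boot_idx)) / n     # share of distinct samples in the training set
length(oob_idx) / n              # out-of-bag share, close to (1 - 1/n)^n
```

The gap between the first two printed values is the redundancy mentioned above: the training set has n rows but fewer than n distinct samples.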

3.1.3 K-fold Cross-Validation

In this method, k folds (subsets) of almost equal size are created. Then, k − 1 folds
are used to train the model, and the remaining fold is used to test its performance.
The model is trained and tested k times, each time with a different test fold.
The model’s performance is calculated as the average performance over these k tests.
After estimating the overall performance, a final model is trained using all folds and
used to make the predictions. We generally use from three to ten folds. A larger
number of folds enlarges the training set, which makes it more suitable for improving
pattern recognition in small datasets. On the other hand, the higher the number of folds, the
greater the computational requirements and processing time. The cross-validation
can also be repeated several times, randomly changing the fold composition each time. The
final performance is calculated by averaging all repetitions, which decreases the variance
of the performance estimate. This approach is called repeated cross-validation.
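The procedure can be sketched in base R as below. It is a simplified illustration on simulated data: the helper cv_once is hypothetical, a linear model stands in for the learner, and MAE is used as the performance metric.

```r
set.seed(1)
n   <- 90
dat <- data.frame(x = rnorm(n))
dat$y <- 2 * dat$x + rnorm(n)

k    <- 10
reps <- 3                                    # number of repetitions

cv_once <- function(dat, k) {
  # Randomly assign each sample to one of k folds of almost equal size
  fold <- sample(rep(seq_len(k), length.out = nrow(dat)))
  errs <- sapply(seq_len(k), function(i) {
    fit  <- lm(y ~ x, data = dat[fold != i, ])          # train on k - 1 folds
    pred <- predict(fit, newdata = dat[fold == i, ])    # test on the held-out fold
    mean(abs(dat$y[fold == i] - pred))                  # MAE of this fold
  })
  mean(errs)                                 # average over the k tests
}

rep_mae <- replicate(reps, cv_once(dat, k))  # new fold composition each time
mean(rep_mae)                                # repeated cross-validation estimate

final_fit <- lm(y ~ x, data = dat)           # final model uses all samples
```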

3.1.4 Leave-One-Out Cross-Validation

This is a special case of k-fold cross-validation, in which the number of folds is equal
to the number of samples in the dataset (k = n). Therefore, n − 1 samples are used to
train each model, and only one sample is used to test it. This method is particularly
useful when very small datasets do not allow holding out a test set or fold without
threatening the building of a model. This method may increase the computational
cost because of the large number of trained models.
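A leave-one-out loop is short enough to write directly in base R. The sketch below uses simulated data and a linear model as a stand-in learner; all names are hypothetical.

```r
set.seed(1)
n   <- 25                                    # a very small dataset
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + 0.5 * dat$x + rnorm(n, sd = 0.2)

# One model per sample: train on n - 1 samples, predict the one left out
loo_pred <- sapply(seq_len(n), function(i) {
  fit <- lm(y ~ x, data = dat[-i, ])
  predict(fit, newdata = dat[i, , drop = FALSE])
})

loo_mae <- mean(abs(dat$y - loo_pred))
loo_mae
```

Here n models are fitted, which is cheap for a linear model but can become costly for tuned machine learning models.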

3.1.5 Nested Leave-One-Out Cross-Validation

This method uses a dual loop to allow external testing of models trained using
leave-one-out cross-validation. The external loop separates a single sample for the
external testing of the models, as in the holdout approach. The internal
loop uses leave-one-out cross-validation to train and validate the models with the
remaining samples. The final performance is calculated as the mean performance
over all external tests, and the final model is fitted using all samples, as in the
k-fold and leave-one-out cross-validations.
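The dual loop can be sketched as follows in base R. This is an assumption-laden toy example: the data are simulated, and the polynomial degree of a linear model plays the role of the hyperparameter tuned in the internal loop.

```r
set.seed(1)
n   <- 20
dat <- data.frame(x = runif(n, -1, 1))
dat$y <- dat$x^2 + rnorm(n, sd = 0.1)
degrees <- 1:3                    # hypothetical hyperparameter: polynomial degree

# Internal loop: leave-one-out MAE of a polynomial of a given degree
inner_loo_mae <- function(d, degree) {
  errs <- sapply(seq_len(nrow(d)), function(j) {
    fit <- lm(y ~ poly(x, degree), data = d[-j, ])
    abs(d$y[j] - predict(fit, newdata = d[j, , drop = FALSE]))
  })
  mean(errs)
}

# External loop: each sample is held out once for external testing
outer_err <- sapply(seq_len(n), function(i) {
  inner  <- dat[-i, ]
  scores <- sapply(degrees, function(g) inner_loo_mae(inner, g))
  best   <- degrees[which.min(scores)]               # tuned on inner samples only
  fit    <- lm(y ~ poly(x, best), data = inner)
  abs(dat$y[i] - predict(fit, newdata = dat[i, , drop = FALSE]))
})

mean(outer_err)                   # mean performance over all external tests
```

The externally tested sample never takes part in choosing the degree, which is what separates this scheme from plain leave-one-out cross-validation.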

3.2 Defining Covariates

Covariates are explanatory variables that are used to model soil classes and
properties. These covariates are related to the soil formation factors and soil traits.
It may also be useful to consider the position (geographical coordinates) of the
samples, which provides the opportunity to capture the effects of environmental
traits not included in the covariate set. In some models (e.g., kriging), the only
covariate used is the position. McBratney et al. (2003) presented a systematization
of these relations in DSM, the so-called SCORPAN model:

$$S_{c,a} = f(s, c, o, r, p, a, n) + e$$

where $S_{c,a}$ are soil classes or attributes, s are other soil properties or attributes, c is
climate, o are organisms, r is relief, p are the parent materials and geological traits,
a is the process age, and n is the position. The term e represents the
spatially autocorrelated residuals.
In addition to the primarily available data, new covariates can be created by
editing, clustering, combining, or mathematically reprocessing SCORPAN factors.
For example, dozens of morphometric covariates can be obtained from a digital
elevation model. This process is known as “feature engineering,” and its major
objective is to enhance the ability of models to recognize patterns among the
dependent and explanatory variables.

3.3 An Example of Data Preparation

To represent the relief factor, we used the NASADEM digital elevation model
(NASA JPL, 2020) with 30 m spatial resolution to derive 51 morphometric
covariates through the RSAGA package (Brenning et al., 2018) (see the vignette
at https://github.com/moquedace/pedometrics_2_morphometric). The climate was
represented by 19 bioclimatic variables from the WorldClim 2 database with a 1 km
spatial resolution (Fick & Hijmans, 2017). Parent material was represented by a
lithological map, and a map of soil classes was used as a covariate to map the
water pH values (Cochrane & Cochrane, 2006). All covariates were harmonized to
a 30 m spatial resolution and reprojected to the South America Lambert Conformal
Conic system (ESRI:102015) using the terra package (Hijmans, 2022).
For the data partition, we used a holdout with 80% of the samples for training the
models and 20% for the external test. The training set was also partitioned into 10
folds for internal cross-validation during model fitting and hyperparameter
optimization (discussed later).

4 Selection of Predictors

In general terms, model performance tends to increase with the number of
covariates, owing to better pattern recognition. However, the fitted models also become
more complex and less explainable. Another problem is that an excessively large
set of covariates may increase overfitting (pattern recognition in data noise), which
results in lower predictive performance for new data. This trade-off can be addressed
by selecting only the least redundant and most important covariates to be used as
effective predictors in the final model. Several strategies for predictor selection can
be combined sequentially to improve the process.

4.1 Variance-Based Selection

Covariates with very low or zero variance are poor predictors because they do not
show patterns that can be recognized. Thus, these covariates only increase model
complexity and computational requirements without improving predictive
performance, and they may even lead to errors in model building. This
type of issue is often observed when covariates with a low spatial resolution are used
to model relatively small areas. Common examples include climatic, geological, and
legacy soil maps at coarse scales.
These covariates can be removed using the caret package (Kuhn et al., 2023)
through the nearZeroVar function. The function identifies covariates
with zero or near-zero variance in the dataset so that they can be removed.
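The idea behind this screening can be illustrated with a base-R sketch that mirrors the two criteria documented for nearZeroVar (a frequency ratio above freqCut = 95/5 and a percentage of unique values below uniqueCut = 10). The covariates are simulated and their names are hypothetical; the caret call at the end is run only if the package is installed.

```r
set.seed(1)
n <- 100
covs <- data.frame(
  slope   = runif(n, 0, 45),                    # varies everywhere
  geology = rep("granite", n),                  # zero variance: a single value
  climate = c(rep(1500, 97), 1510, 1520, 1530)  # near-zero variance
)

# Flag covariates with a single value, or with one dominant value and few unique values
nzv_flag <- sapply(covs, function(v) {
  tab <- sort(table(v), decreasing = TRUE)
  if (length(tab) == 1) return(TRUE)            # zero variance
  freq_ratio <- tab[1] / tab[2]                 # most common vs. second most common value
  pct_unique <- 100 * length(tab) / length(v)
  freq_ratio > 95 / 5 && pct_unique < 10
})
names(nzv_flag)[nzv_flag]

covs_kept <- covs[, !nzv_flag, drop = FALSE]

# The caret function can be called directly when the package is available
if (requireNamespace("caret", quietly = TRUE)) {
  print(names(covs)[caret::nearZeroVar(covs)])
}
```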

4.2 Redundancy-Based Selection

Even with high variance, strongly correlated covariates carry the same information,
increasing model complexity without substantial improvement in predictive
performance. Highly correlated covariates may also distort the predictors’ importance
ranking, especially in decision tree-based models, such as random forests
and bagging. This occurs because, in a group of correlated covariates, one can be
successively replaced by another during training. This replacement decreases the
individual importance of each predictor, which reduces the calculated importance
of the entire group and threatens the model’s interpretability. There are several
approaches for handling highly correlated covariates. Some of these are described
below.

4.2.1 Manual Selection

From a pair or group of correlated covariates, an expert on the studied phenomena
selects the most suitable one based on its importance, interpretability, and other
criteria. This method has the disadvantage of high subjectivity and is very laborious
for large covariate sets.

4.2.2 Dimensionality Reduction

Some statistical transformations can reduce the number of covariates and
their correlations while retaining most of the variance of the dataset. The most commonly
used method is principal component analysis (PCA). PCA transforms a
dataset with p covariates and n samples into a new dataset with p uncorrelated
components and n samples. The main advantage is that the first few components
usually carry most of the variance of the dataset and are mutually uncorrelated, so the
remaining components can be discarded. Thus, PCA can address both the problems of a
high number and a high correlation of predictors. However, this approach has several
disadvantages. Each component carries information from several distinct covariates,
thereby hampering the interpretability
of the model. Another problem is that some components discarded for carrying
low variance may be important for specific patterns, reducing the performance of
projects in which the prediction of specific cases and rare phenomena is the main
objective.
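As an illustration, PCA can be run with the base-R function prcomp. The covariates below are simulated with hypothetical names, and the 95% variance threshold is an arbitrary choice for the example, not a recommendation from the chapter.

```r
set.seed(1)
n    <- 200
elev <- rnorm(n)
covs <- data.frame(
  elev  = elev,
  temp  = -0.9 * elev + rnorm(n, sd = 0.2),   # strongly correlated with elevation
  slope = rnorm(n),
  ndvi  = rnorm(n)
)

pca <- prcomp(covs, center = TRUE, scale. = TRUE)

# Cumulative share of variance carried by the leading components
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
round(cum_var, 3)

# Keep the components needed to reach 95% of the variance
n_keep <- which(cum_var >= 0.95)[1]
scores <- pca$x[, seq_len(n_keep), drop = FALSE]

# Component scores are mutually uncorrelated (off-diagonal values are ~0)
round(cor(pca$x), 3)
```

The loadings in pca$rotation show how each component mixes the original covariates, which is the source of the interpretability problem mentioned above.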

4.2.3 Correlation Analysis

A very simple but efficient way to choose which covariates should be removed
from a highly correlated group is to use the strength of the correlation itself. From
each pair of strongly correlated covariates, the most redundant one is removed.
This approach is easily implemented using the findCorrelation function in the caret
package (Kuhn et al., 2023).
The findCorrelation function has two main inputs. The first is x, a previously
calculated correlation matrix, from which each pair of strongly correlated covariates
(absolute correlation > cutoff) is identified. The second is cutoff, the maximum
strength of correlation to be tolerated (the default is 0.9). For
each covariate, the function calculates the mean of the absolute values of its correlation
coefficients with all other covariates. This mean value is a measure of overall
redundancy. Subsequently, from each pair of strongly correlated covariates, the one
with the largest mean absolute correlation (the most redundant) is removed.
The output is the list of covariates that must be removed.
The correlation matrix is calculated using the native cor function of the R language.
The cor function provides the option of calculating matrices for three different
correlation coefficients: Pearson (linear relations between continuous, normally distributed
covariates), Kendall, and Spearman (both for linear or nonlinear monotonic relations between
continuous or ordinal covariates, regardless of their distributions). All correlation
coefficients range between −1 and 1.
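The heuristic described above can be sketched in base R as follows. This is a simplified reimplementation for illustration, not the caret source code: the mean absolute correlations are computed once on the full matrix, the data are simulated, and names such as to_drop are hypothetical. The caret call at the end runs only if the package is installed and may not return exactly the same set.

```r
set.seed(1)
n    <- 150
elev <- rnorm(n)
covs <- data.frame(
  elev = elev,
  temp = -elev + rnorm(n, sd = 0.1),          # nearly redundant with elev
  prec = 0.97 * elev + rnorm(n, sd = 0.2),
  twi  = rnorm(n)
)

cutoff <- 0.9
cm <- abs(cor(covs, method = "spearman"))     # absolute correlation matrix
diag(cm) <- NA
mean_abs <- rowMeans(cm, na.rm = TRUE)        # overall redundancy of each covariate

# Repeatedly take the strongest remaining pair above the cutoff and drop
# the member with the larger mean absolute correlation
to_drop <- character(0)
repeat {
  keep <- setdiff(colnames(cm), to_drop)
  sub  <- cm[keep, keep, drop = FALSE]
  if (length(keep) < 2 || max(sub, na.rm = TRUE) <= cutoff) break
  pair <- keep[which(sub == max(sub, na.rm = TRUE), arr.ind = TRUE)[1, ]]
  to_drop <- c(to_drop, pair[which.max(mean_abs[pair])])
}
to_drop

# Call to the caret implementation, if the package is installed
if (requireNamespace("caret", quietly = TRUE)) {
  print(caret::findCorrelation(cor(covs, method = "spearman"),
                               cutoff = cutoff, names = TRUE))
}
```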

4.3 Importance-Based Selection

After removing covariates based on variance and redundancy, dozens or hundreds
of covariates may remain in the dataset. Not all of them will have relevant predictive
power, leading to a complex, overfitted, and computationally expensive model.
Therefore, covariates with low importance (i.e., low predictive power) should be
eliminated before fitting the final model. There are three main approaches to
removing low-importance covariates:
Filter methods: Covariates are selected by their statistical association with the
dependent variable, measured by a test such as correlation analysis, chi-square
tests, or ANOVA.
Wrapper methods: Covariates are selected by fitting an initial model to several data
subsets and verifying the predictive power of the covariates. This approach is
model-specific and selects the best subset of covariates for each model.
Embedded or built-in methods: The algorithm internally responsible for pattern
recognition in the model performs the covariate selection. This approach
is generally an automated combination of filter and wrapper approaches.
Some examples of embedded selection are decision tree-based algorithms, such as
random forests.
Filters have the lowest computational demands because they do not require model
training. They have the disadvantage of testing only one covariate at a time
without considering the interactions among covariates. Wrapper and embedded
approaches have greater computational demands but account for these interactions.
Embedded approaches are typically less adjustable and interpretable because they are
implemented during model training. Wrappers are the most common approach used
in DSM.
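To make the filter idea concrete, the base-R sketch below scores each covariate separately by its Spearman correlation with the dependent variable. The data, the covariate names, and the 5% significance threshold are assumptions made for the example.

```r
set.seed(1)
n <- 120
covs <- data.frame(
  elev = rnorm(n), slope = rnorm(n), prec = rnorm(n), ndvi = rnorm(n)
)
ph <- 5 + 0.6 * covs$elev - 0.3 * covs$prec + rnorm(n, sd = 0.5)

# Score each covariate on its own by |Spearman correlation| with the target
score   <- sapply(covs, function(v) abs(cor(v, ph, method = "spearman")))
ranking <- sort(score, decreasing = TRUE)
round(ranking, 2)

# Keep covariates whose association is significant at the 5% level
p_val    <- sapply(covs, function(v) cor.test(v, ph, method = "spearman")$p.value)
selected <- names(p_val)[p_val < 0.05]
selected
```

Because each covariate is tested in isolation, no model is trained, and interactions among covariates are not considered.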

4.3.1 The Recursive Feature Elimination Algorithm

Recursive Feature Elimination (RFE) is one of the most widely used wrapper
methods for DSM. It is a backward feature selection method that uses the results
of models to select the subset of covariates that optimizes a performance metric.
The algorithm starts by training the selected model with all covariates and ranking them
in descending order of importance. Subsequently, the model is successively retrained after
removing the least important covariates. As covariates are removed, the predictive
performances are recorded. The algorithm returns a table with subsets of ranked
covariates and their predictive performance. The user can decide whether to use the
subset with the best performance or a smaller one with slightly worse performance
to build a more parsimonious model. In the latter case, the eliminated covariates do not
lead to substantial gains or losses in predictive performance.
The advantages of using RFE are those of building a parsimonious model,
such as better interpretability, much smaller computational requirements, and a
significant reduction in the risk of overfitting. However, it is necessary to keep in
mind that the RFE results are model-specific, which means that different models
select different subsets. Therefore, an RFE must be run for each model used
in a project. A possible disadvantage of RFE is that it is a greedy
procedure: once a variable is discarded, it is not evaluated again.
In this sense, we may build a locally optimal model, whereas a globally optimal
model could actually be built with a slightly different covariate set (Kuhn &
Johnson, 2013).
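The backward elimination loop can be sketched in base R as below. This is a simplified variant written for illustration and is not the caret implementation: a linear model replaces the random forest, the absolute t-statistic of each coefficient is used as a stand-in importance measure, the ranking is recomputed at every step, and all data and names are hypothetical.

```r
set.seed(1)
n    <- 150
covs <- as.data.frame(matrix(rnorm(n * 8), n, 8))
names(covs) <- paste0("cov", 1:8)
y    <- 2 * covs$cov1 - 1.5 * covs$cov2 + covs$cov3 + rnorm(n)
fold <- sample(rep(1:5, length.out = n))       # fixed 5-fold partition

# Cross-validated MAE of a linear model on a given subset of covariates
cv_mae <- function(vars) {
  d <- cbind(covs[vars], y = y)
  mean(sapply(1:5, function(i) {
    fit <- lm(y ~ ., data = d[fold != i, ])
    mean(abs(d$y[fold == i] - predict(fit, newdata = d[fold == i, ])))
  }))
}

vars    <- names(covs)
results <- data.frame(n_vars = integer(0), mae = numeric(0))
subsets <- list()
while (length(vars) >= 1) {
  results <- rbind(results, data.frame(n_vars = length(vars), mae = cv_mae(vars)))
  subsets[[as.character(length(vars))]] <- vars
  if (length(vars) == 1) break
  # Importance stand-in: absolute t-statistic of each coefficient
  fit  <- lm(y ~ ., data = cbind(covs[vars], y = y))
  imp  <- abs(summary(fit)$coefficients[-1, "t value"])
  vars <- setdiff(vars, names(imp)[which.min(imp)])   # drop the least important
}
results                                        # performance of each subset size
best <- subsets[[as.character(results$n_vars[which.min(results$mae)])]]
best
```

Once a covariate leaves vars it never returns, which is the greedy behaviour discussed above.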
RFE can be applied using the rfe function in the caret package (Kuhn et al., 2023).
The basic inputs are the covariates (x), the dependent variable (y), the model to be
fitted (set through the functions argument of rfeControl), the performance metric to be
optimized (metric), and the subset sizes of the ranking to be tested (sizes). The ranking
of importance is calculated using model-specific techniques (Kuhn, 2019). All subset sizes
can be tested on datasets with few covariates. For example, for a dataset with ten
covariates, the sizes parameter can be set to sizes = c(2, 3, 4, 5, 6, 7, 8, 9). However,
a large number of covariates may require unavailable computational power, and it is then
better to test only some subsets. For example, for a dataset with 100 covariates, the sizes
parameter can be set to sizes = c(2, 4, 7, 10, 15, 20, 30, 50, 75, 99). Several performance
metrics can be used to optimize the model. The most common ones for regression
are mean absolute error (MAE), root mean squared error (RMSE), and coefficient of
determination (R²). For classification, the most common metrics are the kappa
coefficient and accuracy. Other metrics can also be used for various objectives. It
is also necessary to keep in mind whether the optimization is to minimize (RMSE,
MAE, and other measures of error in general) or maximize (R², Kappa, accuracy,
and other measures of goodness-of-fit) the metric.
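For reference, the metrics named above can be written as short base-R functions. The observed and predicted values are invented for the example, and R² is computed here as one minus the ratio of residual to total sum of squares, which is only one of several definitions in use.

```r
# Regression metrics (to be minimized: MAE, RMSE; to be maximized: R2)
mae  <- function(obs, pred) mean(abs(obs - pred))
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
r2   <- function(obs, pred) 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)

# Classification metrics (both to be maximized)
accuracy   <- function(obs, pred) mean(obs == pred)
kappa_coef <- function(obs, pred) {
  lev <- union(obs, pred)
  cm  <- table(factor(obs, lev), factor(pred, lev))   # confusion matrix
  po  <- sum(diag(cm)) / sum(cm)                      # observed agreement
  pe  <- sum(rowSums(cm) * colSums(cm)) / sum(cm)^2   # agreement expected by chance
  (po - pe) / (1 - pe)
}

obs_ph  <- c(4.8, 5.2, 5.9, 6.1, 5.5)
pred_ph <- c(5.0, 5.1, 5.6, 6.0, 5.7)
c(MAE = mae(obs_ph, pred_ph), RMSE = rmse(obs_ph, pred_ph), R2 = r2(obs_ph, pred_ph))

obs_cl  <- c("Udult", "Udult", "Ustult", "Udalf", "Other", "Udult")
pred_cl <- c("Udult", "Ustult", "Ustult", "Udalf", "Udult", "Udult")
c(Accuracy = accuracy(obs_cl, pred_cl), Kappa = kappa_coef(obs_cl, pred_cl))
```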

4.4 An Example of Predictors Selection

Sequential predictor selection was adopted with regard to variance, redundancy, and
importance. In a previous check, no covariate presented zero or near-zero variance;
therefore, all covariates were kept. In the selection by redundancy, we applied a
selection by Spearman’s correlation, considering a cutoff of 0.95. The remaining
covariates were submitted to selection by importance using RFE. We optimized
subsets for the random forest model considering the minimization of MAE (for
water pH values) and maximization of Kappa (for soil classes) in training with
simple ten-fold cross-validation. To build a parsimonious model, the chosen subset
was allowed a maximum loss of 2% relative to the best performance observed. The process
was repeated 100 times to verify the stability of subset selection. The results are
presented in Figs. 3 and 4, and the code for these analyses is available at
https://github.com/moquedace/pedometrics_3.1_rfe_model_class and
https://github.com/moquedace/pedometrics_3.2_rfe_model_reg.
The best subsets comprised approximately 10–15 of the most important covariates
for both regression and classification. These covariates represent less than 20%
of the original dataset, which led to an enormous reduction in the computational
requirements. It is also possible to notice a reduction in overall performance due
to overfitting for subsets with many unimportant covariates, especially in the kappa
metric.
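The 2% tolerance rule used to choose a parsimonious subset can be sketched in base R as follows. The performance table is invented for the example; caret also offers helper functions for this purpose (such as pickSizeTolerance), which are not used here.

```r
# Hypothetical RFE output: cross-validated MAE for each subset size
perf <- data.frame(
  n_vars = c(2, 4, 7, 10, 15, 20, 30, 50),
  mae    = c(0.520, 0.470, 0.440, 0.428, 0.425, 0.424, 0.430, 0.445)
)

tol        <- 0.02                            # accept a loss of up to 2%
best_mae   <- min(perf$mae)                   # MAE is minimized
within_tol <- perf$mae <= best_mae * (1 + tol)

# Smallest subset whose performance is within the tolerance of the best
chosen <- min(perf$n_vars[within_tol])
chosen
```

For a metric that is maximized, such as Kappa, the condition would instead compare each value against the best one multiplied by (1 - tol).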

Fig. 3 Model performance in 100 runs of the RFE selection for soil water-pH prediction using
several subsets of covariates ranked by importance. The red points are the mean value

Fig. 4 Model performance in 100 runs of the RFE selection for soil class prediction using several
subsets of covariates ranked by importance. The red points are the mean value

5 Modelling, Prediction and Predictors Importance

As previously discussed, the data partition approach is very important for fitting
a good model and efficiently assessing its performance. To optimize the trade-off
between training and testing, we propose the combined data-partitioning method
presented below. First, the data are partitioned using a holdout with 80% of the
samples for training and 20% for testing. The training set is then used to train and
tune the model with ten-fold cross-validation.
Tuning models is an essential task in DSM and is performed through hyperparameter
optimization. Our proposed approach for tuning models is as follows:
several models are trained using different configurations of the available
hyperparameters of each machine learning algorithm. The configuration with the best
cross-validation performance is chosen as the best one. The tuned model is then tested
using the holdout test set. This process is repeated 100 times to vary which samples
are used to train, tune, and externally test the models. Finally, 100 different
models, predictions, and robust performance evaluation metrics are obtained for a
single dataset.
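The combined scheme (holdout, internal ten-fold cross-validation for tuning, and 100 repetitions) can be sketched in base R as below. It is a toy illustration under stated assumptions: the data are simulated, a polynomial regression replaces the random forest, and the polynomial degree plays the role of the tuned hyperparameter.

```r
set.seed(1)
n   <- 120
dat <- data.frame(x = runif(n, -2, 2))
dat$y <- sin(dat$x) + rnorm(n, sd = 0.3)
grid   <- 1:4        # hypothetical hyperparameter grid: polynomial degree
n_runs <- 100
k      <- 10

mae <- function(obs, pred) mean(abs(obs - pred))

one_run <- function() {
  idx   <- sample(n, round(0.8 * n))           # 80/20 holdout
  train <- dat[idx, ]
  test  <- dat[-idx, ]
  fold  <- sample(rep(seq_len(k), length.out = nrow(train)))
  # Ten-fold cross-validated MAE for each hyperparameter value
  cv <- sapply(grid, function(g) mean(sapply(seq_len(k), function(i) {
    fit <- lm(y ~ poly(x, g), data = train[fold != i, ])
    mae(train$y[fold == i], predict(fit, newdata = train[fold == i, ]))
  })))
  best <- grid[which.min(cv)]
  fit  <- lm(y ~ poly(x, best), data = train)  # refit the tuned model on all training data
  c(degree = best, test_mae = mae(test$y, predict(fit, newdata = test)))
}

runs <- t(replicate(n_runs, one_run()))        # one row per repetition
summary(runs[, "test_mae"])                    # spread of the external performance
table(runs[, "degree"])                        # how often each setting was chosen
```

The spread of test_mae across the rows of runs gives a direct picture of how much the external performance depends on the random partition.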
Compared with other natural sciences, DSM projects often present poor
predictive performance metrics. To evaluate whether a fitted model is useful, it is
recommended to compare its performance with that of a null model. A null model
is a simple model with very low computational cost, such as the mean value for
regression studies. If a complex fitted model performs similarly to a null model, its
computational cost is not justified. In addition to a null model, several models
should be created using different algorithms, because each algorithm has a different
approach to pattern recognition. For this reason, there is no universally best model in
DSM, and several models must be tested to find the best one for each project.
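A comparison against a mean-value null model takes only a few lines of base R. In the sketch below, the data are simulated, the names are hypothetical, and a linear model stands in for the fitted model being evaluated.

```r
set.seed(1)
n   <- 100
dat <- data.frame(elev = runif(n, 100, 500))
dat$ph <- 4 + 0.004 * dat$elev + rnorm(n, sd = 0.3)
idx   <- sample(n, 80)
train <- dat[idx, ]
test  <- dat[-idx, ]

mae <- function(obs, pred) mean(abs(obs - pred))

null_pred <- rep(mean(train$ph), nrow(test))     # null model: mean of the training set
fit       <- lm(ph ~ elev, data = train)         # stand-in for a fitted model

perf <- c(null  = mae(test$ph, null_pred),
          model = mae(test$ph, predict(fit, newdata = test)))
perf
```

Note that the null prediction uses the training mean only, so both models are evaluated on samples they have not seen.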

After model testing and selection, the next step is map prediction. To quantify the
uncertainties and variance of the predictions, we suggest considering the maps
predicted by the 100 fitted models. The final map is represented by the mean map for
regression and the mode map for classification. In regression, uncertainties can be
evaluated using maps of the 5% and 95% quantiles of the predictions, as well as maps of
the coefficient of variation of the predictions (Hengl et al., 2017). In classification,
uncertainties can be assessed using variety maps. Variety is the number of classes to
which a pixel was assigned in the 100 predictions, ranging from 1 to the total number of classes.
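These summaries can be illustrated in base R with a matrix in which each row is a pixel and each column is the prediction of one run. The predictions below are simulated and the object names are hypothetical.

```r
set.seed(1)
n_pix  <- 6
n_runs <- 100

# Regression: rows are pixels, columns are the predictions of each run
ph_pred <- matrix(rnorm(n_pix * n_runs, mean = 5.5, sd = 0.3), n_pix, n_runs)
ph_mean <- rowMeans(ph_pred)                                   # final map
ph_q05  <- apply(ph_pred, 1, quantile, probs = 0.05)           # 5% quantile
ph_q95  <- apply(ph_pred, 1, quantile, probs = 0.95)           # 95% quantile
ph_cv   <- 100 * apply(ph_pred, 1, sd) / ph_mean               # coefficient of variation (%)

# Classification: most frequent class (mode) and number of distinct classes (variety)
classes <- c("Udult", "Ustult", "Udalf", "Other")
cl_pred <- matrix(sample(classes, n_pix * n_runs, replace = TRUE,
                         prob = c(0.6, 0.2, 0.15, 0.05)), n_pix, n_runs)
cl_mode    <- apply(cl_pred, 1, function(v) names(which.max(table(v))))
cl_variety <- apply(cl_pred, 1, function(v) length(unique(v)))

data.frame(ph_mean, ph_q05, ph_q95, ph_cv, cl_mode, cl_variety)
```

With raster stacks, the same per-pixel functions could be applied cell-wise, for instance through the app function of the terra package.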
The importance of each predictor in model prediction is crucial for model
interpretability and map validation. A common way to evaluate predictor importance is to
plot the rank of importance, calculated as described previously. Other techniques can
also be used, such as frequency of selection, spatial dependence plots, and Shapley
value distributions (Wadoux et al., 2023).

5.1 An Example of Model Fitting and Prediction

Using the selected predictors and our data partition and tuning approaches, we
fitted random forest models, tuning their mtry hyperparameter to predict soil water
pH (minimizing MAE) and soil classes (maximizing Kappa). The performance
metrics are shown in Figs. 5 and 6. The random forest models performed substantially
better than the null model (mean value) and, for both classification and regression,
had a good predictive performance relative to the values commonly reported in DSM
studies (Gomes et al., 2019; Hengl et al., 2017; Ballabio et
al., 2016; Demattê et al., 2016; Carvalho Junior et al., 2022; Wadoux et al., 2023).

Fig. 5 Random Forest and null model performances in 100 runs for soil water pH. ρc is Lin’s
concordance correlation coefficient
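Lin's concordance correlation coefficient reported in Fig. 5 can be computed with the short base-R function below. The function name ccc and the example values are hypothetical, and the variances use the 1/n convention.

```r
# Lin's concordance correlation coefficient (variances with denominator n)
ccc <- function(obs, pred) {
  mo <- mean(obs)
  mp <- mean(pred)
  vo <- mean((obs - mo)^2)
  vp <- mean((pred - mp)^2)
  cov_op <- mean((obs - mo) * (pred - mp))
  2 * cov_op / (vo + vp + (mo - mp)^2)
}

obs <- c(4.8, 5.2, 5.9, 6.1, 5.5)
ccc(obs, obs)           # perfect agreement gives 1
ccc(obs, obs + 0.5)     # a constant bias lowers the value below 1
cor(obs, obs + 0.5)     # while Pearson's r stays at 1
```

Unlike Pearson's correlation, the coefficient penalizes systematic bias, which is why it is useful for judging predictions against the 1:1 line.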

The final and uncertainty maps for the regression and classification are shown
in Figs. 7 and 8, respectively. Maps of soil classes and soil-water pH showed good
pedological consistency. The maps also presented low uncertainties, as expressed

Fig. 6 Random Forest performances in 100 runs for soil classes. F1 is the F1 score

Fig. 7 Maps of the 5% quantile (a), the 95% quantile (b), the mean (c), and the coefficient of
variation (d) for soil water-pH

Fig. 8 Maps of mode (a) and variety (b) for soil classes

Fig. 9 Predictors’ importance for random forest in the prediction of soil water-pH. Red numbers
indicate the frequency of selection for each variable in the 100 runs

by the low values of the coefficient of variation and variety, as well as the small spatial
differences between the 5% and 95% quantile maps.
The importance of the predictors is shown in Figs. 9, 10, 11, 12, and 13. The
most important and most frequently selected predictors for both soil classes and soil
water-pH were distributed across all considered SCORPAN factors, which indicates a good
pattern recognition of pedogenetic processes and soil-landscape interplays. R examples for
all the analyses in this section are available at:
https://github.com/moquedace/pedometrics_3.1_rfe_model_class,
https://github.com/moquedace/pedometrics_3.2_rfe_model_reg,
https://github.com/moquedace/pedometrics_5.1_importance_class,
https://github.com/moquedace/pedometrics_5.2_importance_reg,
https://github.com/moquedace/pedometrics_4.1_predict_class,
and https://github.com/moquedace/pedometrics_4.2_predict_reg.

Fig. 10 Predictors’ importance for random forest in the prediction of the Udalf soil class. Red
numbers indicate the frequency of selection for each variable in the 100 runs

6 Conclusions

This chapter presented the steps required to carry out DSM in R. Several aspects
were discussed and illustrated with vignettes, allowing readers to try them and
adapt them to their own datasets. This workflow can help guide efforts to build
DSM protocols for PronaSolos, a Brazilian program for detailed soil
surveying at the national scale.

Fig. 11 Predictors’ importance for random forest in the prediction of the Udult soil class. Red
numbers indicate the frequency of selection for each variable in the 100 runs

Some highlights presented here, suggested for adoption as routine steps in
DSM projects, are: (1) data curation, with proper management of rare classes and
categorical predictors; (2) sequential selection of predictors, considering
variance, redundancy, and importance-based methods; (3) proper data partitioning, with a
set of samples held out to test models externally, beyond cross-validation alone; and
(4) multiple runs (we suggest 100) to quantify the uncertainty of model performance and
of the generated maps.
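
The routine steps listed above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the chapter's workflow: the covariates are simulated and their names are hypothetical, the 0.95 correlation threshold and the 25% external test fraction are arbitrary choices, a linear model stands in for random forest, and the importance-based selection step is omitted to keep the example free of package dependencies.

```r
set.seed(1)
n <- 200

# Hypothetical covariates (names and values are illustrative only).
covs <- data.frame(
  elevation = rnorm(n, 500, 100),
  slope     = runif(n, 0, 30),
  constant  = rep(1, n)                                  # zero variance
)
covs$elevation_copy <- covs$elevation + rnorm(n, 0, 1)   # nearly redundant
ph <- 5 + 0.002 * covs$elevation - 0.01 * covs$slope + rnorm(n, 0, 0.2)

# Selection by variance: drop predictors with (near-)zero variance.
keep <- names(covs)[sapply(covs, var) > 1e-8]

# Selection by redundancy: of each highly correlated pair, keep the first.
cor_mat <- abs(cor(covs[keep]))
drop <- character(0)
for (j in seq_along(keep)[-1]) {
  earlier <- setdiff(keep[seq_len(j - 1)], drop)
  if (any(cor_mat[keep[j], earlier] > 0.95)) drop <- c(drop, keep[j])
}
selected <- setdiff(keep, drop)

# Repeated partitions: hold out an external test set in every run.
n_runs <- 100
rmse <- numeric(n_runs)
dat <- data.frame(ph = ph, covs[selected])
for (i in seq_len(n_runs)) {
  test_id <- sample(n, size = round(0.25 * n))   # 25% held out (assumed fraction)
  fit <- lm(ph ~ ., data = dat[-test_id, ])      # stand-in for random forest
  pred <- predict(fit, newdata = dat[test_id, ])
  rmse[i] <- sqrt(mean((dat$ph[test_id] - pred)^2))
}

# Spread of external-test performance across the runs.
print(selected)
print(quantile(rmse, probs = c(0.05, 0.5, 0.95)))
```

Reporting the 5% and 95% quantiles of the test error, rather than a single value, is what the repeated runs make possible: the spread shows how sensitive the performance estimate is to the particular partition.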

Fig. 12 Predictors’ importance for random forest in the prediction of the Ustult soil class. Red
numbers indicate the frequency of selection for each variable in the 100 runs

Fig. 13 Predictors’ importance for random forest in the prediction of the “Other” soil class. Red
numbers indicate the frequency of selection for each variable in the 100 runs

References

Ballabio, C., Panagos, P., & Monatanarella, L. (2016). Mapping topsoil physical properties at
European scale using the LUCAS database. Geoderma, 261, 110–123. https://doi.org/10.1016/
j.geoderma.2015.07.006
Brenning, A., Bangs, D., & Becker, M. (2018). RSAGA: SAGA Geoprocessing and Terrain Analysis
(1.3.0). https://cran.r-project.org/package=RSAGA
Brus, D. J. (2019). Sampling for digital soil mapping: A tutorial supported by R scripts. Geoderma,
338, 464–480. https://doi.org/10.1016/j.geoderma.2018.07.036
Chen, S., Xu, H., Xu, D., Ji, W., Li, S., Yang, M., Hu, B., Zhou, Y., Wang, N., Arrouays, D., &
Shi, Z. (2021). Evaluating validation strategies on the performance of soil property prediction
from regional to continental spectral data. Geoderma, 400, 115159. https://doi.org/10.1016/
J.GEODERMA.2021.115159
Cochrane, T. T., & Cochrane, T. A. (2006). Diversity of the land resources in the Amazonian
state of Rondônia, Brazil. Acta Amazonica, 36(1), 91–102. https://doi.org/10.1590/s0044-
59672006000100011
de Carvalho Junior, W., Calderano Filho, B., Bhering, S. B., da S. Chagas, C., Vasques, G. M.,
Pereira, N. R., de Macedo, J. R., & Dart, R. D. O. (2022). Desenho amostral de solos na
presença de covariáveis e cLHS para execução do mapa de solos de Mato Grosso do Sul. Rio
de Janeiro: Embrapa Solos. 9 p. (Embrapa Solos. Comunicado técnico, 80).
Demattê, J. A. M., Alves, M. R., da Terra, F. S., Bosquilia, R. W. D., Fongaro, C. T., & da
Barros, P. P. S. (2016). Is it possible to classify topsoil texture using a sensor located 800
km away from the surface? Revista Brasileira de Ciência do Solo, 40. https://doi.org/10.1590/
18069657rbcs20150335
dos Santos, H. G., Hochmüller, D. P., Cavalcanti, A. C., Rêgo, R. S., Ker, J. C., Panoso,
L. A., & do Amaral, J. A. M. (1995). Procedimentos normativos de levantamentos
pedológicos. EMBRAPA-SPI; EMBRAPA-CNPS. 108 p. Disponível em: http://
ainfo.cnptia.embrapa.br/digital/bitstream/item/149478/1/CNPSProcedimentos-normativos-
evantamentospedologicos1995.pdf. Acesso em: 17 nov. 2022.
Fick, S. E., & Hijmans, R. J. (2017). WorldClim 2: New 1-km spatial resolution climate surfaces
for global land areas. International Journal of Climatology, 37(12), 4302–4315. https://doi.org/
10.1002/joc.5086
Gomes, L. C., Faria, R. M., de Souza, E., Veloso, G. V., Schaefer, C. E. G., & Fernandes Filho, E. I.
(2019). Modelling and mapping soil organic carbon stocks in Brazil. Geoderma, 340, 337–350.
https://doi.org/10.1016/j.geoderma.2019.01.007
Hengl, T., de Jesus, J. M., Heuvelink, G. B. M., Gonzalez, M. R., Kilibarda, M., Blagotić, A.,
Shangguan, W., Wright, M. N., Geng, X., Bauer-Marschallinger, B., Guevara, M. A., Vargas,
R., MacMillan, R. A., Batjes, N. H., Leenaars, J. G. B., Ribeiro, E., Wheeler, I., Mantel, S.,
& Kempen, B. (2017). SoilGrids250m: Global gridded soil information based on machine
learning. PLoS One, 12(2), e0169748. https://doi.org/10.1371/journal.pone.0169748
Hijmans, R. J. (2022). terra: Spatial Data Analysis (1.5-23). https://rspatial.org/terra/
Kuhn, M. (2019). caret: Classification and regression training [Computer software manual]. http:/
/topepo.github.io/caret/index.html
Kuhn, M., & Johnson, K. (2013). Applied predictive modeling. In Applied predictive modeling.
Springer New York. https://doi.org/10.1007/978-1-4614-6849-3
Kuhn, M., et al. (2023). The caret package. R Foundation for Statistical Computing, Vienna,
Austria. https://cran.r-project.org/package=caret
McBratney, A. B., Mendonça Santos, M. L., & Minasny, B. (2003). On digital soil mapping.
Geoderma, 117(1–2), 3–52. https://doi.org/10.1016/S0016-7061(03)00223-4
Minasny, B., & Mcbratney, A. B. (2006). A conditioned Latin hypercube method for sampling in
the presence of ancillary information. Computers & Geosciences, 32(9), 1378–1388. https://
doi.org/10.1016/j.cageo.2005.12.009
Methods and Challenges in Digital Soil Mapping: Applied Modelling with R Examples 283

Minasny, B., & Mcbratney, A. B. (2007). Latin hypercube sampling as a tool for digital soil
mapping. In P. Lagacherie, A. B. Mcbratney, & M. Voltz (Eds.), Digital soil mapping: An
introductory perspective (Vol. 12, pp. 153–165). Elsevier. (Developments in soil science, 31).
https://doi.org/10.1016/S0166-2481(06)31012-4
NASA JPL. (2020). NASADEM Merged DEM Global 1 arc second V001. In NASA EOSDIS
Land Processes DAAC. NASA EOSDIS Land Processes DAAC. https://doi.org/10.5067/
MEaSUREs/NASADEM/NASADEM_HGT.001
O’Brien, L. (2022). mpspline2: Mass-preserving spline functions for soil data (0.1.6). https://
cran.r-project.org/package=mpspline2
Soil Taxonomy. (1999). A basic system of soil classification for making and interpreting soil
surveys. Agriculture Handbook, 436, 96–105.
Wadoux, A. M. J., Saby, N. P., & Martin, M. P. (2023). Shapley values reveal the drivers of soil
organic carbon stock prediction. The Soil, 9(1), 21–38. https://doi.org/10.5194/soil-9-21-2023
Webster, R., & Oliver, M. A. (1990). Statistical methods in soil and land resource survey (p. 316).
Oxford University Press.
Index

A
Aluminium (Al), 161, 162, 165–167, 187–189
Archaeological record, 63, 72

B
Bases content, 190
Blue carbon, 245

C
Clay, 15, 23, 25, 26, 32, 33, 36, 37, 39, 51, 52, 59, 66, 68–71, 78, 79, 81, 82, 84–88, 105, 110, 122, 130, 132–134, 136, 138, 142, 144, 149, 151, 166, 173, 175, 179, 181, 186, 188, 189–192, 198, 213, 220, 230, 252–255
Coastlines, 226
Creosote, 78, 88

D
Database management system (DBMS), 1, 2
Digital soil mapping (DSM), 15–17, 32, 115–126, 130, 131, 137–139, 159, 164, 235–243, 246, 263–281
Digital terrain model (DTM), 235, 236, 238

E
Electrical conductivity (EC), 75–89, 198–204, 208, 227
Erosion, 20, 24, 33, 67, 94, 99, 137, 159, 197, 226
Expert Knowledge (EK), 131, 141, 142
Explainability, 57, 131, 139, 148, 154, 156, 157

F
Ferralsols, 94, 118–122, 160, 167, 213, 237, 241, 243

G
Geoprocessing, 263
Geostatistics, 64, 69, 130

I
Interpretability, 131, 139, 148, 154, 156, 157, 263, 270, 272
Inverse distance weighted (IDW), 81, 87, 88

K
Kriging, 66, 68, 69, 71, 130, 131, 134, 136, 139–141, 143–147, 154, 155, 157, 177, 203, 246, 249, 252, 253, 257, 264, 268

M
Machine learning (ML), 16, 20, 22, 24, 31–43, 48, 50, 51, 58, 59, 105, 108, 130, 131, 137, 143, 159, 164, 200, 201, 206, 235–237, 263, 264, 274
Macroporosity, 95, 98
Marajó Island, 225–232
Mato Grosso do Sul, 116
Microporosity, 80, 83, 95, 98
Mountainous areas, 102, 236, 241
Multiscale, 16, 17, 19–22, 24, 25
Munsell Soil Color Charts (MSCC), 5, 211, 221–223

P
Parnaíba Delta, 246
Pedometrics, 76, 172, 181, 235–243, 263, 266, 269, 273, 277
Pedometry, 16, 172, 173, 182
Pedotransfer functions (PTFs), 31, 47–60, 130, 177, 178
Penetrometric, 85
Planosols, 119, 120, 122, 187, 192
Plausibility, 131, 148, 154, 156, 157
Pore connectivity density, 95, 96, 99
PostgreSQL, 2, 4
Prediction methods, 253, 258

R
Rain forest, 225
Random Forest (RF), 17, 22–23, 36–39, 42, 50, 53, 54, 56–58, 105, 131, 139, 141, 143, 147–156, 197–208, 236, 237, 241, 243, 264, 270, 273, 275–281
Real time project management, 3, 7, 172
Recursive Feature Elimination (RFE), 34, 40, 53, 54, 131, 138, 142, 148–149, 153–157, 164, 272–274
Reference area, 102, 103, 105, 112
Remote sensing, 39, 198, 200, 206, 225, 229, 231, 232, 236, 245–258
RGB, 212, 214, 215, 218–221, 223

S
Salt-affected soils, 199, 202, 205, 225, 228, 230
Sample design, 116, 265
Sand, 22–26, 32, 33, 36, 37, 39, 40, 41, 43, 51, 52, 66, 68–71, 79, 82, 84, 102, 123, 130–134, 136, 142, 149, 151, 153, 173, 188–190, 192, 194, 213, 218, 220
Semivariogram, 66, 68, 69, 71, 130, 139–141, 145, 146
Sentinel-1, 200, 201, 203, 204, 206–208
Silt, 22–26, 32, 33, 36, 37, 39–42, 51, 52, 54, 68–71, 79, 82, 84, 130, 132–134, 136, 142, 149, 151–153, 213, 253
Similarity index, 105, 112
Soil
  database, 48–49, 53, 116, 130–132
  data integration, 102
  hydrology, 2, 59, 162
  magnetism, 159
  physics, 16
  salinization, 198–201
  science, 1–13, 15, 32, 64, 101, 172, 235, 263
  sensing, 181, 225–232
  survey, 3, 5, 15, 19, 22, 32, 48, 64, 101–105, 107, 109, 111, 112, 115, 116, 133, 159, 171, 175, 227, 235, 236, 243, 247, 264
Soil survey and mapping, 1–3, 16, 19, 50, 171–182, 200, 225–232, 236, 242
Spectroscopy, 76, 171–182, 200
Stratigraphy, 23, 81, 231
Surface geophysics, 135, 138, 197
Sustainable Development Goals (SDGs), 225

T
Tropical
  forest, 174
  soil, 173, 212

V
Vegetation, 2, 18, 24, 33, 67, 69, 94, 99, 116, 117, 122, 125, 129, 134, 163, 175, 199, 200, 205, 226–229, 231, 236, 245–254, 256–258, 264

© The Editor(s) (if applicable) and The Author(s), under exclusive license to
Springer Nature Switzerland AG 2024
W. de Carvalho Junior et al. (eds.), Pedometrics in Brazil, Progress in Soil Science,
https://doi.org/10.1007/978-3-031-64579-2
