
Preprint · March 2024
DOI: 10.13140/RG.2.2.32819.64805
Available at: https://www.researchgate.net/publication/378858285


Infant Mortality in Brazil: A Survival
Analysis using Machine Learning Models
Leonardo Matsuno da Frota, Marcos Hasegawa, Paulo Jacinto
March 10, 2024

Abstract
The persistence of infant mortality in middle-income countries like
Brazil is a critical health challenge of the 21st century. Health care poli-
cymakers increasingly use statistical methods such as survival analysis
to identify factors associated with mortality rates. A common choice in
survival analysis is the Cox proportional hazards model. It is argued
that in the presence of non-proportional hazards, the Cox model has
limitations. Machine learning models are efficient at prediction and are
a methodological alternative to models with the proportional hazards
assumption. Using 2.9 million observations from the Brazilian Unified
Health System (SUS) 2017 data, we estimated a set of different machine
learning models (Survival Support Vector Machines, Random Survival
Forest, and Extreme Gradient Boosting) to predict which infants have
the highest risk of not surviving the first year of life. We found that by
the concordance index measure, the Survival Support Vector Machines
(c-index: 0.84), the Extreme Gradient Boosting (c-index: 0.83), and
the Random Survival Forest (c-index: 0.81) models can generate very
accurate mortality predictions. However, the Cox model also achieves
accurate mortality predictions (c-index: 0.83) despite the presence
of non-proportional hazards. The SHAP framework of interpretable
machine learning was used to identify factors affecting Brazil’s infant
mortality rates. Factors such as cesarean sections and gestational weeks
affect mortality nonlinearly, and mean variable effects such as those
found in standard regression models can be misleading. Finally, we
argue that interpretable Machine Learning models can support poli-
cymakers in designing health frameworks that tackle the challenge of
infant mortality in middle-income countries.
Key Words: Brazil; Newborns health; Infant mortality; Survival analysis;
Machine learning; Random survival forest.

1 Introduction
The deaths of children are particularly tragic events, as they are early and,
in most cases, preventable deaths. Indeed, infant mortality is an impor-
tant indicator of a population's health and well-being. As such, one of the
United Nations Sustainable Development Goals is to reduce global newborn
and infant mortality rates (Assembly, 2015). Quality health care and bet-
ter socioeconomic conditions are instrumental for achieving this objective.
Furthermore, the emergence of data-driven health care can be an important
ally in supporting this goal by identifying risk factors and helping design
efficient policy frameworks (Grossglauser and Saner, 2014).
In Brazil, a middle income country, the infant mortality rate shows
a decreasing trend from 1980 (78.5 deaths per thousand live births) to
2015 (12.1 deaths per thousand live births), according to World Bank data.
Nevertheless, these rates are still high relative to developed economies
(less than 5 deaths per thousand) (Bank, 2021). This is an important reason
why research on infant mortality continues to be relevant in development
economics and health policy. The empirical literature in this field is vast,
but a particularly insightful method in this area is survival analysis.
Particularly in Brazil, there are several studies using survival analysis
that shed light on different perspectives of infant mortality. For instance,
a survival analysis in intensive care units identifies low birth weight (below
2500g) as a major risk factor for neonatal deaths (Risso and Nascimento,
2010). Other studies found higher mortality among children with low
birth weight (below 2500g), born in public hospitals, as well as among
mothers with less schooling and with insufficient prenatal visits (Cardoso
et al., 2013; Pinheiro et al., 2010; Garcia et al., 2019). Studies also discuss
health conditions such as sepsis and congenital heart diseases as major risk
factors for newborn survival (Lopes et al., 2018; Freitas et al., 2019). The most
comprehensive survival analysis study uses 17.6 million births between
2011 and 2018, identifying three newborn characteristics that drive infant
mortality: premature births (less than 37 gestational weeks), low weight, and
small for gestational age (babies below the 10th percentile weight for
their gestational weeks) (Paixao et al., 2021b). All these survival analysis
studies in Brazil use Cox regression models that rely on the proportional
hazards hypothesis - an assumption that is not warranted in every context
and should be tested (Royston and Parmar, 2014; Grambsch and Therneau,
1994).
Machine Learning (ML) algorithms are a modelling alternative that can
deal with non-proportional hazards. In the Netherlands, there is evidence that
interpretable machine learning can provide efficient predictions and iden-
tification of risk factors for cancer mortality (Moncada-Torres et al., 2021).
There is also evidence that Random Survival Forests can improve survival
predictions for patients with heart failure and cardiovascular diseases in
general (Miao et al., 2015, 2018). A study in Uganda uses a Random Survival
Forest modeling strategy to identify the determinants of infant mortality
and shows how the proportional hazards assumption diminishes the model's
robustness (Nasejje and Mwambi, 2017).
There are studies that use machine learning to discuss infant mortality
in Brazil, but they do not use survival analysis methods. A study uses
several ML algorithms and proposes a governance framework to identify
the determinants of infant mortality (Ramos et al., 2017). Another study
uses data available in the gestational period to argue that it is possible to
identify infants with high risk of mortality before birth (Valter et al., 2019). In
a small sample of 15 thousand births in Sao Paulo, there is also evidence that
interpretable machine learning can identify newborns at high risk of death
using public health databases (Beluzo et al., 2020). A more recent study
with 1.2 million births in Sao Paulo finds that the extreme gradient boosting
(XGBOOST) model has the best predictive performance in identifying infant
mortality (Batista et al., 2021).
On a methodological level, our contribution is to show the efficiency
of machine learning survival models in predicting which infants are at high
risk of death, but also the predictive efficiency of the Cox model even under
non-proportional hazards, and to show how interpretable machine learning
algorithms can assess non-linearities in the determinants of infant mortality.
We also contribute by using micro-data on 2.9 million births from the
Unified Health System (SUS); this is the first paper to use survival analysis
with machine learning to assess infant mortality in Brazilian micro-data.
Our main findings are that the Survival Support Vector Machines (SSVM),
Random Survival Forest (RSF) and Extreme Gradient Boosting (XGBOOST)
models achieve a strong predictive performance (concordance index > 0.8)
in identifying infants that died in the first year of life. However, only
the Survival Support Vector Machines and the Extreme Gradient Boosting
models beat the Cox regression benchmark performance. Furthermore, the
SHAP1 algorithm of interpretable machine learning shows non-linearities
in the relationship between features such as gestational weeks and c-section
and mortality that are not explicit in typical survival models. We also discuss
1 The SHAP algorithm uses the game-theoretical concept of Shapley values to explain the
predictions of machine learning models (Lundberg and Lee, 2017).

public policy implications and caveats in the development of predictive
frameworks that help predict and identify risk factors associated with infant
mortality.
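Since model comparison in this paper relies on the concordance index, a minimal sketch of how Harrell's c-index is computed may be useful; this is our own illustrative implementation, not the authors' code:

```python
import numpy as np

def concordance_index(time, event, risk):
    """Harrell's c-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when subject i experienced the event and
    was observed for a shorter time than subject j; the pair is concordant
    when the model assigns i a higher risk score."""
    time, event, risk = np.asarray(time), np.asarray(event), np.asarray(risk)
    concordant, comparable = 0.0, 0.0
    for i in range(len(time)):
        if not event[i]:
            continue  # only subjects with an observed event anchor pairs
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties in risk count as half
    return concordant / comparable

# A perfect ordering of risks (shortest survival = highest risk) gives 1.0.
print(concordance_index([1, 2, 3, 4], [1, 1, 1, 0], [4, 3, 2, 1]))  # 1.0
```

A c-index of 0.5 corresponds to random ranking and 1.0 to a perfect ordering of risks, which is why values above 0.8 indicate strong predictive performance.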

2 Data
We use two different datasets from the Brazilian Unified Health System
(SUS) in this research. The Live Birth Information System (SINASC) and
the Mortality Information System (SIM). The Live Birth Information System
(SINASC) collects and processes demographic and epidemiological
data on newborn and mother characteristics. It is structured
around the Live Birth Declaration (DN). The system is universal in the Brazil-
ian territory, and it is expected that the professionals working in the health
services or in the registry offices fill in the Live Birth Declaration (DN). The
Mortality Information System (SIM) is a system of national epidemiological
surveillance whose objective is to capture data on the country’s deaths to
provide information on mortality for all instances of the Brazilian health
system. It is structured around the declaration of death (DO).

Table 1: Original Variables Description


Variable Name Definition Type Source
DN Infant ID Numerical Livebirth Dataset (SINASC)
Birth Location Place of Birth Categorical Livebirth Dataset (SINASC)
Mother Age Mother's age in years Numerical Livebirth Dataset (SINASC)
Marital Status Marital Status Categorical Livebirth Dataset (SINASC)
Schooling Mother's Education in levels Categorical Livebirth Dataset (SINASC)
Live children Number of living children Numerical Livebirth Dataset (SINASC)
Number of dead children Number of dead children Numerical Livebirth Dataset (SINASC)
Gestational Weeks Gestational Weeks Numerical Livebirth Dataset (SINASC)
Parity Type of Pregnancy (Unique; Double; Triple) Categorical Livebirth Dataset (SINASC)
C Section Type of delivery: Vaginal or C-Section Categorical Livebirth Dataset (SINASC)
Prenatal Visits Number of pre-natal care visits Numerical Livebirth Dataset (SINASC)
Sex Infant Sex Categorical Livebirth Dataset (SINASC)
APGAR1 1st minute APGAR Numerical Livebirth Dataset (SINASC)
APGAR5 5th minute APGAR Numerical Livebirth Dataset (SINASC)
Race Race/Ethnicity Categorical Livebirth Dataset (SINASC)
Birth Weight Birth weight in grams Numerical Livebirth Dataset (SINASC)
Genetic Anomaly Genetic Anomaly Categorical Livebirth Dataset (SINASC)
Previous Gestations Number of Previous Gestations Numerical Livebirth Dataset (SINASC)
Vaginal births Number of Vaginal Births Numerical Livebirth Dataset (SINASC)
Cesarean births Number of c-sections Numerical Livebirth Dataset (SINASC)
Induced labor Induced Labor Categorical Livebirth Dataset (SINASC)
Fetus Presentation Fetus position before Labor Categorical Livebirth Dataset (SINASC)
C section before start C Section began before labor Categorical Livebirth Dataset (SINASC)
Death Death in the first year of life Binary Mortality Dataset (SIM)
Birth Date Date of Infant's Birth Calendar Livebirth Dataset (SINASC)
Date of Death Date of Infant's Death Calendar Mortality Dataset (SIM)
Source: Prepared by the authors using Unified Health System (SUS) data.
Notes: The table describes the variable name that we adopted in our estimations, the definition of each variable based on SUS
data, the variable type and the corresponding data source from the SUS.

Our sample is created by matching information from the two datasets
described above, using 2017 data. Each birth has an associated number in
the Live Birth Information System - the "DN" code. If the infant was born
and died in the same year, it will have both a DN and a DO code. Then, to
obtain infant mortality, we match the Mortality Information System with
the Live Birth Information System using the "DN" and "DO" codes. In this
way, we have a dataset containing newborn and mother characteristics and
information regarding infant mortality in the first year of life - in this
particular case, information from 2017. Table 1 shows the set of variables
that will be used in the statistical analysis and their respective definitions
in the original datasets.
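The matching step described above can be sketched in pandas. The column names (DN, birth_date, death_date) and the toy records are our own illustrative assumptions, not the actual SINASC/SIM layouts:

```python
import pandas as pd

# Hypothetical toy versions of the SINASC (births) and SIM (deaths) datasets.
sinasc = pd.DataFrame({
    "DN": [1, 2, 3],
    "birth_date": pd.to_datetime(["2017-01-10", "2017-03-05", "2017-06-20"]),
})
sim = pd.DataFrame({
    "DN": [2],
    "death_date": pd.to_datetime(["2017-04-01"]),
})

# Left-merge on the DN code: births without a matching death record
# correspond to infants who survived (censored observations).
df = sinasc.merge(sim, on="DN", how="left")
df["death"] = df["death_date"].notna().astype(int)

# Time-to-event in days, censoring survivors at the first year of life.
df["duration"] = (df["death_date"] - df["birth_date"]).dt.days
df["duration"] = df["duration"].fillna(365).clip(upper=365)
print(df[["DN", "death", "duration"]])
```

Births with no matching SIM record are censored at 365 days, which yields exactly the time-to-event structure the survival models below require.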
An essential characteristic of our dataset is that we have the exact date of
birth for all births and the death dates of the infants that did not
survive the first year. These two variables allow us to create time-to-event
indicators necessary to the proper usage of survival analysis models - our
modeling choice to assess infant mortality in Brazil.
Since a substantial part of our original data is categorical, we perform
several variable transformations to prepare the original data (table 1) for the
statistical estimations. We also remove null observations and observations
with infinite numeric values from the dataset due to measurement error. The
transformed variables used in the models are shown in table 2, which de-
scribes the set of numeric variables and their summary statistics for live and
dead infants, and in table 3, which describes the distribution of births and
deaths by each dummy variable.

Table 2: Summary Statistics

Variable Name        Alive           Dead
Gestational Weeks    38.56 (2.03)    31.92 (6.19)
Mother Age           26.74 (6.69)    26.66 (7.32)
Prenatal Visits      8.03 (2.75)     5.91 (2.99)
Observations         2646432         20310

Source: Prepared by the authors using Unified Health System (SUS) data.
Note: The table reports the mean and standard deviation (in parentheses)
for the numeric variables used in the models. Columns divide the sample
between infants who survived and did not survive the first year of life.

We create dummies for Genetic Anomaly (1 for genetic anomaly and zero
otherwise), Birth Place (1 for hospital birth and zero otherwise), Border (1
for border municipality birth and zero otherwise), Capital (1 for birth in
a state capital and zero otherwise), Birth Type (1 for c-section and zero for
vaginal birth), Low APGAR5 2 (1 for low APGAR5 score and zero otherwise),
Low Birth Weight (1 for low birth weight and zero otherwise), Mother Marital
Status (1 for married and zero otherwise), Premature (1 for premature birth
and zero otherwise), Mother Race (1 for white mother and zero otherwise),
Schooling (1 if the mother went to college and zero if not), and Baby Gender
(1 for male infant and zero for female).
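These transformations can be sketched in pandas. The cutoffs for prematurity (below 37 weeks) and low birth weight (below 2500g) follow the definitions cited earlier in the text; the APGAR5 cutoff of 7 and the column names are our assumptions, since the text does not state them:

```python
import pandas as pd

# Toy births table with hypothetical column names.
births = pd.DataFrame({
    "gestational_weeks": [40, 34, 39],
    "birth_weight": [3200, 2200, 3500],   # grams
    "apgar5": [9, 6, 10],
    "delivery": ["vaginal", "c-section", "c-section"],
})

# Dummy encodings: <37 weeks premature, <2500 g low birth weight (from
# the text); APGAR5 < 7 as "low" is our assumed clinical cutoff.
births["premature"] = (births["gestational_weeks"] < 37).astype(int)
births["low_birth_weight"] = (births["birth_weight"] < 2500).astype(int)
births["low_apgar5"] = (births["apgar5"] < 7).astype(int)
births["c_section"] = (births["delivery"] == "c-section").astype(int)
```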
After the data transformations, our sample has 2.64 million births and
20310 deaths. The proportion of dead infants is less than 1% of the total
births. There are differences in the proportions between live and dead infants
that shed light on the risk factors for infant mortality. For instance, 14.91%
of newborns with genetic anomalies do not survive, 28.5% of low APGAR5,
and 6.39% of low weight babies also die during the first year of life. There are
also differences in the means of numeric variables that indicate risk factors.
The mean of prenatal visits is 8.03 for newborns who survived the first year
and 5.91 for newborns who did not. Finally, the mean of gestational weeks
is almost seven weeks lower for dead infants.
2 The APGAR score is a test performed on newborns shortly after birth that assesses their
general state and vitality. The assessment is done in the first minute after birth and repeated
5 minutes after delivery, taking into account baby characteristics such as heartbeat,
color, breathing, and natural reflexes (Casey et al., 2001).

Table 3: The distribution of births and deaths by features

Variable Name                        Alive N (%)        Dead N (%)       Total
Genetic Anomaly
  Genetic Anomaly                    19828 (85.09)      3474 (14.91)     23302
  Non-Genetic Anomaly                2626604 (99.36)    16836 (0.64)     2643440
Birth Place
  Hospital Birth                     2623928 (99.24)    20090 (0.76)     2644018
  Non-Hospital Birth                 22504 (99.03)      220 (0.97)       22724
Border
  Border Municipality Birth          159978 (99.12)     1420 (0.88)      161398
  Otherwise                          2486454 (99.25)    18890 (0.75)     2505344
Capital
  State Capital Municipality Birth   636116 (99.24)     4890 (0.76)      641006
  Otherwise                          2010316 (99.24)    15420 (0.76)     2025736
Birth Type
  C-Section                          1504026 (99.29)    10737 (0.71)     1514763
  Vaginal Birth                      1142406 (99.17)    9573 (0.83)      1151979
APGAR5
  Low APGAR5                         19070 (71.50)      7600 (28.50)     26670
  Normal APGAR5                      2627362 (99.53)    12710 (0.48)     2640072
Birth Weight
  Low Weight                         205584 (93.61)     14024 (6.39)     219608
  Normal Weight                      2440848 (99.74)    6286 (0.26)      2447134
Mother Marital Status
  Married                            896252 (99.36)     5781 (0.64)      902033
  Non-Married                        1750180 (99.18)    14529 (0.82)     1764709
Premature
  Non-premature Birth                2524928 (99.69)    7766 (0.31)      2532694
  Premature Birth                    121504 (90.64)     12544 (9.36)     134048
Mother Race
  White Mother                       1677205 (99.20)    13594 (0.80)     1690799
  Otherwise                          969227 (99.31)     6716 (0.69)      975943
Schooling
  Mother went to College             549756 (99.39)     3368 (0.61)      553124
  Did not go to College              2096676 (99.20)    16942 (0.80)     2113618
Baby Gender
  Female                             1355840 (99.19)    11117 (0.81)     1366957
  Male                               1290592 (99.29)    9193 (0.71)      1299785
Observations                         2646432 (99.24)    20310 (0.76)     2666742

Source: Prepared by the authors using Unified Health System (SUS) data.
Notes: The table describes the features' summary statistics for infants that survived
or did not survive the first year of life.

3 Infant Mortality in Brazil
Since the universalization of healthcare through the Unified Health
System (SUS) in 1988, Brazil has made an effort to expand health services
to its population (Paim et al., 2011). The nineties were a period of severe
economic stress for Brazil, and the main public policy focus was the end of
hyperinflation and macroeconomic stabilization. It was not until the 2000s
that the economic resources to support large-scale social policies became
more available. Indeed, an important driver of infant mortality reduction in
the 21st century was the combination of family health expansion policies
together with conditional cash transfer programs (Guanais, 2015; Russo
et al., 2019).
In Brazil, the improvement in social indicators after the 1988 Constitu-
tion can be seen in many dimensions, but particularly in healthcare (Viellas
et al., 2014). To get a better perspective on these improvements, figure 1
shows trends in global infant mortality rates using World Bank’s infant mor-
tality indicator - specifically, the mortality rate per 1000 live births (Bank,
2021). Following the worldwide pattern of infant mortality reduction, Brazil
has reduced its mortality from 47.1 deaths per 1000 births in 1990 to 13.5 in
2015, substantially reducing its gap from the European and North American
averages.
From the previous discussion, we have seen that Brazil has made substan-
tial progress in reducing infant mortality. Still, there is substantial inequality
between states, as can be seen in figure 2, which uses regional infant mortal-
ity data adapted from Szwarcwald et al. (2020). The worst state, Amapa
(located in the northern region), has a 20.8 mortality rate. The best state,
Santa Catarina (southern region), has a 9.9 rate. Overall, there is a pattern of
richer states in the south and southeast having statistics similar to developed
countries. In contrast, poorer states in the north and northeast regions are
still far from these objectives.

Figure 1: Worldwide Trends in Infant Mortality

Source: Prepared by the authors based on World Bank data.


Notes: The figure represents the temporal trends of infant mortality in different regions of the world.

Finally, using data from the livebirth (SINASC) and mortality systems
(SIM), figure 3 shows the mortality rate (below one year) per 1000 births in
all Brazilian municipalities. We can see that there is still a lot of variation
across municipalities, even within states. The Brazilian Unified Health
System (SUS) has a decentralized institutional framework where municipali-
ties share the responsibility and expenses of providing health services with
the federal government. In this way, municipalities with better socioeco-
nomic conditions - such as income, educational attainment, and piped water
provision - in general, do better in terms of health outcomes (Bugelli et al.,
2021; Gamper-Rabindran et al., 2010).

Figure 2: Infant Mortality in Brazilian States

Source: Prepared by the authors based on Szwarcwald et al. (2020) data.


Notes: The figure shows temporal trends of infant mortality in different States of Brazil.

4 Method
4.1 Conceptual Framework
Infant mortality is a complex problem that is better understood using
a multifaceted framework. That is the critical point of Mosley and Chen's
(1984) seminal work on infant mortality in developing countries. It defined
an analytical framework integrating social science and medical science
approaches to understanding child survival that influenced a vast number of
subsequent studies (Abate et al., 2020).

Figure 3: Infant Mortality in Brazilian Municipalities

Source: Prepared by the authors using Unified Health System (SUS) data.
Notes: The figure shows the map of infant mortality in the municipalities of Brazil. In the color scale, red represents high
infant mortality and purple represents low infant mortality, for the year 2019.
In Mosley and Chen's framework, infant mortality is understood as
being affected by socioeconomic, biological, and healthcare variables. These
factors are organized in a hierarchical structure where different variables
can be modeled and labeled according to their importance to the dependent
variable (Mosley and Chen, 1984). For instance, socioeconomic variables
affect infant mortality mediated by the newborn biological characteristics -
which are more important variables.
Studies that use this framework to understand the determinants of in-
fant mortality in Brazil must adapt it to the particularities of the country´s
data sources. Therefore, studies in the Brazilian context typically model
infant mortality as being influenced by three sets of factors: distal, inter-
mediate, and proximal (Garcia et al., 2019; de Souza et al., 2019). Figure 4
is a schematic representation. Distal factors are socioeconomic and demo-
graphic variables. Intermediate factors are maternal and reproductive
variables. Proximal factors are newborn characteristics. Therefore, the
pathway through which socioeconomic variables affect infant mortality is
through maternal and newborn characteristics.

Figure 4: Mosley and Chen Theoretical Framework

Source: Prepared by the authors based on Mosley and Chen (1984).

Notes: The figure is a simplified schematic representation of Mosley and
Chen's seminal infant mortality framework. Distal determinants affect mor-
tality through their influence on intermediate and proximate ones.

4.2 Empirical Strategy

Our empirical strategy is to use two different survival analysis approaches to
understand the determinants of infant mortality in Brazil. The first approach
is to use the Cox regression model - the classical model for survival analysis
in statistical inference. The second approach is to use different machine
learning models to estimate the survival function (Moncada-Torres et al.,
2021; Chmiel et al., 2021). The usage of statistical inference methods such
as the Cox regression model along with machine learning methods aims to
understand the research problem at hand - infant mortality - through the
two main approaches of statistical modelling.3

4.2.1 Cox Regression Model


The most used model in survival analysis is the Cox regression model (Cox,
1972). The key feature of the Cox model is that it assumes that hazards are
proportional. Based on this proportionality, it is possible to estimate the
effects of covariates on the survival probability without any assumptions
regarding the distribution of survival time. No statistical distribution is
assumed for the hazard function, only that the covariates act multiplicatively
on the hazard.
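To make this concrete, the following is a minimal numpy sketch of the negative log partial likelihood that Cox estimation minimizes; this is our own illustration (Breslow convention, ignoring tied event times), not the authors' implementation:

```python
import numpy as np

def neg_log_partial_likelihood(beta, X, time, event):
    """Cox negative log partial likelihood (Breslow convention, no ties).

    For each observed event, the contribution is the subject's linear
    predictor minus the log-sum of exp(linear predictor) over the risk
    set (everyone still event-free at that time)."""
    risk = X @ beta
    order = np.argsort(-time)              # sort subjects by descending time
    risk, event = risk[order], np.asarray(event)[order]
    # After sorting, the risk set of the subject at position k is
    # positions 0..k, so a running log-sum-exp gives the log denominator.
    log_denominator = np.logaddexp.accumulate(risk)
    return -np.sum((risk - log_denominator)[event == 1])
```

Minimizing this function over beta with a generic numerical optimizer recovers the Cox coefficient estimates; no distribution is assumed for the baseline hazard, in line with the discussion above.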
It is important to assess the suitability of the Cox regression model to the
particular modeling problem and dataset at hand - that is, to assess whether
the covariates and data being used in the model are in accordance with the
proportional hazards assumption. The violation of this basic assumption
can lead to inconsistencies in the estimation of the model coefficients
(O'neill, 1986). Model evaluation techniques are based on the Cox regres-
sion residuals, and Schoenfeld residuals are the most common technique
(Schoenfeld, 1982). The hypothesis test then checks the correlation co-
efficient between the standardized Schoenfeld residuals and a function of
time for each covariate. Correlations close to zero provide evidence in favor
of the assumption of proportional hazards.
As discussed above, the Cox regression model relies on the assumption
of proportional hazards, which in many applications is not reasonable. In
addition, there might be non-linearities in the relationship between infant
mortality and the covariates that are not straightforward to incorporate
in parametric models - the usual alternative to the Cox regression model
framework. However, both these challenges can be solved using specific
machine learning models.
3 Breiman (2001) argues that there are two cultures in statistical modeling.
Roughly speaking, the first culture, called the data modeling culture, is the one that dom-
inates the statistical community because the main objective is to interpret the parameters
involved in the model; in particular, there is interest in hypothesis testing and confidence
intervals for these parameters. Under this approach, testing whether the model’s assump-
tions are valid is important. The focus is on inference rather than on prediction. The
second culture, called algorithmic modeling culture, is the one that dominates the machine
learning community. In this, the main objective is the prediction of new observations. It
is not assumed that the model used for the data is correct; it is only used to create good
algorithms to predict new observations well. There is often no explicit probabilistic model
behind the algorithms used.

4.2.2 Machine Learning
Statistical learning or machine learning refers to a set of prediction tools
designed to understand the available data (Friedman et al., 2001).4 The
function of machine learning algorithms is to discover the relationship
between the variables of a system, its inputs, and outputs, from sampled
data (Cherkassky and Mulier, 2007).5
There are usually two types of problems in this literature. The first is
supervised learning, where for every training data (or pattern) available,
there is a known correct answer. In this case, we say the data is labeled. The
second is unsupervised learning, where there is no desired output associated
with each pattern, so the data is unlabeled. In this scenario, we want the
model to capture, represent or express properties existing in the dataset. All
the models used in this research are supervised statistical learning models.
The first step in building good prediction functions to discover the un-
derlying data patterns is to create a criterion to measure the performance of
a given prediction function. This is typically done through the mean square
error in a regression context. When we measure the performance of an
estimator based on its quadratic error, creating a good prediction function is
equivalent to finding a good estimator for the regression function (Friedman
et al., 2001). Indeed, estimating the regression function is, in this sense, the
best way to create a function to predict new observations based on observed
covariates - i.e., to learn patterns from data.
Therefore, the purpose of regression methods from a predictive perspec-
tive is to provide, in different contexts, methods that present good estimators
of the regression function, that is, low-error estimators. We thus want to
choose a function within a class of candidates that has good predictive
power (low quadratic error).6 Choosing functions with minimum error can
induce a methodological error: learning the parameters of a prediction
function and testing them with the same data. Such a model would have a
solid predictive performance in the particular sample where it was trained,
but it would generalize poorly to unseen data.7
4 Following Friedman, Hastie and Tibshirani (2001), let us assume that we observe Y
and n different variables: X1, X2, ..., Xn. Moreover, we suspect that there is some
relationship between X and Y. So we can write:

Y = f(X) + ε    (1)

f is assumed to be some fixed but unknown function of the variables X1, X2, ..., Xn, and
ε is an error term that is independent of X and has zero mean. Thus, f represents the
systematic information that the set of characteristics X transmits about the result Y.
Statistical learning (machine learning) is then the set of different approaches and tools
used to estimate f.
5 An analogy to machine learning is a doctor progressing in residency: learning rules
from the data. Starting with observations at the patient level, the algorithms analyze a
large number of variables, looking for combinations that reliably predict outcomes
(Obermeyer and Emanuel, 2016).
A standard method in statistical learning to obtain a model with a better
capacity for generalization consists of dividing the dataset into different
subsamples or sets. The training set is used to adjust the model's parameters,
and the validation set is used to monitor the model's generalizability, which
is then put to the test on the test set. An important assumption is that
observing model performance against validation data indicates how it will
behave when exposed to samples not seen in training. In other words, the
validation performance is interpreted as an estimate of the generalization
error. Thus, minimizing the error on the validation set is expected to
increase generalizability. Therefore, it is expected that the configuration of
the model that leads to the smallest error on the validation set, which is not
used for parameter adjustment, has the best possible performance on new
samples (Guyon et al., 1997).
A more elaborate technique for splitting the data is called k-fold cross-
validation (Refaeilzadeh et al., 2009). It consists of dividing the set of
samples available for training into k folds and carrying out k training
sessions, each considering k-1 folds to adjust the parameters and one
fold for validation. Every available sample will appear k-1 times in the
training set and one time in the validation set. The k training sets will have a
different composition, as will the k validation sets. The model's performance
is then taken as the average of the performances over the k validation folds.
6 The task of training the model is to find the θ parameters that best fit the training
data xi and results yi. To train the model, we need to define the objective function that
measures how well the model fits the training data. A characteristic of objective functions is
that they consist of two parts: the training loss and the regularization term:

obj(θ) = L(θ) + ω(θ) (2)

where L is the training loss function and ω is a regularization term. The loss function
is a measure of the model's predictive power with respect to the training set. The regularization
term controls the so-called "complexity" of the model, avoiding the problem of overfitting
(Friedman et al., 2001).
7 The second situation is underfitting, where the model is not able to adequately
approximate the actual mapping, not even in the data used in training. This can occur
because the degree of flexibility of the model is insufficient given the complexity of the
mapping to be approximated, or because of convergence problems in the training process
(Jabbar and Khan, 2015).

Since cross-validation gives us a way to infer the quality of generalization of a
model, this technique is used to choose values for a model’s hyperparameters
- settings or configurations of the model that cannot be estimated from data
(Probst et al., 2019).
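The k-fold procedure can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation used in this study; `evaluate` is a hypothetical callback standing in for fitting a model on the training folds and scoring it on the validation fold:

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    fold_size = n_samples // k
    folds = [idx[i * fold_size:(i + 1) * fold_size] for i in range(k)]
    # Distribute any remainder over the first folds
    for j, extra in enumerate(idx[k * fold_size:]):
        folds[j].append(extra)
    return folds

def cross_validate(n_samples, k, evaluate):
    """Run k training sessions: k-1 folds for training, 1 for validation.

    `evaluate(train_idx, val_idx)` returns a validation score; the
    model's performance is the average over the k validation folds.
    """
    folds = k_fold_indices(n_samples, k)
    scores = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        scores.append(evaluate(train_idx, val_idx))
    return sum(scores) / k
```

Every sample appears in exactly one validation fold, so the averaged score is computed over the whole training data without ever scoring a model on observations it was fitted to.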
The overall description of the machine learning models above is adapted
to different supervised learning algorithms to deal with different empirical
contexts. In survival analysis, random survival forests, survival support
vector machines, and extreme gradient boosting algorithms are standard
choices with desirable statistical properties.

4.2.3 Survival Support Vector Machine


Survival Support Vector Machine (SSVM) algorithms are an extension of
support vector machines (SVM) that can account for censoring in the data
(Pölsterl et al., 2015). The original SVM was developed for binary classifi-
cation purposes. It does so, in short, by building a hyperplane as a decision
surface that separates the distinct classes in a particular dataset. The 'sup-
port vectors' are the data points closest to the separating hyperplane
(James et al., 2013). However, not all datasets have linearly separable
patterns. For non-linearly separable patterns, therefore, the method uses
an appropriate mapping function to make the mapped set linearly separable.8
One class of mapping functions that is computationally efficient is the
kernels - functions that can project data from lower- to higher-dimensional
spaces. A suitable choice of transformation function will result in a
higher-dimensional feature space that is separable. The statistically robust
way to choose a particular kernel is to use cross-validation techniques
(James et al., 2013).
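As an illustration, the radial basis function (RBF) - the kernel ultimately chosen for our SSVM (see Table 6) - measures the similarity of two points as exp(-γ‖x−y‖²); a minimal sketch:

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel: implicitly maps x and y into an infinite-dimensional
    feature space and returns their inner product there,
    exp(-gamma * ||x - y||^2), without ever computing the mapping."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)
```

Identical points map to 1.0 and the similarity decays toward 0 as the squared distance grows; γ is a hyperparameter that would itself be chosen by cross-validation.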
Support Vector Machines in the context of survival analysis can be under-
stood in two ways. The first is a ranking problem, where the model aims to
learn the accurate ordering of data observations (samples) according
to their survival time - it ranks observations according to their risk.
The second is a regression problem, where the model learns to predict the
survival time of different observations (Pölsterl et al., 2015). Survival Sup-
port Vector Machines use the kernel function transformation to deal with
non-linearities between the variables - a valuable characteristic in predicting
survival times.
8 More formally, let the input set S be represented by the pairs (x1,y1),...,(xn,yn), where yi,
i=1,2,...,n is the label of each input i. The feature space is a space of higher dimensionality into
which the input set S will be mapped, using a function φ, in order to obtain a new linearly
separable data set S', represented by (φ(x1),y1), ..., (φ(xn),yn) (James et al., 2013).

4.2.4 Ensemble of Machine Learning Models


Different machine learning models can solve a particular problem differently,
which makes the idea of combining these models in a committee pertinent.
The diversity of solutions can make the committee more robust in terms
of generalizability (Tresp, 2001). In a particular kind of committee, an
ensemble, machines are trained on the available data and their outputs
are combined in some way. The idea of combining machine learning
models is based on the notion that diversifying perspectives can bring a better
generalization - a model that performs better with unseen data (Bishop,
2006).
Consider a regression problem where there is a need to approximate
an ideal function from a set of data. An ensemble is a set of individual
regression models. If we assume that the errors have zero mean and are
uncorrelated - an idealized condition - the ensemble will have a mean square
error smaller than the mean of the individual errors. One way to approach
this condition is to perform a bootstrap aggregation procedure - bagging -
to generate the datasets of the individual machine learning models. In this
procedure, each individual training set is composed of samples drawn with
replacement from the original training data (Bishop, 2006).
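The bagging procedure can be sketched as follows (a toy illustration, not the paper's implementation): each bootstrap set has the same size as the original data but is drawn with replacement, and the ensemble output for a regression problem is the average of the individual predictions.

```python
import random

def bootstrap_sets(data, m, seed=0):
    """Generate M 'new' training sets by sampling with replacement.

    Each set has the same size as the original data, so some samples
    appear repeated while others are left out."""
    rng = random.Random(seed)
    return [[rng.choice(data) for _ in data] for _ in range(m)]

def bagging_predict(models, x):
    """Ensemble output for a regression problem: the average of the
    individual models' predictions."""
    return sum(model(x) for model in models) / len(models)
```

In a random forest, one tree would be grown on each of the M bootstrap sets and `bagging_predict` would combine their responses.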
Another classic ensemble approach is called boosting. In this case, a
sequential training scheme is adopted; the machines are trained in sequence.
The training of each machine is based on a dataset in which the data is
weighted according to the performance of the previous machines (Schapire,
1999).
A common ensemble machine learning model that uses bagging is the
random forest model, whereas the boosting method is used in the extreme
gradient boosting model. Both of these models have been adapted to address
survival analysis problems.

4.2.5 Random Survival Forest


In the case of ensembles, the idea is that each machine seeks to deal with the
task at hand from different perspectives, generating answers that, combined,
can lead to a better generalization. The application of this idea in the context
of decision trees gives rise to the random forest concept (Breiman, 2001a).
The extension of this ensemble to the notion of random forest typically
involves the construction of trees from subsets of features.

Decision trees are methods that use a tree-based graphical represen-
tation, whose objective is to identify groups of individuals with character-
istics of common interest. For this purpose, a recursive method divides the
initial sample into subsamples based on observed values of the explanatory
variables and their interactions. The tree induction process starts with
a sample called the root node, which is divided into subsamples, called
child nodes or intermediate nodes. These subsamples, when subdivided,
become parent nodes, as they generate child nodes of their own. When a
subsample can no longer be subdivided according to some stopping criterion,
it is called a terminal node or leaf node. This process is called recursive
because each subsample generates new subsamples (Song and Ying, 2015).
The Random Forest method combines the idea of bagging with the random
selection of explanatory variables in the tree induction process. In this
case, a data set of N samples is sampled (with replacement), generating M
"new" sets, which, in turn, are used to build M trees. The responses from
these trees are combined to generate the ensemble's output. The random
selection of variables is a draw made at each tree node: a subset of candidate
variables is randomly selected to split that node. Using this technique,
different sets of variables may appear at different levels in each tree. With
this, the technique becomes more sensitive to interactions between variables,
in addition to resulting in decorrelated trees, due to the random draw of
candidate split variables made at each partition (Breiman, 2001b).
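The per-node draw of candidate split variables can be sketched as follows (a hypothetical helper; mtry = √p is a common default in random forest implementations, not necessarily the value used in this study):

```python
import math
import random

def candidate_features(n_features, rng, mtry=None):
    """Random draw of candidate variables at a tree node.

    Only the drawn variables are evaluated when choosing the split;
    a fresh draw at every node is what decorrelates the trees."""
    if mtry is None:
        mtry = max(1, int(math.sqrt(n_features)))  # common default
    return rng.sample(range(n_features), mtry)
```

Because the draw is repeated at every node, a strong predictor cannot dominate every split, and weaker variables get a chance to appear higher in some trees.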
Random Survival Forests are a particular kind of tree ensemble suited
to analyzing survival data. The rules for splitting the trees in the model
use censoring and survival information (Ishwaran et al., 2008). To create
the samples and subsamples - the parent and child nodes - in growing the
decision tree, the random survival forest algorithm allows different splitting
criteria. The most intuitive and most widely used is the log-rank criterion:
the splits follow the difference in survival times between groups, as measured
by the log-rank test (Bou-Hamad et al., 2011). A key advantage of Random
Survival Forests is that, unlike parametric survival models, where interactions
between variables and non-linearities do not come automatically, they can
deal with these challenges naturally because of their decision tree structure
(Ishwaran et al., 2008).

4.2.6 Extreme Gradient Boosting


The Extreme Gradient Boosting (XGBoost) algorithm is an ensemble model.
As we described earlier, these are methods that use a combination of results
from weak predictors - called base learners - to produce a better predictive
model. Weak predictors - or weak learners - are models that, when used
individually, have an accuracy that is marginally better than a random guess
(Chen and Guestrin, 2016).
It is important to highlight that the random forest and gradient boosting
models are similar. Both rely on weak predictors to make the final
prediction; however, the random forest model uses an average of predictions
from weak learners, whereas the gradient boosting model uses the boosting
method. In the boosting technique, each weak classifier is trained with a
set of data, sequentially and adaptively, where each base model depends on
the previous ones, and in the end they are combined in a deterministic way.
The model is built in stages and generalized by allowing the optimization
of an arbitrary differentiable loss function (Schapire, 1999).
The choice of the loss function is particularly relevant for survival analy-
sis because it can impact our assumptions regarding the hazard distribution.
One can choose the partial likelihood loss function based on the Cox regres-
sion model class or the loss function weighted by the logarithm of survival
time of the accelerated failure time model class. The first is based on the
proportional hazard assumption, whereas the latter allows for time-varying
hazards (Wei, 1992).
More precisely, the accelerated failure time model takes the following
form in the Gradient Boosting context:

lnY = τ(x) + δZ (3)

In this equation, lnY is the logarithm of the survival time and τ(x) is the
output of a decision tree ensemble, given our vector of controls x. δ is a
scaling parameter. Z is a random variable with a known probability
distribution, which we will assume is normal. The Gradient Boosting model
maximizes the log-likelihood of Y using the decision tree output τ(x)
(Barnwal et al., 2020).
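Under the normality assumption for Z, the log-likelihood being maximized can be sketched as follows. This is an illustrative reconstruction of the AFT loss, not XGBoost's internal implementation: an uncensored observation contributes the log-density of z = (ln y − τ(x))/δ, while a right-censored one contributes the log of the normal survival function, since its death is only known to occur after y.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_sf(z):
    """Survival function 1 - Phi(z) of the standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def aft_log_likelihood(y, tau, event, delta=1.0):
    """AFT log-likelihood with normal errors: ln Y = tau(x) + delta * Z.

    y:     observed survival times
    tau:   decision-tree ensemble outputs tau(x_i)
    event: 1 if the death was observed, 0 if right-censored
    """
    total = 0.0
    for yi, ti, ei in zip(y, tau, event):
        z = (math.log(yi) - ti) / delta
        if ei:  # observed event: log-density of ln y
            total += math.log(normal_pdf(z) / delta)
        else:   # right-censored: event happens after y
            total += math.log(normal_sf(z))
    return total
```

An ensemble output τ(x) close to the true log survival times yields a higher likelihood than a badly calibrated one, which is what the boosting iterations exploit.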
Given a loss criterion, then, the objective of the Gradient Boosting algo-
rithm is to create a chain of weak models, where each one aims to minimize
the error of the previous model. These iterations are repeated a certain
number of times, seeking to minimize the residual generated by the weak
models, that is, until the distance between the predicted value and the actual
value is as small as possible. The final model is the sum of the fits of all
weak models. The adjustment of each weak model is multiplied by a
value called the learning rate - which controls the complexity of the model
(its propensity to overfit). This value determines the impact of each tree
on the final model - the smaller the value, the smaller the contribution of
each tree (Chen and Guestrin, 2016).
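This stage-wise scheme can be illustrated with a toy one-dimensional example using depth-1 regression trees (stumps) and squared loss; a sketch of the general boosting idea, not the XGBoost implementation:

```python
def fit_stump(x, residuals):
    """Fit a depth-1 regression tree (stump) to the residuals:
    choose the split threshold minimizing the squared error."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x identical: predict the mean residual
        m = sum(residuals) / len(residuals)
        return lambda v: m
    _, t, lm, rm = best
    return lambda v: lm if v <= t else rm

def gradient_boost(x, y, n_trees=50, learning_rate=0.5):
    """Stage-wise boosting for squared loss: each stump fits the
    residuals of the current ensemble, and its contribution is
    shrunk by the learning rate."""
    pred = [0.0] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]
    return lambda v: sum(learning_rate * s(v) for s in stumps)
```

With a learning rate of 0.5, each stump corrects half of the remaining residual, so the ensemble converges geometrically toward the training targets; a smaller rate would require more trees but overfits less readily.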

4.2.7 Model Evaluation and Interpretation
It is typical to use the concordance index (C-index) - a measure of the
correlation between the model predictions and the data - to assess the
results of the machine learning models (Heagerty and Zheng, 2005). The
C-index has a close relationship with the AUC: the area under the receiver
operating characteristic (ROC) curve.9 It is particularly suited to survival
analysis because it handles censoring in the data.
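Under right-censoring, the C-index is computed over "comparable" pairs only: a pair is comparable when the shorter of the two observed times corresponds to an actual death, since a censored subject may still outlive the other. A simplified sketch (without the tie-handling refinements of production implementations):

```python
def concordance_index(times, events, risk_scores):
    """C-index for censored data: the fraction of comparable pairs in
    which the subject who died earlier received the higher risk score.

    times:       observed survival times
    events:      1 if the death was observed, 0 if censored
    risk_scores: model-predicted risk (higher = expected to die sooner)
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable only if i's death was observed and happened
            # strictly before j's observed time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count half
    return concordant / comparable
```

A value of 1.0 means the model orders every comparable pair correctly, 0.5 is no better than chance, mirroring the AUC interpretation.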
Understanding the predictions of a particular method is a crucial part of
choosing modeling strategies. Decision tree models such as random forests
and XGBoost allow the implementation of feature (variable) importance
methods. In brief, feature importance can be measured by how much an
accuracy metric changes when a feature (variable) is not used. In survival
analysis settings, it is typical to rank the variables according to their impact
on the C-index as a measure of their importance in the model prediction.
However, most tree-based algorithms only provide the global aggregate
importance of a particular variable; they do not provide the direction of the
impact - positive or negative (Rogers and Gunn, 2005).
A method to interpret the predictions of machine learning models is to
use Shapley values, originally a concept from game theory. In coalition
game theory, a group of players comes together to create some value. For
instance, one can think of a group of people coming together to form a
company and generate profit. The Shapley value is a method of distributing
the profit fairly among the players based on their contributions. More generally,
the Shapley value is the average marginal contribution of a feature
value across all possible coalitions (Shapley et al., 1988).
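A toy computation of exact Shapley values for a small coalition game makes the definition concrete (the profit numbers below are hypothetical):

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal
    contribution v(S + {i}) - v(S) over all orderings of the players."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}
```

For a hypothetical two-person firm where A alone earns 10, B alone earns 20, and both together earn 60, the split is 25 for A and 35 for B; the shares always sum to the total value, which is the "efficiency" property that SHAP inherits.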
Shapley values have been adapted to interpretable machine learning in
the SHAP framework: a unified approach to interpreting model predictions.
The objective of the SHAP framework is to interpret the prediction of any
instance of the machine learning model by computing the contribution of
each feature to the prediction (Lundberg and Lee, 2017).

4.2.8 Sampling Strategy


Our dataset consists of 2.65 million births and 20.5 thousand deaths. The
event that we are trying to comprehend, infant mortality, is present in a
small proportion of the data. That is, the dataset is heavily unbalanced.
Unbalanced data can be defined by the small incidence of one category within
a dataset (the minority class) compared to the other (the majority class). In most
cases, this means that we have much information about the most incident
category and less about the minority one (Branco et al., 2016).
9 The area under the ROC curve (AUC) measures the capacity of a given test to assess
whether a particular criterion is present or not. The standard interpretation is: an AUC
of 1.0 represents perfect discrimination capacity, whereas 0.5 represents a test with no
discrimination capacity (Hoo et al., 2017).
Unbalanced data can cause problems in machine learning models and
their predictions. Traditional ML algorithms will heavily favor the majority
class because of their objective functions (Cieslak and Chawla, 2008). For
instance, when 99% of births do not result in deaths, the most straightfor-
ward prediction is to infer that every newborn will survive. That will result
in a 99% accuracy metric. However, the model will be useless for identifying
newborns that have a lower likelihood of survival.
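This accuracy paradox can be reproduced directly (the numbers are the illustrative ones from the text, not our actual class proportions):

```python
def accuracy(y_true, y_pred):
    """Fraction of correct predictions."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# 99 survivals (0) and 1 death (1): the trivial model that predicts
# survival for every newborn scores 99% accuracy yet flags no deaths.
y_true = [0] * 99 + [1]
trivial = [0] * 100
```

The trivial classifier reaches 0.99 accuracy while identifying zero of the deaths, which is why accuracy alone is uninformative on unbalanced data.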
One way to remove the bias caused by the difference in the proportion of
the categories is to alter the amount of data that the machine learning models
effectively use. A typical method is undersampling, reducing the number
of observations of the majority class to reduce the difference between the
categories. The result is a dataset that has a similar number of observations
between the classes (Drummond et al., 2003).
There are different undersampling strategies fit for different purposes
and challenges. The simplest strategy is random undersampling, which
consists of randomly removing data from the majority class - the drawback
is the inevitable loss of information. However, the method can be efficient
in different contexts (Estabrooks et al., 2004; Chawla et al., 2004). Another
common method is to use a distance criterion to evaluate which observations
in the majority class should be kept in the training set. For instance,
the nearest neighbor algorithm is used to define a relative distance between
observations, and then the data is undersampled to preserve the information
structure revealed by the algorithm (Mani and Zhang, 2003).
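A minimal sketch of random undersampling (illustrative, not the authors' implementation; it should be applied to the training set only, after the train/test split):

```python
import random

def random_undersample(features, labels, minority_label=1, seed=0):
    """Randomly drop majority-class observations so that both classes
    end up with the same number of samples."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    # Keep every minority observation and an equal-sized random
    # subset of the majority class.
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return [features[i] for i in kept], [labels[i] for i in kept]
```

The returned set is balanced at the cost of discarding most majority-class observations, which is the information loss noted above.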

Figure 5: Sampling Strategy

Source: Prepared by the authors.


Notes: The figure shows the sampling strategy used by the authors.

An important precaution is to split the data between train and test sam-
ples before applying an undersampling strategy. If the undersampling
algorithm is applied to the test set as well, we will have a data leakage or
train-test communication problem. That is, information from the test set will
leak into the training set, which will probably overestimate the model's pre-
dictive performance. For the results to be robust, a machine learning model
cannot be evaluated on the same sample on which it is trained (Chiavegatto
Filho et al., 2021).
To summarize our procedure: after the sample splitting, we first balance
the training set using a random undersampling algorithm. We also tested the
distance criterion method, but the results were similar, so we chose random
undersampling for computational efficiency. The models are estimated and
optimized using k-fold cross-validation. Afterward, the model performance
is measured in the test set. Figure 5 describes the sampling strategy, begin-
ning with the original microdata coming from the Brazilian Unique Health
System (SUS).

5 Results
5.1 Cox Regression
Table 4 shows the results for the Cox regression model. The regression table
is structured to highlight the distal, intermediate, and proximal factors.
Among the distal factors, schooling has a statistically significant negative
impact (-0.135) on the hazard (it increases the probability of survival), whereas
living in a border city positively impacts the hazard (reducing the probability
of survival). Being married (marital status) has a negative impact (-0.073), but
with less statistical significance.
Among the intermediate factors, parity (0.075) and the number of dead children
(0.045) have a statistically significant positive impact on the hazard, whereas
having a c-section (-0.258), having induced labor (-0.273) or assisted labor
(-0.197), and having more prenatal visits (-0.013) reduce the hazard. Among the
proximal factors, a low APGAR score (2.162), low weight (2.521), having a genetic
anomaly (2.068), and being born male (0.228) have a statistically significant
positive impact on the hazard ratio.

Table 4: Cox Regression Results

Variable                    Coefficient (se)      Pr(>|z|)
Distal Factors
Mother Age                  -0.001 (0.002)        0.636
Father Age                   0.008 (0.002)        0.396
Capital Residency           -0.027 (0.037)        0.450
Schooling                   -0.135*** (0.035)     0.0001
Marital Status              -0.073* (0.029)       0.013
Border Residency             0.234*** (0.044)     1.30e-07
Intermediate Factors
C Section                   -0.258*** (0.031)     <2e-16
Prenatal Visits             -0.013*** (0.002)     5.45e-07
Parity                       0.075* (0.031)       0.015
Birthplace Dummy            -0.214 (0.175)        0.222
Dead Children                0.045** (0.017)      0.009
Induced Labor               -0.273*** (0.046)     4.87e-09
Assisted Labor              -0.197* (0.082)       0.017
Proximal Factors
Low APGAR1                   2.162*** (0.029)     <2e-16
Low Weight                   2.521*** (0.031)     <2e-16
Anomaly                      2.068*** (0.129)     <2e-16
Fetus Presentation           24.47584 3
Race Dummy                   0.054 (0.027)        0.051
Sex Dummy                    0.228*** (0.27)      <2e-16
Number of Observations: 2666742
Number of events: 20310
Concordance Index: 0.896 (se = 0.003)
Likelihood ratio test: 21338 on 25 df, p < 2e-16
Wald test: 28077 on 25 df, p < 2e-16
Score (logrank) test: 67422 on 25 df, p < 2e-16
Source: Prepared by the author.
Notes: The table shows the results for the Cox regression model.
Standard errors are in parentheses. * p<0.1; ** p<0.05; *** p<0.01

Table 5 shows the results of the proportional hazards assumption test using
the Schoenfeld residuals. Mother age, father age, the schooling dummy, the
marital status dummy, and the border residency dummy are not statistically
significant, whereas capital residency is. The birthplace dummy, the number
of dead children, and the assisted labor dummy are not statistically sig-
nificant, whereas having a c-section, the number of prenatal visits, parity,
and having induced labor are. Finally, having a low Apgar score, low weight,
a genetic anomaly, and the type of fetus presentation are statistically significant.
Overall, distal factors satisfy the proportional hazards assumption, whereas
intermediate and proximal factors do not.

Table 5: Proportional Hazards Assumption Test

Schoenfeld Test
Variable                  Chi Squared    Degrees of Freedom    p-value
Distal Factors
Mother Age                0.550          1                     0.458
Father Age                0.005          1                     0.939
Capital Residency         13.544***      1                     <0.001
Schooling                 0.286          1                     0.592
Marital Status            0.738          1                     0.390
Border Residency          0.011          1                     0.915
Intermediate Factors
C Section                 5.621**        1                     0.017
Prenatal Visits           5.336**        1                     0.020
Parity                    14.271***      1                     <0.001
Birth Place Dummy         0.011          1                     0.914
Number of dead children   0.033          1                     0.855
Induced Labor             3.426*         1                     0.064
Assisted Labor            3.496          4                     0.478
Proximal Factors
Low APGAR1                316.379***     1                     <0.001
Low Weight                66.618***      1                     <0.001
Genetic Anomaly           36.031***      2                     <0.001
Fetus Presentation        24.475***      3                     <0.001
Race Dummy                0.027          1                     0.869
Sex Dummy                 2.172***       1                     <0.001
GLOBAL                    476.448***     25                    <0.001
Source: Prepared by the authors using Unified Health System (SUS) data.
Note: The table describes the results of the Schoenfeld test for the propor-
tional hazards assumption. The null hypothesis is that the hazard ratios are
proportional. * p<0.1; ** p<0.05; *** p<0.01

The Schoenfeld test indicates that a significant set of covariates does not
satisfy the proportional hazards assumption in our sample. Estimating a Cox
proportional model with non-proportional hazards has consequences. There
can be an overestimation of risks if hazards are increasing and an underesti-
mation if hazards are converging (Schemper, 1992). The interpretation of
model coefficients can be misleading in this setting. If the objective is the
efficient prediction of survival probabilities, this empirical challenge can be
tackled using machine learning models that are robust to non-proportional
hazards.

5.2 Machine Learning Models
Figure 6 describes the machine learning model results. It shows the aver-
age predictive performance of the different ML models in the test set, as
measured by the concordance index score. The Cox proportional hazards
model (0.837) is a benchmark against which to assess the other models. The
model with the best predictive performance is the SVM Survival (0.843),
followed by the Gradient Boosting (0.839). The Random Survival Forest
(0.815) has the worst predictive performance. Only the SVM Survival and
the Gradient Boosting have a higher predictive performance than the Cox model.

Figure 6: Machine Learning Model Performance

Source: Prepared by the authors using Unified Health System (SUS) data.
Notes: The figure shows the average c-index for the machine learning models using the k-fold cross-validation method. The Random
Survival Forest model has a mean c-index of 0.815 (standard deviation: 0.003). The SVM Survival model has a mean c-index of 0.843
(standard deviation: 0.002). The Cox Proportional Hazards model has a mean c-index of 0.837 (standard deviation: 0.002). The Extreme
Gradient Boosting has a mean c-index of 0.839 (standard deviation: 0.002).

Table 6 describes the hyper-parameters that were used to estimate the
models. The model parameters were optimized using the randomized search
cross-validation method.10 We provide a brief description of each. In the
SVM model, the alpha parameter (0.113) controls the model's regularization:
a higher α value increases the model bias but reduces its variance.11 The
chosen kernel transformation was the radial basis function (RBF).12 Although,
in general, the Cox proportional hazards model is not considered a machine
learning algorithm, there are ways to estimate it so that it can be used
as a benchmark for the other models. So instead of using the full sample to
estimate the model - as in the statistical inference method - the sample is
divided between train and test sets, and a regularization parameter is set
to control the model complexity. Therefore, the only hyper-parameter in
the Cox proportional hazards model is the alpha parameter (0.232), which
controls the model's regularization - analogous to the SVM alpha.
10 Randomized search implements a random search over parameters, where each configura-
tion is sampled from a distribution of possible parameter values (Hackeling, 2017).

Table 6: Hyper-parameter Tuning

Model                              Parameter              Chosen Value
Random Survival Forests            Number of Trees        100
                                   Maximum Depth          6
                                   Minimum Samples Leaf   1
Gradient Boosted Models            Number of Trees        50
                                   Maximum Depth          1
                                   Learning Rate          0.5
Survival Support Vector Machines   Kernel                 RBF
                                   alpha                  0.113
Cox Proportional Hazards           alpha                  0.232
Source: Prepared by the authors.
Note: The table describes the parameters for each model. Models were parameterized using a randomized search over different parameter settings to maximize
the models' predictive performance.

The decision tree models' important hyperparameters are the maximum
depth of the tree and the minimum number of samples per leaf, which
determine the number of decisions that the tree will make. We might expect
that the deeper the tree, the more decisions it makes and the better its fit
to the training data. However, at very large depths the tree becomes so
tailored to the training data that it fails on the test data (overfitting). For the
Random Survival Forests (RSF), the maximum depth is 6 and the minimum
number of samples per leaf is 1, whereas for the XGBoost the maximum
depth is 1. The total number of trees is 100 for the RSF and 50 for the
XGBoost. Finally, the XGBoost model has a learning rate parameter (0.5),
which controls the complexity of the model (Sommer et al., 2019).
11 Bias is the inability of a model to capture the true relationship between the variables and
the object to be predicted: the model is not learning. On the other hand, if the bias is very
small, the model is so adjusted to the training data that it makes many mistakes when used
with different data: the model is overfitted. Variance is the sensitivity of a model to datasets
other than the training set. If the model is very sensitive to the training data, it fits their
relationship so closely that it will be very inaccurate when faced with different data.
Regularization is a method that seeks to penalize the complexity of models, reducing their
variance (Tian and Zhang, 2022).
12 Kernel functions are intended to project feature vectors into a high-dimensional feature
space for the classification of problems that lie in non-linearly separable spaces. The model
is then able to classify the output variables into different categories (Musavi et al., 1992).
To interpret the model predictions, figure 7 shows the feature (variable)
importance for the Random Survival Forest model. Gestational weeks is
the most relevant predictor, followed by low APGAR5, low weight, genetic
anomaly, mother age, and the number of prenatal visits. Similar to the Cox
regression results, the most important variables for the probability of sur-
vival are proximate factors. The distal and intermediate factors have less
importance in the model's predictions.

Figure 7: Variable Importance

Source: Prepared by the authors using Unified Health System (SUS) data.

Notes: The figure represents the feature importance of the Random Survival Forest model. The variables are ordered along the y axis based
on their importance. That is, the higher the variable is on the y axis, the more important it is for the model prediction.

Feature importance can be misleading if there is no proper understanding
of the relationship between the explanatory variables. Indeed, we highlight
that one important characteristic of the feature importance algorithm is
its sensitivity to the correlation between explanatory variables (features).
In particular, the correlation between features should be considered when
interpreting these results.13 This is not a significant concern if the model
is for prediction purposes only: it will not impact the model's performance
in assessing the likelihood of survival. However, the contribution of each
variable to the prediction becomes hard to assess.
Also, as we discussed in the empirical strategy section, feature importance
algorithms do not show in which direction each variable impacts the model,
only its relative importance to the prediction. Figure 8 then shows the
SHAP values for all explanatory variables in the XGBOOST model. Low
APGAR, too few gestational weeks, low weight, and genetic anomaly are the
most important predictors of mortality, and they impact the hazard ratio
negatively. They are proximate factors. Having a c-section and the
number of prenatal care visits - intermediate factors - are important to the
model output. Finally, distal factors such as schooling and marital status
have less impact on the model predictions. This pattern is in harmony with
the results of the Cox regression model.
13 This challenge is similar to the multicollinearity problem in econometrics. There, if
two independent variables are strongly correlated, the estimates of the coefficients of the
model parameters can become insignificant, since each one presupposes, by definition, the
variation in Y given the variation in X. A high correlation will cause both variables to
move together, and it will be hard to disentangle the particular effect of each one (Alin,
2010). The difference in a random forest's feature importance setting is that there is no
statistical significance, and what is entangled is the contribution of each variable to the
model predictions.

Figure 8: Model Interpretation: Shapley Values

Source: Prepared by the authors using Unified Health System (SUS) data.
Notes: Summary plots for SHAP values. For each variable, the points correspond to different observations. The SHAP value is the impact
of the specific variable (feature) for that specific observation. This corresponds to the relative survival probability across observations:
an observation with a higher SHAP value has a higher survival probability than one with a lower SHAP value. The variables are ordered
along the y axis based on their importance, given by the average of their SHAP values. The higher the variable is on the y axis, the more
important it is for the model prediction.

Figure 11 in the appendix shows the SHAP interaction values between
the features in our models. On the main diagonal are the mean effects of each
variable on the model prediction; off the diagonal are the interactions
between variables. Most variables can influence the model prediction
differently when interacting with others. In particular, some variables have
interactions that are worth highlighting, as they reveal insights relevant to
the infant mortality problem.
For instance, figure 9 shows the dependence plot between gestational
weeks and having a c-section. Babies with fewer gestational weeks had
a higher likelihood of survival if they had a c-section - the SHAP value
is higher. As gestational weeks increase, the relationship inverts. Babies
with more than 35 weeks have a higher likelihood of survival if they had a
normal birth - having had a c-section decreases the SHAP value. There is a
non-linear interaction between c-section and gestational weeks.

Figure 9: Dependence Plot: Gestational Weeks vs C Section

Source: Prepared by the authors using the Unified Health System (SUS) data.
Notes: The figure shows the SHAP feature dependence plot of the XGB model for the interaction between gestational weeks and the
c-section dummy. The plot shows how the two variables affect the probability of survival non-linearly.

Figure 10 shows the SHAP dependence plot between the mother's age and
having had a c-section. The plot also shows a non-linear relationship
between the two variables. For teenage mothers - between 10 and 20
years - having had a c-section decreases the probability of survival (the
SHAP value is negative). For ages between 20 and 39 years, there
is no straightforward relationship between c-sections and infant survival.
For mothers older than 40 years, having had a c-section
increases the probability of survival (the SHAP value is positive).
We highlight that this result should be interpreted with care. Teenage
pregnancies tend to have more adverse outcomes for children (Ogawa et al.,
2019): there is, in general, a higher chance of preterm delivery, low birth
weight, and fetal distress (Baş et al., 2020). A sample selection effect could
therefore produce the relationship that the model is showing. Teenage mothers
have higher-risk pregnancies and, as a consequence, a higher chance of having
c-sections (Yussif et al., 2017). The model then indicates an association between
having a c-section and a decrease in the probability of survival. However,
without properly accounting for pregnancy riskiness in the model, there is
no way to discern whether the decrease in survival comes from the
c-section itself or from an underlying omitted factor.

Figure 10: Dependence Plot: Mother Age vs C Section

Source: Prepared by the authors using the Unified Health System (SUS) data.

Notes: The figure shows the SHAP feature dependence plot of the XGB model for the interaction between the mother's age and the
c-section dummy. The plot shows how the two variables affect the probability of survival non-linearly.

The SHAP framework illustrates how interpretable machine learning can
shed light on non-linear relationships that mean effects omit. Explanatory
variables can affect the outcome variable differently depending on their
position in the data distribution, so in some settings the mean effect
entails an important loss of information. For instance, in the Cox proportional
model, the mean effect of a c-section is to increase survival. The SHAP
dependence plots show, however, that this average omits important contexts
in which c-sections reduce survival.

6 Discussion
6.1 Cox Regression
The Cox regression results show a discernible pattern in which intermediate
and proximate variables have a more significant and substantial impact on
mortality than distal factors. Low weight, low APGAR, and genetic anomaly
are the most important drivers of the hazard ratio, whereas schooling and
marital status have a much smaller impact. This result is in accordance with
the theoretical framework of Mosley and Chen (1984) and the subsequent
empirical studies based on it.
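For reference, the hazard ratios discussed here come from the Cox specification (Cox, 1972), in which each covariate acts multiplicatively on a shared baseline hazard:

```latex
h(t \mid x) = h_0(t)\,\exp(\beta_1 x_1 + \cdots + \beta_p x_p),
\qquad \mathrm{HR}_j = \exp(\beta_j),
```

so a hazard ratio above one (e.g. for low weight) raises the instantaneous risk of death at every age in the first year, while a hazard ratio below one lowers it.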
A correctly specified model that uses the whole set of intermediate and
proximate factors will consequently tend to have distal factors that are not
statistically significant, because the proximate determinants capture
all of the variance in the model. However, it is unrealistic to assume that the
variables available in actual health data sets can measure all proximate
aspects of infant mortality correctly. That is why including socioeconomic
factors is important, and why variables in this group are, in general,
statistically significant (Hill, 2003).
The Cox regression results are consistent with prior work in the same
literature for Brazil. For instance, low birth weight is an important risk
factor for mortality (Risso and Nascimento, 2010; Cardoso et al., 2013; Paixao
et al., 2021a), and the probability of survival is negatively affected by
the mother having fewer prenatal visits and less schooling (Pinheiro et al.,
2010; Garcia et al., 2019).

6.2 Machine Learning Models


The Random Survival Forests (RSFs), Survival Support Vector Machines
(SSVMs), and Extreme Gradient Boosting (XGBOOST) machine learning
models achieve good predictive performance: they all have a high
concordance index, meaning the models can efficiently predict whether an
infant will survive the first year of life. However, only the SSVM (C-index: 0.843)
and XGBOOST (C-index: 0.839) models have slightly better predictive
performance than the Cox proportional model used as a benchmark (C-
index: 0.837). The RSFs have worse predictive performance (C-index:
0.815).
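As a concrete reference for how these numbers are read, the sketch below implements Harrell's concordance index for right-censored data from scratch. The paper's models rely on library implementations; this toy version and its data are ours, for illustration only.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.
    A pair (i, j) is comparable when the earlier time is an observed event;
    it is concordant when the earlier-failing subject has the higher
    predicted risk. Ties in risk count as 1/2."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

times  = [30, 365, 365, 120]   # days survived in the first year
events = [1, 0, 0, 1]          # 1 = death observed, 0 = censored
risk   = [0.9, 0.1, 0.2, 0.8]  # predicted risk scores
cindex = concordance_index(times, events, risk)
# A perfect risk ranking, as on this toy data, gives a C-index of 1.0;
# 0.5 corresponds to random predictions.
```

The models' C-indices of 0.81-0.84 therefore mean that, in roughly five out of six comparable pairs of infants, the model assigns the higher risk to the infant who dies earlier.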
On the one hand, this finding contributes to an emerging literature
that shows the good performance of survival analysis using machine
learning methods that are robust to non-proportional hazards
(Moncada-Torres et al., 2021; Chmiel et al., 2021). On the other hand, the
good performance of the Cox model in the presence of non-proportional
hazards can be interpreted as a sign of its strength, and its substitution
by more complex and computationally intensive models should be done
with care - particularly given the loss of interpretability when using
'black-box' machine learning models.

6.3 Prediction Frameworks


A common challenge in machine learning applications is assessing the true
relationship between the explanatory variables and the model output (Gilpin
et al., 2018). On the one hand, machine learning models are very efficient
for prediction purposes. For instance, as the SHAP results show, a model
identifying that babies born to teenage mothers who had a c-section have
a higher risk of death is an efficient prediction. On the other hand, the
relationship between features and outcomes is not guaranteed to be stable
across different machine learning modelling strategies. That is, our feature
importance and SHAP results should be taken with a grain of salt: they
should not be interpreted as causal effects that generalize to other
settings.
We argue that any predictive model in healthcare should be tested and
discussed with subject matter specialists before being put into production.
Machine learning is best at prediction problems and can be used as a tool
in policy-making contexts where prediction is the crucial aspect (Kleinberg
et al., 2015). However, if the estimation of causal effects is the main focus, an
empirical strategy that answers counterfactual questions is required (Athey,
2017). The danger is to infer causality from a model that is designed to be
efficient at prediction, not at answering causal questions. Predictive models
put into production without these caveats in mind may cause more harm
than good - particularly by inducing wrong or unfair decisions in healthcare
(Mehrabi et al., 2021).

7 Final Remarks and Policy Implications


Infant mortality is a serious public health challenge worldwide because,
despite the global decrease in its rates, it is still a stark reality in several
developing countries. In the last decades, Brazil has markedly improved
newborn health conditions, which has greatly reduced its infant mortality
ratios. Nevertheless, there is still substantial room for improvement,
particularly in less developed regions. Therefore, reducing infant mortality
is still a major challenge for Brazilian policymakers and society as a whole.
This paper contributes by using survival analysis with machine learning
models that are efficient both at predicting which infants are at risk of death
and at identifying the risk factors associated with mortality. Specifically, the
Random Survival Forest, Survival Support Vector Machine, and Extreme
Gradient Boosting models achieve a concordance index higher than 0.8 in
the task of predicting mortality in the first year of life. Furthermore, using
the SHAP framework, we provide evidence that variables such as gestational
weeks, low weight, and having a cesarean section interact non-linearly in
affecting mortality. To our knowledge, this is the first research using survival
analysis and machine learning for infant mortality in Brazil.
These findings have policy implications for Brazil, since identifying
newborns with a high risk of death at the moment of birth can be
a valuable input for health policy. Naturally, this requires an accurate
prediction of survival probabilities, a task at which machine learning models
are efficient. Model predictions - in particular those of interpretable machine
learning models - can be incorporated into a policy framework that helps
mitigate infant mortality by assessing risks proactively. Finally, future
researchers should integrate machine learning strategies with causal analysis
frameworks to create more transparent and robust models that can tackle
health problems more efficiently.

8 Appendix

Figure 11: SHAP Interaction Values

Source: Prepared by the authors using the Unified Health System (SUS) data.
Notes: The figure shows the SHAP interaction values. The main effect of each variable in the model result is shown in the main diagonal.
The interaction effects between variables are shown by the intersection of each pair of variables outside the main diagonal.

References

Abate, M. G., Angaw, D. A., and Shaweno, T. (2020). Proximate determinants of infant mortality in ethiopia, 2016 ethiopian demographic and health surveys: results from a survival analysis. Archives of Public Health, 78(1):1–10.

Alin, A. (2010). Multicollinearity. Wiley Interdisciplinary Reviews: Computational Statistics, 2(3):370–374.

Assembly, G. (2015). Sustainable development goals. SDGs Transform Our World, 2030.

Athey, S. (2017). Beyond prediction: Using big data for policy problems. Science, 355(6324):483–485.

Bank, W. (2021). World Development Report 2021: Data for Better Lives. The World Bank.

Barnwal, A., Cho, H., and Hocking, T. D. (2020). Survival regression with accelerated failure time model in xgboost. arXiv preprint arXiv:2006.04920.

Baş, E. K., Bülbül, A., Uslu, S., Baş, V., Elitok, G. K., and Zubarioğlu, U. (2020). Maternal characteristics and obstetric and neonatal outcomes of singleton pregnancies among adolescents. Medical science monitor: international medical journal of experimental and clinical research, 26:e919922–1.

Batista, A. F., Diniz, C. S., Bonilha, E. A., Kawachi, I., and Chiavegatto Filho, A. D. (2021). Neonatal mortality prediction with routinely collected data: a machine learning approach. BMC pediatrics, 21(1):1–6.

Beluzo, C. E., Alves, L. C., Silva, E., Bresan, R. C., Arruda, N. M., and de Carvalho, T. J. (2020). Machine learning to predict neonatal mortality using public health data from são paulo-brazil. medRxiv.

Bishop, C. M. (2006). Pattern recognition. Machine learning, 128(9).

Bou-Hamad, I., Larocque, D., and Ben-Ameur, H. (2011). A review of survival trees. Statistics surveys, 5:44–71.

Branco, P., Torgo, L., and Ribeiro, R. P. (2016). A survey of predictive modeling on imbalanced domains. ACM Computing Surveys (CSUR), 49(2):1–50.

Breiman, L. (2001a). Random forests. Machine learning, 45(1):5–32.

Breiman, L. (2001b). Using iterated bagging to debias regressions. Machine Learning, 45(3):261–277.

Bugelli, A., Da Silva, R. B., Dowbor, L., and Sicotte, C. (2021). Health capabilities and the determinants of infant mortality in brazil, 2004–2015: an innovative methodological framework. BMC public health, 21(1):1–17.

Cardoso, R. C. A., Flores, P. V. G., Vieira, C. L., Bloch, K. V., Pinheiro, R. S., Fonseca, S. C., and Coeli, C. M. (2013). Infant mortality in a very low birth weight cohort from a public hospital in rio de janeiro, rj, brazil. Revista Brasileira de Saúde Materno Infantil, 13(3):237–246.

Casey, B. M., McIntire, D. D., and Leveno, K. J. (2001). The continuing value of the apgar score for the assessment of newborn infants. New England Journal of Medicine, 344(7):467–471.

Chawla, N. V., Japkowicz, N., and Kotcz, A. (2004). Special issue on learning from imbalanced data sets. ACM SIGKDD explorations newsletter, 6(1):1–6.

Chen, T. and Guestrin, C. (2016). Xgboost: A scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, pages 785–794.

Cherkassky, V. and Mulier, F. M. (2007). Learning from data: concepts, theory, and methods. John Wiley & Sons.

Chiavegatto Filho, A., Batista, A. F. D. M., and Dos Santos, H. G. (2021). Data leakage in health outcomes prediction with machine learning. comment on “prediction of incident hypertension within the next year: Prospective study using statewide electronic health records and machine learning”. Journal of Medical Internet Research, 23(2):e10969.

Chmiel, F., Burns, D., Azor, M., Borca, F., Boniface, M., Zlatev, Z., White, N., Daniels, T., and Kiuber, M. (2021). Using explainable machine learning to identify patients at risk of reattendance at discharge from emergency departments. Scientific reports, 11(1):1–11.

Cieslak, D. A. and Chawla, N. V. (2008). Learning decision trees for unbalanced data. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 241–256. Springer.

Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological), 34(2):187–202.

de Souza, S., Duim, E., and Nampo, F. K. (2019). Determinants of neonatal mortality in the largest international border of brazil: a case-control study. BMC Public Health, 19(1):1–9.

Drummond, C., Holte, R. C., et al. (2003). C4.5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling. In Workshop on learning from imbalanced datasets II, volume 11, pages 1–8. Citeseer.

Estabrooks, A., Jo, T., and Japkowicz, N. (2004). A multiple resampling method for learning from imbalanced data sets. Computational intelligence, 20(1):18–36.

Freitas, F., Araujo, A., Melo, M., and Romero, G. (2019). Late-onset sepsis and mortality among neonates in a brazilian intensive care unit: a cohort study and survival analysis. Epidemiology & Infection, 147.

Friedman, J., Hastie, T., and Tibshirani, R. (2001). The elements of statistical learning, volume 1. Springer series in statistics New York.

Gamper-Rabindran, S., Khan, S., and Timmins, C. (2010). The impact of piped water provision on infant mortality in brazil: A quantile panel data approach. Journal of Development Economics, 92(2):188–200.

Garcia, L. P., Fernandes, C. M., and Traebert, J. (2019). Risk factors for neonatal death in the capital city with the lowest infant mortality rate in brazil. Jornal de pediatria, 95:194–200.

Gilpin, L. H., Bau, D., Yuan, B. Z., Bajwa, A., Specter, M., and Kagal, L. (2018). Explaining explanations: An overview of interpretability of machine learning. In 2018 IEEE 5th International Conference on data science and advanced analytics (DSAA), pages 80–89. IEEE.

Grambsch, P. M. and Therneau, T. M. (1994). Proportional hazards tests and diagnostics based on weighted residuals. Biometrika, 81(3):515–526.

Grossglauser, M. and Saner, H. (2014). Data-driven healthcare: from patterns to actions. European journal of preventive cardiology, 21(2 suppl):14–17.

Guanais, F. C. (2015). The combined effects of the expansion of primary health care and conditional cash transfers on infant mortality in brazil, 1998–2010. American Journal of Public Health, 105(S4):S593–S599.

Guyon, I. et al. (1997). A scaling law for the validation-set training-set size ratio. AT&T Bell Laboratories, 1(11).

Hackeling, G. (2017). Mastering Machine Learning with scikit-learn. Packt Publishing Ltd.

Heagerty, P. J. and Zheng, Y. (2005). Survival model predictive accuracy and roc curves. Biometrics, 61(1):92–105.

Hill, K. (2003). Frameworks for studying the determinants of child survival. Bulletin of the World Health Organization, 81:138–139.

Hoo, Z. H., Candlish, J., and Teare, D. (2017). What is an roc curve?

Ishwaran, H., Kogalur, U. B., Blackstone, E. H., and Lauer, M. S. (2008). Random survival forests. The annals of applied statistics, 2(3):841–860.

Jabbar, H. and Khan, R. Z. (2015). Methods to avoid over-fitting and under-fitting in supervised machine learning (comparative study). Computer Science, Communication and Instrumentation Devices, pages 163–172.

James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An introduction to statistical learning, volume 112. Springer.

Kleinberg, J., Ludwig, J., Mullainathan, S., and Obermeyer, Z. (2015). Prediction policy problems. American Economic Review, 105(5):491–95.

Lopes, S. A. V. d. A., Guimarães, I. C. B., Costa, S. F. d. O., Acosta, A. X., Sandes, K. A., and Mendes, C. M. C. (2018). Mortality for critical congenital heart diseases and associated risk factors in newborns. a cohort study. Arquivos brasileiros de cardiologia, 111:666–673.

Lundberg, S. M. and Lee, S.-I. (2017). A unified approach to interpreting model predictions. In Proceedings of the 31st international conference on neural information processing systems, pages 4768–4777.

Mani, I. and Zhang, I. (2003). knn approach to unbalanced data distributions: a case study involving information extraction. In Proceedings of workshop on learning from imbalanced datasets, volume 126. ICML United States.

Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., and Galstyan, A. (2021). A survey on bias and fairness in machine learning. ACM Computing Surveys (CSUR), 54(6):1–35.

Miao, F., Cai, Y.-P., Zhang, Y.-T., and Li, C.-Y. (2015). Is random survival forest an alternative to cox proportional model on predicting cardiovascular disease? In 6TH European conference of the international federation for medical and biological engineering, pages 740–743. Springer.

Miao, F., Cai, Y.-P., Zhang, Y.-X., Fan, X.-M., and Li, Y. (2018). Predictive modeling of hospital mortality for patients with heart failure by using an improved random survival forest. IEEE Access, 6:7244–7253.

Moncada-Torres, A., van Maaren, M. C., Hendriks, M. P., Siesling, S., and Geleijnse, G. (2021). Explainable machine learning can outperform cox regression predictions and provide insights in breast cancer survival. Scientific Reports, 11(1):1–13.

Mosley, W. H. and Chen, L. C. (1984). An analytical framework for the study of child survival in developing countries. Population and development review, 10:25–45.

Musavi, M. T., Ahmed, W., Chan, K. H., Faris, K. B., and Hummels, D. M. (1992). On the training of radial basis function classifiers. Neural networks, 5(4):595–603.

Nasejje, J. B. and Mwambi, H. (2017). Application of random survival forests in understanding the determinants of under-five child mortality in uganda in the presence of covariates that satisfy the proportional and non-proportional hazards assumption. BMC research notes, 10(1):1–18.

Obermeyer, Z. and Emanuel, E. J. (2016). Predicting the future—big data, machine learning, and clinical medicine. The New England journal of medicine, 375(13):1216.

Ogawa, K., Matsushima, S., Urayama, K. Y., Kikuchi, N., Nakamura, N., Tanigaki, S., Sago, H., Satoh, S., Saito, S., and Morisaki, N. (2019). Association between adolescent pregnancy and adverse birth outcomes, a multicenter cross sectional japanese study. Scientific Reports, 9(1):1–8.

O’neill, T. J. (1986). Inconsistency of the misspecified proportional hazards model. Statistics & probability letters, 4(5):219–222.

Paim, J., Travassos, C., Almeida, C., Bahia, L., and Macinko, J. (2011). The brazilian health system: history, advances, and challenges. The Lancet, 377(9779):1778–1797.

Paixao, E. S., Blencowe, H., Falcao, I. R., Ohuma, E. O., dos Santos Rocha, A., Alves, F. J. O., Maria da Conceição, N. C., Suárez-Idueta, L., Ortelan, N., Smeeth, L., et al. (2021a). Risk of mortality for small newborns in brazil, 2011-2018: A national birth cohort study of 17.6 million records from routine register-based linked data. The Lancet Regional Health-Americas, 3:100045.

Paixao, E. S., Bottomley, C., Pescarini, J. M., Wong, K. L., Cardim, L. L., Ribeiro Silva, R. d. C., Brickley, E. B., Rodrigues, L. C., Oliveira Alves, F. J., Leal, M. d. C., et al. (2021b). Associations between cesarean delivery and child mortality: A national record linkage longitudinal study of 17.8 million births in brazil. PLoS medicine, 18(10):e1003791.

Pinheiro, C. E. A., Peres, M. A., and D’Orsi, E. (2010). Increased survival among lower-birthweight children in southern brazil. Revista de saude publica, 44:776–784.

Pölsterl, S., Navab, N., and Katouzian, A. (2015). Fast training of support vector machines for survival analysis. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 243–259. Springer.

Probst, P., Boulesteix, A.-L., and Bischl, B. (2019). Tunability: importance of hyperparameters of machine learning algorithms. The Journal of Machine Learning Research, 20(1):1934–1965.

Ramos, R., Silva, C., Moreira, M. W., Rodrigues, J. J., Oliveira, M., and Monteiro, O. (2017). Using predictive classifiers to prevent infant mortality in the brazilian northeast. In 2017 IEEE 19th International Conference on e-Health Networking, Applications and Services (Healthcom), pages 1–6. IEEE.

Refaeilzadeh, P., Tang, L., and Liu, H. (2009). Cross-validation. Encyclopedia of database systems, 5:532–538.

Risso, S. d. P. and Nascimento, L. F. C. (2010). Risk factors for neonatal death in neonatal intensive care unit according to survival analysis. Revista Brasileira de terapia intensiva, 22:19–26.

Rogers, J. and Gunn, S. (2005). Identifying feature relevance using a random forest. In International Statistical and Optimization Perspectives Workshop “Subspace, Latent Structure and Feature Selection”, pages 173–184. Springer.

Royston, P. and Parmar, M. K. (2014). An approach to trial design and analysis in the era of non-proportional hazards of the treatment effect. Trials, 15(1):1–10.

Russo, L. X., Scott, A., Sivey, P., and Dias, J. (2019). Primary care physicians and infant mortality: evidence from brazil. PLoS One, 14(5):e0217614.

Schapire, R. E. (1999). A brief introduction to boosting. In Ijcai, volume 99, pages 1401–1406. Citeseer.

Schemper, M. (1992). Cox analysis of survival data with non-proportional hazard functions. Journal of the Royal Statistical Society: Series D (The Statistician), 41(4):455–465.

Schoenfeld, D. (1982). Partial residuals for the proportional hazards regression model. Biometrika, 69(1):239–241.

Shapley, L. S., Roth, A. E., et al. (1988). The Shapley value: essays in honor of Lloyd S. Shapley. Cambridge University Press.

Sommer, J., Sarigiannis, D., and Parnell, T. (2019). Learning to tune xgboost with xgboost. arXiv preprint arXiv:1909.07218.

Song, Y.-Y. and Ying, L. (2015). Decision tree methods: applications for classification and prediction. Shanghai archives of psychiatry, 27(2):130.

Szwarcwald, C. L., De Almeida, W. D. S., Teixeira, R. A., França, E. B., De Miranda, M. J., and Malta, D. C. (2020). Inequalities in infant mortality in brazil at subnational levels in brazil, 1990 to 2015. Population Health Metrics, 18(1):1–9.

Tian, Y. and Zhang, Y. (2022). A comprehensive survey on regularization strategies in machine learning. Information Fusion, 80:146–166.

Tresp, V. (2001). Committee machines. Handbook for neural network signal processing, pages 1–18.

Valter, R., Santiago, S., Ramos, R., Oliveira, M., Andrade, L. O. M., and de HC Barreto, I. C. (2019). Data mining and risk analysis supporting decision in brazilian public health systems. In 2019 IEEE International Conference on E-health Networking, Application & Services (HealthCom), pages 1–6. IEEE.

Viellas, E. F., Domingues, R. M. S. M., Dias, M. A. B., Gama, S. G. N. d., Theme, M. M., Costa, J. V. d., Bastos, M. H., and Leal, M. d. C. (2014). Assistência pré-natal no brasil. Cadernos de Saúde Pública, 30:S85–S100.

Wei, L.-J. (1992). The accelerated failure time model: a useful alternative to the cox regression model in survival analysis. Statistics in medicine, 11(14-15):1871–1879.

Yussif, A.-S., Lassey, A., Ganyaglo, G. Y.-k., Kantelhardt, E. J., and Kielstein, H. (2017). The long-term effects of adolescent pregnancies in a community in northern ghana on subsequent pregnancies and births of the young mothers. Reproductive health, 14(1):1–7.
