Infant Mortality in Brazil: A Survival Analysis Using Machine Learning Models
Abstract
The persistence of infant mortality in middle-income countries like
Brazil is a critical health challenge of the 21st century. Health care poli-
cymakers increasingly use statistical methods such as survival analysis
to identify factors associated with mortality rates. A common choice in
survival analysis is the Cox proportional hazards model. It is argued
that in the presence of non-proportional hazards, the Cox model has
limitations. Machine learning models are efficient at prediction and are
a methodological alternative to models with the proportional hazards
assumption. Using 2.9 million observations from the Brazilian Unified
Health System (SUS) 2017 data, we estimated a set of different machine
learning models (Survival Support Vector Machines, Random Survival
Forest, and Extreme Gradient Boosting) to predict which infants have
the highest risk of not surviving the first year of life. We found that by
the concordance index measure, the Survival Support Vector Machines
(c-index: 0.84), the Extreme Gradient Boosting (c-index: 0.83), and
the Random Survival Forest (c-index: 0.81) models can generate very
accurate mortality predictions. However, the Cox model also achieves
accurate mortality predictions (c-index: 0.83) despite the presence of non-proportional hazards. The SHAP framework of interpretable
machine learning was used to identify factors affecting Brazil’s infant
mortality rates. Factors such as cesarean sections and gestational weeks
affect mortality nonlinearly, and mean variable effects such as those
found in standard regression models can be misleading. Finally, we
argue that interpretable Machine Learning models can support poli-
cymakers in designing health frameworks that tackle the challenge of
infant mortality in middle-income countries.
Key Words: Brazil; Newborns health; Infant mortality; Survival analysis;
Machine learning; Random survival forest.
1
1 Introduction
The deaths of children are particularly tragic events, as they are early and,
in most cases, preventable deaths. Indeed, infant mortality is an impor-
tant indicator of a population´s health and well being. As such, one of the
United Nations Sustainable Development Goals is to reduce global newborn
and infant mortality rates (Assembly, 2015). Quality health care and bet-
ter socioeconomic conditions are instrumental for achieving this objective.
Furthermore, the emergence of data-driven health care can be an important
ally in supporting this goal by identifying risk factors and helping design
efficient policy frameworks (Grossglauser and Saner, 2014).
In Brazil, a middle-income country, the infant mortality rate shows
a decreasing trend from 1980 (78.5 deaths per thousand live births) to
2015 (12.1 deaths per thousand live births), according to World Bank data.
Nevertheless, these rates are still high relative to developed economies
(less than 5 deaths per thousand) (Bank, 2021), an important reason why research on infant mortality continues to be relevant in development economics and health policy. The empirical literature in this field is vast, but a method that is particularly insightful for this theme is survival analysis.
Particularly in Brazil, there are several studies using survival analysis
that shed light on different perspectives of infant mortality. For instance, a survival analysis in intensive care units identifies low birth weight (below 2500g) as a major risk factor for neonatal deaths (Risso and Nascimento, 2010). Other studies found higher mortality among children with low
birth weight (below 2500g), born in public hospitals, as well as mothers
with less schooling, and with insufficient prenatal visits (Cardoso et al.,
2013; Pinheiro et al., 2010; Garcia et al., 2019). Studies also discuss health
conditions such as sepsis and congenital heart diseases as major risk factors for newborn survival (Lopes et al., 2018; Freitas et al., 2019). The most comprehensive survival analysis study uses 17.6 million births between 2011 and 2018, identifying three newborn characteristics that drive infant
mortality: premature births (less than 37 gestational weeks), low weight and
small for gestational age (babies that are below the 10th percentile weight for
the same gestational weeks) (Paixao et al., 2021b). All these survival analysis
studies in Brazil use Cox regression models that rely on the proportional hazards hypothesis - an assumption that is not warranted in every context
and should be tested (Royston and Parmar, 2014; Grambsch and Therneau,
1994).
Machine Learning (ML) algorithms are a modelling alternative that can
deal with non-proportional hazards. In the Netherlands, there is evidence that
interpretable machine learning can provide efficient predictions and iden-
tification of risk factors for cancer mortality (Moncada-Torres et al., 2021).
There is also evidence that Random Survival Forests can improve survival
predictions for patients with heart failure and cardiovascular diseases in
general (Miao et al., 2015, 2018). A study in Uganda uses a Random Survival
Forest modeling strategy to identify the determinants of infant mortality
and shows how the proportional hazards assumption diminishes the model's robustness (Nasejje and Mwambi, 2017).
There are studies that use machine learning to discuss infant mortality
in Brazil, but they do not use survival analysis methods. A study uses
several ML algorithms and proposes a governance framework to identify
the determinants of infant mortality (Ramos et al., 2017). Another study
uses data available in the gestational period to argue that it is possible to
identify infants with a high risk of mortality before birth (Valter et al., 2019). In
a small sample of 15 thousand births in Sao Paulo, there is also evidence that
interpretable machine learning can identify newborns at high risk of death
using public health databases (Beluzo et al., 2020). A more recent study
with 1.2 million births in Sao Paulo finds that the extreme gradient boosting
(XGBOOST) model has the best predictive performance in identifying infant
mortality (Batista et al., 2021).
On a methodological level, our contribution is to show the efficiency
of machine learning survival models in predicting infants with high risk
of death, but also the prediction efficiency of the Cox model even in the presence of non-proportional hazards, and to show how interpretable machine learning algorithms can assess non-linearities in the determinants of infant mortality. We also contribute by using microdata on 2.9 million births from the Unified Health System (SUS); this is the first paper to use survival analysis with machine learning to assess infant mortality in Brazilian micro-
data. Our main findings are that Survival Support Vector Machines (SSVM),
Random Survival Forests (RSF) and Extreme Gradient Boosting (XGBOOST)
models achieve a strong predictive performance (concordance index > 0.8)
in identifying infants that died in the first year of life. However, only
the Survival Support Vector Machines and the Extreme Gradient Boosting
models beat the Cox regression benchmark performance. Furthermore, the
SHAP1 algorithm of interpretable machine learning shows non-linearities
in the relationship between individual features such as gestational weeks
and c section that are not explicit in typical survival models. We also discuss
1 The SHAP algorithm uses the game-theoretical concept of Shapley values to explain the
predictions of machine learning models (Lundberg and Lee, 2017).
public policy implications and caveats in the development of predictive
frameworks that help predict and identify risk factors associated with infant
mortality.
2 Data
We use two different datasets from the Brazilian Unified Health System
(SUS) in this research. The Live Birth Information System (SINASC) and
the Mortality Information System (SIM). The Live Birth Information System
(SINASC) collects and processes demographic and epidemiological
data on newborn characteristics and mother characteristics. It is structured
around the Live Birth Declaration (DN). The system is universal in the Brazil-
ian territory, and it is expected that the professionals working in the health
services or in the registry offices fill in the Live Birth Declaration (DN). The
Mortality Information System (SIM) is a system of national epidemiological
surveillance whose objective is to capture data on the country’s deaths to
provide information on mortality for all instances of the Brazilian health
system. It is structured around the declaration of death (DO).
An infant who is born and dies in the same year will have both a DN and a DO code. Then, to get infant mortality,
we match the Mortality Information System with the Live Birth Information
System using the "DN" code and "DO" code. In this way, we have a
dataset containing newborn and mother characteristics and information
regarding infant mortality in the first year of life - in this particular case,
information from 2017. Table 1 shows the set of variables that will be used
in the statistical analysis and their respective definitions on the original
datasets.
An essential characteristic of our dataset is that we have specific date
information for all births and the death dates of the infants that did not
survive the first year. These two variables allow us to create time-to-event
indicators necessary to the proper usage of survival analysis models - our
modeling choice to assess infant mortality in Brazil.
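As an illustration of this record linkage and time-to-event construction, the sketch below uses pandas with hypothetical file and column names (sinasc_2017.csv, sim_2017.csv, dn_code, birth_date, death_date); the actual SUS field names differ.

```python
# A sketch (hypothetical file and column names) of linking SINASC births to
# SIM deaths and building the time-to-event variables used in survival analysis.
import pandas as pd

sinasc = pd.read_csv("sinasc_2017.csv", parse_dates=["birth_date"])   # live births
sim = pd.read_csv("sim_2017.csv", parse_dates=["death_date"])         # infant deaths

# Left join keeps every birth; births without a matching death record stay censored.
births = sinasc.merge(sim[["dn_code", "death_date"]], on="dn_code", how="left")

# Event indicator: 1 if the infant died within the first year of life.
age_at_death = (births["death_date"] - births["birth_date"]).dt.days
births["event"] = (births["death_date"].notna() & (age_at_death <= 365)).astype(int)

# Time to event: days until death for events, censoring at 365 days for survivors.
births["time"] = age_at_death.where(births["event"] == 1, 365)
```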
Since a substantial part of our original data is categorical, we perform
several variable transformations to prepare the original data (table 1) for the
statistical estimations. We also remove null observations and observations
with infinite numeric values on the dataset due to measurement error. The
transformed variables used in the models are shown in table 2 which de-
scribes the set of numeric variables and their summary statistics for live and
dead infants and in table 3 which describes the distribution of births and
deaths by each dummy variable.
Source: Prepared by the authors using Unified Health System (SUS) data.
Note: The table describes the mean and standard deviation (in parenthe-
ses) for the numeric variables used in the models. Columns divide the
sample between infants who survived and did not survive the first year of
life.
The dummy variables include C-Section (1 for c-section and zero for vaginal birth), Low APGAR5 (1 for low APGAR5 score and zero otherwise), Low Birth Weight (1 for low birth weight and zero otherwise), Mother Marital Status (1 for married and zero otherwise), Premature (1 for premature birth and zero otherwise), Mother Race (1 for white mother and zero otherwise), Schooling (1 if the mother went to college and zero if not), and Baby Gender (1 for male infant and zero for female).
After the data transformations, our sample has 2.64 million births and
20310 deaths. The proportion of dead infants is less than 1% of the total
births. There are differences in the proportions between live and dead infants
that shed light on the risk factors for infant mortality. For instance, 14.91% of newborns with genetic anomalies, 28.5% of those with a low APGAR5 score, and 6.39% of low-weight babies die during the first year of life. There are
also differences in the means of numeric variables that indicate risk factors.
The mean of prenatal visits is 8.03 for newborns who survived the first year
and 5.91 for newborns who did not. Finally, the mean of gestational weeks
is seven weeks lower for dead infants.
2 The APGAR score is a test performed on newborns shortly after birth that assesses their general state and vitality. The assessment is done in the first minute after birth and is repeated 5 minutes after delivery, taking into account baby characteristics such as heartbeat, color, breathing, and natural reflexes (Casey et al., 2001).
Table 3: The distribution of births and deaths by features

Variable                              Alive N (%)          Dead N (%)        Total
Genetic Anomaly
  Genetic Anomaly                     19828 (85.09)        3474 (14.91)      23302
  Non-Genetic Anomaly                 2626604 (99.36)      16836 (0.64)      2643440
Birth Place
  Hospital Birth                      2623928 (99.24)      20090 (0.76)      2644018
  Non-Hospital Birth                  22504 (99.03)        220 (0.97)        22724
Border
  Border Municipality Birth           159978 (99.12)       1420 (0.88)       161398
  Otherwise                           2486454 (99.25)      18890 (0.75)      2505344
Capital
  State Capital Municipality Birth    636116 (99.24)       4890 (0.76)       641006
  Otherwise                           2010316 (99.24)      15420 (0.76)      2025736
Birth Type
  C-Section                           1504026 (99.29)      10737 (0.71)      1514763
  Vaginal Birth                       1142406 (99.17)      9573 (0.83)       1151979
APGAR5
  Low APGAR5                          19070 (71.50)        7600 (28.50)      26670
  Normal APGAR5                       2627362 (99.53)      12710 (0.48)      2640072
Birth Weight
  Low Weight                          205584 (93.61)       14024 (6.39)      219608
  Normal Weight                       2440848 (99.74)      6286 (0.26)       2447134
Mother Marital Status
  Married                             896252 (99.36)       5781 (0.64)       902033
  Non-Married                         1750180 (99.18)      14529 (0.82)      1764709
Premature
  Non-premature Birth                 2524928 (99.69)      7766 (0.31)       2532694
  Premature Birth                     121504 (90.64)       12544 (9.36)      134048
Mother Race
  White Mother                        1677205 (99.20)      13594 (0.80)      1690799
  Otherwise                           969227 (99.31)       6716 (0.69)       975943
Schooling
  Mother went to College              549756 (99.39)       3368 (0.61)       553124
  Did not go to College               2096676 (99.20)      16942 (0.80)      2113618
Baby Gender
  Female                              1355840 (99.19)      11117 (0.81)      1366957
  Male                                1290592 (99.29)      9193 (0.71)       1299785
Observations                          2646432 (99.23)      20310 (0.72)      2666742
Source: Prepared by the authors using Unified Health System (SUS) data.
Notes: The table describes the distribution of births and deaths across features for infants who did and did not survive the first year of life.
3 Infant Mortality in Brazil
Since 1988 with the universalization of healthcare through the Unified
Health System (SUS), Brazil has seen an effort to expand health services
to its population (Paim et al., 2011). The nineties were a period of severe
economic stress for Brazil, and the main public policy focus was the end of
hyperinflation and macroeconomic stabilization. It was not until the 2000s
that the economic resources to support large-scale social policies became
more available. Indeed, an important driver of infant mortality reduction in
the 21st century was the combination of family health expansion policies
together with conditional cash transfer programs (Guanais, 2015; Russo
et al., 2019).
In Brazil, the improvement in social indicators after the 1988 Constitu-
tion can be seen in many dimensions but particularly in healthcare (Viellas
et al., 2014). To get a better perspective on these improvements, figure 1
shows trends in global infant mortality rates using World Bank’s infant mor-
tality indicator - specifically, the mortality rate per 1000 live births (Bank,
2021). Following the worldwide pattern of infant mortality reduction, Brazil
has reduced its mortality from 47.1 deaths per 1000 births in 1990 to 13.5 in
2015, substantially reducing its gap from the European and North American
averages.
From the previous discussion, we have seen that Brazil has had substan-
tial progress in reducing infant mortality. Still, there is substantial inequality
between states, as can be seen in figure 2, which uses regional infant mortality data adapted from Szwarcwald et al. (2020). The worst state, Amapa (located in the northern region), has a mortality rate of 20.8 per 1000 live births. The best state, Santa Catarina (southern region), has a rate of 9.9. Overall, richer states in the south and southeast have statistics similar to developed countries. In contrast, poorer states in the north and northeast regions are still far from these objectives.
Figure 1: Worldwide Trends in Infant Mortality
Finally, using data from the live birth (SINASC) and mortality (SIM) systems, figure 3 shows the mortality rate (below one year) per 1000 births in all Brazilian municipalities. We can see that there is still a lot of variation across municipalities, even within states. The Brazilian unified health system (SUS) has a decentralized institutional framework where municipali-
ties share the responsibility and expenses of providing health services with
the federal government. In this way, municipalities with better socioeco-
nomic conditions - such as income, educational attainment, and piped water
provision - in general, do better in terms of health outcomes (Bugelli et al.,
2021; Gamper-Rabindran et al., 2010).
Figure 2: Infant Mortality in Brazilian States
4 Method
4.1 Conceptual Framework
Infant mortality is a complex problem that is better comprehended using
a multifaceted framework. That is the critical point of Mosley and Chen's (1984) seminal work on infant mortality in developing countries. It defined an analytical framework to integrate the social science and medical science
Figure 3: Infant Mortality in Brazilian Municipalities
Source: Prepared by the authors using Unified Health System (SUS) data.
Notes: The figure represents the map of infant mortality in different municipalities of Brazil. In the color scale, red represents a high
infant mortality and purple represents a low infant mortality for the year of 2019.
Figure 4: Mosley and Chen Theoretical Framework
understand the research problem at hand - infant mortality - through the
two main approaches of statistical modelling.3
4.2.2 Machine Learning
Statistical learning or machine learning refers to a set of prediction tools
designed to understand the available data (Friedman et al., 2001).4 The
purpose of machine learning algorithms is to discover the relationship between the variables of a system - its inputs and outputs - from sampled data (Cherkassky and Mulier, 2007).5
There are usually two types of problems in this literature. The first is
supervised learning, where for every training data (or pattern) available,
there is a known correct answer. In this case, we say the data is labeled. The
second is unsupervised learning, where there is no desired output associated
with each pattern, so the data is unlabeled. In this scenario, we want the
model to capture, represent or express properties existing in the dataset. All
the models used in this research are supervised statistical learning models.
The first step in building good prediction functions to discover the un-
derlying data patterns is to create a criterion to measure the performance of
a given prediction function. This is typically done through the mean square
error in a regression context. When we measure the performance of an
estimator based on its quadratic error, creating a good prediction function is
equivalent to finding a good estimator for the regression function (Friedman
et al., 2001). Indeed, estimating the regression function is, in this sense, the
best way to create a function to predict new observations based on observed
covariates - i.e., to learn patterns from data.
The purpose of regression methods from a predictive perspective is therefore to provide, in different contexts, good estimators of the regression function, that is, low-error estimators. Hence, we want to choose a function within a class of candidates that has good predictive
4 Following Friedman, Hastie and Tibshirani (2001), let us assume that we observe Y and X and that there is some relationship Y = f(X) + ε, where f is some fixed but unknown function of the variables X1, X2, ..., Xn, and ε is an error term that is independent of X and has zero mean. Thus, f represents the systematic information that the set of characteristics X transmits about the result Y. Statistical learning (machine learning) is then the set of different approaches and tools used to estimate f.
5 An analogy to machine learning is a doctor progressing in residency: learning rules
from the data. Starting with observations at the patient level, the algorithms analyze a large
number of variables, looking for combinations that reliably predict outcomes (Obermeyer
and Emanuel, 2016).
power (low quadratic error).6 Choosing functions with minimum error can
induce a methodological error: learning the parameters of a prediction func-
tion and testing them with the same data. This model would have a solid
predictive performance in the particular sample where it was trained, but it
would generalize poorly with unseen data.7
A standard method in statistical learning to obtain a model with a better
capacity for generalization consists of dividing the dataset into different
subsamples or sets. The training set will be used to adjust the model’s
parameters, and the validation set will be used to monitor the model’s
generalizability, which is finally assessed on the test set. An important
assumption is that observing model performance against validation data
indicates how it will behave when exposed to samples not seen in training.
In other words, the validation performance is interpreted as an estimate
of the generalization error. Thus, minimizing the error with the validation
set is expected to increase the generalizability. Therefore, it is expected
that the configuration of the model that leads to the smallest error with
the validation set, which is not used for parameter adjustment, has the best
possible performance with new samples (Guyon et al., 1997).
A more elaborate technique for splitting the data is called k-fold cross
validation (Refaeilzadeh et al., 2009). It consists of dividing the set of samples available for training into k folds and carrying out k training sessions, each using k-1 folds to adjust the parameters and one fold for validation. Every available sample will appear k-1 times in the training set and one time in the validation set. The k training sets will have a different composition, as will the k validation sets. The model's performance is then taken as the average of the performances on the k validation folds.
6 The task of training the model is to find the θ parameters that best fit the training data xi and results yi. To train the model, we need to define an objective function that measures how well the model fits the training data. A characteristic of objective functions is that they consist of two parts, the training loss and the regularization term: obj(θ) = L(θ) + Ω(θ).
7 The opposite problem is a model that cannot approximate the actual mapping, not even in the data used in training. This can occur because the degree of flexibility of the model is insufficient given the complexity of the mapping to be approximated, or because of convergence problems in the training process (Jabbar and Khan, 2015).
Since cross-validation gives us a way to infer the quality of generalization of a
model, this technique is used to choose values for a model’s hyperparameters
- settings or configurations of the model that cannot be estimated from data
(Probst et al., 2019).
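A minimal sketch of this procedure, assuming the preprocessed feature matrix X and the event/time arrays are already available, is shown below; it uses k-fold cross-validation to pick the ridge penalty of a regularized Cox model from scikit-survival, with the mean validation concordance index as the selection criterion.

```python
# A sketch of k-fold cross-validation used to choose a hyperparameter: here the
# ridge penalty `alpha` of a regularized Cox model. X, events and times are
# assumed to be the preprocessed (balanced) training arrays.
import numpy as np
from sklearn.model_selection import KFold
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.util import Surv

def cv_cindex(X, events, times, alpha, k=5):
    """Mean concordance index over k validation folds for a given alpha."""
    y = Surv.from_arrays(event=events.astype(bool), time=times)
    scores = []
    for train_idx, val_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        model = CoxPHSurvivalAnalysis(alpha=alpha)   # alpha = regularization strength
        model.fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[val_idx], y[val_idx]))  # .score is the c-index
    return np.mean(scores)

# The alpha with the highest mean validation c-index is refit and then
# evaluated once on the untouched test set.
best_alpha = max([0.01, 0.1, 0.232, 1.0], key=lambda a: cv_cindex(X, events, times, a))
```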
The overall description of the machine learning models above is adapted
to different supervised learning algorithms to deal with different empirical
contexts. In survival analysis, random survival forests, survival support
vector machines, and extreme gradient boosting algorithms are standard
choices with desirable statistical properties.
survival times.
Decision trees configure methods that use a tree-based graphical repre-
sentation, whose objective is to identify groups of individuals with character-
istics of common interest. For this purpose, a recursive method divides the
initial sample into subsamples based on observed results of the explanatory
variables and their interactions. The tree induction process is started through
a sample called a root node divided into subsamples, called child nodes or
intermediate nodes. These subsamples, when subdivided, are called parent
nodes, as they generate child nodes. When a subsample can no longer be
subdivided according to some stopping criteria, it is called an end node or
leaf node. This process is called recursive because each subsample generates
new subsamples (Song and Ying, 2015).
The Random Forest method combines the idea of bagging and the random
selection of explanatory variables in the tree induction process. In this
case, a data set of N samples is sampled (with replacement), generating M
”new” sets, which, in turn, are used to build M trees. The responses from
these trees are combined to generate the ensemble's output. The random selection of variables is a drawing made at each tree node: a subset of candidate variables is randomly chosen to split that node. Using this technique, different sets of variables
may appear at different levels in each tree. With this, the technique becomes
more sensitive to interactions between variables, in addition to resulting
in decorrelated trees, due to the random drawing of candidate variables to
divide the node made in each partition (Breiman, 2001b).
Random Survival Forests are a particular kind of ensemble tree suited
to analyzing survival data. The rules for splitting the trees in the model
use censoring and survival information (Ishwaran et al., 2008). To create
the samples and subsamples - the parent and child nodes - in growing the
decision tree, the random survival forest algorithm allows different splitting
criteria. The most intuitive and used one is the log-rank criteria: where the
splits follow the difference in survival times between groups, as measured
by the log-rank test (Bou-Hamad et al., 2011). Interactions between variables and non-linearities do not come automatically out of parametric survival models. In contrast, random survival forests deal naturally with these challenges because of their decision tree structure, which is a key advantage (Ishwaran et al., 2008).
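A minimal sketch of fitting a Random Survival Forest with scikit-survival (which uses log-rank splitting by default) is shown below; X_train, X_test and the structured survival labels y_train, y_test are assumed to come from the preprocessing described in the Data section, and only the tree depth reported later in the Results section is taken from the paper.

```python
# A sketch of a Random Survival Forest with log-rank splitting; X_train/X_test
# and the structured labels y_train/y_test are assumed from the preprocessing step.
from sksurv.ensemble import RandomSurvivalForest

rsf = RandomSurvivalForest(
    n_estimators=100,   # number of trees in the ensemble
    max_depth=6,        # maximum depth reported for the RSF in the Results section
    n_jobs=-1,
    random_state=0,
)
rsf.fit(X_train, y_train)

# .score returns the concordance index; predict_survival_function returns the
# estimated survival curve for each infant.
print("test c-index:", rsf.score(X_test, y_test))
survival_curves = rsf.predict_survival_function(X_test[:5])
```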
Gradient boosting is an ensemble technique that sequentially combines weak predictors into a strong prediction model. Weak predictors - or weak learners - are models that, when used
individually, have an accuracy that is marginally better than a random guess
(Chen and Guestrin, 2016).
It is important to highlight that the random forest and gradient boosting
models are similar. Both are based on weak predictors to make the final prediction; however, the random forest model uses an average of predictions from weak learners, whereas the gradient boosting model uses the boosting
method. In the Boosting technique, each weak classifier is trained with a
set of data, sequentially and adaptively, where a base model depends on the
previous ones, and in the end, they are combined in a deterministic way. It
builds the model in stages and generalizes them, allowing the optimization
of an arbitrary differentiable loss function (Schapire, 1999).
The choice of the loss function is particularly relevant for survival analy-
sis because it can impact our assumptions regarding the hazard distribution.
One can choose the partial likelihood loss function based on the Cox regres-
sion model class or the loss function weighted by the logarithm of survival
time of the accelerated failure time model class. The former is based on the
proportional hazard assumption, whereas the latter allows for time-varying
hazards (Wei, 1992).
More precisely, the accelerated failure time model takes the following form in the Gradient Boosting context: ln Y = T(x) + σZ, where T(x) is the output of the boosted tree ensemble for the covariates x, Z is a random error variable with a known distribution, and σ is a scale parameter (Barnwal et al., 2020).
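A minimal sketch of this accelerated failure time objective in XGBoost follows; the interval-censored label interface is the one documented for the survival:aft objective (Barnwal et al., 2020), while the data arrays and the normal error distribution are illustrative assumptions. The learning rate, depth, and number of rounds follow the hyperparameters reported in the Results section.

```python
# A sketch of gradient boosting with the accelerated failure time loss in
# XGBoost; arrays and the normal error distribution are illustrative, the
# learning rate, depth and number of rounds follow the Results section.
import numpy as np
import xgboost as xgb

dtrain = xgb.DMatrix(X_train)
# AFT labels are intervals: observed deaths have equal bounds; right-censored
# infants (survivors) have an infinite upper bound.
dtrain.set_float_info("label_lower_bound", times_train)
dtrain.set_float_info("label_upper_bound",
                      np.where(events_train == 1, times_train, np.inf))

params = {
    "objective": "survival:aft",
    "eval_metric": "aft-nloglik",
    "aft_loss_distribution": "normal",      # Z ~ N(0, 1)
    "aft_loss_distribution_scale": 1.0,     # sigma
    "learning_rate": 0.5,
    "max_depth": 1,
}
bst = xgb.train(params, dtrain, num_boost_round=50)
```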
4.2.7 Model Evaluation and Interpretation
It is typical to use the concordance index (C-index) - a measure of the correlation between the model predictions and the data - to assess the results of the machine learning models (Heagerty and Zheng, 2005). The C-index has a close relationship with the AUC: the area under the receiver operating characteristic (ROC) curve.9 It is particularly suited to survival analysis because it handles censoring in the data.
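A minimal sketch of computing the concordance index with scikit-survival is shown below; `model`, `X_test`, `events_test`, and `times_test` are assumed to come from the earlier steps.

```python
# A sketch of evaluating predictions with the censoring-aware concordance index.
from sksurv.metrics import concordance_index_censored

risk_scores = model.predict(X_test)  # higher risk should mean shorter survival
cindex, concordant, discordant, tied_risk, tied_time = concordance_index_censored(
    events_test.astype(bool), times_test, risk_scores
)
print(f"concordance index: {cindex:.3f}")
```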
Understanding the predictions of a particular method is a crucial part of
choosing modeling strategies. Decision tree models such as Random forests
and XGboost allow the implementation of feature (variable) importance
methods. In brief, feature importance can be measured by how much an
accuracy metric changes when a feature (variable) is not used. In survival
analysis settings, it is typical to rank the variables according to their impact
on the C-index as a measure of their importance in the model prediction.
However, most tree-based algorithms only provide the global aggregate
importance of a particular variable but do not provide the direction of the
impact - positive or negative (Rogers and Gunn, 2005).
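One common way to obtain such a ranking is permutation importance: the drop in the model's concordance index when each feature is randomly shuffled. A minimal sketch with scikit-learn follows, assuming the fitted random survival forest from above and a list of feature names from the preprocessing step; it is not necessarily the authors' exact procedure.

```python
# A sketch of permutation feature importance: the decrease in the concordance
# index when each feature is shuffled; `rsf`, the test arrays and
# `feature_names` are assumed from the earlier steps.
from sklearn.inspection import permutation_importance

result = permutation_importance(rsf, X_test, y_test, n_repeats=5, random_state=0)
ranking = sorted(zip(feature_names, result.importances_mean), key=lambda t: -t[1])
for name, drop in ranking:
    print(f"{name}: mean c-index drop = {drop:.4f}")
```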
A method to interpret the predictions of machine learning models is to
use Shapley values. They are originally a concept of game theory. In coalition
game theory, a group of players comes together to create some value. For
instance, one can think of a group of people coming together to form a
company to generate profit. The Shapley value is a method of distributing
the profit fairly among players based on their contributions. More generally,
the Shapley value is the average marginal contribution of a characteristic
value across all possible coalitions (Shapley et al., 1988).
Shapley values have been adapted to interpretable machine learning in the SHAP framework: a unified approach to interpreting model predictions. The objective of the SHAP framework is to explain the prediction of any instance of the machine learning model by computing the contribution of each feature to that prediction (Lundberg and Lee, 2017).
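A minimal sketch of the SHAP framework applied to a fitted gradient boosting model follows; the booster `bst` is the one from the XGBoost sketch above, and a pandas DataFrame `X_test` with named columns is assumed so the summary plot shows variable names.

```python
# A sketch of computing SHAP values for a fitted XGBoost model; `bst` and a
# DataFrame `X_test` with named columns are assumed from the earlier sketches.
import shap

explainer = shap.TreeExplainer(bst)          # tree-specific (exact) SHAP algorithm
shap_values = explainer.shap_values(X_test)  # one contribution per feature per infant

# Global summary: features ordered by their mean absolute SHAP value.
shap.summary_plot(shap_values, X_test)
```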
In our data, infant deaths represent a very small proportion of the observations. That is, the dataset is heavily unbalanced.
Unbalanced data can be defined by the small incidence of a category within
a dataset (minority class) compared to the other (majority class). In most
cases, this means that we have much information about the most incident
category and less about the minority one (Branco et al., 2016).
Unbalanced data can cause problems in machine learning models and
their predictions. Traditional ML algorithms will favor the unbalanced class
heavily because of their objective functions (Cieslak and Chawla, 2008). For
instance, when 99% of births do not result in deaths, the most straightfor-
ward prediction is to infer that every newborn will survive. That will result
in a 99% accuracy metric. However, the model will be useless for identifying newborns that have a lower likelihood of survival.
One way to remove the bias caused by the difference in the proportion of
the categories is to alter the amount of data that the machine learning models
effectively use. A typical method is undersampling, reducing the number
of observations of the majority class to reduce the difference between the
categories. The result is a dataset that has a similar number of observations
between the classes (Drummond et al., 2003).
There are different undersampling strategies fit for different purposes
and challenges. The simplest strategy is random undersampling, which
consists of randomly removing data from the majority class - the drawback
is the inevitable loss of information. However, the method can be efficient
in different contexts (Estabrooks et al., 2004; Chawla et al., 2004). Another common method is to use distance criteria to evaluate which observations of the majority class should be kept in the training set. For instance, the nearest neighbor algorithm is used to define a relative distance between observations, and then the data is undersampled to preserve the information structure revealed by the algorithm (Mani and Zhang, 2003).
Figure 5: Sampling Strategy
An important precaution is to split the data between train and test sam-
ples before applying an undersampling strategy. If the undersampling
algorithm is applied to the test set as well, we will have a data leakage or
train-test communication problem. That is, information from the test set will
leak to the training set, which will probably overestimate the model’s pre-
dictive performance. For the results to be robust, a machine learning model
cannot be evaluated in the same sample that it is trained (Chiavegatto Filho
et al., 2021).
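A minimal sketch of this precaution is shown below: the data are split first, and random undersampling is applied only to the training portion; variable names are illustrative.

```python
# A sketch of splitting before undersampling so that no information from the
# test set leaks into training; variable names are illustrative.
import numpy as np
from sklearn.model_selection import train_test_split

X_train, X_test, ev_train, ev_test, t_train, t_test = train_test_split(
    X, events, times, test_size=0.3, stratify=events, random_state=0
)

# Random undersampling of the TRAINING set only: keep every death and an
# equally sized random subset of survivors.
rng = np.random.default_rng(0)
dead_idx = np.where(ev_train == 1)[0]
alive_idx = rng.choice(np.where(ev_train == 0)[0], size=len(dead_idx), replace=False)
keep = np.concatenate([dead_idx, alive_idx])

X_bal, ev_bal, t_bal = X_train[keep], ev_train[keep], t_train[keep]
# The test set (X_test, ev_test, t_test) stays untouched for evaluation.
```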
To summarize our procedure: after the sample splitting, we first balance the training set using a random undersampling algorithm. We also tested the distance criteria method, but the results were similar, so we chose random undersampling for computational efficiency. The models are estimated and
optimized using k-fold cross-validation. Afterward, the model performance
is measured in the test set. Figure 5 describes the sampling strategy begin-
ning with the original microdata coming from the Brazilian Unique Health
System (SUS).
5 Results
5.1 Cox Regression
Table 4 shows the results for the Cox regression model. The regression table
is structured to highlight the distal, intermediate, and proximal factors. In
the distal factors, schooling has a statistically significant negative impact (-0.135) on the hazard (increasing the probability of survival), whereas living in a border municipality positively impacts the hazard (reducing the probability of
survival). Being married (marital status) has a negative impact (-0.073) but
less statistical significance.
In the intermediate factors, parity (0.075) and number of dead children
(0.045) have a statistically positive impact on the hazard, whereas having a
c-section (-0.258), having induced labor (-0.273) or assisted labor (-0.197) ,
and having more prenatal visits (-0.013) reduce the hazard. In the proximal
factors, low APGAR score (2.162), low weight (2.521), having a genetic
anomaly (2.068), and being a male infant (0.228) have a statistically significant positive impact on the hazard.
Table 4: Cox Regression Results

Variable                          Coefficient (se)        Pr(>|z|)
Distal Factors
  Mother Age                      -0.001 (0.002)          0.636
  Father Age                       0.008 (0.002)          0.396
  Capital Residency               -0.027 (0.037)          0.450
  Schooling                       -0.135*** (0.035)       0.0001
  Marital Status                  -0.073* (0.029)         0.013
  Border Residency                 0.234*** (0.044)       1.30e-07
Intermediate Factors
  C Section                       -0.258*** (0.031)       <2e-16
  Prenatal Visits                 -0.013*** (0.002)       5.45e-07
  Parity                           0.075* (0.031)         0.015
  Birthplace Dummy                -0.214 (0.175)          0.222
  Dead Children                    0.045** (0.017)        0.009
  Induced Labor                   -0.273*** (0.046)       4.87e-09
  Assisted Labor                  -0.197* (0.082)         0.017
Proximal Factors
  Low APGAR1                       2.162*** (0.029)       <2e-16
  Low Weight                       2.521*** (0.031)       <2e-16
  Anomaly                          2.068*** (0.129)       <2e-16
  Fetus Presentation               24.47584               3
  Race Dummy                       0.054 (0.027)          0.051
  Sex Dummy                        0.228*** (0.27)        <2e-16

Number of Observations             2666742
Number of events                   20310
Concordance Index                  0.896 (se = 0.003)
Likelihood ratio test on 25 df     21338                  <2e-16
Wald test on 25 df                 28077                  <2e-16
Score (logrank) test on 25 df      67422                  <2e-16

Source: Prepared by the authors.
Notes: The table shows the results for the Cox regression model. Standard errors are in parentheses. ∗ p<0.1; ∗∗ p<0.05; ∗∗∗ p<0.01
Table 5 shows the results for the proportional hazards assumption using
the Schoenfeld residuals test. Mother age, father age, the schooling dummy,
marital status dummy, and living in border dummy are not statistically
significant, whereas living in the capital is. The birthplace dummy, number
of dead children, and the assisted labor dummy are not statistically sig-
nificant, whereas having a c-section, the number of prenatal visits, parity,
and having induced labor are. Finally, having a low Apgar score, low weight, a genetic anomaly, and the type of fetus presentation are statistically significant.
Overall, distal factors satisfy the proportional hazards assumption, whereas
intermediate and proximal factors do not.
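The Cox output and the Schoenfeld-residual test reported above resemble R's coxph and cox.zph results; a roughly equivalent check can be sketched in Python with lifelines, under the assumption that a dataframe `df` with the covariates plus 'time' and 'event' columns is available.

```python
# A sketch of fitting a Cox model and testing the proportional hazards
# assumption via scaled Schoenfeld residuals with lifelines; `df` is assumed to
# hold the covariates plus 'time' and 'event' columns.
from lifelines import CoxPHFitter
from lifelines.statistics import proportional_hazard_test

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")

# Null hypothesis: the hazard ratios are proportional (constant over time).
results = proportional_hazard_test(cph, df, time_transform="rank")
results.print_summary()
```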
Table 5: Proportional Hazards Assumption Test (Schoenfeld Test)

Variable                   Chi Squared    Degrees of Freedom    p-value
Distal Factors
  Mother Age               0.550          1                     0.458
  Father Age               0.005          1                     0.939
  Capital Residency        13.544***      1                     <0.001
  Schooling                0.286          1                     0.592
  Marital Status           0.738          1                     0.390
  Border Residency         0.011          1                     0.915
Intermediate Factors
  C Section                5.621**        1                     0.017
  Prenatal Visits          5.336**        1                     0.020
  Parity                   14.271***      1                     <0.001
  Birth Place Dummy        0.011          1                     0.914
  Number of dead children  0.033          1                     0.855
  Induced Labor            3.426*         1                     0.064
  Assisted Labor           3.496          4                     0.478
Proximal Factors
  Low APGAR1               316.379***     1                     <0.001
  Low Weight               66.618***      1                     <0.001
  Genetic Anomaly          36.031***      2                     <0.001
  Fetus Presentation       24.475***      3                     <0.001
  Race Dummy               0.027          1                     0.869
  Sex Dummy                2.172***       1                     <0.001
GLOBAL                     476.448***     25                    <0.001

Source: Prepared by the authors using Unified Health System (SUS) data.
Note: The table describes the results of the Schoenfeld test for the proportional hazards assumption. The null hypothesis is that the hazard ratios are proportional. ∗ p<0.1; ∗∗ p<0.05; ∗∗∗ p<0.01
5.2 Machine Learning Models
Figure 6 describes the machine learning model results. It shows the aver-
age predictive performance of the different ML models in the test set as
measured by the concordance index score. The Cox proportional hazards
model (0.837) is a benchmark to assess the other models. The model with
the best predictive performance is the SVM Survival (0.843), followed by
the Gradient Boosting (0.839). The Random Survival Forest (0.815) has the worst predictive performance. Only the SVM Survival and the Gradient
Boosting have a higher predictive performance than the Cox model.
Source: Prepared by the authors using Unified Health System (SUS) data.
Notes: The figure shows the average c-index for the machine learning models using the k-fold cross-validation method. The Random Survival Forest model has a mean c-index of 0.815 (standard deviation: 0.003). The SVM Survival model has a mean c-index of 0.843 (standard deviation: 0.002). The Cox Proportional Hazards model has a mean c-index of 0.837 (standard deviation: 0.002). The Extreme Gradient Boosting model has a mean c-index of 0.839 (standard deviation: 0.002).
A higher α value increases the model bias but reduces its variance.11 The chosen kernel transformation was the radial basis function (RBF).12 Although, in general, the Cox proportional hazards model is not considered a machine learning algorithm, there are ways to estimate it so that it can be used as a benchmark for the other models. So instead of using the full sample to estimate the model - as in the statistical inference method - the sample is divided between train and test sets and a regularization parameter is set to control the model complexity. Therefore, the only hyper-parameter in the Cox proportional hazards model is the alpha parameter (0.232), which controls the model's regularization - analogous to the SVM alpha.
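A minimal sketch of the kernel survival SVM and of the regularized Cox benchmark with scikit-survival (Pölsterl et al., 2015) is shown below. Only the Cox alpha (0.232) is taken from the text; the SVM alpha value and the data arrays are illustrative assumptions.

```python
# A sketch of the kernel survival SVM and the regularized Cox benchmark with
# scikit-survival; only the Cox alpha is taken from the text, the SVM alpha
# and the data arrays are illustrative.
from sksurv.svm import FastKernelSurvivalSVM
from sksurv.linear_model import CoxPHSurvivalAnalysis

ssvm = FastKernelSurvivalSVM(kernel="rbf", alpha=1.0, random_state=0)
ssvm.fit(X_train, y_train)

cox = CoxPHSurvivalAnalysis(alpha=0.232)  # ridge penalty controlling complexity
cox.fit(X_train, y_train)

print("SSVM c-index:", ssvm.score(X_test, y_test))
print("Cox  c-index:", cox.score(X_test, y_test))
```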
11 A model with a very large bias cannot capture the relationship between the features and the object to be predicted. The model is not learning. On the other hand, if there is a very small bias, the model is so adjusted to the training data that, when used with different data, it makes many mistakes. The model is overfitted. Variance is the sensitivity of a model to being used with datasets other than the training one. If the model is very sensitive to the training data, it identifies the relationship in the training data so well that it will be very inaccurate when faced with different data. Regularization is a method that seeks to penalize the complexity of models, reducing their variance (Tian and Zhang, 2022).
12 Kernel functions are intended to project feature vectors into a high-dimensional feature
space for the classification of problems that lie in non-linearly separable spaces. The model
is then able to classify the output variables into different categories (Musavi et al., 1992).
One might expect deeper trees to always perform better against our tests. However, this does not happen because, at very large
depths, the tree becomes so perfect for the training data that it fails the test
data (overfitting). For the Random Survival Forests (RSF), the maximum
depth is six and the minimum depth is 1, whereas, for the XGBOOST, the
maximum and minimum depth is 1. The total number of trees is 100 for the
RSF and 50 for the XGBOOST. Finally, the XGBOOST model has a learning
rate parameter (0.5) which controls the complexity of the model (Sommer
et al., 2019).
To interpret the model predictions, figure 7 shows the feature (variable)
importance for the Random Survival Forest model. Gestational weeks is
the most relevant predictor, followed by low APGAR5, low weight, genetic
anomaly, mother age, and the number of prenatal visits. Similar to the Cox regression results, the most important variables for the probability of survival are proximate factors. The distal and intermediate factors have less importance for the model's predictions.
Figure 7: Variable Importance
Source: Prepared by the authors using Unified Health System (SUS) data.
Notes: The figure represents the feature importance of the Random Survival Forest model. The variables are ordered along the y axis based on their importance. That is, the higher the variable is on the y axis, the more important it is for the model prediction.
Multicollinearity between features is less of a concern when the model is used for prediction purposes only. It will not impact the model performance
in assessing the likelihood of survival. However, the contribution of each
variable to the prediction is hard to assess.
Also, as we discussed in the empirical strategy section, feature importance
algorithms do not show in which direction each variable impacts the model,
only the relative importance to the prediction. Figure 8 then shows the
SHAP values for all explanatory variables in the XGBOOST model. Low APGAR, too few gestational weeks, low weight, and genetic anomaly are the most important predictors of mortality, and they impact the predicted survival probability negatively. They are proximate factors. Having a c-section and the
number of prenatal care visits - intermediate factors - are important to the
model output. Finally, distal factors such as schooling and marital status
have less impact on the model predictions. This pattern is in harmony with
the results in the Cox regression model.
In a regression setting, when two independent variables are strongly correlated, the estimates of the coefficients of the model parameters can become insignificant, since each coefficient measures, by definition, the variation in Y given a variation in that X. A high correlation will cause both variables to move together, and it will be hard to disentangle the particular effect of each one (Alin, 2010). The difference in a Random Forest's feature importance setting is that there is no statistical significance, and what is entangled is the contribution of each variable to the model predictions.
Figure 8: Model Interpretation: Shapley Values
Source: Prepared by the authors using Unified Health System (SUS) data.
Notes: Summary plot for SHAP values. For each variable, the points correspond to different observations. The SHAP value is the impact of the specific variable (feature) for that specific observation. It corresponds to the relative survival probability across observations: an observation with a higher SHAP value has a higher survival probability than one with a lower SHAP value. The variables are ordered along the y axis
based on their importance, given by the average of their SHAP values. The higher the variable is on the y axis, the more important it is
for the model prediction.
is higher. As gestational weeks increase, the relationship inverts. Babies
with more than 35 weeks have a higher likelihood of survival if they had a
normal birth - having done a c-section decreases the SHAP value. There is a
non-linear interaction between c-section and gestational weeks.
Source: Prepared by the authors using the Unified Health System (SUS) data.
Notes: The figure shows the SHAP feature dependence plot of the XGB model for the interaction between gestational weeks and the low
weight dummy. The plot shows how the two variables affect the probability of survival non-linearly.
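A minimal sketch of producing such a dependence plot with SHAP follows; it reuses the explainer output from the earlier sketch, and the column names 'gestational_weeks' and 'c_section' are illustrative placeholders for the actual variable names.

```python
# A sketch of a SHAP dependence plot for the interaction discussed in the text;
# 'gestational_weeks' and 'c_section' are placeholder column names.
import shap

shap.dependence_plot(
    "gestational_weeks",           # feature on the x axis
    shap_values, X_test,
    interaction_index="c_section"  # color the points by the interacting feature
)
```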
Figure 10 shows the SHAP dependence plot between the mother’s age and
having done a c-section. The graphic also shows a non-linear relationship
between the two variables. Being a teenage mother - between 10 and 20
years - and having done a c-section decreases the probability of survival (the
SHAP value is negative). As age increases between 20 and 39 years, there
is no straightforward relationship between c-sections and infant survival.
For mothers older than 40 years, having done a c-section increases the probability of survival (the SHAP value is positive).
We highlight that this result should be interpreted with care. Teenage
pregnancies tend to have more adverse outcomes for children (Ogawa et al.,
2019). There is, in general, more chance of preterm delivery, low birth
weight, and fetal distress (Baş et al., 2020). A sampling selection effect could
cause the relationship that the model is showing. The teenage mother group
has higher-risk pregnancies and therefore has a higher chance of having c-sections (Yussif et al., 2017). The model then indicates a relation between having a c-section and a decrease in the probability of survival. However,
without properly accounting for pregnancy riskiness in the model, there is
no way to discern if the decrease in the survival rate comes from having a
c-section or an underlying omitted factor.
Source: Prepared by the authors using the Unified Health System (SUS) data.
Notes: The figure shows the SHAP feature dependence plot of the XGB model for the interaction between age and c-section dummy. The
plot shows how the two variables affect the probability of survival non-linearly.
6 Discussion
6.1 Cox Regression
The Cox regression model results have a discernible pattern where intermedi-
ate and proximal variables have a more significant and substantial impact on
mortality than distal factors. Low weight, low APGAR, and genetic anomaly
are the most important drivers of the hazard ratio, whereas schooling and
marital status have a much lesser impact. This result is in accordance with
the theoretical framework of Mosley and Chen (1984) and the subsequent
empirical studies done using it.
A correctly specified model that utilizes the whole set of intermediate and
proximate factors will, in consequence, tend to have distal factors that are not
statistically significant. That is because the proximate determinants capture
all the variance in the model. However, it is unrealistic to assume that the
actual variables coming from health data sets can measure all proximate
aspects of infant mortality correctly. That is why including socioeconomic factors is important, and there is, in general, statistical significance in variables of this group (Hill, 2003).
The Cox regression results are consistent with prior works in the same
literature for Brazil. For instance, low birth weight is an important risk factor for mortality (Risso and Nascimento, 2010; Cardoso et al., 2013; Paixao et al., 2021a). Also, the probability of survival is negatively affected by the mother having fewer prenatal visits and less schooling (Pinheiro et al., 2010; Garcia et al., 2019).
(Moncada-Torres et al., 2021; Chmiel et al., 2021). On the other hand, the good performance of the Cox model in the presence of non-proportional hazards can be interpreted as a signal of its strength, and its substitution for more complex and computationally intensive models should be done with care, particularly when there is a loss of interpretability when using 'black-box' machine learning models.
ularly in less developed regions. Therefore, reducing infant mortality is still
a major challenge for Brazilian policymakers and society as a whole.
This paper contributes by using survival analysis with machine learning
models that are efficient in predicting infants at risk of death, as well as the
risk factors associated with mortality. Specifically, Random Survival Forests,
Survival Support Vector Machines, and Extreme Gradient Boosting models
can achieve a concordance index higher than 0.8 in the task of predicting
mortality in the first year of life. Furthermore, using the SHAP framework,
we provide evidence that variables such as gestational weeks, low weight,
and having a cesarean section interact non-linearly in affecting mortality. To
our knowledge, this is the first research using survival analysis and machine
learning for infant mortality in Brazil.
These findings have policy implications for Brazil since identifying new-
borns that have a high risk of death at the moment of their birth can be
a valuable input in health policy. Naturally, this requires an accurate prediction of survival probabilities, a task at which machine learning models are efficient. Model predictions - in particular interpretable machine learning
models - can be incorporated into a policy framework that can help miti-
gate infant mortality by being proactive in assessing risks. Finally, future
researchers should integrate machine learning strategies with causal analysis
frameworks to create more transparent and robust models that can tackle
health problems more efficiently.
8 Appendix
Source: Prepared by the authors using the Unified Health System (SUS) data.
Notes: The figure shows the SHAP interaction values. The main effect of each variable in the model result is shown in the main diagonal.
The interaction effects between variables are shown by the intersection of each pair of variables outside the main diagonal.
References
Abate, M. G., Angaw, D. A., and Shaweno, T. (2020). Proximate determinants
of infant mortality in ethiopia, 2016 ethiopian demographic and health
surveys: results from a survival analysis. Archives of Public Health, 78(1):1–
10.
Athey, S. (2017). Beyond prediction: Using big data for policy problems.
Science, 355(6324):483–485.
Bank, W. (2021). World Development Report 2021: Data for Better Lives. The
World Bank.
Barnwal, A., Cho, H., and Hocking, T. D. (2020). Survival regression with
accelerated failure time model in xgboost. arXiv preprint arXiv:2006.04920.
Baş, E. K., Bülbül, A., Uslu, S., Baş, V., Elitok, G. K., and Zubarioğlu, U.
(2020). Maternal characteristics and obstetric and neonatal outcomes of
singleton pregnancies among adolescents. Medical science monitor: inter-
national medical journal of experimental and clinical research, 26:e919922–1.
Batista, A. F., Diniz, C. S., Bonilha, E. A., Kawachi, I., and Chiavegatto Filho,
A. D. (2021). Neonatal mortality prediction with routinely collected data:
a machine learning approach. BMC pediatrics, 21(1):1–6.
Beluzo, C. E., Alves, L. C., Silva, E., Bresan, R. C., Arruda, N. M., and
de Carvalho, T. J. (2020). Machine learning to predict neonatal mortality
using public health data from são paulo-brazil. medRxiv.
Branco, P., Torgo, L., and Ribeiro, R. P. (2016). A survey of predictive model-
ing on imbalanced domains. ACM Computing Surveys (CSUR), 49(2):1–50.
Breiman, L. (2001b). Using iterated bagging to debias regressions. Machine
Learning, 45(3):261–277.
Bugelli, A., Da Silva, R. B., Dowbor, L., and Sicotte, C. (2021). Health
capabilities and the determinants of infant mortality in brazil, 2004–2015:
an innovative methodological framework. BMC public health, 21(1):1–17.
Cardoso, R. C. A., Flores, P. V. G., Vieira, C. L., Bloch, K. V., Pinheiro, R. S.,
Fonseca, S. C., and Coeli, C. M. (2013). Infant mortality in a very low birth
weight cohort from a public hospital in rio de janeiro, rj, brazil. Revista
Brasileira de Saúde Materno Infantil, 13(3):237–246.
Casey, B. M., McIntire, D. D., and Leveno, K. J. (2001). The continuing value
of the apgar score for the assessment of newborn infants. New England
Journal of Medicine, 344(7):467–471.
Chawla, N. V., Japkowicz, N., and Kotcz, A. (2004). Special issue on learning
from imbalanced data sets. ACM SIGKDD explorations newsletter, 6(1):1–6.
Chiavegatto Filho, A., Batista, A. F. D. M., and Dos Santos, H. G. (2021). Data
leakage in health outcomes prediction with machine learning. comment
on “prediction of incident hypertension within the next year: Prospective
study using statewide electronic health records and machine learning”.
Journal of Medical Internet Research, 23(2):e10969.
Chmiel, F., Burns, D., Azor, M., Borca, F., Boniface, M., Zlatev, Z., White, N.,
Daniels, T., and Kiuber, M. (2021). Using explainable machine learning
to identify patients at risk of reattendance at discharge from emergency
departments. Scientific reports, 11(1):1–11.
de Souza, S., Duim, E., and Nampo, F. K. (2019). Determinants of neonatal
mortality in the largest international border of brazil: a case-control study.
BMC Public Health, 19(1):1–9.
Drummond, C., Holte, R. C., et al. (2003). C4. 5, class imbalance, and cost
sensitivity: why under-sampling beats over-sampling. In Workshop on
learning from imbalanced datasets II, volume 11, pages 1–8. Citeseer.
Estabrooks, A., Jo, T., and Japkowicz, N. (2004). A multiple resampling
method for learning from imbalanced data sets. Computational intelligence,
20(1):18–36.
Freitas, F., Araujo, A., Melo, M., and Romero, G. (2019). Late-onset sepsis
and mortality among neonates in a brazilian intensive care unit: a cohort
study and survival analysis. Epidemiology & Infection, 147.
Friedman, J., Hastie, T., and Tibshirani, R. (2001). The elements of statistical
learning, volume 1. Springer series in statistics New York.
Gamper-Rabindran, S., Khan, S., and Timmins, C. (2010). The impact of
piped water provision on infant mortality in brazil: A quantile panel data
approach. Journal of Development Economics, 92(2):188–200.
Garcia, L. P., Fernandes, C. M., and Traebert, J. (2019). Risk factors for
neonatal death in the capital city with the lowest infant mortality rate in
brazil. Jornal de pediatria, 95:194–200.
Gilpin, L. H., Bau, D., Yuan, B. Z., Bajwa, A., Specter, M., and Kagal, L.
(2018). Explaining explanations: An overview of interpretability of ma-
chine learning. In 2018 IEEE 5th International Conference on data science
and advanced analytics (DSAA), pages 80–89. IEEE.
Grambsch, P. M. and Therneau, T. M. (1994). Proportional hazards tests and
diagnostics based on weighted residuals. Biometrika, 81(3):515–526.
Grossglauser, M. and Saner, H. (2014). Data-driven healthcare: from patterns
to actions. European journal of preventive cardiology, 21(2 suppl):14–17.
Guanais, F. C. (2015). The combined effects of the expansion of primary
health care and conditional cash transfers on infant mortality in brazil,
1998–2010. American Journal of Public Health, 105(S4):S593–S599.
Guyon, I. et al. (1997). A scaling law for the validation-set training-set size
ratio. AT&T Bell Laboratories, 1(11).
Hackeling, G. (2017). Mastering Machine Learning with scikit-learn. Packt
Publishing Ltd.
Heagerty, P. J. and Zheng, Y. (2005). Survival model predictive accuracy and
roc curves. Biometrics, 61(1):92–105.
Hill, K. (2003). Frameworks for studying the determinants of child survival.
Bulletin of the World Health Organization, 81:138–139.
Hoo, Z. H., Candlish, J., and Teare, D. (2017). What is an roc curve?
Ishwaran, H., Kogalur, U. B., Blackstone, E. H., and Lauer, M. S. (2008).
Random survival forests. The annals of applied statistics, 2(3):841–860.
Jabbar, H. and Khan, R. Z. (2015). Methods to avoid over-fitting and under-
fitting in supervised machine learning (comparative study). Computer
Science, Communication and Instrumentation Devices, pages 163–172.
James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An introduction
to statistical learning, volume 112. Springer.
Kleinberg, J., Ludwig, J., Mullainathan, S., and Obermeyer, Z. (2015). Pre-
diction policy problems. American Economic Review, 105(5):491–95.
Lopes, S. A. V. d. A., Guimarães, I. C. B., Costa, S. F. d. O., Acosta, A. X.,
Sandes, K. A., and Mendes, C. M. C. (2018). Mortality for critical congeni-
tal heart diseases and associated risk factors in newborns. a cohort study.
Arquivos brasileiros de cardiologia, 111:666–673.
Lundberg, S. M. and Lee, S.-I. (2017). A unified approach to interpreting
model predictions. In Proceedings of the 31st international conference on
neural information processing systems, pages 4768–4777.
Mani, I. and Zhang, I. (2003). knn approach to unbalanced data distributions:
a case study involving information extraction. In Proceedings of workshop
on learning from imbalanced datasets, volume 126. ICML United States.
Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., and Galstyan, A. (2021).
A survey on bias and fairness in machine learning. ACM Computing Surveys
(CSUR), 54(6):1–35.
Miao, F., Cai, Y.-P., Zhang, Y.-T., and Li, C.-Y. (2015). Is random survival forest
an alternative to cox proportional model on predicting cardiovascular
disease? In 6TH European conference of the international federation for
medical and biological engineering, pages 740–743. Springer.
Miao, F., Cai, Y.-P., Zhang, Y.-X., Fan, X.-M., and Li, Y. (2018). Predictive
modeling of hospital mortality for patients with heart failure by using an
improved random survival forest. IEEE Access, 6:7244–7253.
Moncada-Torres, A., van Maaren, M. C., Hendriks, M. P., Siesling, S., and
Geleijnse, G. (2021). Explainable machine learning can outperform cox
regression predictions and provide insights in breast cancer survival. Sci-
entific Reports, 11(1):1–13.
Musavi, M. T., Ahmed, W., Chan, K. H., Faris, K. B., and Hummels, D. M.
(1992). On the training of radial basis function classifiers. Neural networks,
5(4):595–603.
Ogawa, K., Matsushima, S., Urayama, K. Y., Kikuchi, N., Nakamura, N., Tani-
gaki, S., Sago, H., Satoh, S., Saito, S., and Morisaki, N. (2019). Association
between adolescent pregnancy and adverse birth outcomes, a multicenter
cross sectional japanese study. Scientific Reports, 9(1):1–8.
Paim, J., Travassos, C., Almeida, C., Bahia, L., and Macinko, J. (2011). The
brazilian health system: history, advances, and challenges. The Lancet,
377(9779):1778–1797.
Paixao, E. S., Blencowe, H., Falcao, I. R., Ohuma, E. O., dos Santos Rocha, A.,
Alves, F. J. O., Maria da Conceição, N. C., Suárez-Idueta, L., Ortelan, N.,
Smeeth, L., et al. (2021a). Risk of mortality for small newborns in brazil,
2011-2018: A national birth cohort study of 17.6 million records from
routine register-based linked data. The Lancet Regional Health-Americas,
3:100045.
Paixao, E. S., Bottomley, C., Pescarini, J. M., Wong, K. L., Cardim, L. L.,
Ribeiro Silva, R. d. C., Brickley, E. B., Rodrigues, L. C., Oliveira Alves,
F. J., Leal, M. d. C., et al. (2021b). Associations between cesarean delivery
and child mortality: A national record linkage longitudinal study of 17.8
million births in brazil. PLoS medicine, 18(10):e1003791.
Pölsterl, S., Navab, N., and Katouzian, A. (2015). Fast training of support
vector machines for survival analysis. In Joint European Conference on
Machine Learning and Knowledge Discovery in Databases, pages 243–259.
Springer.
Ramos, R., Silva, C., Moreira, M. W., Rodrigues, J. J., Oliveira, M., and
Monteiro, O. (2017). Using predictive classifiers to prevent infant mortality
in the brazilian northeast. In 2017 IEEE 19th International Conference on e-
Health Networking, Applications and Services (Healthcom), pages 1–6. IEEE.
Russo, L. X., Scott, A., Sivey, P., and Dias, J. (2019). Primary care physicians
and infant mortality: evidence from brazil. PLoS One, 14(5):e0217614.
Shapley, L. S., Roth, A. E., et al. (1988). The Shapley value: essays in honor of
Lloyd S. Shapley. Cambridge University Press.
Sommer, J., Sarigiannis, D., and Parnell, T. (2019). Learning to tune xgboost
with xgboost. arXiv preprint arXiv:1909.07218.
Song, Y.-Y. and Ying, L. (2015). Decision tree methods: applications for
classification and prediction. Shanghai archives of psychiatry, 27(2):130.
Valter, R., Santiago, S., Ramos, R., Oliveira, M., Andrade, L. O. M., and
de HC Barreto, I. C. (2019). Data mining and risk analysis supporting
decision in brazilian public health systems. In 2019 IEEE International
Conference on E-health Networking, Application & Services (HealthCom),
pages 1–6. IEEE.
Wei, L.-J. (1992). The accelerated failure time model: a useful alternative
to the cox regression model in survival analysis. Statistics in medicine,
11(14-15):1871–1879.
Yussif, A.-S., Lassey, A., Ganyaglo, G. Y.-k., Kantelhardt, E. J., and Kielstein,
H. (2017). The long-term effects of adolescent pregnancies in a community
in northern ghana on subsequent pregnancies and births of the young
mothers. Reproductive health, 14(1):1–7.