Latent Profile Analysis in R: A Tutorial and Comparison to Mplus
Klaas J. Wardenaar
University Medical Center Groningen (UMCG)
[email protected]
April 9, 2021
Version 1.1
Summary
Latent profile analysis (LPA) can be used to identify data-driven classes of individuals
based on scoring patterns across continuous input variables. LPA can be conducted using
commercially available software packages like Mplus, Latent Gold, and SAS, but it is also
possible to use freely available R-packages. This tutorial aims to (1) help applied researchers
conduct an LPA in R and (2) show how results obtained in R compare to those obtained in
Mplus.
1. Background
Latent Profile Analysis (LPA) is a type of latent variable model that can be used to identify
latent classes or mixtures in a dataset, based on a set of continuous input variables (Gibson,
1959; Oberski, 2016). LPA is closely related to the widely used technique of Latent Class
Analysis, which is used to estimate latent classes based on discrete input variables (Nylund-
Gibson & Choi, 2018). In medical and psychological science, LPA can be useful when
considerable between-subject heterogeneity exists in scores on a range of variables and when
this variation cannot be explained by known, manifest variables (e.g., Wolfe 1970; Sterba
2013). Here, LPA can help to identify or approximate possibly meaningful subgroupings of
subjects that may help to better understand sample heterogeneity (Sterba, 2013).
Generally, LPA works under the assumption that sample (residual) variance can be reduced
by assuming a categorical latent variable that effectively subdivides the sample into >=2
subgroups that are more homogeneous in terms of their patterns of variable means and
(co)variances. When an LPA model fits a dataset well, subjects within each class typically
resemble each other closely in terms of their scores on the input variables. Depending on the
model configuration, the identified classes can show different class-specific patterns of means
and class-specific variances and covariances.
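Formally, LPA typically models the data as a finite mixture of multivariate normal distributions. In standard mixture-model notation (not tied to any particular software package), the density of an observed score vector y_i is:

f(y_i) = Σ_k π_k × φ(y_i | μ_k, Σ_k), with π_k > 0 and Σ_k π_k = 1,

where φ is the multivariate normal density, π_k are the class proportions, and μ_k and Σ_k are the class-specific mean vector and covariance matrix.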
Many different LPA model configurations are possible, each with different sets of parameters
that are either freely estimated in each class specifically or constrained to be equal across
classes in the resulting model (e.g., Celeux & Govaert, 1995). Model configurations with
many class-specific parameters can be very flexible and, as such, may fit the data well. A
downside is that these models are more complex (many more parameters to be estimated).
Using criteria such as the Bayesian Information Criterion (BIC) helps to find a model that
strikes a good balance between model fit and model complexity. When doing an LPA, most
applied researchers will be primarily interested in the differences and/or overlap between
the classes’ specific patterns of parameter estimates. These can be used to characterize the
classes and, possibly, provide clues about underlying mechanisms (Sterba 2013).
This tutorial
LPA is less widely used than other latent variable models and, possibly because of this, has
long been available only in specialized software packages such as Mplus. Luckily, ongoing
developments in many different scientific fields (e.g., ecology, econometrics) have yielded a
number of packages that also allow users to conduct LPA in the open-source R-platform.
However, the use of R does require experience and documentation of packages can be rather
limited or technical, making it less easily accessible for applied researchers. Therefore, this
tutorial aims to help applied researchers get going with LPA in R, illustrating the use
of several packages and, for reference, providing a comparison of the results obtained in R
with results obtained with Mplus.
2. Data
All examples in this tutorial will be using a simulated dataset (see Appendix for code). The
simulated data consist of 300 cases, each with responses on 10 continuous variables. The
data are simulated to consist of 3 classes, each with different mean scores across the 10
variables and each with a different variable (co)variance matrix. The following figure shows
the simulated data, with the different colors indicating the different classes.
[Figure: the simulated data, with scores ('value') on var1-var10 ('variable') plotted per case and the colors indicating the three classes.]
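A plot like the one above can be produced along the following lines (a sketch, assuming the simulated data.frame dat1 from the Appendix, with columns var1-var10 and class, and assuming the 'ggplot2' package is installed in addition to 'reshape2'):

library(reshape2)
library(ggplot2)
dat_plot <- dat1
dat_plot$id <- seq_len(nrow(dat_plot))                  # case identifier for line grouping
dat_long <- melt(dat_plot, id.vars = c("id", "class"))  # long format: id, class, variable, value
ggplot(dat_long, aes(x = variable, y = value, group = id, colour = factor(class))) +
  geom_line(alpha = 0.4) +
  labs(colour = "class")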
Note that the data were simulated to have a latent structure that is quite obvious. Even if the
classes’ lines in the plot had the same color, some clustering would still be observable. In real
life research settings, data with such obvious patterns are seldom, if ever, encountered.
3. Packages
3.1 R-packages
LPA models can be fitted in R (version 4.0.3; R Core Team, 2020), running in RStudio (version
1.3.1093 used here). There are many R-packages that offer some form of latent class/mixture
analytic functionality. However, the majority of packages focus on analyses with discrete
indicator variables (e.g., ‘poLCA’, Linzer & Lewis, 2011; ‘e1071::lca’, Meyer et al., 2019)
or require a lot of coding to define the required models (e.g., ‘OpenMx’, Neale et al., 2016).
For the current tutorial, two packages were selected that work with continuous indicator
variables and require limited user coding. The ‘mclust’ package (Scrucca et al., 2016) is the
first package that will be illustrated. This package is a specialist tool and allows for a wide
variety of model configurations to be estimated; as such it offers much more functionality
than most researchers will likely need. Therefore, the current tutorial focuses on a limited
range of relatively simple LPA model configurations that are commonly encountered in the
literature. Conveniently, Rosenberg et al. (2018) recently developed the package ‘tidyLPA’,
which can be used as a relatively easy-to-use front-end for estimating common LPA models
with ‘mclust’, basically streamlining some of the in- and output functionality of ‘mclust’.
Although ‘tidyLPA’ is easy to use, this comes at the expense of restricting the modeling options
to a few oft-used configurations. For completeness, both the ‘mclust’ and ‘tidyLPA’ approach will
be illustrated.
3.2 Mplus
Because it is one of the most widely used commercial software packages for latent variable
modeling, Mplus (version 5; Muthén & Muthén, 1998–2015) is used to fit some of the same LPA
models as are estimated using the two R-packages. The results obtained in Mplus are
compared to the results obtained with R and the extent of overlap and/or differences between
the software packages is evaluated.
4. Model configurations
LPA models can be configured in many different ways (see Scrucca et al., 2016; Banfield &
Raftery, 1993; Celeux & Govaert, 1995; Pastor et al., 2007). Here, four variants of increasing
complexity will be covered. In all models, cluster-specific means are estimated for each of
the k classes: each class has its own associated pattern of mean scores on the indicator
variables (e.g., var1-var10). The different model versions vary in terms of how the class-specific
(co)variance matrices of the indicator variables are constrained or allowed to vary within and
between classes. See the following table for an overview of the four models:
Model variant   Variances vary    Variances vary      Covariances vary    Covariances vary
                within class?     between classes?    within class?       between classes?
EEI             yes               no                  no; fixed to 0      no; fixed to 0
EEE             yes               no                  yes                 no
VVI             yes               yes                 no; fixed to 0      no; fixed to 0
VVV             yes               yes                 yes                 yes
In the first, most parsimonious LPA model variant, the indicator variables are set to have
zero covariances within and across classes. Indicator-variable variances are allowed to vary
within classes but are constrained to be equal between classes. Due to the latter, only one
set of variances needs to be estimated, resulting in a parsimonious model. In ‘mclust’, which
uses Gaussian Mixture Modeling vernacular, this variant is also referred to as the EEI (equal
volume, equal shape [and undefined orientation]) model (Scrucca et al., 2016).
The second model resembles the first model, but here the complete variable (co)variance
matrix is estimated: i.e. both the indicator variances and covariances are estimated. As in
the first model, the resulting (co)variance matrix is constrained to be equal across classes. In
‘mclust’ jargon, this type of model is called an EEE (equal volume, equal shape, and equal
orientation) model.
The third model variant allows for more variation across classes. All within- and between
class variable covariances are set to zero, but variances are now allowed to vary within and
between classes. As a result, the number of variance parameters to be estimated increases
with each class that is added to the model. In ‘mclust’, this type of model is called a VVI
(varying volume, varying shape [and undefined orientation]) model.
The fourth, most complex model allows for most variation in (co)variances across classes:
both the variances and covariances are allowed to vary within and between classes, resulting in
estimation of class-specific covariance matrices. This type of model is called a ‘VVV’ (varying
volume, varying shape, varying orientation) model.
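In terms of the class-specific covariance matrices Σ_k, the four variants can thus be summarized as follows (with D denoting a diagonal matrix):

EEI: Σ_k = D (diagonal, equal across classes)
EEE: Σ_k = Σ (full, equal across classes)
VVI: Σ_k = D_k (diagonal, class-specific)
VVV: Σ_k unconstrained (full, class-specific)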
Of course it is possible to design many other LPA models that, for instance, use partially
constrained (co)variance matrices across classes. The ‘mclust’ package alone covers an
additional 10 different model variants. Mplus, which takes a different approach to LPA than
‘mclust’, also allows for additional and highly customized model variations. These are not
covered here, but will be relatively easy to implement for users once they are familiar with
the four basic variants mentioned above.
5. Tutorial
5.1 ‘mclust’
To run LPA using the ‘mclust’ package, the Mclust() function can be used. This function
requires data as its minimum input. In addition, the number of clusters to fit (G) and model
variants to fit and select from (modelNames) can be entered (the defaults are G=1-9 and all
fourteen model variants). In many cases, we may want to consider a more limited range of
model variants with different numbers of classes. For multivariate data, a total of fourteen
options are available (see help("mclustModelNames") for an overview). Each model variant
options are available (see help("mclustModelNames") for an overview). Each model variant
is estimated for each value of the specified range of G using Expectation Maximization (EM),
using the Bayesian Information Criterion (BIC) to compare different model variants and
select the optimal one.
Conducting a latent variable or mixture analysis with Mplus or comparable software
usually entails fitting models with different numbers of classes (that are otherwise configured
similarly) and comparing the fit of these models to select the best one. The
default approach taken in the ‘mclust’ package is slightly different in that the main modeling
command Mclust() fits multiple models for all possible combinations of the specified numbers
of classes and model configurations. The eventual output of ‘mclust’ is the BIC-selected best
model variant with the optimal combination of number of classes and model configuration.
An advantage of the ‘mclust’ approach is that it is efficient and flexible: given a number of
classes, the best model configuration is selected straight away and we do not have to run all
model combinations one by one. This makes the approach especially suitable for exploratory
analyses. However, as stated above, this approach is less common in some fields, where LPA
estimation is approached in a more confirmatory fashion. In addition, it takes some control
away from the researcher, who may want to focus primarily on selecting the number of classes,
given a particular, theory-based model configuration that is kept constant throughout the
analyses. The latter strategy is often used in software like Mplus and is made easily available
through the ‘tidyLPA’ package.
Here, both approaches are shown: (1) the ‘mclust’ approach and (2) the ‘tidyLPA’ approach.
5.1.1 The ‘mclust’ approach
In this approach, we run consecutive models with increasing numbers of classes (G=1 to
G=9); for each number of classes, we let the package fit the four model variants described
above (“EEI”, “EEE”, “VVI” and “VVV”) and select the best-fitting one. We start by loading the package
(library(mclust)) and creating an object mnames that contains the names of the four models
we want to be fitted. Next, we use the Mclust() function to fit the models:
library(mclust)

# the four model variants to be fitted
mnames <- c("EEI", "EEE", "VVI", "VVV")

# fit all combinations of 1-9 classes and the four model variants
mod_g1_9 <- Mclust(dat1[1:10], G = 1:9, modelNames = mnames)
We can first look at the optimal number of classes and the optimal model variant that were
selected, using the following code:
# Optimal number of classes
mod_g1_9$G
## [1] 3
# Optimal model variant
mod_g1_9$modelName
## [1] "EEI"
This shows us that a 3-class EEI model was selected as the best model based on the BIC. We
can get a better perspective of this model’s performance if we compare it to the other fitted
models. We can do this by taking a closer look at the other models’ BIC values:
mod_g1_9$BIC
## Top 3 models based on the BIC criterion:
## EEI,3 VVI,3 EEI,4
## -12142.11 -12163.08 -12178.44
mod_g1_9$loglik
## [1] -5951.275
mod_g1_9$df
## [1] 42
Here, we can see the BICs for all fitted models (in this case 1-9 classes and 4 model variants:
36 models in total). We can see that the closest contenders were a 3-class model with a VVI
configuration (with variances allowed to vary both within and between classes) and a 4-class
EEI model.
Note that the BIC values are negative and that more negative values are considered to indicate
poorer fit than values closer to 0. This may strike some users as odd, given that in many
modeling applications we are used to lower BIC values indicating better fit. However, the
‘mclust’ approach of maximizing the BIC is correct for the modeling approach used (see e.g.,
Fraley & Raftery, 2003; Banfield & Raftery, 1993). For theoretical reasons, the BIC is
calculated in ‘mclust’ as:

BIC = 2 × loglik - df × log(n),
where df is the number of parameters (degrees of freedom), n is the sample size and log denotes
the natural logarithm. We can see that, here, higher loglikelihood values lead to a higher BIC.
For the best-fitting model in our example, the loglikelihood is -5951.275, the number of
parameters is 42 and the sample size is 300. If we plug these into the formula, we get:

BIC = 2 × (-5951.275) - 42 × log(300) = -12142.11
As expected, the obtained value indeed corresponds to the BIC we got in the model output
(see above). Other software packages, such as Mplus and ‘tidyLPA’ (see below), use a slightly
different formula to calculate the BIC, where higher loglikelihood values lead to lower BIC
values:

BIC = -2 × loglik + df × log(n)
We can see that the absolute BIC value is the same irrespective of the formula used, with
only the sign differing between the two approaches.
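As a quick check, both conventions can be computed directly from the fitted ‘mclust’ object (which stores the sample size in mod_g1_9$n):

# 'mclust' convention: higher (less negative) values indicate better fit
2 * mod_g1_9$loglik - mod_g1_9$df * log(mod_g1_9$n)

# Mplus/'tidyLPA' convention: lower values indicate better fit
-2 * mod_g1_9$loglik + mod_g1_9$df * log(mod_g1_9$n)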
Now, let us continue by taking a closer look at the class sizes and the mean values and
variances of the ten variables for the three identified classes. Because the best model has an
EEI configuration, printing the variance matrix of a single class will suffice, because the
variances are the same across classes (this is different in cases where the optimal model
allows for between-class differences in (co)variances):
# tabulate class-membership numbers
table(summary(mod_g1_9)$classification)
##
## 1 2 3
## 100 110 90
# display the means per class
mod_g1_9$parameters$mean
## (output truncated: the full output shows a 10 x 3 matrix with the
## mean of each variable in each class, followed by the variable
## variances, e.g., 2.509146 for var9 and 2.204457 for var10)
From this output, we can see that the three classes have class-sizes of 100, 110 and 90,
respectively. These sizes along with the patterns of class-specific mean variable scores
correspond with those that served as the input for the simulation. As specified, the variances
vary slightly across variables.
The class allocation in LPA is probabilistic in nature: each subject in the data is assigned a
probability for each of the estimated classes, based on their pattern of scores on the input
variables. These probabilities can be inspected in the z-matrix (here: mod_g1_9$z). Subjects
can be allocated to one of the classes based on their highest class-probability. These posterior
allocations can be found in the classification matrix (here: mod_g1_9$classification).
These class-allocations were tabulated above to evaluate class sizes.
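For a quick impression, the first few rows of both objects can be printed:

# posterior class probabilities (rows sum to 1) and modal class allocations
head(round(mod_g1_9$z, 3))
head(mod_g1_9$classification)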
Note that class-allocation in this way only yields useful classifications if the patterns of
class-probabilities allow for allocation of each subject to a single class with sufficient certainty.
In case of too much uncertainty, using the classification is not advised. We can inspect the
uncertainty of allocation for all subjects (here: mod_g1_9$uncertainty). This gives us a list
of uncertainties for all subjects. Next, we can evaluate the extent of uncertainty by looking
at the maximum uncertainty or, for instance, the averaged uncertainty across subjects:
max(mod_g1_9$uncertainty)
## [1] 0.003852604
mean(mod_g1_9$uncertainty)
## [1] 2.607299e-05
Here, the uncertainty is atypically low because of the idealized data used. In many cases,
uncertainty will be higher and it may be of interest to investigate it further, for instance
by looking at (e.g., plotting or tabulating) the uncertainty per class:
cprob <- cbind(mod_g1_9$z, mod_g1_9$classification)
cprob <- as.data.frame(cprob)
colnames(cprob) <- c("prob (class 1)", "prob (class 2)", "prob (class 3)", "class")
aggregate(cprob[, 1:3], list(cprob$class), mean)
The output shows that, within each class, the average probability of the allocated class was
very high and the probabilities of the other classes were very low. Again, these values reflect the
idealized data; in many cases, more differentiated class-probability patterns may be observed.
5.1.2 ‘tidyLPA’
When running ‘mclust’ models from the ‘tidyLPA’ package, the estimations are approached
a little differently. By default, the number of models that can be estimated is restricted to
six relatively common variants (see help(tidyLPA::estimate_profiles) for details). Of
these models, four can be estimated with ‘mclust’. These models differ in terms of how the
variances and covariances are allowed to vary or constrained to be equal across classes and
whether covariances are or are not fixed to zero within classes. These models have each been
allocated a number (1-6) in ‘tidyLPA’. See the table below for how these numbers correspond
to the model configurations mentioned above.
Model variant   Model number
EEI             1
EEE             3
VVI             2
VVV             6
Models four and five are different and can only be fit when R is interfacing with Mplus using
the ‘MplusAutomation’ package. This is outside the scope of this tutorial.
To fit an LPA model in ‘tidyLPA’, we use the estimate_profiles() function. Here, we need
to enter the data.frame to be used (df) and the number of classes to estimate (n_profiles).
To determine what kind of model configuration to estimate, the authors have provided two
different ways in the estimate_profiles() command. In the first approach, we simply specify
which model number (see the table above) we want to estimate, using the models argument.
For instance, if we want to estimate LPA models with 1 to 9 classes with an EEI
configuration, we can use:
suppressMessages(library(tidyLPA))
suppressMessages(mod_1c_v1 <- estimate_profiles(df = dat1[1:10], n_profiles = 1:9,
models = 1))
By default, the package issues a message notifying us that the models argument is used and
that the variances and covariances arguments are ignored; here, this message is suppressed.
The output shows us the model-fit information for the 1- to 9-class EEI models:
mod_1c_v1
## (rows for the 1- to 5-class models omitted)
## 1 6 11986.25 12264.04 0.84 0.78 1.00 0.08 0.30 0.16
## 1 7 11950.86 12269.39 0.81 0.77 0.91 0.08 0.17 0.01
## 1 8 11939.04 12298.30 0.82 0.79 0.92 0.08 0.17 0.02
## 1 9 11947.10 12347.11 0.82 0.76 0.92 0.06 0.16 0.57
Here, we can see that the BIC takes on different values compared to ‘mclust’, with lower
rather than higher values indicating better fit (see the explanation above). Instead of drawing the
fit indices (e.g., BIC, AIC) directly from the ‘mclust’ package, ‘tidyLPA’ only draws the ‘raw’
loglikelihood values and posterior probabilities from the ‘mclust’ output and (re)calculates
the BIC, AIC, entropy etc. to mirror as well as possible those provided in Mplus. In addition,
p-values of the bootstrapped likelihood ratio test (BLRT) are given by default. These
p-values indicate for each k-class model whether adding the kth class significantly improves
model fit. For instance, the BLRT_p of 0.04 for the 4-class model indicates that adding
a fourth class led to an improvement in model fit over the 3-class model that was only just
significant at an alpha of 0.05. When using ’mclust’ as a stand-alone package,
the BLRT is not calculated by default, but can be obtained with the mclustBootstrapLRT
function.
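A minimal sketch of such a BLRT with stand-alone ‘mclust’ (the maxG and nboot values here are illustrative; bootstrapping can take a while):

# bootstrap LRT comparing k-1 vs. k classes for the EEI variant
blrt <- mclustBootstrapLRT(dat1[1:10], modelName = "EEI", maxG = 4, nboot = 100)
blrt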
Next, we can run similar commands, with different values for the models argument. It is
also possible to estimate more than one model variant in a single run by providing a vector
of model numbers (e.g., models=c(1,6)).
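For example, to fit both the EEI and VVV variants for 1 to 9 classes in one call:

# models 1 (EEI) and 6 (VVV); see the model-number table above
mod_multi <- estimate_profiles(df = dat1[1:10], n_profiles = 1:9, models = c(1, 6))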
The second approach to determine the model variant to be estimated is by using the variances
and covariances arguments in the estimate_profiles command. The variances argument
can have the values "equal" (i.e. variable variances constrained to be equal across classes) or
"varying" (i.e. variances allowed to vary across classes). The covariances argument can
have the values: "zero" (i.e. all variable covariances fixed to zero), "equal" (i.e. covariances
constrained to be equal across classes) or "varying" (i.e. covariances allowed to vary across
classes). Now, if we again want to estimate LPA models with 1 to 9 classes, with an EEI
configuration, we can use:
mod_1c_v2 <- estimate_profiles(df = dat1[1:10], n_profiles = 1:9, variances = "equal",
covariances = "zero")
mod_1c_v2$model_1_class_2$fit
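Analogously, the VVV variant (model 6) can be requested with these arguments:

# variances and covariances allowed to vary across classes (VVV)
mod_1c_vvv <- estimate_profiles(df = dat1[1:10], n_profiles = 1:9,
                                variances = "varying", covariances = "varying")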
We can inspect the fit of each of these models one by one, or we can use compare_solutions
to do this in an automated fashion. Interestingly, the best model can then be selected based on
integrated information from several fit indices (an analytic hierarchy process; see
help(tidyLPA::AHP) for more details). We obtain the model comparison with the following code:
comp <- suppressWarnings(compare_solutions(mod_1c_v1))
comp$fits
## # A tibble: 9 x 18
## Model Classes LogLik AIC AWE BIC CAIC CLC KIC SABIC ICL
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 -7274. 14588. 14834. 14662. 14682. 14550. 14611. 14598. -14662.
## 2 1 2 -6549. 13161. 13543. 13276. 13307. 13101. 13195. 13177. -13276.
## 3 1 3 -5951. 11987. 12506. 12142. 12184. 11905. 12032. 12009. -12142.
## 4 1 4 -5938. 11982. 12638. 12178. 12231. 11878. 12038. 12010. -12218.
## 5 1 5 -5929. 11986. 12778. 12223. 12287. 11860. 12053. 12020. -12273.
## 6 1 6 -5918. 11986. 12915. 12264. 12339. 11838. 12064. 12026. -12353.
## 7 1 7 -5889. 11951. 13016. 12269. 12355. 11780. 12040. 11997. -12376.
## 8 1 8 -5873. 11939. 13141. 12298. 12395. 11747. 12039. 11991. -12409.
## 9 1 9 -5866. 11947. 13285. 12347. 12455. 11733. 12058. 12005. -12458.
## # ... with 7 more variables: Entropy <dbl>, prob_min <dbl>, prob_max <dbl>,
## # n_min <dbl>, n_max <dbl>, BLRT_val <dbl>, BLRT_p <dbl>
comp$best
5.2 Mplus

For comparison, the same EEI models were also fitted in Mplus. Below, the input files for the 1- to 4-class models are shown; the inputs for the models with more classes follow the same pattern.

!! 1-class model
DATA: file='dat1.dat';
VARIABLE:
names=id v1-v10;
usevariables= v1-v10;
classes = c(1);
ANALYSIS:
type = mixture;
starts= 250 50;
MODEL:
%overall%
%C#1%
[v1-v10];
!! 2-class model
DATA: file='dat1.dat';
VARIABLE:
names=id v1-v10;
usevariables= v1-v10;
classes = c(2);
ANALYSIS:
type = mixture;
starts= 250 50;
MODEL:
%overall%
%C#1%
[v1-v10];
%C#2%
[v1-v10];
!! 3-class model
DATA: file='dat1.dat';
VARIABLE:
names=id v1-v10;
usevariables= v1-v10;
classes = c(3);
ANALYSIS:
type = mixture;
starts= 250 50;
MODEL:
%overall%
%C#1%
[v1-v10];
%C#2%
[v1-v10];
%C#3%
[v1-v10];
!! 4-class model
DATA: file='dat1.dat';
VARIABLE:
names=id v1-v10;
usevariables= v1-v10;
classes = c(4);
ANALYSIS:
type = mixture;
starts= 250 50;
MODEL:
%overall%
%C#1%
[v1-v10];
%C#2%
[v1-v10];
%C#3%
[v1-v10];
%C#4%
[v1-v10];
Each of the models was fitted with multiple initial-stage random starts and multiple final-stage
optimizations to reduce the risk of ending up with a solution at a local maximum. For the 1- to
6-class models, 250 initial and 50 final starts were used. For the 7-class model, 1000 initial
starts and 200 final optimizations were used, and for the 8- and 9-class models a replicable
global solution could only be obtained with 5000 initial-stage starts and 1000 final-stage
optimizations.
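Following the starts syntax used in the inputs above, the ANALYSIS section for, e.g., the 9-class model would read:

ANALYSIS:
type = mixture;
starts= 5000 1000;

The fit indices for each of the fitted models are displayed in the table below. For now, we only look at the estimated fit indices.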
EEI models

Classes        AIC          BIC
1         14587.535    14661.611
2         13160.836    13275.653
3         11986.551    12142.110
4         11957.710    12154.010
5         11951.766    12188.808
6         11944.632    12222.415
7         11935.501    12254.027
8         11929.428    12288.695
9         11925.203    12325.212
If we compare the BIC values for the EEI models to those obtained with ‘mclust’ directly
and via ‘tidyLPA’, we can see that the results are quite similar for the less complex models,
with the BIC-values for the 2- and 3-class models being exactly the same in terms of their
absolute values. When taking a closer look at the 3-class model that was previously found to
be the best model, we can also see that the Mplus-based classifications (n=100, n=110,
and n=90) and entropy estimates (entropy=1.0) are similar to those obtained with ‘mclust’
and/or ‘tidyLPA’.
If we look at the larger picture and evaluate the overlap between the BIC values obtained
for all models with class numbers ranging from 1 to 9, we can see that the estimated BIC
values show more differences between the R-packages and Mplus for the models with larger
numbers of classes. This can be explained by the different methods used by Mplus and
‘mclust’/’tidyLPA’ to generate the start values for estimation (see below). These differences
can cause the packages to yield different results for more complex models and/or models that
are increasingly misspecified (as the models with more than 3 classes were in our example).
6. Final comments
In this tutorial we have seen that it is relatively easy to run an LPA in R. As with all
data-driven analyses, care should be taken not to over-interpret the results. In the end, LPA
identifies classes in such a way as to optimally explain variance on a range of variables and
not to optimize interpretability or usefulness.
Another remark with regard to LPA results is that class-membership is probabilistic in nature:
each subject in an analyzed sample has a probability of being in each of the model’s classes.
Subjects can be allocated to a class based on their highest class-probability. Importantly,
this can only be done with enough certainty if separation between classes is sufficient, which
means that we see that each subject has a clear highest probability (e.g., p=0.9) for one of
the classes. The entropy statistic is often used to quantify this separation, with values <0.8
being taken to indicate insufficient separation to allocate subjects to one class. In such cases,
the class probabilities themselves can still be used in further investigations.
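For reference, a commonly used definition of the relative entropy (as reported by, e.g., Mplus and ‘tidyLPA’) is:

E = 1 - ( Σ_i Σ_k -p_ik × ln(p_ik) ) / ( n × ln(K) ),

where p_ik is subject i’s posterior probability for class k, n is the sample size and K is the number of classes; values close to 1 indicate clear class separation.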
Depending on one’s preferences, one can choose either the ‘mclust’ or the ‘tidyLPA’ package.
The latter may be especially attractive for researchers who already have experience with Mplus
or comparable software. It is important to note, though, that differences exist between the
estimation approaches taken by both R-packages and by Mplus.
Mixture models such as LPA need to be estimated in an iterative fashion, using an EM
algorithm that is very sensitive to the start values used: poor start values can lead the
estimation process to a poor solution. In addition, there is often a chance that the iterative
EM process arrives at a solution at a local, rather than a global, maximum in the
likelihood ‘landscape’.
Different methods have been developed to initialize the estimation (get starting values) in
such a way as to optimize the chance of arriving at an accurate model solution (Biernacki et
al., 2003; Shireman et al., 2016). Mplus and ‘mclust’ take two different approaches. Mplus
uses a ‘brute force’ approach and reruns the model with multiple sets of random start values,
each generated from uniform distributions of values with ranges that are based on the data
(Muthén & Muthén, 1998-2015). The best solution is the model with the highest loglikelihood
that was arrived upon from at least two different starting points. In contrast, ‘mclust’ uses
the data itself to generate a set of plausible start values using hierarchical clustering. This
means that the start values are informed by the (hierarchical structure of) the data, which
can work well. A downside is that only a single set of start values is used, so the risk of
arriving at a local optimum is not addressed. These approaches show clear differences and
have been shown previously to not always arrive at the same solutions, with some authors
being especially critical of the ‘mclust’ approach (Shireman et al., 2017). In our example,
we did not see much difference between the two approaches, but it should be noted that
many real-world data sets do not have such a clear latent structure, making it much more
challenging to arrive at an accurate solution, given the data.
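Although ‘mclust’ has no built-in multi-start facility, one ad-hoc way to probe sensitivity to the initialization is to re-run a model while basing the hierarchical-clustering start on different random subsets of the data, via the initialization argument of Mclust() (a sketch; the subset size and number of re-runs are arbitrary here):

# refit the selected 3-class EEI model with five different random
# initialization subsets and compare the resulting BIC values
set.seed(123)
bics <- replicate(5, {
  init <- list(subset = sample(nrow(dat1), 150))
  Mclust(dat1[1:10], G = 3, modelNames = "EEI", initialization = init)$bic
})
bics

Identical BIC values across such re-runs suggest a stable solution; diverging values point to sensitivity to the starting partition.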
Based on the provided examples and these technical differences between ‘mclust’ and Mplus,
it is probably better to see ‘mclust’ and ‘tidyLPA’ as useful alternatives to Mplus, rather
than as tools for exactly mimicking what can be done with the latter software.
References
Banfield, J., Raftery, A.E. (1993). Model-based Gaussian and non-Gaussian clustering.
Biometrics. 49: 803–821.
Biernacki, C., Celeux, G., Govaert, G. (2003). Choosing starting values for the EM algorithm
for getting the highest likelihood in multivariate Gaussian mixture models. Computational
Statistics and Data Analysis, 41: 561–575.
Celeux, G., Govaert, G. (1995). Gaussian parsimonious clustering models. Pattern Recognition,
28: 781–793.
Fraley, C., Raftery, A.E. (1998). How many clusters? Which clustering method? Answers via
model-based cluster analysis. The Computer Journal, 41: 578–588.
Gibson, W. A. (1959). Three multivariate models: Factor analysis, latent structure analysis,
and latent profile analysis. Psychometrika, 24, 229–252.
Linzer, D.A., Lewis, J. (2011). poLCA: An R package for polytomous variable latent class
analysis. Journal of Statistical Software, 42(10): 1–29.
Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., Leisch, F. (2019). e1071: Misc
Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU
Wien. R package version 1.7-3. https://CRAN.R-project.org/package=e1071
Muthén, L., Muthén, B. (1998–2015). Mplus User’s Guide (7th ed.). Los Angeles: Muthén &
Muthén.
Neale, M.C., Hunter, M.D., Pritikin, J.N., Zahery, M., Brick, T.R., Kirkpatrick, R.M.,
Estabrook, R., Bates, T.C., Maes, H.H., Boker, S.M. (2016). “OpenMx 2.0: Extended
structural equation and statistical modeling.” Psychometrika 81(2): 535-549.
Nylund-Gibson, K., Choi, A.Y. (2018). Ten frequently asked questions about latent class
analysis. Translational Issues in Psychological Science, 4: 440–461.
Oberski, D. (2016). Mixture models: Latent profile and latent class analysis. In J. Robertson
& M. Kaptein (Eds.), Modern statistical methods for HCI (pp. 275–287). Cham, Switzerland:
Springer International Publishing.
Pastor, D.A., Barron, K.E., Miller, B.J., Davis, S.L. (2007). A latent profile analysis of
college students’ achievement goal orientation. Contemporary Educational Psychology 32(1):
8-47.
Rosenberg, J.M., Beymer, P.N., Anderson, D.J., Van Lissa, C.J., & Schmidt, J.A. (2018).
tidyLPA: An R Package to Easily Carry Out Latent Profile Analysis (LPA) Using Open-Source
or Commercial Software. Journal of Open Source Software 3(30): 978.
Scrucca, L., Fop, M., Murphy, T.B., Raftery, A.E. (2016). mclust 5: Clustering, classification
and density estimation using Gaussian finite mixture models. The R Journal, 8(1): 289–317.
Shireman, E.M., Steinley, D., Brusco, M.J. (2016). Local optima in mixture modeling.
Multivariate Behavioral Research, 51(4): 466–481.
Shireman, E., Steinley, D., Brusco, M.J. (2017). Examining the effect of initialization
strategies on the performance of Gaussian mixture modeling. Behavior Research Methods,
49(1): 282–293.
Sterba, S.K. (2013). Understanding linkages among mixture models. Multivariate Behavioral
Research, 48(6): 775–815.
Wolfe, J. (1970). Pattern clustering by multivariate mixture analysis. Multivariate Behavioral
Research, 5: 329–350.
Appendix
R code to simulate the tutorial data. Make sure to install the ‘MASS’ and ‘reshape2’ packages
first. Note that the class covariance matrices and the mean vectors of classes 2 and 3 below
are placeholders (marked as such in the code); the exact values used to generate the original
tutorial data are not reproduced here.
library(MASS)
library(reshape2)
##########################################################
### simulate v1-v10 for each class from a multivariate ###
### normal distribution, given a class-specific mean   ###
### vector (mu) and covariance matrix (Sigma)          ###
##########################################################
### NOTE: the Sigma matrices and the class-2 and class-3
### mean vectors below are illustrative placeholders, not
### the values used to generate the original tutorial data
Sigma1 <- diag(10)          # placeholder covariance matrix, class 1
Sigma2 <- diag(10) * 1.5    # placeholder covariance matrix, class 2
Sigma3 <- diag(10) * 0.8    # placeholder covariance matrix, class 3
set.seed(0111)
cl1 <- mvrnorm(100, mu=c(2,3,2,6,7,4,5,8,2,1), Sigma=Sigma1)
cl2 <- mvrnorm(110, mu=c(5,2,6,1,3,7,2,4,8,5), Sigma=Sigma2) # placeholder means
cl3 <- mvrnorm(90, mu=c(1,6,3,8,2,5,7,3,5,9), Sigma=Sigma3)  # placeholder means
### convert matrices to data.frame objects & add a class nr
cl1 <- as.data.frame(cl1)
cl1$class <-rep(1,100)
cl2 <- as.data.frame(cl2)
cl2$class <-rep(2,110)
cl3 <- as.data.frame(cl3)
cl3$class <- rep(3,90)
##########################################################
### combine the three classes into a single data.frame ###
dat1 <- rbind(cl1, cl2, cl3)
colnames(dat1)[1:10] <- paste0("var", 1:10)

##########################################################
### add a little stochastic noise for increased realism ;)
dat1$var1 <- dat1$var1 + rnorm(n=300, mean=0, sd=sqrt(1))
dat1$var2 <- dat1$var2 + rnorm(n=300, mean=0, sd=sqrt(1))
dat1$var3 <- dat1$var3 + rnorm(n=300, mean=0, sd=sqrt(1))
dat1$var4 <- dat1$var4 + rnorm(n=300, mean=0, sd=sqrt(1))
dat1$var5 <- dat1$var5 + rnorm(n=300, mean=0, sd=sqrt(1))
dat1$var6 <- dat1$var6 + rnorm(n=300, mean=0, sd=sqrt(1))
dat1$var7 <- dat1$var7 + rnorm(n=300, mean=0, sd=sqrt(1))
dat1$var8 <- dat1$var8 + rnorm(n=300, mean=0, sd=sqrt(1))
dat1$var9 <- dat1$var9 + rnorm(n=300, mean=0, sd=sqrt(1))
dat1$var10 <- dat1$var10 + rnorm(n=300, mean=0, sd=sqrt(1))