Title: | Generalized Structure Component Analysis- Latent Class Analysis & Latent Class Regression |
---|---|
Description: | Execute Latent Class Analysis (LCA) and Latent Class Regression (LCR) by using Generalized Structured Component Analysis (GSCA), as explained in Ryoo, Park, and Kim (2019) <doi:10.1007/s41237-019-00084-6>. The package estimates the latent class prevalence and item response probability parameters of LCA with a single-line command, and provides graphs of the item response probabilities. It can also estimate the relationship between the prevalence and covariates. |
Authors: | Jihoon Ryoo [aut], Seohee Park [aut, cre], Seoungeun Kim [aut], heungsun Hwaung [aut] |
Maintainer: | Seohee Park <[email protected]> |
License: | GPL-3 |
Version: | 0.0.5 |
Built: | 2025-03-13 04:02:34 UTC |
Source: | https://github.com/cran/gscaLCA |
Add Health data about substance use
data(AddHealth)
A data frame with 5114 observations on the following 8 variables.
A numeric vector of observations' ID.
A factor with levels "Yes" or "No"; Have you ever smoked an entire cigarette?
A factor with levels "Yes" or "No"; Have you had a drink of beer, wine, or liquor more than two or three times? Do not include sips or tastes from someone else’s drink.
A factor with levels "Yes" or "No"; Have you ever used any of the following drugs? Other types of illegal drugs, such as LSD, PCP, ecstasy, heroin, or mushrooms; or inhalants.
A factor with levels "Yes" or "No"; Have you ever used any of the following drugs? Marijuana (hash, bhang, ganja)
A factor with levels "Yes" or "No"; Have you ever used any of the following drugs? Cocaine (crack, coca leaves)
A factor with levels "M" or "F"
An integer vector from 1 to 8. It refers to the education level.
This AddHealth data set consists of 5,114 participants' responses with a randomly generated ID variable and five item variables: Smoking, Alcohol, Other Types of Illegal Drug, Marijuana, and Cocaine. The responses to the five items are dichotomous, either "Yes" or "No"; all other response codes are treated as systematic missing. Along with the dichotomous responses, participants' gender and education level are included in the sample data. The data can be obtained from the National Longitudinal Study of Adolescent to Adult Health (Add Health; Harris et al., 2009), a study that has mainly investigated how health factors in childhood affect adult outcomes. In terms of data collection, there have been four additional waves since 1994. The package ships with the substance use section of the Wave IV data.
Harris, Kathleen Mullan, and Udry, J. Richard. National Longitudinal Study of Adolescent to Adult Health (Add Health), 1994-2008 [Public Use]. Ann Arbor, MI: Carolina Population Center, University of North Carolina-Chapel Hill [distributor], Inter-university Consortium for Political and Social Research [distributor], 2018-08-06. https://doi.org/10.3886/ICPSR21600.v21
data(AddHealth)
str(AddHealth)
head(AddHealth)
Fitting a component-based LCA by utilizing fuzzy clustering GSCA algorithm.
gscaLCA(
  dat,
  varnames = NULL,
  ID.var = NULL,
  num.class = 2,
  num.factor = "EACH",
  Boot.num = 20,
  multiple.Core = FALSE,
  covnames = NULL,
  cov.model = NULL,
  multinomial.ref = "MAX"
)
dat |
The data set to which the gscaLCA function is fitted. |
varnames |
A character vector. The names of columns to be used in the gscaLCA function. |
ID.var |
A character element. The name of the ID variable. If no ID variable is specified, the gscaLCA function searches for one in the given data. If the data set does not include any ID variable, observation IDs are generated automatically as a numeric variable. The default is NULL. |
num.class |
A numeric element. The number of classes to be identified. The default is 2. |
num.factor |
Either "EACH" or "ALLin1". "EACH" specifies the situation in which each indicator is assumed to be its own phantom latent variable. "ALLin1" indicates that all variables are assumed to be explained by a common latent variable. The default is "EACH". |
Boot.num |
The number of bootstraps. The standard errors of parameters are computed from the bootstrap within the gscaLCA algorithm. The default is 20. |
multiple.Core |
A logical element. TRUE enables the use of multiple cores for the bootstrap when they are available. The default is FALSE. |
covnames |
A character vector of covariates. The covariates are used when latent class regression (LCR) is fitted. |
cov.model |
A numeric vector of indicators for latent class regression (LCR), one per covariate in covnames: 1 if the covariate is involved in fitting the fuzzy clustering GSCA, and 0 otherwise. |
multinomial.ref |
A character element. Options of |
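How covnames and cov.model line up can be sketched in base R. This is only an illustration based on the example in this manual, where cov.model = c(1, 0) adds Gender but not Edu; the variable in.clustering is a hypothetical name introduced here and is not part of the package.

```r
# Covariates available in the data (column names taken from the AddHealth example).
covnames <- c("Gender", "Edu")

# Hypothetical choice: only Gender should enter the fuzzy clustering GSCA step.
in.clustering <- c("Gender")

# Build the 0/1 indicator vector, one entry per covariate in covnames.
cov.model <- as.numeric(covnames %in% in.clustering)

# Show which indicator belongs to which covariate.
print(setNames(cov.model, covnames))
```

Deriving the indicators from the names keeps the two vectors the same length and in the same order, which is easy to get wrong when they are typed by hand.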
A list of the sample size (N), the number of clusters (C), the number of bootstraps (Boot.num/Boot.num.im), the model fit indices (model.fit), the latent class prevalence (LCprevalence), the item response probability (RespProb), the posterior membership and the predicted class membership (membership), and the graphs of item response probability (plot). When covariates are included, the regression results are also provided.
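The relationship between posterior membership, predicted class membership, and prevalence can be illustrated with a small base R sketch. The posterior matrix below is made up, and the manual does not state how the package itself computes LCprevalence or lays out the membership element, so this shows the general idea only.

```r
# Toy posterior membership matrix: 4 observations x 3 classes (rows sum to 1).
# These numbers are invented for illustration only.
posterior <- matrix(
  c(0.70, 0.20, 0.10,
    0.10, 0.80, 0.10,
    0.25, 0.30, 0.45,
    0.60, 0.30, 0.10),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, paste0("Class", 1:3))
)

# Predicted class membership: assign each observation to its most probable class.
predicted <- max.col(posterior, ties.method = "first")

# Prevalence under hard assignment: share of observations assigned to each class.
prev.hard <- as.numeric(table(factor(predicted, levels = 1:3))) / nrow(posterior)

# Prevalence under soft assignment: average posterior probability per class.
prev.soft <- colMeans(posterior)

print(predicted)
print(prev.hard)
print(prev.soft)
```

The two prevalence estimates differ whenever observations are not assigned to a class with certainty, which is why the hard and soft summaries can disagree.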
Ryoo, J. H., Park, S., & Kim, S. (2019). Categorical latent variable modeling utilizing fuzzy clustering generalized structured component analysis as an alternative to latent class analysis. Behaviormetrika, 47, 291-306. https://doi.org/10.1007/s41237-019-00084-6
library(gscaLCA)

# AddHealth data with 3 clusters, using the first 500 observations
AH.sample = AddHealth[1:500, ]
R3 = gscaLCA(dat = AH.sample,
             varnames = names(AddHealth)[2:6],
             ID.var = "AID",
             num.class = 3,
             num.factor = "EACH",
             Boot.num = 0)
summary(R3)
R3$model.fit         # Model fit
R3$LCprevalence      # Latent class prevalence
R3$RespProb          # Item response probability
head(R3$membership)  # Membership for all observations

# AddHealth data with 3 clusters, 500 observations, and two covariates
R3_2C = gscaLCA(dat = AH.sample,
                varnames = names(AddHealth)[2:6],
                ID.var = "AID",
                num.class = 3,
                num.factor = "EACH",
                Boot.num = 0,
                multiple.Core = FALSE,
                covnames = names(AddHealth)[7:8],  # Gender and Edu
                cov.model = c(1, 0),               # Only the Gender variable is added to the gscaLCR.
                multinomial.ref = "MAX")

# To print the results of multinomial regression with hard partitioning of the gscaLCR,
# use the option "multinomial.hard".
summary(R3_2C, "multinomial.hard")

# AddHealth data with 2 clusters and 20 bootstraps
R2 = gscaLCA(AddHealth,
             varnames = names(AddHealth)[2:6],
             num.class = 2,
             Boot.num = 20,
             multiple.Core = FALSE)  # "multiple.Core = TRUE" is recommended.

# TALIS data with 3 clusters, 20 bootstraps, and the "ALLin1" option
T3 = gscaLCA(TALIS,
             varnames = names(TALIS)[2:6],
             num.class = 3,
             num.factor = "ALLin1",
             Boot.num = 20,
             multiple.Core = FALSE)  # "multiple.Core = TRUE" is recommended.
Runs the 2nd and 3rd steps of gscaLCA, namely the partitioning and the regression fitting of latent class regression.
gscaLCR(results.obj, covnames, multinomial.ref = "MAX")
results.obj |
the results of gscaLCA. |
covnames |
A character vector of covariates. The covariates are used when latent class regression (LCR) is fitted. |
multinomial.ref |
A character element. Options of |
The gscaLCA results together with the results of gscaLCR, that is, the regression fitted after partitioning.
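As a rough illustration of what "hard partitioning with binomial regression" could look like, the sketch below assigns each observation to a single class and then fits one logistic regression per class (that class versus the rest) on a covariate. This is one plausible reading, written with base R glm on made-up data; the package's actual implementation is not described here and may differ, and soft partitioning is not shown.

```r
# Made-up data: a Gender covariate and a hard class assignment for 20 observations.
toy <- data.frame(
  Gender = factor(rep(c("F", "M"), each = 10)),
  class  = c(rep(1:3, times = c(2, 3, 5)),   # class counts among F
             rep(1:3, times = c(5, 3, 2)))   # class counts among M
)

# One binomial (logistic) regression per class: membership in class k vs. all others.
fits <- lapply(1:3, function(k) {
  glm(as.integer(class == k) ~ Gender, data = toy, family = binomial())
})
names(fits) <- paste0("Class", 1:3)

# Coefficients on the log-odds scale; "F" is the reference level of Gender.
print(sapply(fits, coef))
```

With a single binary covariate the fitted log-odds reproduce the observed class proportions within each gender, which makes the output easy to check by hand.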
R2 = gscaLCA(dat = AddHealth[1:500, ],  # Data must include the candidate covariates to run gscaLCR
             varnames = names(AddHealth)[2:6],
             ID.var = "AID",
             num.class = 3,
             num.factor = "EACH",
             Boot.num = 0,
             multiple.Core = FALSE)

R2.gender = gscaLCR(R2, covnames = "Gender")

summary(R2.gender, "multinomial.hard")  # hard partitioning with multinomial regression
summary(R2.gender, "multinomial.soft")  # soft partitioning with multinomial regression
summary(R2.gender, "binomial.hard")     # hard partitioning with binomial regression
summary(R2.gender, "binomial.soft")     # soft partitioning with binomial regression
Summary of gscaLCA output or gscaLCR output
## S3 method for class 'gscaLCA'
summary(object, print.cov.output = NULL, ...)
object |
the object of gscaLCA or gscaLCR |
print.cov.output |
A character element specifying the type of partitioning and regression to print. The four possible options are "multinomial.hard", "multinomial.soft", "binomial.hard", and "binomial.soft". |
... |
Additional arguments affecting the summary produced. |
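The four print.cov.output values combine a regression type with a partitioning type. The sketch below shows one way to validate such a value and split it into its two parts in base R; check_cov_output is a hypothetical helper written for this illustration and is not a function of the package (whose default for this argument is NULL rather than the first option).

```r
# Hypothetical helper (not part of gscaLCA): validate a print.cov.output value
# and split it into its regression and partitioning components.
check_cov_output <- function(print.cov.output = c("multinomial.hard", "multinomial.soft",
                                                   "binomial.hard", "binomial.soft")) {
  # match.arg() stops with an error if the value is not one of the four options.
  choice <- match.arg(print.cov.output)
  parts <- strsplit(choice, ".", fixed = TRUE)[[1]]
  list(regression = parts[1], partitioning = parts[2])
}

res <- check_cov_output("binomial.soft")
print(res)
```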
Prints the model fit, prevalence, item response probabilities, and regression results.
# summary(R2)
Teaching and Learning International Survey
data(TALIS)
A data frame with 2560 observations on the following 6 variables.
a numeric vector of teachers' ID.
Integers with levels from 1 to 3 (1: not/low importance, 2: moderate importance, 3: high importance); Motivation item 1: To become a teacher, teaching offered a steady career path.
Integers with levels from 1 to 3 (1: not/low importance, 2: moderate importance, 3: high importance); Motivation item 2: To become a teacher, teaching schedule fit with responsibilities in my personal life.
Integers with levels from 1 to 3 (1: not at all/to some extent, 2: quite a bit, 3: a lot); Pedagogy item 1: To what extent can you help my students value learning.
Integers with levels from 1 to 3 (1: not at all/to some extent, 2: quite a bit, 3: a lot); Pedagogy item 2: To what extent can you control disruptive behavior in the classroom.
Integers with levels from 1 to 3 (1: strongly disagree/disagree, 2: agree, 3: strongly agree); Satisfaction item: Feeling I enjoy working at this school.
The Teaching and Learning International Survey (TALIS) 2018, focusing on teachers, school leaders, and the learning environment in schools, was conducted by the Organization for Economic Cooperation and Development (OECD). There have been three cycles: TALIS 2008, TALIS 2013, and TALIS 2018. This data set uses the publicly available TALIS 2018 U.S. data, comprising 2,560 teachers’ responses. The sample data include five items: two on motivation, two on pedagogy, and one on satisfaction. Item responses were originally recorded in four ordered categories: (1) Not at all, (2) To some extent, (3) Quite a bit, and (4) A lot. Because the frequencies in the first category were too small, the responses were recoded into three ordered categories.
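The collapsing of four ordered categories into three can be sketched in base R with a lookup vector. The mapping follows the level labels given for the pedagogy items ("1: not at all/to some extent, 2: quite a bit, 3: a lot"); the response values and the names original and collapse.map are made up for this illustration, and the shipped TALIS data are already recoded.

```r
# Made-up responses on the original four-point scale
# (1 = Not at all, 2 = To some extent, 3 = Quite a bit, 4 = A lot).
original <- c(1, 2, 3, 4, 4, 2, 3)

# Lookup vector: position = original category, value = collapsed category.
# Categories 1 and 2 are merged, matching "1: not at all/to some extent".
collapse.map <- c(1L, 1L, 2L, 3L)

collapsed <- collapse.map[original]

# Cross-tabulate to confirm that the ordering of the categories is preserved.
print(table(original, collapsed))
```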
OECD (2019), TALIS 2018 Results (Volume I): Teachers and School Leaders as Lifelong Learners, TALIS, OECD Publishing, Paris, https://doi.org/10.1787/1d0bc92a-en.
str(TALIS)
head(TALIS)