Title: | Semi-Supervised Gaussian Mixture Model with a Missing-Data Mechanism |
---|---|
Description: | The algorithm of semi-supervised learning is based on finite Gaussian mixture models and includes a mechanism for handling missing data. It aims to fit a g-class Gaussian mixture model using maximum likelihood. The algorithm treats the labels of unclassified features as missing data, building on the framework introduced by Rubin (1976) <doi:10.2307/2335739> for missing data analysis. By taking into account the dependencies in the missing pattern, the algorithm provides more information for determining the optimal classifier, as specified by Bayes' rule. |
Authors: | Ziyang Lyu [aut, cre], Daniel Ahfock [aut], Ryan Thompson [aut], Geoffrey J. McLachlan [aut] |
Maintainer: | Ziyang Lyu <[email protected]> |
License: | GPL-3 |
Version: | 1.1.5 |
Built: | 2025-02-12 05:40:57 UTC |
Source: | https://github.com/cran/gmmsslm |
Bayes' rule of allocation
bayesclassifier(dat, p, g, pi = NULL, mu = NULL, sigma = NULL, paralist = NULL)
dat |
An n × p matrix where each row represents an individual observation. |
p |
Dimension of the observation vector. |
g |
Number of multivariate normal classes. |
pi |
A g-dimensional vector for the initial values of the mixing proportions. |
mu |
A p × g matrix of location parameters. |
sigma |
A p × p covariance matrix, or a p × p × g array of class-specific covariance matrices. |
paralist |
A list containing the required parameters |
Classifier specified by Bayes' rule
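The classifier (Bayes' rule of allocation) $R(y_j;\theta)$ assigns an entity with observation $y_j$ to class $C_k$ (that is, $R(y_j;\theta)=k$) if
$$k=\arg\max_{i}\,\tau_i(y_j;\theta),$$
where $\tau_i(y_j;\theta)$ is the posterior probability that the entity belongs to class $C_i$.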
clust |
Class membership for the ith entity |
n <- 150
pi <- c(0.25, 0.25, 0.25, 0.25)
sigma <- array(0, dim = c(3, 3, 4))
sigma[, , 1] <- diag(1, 3)
sigma[, , 2] <- diag(2, 3)
sigma[, , 3] <- diag(3, 3)
sigma[, , 4] <- diag(4, 3)
mu <- matrix(c(0.2, 0.3, 0.4, 0.2, 0.7, 0.6, 0.1, 0.7, 1.6, 0.2, 1.7, 0.6), 3, 4)
dat <- rmix(n = n, pi = pi, mu = mu, sigma = sigma)
params <- list(pi = pi, mu = mu, sigma = sigma)
clust <- bayesclassifier(dat = dat$Y, p = 3, g = 4, paralist = params)
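The allocation rule itself can be sketched in a few lines of base R, assuming the posterior probabilities are already available as an n × g matrix. The helper name `allocate_bayes` and the toy matrix are illustrative assumptions and are not part of the package.

```r
# Hypothetical helper (not part of gmmsslm): Bayes' rule of allocation
# applied to an n x g matrix of posterior probabilities.
allocate_bayes <- function(tau) {
  # max.col() returns, for each row, the column index of the largest entry;
  # ties are broken in favour of the first maximum.
  max.col(tau, ties.method = "first")
}

# Toy posterior probabilities for three entities and three classes.
tau <- rbind(c(0.7, 0.2, 0.1),
             c(0.1, 0.3, 0.6),
             c(0.2, 0.5, 0.3))
clust_sketch <- allocate_bayes(tau)
print(clust_sketch)
```

Each entity is assigned to the class with the largest posterior probability, so the three toy rows are allocated to classes 1, 3, and 2.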
This file provides functions to perform bootstrap analysis on the results of the gmmsslm function.
This function performs non-parametric bootstrap to assess the variability of the gmmsslm function outputs.
bootstrap_gmmsslm( dat, zm, pi, mu, sigma, paralist, xi, type, iter.max = 500, eval.max = 500, rel.tol = 1e-15, sing.tol = 1e-15, B = 2000 )
dat |
A matrix where each row represents an individual observation. |
zm |
A matrix or data frame of labels corresponding to dat. |
pi |
A numeric vector representing the mixing proportions. |
mu |
A matrix representing the location parameters. |
sigma |
An array representing the covariance matrix or list of covariance matrices. |
paralist |
A list of parameters. |
xi |
A numeric value representing the coefficient for a logistic function of the Shannon entropy. |
type |
A character value indicating the type of Gaussian mixture model. |
iter.max |
An integer indicating the maximum number of iterations. |
eval.max |
An integer indicating the maximum number of evaluations. |
rel.tol |
A numeric value indicating the relative tolerance. |
sing.tol |
A numeric value indicating the singularity tolerance. |
B |
An integer indicating the number of bootstrap samples. |
A list containing mean and sd of bootstrap samples for pi, mu, sigma, and xi.
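The idea behind a non-parametric bootstrap can be sketched generically: resample the rows with replacement, re-apply an estimator, and summarise the replicates by their mean and standard deviation. In the sketch below, `bootstrap_summary` and the use of `colMeans` as the estimator are assumptions made for illustration; they stand in for refitting the mixture model and do not reproduce the package's implementation.

```r
# Hypothetical helper: generic non-parametric bootstrap of a row-wise estimator.
bootstrap_summary <- function(dat, estimator, B = 200) {
  n <- nrow(dat)
  est <- replicate(B, {
    idx <- sample.int(n, n, replace = TRUE)   # resample rows with replacement
    estimator(dat[idx, , drop = FALSE])
  })
  est <- matrix(est, ncol = B)                # one column per bootstrap replicate
  list(mean = rowMeans(est), sd = apply(est, 1, sd))
}

set.seed(1)
dat_sketch <- matrix(rnorm(200, mean = 2), 100, 2)
out <- bootstrap_summary(dat_sketch, colMeans, B = 200)
print(out)
```

For a partially classified sample, the labels would presumably need to be resampled together with the corresponding rows so that each observation keeps its own label.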
Transform a variance matrix into a vector i.e., Sigma=R^T*R
cov2vec(sigma)
sigma |
A p × p variance matrix. |
The variance matrix, a real symmetric positive-definite square matrix, is decomposed by computing its Cholesky factorization; the upper triangular factor of the decomposition is then stored as a vector.
par |
A vector representing a variance matrix. |
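The round trip between a variance matrix and its vector form can be sketched with base R's `chol()`. The helper names `cov_to_vec` and `vec_to_cov` are hypothetical, and the ordering of elements inside the vector (column-major over the upper triangle) is an assumption that may differ from the package's own ordering.

```r
# Hypothetical helpers illustrating the Cholesky parameterisation.
cov_to_vec <- function(sigma) {
  R <- chol(sigma)                     # upper triangular R with sigma = t(R) %*% R
  R[upper.tri(R, diag = TRUE)]         # keep the p(p+1)/2 free elements
}

vec_to_cov <- function(par) {
  # Solve p(p+1)/2 = length(par) for the dimension p.
  p <- round((sqrt(8 * length(par) + 1) - 1) / 2)
  R <- matrix(0, p, p)
  R[upper.tri(R, diag = TRUE)] <- par
  crossprod(R)                         # t(R) %*% R
}

sigma_sketch <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
par_sketch <- cov_to_vec(sigma_sketch)
sigma_back <- vec_to_cov(par_sketch)
```

Working with the Cholesky factor keeps the reconstructed matrix symmetric and positive semi-definite for any real-valued vector, which is convenient for unconstrained optimisation.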
Discriminant function in the particular case of g=2 classes with an equal-covariance matrix
discriminant_beta(pi, mu, sigma)
pi |
A g-dimensional vector for the initial values of the mixing proportions. |
mu |
A p × g matrix of location parameters. |
sigma |
A p × p common covariance matrix. |
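In the particular case of g=2 classes with an equal-covariance matrix, the discriminant function can be expressed as
$$d(y_j;\beta)=\beta_0+\beta^T y_j,$$
where $\beta_0=\log\frac{\pi_1}{\pi_2}-\frac{1}{2}\left(\mu_1^T\Sigma^{-1}\mu_1-\mu_2^T\Sigma^{-1}\mu_2\right)$ and $\beta=\Sigma^{-1}(\mu_1-\mu_2)$.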
beta0 |
An intercept of discriminant function |
beta |
A coefficient of discriminant function |
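As a minimal sketch under the standard two-class homoscedastic Gaussian model, the intercept and coefficient vector of the linear discriminant function can be computed directly from the mixing proportions, the two class means, and the common covariance matrix. The helper name `lda_coefficients` and the toy parameters are assumptions for illustration only.

```r
# Hypothetical helper: linear discriminant coefficients for g = 2 classes
# sharing one covariance matrix.
lda_coefficients <- function(prop, mu, sigma) {
  sinv <- solve(sigma)
  beta <- as.vector(sinv %*% (mu[, 1] - mu[, 2]))
  quad1 <- as.numeric(t(mu[, 1]) %*% sinv %*% mu[, 1])
  quad2 <- as.numeric(t(mu[, 2]) %*% sinv %*% mu[, 2])
  beta0 <- log(prop[1] / prop[2]) - 0.5 * (quad1 - quad2)
  list(beta0 = beta0, beta = beta)
}

# Two equally likely classes with means (1, 0) and (-1, 0), identity covariance.
co <- lda_coefficients(prop = c(0.5, 0.5),
                       mu = cbind(c(1, 0), c(-1, 0)),
                       sigma = diag(2))
print(co)
```

With symmetric means and equal proportions the intercept vanishes, and the coefficient vector points along the difference of the class means.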
Error rate of the Bayes rule for a g-class Gaussian mixture model
erate(dat, p, g, pi = NULL, mu = NULL, sigma = NULL, paralist = NULL, clust)
dat |
An n × p matrix where each row represents an individual observation. |
p |
Dimension of the observation vector. |
g |
Number of multivariate normal classes. |
pi |
A g-dimensional vector for the initial values of the mixing proportions. |
mu |
A p × g matrix of location parameters. |
sigma |
A p × p covariance matrix, or a p × p × g array of class-specific covariance matrices. |
paralist |
A list containing the required parameters |
clust |
An n-dimensional vector of class partition. |
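The error rate of the Bayes rule for a g-class Gaussian mixture model is given by
$$\mathrm{err}(y;\theta)=1-\sum_{i=1}^{g}\pi_i\,\Pr\{R(y;\theta)\in C_i\mid Z\in C_i\}.$$
Here, we write
$$\Pr\{R(y;\theta)\in C_i\mid Z\in C_i\}=\frac{\sum_{j=1}^{n}I_{C_i}(z_j)\,Q[z_j,R(y_j;\theta)]}{\sum_{j=1}^{n}I_{C_i}(z_j)},$$
where $Q[u,v]=1$ if $u=v$ and $Q[u,v]=0$ otherwise, and $I_{C_i}(z_j)$ is an indicator function for the $i$th class.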
errval |
a value of error rate |
n <- 150
pi <- c(0.25, 0.25, 0.25, 0.25)
sigma <- array(0, dim = c(3, 3, 4))
sigma[, , 1] <- diag(1, 3)
sigma[, , 2] <- diag(2, 3)
sigma[, , 3] <- diag(3, 3)
sigma[, , 4] <- diag(4, 3)
mu <- matrix(c(0.2, 0.3, 0.4, 0.2, 0.7, 0.6, 0.1, 0.7, 1.6, 0.2, 1.7, 0.6), 3, 4)
dat <- rmix(n = n, pi = pi, mu = mu, sigma = sigma)
xi <- c(-0.5, 1)
m <- rlabel(dat = dat$Y, pi = pi, mu = mu, sigma = sigma, xi = xi)
zm <- dat$clust
zm[m == 1] <- NA
inits <- initialvalue(g = 4, zm = zm, dat = dat$Y)
fit_pc <- gmmsslm(dat = dat$Y, zm = zm, pi = inits$pi, mu = inits$mu, sigma = inits$sigma, xi = xi, type = 'full')
parlist <- paraextract(fit_pc)
erate(dat = dat$Y, p = 3, g = 4, paralist = parlist, clust = dat$clust)
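An empirical version of this error rate can be sketched from a vector of true classes and a vector of allocated classes: one minus the proportion-weighted sum of per-class accuracies. The helper name `bayes_error_rate` is hypothetical, and the sketch assumes every class has at least one member in `truth`.

```r
# Hypothetical helper: error rate as one minus the weighted per-class accuracy.
bayes_error_rate <- function(prop, truth, pred) {
  g <- length(prop)
  # Fraction of class-i entities that were allocated to class i.
  acc <- sapply(seq_len(g), function(i) mean(pred[truth == i] == i))
  1 - sum(prop * acc)
}

truth <- c(1, 1, 1, 1, 2, 2, 2, 2)
pred  <- c(1, 1, 1, 2, 2, 2, 1, 1)
err_sketch <- bayes_error_rate(prop = c(0.5, 0.5), truth = truth, pred = pred)
print(err_sketch)
```

Here class 1 is recovered with accuracy 3/4 and class 2 with accuracy 2/4, giving an error rate of 1 - (0.5 × 0.75 + 0.5 × 0.5) = 0.375.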
The optimal error rate of Bayes rule for two-class Gaussian homoscedastic model
errorrate(beta0, beta, pi, mu, sigma)
beta0 |
An intercept parameter of the discriminant function coefficients. |
beta |
A vector of discriminant function coefficients. |
pi |
A g-dimensional vector for the initial values of the mixing proportions. |
mu |
A p × g matrix of location parameters. |
sigma |
A p × p common covariance matrix. |
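The optimal error rate of the Bayes rule for the two-class Gaussian homoscedastic model can be expressed as
$$\mathrm{err}(\beta)=\pi_1\,\Phi\!\left\{-\frac{\beta_0+\beta^T\mu_1}{(\beta^T\Sigma\beta)^{1/2}}\right\}+\pi_2\,\Phi\!\left\{\frac{\beta_0+\beta^T\mu_2}{(\beta^T\Sigma\beta)^{1/2}}\right\},$$
where $\Phi$ is the standard normal distribution function, $\mu_i$ is the mean of the $i$th class, and $\Sigma$ is the common covariance matrix.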
errval |
A vector of error rate. |
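Under the usual two-class homoscedastic Gaussian model, the optimal error rate can be evaluated with the standard normal distribution function `pnorm()`. The helper name `optimal_error` and the toy parameters are assumptions for illustration and are not taken from the package.

```r
# Hypothetical helper: optimal error rate of the linear discriminant rule
# beta0 + t(beta) %*% y for two Gaussian classes with common covariance.
optimal_error <- function(beta0, beta, prop, mu, sigma) {
  s <- sqrt(as.numeric(t(beta) %*% sigma %*% beta))
  prop[1] * pnorm(-(beta0 + sum(beta * mu[, 1])) / s) +
    prop[2] * pnorm((beta0 + sum(beta * mu[, 2])) / s)
}

# Means (1, 0) and (-1, 0), identity covariance, equal proportions.
# For these parameters the discriminant coefficients are beta0 = 0, beta = (2, 0).
err_opt <- optimal_error(beta0 = 0, beta = c(2, 0), prop = c(0.5, 0.5),
                         mu = cbind(c(1, 0), c(-1, 0)), sigma = diag(2))
print(err_opt)
```

The two class means are a Mahalanobis distance of 2 apart, so the result equals `pnorm(-1)`, the familiar value for equal proportions.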
The collected dataset is composed of 76 colonoscopic videos (recorded with both White Light (WL) and Narrow Band Imaging (NBI)), the histology (classification ground truth), and the endoscopist's opinion (including 4 experts and 3 beginners). There are $n=76$ observations, and each observation consists of 698 features extracted from colonoscopic videos on patients with gastrointestinal lesions.
http://www.depeca.uah.es/colonoscopy_dataset/
Get posterior probabilities of class membership
get_clusterprobs( dat, n, p, g, pi = NULL, mu = NULL, sigma = NULL, paralist = NULL )
dat |
An n × p matrix where each row represents an individual observation. |
n |
Number of observations. |
p |
Dimension of the observation vector. |
g |
Number of multivariate normal classes. |
pi |
A g-dimensional vector for the initial values of the mixing proportions. |
mu |
A p × g matrix of location parameters. |
sigma |
A p × p covariance matrix, or a p × p × g array of class-specific covariance matrices. |
paralist |
A list containing the required parameters |
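The posterior probability can be expressed as
$$\tau_i(y_j;\theta)=\Pr\{z_{ij}=1\mid y_j\}=\frac{\pi_i\,\phi(y_j;\mu_i,\Sigma_i)}{\sum_{h=1}^{g}\pi_h\,\phi(y_j;\mu_h,\Sigma_h)},$$
where $\phi$ is a normal probability function with mean $\mu_i$ and covariance matrix $\Sigma_i$, and $z_{ij}$ is a zero-one indicator variable denoting the class of origin.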
clusprobs |
Posterior probabilities of class membership for the ith entity |
n <- 150
pi <- c(0.25, 0.25, 0.25, 0.25)
sigma <- array(0, dim = c(3, 3, 4))
sigma[, , 1] <- diag(1, 3)
sigma[, , 2] <- diag(2, 3)
sigma[, , 3] <- diag(3, 3)
sigma[, , 4] <- diag(4, 3)
mu <- matrix(c(0.2, 0.3, 0.4, 0.2, 0.7, 0.6, 0.1, 0.7, 1.6, 0.2, 1.7, 0.6), 3, 4)
dat <- rmix(n = n, pi = pi, mu = mu, sigma = sigma)
tau <- get_clusterprobs(dat = dat$Y, n = 150, p = 3, g = 4, mu = mu, sigma = sigma, pi = pi)
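The posterior probabilities of a Gaussian mixture can be sketched in base R by working on the log scale and normalising each row. The helper names `log_dmvnorm` and `posterior_probs`, the argument name `prop`, and the toy parameters are assumptions for illustration; the package's own computation may be organised differently.

```r
# Hypothetical helper: log-density of a multivariate normal, base R only.
log_dmvnorm <- function(y, mu, sigma) {
  logdet <- as.numeric(determinant(sigma, logarithm = TRUE)$modulus)
  -0.5 * (length(mu) * log(2 * base::pi) + logdet + mahalanobis(y, mu, sigma))
}

# Posterior probabilities tau[j, i] for an n x p matrix y,
# a p x g matrix mu, and a p x p x g array sigma.
posterior_probs <- function(y, prop, mu, sigma) {
  logw <- sapply(seq_along(prop), function(i) {
    log(prop[i]) + log_dmvnorm(y, mu[, i], sigma[, , i])
  })
  logw <- matrix(logw, nrow = nrow(y))      # keep matrix shape when n = 1
  w <- exp(logw - apply(logw, 1, max))      # subtract row maxima for stability
  w / rowSums(w)
}

set.seed(1)
y_sketch <- matrix(rnorm(20), 10, 2)
tau_sketch <- posterior_probs(y_sketch,
                              prop = c(0.4, 0.6),
                              mu = cbind(c(0, 0), c(1, 1)),
                              sigma = array(diag(2), dim = c(2, 2, 2)))
```

Subtracting the row maximum before exponentiating avoids underflow when an observation lies far from every component.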
Shannon entropy
get_entropy(dat, n, p, g, pi = NULL, mu = NULL, sigma = NULL, paralist = NULL)
dat |
An n × p matrix where each row represents an individual observation. |
n |
Number of observations. |
p |
Dimension of the observation vector. |
g |
Number of multivariate normal classes. |
pi |
A g-dimensional vector for the initial values of the mixing proportions. |
mu |
A p × g matrix of location parameters. |
sigma |
A p × p covariance matrix, or a p × p × g array of class-specific covariance matrices. |
paralist |
A list containing the required parameters |
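The concept of information entropy was introduced by Shannon (1948).
The entropy of $y_j$ is formally defined as
$$e_j(y_j;\theta)=-\sum_{i=1}^{g}\tau_i(y_j;\theta)\log\tau_i(y_j;\theta),$$
where $\tau_i(y_j;\theta)$ is the posterior probability that $y_j$ belongs to the $i$th class.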
clusprobs |
The posterior probabilities of the i-th entity that belongs to the j-th group. |
n <- 150
pi <- c(0.25, 0.25, 0.25, 0.25)
sigma <- array(0, dim = c(3, 3, 4))
sigma[, , 1] <- diag(1, 3)
sigma[, , 2] <- diag(2, 3)
sigma[, , 3] <- diag(3, 3)
sigma[, , 4] <- diag(4, 3)
mu <- matrix(c(0.2, 0.3, 0.4, 0.2, 0.7, 0.6, 0.1, 0.7, 1.6, 0.2, 1.7, 0.6), 3, 4)
dat <- rmix(n = n, pi = pi, mu = mu, sigma = sigma)
en <- get_entropy(dat = dat$Y, n = 150, p = 3, g = 4, mu = mu, sigma = sigma, pi = pi)
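Given a matrix of posterior probabilities, the Shannon entropy of each observation is a one-line computation. The sketch below assumes the convention 0 log 0 = 0; the helper name `entropy_from_posterior` and the toy matrix are illustrative only.

```r
# Hypothetical helper: row-wise Shannon entropy of an n x g posterior matrix.
entropy_from_posterior <- function(tau) {
  terms <- ifelse(tau > 0, tau * log(tau), 0)   # treat 0 * log(0) as 0
  -rowSums(terms)
}

tau_toy <- rbind(c(0.5, 0.5),   # maximally uncertain between two classes
                 c(1.0, 0.0),   # fully certain
                 c(0.9, 0.1))
en_sketch <- entropy_from_posterior(tau_toy)
print(en_sketch)
```

The entropy is largest (log 2 for two classes) when the posterior is uniform and zero when one class has posterior probability one, so it acts as a measure of how hard an observation is to classify.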
Fitting Gaussian mixture model to a complete classified dataset or an incomplete classified dataset with/without the missing-data mechanism.
gmmsslm( dat, zm, pi = NULL, mu = NULL, sigma = NULL, paralist = NULL, xi = NULL, type, iter.max = 500, eval.max = 500, rel.tol = 1e-15, sing.tol = 1e-15 )
dat |
An n × p matrix where each row represents an individual observation. |
zm |
An n-dimensional vector containing the class labels including the missing-label denoted as NA. |
pi |
A g-dimensional vector for the initial values of the mixing proportions. |
mu |
A p × g matrix for the initial values of the location parameters. |
sigma |
A p × p covariance matrix, or a p × p × g array of class-specific covariance matrices, holding the initial values. |
paralist |
A list containing the required parameters |
xi |
A 2-dimensional vector containing the initial values of the coefficients in the logistic function of the Shannon entropy. |
type |
One of three types of Gaussian mixture model: 'ign' fits the model to a partially classified sample on the basis of the likelihood that ignores the missing-label mechanism; 'full' fits the model to a partially classified sample on the basis of the full likelihood, taking the missing-label mechanism into account; and 'com' fits the model to a completely classified sample. |
iter.max |
Maximum number of iterations allowed. Defaults to 500 |
eval.max |
Maximum number of evaluations of the objective function allowed. Defaults to 500 |
rel.tol |
Relative tolerance. Defaults to 1e-15 |
sing.tol |
Singular convergence tolerance. Defaults to 1e-15 |
A gmmsslmFit object containing the following slots:
objective |
Value of objective likelihood |
convergence |
Value of convergence |
iteration |
Number of iterations |
obs |
Input data matrix |
n |
Number of observations |
p |
Number of variables |
g |
Number of Gaussian components |
type |
Type of Gaussian mixture model |
pi |
Estimated vector of the mixing proportions |
mu |
Estimated matrix of the location parameters |
sigma |
Estimated covariance matrix or list of covariance matrices |
xi |
Estimated coefficient vector for a logistic function of the Shannon entropy |
n <- 150
pi <- c(0.25, 0.25, 0.25, 0.25)
sigma <- array(0, dim = c(3, 3, 4))
sigma[, , 1] <- diag(1, 3)
sigma[, , 2] <- diag(2, 3)
sigma[, , 3] <- diag(3, 3)
sigma[, , 4] <- diag(4, 3)
mu <- matrix(c(0.2, 0.3, 0.4, 0.2, 0.7, 0.6, 0.1, 0.7, 1.6, 0.2, 1.7, 0.6), 3, 4)
dat <- rmix(n = n, pi = pi, mu = mu, sigma = sigma)
xi <- c(-0.5, 1)
m <- rlabel(dat = dat$Y, pi = pi, mu = mu, sigma = sigma, xi = xi)
zm <- dat$clust
zm[m == 1] <- NA
inits <- initialvalue(g = 4, zm = zm, dat = dat$Y)
fit_pc <- gmmsslm(dat = dat$Y, zm = zm, paralist = inits, xi = xi, type = 'full')
gmmsslmFit objects store the results of fitting Gaussian mixture models using the gmmsslm function.
An S4 class representing the result of fitting a Gaussian mixture model using gmmsslm()
objective
A numeric value representing the objective likelihood.
ncov
A numeric value representing the number of covariance matrices.
convergence
A numeric value representing the convergence value.
iteration
An integer value representing the number of iterations.
obs
A matrix containing the input data.
m
A logical vector representing label indicators.
n
An integer value representing the number of observations.
p
An integer value representing the number of variables.
g
An integer value representing the number of Gaussian components.
type
A character value representing the type of Gaussian mixture model.
pi
A numeric vector representing the mixing proportions.
mu
A matrix representing the location parameters.
sigma
An array representing the covariance matrix or list of covariance matrices.
xi
A numeric value representing the coefficient for a logistic function of the Shannon entropy.
gmmsslm
Initial values for calculating the estimates, based solely on the classified features.
initialvalue(dat, zm, g, ncov = 2)
dat |
An n × p matrix where each row represents an individual observation. |
zm |
An n-dimensional vector containing the class labels including the missing-label denoted as NA. |
g |
Number of multivariate normal classes. |
ncov |
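Option for the structure of the sigma matrix; the default value is 2. |
pi |
A g-dimensional initial vector of the mixing proportions. |
mu |
An initial p × g matrix of the location parameters. |
sigma |
An initial p × p covariance matrix, or a p × p × g array of class-specific covariance matrices. |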
n <- 150
pi <- c(0.25, 0.25, 0.25, 0.25)
sigma <- array(0, dim = c(3, 3, 4))
sigma[, , 1] <- diag(1, 3)
sigma[, , 2] <- diag(2, 3)
sigma[, , 3] <- diag(3, 3)
sigma[, , 4] <- diag(4, 3)
mu <- matrix(c(0.2, 0.3, 0.4, 0.2, 0.7, 0.6, 0.1, 0.7, 1.6, 0.2, 1.7, 0.6), 3, 4)
dat <- rmix(n = n, pi = pi, mu = mu, sigma = sigma)
xi <- c(-0.5, 1)
m <- rlabel(dat = dat$Y, pi = pi, mu = mu, sigma = sigma, xi = xi)
zm <- dat$clust
zm[m == 1] <- NA
initlist <- initialvalue(g = 4, zm = zm, dat = dat$Y, ncov = 2)
Transfer a list into a vector
list2par(p, g, pi, mu, sigma, xi = NULL, type = c("ign", "full", "com"))
p |
Dimension of the observation vector. |
g |
Number of multivariate normal classes. |
pi |
A g-dimensional vector for the initial values of the mixing proportions. |
mu |
A p × g matrix of location parameters. |
sigma |
A p × p covariance matrix, or a p × p × g array of class-specific covariance matrices. |
xi |
A 2-dimensional vector containing the initial values of the coefficients in the logistic function of the Shannon entropy. |
type |
One of three types of model fit: 'ign' fits the model on the basis of the likelihood that ignores the missing-label mechanism; 'full' fits the model on the basis of the full likelihood, taking the missing-label mechanism into account; and 'com' fits the model to a completely classified sample. |
par |
a vector including all list information |
Full log-likelihood function, combining the term that ignores the missing-label mechanism with the term that models it
loglk_full(dat, zm, pi, mu, sigma, xi)
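dat |
An n × p matrix where each row represents an individual observation. |
zm |
An n-dimensional vector containing the class labels including the missing-label denoted as NA. |
pi |
A g-dimensional vector for the initial values of the mixing proportions. |
mu |
A p × g matrix of location parameters. |
sigma |
A p × p covariance matrix, or a p × p × g array of class-specific covariance matrices. |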
xi |
A 2-dimensional vector containing the initial values of the coefficients in the logistic function of the Shannon entropy. |
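The full log-likelihood function can be expressed as
$$\log L_{PC}^{(full)}(\theta,\xi)=\log L_{PC}^{(ig)}(\theta)+\log L_{PC}^{(miss)}(\theta,\xi),$$
where $\log L_{PC}^{(ig)}(\theta)$ is the log-likelihood function formed ignoring the missingness in the labels of the unclassified features, and $\log L_{PC}^{(miss)}(\theta,\xi)$ is the log-likelihood function formed on the basis of the missing-label indicator.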
lk |
Log-likelihood value |
Log-likelihood for partially classified data, ignoring the missing mechanism
loglk_ig(dat, zm, pi, mu, sigma)
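dat |
An n × p matrix where each row represents an individual observation. |
zm |
An n-dimensional vector containing the class labels including the missing-label denoted as NA. |
pi |
A g-dimensional vector for the initial values of the mixing proportions. |
mu |
A p × g matrix of location parameters. |
sigma |
A p × p covariance matrix, or a p × p × g array of class-specific covariance matrices. |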
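The log-likelihood function for partially classified data, ignoring the missing mechanism, can be expressed as
$$\log L_{PC}^{(ig)}(\theta)=\sum_{j=1}^{n}\left[(1-m_j)\sum_{i=1}^{g}z_{ij}\left\{\log\pi_i+\log f_i(y_j;\omega_i)\right\}+m_j\log\left\{\sum_{i=1}^{g}\pi_i f_i(y_j;\omega_i)\right\}\right],$$
where $m_j$ is a missing label indicator, $z_{ij}$ is a zero-one indicator variable defining the known group of origin of each $y_j$, and $f_i(y_j;\omega_i)$ is a probability density function with parameters $\omega_i$.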
lk |
Log-likelihood value. |
Log-likelihood for partially classified data based on the missing mechanism with the Shannon entropy
loglk_miss(dat, zm, pi, mu, sigma, xi)
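dat |
An n × p matrix where each row represents an individual observation. |
zm |
An n-dimensional vector containing the class labels including the missing-label denoted as NA. |
pi |
A g-dimensional vector for the initial values of the mixing proportions. |
mu |
A p × g matrix of location parameters. |
sigma |
A p × p covariance matrix, or a p × p × g array of class-specific covariance matrices. |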
xi |
A 2-dimensional vector containing the initial values of the coefficients in the logistic function of the Shannon entropy. |
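The log-likelihood function formed on the basis of the missing-label indicator can be expressed as
$$\log L_{PC}^{(miss)}(\theta,\xi)=\sum_{j=1}^{n}\left[(1-m_j)\log\left\{1-q(y_j;\theta,\xi)\right\}+m_j\log q(y_j;\theta,\xi)\right],$$
where $q(y_j;\theta,\xi)$ is a logistic function of the Shannon entropy $e_j(y_j;\theta)$, and $m_j$ is a missing label indicator.
lk |
Log-likelihood value |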
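The three log-likelihoods can be sketched together in base R, starting from an n × g matrix `logw` whose (j, i) entry is the log of the mixing proportion times the component density for observation j and class i. The helper name `partial_loglik` is hypothetical, and the form of the missingness probability, a logistic function of the log-entropy with coefficients `xi`, is an assumption made for this sketch rather than a statement about the package's internals.

```r
# Hypothetical helper: ignorable, missingness, and full log-likelihoods.
partial_loglik <- function(logw, zm, xi) {
  m <- is.na(zm)                                    # TRUE where the label is missing
  rowmax <- apply(logw, 1, max)
  lse <- rowmax + log(rowSums(exp(logw - rowmax)))  # log mixture density per row
  tau <- exp(logw - lse)                            # posterior probabilities
  ent <- -rowSums(ifelse(tau > 0, tau * log(tau), 0))
  q <- plogis(xi[1] + xi[2] * log(ent))             # assumed missingness probability
  lab <- logw[cbind(which(!m), zm[!m])]             # entries at the known labels
  ig <- sum(lab) + sum(lse[m])
  miss <- sum(log(1 - q[!m])) + sum(log(q[m]))
  c(ig = ig, miss = miss, full = ig + miss)
}

# Toy values of pi_i * f_i(y_j) for three observations and two classes.
logw_toy <- log(rbind(c(0.30, 0.10),
                      c(0.05, 0.20),
                      c(0.10, 0.10)))
ll <- partial_loglik(logw_toy, zm = c(1, 2, NA), xi = c(-0.5, 1))
print(ll)
```

Classified observations contribute the log of their own component term, unclassified observations contribute the log of the mixture density, and the missingness term rewards parameter values under which high-entropy observations are the ones left unlabelled.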
Log of the sum of the exponentials of a variable vector.
logsumexp(x)
x |
A variable vector. |
val |
Log of the sum of the exponentials of the variable vector. |
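A common way to compute this quantity stably is to factor out the maximum before exponentiating. The sketch below uses the hypothetical name `log_sum_exp` and is not necessarily how the package implements `logsumexp`.

```r
# Hypothetical helper: numerically stable log(sum(exp(x))).
log_sum_exp <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))   # exp() is applied to values <= 0 only
}

x_toy <- c(-1000, -1001, -1002)
naive <- log(sum(exp(x_toy)))   # underflows: exp(-1000) is 0 in double precision
stable <- log_sum_exp(x_toy)
print(c(naive = naive, stable = stable))
```

The naive computation returns `-Inf` because every exponential underflows to zero, whereas the shifted version returns a finite value close to -999.59.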
Convert class indicator into a label matrix.
makelabelmatrix(clust)
clust |
An n-dimensional vector of class partition. |
Z |
A matrix of class indicator. |
cluster <- c(1, 1, 2, 2, 3, 3)
label_matrix <- makelabelmatrix(cluster)
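The conversion from a class vector to a zero-one indicator matrix can be sketched with matrix indexing. The helper name `make_label_matrix` is hypothetical, and the sketch assumes the classes are coded as the integers 1, ..., g with no missing values.

```r
# Hypothetical helper: n x g zero-one indicator matrix from integer class labels.
make_label_matrix <- function(clust) {
  g <- max(clust)
  Z <- matrix(0, nrow = length(clust), ncol = g)
  Z[cbind(seq_along(clust), clust)] <- 1   # set entry (j, clust[j]) to one
  Z
}

Z_sketch <- make_label_matrix(c(1, 1, 2, 2, 3, 3))
print(Z_sketch)
```

Each row contains a single one, in the column of the class that the observation belongs to.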
Negative objective function for gmmsslm
neg_objective_function( dat, zm, g, par, ncov = 2, type = c("ign", "full", "com") )
dat |
An n × p matrix where each row represents an individual observation. |
zm |
An n-dimensional vector of group partition including the missing-label, denoted as NA. |
g |
Number of multivariate Gaussian groups. |
par |
An informative vector including pi, mu, sigma, and xi. |
ncov |
Option for the structure of the sigma matrix; the default value is 2. |
type |
One of three types of model fit: 'ign' fits the model on the basis of the likelihood that ignores the missing-label mechanism; 'full' fits the model on the basis of the full likelihood, taking the missing-label mechanism into account; and 'com' fits the model to a completely classified sample. |
val |
Value of the negative objective function. |
Normalize log-probability.
normalise_logprob(x)
x |
A variable vector. |
val |
The normalized log-probabilities of the variable vector. |
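Normalising a vector of unnormalised log-probabilities amounts to subtracting the log of the sum of their exponentials. The sketch below uses the hypothetical name `normalise_log` and returns values on the log scale; whether the package returns log-probabilities or probabilities is not stated here, so treat the return scale as an assumption.

```r
# Hypothetical helper: normalise log-probabilities so that exp() sums to one.
normalise_log <- function(x) {
  m <- max(x)
  x - (m + log(sum(exp(x - m))))   # subtract the stable log-sum-exp
}

lp <- normalise_log(c(-1000, -1001, -1002))
print(exp(lp))
```

Working on the log scale keeps the result finite even when the raw values are far too small to exponentiate directly.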
Transfer a vector into a list
par2list(par, g, p, ncov = 2, type = c("ign", "full", "com"))
par |
A vector with list information. |
g |
Number of multivariate normal classes. |
p |
Dimension of the observation vector. |
ncov |
Option for the structure of the sigma matrix; the default value is 2. |
type |
One of three types of model fit: 'ign' fits the model on the basis of the likelihood that ignores the missing-label mechanism; 'full' fits the model on the basis of the full likelihood, taking the missing-label mechanism into account; and 'com' fits the model to a completely classified sample. |
parlist |
A list of the parameters, including pi, mu, sigma, and xi. |
This function extracts the parameters from a gmmsslmFit object, including p, g, pi, mu, and sigma.
paraextract(object)
object |
A gmmsslmFit object. |
This function plots the smoothed values of '-log(entropy)' against the missingness mechanism and a boxplot of entropy for labeled vs. unlabeled observations.
plot_missingness( dat, g, parlist, zm, bandwidth = 5, range.x = c(0, 5), ylim = NULL, kernel = "normal" )
dat |
An n × p matrix where each row represents an individual observation. |
g |
Number of multivariate normal classes. |
parlist |
A list containing the required parameters |
zm |
An n-dimensional vector containing the class labels including the missing-label denoted as NA. |
bandwidth |
Bandwidth for kernel smoothing. Default is 5. |
range.x |
Range for x values. Default is c(0, 5). |
ylim |
The y-axis limits in the form of c(ylim[1], ylim[2]). Default is NULL. |
kernel |
Kernel type for smoothing. Default is 'normal'. |
A plot.
This function predicts the labels of the unclassified observations from a gmmsslmFit object.
predict(object)
object |
A gmmsslmFit object. |
Transfer a probability vector into an informative vector
pro2vec(pro)
pro |
A probability vector |
y |
An informative vector |
Generate the missing label indicator
rlabel(dat, pi, mu, sigma, xi)
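dat |
An n × p matrix where each row represents an individual observation. |
pi |
A g-dimensional vector for the initial values of the mixing proportions. |
mu |
A p × g matrix of location parameters. |
sigma |
A p × p covariance matrix, or a p × p × g array of class-specific covariance matrices. |
xi |
A 2-dimensional coefficient vector for a logistic function of the Shannon entropy. |
m |
An n-dimensional vector of missing-label indicators. An element equal to 1 marks an observation whose label is treated as missing, as in the example (zm[m == 1] <- NA). |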
n <- 150
pi <- c(0.25, 0.25, 0.25, 0.25)
sigma <- array(0, dim = c(3, 3, 4))
sigma[, , 1] <- diag(1, 3)
sigma[, , 2] <- diag(2, 3)
sigma[, , 3] <- diag(3, 3)
sigma[, , 4] <- diag(4, 3)
mu <- matrix(c(0.2, 0.3, 0.4, 0.2, 0.7, 0.6, 0.1, 0.7, 1.6, 0.2, 1.7, 0.6), 3, 4)
dat <- rmix(n = n, pi = pi, mu = mu, sigma = sigma)
xi <- c(-0.5, 1)
m <- rlabel(dat = dat$Y, pi = pi, mu = mu, sigma = sigma, xi = xi)
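Generating missing-label indicators from entropies can be sketched as independent Bernoulli draws whose success probability is a logistic function of the entropy. The helper name `draw_missing_labels` is hypothetical, and applying the logistic function to the log-entropy is an assumption of this sketch.

```r
# Hypothetical helper: draw zero-one missing-label indicators from entropies.
draw_missing_labels <- function(ent, xi) {
  q <- plogis(xi[1] + xi[2] * log(ent))   # assumed logistic link on log-entropy
  rbinom(length(ent), size = 1, prob = q)
}

set.seed(1)
ent_toy <- c(0.05, 0.30, 0.69)                 # low, medium, high entropy
q_toy <- plogis(-0.5 + 1 * log(ent_toy))       # missingness probabilities
m_sketch <- draw_missing_labels(rep(ent_toy, each = 50), xi = c(-0.5, 1))
print(q_toy)
```

With a positive slope coefficient, observations with higher entropy are more likely to have their labels withheld, which is the dependence the missing-data mechanism is meant to capture.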
Generate random observations from the normal mixture distributions.
rmix(n, pi, mu, sigma)
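n |
Number of observations. |
pi |
A g-dimensional vector for the initial values of the mixing proportions. |
mu |
A p × g matrix of location parameters. |
sigma |
A p × p covariance matrix, or a p × p × g array of class-specific covariance matrices. |
Y |
An n × p matrix of observations. |
Z |
An n × g matrix of class indicators. |
clust |
An n-dimensional vector of class partition. |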
clust |
An n-dimensional vector of class partition. |
n <- 150
pi <- c(0.25, 0.25, 0.25, 0.25)
sigma <- array(0, dim = c(3, 3, 4))
sigma[, , 1] <- diag(1, 3)
sigma[, , 2] <- diag(2, 3)
sigma[, , 3] <- diag(3, 3)
sigma[, , 4] <- diag(4, 3)
mu <- matrix(c(0.2, 0.3, 0.4, 0.2, 0.7, 0.6, 0.1, 0.7, 1.6, 0.2, 1.7, 0.6), 3, 4)
dat <- rmix(n = n, pi = pi, mu = mu, sigma = sigma)
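Sampling from a normal mixture can be sketched in base R in two steps: draw a class for each observation from the mixing proportions, then draw the observation from that class's normal distribution using the Cholesky factor of its covariance matrix. The helper name `sample_mixture` and the argument name `prop` are assumptions for illustration; the sketch expects a p × g matrix `mu` and a p × p × g array `sigma`.

```r
# Hypothetical helper: draw n observations from a g-class normal mixture.
sample_mixture <- function(n, prop, mu, sigma) {
  g <- length(prop)
  p <- nrow(mu)
  clust <- sample.int(g, n, replace = TRUE, prob = prop)
  Y <- matrix(0, n, p)
  for (i in seq_len(g)) {
    idx <- which(clust == i)
    if (length(idx) == 0) next
    R <- chol(sigma[, , i])                      # sigma_i = t(R) %*% R
    z <- matrix(rnorm(length(idx) * p), ncol = p)
    Y[idx, ] <- z %*% R + matrix(mu[, i], length(idx), p, byrow = TRUE)
  }
  list(Y = Y, clust = clust)
}

set.seed(1)
sigma_toy <- array(0, dim = c(3, 3, 4))
for (i in 1:4) sigma_toy[, , i] <- diag(i, 3)
mu_toy <- matrix(c(0.2, 0.3, 0.4, 0.2, 0.7, 0.6, 0.1, 0.7, 1.6, 0.2, 1.7, 0.6), 3, 4)
sim <- sample_mixture(150, prop = rep(0.25, 4), mu = mu_toy, sigma = sigma_toy)
```

Multiplying standard normal rows by the upper triangular factor gives draws with the required covariance, since the covariance of `z %*% R` is `t(R) %*% R`.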
This function extracts summary information from a gmmsslmFit object, including objective value, ncov, convergence, iteration, and type.
summary(object)
object |
A gmmsslmFit object. |
Transform a vector into a matrix i.e., Sigma=R^T*R
vec2cov(par)
par |
A vector representing a variance matrix |
The vector holds the upper triangular factor R of the Cholesky decomposition of a real symmetric positive-definite square matrix; the variance matrix is recovered from it as Sigma=R^T*R.
sigma |
A variance matrix |
Transfer an informative vector to a probability vector
vec2pro(vec)
vec |
An informative vector |
pro |
A probability vector |