Title: | Semi-Supervised Gaussian Mixture Model with a Missing-Data Mechanism |
---|---|
Description: | Semi-supervised learning based on finite Gaussian mixture models with a missing-data mechanism, for fitting a g-class Gaussian mixture model via maximum likelihood (ML). The labels of the unclassified features are treated as missing data, and a framework for their missingness is introduced, following the pioneering work of Rubin (1976) on missing data in incomplete data analysis. This dependence in the missingness pattern can be leveraged to provide additional information about the optimal classifier as specified by Bayes' rule. |
Authors: | Ziyang Lyu, Daniel Ahfock, Geoffrey J. McLachlan |
Maintainer: | Ziyang Lyu <[email protected]> |
License: | GPL-3 |
Version: | 1.1.1 |
Built: | 2025-02-17 04:29:28 UTC |
Source: | https://github.com/cran/EMMIXSSL |
A classifier based on Bayes' rule, that is, on the maximum a posteriori probabilities of class membership.
Classifier_Bayes(dat, n, p, g, pi, mu, sigma, ncov = 2)
dat |
An n x p matrix where each row represents an individual observation. |
n |
Number of observations. |
p |
Dimension of the observation vector. |
g |
Number of classes. |
pi |
A g-dimensional vector of the mixing proportions. |
mu |
A p x g matrix with each column being the location parameter (mean) of a class. |
sigma |
A p x p common covariance matrix if ncov = 1, or a p x p x g array of class covariance matrices if ncov = 2. |
ncov |
Options for the structure of the sigma matrix; the default value is 2: ncov = 1 for a common covariance matrix, ncov = 2 for unrestricted class-specific covariance matrices. |
The posterior probability can be expressed as
$$\tau_i(y_j;\theta)=\mathrm{pr}\{Z_{ij}=1\mid y_j\}=\frac{\pi_i\,\phi(y_j;\mu_i,\Sigma_i)}{\sum_{h=1}^{g}\pi_h\,\phi(y_j;\mu_h,\Sigma_h)},$$
where $\phi(y_j;\mu_i,\Sigma_i)$ is a normal probability (density) function with mean $\mu_i$ and covariance matrix $\Sigma_i$, and $Z_{ij}$ is a zero-one indicator variable denoting the class of origin.
The Bayes' classifier of allocation assigns an entity with feature vector $y_j$ to class $C_k$ if
$$k=\arg\max_{i}\,\tau_i(y_j;\theta).$$
cluster |
A vector of the class membership. |
n<-150
pi<-c(0.25,0.25,0.25,0.25)
sigma<-array(0,dim=c(3,3,4))
sigma[,,1]<-diag(1,3)
sigma[,,2]<-diag(2,3)
sigma[,,3]<-diag(3,3)
sigma[,,4]<-diag(4,3)
mu<-matrix(c(0.2,0.3,0.4,0.2,0.7,0.6,0.1,0.7,1.6,0.2,1.7,0.6),3,4)
dat<-rmix(n=n,pi=pi,mu=mu,sigma=sigma,ncov=2)
cluster<-Classifier_Bayes(dat=dat$Y,n=150,p=3,g=4,mu=mu,sigma=sigma,pi=pi,ncov=2)
Transform a variance matrix into a vector, i.e., Sigma = R^T R
cov2vec(sigma)
sigma |
A variance matrix |
The variance matrix is decomposed by computing the Cholesky factorization of a real symmetric positive-definite square matrix; the upper triangular factor R of the decomposition is then stored in a vector.
par A vector representing a variance matrix
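A minimal usage sketch (not taken from the package examples), assuming vec2cov() reverses the operation, as the shared Sigma = R^T R description suggests:
sigma <- diag(1, 3) + 0.5   # a symmetric positive-definite 3 x 3 matrix
v <- cov2vec(sigma)         # upper triangular Cholesky factor stored as a vector
vec2cov(v)                  # expected to reconstruct sigma (up to numerical error)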
Discriminant function in the particular case of g=2 classes with an equal-covariance matrix
discriminant_beta(pi, mu, sigma)
pi |
A 2-dimensional vector of the mixing proportions. |
mu |
A p x 2 matrix with each column being the mean of one of the two classes. |
sigma |
The common p x p covariance matrix of the two classes. |
The discriminant function in the particular case of g=2 classes with an equal-covariance matrix can be expressed as
$$d(y_j)=\beta_0+\beta^{T}y_j,$$
where $\beta_0=\log(\pi_1/\pi_2)-\tfrac{1}{2}(\mu_1+\mu_2)^{T}\Sigma^{-1}(\mu_1-\mu_2)$ and $\beta=\Sigma^{-1}(\mu_1-\mu_2)$.
beta0 |
The intercept of the discriminant function. |
beta |
The coefficient vector of the discriminant function. |
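An illustrative sketch (not from the package examples) for a two-class homoscedastic model; the parameter values are arbitrary, and sigma is assumed to be the common covariance matrix of the two classes:
pi <- c(0.5, 0.5)
mu <- matrix(c(0, 0, 0, 1, 1, 1), 3, 2)   # p = 3 features, g = 2 class means
sigma <- diag(1, 3)                       # common covariance matrix
db <- discriminant_beta(pi = pi, mu = mu, sigma = sigma)
db$beta0                                  # intercept of the discriminant function
db$beta                                   # coefficient vector of the discriminant function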
Fitting a Gaussian mixture model to a completely classified dataset, or to a partially classified dataset with or without the missing-data mechanism.
EMMIXSSL( dat, zm, pi, mu, sigma, ncov, xi = NULL, type, iter.max = 500, eval.max = 500, rel.tol = 1e-06, sing.tol = 1e-20 )
dat |
An n x p matrix where each row represents an individual observation. |
zm |
An n-dimensional vector containing the class labels, with missing labels denoted as NA. |
pi |
A g-dimensional vector of initial values for the mixing proportions. |
mu |
A p x g matrix of initial values for the location parameters, with each column corresponding to a class. |
sigma |
Initial values for the covariance structure: a p x p common covariance matrix if ncov = 1, or a p x p x g array of class covariance matrices if ncov = 2. |
ncov |
Options for the structure of the sigma matrix; the default value is 2: ncov = 1 for a common covariance matrix, ncov = 2 for unrestricted class-specific covariance matrices. |
xi |
A 2-dimensional vector containing the initial values of the coefficients in the logistic function of the Shannon entropy. |
type |
Three types of Gaussian mixture models: 'ign' indicates fitting the model to a partially classified sample on the basis of the likelihood that ignores the missing-label mechanism, 'full' indicates fitting the model to a partially classified sample on the basis of the full likelihood, taking the missing-label mechanism into account, and 'com' indicates fitting the model to a completely classified sample. |
iter.max |
Maximum number of iterations allowed. Defaults to 500. |
eval.max |
Maximum number of evaluations of the objective function allowed. Defaults to 500. |
rel.tol |
Relative tolerance. Defaults to 1e-6. |
sing.tol |
Singular convergence tolerance; defaults to 1e-20. |
objective |
Value of the objective log-likelihood. |
convergence |
Convergence code returned by the optimizer. |
iteration |
Number of iterations. |
pi |
Estimated vector of the mixing proportions. |
mu |
Estimated matrix of the location parameters. |
sigma |
Estimated covariance matrix, or array of covariance matrices when ncov = 2. |
xi |
Estimated coefficient vector for a logistic function of the Shannon entropy |
n<-150
pi<-c(0.25,0.25,0.25,0.25)
sigma<-array(0,dim=c(3,3,4))
sigma[,,1]<-diag(1,3)
sigma[,,2]<-diag(2,3)
sigma[,,3]<-diag(3,3)
sigma[,,4]<-diag(4,3)
mu<-matrix(c(0.2,0.3,0.4,0.2,0.7,0.6,0.1,0.7,1.6,0.2,1.7,0.6),3,4)
dat<-rmix(n=n,pi=pi,mu=mu,sigma=sigma,ncov=2)
xi<-c(-0.5,1)
m<-rlabel(dat=dat$Y,pi=pi,mu=mu,sigma=sigma,xi=xi,ncov=2)
zm<-dat$clust
zm[m==1]<-NA
inits<-initialvalue(g=4,zm=zm,dat=dat$Y,ncov=2)
## Not run:
fit_pc<-EMMIXSSL(dat=dat$Y,zm=zm,pi=inits$pi,mu=inits$mu,sigma=inits$sigma,xi=xi,type='full',ncov=2)
## End(Not run)
The optimal error rate of the Bayes rule for the two-class Gaussian homoscedastic model
errorrate(beta0, beta, pi, mu, sigma)
beta0 |
The intercept of the discriminant function. |
beta |
The p-dimensional coefficient vector of the discriminant function. |
pi |
A 2-dimensional vector of the mixing proportions. |
mu |
A p x 2 matrix with each column being the mean of one of the two classes. |
sigma |
The common p x p covariance matrix of the two classes. |
The optimal error rate of the Bayes rule for the two-class Gaussian homoscedastic model can be expressed as
$$\mathrm{err}(\theta)=\pi_1\,\Phi\!\left\{\frac{-(\beta_0+\beta^{T}\mu_1)}{(\beta^{T}\Sigma\beta)^{1/2}}\right\}+\pi_2\,\Phi\!\left\{\frac{\beta_0+\beta^{T}\mu_2}{(\beta^{T}\Sigma\beta)^{1/2}}\right\},$$
where $\Phi$ denotes the standard normal distribution function, and the feature vector in class $i$ follows a normal distribution with mean $\mu_i$ and common covariance matrix $\Sigma$.
errval |
The optimal error rate of the Bayes rule. |
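A hedged sketch (not from the package examples) that combines discriminant_beta() and errorrate() for the same illustrative two-class homoscedastic setup:
pi <- c(0.5, 0.5)
mu <- matrix(c(0, 0, 0, 1, 1, 1), 3, 2)
sigma <- diag(1, 3)
db <- discriminant_beta(pi = pi, mu = mu, sigma = sigma)
errorrate(beta0 = db$beta0, beta = db$beta, pi = pi, mu = mu, sigma = sigma)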
A panel of seven endoscopists viewed the videos and determined whether each patient required resection (malignant) or no resection (benign).
http://www.depeca.uah.es/colonoscopy_dataset/
Gastrointestinal trinary ground truth (Adenoma, Serrated, and Hyperplastic)
http://www.depeca.uah.es/colonoscopy_dataset/
The collected dataset is composed of 76 colonoscopic videos (recorded with both White Light (WL) and Narrow Band Imaging (NBI)), the histology (classification ground truth), and the endoscopists' opinions (including 4 experts and 3 beginners). There are n = 76 observations, and each observation consists of 698 features extracted from colonoscopic videos on patients with gastrointestinal lesions.
http://www.depeca.uah.es/colonoscopy_dataset/
Get posterior probabilities of class membership
get_clusterprobs(dat, n, p, g, pi, mu, sigma, ncov = 2)
dat |
An n x p matrix where each row represents an individual observation. |
n |
Number of observations. |
p |
Dimension of the observation vector. |
g |
Number of multivariate normal classes. |
pi |
A g-dimensional vector of the mixing proportions. |
mu |
A p x g matrix with each column being the location parameter (mean) of a class. |
sigma |
A p x p common covariance matrix if ncov = 1, or a p x p x g array of class covariance matrices if ncov = 2. |
ncov |
Options for the structure of the sigma matrix; the default value is 2: ncov = 1 for a common covariance matrix, ncov = 2 for unrestricted class-specific covariance matrices. |
The posterior probability can be expressed as
$$\tau_i(y_j;\theta)=\mathrm{pr}\{Z_{ij}=1\mid y_j\}=\frac{\pi_i\,\phi(y_j;\mu_i,\Sigma_i)}{\sum_{h=1}^{g}\pi_h\,\phi(y_j;\mu_h,\Sigma_h)},$$
where $\phi(y_j;\mu_i,\Sigma_i)$ is a normal probability (density) function with mean $\mu_i$ and covariance matrix $\Sigma_i$, and $Z_{ij}$ is a zero-one indicator variable denoting the class of origin.
clusprobs |
Posterior probabilities of class membership for the i-th entity. |
n<-150
pi<-c(0.25,0.25,0.25,0.25)
sigma<-array(0,dim=c(3,3,4))
sigma[,,1]<-diag(1,3)
sigma[,,2]<-diag(2,3)
sigma[,,3]<-diag(3,3)
sigma[,,4]<-diag(4,3)
mu<-matrix(c(0.2,0.3,0.4,0.2,0.7,0.6,0.1,0.7,1.6,0.2,1.7,0.6),3,4)
dat<-rmix(n=n,pi=pi,mu=mu,sigma=sigma,ncov=2)
tau<-get_clusterprobs(dat=dat$Y,n=150,p=3,g=4,mu=mu,sigma=sigma,pi=pi,ncov=2)
Shannon entropy
get_entropy(dat, n, p, g, pi, mu, sigma, ncov = 2)
dat |
An n x p matrix where each row represents an individual observation. |
n |
Number of observations. |
p |
Dimension of the observation vector. |
g |
Number of multivariate normal classes. |
pi |
A g-dimensional vector of the mixing proportions. |
mu |
A p x g matrix with each column being the location parameter (mean) of a class. |
sigma |
A p x p common covariance matrix if ncov = 1, or a p x p x g array of class covariance matrices if ncov = 2. |
ncov |
Options for the structure of the sigma matrix; the default value is 2: ncov = 1 for a common covariance matrix, ncov = 2 for unrestricted class-specific covariance matrices. |
The concept of information entropy was introduced by Shannon (1948).
The entropy of the feature vector $y_j$ is formally defined as
$$e_j(y_j;\theta)=-\sum_{i=1}^{g}\tau_i(y_j;\theta)\log\tau_i(y_j;\theta),$$
where $\tau_i(y_j;\theta)$ is the posterior probability that the entity with feature vector $y_j$ belongs to class $i$.
clusprobs |
The posterior probabilities that the i-th entity belongs to the j-th group. |
n<-150
pi<-c(0.25,0.25,0.25,0.25)
sigma<-array(0,dim=c(3,3,4))
sigma[,,1]<-diag(1,3)
sigma[,,2]<-diag(2,3)
sigma[,,3]<-diag(3,3)
sigma[,,4]<-diag(4,3)
mu<-matrix(c(0.2,0.3,0.4,0.2,0.7,0.6,0.1,0.7,1.6,0.2,1.7,0.6),3,4)
dat<-rmix(n=n,pi=pi,mu=mu,sigma=sigma,ncov=2)
en<-get_entropy(dat=dat$Y,n=150,p=3,g=4,mu=mu,sigma=sigma,pi=pi,ncov=2)
Initial values for calculating the estimates based solely on the classified features.
initialvalue(dat, zm, g, ncov = 2)
dat |
An n x p matrix where each row represents an individual observation. |
zm |
An n-dimensional vector containing the class labels, with missing labels denoted as NA. |
g |
Number of multivariate normal classes. |
ncov |
Options for the structure of the sigma matrix; the default value is 2: ncov = 1 for a common covariance matrix, ncov = 2 for unrestricted class-specific covariance matrices. |
pi |
An initial g-dimensional vector of the mixing proportions. |
mu |
An initial p x g matrix of the location parameters. |
sigma |
An initial p x p common covariance matrix (ncov = 1) or p x p x g array of class covariance matrices (ncov = 2). |
n<-150
pi<-c(0.25,0.25,0.25,0.25)
sigma<-array(0,dim=c(3,3,4))
sigma[,,1]<-diag(1,3)
sigma[,,2]<-diag(2,3)
sigma[,,3]<-diag(3,3)
sigma[,,4]<-diag(4,3)
mu<-matrix(c(0.2,0.3,0.4,0.2,0.7,0.6,0.1,0.7,1.6,0.2,1.7,0.6),3,4)
dat<-rmix(n=n,pi=pi,mu=mu,sigma=sigma,ncov=2)
xi<-c(-0.5,1)
m<-rlabel(dat=dat$Y,pi=pi,mu=mu,sigma=sigma,xi=xi,ncov=2)
zm<-dat$clust
zm[m==1]<-NA
inits<-initialvalue(g=4,zm=zm,dat=dat$Y,ncov=2)
Convert a parameter list into a vector
list2par( p, g, pi, mu, sigma, ncov = 2, xi = NULL, type = c("ign", "full", "com") )
p |
Dimension of the observation vector. |
g |
Number of multivariate normal classes. |
pi |
A g-dimensional vector of the mixing proportions. |
mu |
A p x g matrix with each column being the location parameter (mean) of a class. |
sigma |
A p x p common covariance matrix if ncov = 1, or a p x p x g array of class covariance matrices if ncov = 2. |
ncov |
Options for the structure of the sigma matrix; the default value is 2: ncov = 1 for a common covariance matrix, ncov = 2 for unrestricted class-specific covariance matrices. |
xi |
A 2-dimensional vector containing the initial values of the coefficients in the logistic function of the Shannon entropy. |
type |
Three types of model fit: 'ign' indicates fitting the model on the basis of the likelihood that ignores the missing-label mechanism, 'full' indicates fitting the model on the basis of the full likelihood, taking the missing-label mechanism into account, and 'com' indicates fitting the model to a completely classified sample. |
par |
A vector containing all of the list information. |
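A small sketch (not from the package examples) that packs the parameter values used elsewhere in this manual into a single vector; xi is supplied because type = 'full' is assumed:
pi <- c(0.25, 0.25, 0.25, 0.25)
sigma <- array(0, dim = c(3, 3, 4))
sigma[,,1] <- diag(1, 3); sigma[,,2] <- diag(2, 3)
sigma[,,3] <- diag(3, 3); sigma[,,4] <- diag(4, 3)
mu <- matrix(c(0.2,0.3,0.4,0.2,0.7,0.6,0.1,0.7,1.6,0.2,1.7,0.6), 3, 4)
par <- list2par(p = 3, g = 4, pi = pi, mu = mu, sigma = sigma, ncov = 2,
                xi = c(-0.5, 1), type = 'full')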
Full log-likelihood function, comprising both the term that ignores the missing-label mechanism and the missing-label term
loglk_full(dat, zm, pi, mu, sigma, ncov = 2, xi)
dat |
An n x p matrix where each row represents an individual observation. |
zm |
An n-dimensional vector containing the class labels, with missing labels denoted as NA. |
pi |
A g-dimensional vector of the mixing proportions. |
mu |
A p x g matrix with each column being the location parameter (mean) of a class. |
sigma |
A p x p common covariance matrix if ncov = 1, or a p x p x g array of class covariance matrices if ncov = 2. |
ncov |
Options for the structure of the sigma matrix; the default value is 2: ncov = 1 for a common covariance matrix, ncov = 2 for unrestricted class-specific covariance matrices. |
xi |
A 2-dimensional vector containing the initial values of the coefficients in the logistic function of the Shannon entropy. |
The full log-likelihood function can be expressed as
$$\log L_{PC}^{(\mathrm{full})}(\Psi)=\log L_{PC}^{(\mathrm{ig})}(\theta)+\log L_{PC}^{(\mathrm{miss})}(\Psi),$$
where $\log L_{PC}^{(\mathrm{ig})}(\theta)$ is the log-likelihood function formed by ignoring the missingness in the labels of the unclassified features, and $\log L_{PC}^{(\mathrm{miss})}(\Psi)$ is the log-likelihood function formed on the basis of the missing-label indicators.
lk |
Log-likelihood value |
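A hedged sketch (not from the package examples) that evaluates loglk_full() at the generating parameter values, using the same simulation setup as the other examples in this manual:
n <- 150
pi <- c(0.25, 0.25, 0.25, 0.25)
sigma <- array(0, dim = c(3, 3, 4))
sigma[,,1] <- diag(1, 3); sigma[,,2] <- diag(2, 3)
sigma[,,3] <- diag(3, 3); sigma[,,4] <- diag(4, 3)
mu <- matrix(c(0.2,0.3,0.4,0.2,0.7,0.6,0.1,0.7,1.6,0.2,1.7,0.6), 3, 4)
xi <- c(-0.5, 1)
dat <- rmix(n = n, pi = pi, mu = mu, sigma = sigma, ncov = 2)
m <- rlabel(dat = dat$Y, pi = pi, mu = mu, sigma = sigma, xi = xi, ncov = 2)
zm <- dat$clust
zm[m == 1] <- NA              # labels flagged as missing
loglk_full(dat = dat$Y, zm = zm, pi = pi, mu = mu, sigma = sigma, ncov = 2, xi = xi)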
Log-likelihood for partially classified data, ignoring the missing-label mechanism
loglk_ig(dat, zm, pi, mu, sigma, ncov = 2)
dat |
An n x p matrix where each row represents an individual observation. |
zm |
An n-dimensional vector containing the class labels, with missing labels denoted as NA. |
pi |
A g-dimensional vector of the mixing proportions. |
mu |
A p x g matrix with each column being the location parameter (mean) of a class. |
sigma |
A p x p common covariance matrix if ncov = 1, or a p x p x g array of class covariance matrices if ncov = 2. |
ncov |
Options for the structure of the sigma matrix; the default value is 2: ncov = 1 for a common covariance matrix, ncov = 2 for unrestricted class-specific covariance matrices. |
The log-likelihood function for partially classified data, ignoring the missing-label mechanism, can be expressed as
$$\log L_{PC}^{(\mathrm{ig})}(\theta)=\sum_{j=1}^{n}\left[(1-m_j)\sum_{i=1}^{g}z_{ij}\log\{\pi_i\,\phi(y_j;\mu_i,\Sigma_i)\}+m_j\log\left\{\sum_{i=1}^{g}\pi_i\,\phi(y_j;\mu_i,\Sigma_i)\right\}\right],$$
where $m_j$ is a missing-label indicator, $z_{ij}$ is a zero-one indicator variable defining the known group of origin of each classified feature vector, and $\phi(y_j;\mu_i,\Sigma_i)$ is a normal probability density function with parameters $\mu_i$ and $\Sigma_i$.
lk |
Log-likelihood value. |
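A compact sketch (not from the package examples) with an illustrative two-class setup; labels are dropped completely at random here, since loglk_ig() ignores the missingness mechanism:
pi <- c(0.5, 0.5)
mu <- matrix(c(0, 0, 2, 2), 2, 2)                     # p = 2, g = 2
sigma <- array(c(diag(2), diag(2)), dim = c(2, 2, 2))
dat <- rmix(n = 100, pi = pi, mu = mu, sigma = sigma, ncov = 2)
zm <- dat$clust
zm[sample(100, 40)] <- NA                             # remove 40 labels at random
loglk_ig(dat = dat$Y, zm = zm, pi = pi, mu = mu, sigma = sigma, ncov = 2)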
Log-likelihood for partially classified data based on the missing-label mechanism with the Shannon entropy
loglk_miss(dat, zm, pi, mu, sigma, ncov = 2, xi)
dat |
An n x p matrix where each row represents an individual observation. |
zm |
An n-dimensional vector containing the class labels, with missing labels denoted as NA. |
pi |
A g-dimensional vector of the mixing proportions. |
mu |
A p x g matrix with each column being the location parameter (mean) of a class. |
sigma |
A p x p common covariance matrix if ncov = 1, or a p x p x g array of class covariance matrices if ncov = 2. |
ncov |
Options for the structure of the sigma matrix; the default value is 2: ncov = 1 for a common covariance matrix, ncov = 2 for unrestricted class-specific covariance matrices. |
xi |
A 2-dimensional vector containing the initial values of the coefficients in the logistic function of the Shannon entropy. |
The log-likelihood function formed on the basis of the missing-label indicator can be expressed as
$$\log L_{PC}^{(\mathrm{miss})}(\Psi)=\sum_{j=1}^{n}\left[(1-m_j)\log\{1-q(y_j;\theta,\xi)\}+m_j\log q(y_j;\theta,\xi)\right],$$
where $q(y_j;\theta,\xi)=\dfrac{1}{1+\exp\{-\xi_0-\xi_1 e_j(y_j;\theta)\}}$ is a logistic function of the Shannon entropy $e_j(y_j;\theta)$, and $m_j$ is a missing-label indicator.
lk |
Log-likelihood value. |
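A compact sketch (not from the package examples) in which the missing labels are generated by rlabel() under the entropy-based mechanism, and loglk_miss() is evaluated at the generating values:
pi <- c(0.5, 0.5)
mu <- matrix(c(0, 0, 2, 2), 2, 2)
sigma <- array(c(diag(2), diag(2)), dim = c(2, 2, 2))
xi <- c(-0.5, 1)
dat <- rmix(n = 100, pi = pi, mu = mu, sigma = sigma, ncov = 2)
m <- rlabel(dat = dat$Y, pi = pi, mu = mu, sigma = sigma, xi = xi, ncov = 2)
zm <- dat$clust
zm[m == 1] <- NA
loglk_miss(dat = dat$Y, zm = zm, pi = pi, mu = mu, sigma = sigma, ncov = 2, xi = xi)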
Logarithm of the sum of the exponentials of a variable vector.
logsumexp(x)
x |
A variable vector. |
val |
The logarithm of the sum of the exponentials of the elements of x. |
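A minimal sketch, assuming logsumexp(x) computes log(sum(exp(x))) in a numerically stable way:
x <- c(1000, 1000, 1000)
log(sum(exp(x)))     # overflows to Inf if computed naively
logsumexp(x)         # expected to be approximately 1000 + log(3)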
Convert a class indicator into a label matrix.
makelabelmatrix(clust)
clust |
An n-dimensional vector of class partition. |
Z |
A matrix of class indicator. |
cluster<-c(1,1,2,2,3,3)
label_matrix<-makelabelmatrix(cluster)
Negative objective function for EMMIXSSL
neg_objective_function( dat, zm, g, par, ncov = 2, type = c("ign", "full", "com") )
dat |
An n x p matrix where each row represents an individual observation. |
zm |
An n-dimensional vector of the group partition, with missing labels denoted as NA. |
g |
Number of multivariate Gaussian groups. |
par |
An informative parameter vector containing the mixing proportions, location parameters, and covariance parameters (and the coefficients xi when type = 'full'), as produced by list2par. |
ncov |
Options for the structure of the sigma matrix; the default value is 2: ncov = 1 for a common covariance matrix, ncov = 2 for unrestricted class-specific covariance matrices. |
type |
Three types of model fit: 'ign' indicates fitting the model on the basis of the likelihood that ignores the missing-label mechanism, 'full' indicates fitting the model on the basis of the full likelihood, taking the missing-label mechanism into account, and 'com' indicates fitting the model to a completely classified sample. |
val |
Value of the negative objective function. |
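A hedged sketch (not from the package examples), assuming the parameter vector produced by list2par() matches the layout expected by neg_objective_function():
pi <- c(0.5, 0.5)
mu <- matrix(c(0, 0, 2, 2), 2, 2)
sigma <- array(c(diag(2), diag(2)), dim = c(2, 2, 2))
dat <- rmix(n = 100, pi = pi, mu = mu, sigma = sigma, ncov = 2)
zm <- dat$clust
zm[sample(100, 40)] <- NA
par <- list2par(p = 2, g = 2, pi = pi, mu = mu, sigma = sigma, ncov = 2, type = 'ign')
neg_objective_function(dat = dat$Y, zm = zm, g = 2, par = par, ncov = 2, type = 'ign')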
Normalize log-probability.
normalise_logprob(x)
x |
A variable vector. |
val |
The normalized log-probabilities of the variable vector. |
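A minimal sketch, assuming normalise_logprob() shifts a vector of unnormalized log-probabilities so that they sum to one on the probability scale:
x <- c(-1005, -1000, -1002)   # unnormalized log-probabilities
lp <- normalise_logprob(x)
sum(exp(lp))                  # expected to be 1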
Convert a vector into a parameter list
par2list(par, g, p, ncov = 2, type = c("ign", "full"))
par |
A vector containing the list information (as produced by list2par). |
g |
Number of multivariate normal classes. |
p |
Dimension of the observation vector. |
ncov |
Options for the structure of the sigma matrix; the default value is 2: ncov = 1 for a common covariance matrix, ncov = 2 for unrestricted class-specific covariance matrices. |
type |
Three types of model fit: 'ign' indicates fitting the model on the basis of the likelihood that ignores the missing-label mechanism, 'full' indicates fitting the model on the basis of the full likelihood, taking the missing-label mechanism into account, and 'com' indicates fitting the model to a completely classified sample. |
parlist |
A list containing the parameters (pi, mu, sigma, and xi when type = 'full'). |
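A hedged round-trip sketch, assuming par2list() inverts list2par() when called with matching g, p, ncov, and type:
pi <- c(0.5, 0.5)
mu <- matrix(c(0, 0, 2, 2), 2, 2)
sigma <- array(c(diag(2), diag(2)), dim = c(2, 2, 2))
par <- list2par(p = 2, g = 2, pi = pi, mu = mu, sigma = sigma, ncov = 2, type = 'ign')
parlist <- par2list(par = par, g = 2, p = 2, ncov = 2, type = 'ign')
parlist$pi                    # expected to recover the original mixing proportions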
Convert a probability vector into an informative vector
pro2vec(pro)
pro |
A probability vector |
y An informative vector
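A minimal sketch, assuming vec2pro() reverses pro2vec():
pro <- c(0.25, 0.25, 0.25, 0.25)
v <- pro2vec(pro)    # informative (unconstrained) representation
vec2pro(v)           # expected to recover the probability vector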
Generate the missing label indicator
rlabel(dat, pi, mu, sigma, ncov = 2, xi)
dat |
An n x p matrix where each row represents an individual observation. |
pi |
A g-dimensional vector of the mixing proportions. |
mu |
A p x g matrix with each column being the location parameter (mean) of a class. |
sigma |
A p x p common covariance matrix if ncov = 1, or a p x p x g array of class covariance matrices if ncov = 2. |
ncov |
Options for the structure of the sigma matrix; the default value is 2: ncov = 1 for a common covariance matrix, ncov = 2 for unrestricted class-specific covariance matrices. |
xi |
A 2-dimensional coefficient vector for a logistic function of the Shannon entropy. |
m |
An n-dimensional vector of missing-label indicators; an element equals 1 if the label of the corresponding observation is missing and 0 otherwise. |
n<-150
pi<-c(0.25,0.25,0.25,0.25)
sigma<-array(0,dim=c(3,3,4))
sigma[,,1]<-diag(1,3)
sigma[,,2]<-diag(2,3)
sigma[,,3]<-diag(3,3)
sigma[,,4]<-diag(4,3)
mu<-matrix(c(0.2,0.3,0.4,0.2,0.7,0.6,0.1,0.7,1.6,0.2,1.7,0.6),3,4)
dat<-rmix(n=n,pi=pi,mu=mu,sigma=sigma,ncov=2)
xi<-c(-0.5,1)
m<-rlabel(dat=dat$Y,pi=pi,mu=mu,sigma=sigma,xi=xi,ncov=2)
Generate random observations from the normal mixture distributions.
rmix(n, pi, mu, sigma, ncov = 2)
n |
Number of observations. |
pi |
A g-dimensional vector of the mixing proportions. |
mu |
A p x g matrix with each column being the location parameter (mean) of a class. |
sigma |
A p x p common covariance matrix if ncov = 1, or a p x p x g array of class covariance matrices if ncov = 2. |
ncov |
Options for the structure of the sigma matrix; the default value is 2: ncov = 1 for a common covariance matrix, ncov = 2 for unrestricted class-specific covariance matrices. |
Y |
An n x p matrix of the generated observations. |
Z |
An n x g matrix of zero-one class-indicator variables. |
clust |
An n-dimensional vector of class partition. |
n<-150
pi<-c(0.25,0.25,0.25,0.25)
sigma<-array(0,dim=c(3,3,4))
sigma[,,1]<-diag(1,3)
sigma[,,2]<-diag(2,3)
sigma[,,3]<-diag(3,3)
sigma[,,4]<-diag(4,3)
mu<-matrix(c(0.2,0.3,0.4,0.2,0.7,0.6,0.1,0.7,1.6,0.2,1.7,0.6),3,4)
dat<-rmix(n=n,pi=pi,mu=mu,sigma=sigma,ncov=2)
Transform a vector into a variance matrix, i.e., Sigma = R^T R
vec2cov(par)
par |
A vector representing a variance matrix |
The vector is taken to contain the upper triangular factor R of the Cholesky decomposition, and the variance matrix is reconstructed as Sigma = R^T R.
sigma A variance matrix
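A minimal round-trip sketch, assuming vec2cov() reverses cov2vec():
sigma <- matrix(c(2, 0.3, 0.3, 1), 2, 2)   # symmetric positive-definite
vec2cov(cov2vec(sigma))                    # expected to reconstruct sigma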
Convert an informative vector into a probability vector
vec2pro(vec)
vec |
An informative vector |
pro A probability vector
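A minimal sketch mirroring the pro2vec() example, assuming the two functions are mutually inverse:
pro <- c(0.2, 0.3, 0.5)
vec <- pro2vec(pro)
vec2pro(vec)          # expected to return c(0.2, 0.3, 0.5) (up to numerical error)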