Version: | 0.1-9 |
Date: | 2016-09-22 |
Title: | Sparse Discriminant Analysis |
Author: | Line Clemmensen <lhc@imm.dtu.dk>, contributions by Max Kuhn |
Maintainer: | Max Kuhn <mxkuhn@gmail.com> |
Imports: | elasticnet, MASS, mda |
Depends: | R (≥ 2.10) |
Description: | Performs sparse linear discriminant analysis for Gaussians and mixture of Gaussian models. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
URL: | http://www.imm.dtu.dk/~lhc, https://github.com/topepo/sparselda |
NeedsCompilation: | no |
Packaged: | 2016-09-22 14:49:09 UTC; kuhna03 |
Repository: | CRAN |
Date/Publication: | 2016-09-22 17:10:01 |
Normalize training data
Description
Normalizes a vector or matrix to zero-mean, unit-length columns.
Usage
normalize(X)
Arguments
X |
a matrix with the training data with observations down the rows and variables in the columns. |
Details
The function can, for example, be used to normalize the training data for sda or smda.
Value
Returns a list with the following components:
Xc |
The normalized data. |
mx |
Mean of columns of X. |
vx |
Length of columns of X. |
Id |
Logical vector indicating which columns of X are retained in Xc; columns with zero length are omitted. |
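The normalization described above can be sketched in base R as follows. This is an illustration only: it assumes that "length" means the Euclidean norm of each centered column, and normalize_sketch is a hypothetical name, so the package's own normalize() may differ in detail.

```r
# Hypothetical helper (not part of sparseLDA): zero-mean, unit-length columns.
normalize_sketch <- function(X) {
  X  <- as.matrix(X)
  mx <- colMeans(X)                     # column means
  Xc <- sweep(X, 2, mx)                 # subtract each column's mean
  vx <- sqrt(colSums(Xc^2))             # Euclidean length of each centered column
  Id <- vx != 0                         # flag zero-length columns for removal
  Xc <- sweep(Xc[, Id, drop = FALSE], 2, vx[Id], "/")  # scale to unit length
  list(Xc = Xc, mx = mx, vx = vx, Id = Id)
}

# Toy data: the second column is constant, so its centered length is zero
X  <- cbind(c(1, 2, 3), c(5, 5, 5), c(2, 4, 9))
Nm <- normalize_sketch(X)
round(colMeans(Nm$Xc), 12)          # column means are (numerically) zero
round(sqrt(colSums(Nm$Xc^2)), 12)   # column lengths are one
which(!Nm$Id)                       # index of the dropped column
```

Returning mx, vx and Id alongside Xc is what allows the same transformation to be replayed later on test data.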
Author(s)
Line Clemmensen
References
Clemmensen, L., Hastie, T. and Ersboell, K. (2008) "Sparse discriminant analysis", Technical report, IMM, Technical University of Denmark
See Also
Examples
## Data
X<-matrix(sample(seq(3),12,replace=TRUE),nrow=3)
## Normalize data
Nm<-normalize(X)
print(Nm$Xc)
## See if any variables have been removed
which(!Nm$Id)
Normalize test data
Description
Normalizes test data using the output of normalize() applied to the training data.
Usage
normalizetest(Xtst,Xn)
Arguments
Xtst |
a matrix with the test data with observations down the rows and variables in the columns. |
Xn |
List with the output from normalize(Xtr) of the training data. |
Details
The function can, for example, be used to normalize the test data for sda or smda.
Value
Returns the normalized test data
Xtst |
The normalized data. |
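The key point is that test data are shifted and scaled with the training statistics (mx, vx, Id), not with their own. A minimal base-R sketch, assuming the list layout documented for normalize(); normalizetest_sketch is a hypothetical name and the training statistics are computed inline so the block runs on its own.

```r
# Hypothetical helper (not part of sparseLDA): reuse training statistics on new data.
normalizetest_sketch <- function(Xtst, Xn) {
  Xtst <- as.matrix(Xtst)
  Xtst <- sweep(Xtst, 2, Xn$mx)                 # subtract the training column means
  Xtst <- Xtst[, Xn$Id, drop = FALSE]           # keep only columns retained in training
  sweep(Xtst, 2, Xn$vx[Xn$Id], "/")             # divide by the training column lengths
}

# Training statistics for a toy matrix (second column is constant)
Xtr <- cbind(c(1, 2, 3), c(5, 5, 5), c(2, 4, 9))
mx  <- colMeans(Xtr)
vx  <- sqrt(colSums(sweep(Xtr, 2, mx)^2))
Xn  <- list(mx = mx, vx = vx, Id = vx != 0)

Xtst <- rbind(c(2, 7, 5), c(4, 1, 1))
Xt   <- normalizetest_sketch(Xtst, Xn)
Xt
```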
Author(s)
Line Clemmensen
References
Clemmensen, L., Hastie, T. and Ersboell, K. (2007) "Sparse discriminant analysis", Technical report, IMM, Technical University of Denmark
See Also
Examples
## Data
Xtr<-matrix(sample(seq(3),12,replace=TRUE),nrow=3)
Xtst<-matrix(sample(seq(3),12,replace=TRUE),nrow=3)
## Normalize training data
Nm<-normalize(Xtr)
## Normalize test data
Xtst<-normalizetest(Xtst,Nm)
Data set of three species of Penicillium fungi
Description
The data set penicilliumYES has 36 rows and 3754 columns. The variables are
first-order statistics from multi-spectral images of three species of Penicillium fungi:
Melanoconidium, Polonicum, and Venetum.
These are the data used in the Clemmensen et al. "Sparse Discriminant Analysis" paper.
Usage
data(penicilliumYES)
Format
This data set contains the following matrices:
- X
A matrix with 36 rows and 3754 columns containing the training and test data. The first 12 rows are P. Melanoconidium samples, rows 13-24 are P. Polonicum samples, and the last 12 rows are P. Venetum samples. The samples are ordered so that each consecutive group of three comes from the same isolate.
- Y
A matrix of dummy variables for the training data.
- Z
A matrix of probabilities for the subclasses of the training data.
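The row ordering described for X can be turned into species and isolate labels directly. The sketch below rebuilds both for all 36 rows and forms an indicator (dummy) matrix; since penicilliumYES$Y covers only the training rows, Yall is an illustration of the coding, not a copy of Y.

```r
# Labels implied by the documented row ordering of X (12 rows per species)
species <- factor(rep(c("P. Melanoconidium", "P. Polonicum", "P. Venetum"),
                      each = 12))
isolate <- rep(seq_len(12), each = 3)   # consecutive triples share an isolate

# Indicator coding: one column per species, a single 1 in each row
Yall <- outer(as.integer(species), seq_len(nlevels(species)), "==") * 1
colnames(Yall) <- levels(species)

colSums(Yall)                           # 12 samples per species
# Each isolate belongs to exactly one species
tapply(as.character(species), isolate, function(s) length(unique(s)))
```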
Details
The X matrix is not normalized.
Source
References
Clemmensen, Hansen, Frisvad, Ersboell (2007) "A method for comparison of growth media in objective identification of Penicillium based on multi-spectral imaging" Journal of Microbiological Methods
Predict method for Sparse Discriminant Methods
Description
Prediction functions for sda and smda.
Usage
## S3 method for class 'sda'
predict(object, newdata = NULL, ...)
## S3 method for class 'smda'
predict(object, newdata = NULL, ...)
Arguments
object |
an object of class |
newdata |
a matrix or data frame of predictors |
... |
arguments passed to |
Details
For mixture discriminant models, the current implementation predicts the subclass probabilities.
Value
A list with components:
class |
The classification (a factor) |
posterior |
posterior probabilities for the classes (or subclasses for |
x |
the scores |
Sparse discriminant analysis
Description
Performs sparse linear discriminant analysis, using an alternating minimization algorithm to minimize the SDA criterion.
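For orientation, the SDA criterion can be written, in our reading of the cited paper, as an optimal scoring problem with an elastic-net penalty on each discriminant direction beta_k (k = 1, ..., Q). Here X is the normalized n x p data, Y the n x K indicator matrix, theta_k the optimal scores, lambda_2 stands for the lambda argument and t for the L1 bound controlled by stop; this mapping of arguments to symbols is our assumption.

```latex
\min_{\beta_k,\,\theta_k}\; \lVert Y\theta_k - X\beta_k \rVert_2^2 + \lambda_2 \lVert \beta_k \rVert_2^2
\quad \text{subject to} \quad
\lVert \beta_k \rVert_1 \le t, \qquad
\tfrac{1}{n}\,\theta_k^\top Y^\top Y \theta_k = 1, \qquad
\theta_k^\top Y^\top Y \theta_l = 0 \;\; (l < k).
```

The alternating scheme then fixes theta_k and solves an elastic net regression of Y theta_k on X for beta_k, fixes beta_k and updates theta_k subject to the two constraints, and repeats until the change in RSS falls below tol or maxIte iterations are reached.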
Usage
sda(x, ...)
## Default S3 method:
sda(x, y, lambda = 1e-6, stop = -p, maxIte = 100,
Q = K-1, trace = FALSE, tol = 1e-6, ...)
Arguments
x |
A matrix of the training data with observations down the rows and variables in the columns. |
y |
A matrix initializing the dummy variables representing the groups; as the examples show, a factor of class labels can also be supplied. |
lambda |
The weight on the L2-norm for elastic net regression. Default: 1e-6. |
stop |
If stop is negative, its absolute value corresponds to the desired number of variables. If stop is positive, it is an upper bound on the L1-norm of the b coefficients. There is a one-to-one correspondence between stop and t. The default is -p (minus the number of variables). |
maxIte |
Maximum number of iterations. Default: 100. |
Q |
Number of components. Maximum and default is K-1 (the number of classes less one). |
trace |
If TRUE, prints out its progress. Default: FALSE. |
tol |
Tolerance for the stopping criterion (change in RSS). Default is 1e-6. |
... |
additional arguments |
Details
The function finds sparse directions for linear classification.
Value
Returns a list with the following components:
beta |
The loadings of the sparse discriminative directions. |
theta |
The optimal scores. |
rss |
A vector of the Residual Sum of Squares at each iteration. |
varNames |
Names of the included variables. |
Author(s)
Line Clemmensen, modified by Trevor Hastie
References
Clemmensen, L., Hastie, T., Witten, D. and Ersboell, K. (2011) "Sparse discriminant analysis", Technometrics, To appear.
See Also
normalize, normalizetest, smda
Examples
## load data
data(penicilliumYES)
X <- penicilliumYES$X
Y <- penicilliumYES$Y
colnames(Y) <- c("P. Melanoconidium",
"P. Polonicum",
"P. Venetum")
## test samples
Iout<-c(3,6,9,12)
Iout<-c(Iout,Iout+12,Iout+24)
## training data
Xtr<-X[-Iout,]
k<-3
n<-dim(Xtr)[1]
## Normalize data
Xc<-normalize(Xtr)
Xn<-Xc$Xc
p<-dim(Xn)[2]
## Perform SDA with one non-zero loading for each discriminative
## direction with Y as matrix input
out <- sda(Xn, Y,
lambda = 1e-6,
stop = -1,
maxIte = 25,
trace = TRUE)
## predict training samples
train <- predict(out, Xn)
## testing
Xtst<-X[Iout,]
Xtst<-normalizetest(Xtst,Xc)
test <- predict(out, Xtst)
print(test$class)
## Factor Y as input
Yvec <- factor(rep(colnames(Y), each = 8))
out2 <- sda(Xn, Yvec,
lambda = 1e-6,
stop = -1,
maxIte = 25,
trace = TRUE)
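The held-out indices in the example follow from the row ordering of penicilliumYES$X. The sketch below, which uses only base R and the documented ordering, shows that Iout removes the third replicate of every isolate, leaving 8 training samples per species in class order, which is why factor(rep(colnames(Y), each = 8)) matches the training rows.

```r
# Held-out rows, as in the example above
Iout <- c(3, 6, 9, 12)
Iout <- c(Iout, Iout + 12, Iout + 24)

# Labels implied by the documented row ordering of X
species <- rep(c("P. Melanoconidium", "P. Polonicum", "P. Venetum"), each = 12)
isolate <- rep(seq_len(12), each = 3)   # consecutive triples share an isolate

table(isolate[Iout])     # every isolate contributes exactly one test row
table(species[-Iout])    # 8 training rows per species
```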
Sparse mixture discriminant analysis
Description
Performs sparse linear discriminant analysis for mixture-of-Gaussians models.
Usage
smda(x, ...)
## Default S3 method:
smda(x, y, Z = NULL, Rj = NULL,
lambda = 1e-6, stop, maxIte = 50, Q=R-1,
trace = FALSE, tol = 1e-4, ...)
Arguments
x |
A matrix of the training data with observations down the rows and variables in the columns. |
y |
A matrix initializing the dummy variables representing the groups. |
Z |
An optional matrix initializing the probabilities representing the groups. |
Rj |
A vector of length K containing the number of subclasses in each of the K classes. |
lambda |
The weight on the L2-norm for elastic net regression. Default: 1e-6. |
stop |
If stop is negative, its absolute value corresponds to the desired number of variables. If stop is positive, it is an upper bound on the L1-norm of the b coefficients. There is a one-to-one correspondence between stop and t. |
maxIte |
Maximum number of iterations. Default: 50. |
Q |
The number of components to include. Maximum and default is R-1 (total number of subclasses less one). |
trace |
If TRUE, prints out its progress. Default: FALSE. |
tol |
Tolerance for the stopping criterion (change in RSS). Default: 1e-4. |
... |
additional arguments |
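To make the roles of Rj, Z and Q concrete, here is a sketch of one simple way to build an initial subclass-probability matrix: each observation spreads its probability uniformly over the Rj[k] subclasses of its own class k, so Z has R = sum(Rj) columns and the default Q is R - 1. The helper init_Z_uniform is hypothetical, and treating rows of Z as probability vectors is our assumption; smda may initialize Z differently when Z = NULL.

```r
# Hypothetical helper (not part of sparseLDA): uniform initial subclass probabilities.
init_Z_uniform <- function(cls, Rj) {
  cls    <- as.integer(cls)            # class index 1..K for each observation
  R      <- sum(Rj)                    # total number of subclasses
  ends   <- cumsum(Rj)                 # last subclass column of each class
  starts <- ends - Rj + 1              # first subclass column of each class
  Z <- matrix(0, nrow = length(cls), ncol = R)
  for (i in seq_along(cls)) {
    k <- cls[i]
    Z[i, starts[k]:ends[k]] <- 1 / Rj[k]   # spread mass over class k's subclasses
  }
  Z
}

Rj  <- rep(4, 3)                  # 4 subclasses per class, as in the example below
cls <- rep(1:3, each = 8)         # 24 training observations, 8 per class
Z0  <- init_Z_uniform(cls, Rj)
Q_default <- sum(Rj) - 1          # default Q = R - 1
dim(Z0)
```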
Details
The function finds sparse directions for linear classification of mixture-of-Gaussians models.
Value
Returns a list with the following components:
call |
The call |
beta |
The loadings of the sparse discriminative directions. |
theta |
The optimal scores. |
Z |
Updated subclass probabilities. |
Rj |
A vector of the number of subclasses per class. |
rss |
A vector of the Residual Sum of Squares at each iteration. |
Author(s)
Line Clemmensen
References
Clemmensen, L., Hastie, T., Witten, D. and Ersboell, K. (2011) "Sparse discriminant analysis", Technometrics, To appear.
See Also
Examples
## load data
data(penicilliumYES)
X <- penicilliumYES$X
Y <- penicilliumYES$Y
Z <- penicilliumYES$Z
## test samples
Iout <- c(3, 6, 9, 12)
Iout <- c(Iout, Iout+12, Iout+24)
## training data
Xtr <- X[-Iout,]
k <- 3
n <- dim(Xtr)[1]
Rj <- rep(4, 3)
## Normalize data
Xc <- normalize(Xtr)
Xn <- Xc$Xc
p <- dim(Xn)[2]
## perform SMDA with one non-zero loading for each discriminative
## direction
## Not run:
smdaFit <- smda(x = Xn,
y = Y,
Z = Z,
Rj = Rj,
lambda = 1e-6,
stop = -5,
maxIte = 10,
tol = 1e-2)
# testing
Xtst <- X[Iout,]
Xtst <- normalizetest(Xtst, Xc)
test <- predict(smdaFit, Xtst)
## End(Not run)