% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/GFA.R
\name{gfa}
\alias{gfa}
\title{Gibbs sampling for group factor analysis}
\usage{
gfa(Y, opts, K = NULL, projection = NULL, filename = "")
}
\arguments{
\item{Y}{Either \enumerate{
\item{Data sources with co-occurring samples: a list of data
matrices, where Y[[m]] is a numeric \eqn{N \times D_m} matrix, or}
\item{Data sources paired in two modes (some data sources share the
samples of the first data source, and some share its features): a list with
two elements, each structured as in option 1. The data collections Y[[1]] and
Y[[2]] should be connected by sharing their first data source, i.e.
Y[[1]][[1]] should equal the transpose of Y[[2]][[1]].}
}
NOTE: The data features should have roughly zero mean and unit variance.
If this is not the case, preprocessing with function
\code{\link{normalizeData}} is recommended.}

\item{opts}{List of model options; see function \code{\link{getDefaultOpts}}.}

\item{K}{The number of components (i.e. latent variables). Recommended to be
set somewhat higher than the expected component number, so that the sampler
can determine the model complexity by shutting down excess components.
High values result in high CPU time. Default: half of the minimum of the
sample size and total data dimensionality.}

\item{projection}{Fixed projections. Only intended for sequential prediction
use via function \code{\link{sequentialGfaPrediction}}. Default: NULL.}

\item{filename}{A string. If provided, the sampling chain is saved to this
file every 100 iterations. Default: "", which disables saving.}
}
\value{
A list containing the model parameters. When the data are paired in two
  modes, each element is a list of length 2, with one element for each mode.
  For most parameters, the final posterior sample is provided to aid in
  initial checks; all the posterior samples should be used for model
  analysis. The list elements are:
\item{W}{The loading matrix (final posterior sample); \eqn{D \times K}
         matrix.}
\item{X}{The latent variables (final sample); \eqn{N \times K} matrix.}
\item{Z}{The spike-and-slab parameters (final sample); \eqn{D \times K}
         matrix.}
\item{r}{The probability of slab in Z (final sample).}
\item{rz}{The probability of slab in the spike-and-slab prior of X
        (final sample).}
\item{tau}{The noise precisions (final sample); D-element vector.}
\item{alpha}{The precisions of the projection weights W (final sample);
        \eqn{D \times K} matrix.}
\item{beta}{The precisions of the latent variables X (final sample);
        \eqn{N \times K} matrix.}
\item{groups}{A list denoting which features belong to each data source.}
\item{D}{Data dimensionalities; M-element vector.}
\item{K}{The number of components inferred. May be less than the initial K.}
and the following elements:
\item{posterior}{The posterior samples of, by default, X, W and tau.}
\item{cost}{The likelihood of all the posterior samples.}
\item{aic}{The Akaike information criterion of all the posterior samples.}
\item{opts}{The options used for the GFA model.}
\item{conv}{An estimate of the convergence of the model's reconstruction
        based on the Geweke diagnostic. Values significantly above 0.05 imply a
        non-converged model, and hence the need for a longer sampling chain.}
\item{time}{The CPU time (in seconds) used to sample the model.}
}
\description{
\code{gfa} returns posterior samples of a group factor analysis model.
}
\details{
GFA allows factor analysis of multiple data sources (i.e. data sets).
The priors of the model can be set to infer bicluster structure
from the data sources; see \code{\link{getDefaultOpts}}.
Missing values (NAs) are inherently supported. They will not affect the model
parameters, but can be predicted with function \code{\link{reconstruction}},
based on the observed values of the corresponding sample and feature.
The association of a data source to each component is inferred based on
the data. Letting only a subset of the components explain a data source
results in the posterior identifying relationships between any subset of the
data sources. In the extreme cases, a component can explain relationships
within a single data source only ("structured noise"), or across all the data
sources.
}
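\examples{
# A minimal usage sketch under stated assumptions, not the package's own
# example. The simulated data, the sizes (20 samples, 10 + 20 features) and
# K = 5 are illustrative choices. Base R scale() stands in here for the
# recommended normalizeData(), and getDefaultOpts() is assumed to be callable
# without arguments. Wrapped in \dontrun because sampling can take a while.
\dontrun{
set.seed(1)
N <- 20
X <- matrix(rnorm(N * 3), N, 3)                       # latent variables
W <- matrix(rnorm(30 * 3), 30, 3)                     # loadings
Y <- tcrossprod(X, W) + matrix(rnorm(N * 30), N, 30)  # noisy observations
Y <- scale(Y)                       # roughly zero mean, unit variance
Y <- list(Y[, 1:10], Y[, 11:30])    # two data sources sharing the samples

opts <- getDefaultOpts()            # default model options
res <- gfa(Y, opts = opts, K = 5)   # K set above the 3 simulated components

res$K        # number of components kept; may be less than 5
dim(res$W)   # loading matrix of the final posterior sample
res$conv     # convergence estimate; values well above 0.05 suggest more sampling
}
}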
