\documentclass[doc]{apa}
%\documentclass[11pt]{article}
%\documentclass[11pt]{amsart}
\usepackage{geometry}                % See geometry.pdf to learn the layout options. There are lots.
\geometry{letterpaper}                   % ... or a4paper or a5paper or ... 
%\geometry{landscape}                % Activate for rotated page geometry
\usepackage[parfill]{parskip}    % Activate to begin paragraphs with an empty line rather than an indent
\usepackage{graphicx}
\usepackage{amssymb}
\usepackage{epstopdf}
\usepackage{mathptmx}
\usepackage{helvet}
\usepackage{courier}
\usepackage{makeidx}        % allows index generation
\usepackage[authoryear,round]{natbib} 
\usepackage{gensymb}
\usepackage{longtable}
%\usepackage{geometry}     
\usepackage{/Library/Frameworks/R.framework/Versions/2.7/Resources/share/texmf/Sweave}
%\usepackage[ae]{Rd}
%\usepackage[usenames]{color}
%\usepackage{setspace}
\usepackage{amsmath}
%\DeclareGraphicsRule{.tif}{png}{.png}{`convert #1 `dirname #1`/`basename #1 .tif`.png}
\bibstyle{apacite}
\bibliographystyle{apa}   %this one plus author year seems to work?
\usepackage{hyperref}

\DeclareGraphicsRule{.tif}{png}{.png}{`convert #1 `dirname #1`/`basename #1 .tif`.png}
% \VignetteIndexEntry{An R Package for ...}
% \VignetteDepends{foo, bar, ...}
% \VignetteKeyword{multivariate}
% \VignetteKeyword{models}
% \VignetteKeyword{Hplot}
\usepackage{multicol}        % used for the two-column index
\usepackage[bottom]{footmisc}% places footnotes at page bottom
\let\proglang=\textsf
\newcommand{\R}{\proglang{R}}
%\newcommand{\pkg}[1]{{\normalfont\fontseries{b}\selectfont #1}}
\newcommand{\Rfunction}[1]{{\texttt{#1}}} 
\newcommand{\fun}[1]{{\texttt{#1}\index{#1}\index{R function!#1}}}
\newcommand{\pfun}[1]{{\texttt{#1}\index{#1}\index{R function!#1}\index{R function!psych package!#1}}}
\newcommand{\Rc}[1]{{\texttt{#1}}}    %R command  same as Robject
\newcommand{\Robject}[1]{{\texttt{#1}}} 
\newcommand{\Rpkg}[1]{{\textit{#1}\index{#1}\index{R package!#1}}}   %different from pkg  - which is better?
\newcommand{\iemph}[1]{{\emph{#1}\index{#1}}} 
\newcommand{\wrc}[1]{\marginpar{\textcolor{blue}{#1}}}   %bill's comments
\newcommand{\wra}[1]{\textcolor{blue}{#1}}  %bill's comments

\newcommand{\ve}[1]{{\textbf{#1}}} %trying to get a vector command


\makeindex         % used for the subject index

\title{Using the psych package to generate and test structural models}
\author{William Revelle}

\affiliation{Northwestern University}
\acknowledgements{Written to accompany the psych package. \\ Comments should be directed to William Revelle \\  \url{revelle@northwestern.edu}}
%\date{}                                         % Activate to display a given date or no date


\begin{document}
\maketitle
%\section{}
%\subsection{}
\tableofcontents
\newpage
\section{The psych package}

\subsection{Preface}
The \Rpkg{psych} package \cite{psych} has been developed to include those functions most useful for teaching and learning basic psychometrics and personality theory. Functions have been developed for many parts of the analysis of test data, including basic descriptive statistics (\pfun{describe} and \pfun{pairs.panels}), dimensionality analysis (\pfun{ICLUST}, \pfun{VSS}, \pfun{principal}, \pfun{factor.pa}), reliability analysis (\pfun{omega}, \pfun{guttman}) and eventual scale construction (\pfun{cluster.cor}, \pfun{score.items}).   The use of these and other functions is described in more detail in the complete user's manual and the relevant help pages.  This vignette is concerned with the problem of modeling structural data and using the \Rpkg{psych} package as a front end for the much more powerful \Rpkg{sem} package of John Fox \cite{fox:06,sem}.

\subsection{Creating and modeling structural relations}
One common application of the \Rpkg{psych} package is the creation of simulated data matrices with particular structures to use as examples for principal components analysis, factor analysis, cluster analysis, and structural equation modeling.   This vignette describes some of the functions used for creating, analyzing, and displaying such data sets.  The examples use two other packages: \Rpkg{Rgraphviz} and \Rpkg{sem}.  Although not required to use the \Rpkg{psych} package, these two packages are required for these examples. \Rpkg{Rgraphviz} is used for the graphical displays; the analyses themselves require only the \Rpkg{sem} package to do the structural modeling.



\section{Functions for generating correlational matrices with a particular structure}
The \pfun{sim} family of functions creates data sets with a particular structure.  Most of these functions have default values that will produce useful examples.  Although graphical summaries of these structures will be shown here, some of the options of the graphical displays will be discussed in a later section.

\subsection{sim.congeneric}
Classical test theory considers tests to be \iemph{tau} equivalent if they have the same covariance with a vector of  latent true scores, but perhaps different error variances.  Tests are considered \iemph{congeneric} if they each have the same true score component (perhaps to a different degree) and independent error components.  The \pfun{sim.congeneric} function may be used to generate either structure.
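
As a minimal sketch of these definitions (the symbols $\lambda_i$, $\tau$, and $\epsilon_i$ are notation assumed here for illustration, not taken from the package), a congeneric test $x_i$ can be written as a weighted true score plus independent error.  With standardized variables and a unit-variance true score, the correlation between two tests is the product of their loadings:
\begin{align*}
x_i &= \lambda_i \tau + \epsilon_i, \\
r_{ij} &= \lambda_i \lambda_j \quad (i \neq j).
\end{align*}
For loadings of .8 and .7 this gives $r = .8 \times .7 = .56$; in a tau equivalent example with all loadings equal to .8, every correlation is $.8 \times .8 = .64$.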

<<print=FALSE,echo=TRUE>>=
tau <- sim.congeneric(loads=c(.8,.8,.8,.8)) #population values
tau.samp <- sim.congeneric(loads=c(.8,.8,.8,.8),N=100) # sample correlation matrix for 100 cases
round(tau.samp,2)  
tau.samp <- sim.congeneric(loads=c(.8,.8,.8,.8),N=100, short=FALSE) 
tau.samp
dim(tau.samp$observed)
@
In this last case, the generated data are retrieved from tau.samp\$observed.

Congeneric data are created by specifying unequal loading values.  The default is loadings of c(.8,.7,.6,.5). As seen in Figure~\ref{fig:tau}, tau equivalence is the special case where all paths are equal.
<<print=FALSE,echo=TRUE>>=
cong <- sim.congeneric(N=100)
round(cong,2)
@

\begin{figure}[htbp]
\begin{center}
<<print=FALSE,echo=FALSE, fig=TRUE,eps=FALSE>>=
m1 <- structure.graph(c("a","b","c","d"))
@
\caption{Tau equivalent tests are special cases of congeneric tests.  Tau equivalence assumes $a=b=c=d$.}
\label{fig:tau}
\end{center}
\end{figure}



\subsection{sim.hierarchical}
The previous function, \pfun{sim.congeneric}, is used when one factor accounts for the pattern of correlations.  A slightly more complicated model is when one broad factor and several narrower factors are observed.  An example of this structure might be the structure of mental abilities, where there is a broad factor of general ability and several narrower factors (e.g., spatial ability, verbal ability, working memory capacity).   Another example is in the measurement of psychopathology, where a broad general factor of neuroticism is seen along with more specific anxiety, depression, and aggression factors.  This kind of structure may be simulated with \pfun{sim.hierarchical} by specifying the loadings of each subfactor on a general factor (the g-loadings) as well as the loadings of individual items on the lower order factors (the f-loadings).  An early paper describing a \iemph{bifactor} structure was by \cite{holzinger:37}.  A helpful description of what makes a good general factor is that of \cite{jensen:weng}.

<<print=FALSE,echo=TRUE>>=
gload <- matrix(c(.9,.8,.7),nrow=3)
fload <- matrix(c(.9,.8,.7,rep(0,9),.7,.6,.5,
rep(0,9),.6,.5,.4),   ncol=3)
bifact <- sim.hierarchical(gload=gload,fload=fload)
round(bifact,2)
@
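The structure of this matrix can be checked by hand.  As a sketch, assuming the usual higher order factor model in which the lower order factors correlate only through $g$ (the symbols $f_i$ and $g_a$ are illustrative notation for the f-loadings and g-loadings), the implied correlations are
\begin{align*}
r_{ij} &= f_i f_j && \text{(items $i$ and $j$ on the same factor)}, \\
r_{ij} &= f_i \, g_a g_b \, f_j && \text{(item $i$ on factor $a$, item $j$ on factor $b$, $a \neq b$)}.
\end{align*}
With the loadings above, V1 and V2 (both on the first factor) should correlate $.9 \times .8 = .72$, and V1 and V4 (on the first and second factors) should correlate $.9 \times (.9 \times .8) \times .7 \approx .45$.
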
These data can be represented as either a \iemph{bifactor} (Figure~\ref{fig:bifact}) or \iemph{hierarchical} (Figure~\ref{fig:hierarch}) factor solution.
\begin{figure}[htbp]
\begin{center}
<<print=FALSE,echo=FALSE, fig=TRUE,eps=FALSE>>=
m.bi <- omega(bifact,title="A bifactor model")
@
\caption{A bifactor solution represents each test in terms of a general factor and a residualized group factor.}
\label{fig:bifact}
\end{center}
\end{figure}

\begin{figure}[htbp]
\begin{center}
<<print=FALSE,echo=FALSE, fig=TRUE,eps=FALSE>>=
m.hi <- omega(bifact,sl=FALSE,title="A hierarchical model")
@
\caption{A hierarchical factor solution has g as a second order factor accounting for the correlations between the first order factors.}
\label{fig:hierarch}
\end{center}
\end{figure}


\subsection{sim.item and sim.circ}
Many personality questionnaires are thought to represent multiple, independent factors.  A particularly interesting case is when there are two factors and the items either have \iemph{simple structure} or \iemph{circumplex structure}.  Examples of such items with a circumplex structure are measures of emotion \citep{rafaeli:revelle:06} where many different emotion terms can be arranged in a two dimensional space, but where there is no obvious clustering of items.  Typical personality scales are constructed to have simple structure, where items load on one and only one factor.  

An additional  challenge to measurement with emotion or personality items is that the items can be highly skewed and are assessed with a small number of discrete categories (do not agree, somewhat agree, strongly agree).  

The more general \pfun{sim.item} function and the more specific \pfun{sim.circ} function simulate items with a two dimensional structure, with or without skew, and with a varying number of response categories.


\subsection{sim.structural}
A more general case, handled by \pfun{sim.structural}, is to consider three matrices, $\vec{f}_x$, $\vec{f}_y$, and $\vec{\phi}_{xy}$, which describe, in turn, a measurement model of x variables, $\vec{f}_x$, a measurement model of y variables, $\vec{f}_y$, and a covariance matrix between and within the two sets of factors, $\vec{\phi}_{xy}$.  If $\vec{f}_x$ is a vector and $\vec{f}_y$ and $\vec{\phi}_{xy}$ are NULL, then this is just the congeneric model.  If $\vec{f}_x$ is a matrix of loadings with n rows and c columns, then this is a measurement model for n variables across c factors.  If $\vec{\phi}_{xy}$ is not NULL, but $\vec{f}_y$ is NULL, then the factors in $\vec{f}_x$ are correlated.  Finally, if all three matrices are not NULL, then the data show the standard linear structural relations (LISREL) structure.
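
In matrix terms, a sketch of the correlation structure implied by these three matrices (assuming standardized variables, and writing $\vec{F}$ for the combined loading matrix, a symbol introduced here for illustration) is
\[
\vec{F} = \begin{pmatrix} \vec{f}_x & 0 \\ 0 & \vec{f}_y \end{pmatrix}, \qquad
\vec{R} = \vec{F} \, \vec{\phi}_{xy} \, \vec{F}' \quad \text{with the diagonal of $\vec{R}$ set to 1.}
\]
Using values that appear in the examples that follow: an x variable loading .9 on a first factor and an x variable loading .7 on a second factor that correlates .5 with the first have an implied correlation of $.9 \times .5 \times .7 = .315$.  Likewise, an x variable loading .9 on a factor that correlates .1 with the y factor, and a y variable loading .6 on that y factor, have an implied correlation of $.9 \times .1 \times .6 = .054$.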

Consider the following examples:

\begin{enumerate}
\item $\vec{f}_x$ is a vector implies a congeneric model:
<<print=FALSE,echo=TRUE>>=
fx <- c(.9,.8,.7,.6)
cong1 <- sim.structural(f=fx)
cong1
@


\item $\vec{f}_x$ is a matrix implies an independent factors model:
<<print=FALSE,echo=TRUE>>=
fx  <- matrix(c(.9,.8,.7,rep(0,9),.7,.6,.5,rep(0,9),.6,.5,.4),   ncol=3)
three.fact <- sim.structural(f=fx)
three.fact
@

\begin{figure}[htbp]
\begin{center}
<<print=FALSE,echo=FALSE, fig=TRUE,eps=FALSE>>=
three.fact.mod <-structure.graph(fx)
@
\caption{Three uncorrelated factors generated using the structure.graph function.}
\label{fig:three}
\end{center}
\end{figure}

\item $\vec{f}_x$ is a matrix and Phi $\neq I$ implies a correlated factors model:
<<print=FALSE,echo=TRUE>>=
Phi = matrix(c(1,.5,.3,.5,1,.2,.3,.2,1), ncol=3)
corf3 <- sim.structural(f=fx,Phi=Phi)
fx
Phi
corf3
@


This can be shown with symbolic loadings and path coefficients by using the \pfun{structure.list} and \pfun{phi.list} functions to create the fx and Phi matrices.
\begin{figure}[htbp]
\begin{center}
<<print=FALSE,echo=TRUE, fig=TRUE,eps=FALSE>>=
fxs <- structure.list(9,list(F1=c(1,2,3),F2=c(4,5,6),F3=c(7,8,9)))
Phis <- phi.list(3,list(F1=c(2,3),F2=c(1,3),F3=c(1,2)))
fxs  #show the matrix
Phis #show this one as well
corf3.mod <- structure.graph(fxs,Phi=Phis)
@
\caption{Three correlated factors with symbolic paths. Created using structure.graph and structure.list and phi.list for ease of input.}
\label{fig:symb3}
\end{center}
\end{figure}


\item $\vec{f}_x$ and  $\vec{f}_y$ are  matrices, and  Phi $\neq I$ represents their correlations.  
<<print=FALSE,echo=TRUE>>=
fx  <- matrix(c(.9,.8,.7,rep(0,9),.7,.6,.5,rep(0,9),.6,.5,.4),   ncol=3)
fy <- c(.6,.5,.4)
Phi <- matrix(c(1,.5,.3,.1,.5,1,.2,.4,.3,.2,1,.4,.1,.4,.4,1), ncol=4)
ls <- sim.structural(fx,fy,Phi)
ls
@
\begin{figure}[htbp]
\begin{center}
<<print=FALSE,echo=FALSE, fig=TRUE,eps=FALSE>>=
fxs <- structure.list(9,list(X1=c(1,2,3), X2 =c(4,5,6),X3 = c(7,8,9)))
phi <- phi.list(4,list(F1=c(4),F2=c(4),F3=c(4),F4=c(1,2,3)))
fyx <- structure.list(3,list(Y=c(1,2,3)),"Y")
sg3 <- structure.graph(fxs,phi,fyx)
@
\caption{A symbolic structural model. A latent Y is regressed on three independent latent variables.}
\label{fig:symb}
\end{center}
\end{figure}

This structure may be seen by specifying the symbolic model shown in Figure~\ref{fig:symb}. 

\end{enumerate}
 
\section{Functions for analyzing structure}
Given a correlation matrix such as those seen above for congeneric or bifactor models, how can the underlying structure best be estimated?  Because these data sets were generated from a known model, the question becomes how well a particular model recovers the underlying structure.

\subsection{Exploratory models}
The technique of \iemph{principal components} provides a set of weighted linear composites that best approximate a particular correlation or covariance matrix.  If these are then \iemph{rotated} to provide a more interpretable solution, the components are no longer the \iemph{principal} components.  The \pfun{principal} function will extract the first $n$ principal components (default value is 1) and, if $n>1$, rotate to \iemph{simple structure} using a \fun{varimax}, \fun{quartimin}, or \pfun{Promax} criterion.

<<print=FALSE,echo=TRUE>>=
principal(cong1$model)
factor.pa(cong1$model)
@
It is important to note that although the \pfun{principal} components function does not exactly reproduce the model parameters, the \pfun{factor.pa} function, implementing principal axes factor analysis, does.  
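
The reason can be sketched with the common factor model (the symbols $\vec{\lambda}$ and $\vec{U}^2$ are notation assumed here for illustration).  For a single factor with loadings $\vec{\lambda}$ and a diagonal matrix of unique variances $\vec{U}^2$,
\[
\vec{R} = \vec{\lambda}\vec{\lambda}' + \vec{U}^2, \qquad u_i^2 = 1 - \lambda_i^2 .
\]
Principal axes factoring replaces the diagonal of $\vec{R}$ with communality estimates and so analyzes the reduced matrix $\vec{R} - \vec{U}^2 = \vec{\lambda}\vec{\lambda}'$, which has rank one and returns $\vec{\lambda}$ when the communalities are correct.  Principal components analyzes $\vec{R}$ itself, unique variance included, so the loadings on the first component tend to be larger than $\vec{\lambda}$.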

Consider the case of three underlying factors as seen in the bifact example above.

<<print=FALSE,echo=TRUE>>=
pc3 <- principal(bifact,3)
pa3 <- factor.pa(bifact,3)
ml3 <- factanal(covmat=bifact,factors=3)
pc3
pa3
ml3
factor.congruence(pc3,pa3)
factor.congruence(pa3,ml3)
@

By default, all three of these procedures use the varimax rotation criterion.  Perhaps it is useful to apply an oblique transformation such as \pfun{Promax} or \fun{oblimin} to the results.  The \pfun{Promax} function in \Rpkg{psych} differs slightly from the standard \fun{promax} in that it reports the factor intercorrelations.
<<print=FALSE,echo=TRUE>>=
ml3p <- Promax(ml3)
ml3p
@
\subsection{Hierarchical models}
An exploratory hierarchical model can be applied to this data structure using the \pfun{omega} function.  Graphic options include drawing a Schmid-Leiman bifactor solution (Figure~\ref{fig:om:bi}) or drawing a hierarchical factor solution (Figure~\ref{fig:om:hi}).

\begin{figure}[htbp]
\begin{center}
<<print=FALSE,echo=TRUE, fig=TRUE,eps=FALSE>>=
om.bi <- omega(bifact)
@
\caption{An exploratory bifactor solution to the nine variable problem}
\label{fig:om:bi}
\end{center}
\end{figure}

\begin{figure}[htbp]
\begin{center}
<<print=FALSE,echo=TRUE, fig=TRUE,eps=FALSE>>=
om.hi <- omega(bifact,sl=FALSE)
@
\caption{An exploratory hierarchical solution to the nine variable problem}
\label{fig:om:hi}
\end{center}
\end{figure}

Both of these graphical representations are reflected in the output of the \pfun{omega} function.  The first was done using a Schmid-Leiman transformation, the second was not. As will be seen later, the objects returned from these two analyses may be used as models for a \fun{sem} analysis. It is also useful to examine the estimates of reliability reported by \pfun{omega}.
<<print=FALSE,echo=TRUE>>=
om.bi
@
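As a sketch of what these reliability estimates summarize (the notation is assumed here for illustration and follows the usual definitions of McDonald's coefficients rather than the printed output), let $g_i$ be the general factor loading and $u_i^2$ the unique variance of variable $i$ in the Schmid-Leiman solution, and let $V_x$ be the sum of all elements of the correlation matrix.  Then
\[
\omega_h = \frac{\left(\sum_i g_i\right)^2}{V_x}, \qquad
\omega_t = \frac{V_x - \sum_i u_i^2}{V_x} = 1 - \frac{\sum_i u_i^2}{V_x}.
\]
Here $\omega_h$ (omega hierarchical) is the proportion of total score variance attributable to the general factor, and $\omega_t$ (omega total) is the proportion attributable to all of the common factors.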


Yet one more way to show the hierarchical structure of a data set is to consider hierarchical cluster analysis using the \pfun{ICLUST} algorithm (Figure~\ref{fig:iclust}).
\begin{figure}[htbp]
\begin{center}
<<print=FALSE,echo=FALSE, fig=TRUE,eps=FALSE>>=
ic <- ICLUST(bifact,title="Hierarchical cluster analysis of bifact data")
@
\caption{A hierarchical cluster analysis of the bifact data set using ICLUST}
\label{fig:iclust}
\end{center}
\end{figure}

\section{Confirmatory models}

Although the exploratory models shown above do estimate the goodness of fit of the model and compare the residual matrix to a zero matrix using a $\chi^2$ statistic, they estimate more parameters than are necessary if there is indeed a simple structure, and they do not allow for tests of competing models.  The \fun{sem} function  in the \Rpkg{sem} package by John Fox allows for confirmatory tests.  The interested reader is referred to the \Rpkg{sem} manual for more detail \citep{sem}.



\subsection{Using psych as a front end for the sem package}
Because preparation of the \fun{sem} commands is a bit tedious, several of the \Rpkg{psych} package functions have been designed to provide the appropriate commands.  That is, the functions \pfun{structure.list},  \pfun{phi.list},  \pfun{structure.graph}, \pfun{structure.sem}, and \pfun{omega.graph} may be used as a front end to \fun{sem}. 


\subsection{Testing a congeneric model versus a tau equivalent model}
The congeneric model is a one factor model with possibly unequal factor loadings.  The tau equivalent model is one with equal factor loadings. Tests for these may be done by creating the appropriate structures. Either the \pfun{structure.graph} function, which requires \Rpkg{Rgraphviz}, or the \pfun{structure.sem} function may be used.

The following example tests the hypothesis (which is actually false) that the correlations found in the cong data set (created in the \pfun{sim.congeneric} section above) are tau equivalent.  Because the variable labels in that data set were V1 ... V4, we specify the labels to match those.
<<print=FALSE,echo=TRUE>>=
library(sem)
 mod.tau <- structure.graph(c("a","a","a","a"),labels=paste("V",1:4,sep=""))
mod.tau   #show it
sem.tau <- sem(mod.tau,cong,100)
summary(sem.tau)
@

Next, test whether the data are congeneric, that is, whether a one factor model fits.  Compare this to the prior model using the \fun{anova} function.
<<print=FALSE,echo=TRUE>>=
mod.cong <- structure.sem(c("a","b","c","d"),labels=paste("V",1:4,sep=""))
mod.cong  #show the model
sem.cong <- sem(mod.cong,cong,100)
summary(sem.cong)
anova(sem.cong,sem.tau) #test the difference between the two models
@
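The comparison is a chi square difference test of nested models.  As a sketch of the degrees of freedom, assuming the factor variance is fixed at 1 and each variable has its own error variance: four variables supply $4 \times 5 / 2 = 10$ distinct variances and covariances.  The congeneric model estimates 4 loadings and 4 error variances, while the tau equivalent model estimates 1 common loading and 4 error variances, so
\begin{align*}
df_{\mathrm{cong}} &= 10 - 8 = 2, \\
df_{\mathrm{tau}} &= 10 - 5 = 5, \\
\Delta\chi^2 &= \chi^2_{\mathrm{tau}} - \chi^2_{\mathrm{cong}} \quad \text{on} \quad \Delta df = 5 - 2 = 3 .
\end{align*}
A large $\Delta\chi^2$ relative to 3 degrees of freedom indicates that constraining the loadings to be equal significantly worsens the fit.
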
\subsection{Testing the dimensionality of a hierarchical data set by creating the model}
The bifact correlation matrix was created to represent a hierarchical structure.  Various confirmatory models can be applied to this matrix.

The first example creates the model directly; the next several create models based upon exploratory factor analyses.

<<print=FALSE,echo=TRUE>>=
mod.one <- structure.sem(letters[1:9],labels=paste("V",1:9,sep=""))
mod.one  #show the model
bifact <- round(bifact,5)  #to ensure  that the sem procedure recognizes that this is a symmetric matrix
sem.one <- sem(mod.one,bifact,100)
summary(sem.one)
@

\subsection{Testing the dimensionality based upon an exploratory analysis}
Alternatively, the output from an exploratory factor analysis can be used as input to the \pfun{structure.sem} function.
<<print=FALSE,echo=TRUE>>=
f1 <- factanal(covmat=bifact,factors=1)
mod.f1 <- structure.sem(f1)
sem.f1 <- sem(mod.f1,bifact,100)
summary(sem.f1)
@

\subsection{Specifying a three factor model}
An alternative model is to extract three factors and try this solution.  The \pfun{factor.pa} factor analysis function is used for variety.

<<print=FALSE,echo=TRUE>>=
f3 <- factor.pa(bifact,3)
mod.f3 <- structure.sem(f3)
sem.f3 <- sem(mod.f3,bifact,100)
summary(sem.f3)
@

\subsection{Allowing for an oblique solution}
That solution is clearly very bad.  What would happen if the exploratory solution were allowed to have correlated (oblique) factors?  This analysis is done on a sample of size 100 with the bifactor structure created by \pfun{sim.hierarchical}. Unfortunately, this model does not converge.
<<print=FALSE,echo=TRUE>>=
bifact.s <- sim.hierarchical()  #create the data, use the sample correlation matrix
bifact.s <- round(bifact.s,5)
f3 <- factor.pa(bifact.s,3)     #extract three factors
f3.p <- Promax(f3)              #do a promax transformation
mod.f3p <- structure.sem(f3.p) #create the sem model
mod.f3p   #show it 
@
Because fitting this model with \fun{sem} tends to fail, the calls are wrapped in \Rfunction{try} and the summary is reported only if the fit succeeds.
<<print=FALSE,echo=TRUE>>=
sem.f3p <-try( sem(mod.f3p,bifact.s,100)) #do the sem which tends to fail
try(summary(sem.f3p)  )     #report it if we can
@

The structure being tested may be seen using \pfun{structure.graph}.

\begin{figure}[htbp]
\begin{center}
<<print=FALSE,echo=FALSE, fig=TRUE,eps=FALSE>>=
mod.f3p <- structure.graph(f3.p)
@
\caption{A three factor, oblique solution.}
\label{default}
\end{center}
\end{figure}

\subsection{Extract a bifactor solution using omega and then test that model using sem}

A bifactor solution has previously been shown (Figure~\ref{fig:om:bi}).  The output from the \pfun{omega} function includes the sem commands for the analysis. For completeness, the \fun{std.coef} function from \Rpkg{sem} is used as well as the \fun{summary} function.
<<print=FALSE,echo=TRUE>>=
mod.bi <- om.bi$model
sem.bi <- sem(mod.bi,bifact.s,100)
summary(sem.bi)
std.coef(sem.bi)
@

\subsection{Examining a hierarchical solution}
A hierarchical solution to this data set was previously found by the \pfun{omega} function (Figure~\ref{fig:om:hi}).  The output of that analysis can be used as a model for a \fun{sem} analysis.  Once again, the \fun{std.coef} function helps see the structure.

<<print=FALSE,echo=TRUE>>=
mod.hi <- om.hi$model
sem.hi <- sem(mod.hi,bifact.s,100)
summary(sem.hi)
std.coef(sem.hi)
@

The use of exploratory and confirmatory models for understanding real data structures is an important advance in psychological research.  To the extent that the models we use can be tested on simple, artificial examples, it is perhaps easier to practice their application.  The \Rpkg{psych} tools for simulating structural models  and for specifying models are a useful supplement to the power of packages such as \Rpkg{sem}.

\newpage
\bibliography{/Volumes/WR/bill/Documents/Active/book/all} 
\printindex
\end{document}  
