%\VignetteIndexEntry{An R Package for W-NOMINATE} %\VignetteKeywords{multivariate} %\VignettePackage{wnominate} \documentclass[12pt]{article} \usepackage{Sweave} \usepackage{amsmath} \usepackage{amscd} \begin{document} \SweaveOpts{concordance=TRUE} \title{Using W-NOMINATE in R} \author{James Lo} \maketitle \section{Introduction} This package estimates Poole and Rosenthal W-NOMINATE scores from roll call votes supplied though a \verb@rollcall@ object from package \verb@pscl@.\footnote{Production of this package is supported by NSF Grant SES-0611974.} The R version of W-NOMINATE computes ideal points using the same Fortran code base as the previous \emph{wnom9707()} software. It improves upon the earlier software in three ways. First, it is now considerably easier to input new data for estimation, as the current software no longer relies exclusively on the old \emph{ORD} file format for data input. Secondly, the software now allows users to generate standard errors for their ideal point estimates using a parametric bootstrap. Finally, the \verb@wnominate@ package includes a full suite of graphics functions to analyze the results. W-NOMINATE scores are based on the spatial model of voting. Let \textit{s} denote the number of policy dimensions, which are indexed by \textit{k=1, \ldots, s}; let \textit{p} denote the number of legislators (\textit{i=1,\ldots,p}); and \textit{q} denote the number of roll call votes (\textit{j=1,\ldots,q}). Let legislator \textit{i}'s ideal point be \textbf{$x_{i}$}, a vector of length \textit{s}. Each roll call vote is represented by vectors of length \textit{s}, \textbf{$z_{jy}$} and \textbf{$z_{jn}$}, where \textit{y} and \textit{n} stand for the policy outcomes associated with Yea and Nay, respectively. Legislator \textit{i}'s utility for outcome \textit{y} on roll call \textit{j} is: \begin{center} {\Large \begin{math} U_{ijy} = \beta exp{[-\frac{\sum_{k=1}^{s}w^{2}_{k}d^{2}_{ijyk}}{2}]} +\epsilon_{ijy} \end{math} }\\ \end{center} The $d^{2}_{ijyk}$ term in the exponent is the Euclidean distance between a legislator's ideal point $x_{i}$ and the Yea bill location $z_{jyk}$; namely, \begin{center} {\Large \begin{math} d^{2}_{ijy}=\sum_{k=1}^{s}(x_{ik}-z_{jyk})^2 \end{math} } \end{center} Weight $w$ and $\beta$ are estimated but set with initial values of 0.5 and 15 respectively. $\beta$ can be thought of as a signal-to-noise ratio, where as $\beta$ increases in value, the deterministic portion of the utility function overwhelms the stochastic portion. In multiple dimensions, $\beta$ is only estimated for the first dimension and is thereafter kept constant. For all other dimensions, the corresponding $w_{k}$ is estimated, with the starting value of $w_{k}$ set at 0.5 each time. In estimating the outcome points for each bill, W-NOMINATE estimates the outcome points in terms of their midpoint and the distance between them; namely, \begin{center} $z_{jy}=z_{mj}-d_{j}$ and $z_{jn}=z_{mj}+d_{j}$ \end{center} where $z_{mj}$ is the midpoint and $d_{j}=\frac{z_{jy}-z_{jn}}{2}$. \section{Usage Overview} The \verb@wnominate@ package was designed for use in one of three ways. First, users can estimate ideal points from a set of Congressional roll call votes stored in the traditional \emph{ORD} file format. Secondly, users can generate a vote matrix of their own, and feed it directly into \verb@wnominate@ for analysis. Finally, users can also generate test data with ideal points and bill parameters arbitrarily specified as arguments by the user for analysis with \verb@wnominate@. Each of these cases are supported by a similar sequence of function calls, as shown in the diagrams below: \begin{flushleft} $ \begin{CD} \texttt{\emph{ORD} file} @>\texttt{readKH()}>> \texttt{\emph{rollcall} object} @>\texttt{wnominate()}>> \texttt{\emph{wnominate} object} \end{CD} $ $ \begin{CD} \texttt{Vote matrix} @>\texttt{rollcall()}>> \texttt{\emph{rollcall} object} @>\texttt{wnominate()}>> \texttt{\emph{wnominate} object} \end{CD} $ $ \begin{CD} \texttt{Arguments} @>\texttt{generateTestData()}>> \texttt{\emph{rollcall} object} @>\texttt{wnominate()}>> \texttt{\emph{wnominate} object} \end{CD} $ \end{flushleft} Following generation of a \verb@wnominate@ object, the user then analyzes the results using the \emph{plot} and \emph{summary} methods, including: \begin{itemize} \item \textbf{plot.coords()}: Plots ideal points in one or two dimensions. \item \textbf{plot.angles()}: Plots a histogram of cut lines. \item \textbf{plot.cutlines()}: Plots a specified percentage of cutlines (a Coombs mesh). \item \textbf{plot.skree()}: Plots a Skree plot with the first 20 eigenvalues. \item \textbf{plot.nomObject()}: S3 method for a \verb@wnominate@ object that combines the four plots described above. \item \textbf{summary.nomObject()}: S3 method for a \verb@wnominate@ object that summarizes the estimates. \end{itemize} Examples of each of the three cases described here are presented in the following sections. \section{W-NOMINATE with ORD files} This is the use case that the majority of \verb@wnominate@ users are likely to fall into. Roll call votes in a fixed width format \emph{ORD} format for all U.S. Congresses are stored online for download from \emph{https://www.voteview.com/data}. For example, votes from the 117th Congress can be downloaded from \begin{center} \emph{https://www.voteview.com/static/data/out/votes/H117\_votes.ord} \end{center} \verb@wnominate@ takes \verb@rollcall@ objects from Simon Jackman's \verb@pscl@ package as input. The package includes a function, \emph{readKH()}, that takes an \emph{ORD} file and automatically transforms it into a \verb@rollcall@ object as desired. Refer to the documentation in \verb@pscl@ for more detailed information on \emph{readKH()} and \emph{rollcall()}. Using the 90th Senate as an example, we can download the file \emph{sen90kh.ord} and read the data in R as follows:\\ <>= suppressMessages(library(wnominate)) #sen90 <- readKH("ftp://voteview.com/sen90kh.ord") data(sen90) #Does same thing as above sen90 @ To make this example more interesting, suppose we were interested in applying \emph{wnominate()} only to bills that pertained in some way to agriculture. Keith Poole and Howard Rosenthal's VOTEVIEW software allows us to quickly determine which bills in the 90th Senate pertain to agriculture.\footnote{VOTEVIEW for Windows can be downloaded at \emph{www.voteview.com}.} Using this information, we create a vector of roll calls that we wish to select, then select for them in the \verb@rollcall@ object. In doing so, we should also take care to update the variable in the \verb@rollcall@ object that counts the total number of bills, as follows: <>= selector <- c(21,22,44,45,46,47,48,49,50,53,54,55,56,58,59,60,61,62,65,66,67,68,69,70,71,72,73,74,75,77,78,80,81,82,83,84,87,99,100,101,105,118,119,120,128,129,130,131,132,133,134,135,141,142,143,144,145,147,149,151,204,209,211,218,219,220,221,222,223,224,225,226,227,228,229,237,238,239,252,253,257,260,261,265,266,268,269,270,276,281,290,292,293,294,295,296,302,309,319,321,322,323,324,325,327,330,331,332,333,335,336,337,339,340,346,347,357,359,367,375,377,378,379,381,384,386,392,393,394,405,406,410,418,427,437,442,443,444,448,449,450,454,455,456,459,460,461,464,465,467,481,487,489,490,491,492,493,495,497,501,502,503,504,505,506,507,514,515,522,523,529,539,540,541,542,543,544,546,548,549,550,551,552,553,554,555,556,557,558,559,560,561,562,565,566,567,568,569,571,584,585,586,589,590,592,593,594,595) sen90$m <- length(selector) sen90$votes <- sen90$votes[,selector] @ \emph{wnominate()} takes a number of arguments described fully in the documentation. Most of the arguments can (and probably should) be left at their defaults, particularly when estimating ideal points from U.S. Congresses. The default options estimate ideal points in two dimensions without standard errors, using the same beta and weight parameters as described in the introduction. Votes where the losing side has less than 2.5 per cent of the vote, and legislators who vote less than 20 times are excluded from analysis. The most important argument that \emph{wnominate()} requires is a set of legislators who have positive ideal points in each dimension. This is the \emph{polarity} argument to \emph{wnominate()}. In two dimensions, this might mean a fiscally conservative legislator on the first dimension, and a socially conservative legislator on the second dimension. Polarity can be set in a number of ways, such as a vector of row indices (the recommended method), a vector of names, or by any arbitrary column in the \emph{legis.data} element of the \verb@rollcall@ object. Here, we use Senators Sparkman and Bartlett to set the polarity for the estimation. The names of the first 12 legislators are shown, and we can see that Sparkman and Bartlett are the second and fifth legislators respectively. <>= rownames(sen90$votes)[1:12] result <- wnominate(sen90, polarity=c(2,5)) @ \verb@result@ now contains all of the information from the W-NOMINATE estimation, the details of which are fully described in the documentation for \emph{wnominate()}. \verb@result$legislators@ contains all of the information from the \verb@nom31.dat@ file from the old Fortran \emph{wnom9707()}, while \verb@result$rollcalls@ contains all of the information from the old \verb@nom33.dat@ file. The information can be browsed using the \emph{fix()} command as follows (not run):\\ \noindent > \emph{legisdata <- result\$legislators}\\ > \emph{fix(legisdata)}\\ For those interested in just the ideal points, a much better way to do this is to use the \emph{summary()} function: <>= summary(result) @ \verb@result@ can also be plotted, with a basic summary plot achieved as follows as shown Figure 1: \begin{figure} \begin{center} <>= plot(result) @ \end{center} \caption{Summary Plot of 90th Senate Agriculture Bill W-NOMINATE Scores} \label{fig:one} \end{figure} This basic plot splits the window into 4 parts and calls \emph{plot.coords()}, \emph{plot.angles()}, \emph{plot.skree()}, and \emph{plot.cutlines()} sequentially. Each of these four functions can be called individually. In this example, the coordinate plot on the top left plots each legislator with their party affiliation. A unit circle is included to illustrate how W-NOMINATE scores are constrained to lie within a unit circle. Observe that with agriculture votes, party affiliation does not appear to be a strong predictor on the first dimension, although the second dimension is largely divided by party line. The cutting angle histogram shows that most votes are well classified by a single dimension (i.e. around 90$^\circ$), although there are a number around 30$^\circ$ as well. The Skree plot shows the first 20 eigenvalues, and the rapid decline after the second eigenvalue suggests that a two-dimensional model describes the voting behavior of the 90th Senate well. The final plot shows 50 random cutlines, and can be modified to show any desired number of cutlines as necessary. Three things should be noted about the use of the \emph{plot()} functions. First, the functions always plot the results from the first two dimensions, but the dimensions used (as well as titles and subheadings) can all be changed by the user if, for example, they wish to plot dimensions 2 and 3 instead. Secondly, plots of one dimensional \verb@wnominate@ objects work somewhat differently than in two dimensions and are covered in the example in the final section. Finally, \emph{plot.coords()} can be modified to include cutlines from whichever votes the user desires. The cutline of the 14th agricultural vote (corresponding to the 58th actual vote) from the 90th Senate with ideal points is plotted below in Figure 2, showing that the vote largely broke down along partisan lines. \begin{figure} \begin{center} <>= par(mfrow=c(1,1)) plot.coords(result,cutline=14) @ \end{center} \caption{90th Senate Agriculture Bill W-NOMINATE Scores with Cutline} \label{fig:two} \end{figure} \section{W-NOMINATE with arbitrary vote matrix} This section describes an example of W-NOMINATE being used for roll call data not already in \emph{ORD} format. The example here is drawn from the first three sessions of the United Nations, discussed further as Figure 5.8 in Keith Poole's \emph{Spatial Models of Parliamentary Voting}. To create a \verb@rollcall@ object for use with \emph{wnominate()}, one ideally should have three things: \begin{itemize} \item A matrix of votes from some source. The matrix should be arranged as a \textit{legislators} x \textit{votes} matrix. It need not be in 1/6/9 or 1/0/NA format, but users must be able to distinguish between Yea, Nay, and missing votes. \item A vector of names for each member in the vote matrix. \item OPTIONAL: A vector describing the party or party-like memberships for the legislator. \end{itemize} The \verb@wnominate@ package includes all three of these items for the United Nations, which can be loaded and browsed with the code shown below. The data comes from Eric Voeten at George Washington University. In practice, one would prepare a roll call data set in a spreadsheet, like the one available one \emph{www.voteview.com/UN.csv}, and read it into R using \emph{read.csv()}. The csv file is also stored in this package and can be read using:\\ \noindent \emph{UN<-read.csv(``library/wnominate/data/UN.csv'',header=FALSE,strip.white=TRUE) }\\ %Does this work on Macs/Linux? The line above reads the exact same data as what is stored in this package as R data, which can be obtained using the following commands: <>= rm(list=ls(all=TRUE)) data(UN) UN<-as.matrix(UN) UN[1:5,1:6] @ Observe that the first column are the names of the legislators (in this case, countries), and the second column lists whether a country is a ``Warsaw Pact'' country or ``Other'', which in this case can be thought of as a `party' variable. All other observations are votes. Our objective here is to use this data to create a \verb@rollcall@ object through the \emph{rollcall} function in \verb@pscl@. The object can then be used with \emph{wnominate()} and its plot/summary functions as in the previous \emph{ORD} example. To do this, we want to extract a vector of names (\emph{UNnames}) and party memberships (\emph{party}), then delete them from the original matrix so we have a matrix of nothing but votes. The \emph{party} variable must be rolled into a matrix as well for inclusion in the \verb@rollcall@ object as follows: <>= UNnames<-UN[,1] legData<-matrix(UN[,2],length(UN[,2]),1) colnames(legData)<-"party" UN<-UN[,-c(1,2)] @ In this particular vote matrix, Yeas are numbered 1, 2, and 3, Nays are 4, 5, and 6, abstentions are 7, 8, and 9, and 0s are missing. Other vote matrices are likely different so the call to \emph{rollcall} will be slightly different depending on how votes are coded. Party identification is included in the function call through \verb@legData@, and a \verb@rollcall@ object is generated and applied to W-NOMINATE as follows. The result is summarized below and plotted in Figure 3: <>= rc <- rollcall(UN, yea=c(1,2,3), nay=c(4,5,6), missing=c(7,8,9),notInLegis=0, legis.names=UNnames, legis.data=legData, desc="UN Votes", source="www.voteview.com") result<-wnominate(rc,polarity=c(1,1)) @ <>= summary(result) @ \begin{figure} \begin{center} <>= plot(result) @ \end{center} \caption{Summary Plot of UN Data} \label{fig:three} \end{figure} \section{W-NOMINATE with Test Data} The W-NOMINATE package includes a test data generator that can be used to generate arbitrary vote matrices. These functions are typically only used for testing purposes, and full documentation of them is included with the package. The two functions are: \begin{itemize} \item \emph{nomprob()}: A function that yields a matrix of probabilities of a Yea vote. \item \emph{generateTestData()}: A function that calls nomprob and generates a \verb@rollcall@ object. \end{itemize} \textbf{nomprob()} takes a matrix of Yea and Nay locations, along with a matrix of ideal points, and generates the vote probability matrix. In this example, the Yea positions on 6 bills is set at 0.3, and the Nay position is set at -0.2. Bill parameters can be set in as many dimensions as desired, but since the input matrices in this example have only one column, the the bills are predominantly one-dimensional, as are the ideal points. For the ideal points, the first 5 legislators are set to have ideal points of -0.2 while the last 5 are set to have ideal points of 0.2. This setup leads to the last 5 legislators having a high probability of voting Yea on all of the bills, as confirmed in the example, and vice versa. The beta and weights can also be specified, and in this example we set them to be the \emph{wnominate()} default values of 15 and 0.5. Both normal and logistic link functions can be used to generate the probabilities, although the default is to use normal probabilities. <>= yp <- matrix(rep(0.3,6),nrow=6) np <- matrix(rep(-0.2,6),nrow=6) ideal <- matrix(c(rep(-0.2,5),rep(0.2,5)),nrow=10) nomprob(yp,np,ideal,15,0.5) @ \textbf{generateTestData()} generates a full rollcall object using \emph{nomprob()}, and in fact passes most of its arguments to it. A totally random set of parties and states are assigned to the legislators, so splits by party and region are not substantively meaningful. The function is set up so that users need only specify the number of legislators and roll call votes they wish to generate, and by default it generates a one dimensional model, as is shown here in Figure 4: <>= dat <- generateTestData(legislators=100, rcVotes=1000) result <- wnominate(dat,polarity=1,dims=1) summary(result) @ \begin{figure} \begin{center} <>= plot(result) @ \end{center} \caption{Summary Plot of Randomly Generated Data} \label{fig:four} \end{figure} Note that the one dimensional plot differs considerably from the previous two dimensional plots, since only a coordinate plot and a Skree plot are shown. This is because in one dimension, all cutlines are angled at 90$^\circ$, so there is no need to plot either the cutlines or a histogram of cutline angles. Also, the plot appears to be compressed, so users need to expand the image manually by using their mouse and dragging along the corner of the plot to expand it. The Skree plot confirms that a single dimension captures the voting patterns accurately, as seen by the flatness of the curve from the second dimension on. \newpage \begin{thebibliography}{3} \bibitem{Bootstrap} Lewis, Jeffrey B. and Poole, Keith (2004) ``Measuring Bias and Uncertainty in Ideal Point Estimates via the Parametric Bootstrap.'' \emph{Political Analysis} 12(2): 105-127. \bibitem{Congress} Poole, Keith (2005) \emph{Spatial Models of Parliamentary Voting.} Cambridge: Cambridge University Press. \bibitem{Congress} Poole, Keith and Howard Rosenthal (1997) \emph{Congress: A Political-Economic History of Roll Call Voting.} New York: Oxford University Press. \end{thebibliography} \end{document}