--- title: "Biclustering and Ranklustering" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Biclustering and Ranklustering} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5, out.width = "100%" ) ``` > **Note**: Some computationally intensive examples below are shown with `eval=FALSE` to keep CRAN build times short. For full rendered output, see the [pkgdown site](https://kosugitti.github.io/exametrika/articles/biclustering.html). ```{r library, message=FALSE, warning=FALSE} library(exametrika) ``` ## Biclustering and Ranklustering Biclustering and Ranklustering simultaneously cluster items into fields and examinees into classes/ranks. The difference is specified via the `method` option: - `method = "B"`: Biclustering (no filtering matrix) - `method = "R"`: Ranklustering (with filtering matrix for ordered ranks) ### Biclustering ```{r model-biclustering, message=FALSE, warning=FALSE} Biclustering(J35S515, nfld = 5, ncls = 6, method = "B") ``` ### Ranklustering ```{r model-ranklustering, message=FALSE, warning=FALSE} result.Ranklustering <- Biclustering(J35S515, nfld = 5, ncls = 6, method = "R") ``` ```{r plot-ranklustering, fig.width=7, fig.height=5, message=FALSE, warning=FALSE} plot(result.Ranklustering, type = "Array") plot(result.Ranklustering, type = "FRP", nc = 2, nr = 3) plot(result.Ranklustering, type = "RRV") plot(result.Ranklustering, type = "RMP", students = 1:9, nc = 3, nr = 3) plot(result.Ranklustering, type = "LRD") ``` ## Finding Optimal Number of Classes and Fields ### Grid Search `GridSearch()` systematically evaluates multiple parameter combinations and selects the best-fitting model: ```{r grid-search, eval=FALSE} result <- GridSearch(J35S515, method = "R", max_ncls = 10, max_nfld = 10, index = "BIC") result$optimal_ncls result$optimal_nfld result$optimal_result ``` ### Infinite Relational Model (IRM) The IRM uses the Chinese Restaurant Process to automatically determine the optimal number of fields and classes: ```{r model-irm, eval=FALSE} result.IRM <- Biclustering_IRM(J35S515, gamma_c = 1, gamma_f = 1, verbose = TRUE) plot(result.IRM, type = "Array") plot(result.IRM, type = "FRP", nc = 3) plot(result.IRM, type = "TRP") ``` ## Biclustering for Polytomous Data ### Ordinal Data ```{r bic-ord, eval=FALSE} result.B.ord <- Biclustering(J35S500, ncls = 5, nfld = 5, method = "R") result.B.ord plot(result.B.ord, type = "Array") ``` FRP (Field Reference Profile) shows the expected score per field across latent ranks: ```{r bic-ord-frp, eval=FALSE} plot(result.B.ord, type = "FRP", nc = 3, nr = 2) ``` FCRP (Field Category Response Profile) shows category probabilities across ranks. The `style` parameter can be `"line"` or `"bar"`: ```{r bic-ord-fcrp, eval=FALSE} plot(result.B.ord, type = "FCRP", nc = 3, nr = 2) plot(result.B.ord, type = "FCRP", style = "bar", nc = 3, nr = 2) ``` FCBR (Field Cumulative Boundary Reference) shows cumulative boundary probabilities (ordinal only): ```{r bic-ord-fcbr, eval=FALSE} plot(result.B.ord, type = "FCBR", nc = 3, nr = 2) ``` ScoreField and RRV plots: ```{r bic-ord-scorefield, eval=FALSE} plot(result.B.ord, type = "ScoreField") plot(result.B.ord, type = "RRV") ``` ### Nominal Data ```{r bic-nom, eval=FALSE} result.B.nom <- Biclustering(J20S600, ncls = 5, nfld = 4) result.B.nom plot(result.B.nom, type = "Array") ``` Nominal Biclustering supports FRP, FCRP, ScoreField, and RRV (but not FCBR): ```{r bic-nom-plots, eval=FALSE} plot(result.B.nom, type = "FRP", nc = 2, nr = 2) plot(result.B.nom, type = "FCRP", nc = 2, nr = 2) plot(result.B.nom, type = "FCRP", style = "bar", nc = 2, nr = 2) plot(result.B.nom, type = "ScoreField") plot(result.B.nom, type = "RRV") ``` ### Rated Data (Multiple-Choice with Correct Answers) Rated data has both multiple response categories and correct answers. `Biclustering.rated()` internally runs nominal Biclustering, then sorts classes by correct response rate: ```{r bic-rated, eval=FALSE} result.B.rated <- Biclustering(J21S300, ncls = 5, nfld = 3, method = "R", maxiter = 300) result.B.rated plot(result.B.rated, type = "Array") plot(result.B.rated, type = "FRP", nc = 3, nr = 1) ``` Two layers of fit indices are reported: binary (with CFI/RMSEA) and nominal (AIC/BIC/CAIC only). Access them via `result$TestFitIndices` and `result$TestFitIndices_nominal`. ### Rated IRM The IRM also supports rated data: ```{r irm-rated, eval=FALSE} result.IRM.rated <- Biclustering_IRM(J21S300, gamma_c = 1, gamma_f = 1, verbose = TRUE) plot(result.IRM.rated, type = "Array") plot(result.IRM.rated, type = "FRP", nc = 3, nr = 1) ``` ## Distractor Analysis `DistractorAnalysis()` examines how examinees in each rank/class respond to each item's categories. It computes observed frequencies, proportions, chi-square statistics, p-values, and Cramer's V (effect size) for each item-by-rank cell. ```{r distractor, eval=FALSE} result.B.rated <- Biclustering(J21S300, ncls = 5, nfld = 3, method = "R", maxiter = 300) da <- DistractorAnalysis(result.B.rated) # Full output (grouped by field for Biclustering) print(da) # Filter by items and/or ranks print(da, items = 1:7, ranks = c(1, 5)) # Plot distractor bar charts plot(da, items = 1:6, nc = 3, nr = 2) ``` `DistractorAnalysis()` also works with `LRA()` results for rated data: ```{r distractor-lra, eval=FALSE} result.LRA.rated <- LRA(J21S300, nrank = 5, mic = TRUE) da_lra <- DistractorAnalysis(result.LRA.rated) print(da_lra, items = 1:3) plot(da_lra, items = 1:6, nc = 3, nr = 2) ``` ## Reference Shojima, K. (2022). *Test Data Engineering*. Springer.