% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/oolong.R, R/oolong_ui.R
\name{create_oolong}
\alias{create_oolong}
\alias{wi}
\alias{witi}
\alias{ti}
\alias{wsi}
\alias{gs}
\title{Generate an oolong test}
\usage{
create_oolong(
  input_model = NULL,
  input_corpus = NULL,
  n_top_terms = 5,
  bottom_terms_percentile = 0.6,
  exact_n = NULL,
  frac = 0.01,
  n_top_topics = 3,
  n_topiclabel_words = 8,
  use_frex_words = FALSE,
  difficulty = 1,
  input_dfm = NULL,
  construct = "positive",
  btm_dataframe = NULL,
  n_correct_ws = 3,
  wsi_n_top_terms = 20,
  userid = NA,
  type = "witi"
)

wi(
  input_model = NULL,
  userid = NA,
  n_top_terms = 5,
  bottom_terms_percentile = 0.6,
  difficulty = 1,
  use_frex_words = FALSE
)

witi(
  input_model = NULL,
  input_corpus = NULL,
  userid = NA,
  n_top_terms = 5,
  bottom_terms_percentile = 0.6,
  exact_n = NULL,
  frac = 0.01,
  n_top_topics = 3,
  n_topiclabel_words = 8,
  use_frex_words = FALSE,
  difficulty = 1,
  input_dfm = NULL,
  btm_dataframe = NULL
)

ti(
  input_model = NULL,
  input_corpus = NULL,
  userid = NA,
  exact_n = NULL,
  frac = 0.01,
  n_top_topics = 3,
  n_topiclabel_words = 8,
  use_frex_words = FALSE,
  difficulty = 1,
  input_dfm = NULL,
  btm_dataframe = NULL
)

wsi(
  input_model = NULL,
  userid = NA,
  n_topiclabel_words = 4,
  n_correct_ws = 3,
  wsi_n_top_terms = 20,
  difficulty = 1,
  use_frex_words = FALSE
)

gs(
  input_corpus = NULL,
  userid = NA,
  construct = "positive",
  exact_n = NULL,
  frac = 0.01
)
}
\arguments{
\item{input_model}{(wi, ti, witi, wsi) an STM, WarpLDA, topicmodels, KeyATM, seededlda, textmodel_nb, or BTM object; if it is NULL, create_oolong assumes that you want to create a gold standard.}

\item{input_corpus}{(witi, ti, gs) if input_model is not NULL, the corpus (a character vector or quanteda::corpus object) that was used to generate the model object. If neither input_model nor input_corpus is NULL, topic intrusion test cases are generated. If input_model is a BTM object, this argument is ignored. If input_model is NULL, gold standard test cases are generated.}

\item{n_top_terms}{(wi, witi) integer, number of top topic words to be included among the candidates of the word intrusion test.}

\item{bottom_terms_percentile}{(wi, witi) double, a term is considered to be a word intruder when its theta is less than the percentile of theta given by this value; must be within the range of 0 to 1}

\item{exact_n}{(ti, witi, gs) integer, number of test cases to generate; ignored if frac is not NULL}

\item{frac}{(ti, witi, gs) double, fraction of test cases to be generated from the corpus}

\item{n_top_topics}{(ti, witi) integer, number of most relevant topics to be shown alongside the intruder topic}

\item{n_topiclabel_words}{(witi, ti, wsi) integer, number of topic words to be shown as the topic ("ti" and "witi") / word set ("wsi") label}

\item{use_frex_words}{(wi, witi, ti, wsi) logical, for a STM object, use FREX words if TRUE, use PROB words if FALSE}

\item{difficulty}{(wi, witi, ti, wsi) double, adjusts the difficulty of the test. A higher value indicates higher difficulty; must be within the range of 0 to 1. No effect for STM if use_frex_words is FALSE. Ignored for topicmodels objects}

\item{input_dfm}{(witi, ti) a dfm object used for training the input_model, if input_model is a WarpLDA object}

\item{construct}{(gs) string, an adjective describing the construct that you want your coders to code in the gold standard test cases}

\item{btm_dataframe}{(witi, ti) dataframe used for training the input_model, if input_model is a BTM object}

\item{n_correct_ws}{(wsi) number of word sets to be shown alongside the intruder word set}

\item{wsi_n_top_terms}{(wsi) number of top topic words from each topic from which the word set label is randomly selected}

\item{userid}{a character string to denote the name of the coder. Defaults to NA (no userid), which is not recommended}

\item{type}{(create_oolong) a character string to denote what you want to create. "wi": word intrusion test; "ti": topic intrusion test; "witi": both word intrusion test and topic intrusion test; "gs": gold standard generation}
}
\value{
an oolong test object.
}
\description{
\code{create_oolong} generates an oolong test object that can either be used for validating a topic model or for creating ground truth (gold standard) of a text corpus. \code{wi} (word intrusion test), \code{ti} (topic intrusion test), \code{witi} (word and topic intrusion tests), \code{wsi} (word set intrusion test) and \code{gs} are handy wrappers to \code{create_oolong}. It is recommended to use these wrappers instead of \code{create_oolong}.
}
\section{Usage}{


Use \code{wi}, \code{ti}, \code{witi}, \code{wsi} or \code{gs} to generate an oolong test of your choice. It is also recommended to supply \code{userid} (the current coder).
The names of the tests (word intrusion test and topic intrusion test) follow Chang et al. (2009). In Ying et al. (forthcoming), the topic intrusion test is named "T8WSI" (Top 8 Word Set Intrusion). The word set intrusion test in this package is the "R4WSI" (Random 4 Word Set Intrusion) in Ying et al. (forthcoming). The default settings of \code{wi}, \code{witi}, and \code{ti} follow Chang et al. (2009), e.g. \code{n_top_terms} = 5, instead of \code{n_top_terms} = 4 as in Ying et al. (forthcoming). The default setting of \code{wsi} follows Ying et al. (forthcoming), e.g. \code{n_topiclabel_words} = 4.
As suggested by Song et al. (2020), 1\% of the articles from \code{input_corpus} are randomly selected as the test cases of both \code{ti} and \code{gs}, i.e. \code{frac} = 0.01. However, an appropriate proportion depends on the size of \code{input_corpus}, e.g. it does not make sense to draw 1\% of the articles from a corpus of only 100 articles. Use \code{exact_n} in these cases.
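
The arithmetic behind this advice can be sketched as follows, using hypothetical corpus sizes. The snippet only multiplies the corpus size by \code{frac}; it is not the package's internal sampling code, and it makes no assumption about how a fractional result would be rounded.
\preformatted{
corpus_sizes <- c(100, 1000, 10000)  # hypothetical numbers of articles
frac <- 0.01                         # the default fraction
corpus_sizes * frac                  # implied number of test cases per corpus
}
For the smallest corpus the default fraction implies about one test case, which is why \code{exact_n} is the more sensible choice there.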
}

\section{About create_oolong}{


Because \code{create_oolong} is not intuitive to use, it is no longer recommended for generating an oolong test. \code{create_oolong} is retained only for backward compatibility. This function generates an oolong test object based on \code{input_model} and \code{input_corpus}. If \code{input_model} is not NULL, it generates an oolong test for a topic model (tm). If \code{input_model} is NULL but \code{input_corpus} is not NULL, it generates an oolong test for creating a gold standard (gs).
}

\section{Methods}{

Depending on its purpose, an oolong object has the following methods:
\describe{
  \item{\code{$do_word_intrusion_test()}}{(tm) launch the shiny-based word intrusion test. The coder should identify the intruder word that is not related to the other words.}
  \item{\code{$do_topic_intrusion_test()}}{(tm) launch the shiny-based topic intrusion test. The coder should identify the intruder topic that is least likely to be the topic of the document.}
  \item{\code{$do_word_set_intrusion_test()}}{(tm) launch the shiny-based word set intrusion test. The coder should identify the intruder word set that is not related to the other word sets.}
  \item{\code{$do_gold_standard_test()}}{(gs) launch the shiny-based test for generating a gold standard. The coder should rate the level of the predetermined construct on a 5-point Likert scale.}
  \item{\code{$lock(force = FALSE)}}{(gs/tm) lock the object so that it cannot be changed anymore. It enables \code{\link{summarize_oolong}} and the following method.}
  \item{\code{$turn_gold()}}{(gs) convert the oolong object into a quanteda compatible corpus.}
}
For more details, please see the overview vignette: \code{vignette("overview", package = "oolong")}
}

\examples{
## Creation of oolong test with only word intrusion test
data(abstracts_keyatm)
data(abstracts)
oolong_test <- wi(input_model = abstracts_keyatm, userid = "Hadley")
## Creation of oolong test with both word intrusion test and topic intrusion test
oolong_test <- witi(input_model = abstracts_keyatm, input_corpus = abstracts$text, userid = "Julia")
## Creation of oolong test with topic intrusion test
oolong_test <- ti(input_model = abstracts_keyatm, input_corpus = abstracts$text, userid = "Jenny")
## Creation of oolong test with word set intrusion test
oolong_test <- wsi(input_model = abstracts_keyatm, userid = "Garrett")
## Creation of gold standard
oolong_test <- gs(input_corpus = trump2k, userid = "Yihui")
## Using create_oolong(); not recommended
oolong_test <- create_oolong(input_model = abstracts_keyatm,
                             input_corpus = abstracts$text, userid = "JJ")
oolong_test <- create_oolong(input_model = abstracts_keyatm,
                             input_corpus = abstracts$text, userid = "Mara", type = "ti")
oolong_test <- create_oolong(input_corpus = abstracts$text, userid = "Winston", type = "gs")
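## A minimal sketch of the workflow described in the Methods section:
## create a test, code it in the shiny-based interface, lock it, then summarize it.
## The guard is needed because the test requires an interactive session.
## The userid "Coder1" is a placeholder, and the lock step assumes that the
## coder has completed all test cases.
if (interactive()) {
    oolong_test <- wi(input_model = abstracts_keyatm, userid = "Coder1")
    oolong_test$do_word_intrusion_test()  # launches the word intrusion test
    oolong_test$lock()                    # freeze the object after coding
    summarize_oolong(oolong_test)         # enabled once the object is locked
}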
}
\references{
Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J. L., & Blei, D. M. (2009). Reading tea leaves: How humans interpret topic models. In Advances in neural information processing systems (pp. 288-296).

  Song et al. (2020) In validations we trust? The impact of imperfect human annotations as a gold standard on the quality of validation of automated content analysis. Political Communication.

  Ying, L., Montgomery, J. M., & Stewart, B. M. (Forthcoming). Inferring concepts from topics: Towards procedures for validating topics as measures. Political Analysis.
}
\author{
Chung-hong Chan, Marius Sältzer
}
