Type: | Package |
Title: | R Library for 'Harmony' |
Version: | 0.3.1 |
Description: | 'Harmony' is a tool using AI which allows you to compare items from questionnaires and identify similar content. You can try 'Harmony' at https://harmonydata.ac.uk/app/ and you can read our blog at https://harmonydata.ac.uk/blog/ or at https://fastdatascience.com/how-does-harmony-work/. Documentation at https://harmonydata.ac.uk/harmony-r-released/. |
URL: | <https://harmonydata.ac.uk> |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
Imports: | httr, uuid, base64enc, jsonlite, utils, tools, assertthat, purrr |
RoxygenNote: | 7.3.2 |
Suggests: | testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2025-04-20 15:41:18 UTC; omarhassoun |
Author: | Omar Hassoun [aut, cre], Thomas Wood [ctb], Alex Nikic [ctb], Ulster University [cph] |
Maintainer: | Omar Hassoun <omtarful@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-04-20 16:10:01 UTC |
Create instrument from list
Description
This function creates an instrument from a list of questions.
Usage
create_instrument_from_list(
question_texts,
question_numbers = NULL,
instrument_name = "My instrument"
)
Arguments
question_texts |
A character vector of question texts. |
question_numbers |
A character vector of question numbers. If not provided, the question number will be the index of the question text. |
instrument_name |
A character string of the instrument name. |
Author(s)
Alex Nikic
Examples
instrument = create_instrument_from_list(
list("How old are you?",
"What is your gender?",
"What is your name?")
)
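The default numbering described under question_numbers can be sketched in base R. This is only an illustration of the documented behaviour, not the package's source: the helper name default_question_numbers is hypothetical, and converting the index to character is an assumption based on the argument being described as a character vector.

```r
# Hypothetical helper (not part of the package): shows how question
# numbers could default to the index of each question text.
default_question_numbers <- function(question_texts, question_numbers = NULL) {
  if (is.null(question_numbers)) {
    # Assumption: indices are stored as character strings.
    question_numbers <- as.character(seq_along(question_texts))
  }
  # One number per question text.
  stopifnot(length(question_numbers) == length(question_texts))
  question_numbers
}

texts <- c("How old are you?", "What is your gender?", "What is your name?")

# No numbers supplied: fall back to the index of each question.
auto_numbers <- default_question_numbers(texts)

# Numbers supplied explicitly: returned unchanged.
custom_numbers <- default_question_numbers(texts, c("Q1a", "Q1b", "Q2"))
```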
Generate Crosswalk Table Function
Description
A crosswalk is a table that lists matched variables from different studies or instruments, enabling data harmonization across datasets.
Usage
generate_crosswalk_table(
instruments,
similarity,
threshold,
is_allow_within_instrument_matches = FALSE,
is_enforce_one_to_one = FALSE
)
Arguments
instruments |
The original list of instruments, each containing one or more questions. The total number of questions across all instruments should equal both the width and the height of the similarity matrix. |
similarity |
The cosine similarity matrix that is output by the match_instruments function. |
threshold |
The minimum similarity for a question pair to count as a match, applied to the absolute value of the similarity score. For example, with threshold = 0.5, a question pair with similarity 0.2 is excluded. Leave as NULL if you don't want to apply any thresholding. |
is_allow_within_instrument_matches |
Defaults to FALSE. If set to TRUE, the crosswalk includes item pairs that originate from the same instrument, which are otherwise excluded. |
is_enforce_one_to_one |
Defaults to FALSE. If set to TRUE, every variable in the crosswalk table is matched with exactly one other variable. |
Details
This function generates a crosswalk table using a list of instruments and a similarity matrix produced by the match_instruments function.
A crosswalk is a mapping between conceptually similar items (e.g., survey questions or variables) from different instruments. It is used to identify and align comparable variables across datasets that use different formats or wordings. This is especially useful in meta-analysis, data integration, and comparative research, where consistent constructs need to be analyzed across multiple sources.
The similarity matrix passed to this function is usually obtained from match_instruments.
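The effect of threshold, is_allow_within_instrument_matches and is_enforce_one_to_one can be sketched in base R on a small hand-written similarity matrix. This is a minimal illustration under stated assumptions, not the package's implementation: sketch_crosswalk, its argument names and its output columns are hypothetical, and the greedy "best score first" rule used for one-to-one matching is an assumption.

```r
# Hypothetical helper (not part of the package): builds a crosswalk-like
# data frame from a square similarity matrix.
# n_questions: number of questions in each instrument, in order.
sketch_crosswalk <- function(similarity, n_questions, threshold = NULL,
                             allow_within = FALSE, one_to_one = FALSE) {
  total <- sum(n_questions)
  # The matrix must have one row and one column per question.
  stopifnot(nrow(similarity) == total, ncol(similarity) == total)

  # Index of the instrument that each question belongs to.
  instrument_of <- rep(seq_along(n_questions), times = n_questions)

  # List each unordered pair once: upper triangle, diagonal excluded.
  pairs <- which(upper.tri(similarity), arr.ind = TRUE)
  out <- data.frame(
    item_1 = unname(pairs[, 1]),
    item_2 = unname(pairs[, 2]),
    score = similarity[pairs]
  )

  # Drop pairs whose two questions come from the same instrument.
  if (!allow_within) {
    same <- instrument_of[out$item_1] == instrument_of[out$item_2]
    out <- out[!same, , drop = FALSE]
  }

  # The threshold is applied to the absolute similarity value.
  if (!is.null(threshold)) {
    out <- out[abs(out$score) >= threshold, , drop = FALSE]
  }

  # Strongest matches first.
  out <- out[order(-abs(out$score)), , drop = FALSE]

  # Assumed greedy rule: accept a pair only if neither item is used yet.
  if (one_to_one) {
    used <- integer(0)
    keep <- logical(nrow(out))
    for (i in seq_len(nrow(out))) {
      if (!(out$item_1[i] %in% used) && !(out$item_2[i] %in% used)) {
        keep[i] <- TRUE
        used <- c(used, out$item_1[i], out$item_2[i])
      }
    }
    out <- out[keep, , drop = FALSE]
  }

  rownames(out) <- NULL
  out
}

# Three questions: two in the first instrument, one in the second.
sim <- matrix(c(1.0, 0.3, 0.8,
                0.3, 1.0, 0.2,
                0.8, 0.2, 1.0), nrow = 3, byrow = TRUE)

crosswalk_sketch <- sketch_crosswalk(sim, n_questions = c(2, 1), threshold = 0.5)
print(crosswalk_sketch)
```

In this sketch only the pair of questions 1 and 3 survives: the pair (1, 2) lies within one instrument, and the pair (2, 3) falls below the threshold.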
Value
A crosswalk table as a data frame.
Author(s)
Alex Nikic
Omar Hassoun
Examples
instrument_A = create_instrument_from_list(list(
"How old are you?",
"What is your gender?"
))
instrument_B = create_instrument_from_list(list(
"Do you smoke?"
))
instruments = list(instrument_A, instrument_B)
match_response = match_instruments(instruments)
instrument_list = match_response$instruments
similarity_matrix = match_response$matches
crosswalk_table.df = generate_crosswalk_table(
instrument_list, similarity_matrix, threshold = 0.7,
is_allow_within_instrument_matches = FALSE, is_enforce_one_to_one = TRUE
)
Retrieve Example Instruments from 'Harmony Data API'
Description
This function retrieves example instruments from the 'Harmony Data API' using an HTTP POST request.
Usage
get_example_instruments()
Value
A list representing example instruments retrieved from the 'Harmony Data API'.
Author(s)
Ulster University [cph]
Examples
# Load required libraries (httr) and call the function
require(httr)
instruments <- get_example_instruments()
# Print the retrieved JSON content
print(instruments)
Load Instruments from File
Description
This function loads instruments from a file specified by the path parameter and sends the file content to an API for further processing. It also accepts a URL leading to a file.
Usage
load_instruments_from_file(path)
Arguments
path |
The path to the file to load instruments from, or a URL leading to a file. |
Value
A list of instruments returned from the API.
Author(s)
Ulster University [cph]
Examples
# Load instruments from a PDF file
pdf_file <- "https://www.apa.org/depression-guideline/patient-health-questionnaire.pdf"
response <- load_instruments_from_file(pdf_file)
Match Instruments Function
Description
This function takes a list of instruments, converts them to a format acceptable to the database, and matches them using the 'Harmony Data API'. It returns the matched instruments.
Usage
match_instruments(
instruments,
topics = list(),
is_negate = TRUE,
clustering_algorithm = "affinity_propagation"
)
Arguments
instruments |
A list of instruments to be matched. |
topics |
A list of topics with which to tag the questions. Default is empty. |
is_negate |
A boolean indicating whether to apply negation-based preprocessing. Default is TRUE. This option addresses a common limitation in large language model (LLM) embeddings, where antonyms (e.g., "happy" and "sad") may be treated as similar due to contextual overlap. |
clustering_algorithm |
A string value to select the clustering algorithm to use. Must be one of: "affinity_propagation", "kmeans", "deterministic", "hdbscan". Default is "affinity_propagation". |
Value
A list containing the matched instruments retrieved from the Harmony Data API. The returned object includes attributes such as the similarity matrix, identified clusters, associated cluster topics, and other relevant metadata.
Author(s)
Ulster University [cph]
Examples
instrument_A <- create_instrument_from_list(list(
"How old are you?",
"What is your gender?"
))
instrument_B <- create_instrument_from_list(list(
"Do you smoke?"
))
instruments <- list(instrument_A, instrument_B)
matched_instruments <- match_instruments(
instruments,
topics = list("anxiety", "depression")
)
Set 'Harmony' URL
Description
This function sets the 'Harmony' API URL to be used in the package. By default, it uses the production 'Harmony' API ('https://api.harmonydata.ac.uk'), but you can override it by providing a different URL.
Usage
set_url(harmony_url = "https://api.harmonydata.ac.uk")
Arguments
harmony_url |
The 'Harmony' API URL to be set. (default: 'https://api.harmonydata.ac.uk') |
Details
The pkg.globals$url variable is a global variable that holds the 'Harmony' API URL to be used in the package. Once you set the URL using this function, it will be used in all the relevant functions within the package.
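The pattern of one shared URL variable that other functions read at call time can be sketched with a plain R environment. This is an illustration under assumptions, not the package's source: pkg_state, set_url_sketch, endpoint_sketch and the "/example" path are hypothetical names introduced here.

```r
# Hypothetical stand-in for the package's internal state
# (the package itself uses pkg.globals$url).
pkg_state <- new.env(parent = emptyenv())
default_url <- "https://api.harmonydata.ac.uk"
assign("url", default_url, envir = pkg_state)

# Sketch of a setter: stores the URL, returns nothing visible.
set_url_sketch <- function(harmony_url = default_url) {
  assign("url", harmony_url, envir = pkg_state)
  invisible(NULL)
}

# Any function that builds a request reads the URL when it is called,
# so it always sees the most recently set value.
endpoint_sketch <- function(path) {
  paste0(get("url", envir = pkg_state), path)
}

set_url_sketch("https://staging.harmonydata.org")
staging_endpoint <- endpoint_sketch("/example")

set_url_sketch()  # back to the default URL
default_endpoint <- endpoint_sketch("/example")
```

Because environments have reference semantics in R, the setter changes the value seen by every later call without the URL being passed as an argument.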
Value
No return value, called for side effects.
Author(s)
Ulster University [cph]
Examples
set_url('https://staging.harmonydata.org')
set_url() # Will set the URL back to the default 'https://api.harmonydata.ac.uk'