--- title: "Extended Installation Guide" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Extended Installation Guide} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` The *text*-package provides users access to HuggingFace Transformers in R through [reticulate](https://rstudio.github.io/reticulate/), which interfaces with [Python](https://www.python.org/). It relies on Python packages such as [torch](https://pytorch.org/) and [transformers](https://huggingface.co/docs/transformers/index).

To use these features, you need to install: • The text package in R • A Python environment with the required packages

The easiest way is to use `textrpp_install()` to install a preconfigured Conda environment and then initialize it with `textrpp_initialize()`. ## Set up a python environment for the text-package ```{r extended_installation_condaenv, eval = FALSE} library(text) library(reticulate) # Install text-required Python packages in a Conda environment text::textrpp_install() # Show available Conda environments reticulate::conda_list() # Initialize the installed Conda environment text::textrpp_initialize(save_profile = TRUE) # Test that textEmbed works textEmbed("hello") ``` ## Troubleshooting ### 1. Check if you have install permissions Can you install an R package like dplyr? ```{r install_other_pacakge, eval = FALSE} install.packages("dplyr") ``` Can you install system-level tools like Python/miniconda? ```{r reticulate_installation, eval = FALSE} library(reticulate) reticulate::install_miniconda() ``` If you do not have premissions, please contact your administrator for advice. ### 2. Remembered to initialize the python environment. After restarting R, functions like textEmbed() can stop working again. Solution: Persist the initialization in your R profile: ```{r initialise, eval = FALSE} text::textrpp_initialize( condaenv = "textrpp_condaenv", refresh_settings = TRUE, save_profile = TRUE, ) ``` ### 3. Install the development version from GitHub. ```{r install_github, eval = FALSE} # install.packages("devtools") devtools::install_github("oscarkjell/text") ``` ### 4. Force reinstallation of the environment ```{r textrpp_install(), eval = FALSE} library(text) text::textrpp_install( update_conda = TRUE, force_conda = TRUE ) ``` ### 5. Install the python environment using reticulate See the article [Installing and Managing Python Environments with `reticulate`]( https://www.r-text.org/articles/reticulate.html), for detailed information. ### 6. Inspect diagnostic information If something isn't working right, it is a good start to examine what is installed and running on your system. ```{r textDiagnostics(), eval = FALSE} library(text) log <- text::textDiagnostics() log ``` Because the text package requires some system-level setup, installation is automatically verified on Windows, macOS, and Ubuntu through our [GitHub Actions](https://github.com/OscarKjell/text/actions). If you encounter any issues, please review the tests and check the workflow file for details on system-specific installations. To view the workflow file, select the three-dot menu on the right side of any GitHub Action run and choose View workflow file. This file specifies the operating systems, R versions, and additional libraries being tested. ## Virtual environments It is also possible to use virtual environments (although it is currently only tested on MacOS). ```{r extended_installation_venv, eval = FALSE} # Create a virtual environment with text required python packages. # Note that you have to provide a python path. text::textrpp_install_virtualenv(rpp_version = c("torch==1.7.1", "transformers==4.12.5", "numpy", "nltk"), python_path = "/usr/local/bin/python3.9", envname = "textrpp_virtualenv") # Initialize the virtual environment. text::textrpp_initialize(virtualenv = "textrpp_virtualenv", condaenv = NULL, save_profile = TRUE) ``` ## Solving OMP errors and R/Rstudio crashes Some macOS users may experience a crash when running functions like textEmbed() from the text package. This is due to a known conflict between multiple OpenMP libraries (e.g., libomp.dylib and libiomp5.dylib) used by Python packages such as torch and transformers. Workaround (Automatically Applied) To prevent this crash, the package sets the following environment variables when running on macOS: ``` Sys.setenv(OMP_NUM_THREADS = "1") Sys.setenv(OMP_MAX_ACTIVE_LEVELS = "1") Sys.setenv(KMP_DUPLICATE_LIB_OK = "TRUE") ``` These settings: - Limit the number of OpenMP threads - Avoid nested threading issues - Instruct macOS to ignore duplicate OpenMP libraries This workaround is safe for most users and enables smooth functionality, but note that: - It may slightly reduce parallel processing performance. - It bypasses a system-level issue rather than solving it permanently. The text package sets OpenMP-related environment variables for compatibility with PyTorch to avoid crashes due to libomp.dylib conflicts. You can skip this behavior by setting: ``` Sys.setenv(TEXT_SKIP_OMP_PATCH = "TRUE") library(text) ``` The exact way to install these packages may differ across systems. Please see:\ [Python](https://www.python.org/)\ [torch](https://pytorch.org/)\ [transformers](https://huggingface.co/docs/transformers/index) ## Share advise *If you find a good solution please feel free to email oscar [ d_o t] kjell [a_t] psy [DOT] lu [d_o_t]se so that we can update above instructions.*