Type: Package
Title: Cell Type Annotation Using Large Language Models
Version: 2.0.0
Author: Chen Yang [aut, cre, cph]
Maintainer: Chen Yang <cafferychen777@tamu.edu>
Description: Automated cell type annotation for single-cell RNA sequencing data using consensus predictions from multiple large language models. Integrates with Seurat objects and provides uncertainty quantification for annotations. Supports various LLM providers including OpenAI, Anthropic, and Google. For details see Yang et al. (2025) <doi:10.1101/2025.04.10.647852>.
License: MIT + file LICENSE
BugReports: https://github.com/cafferychen777/mLLMCelltype/issues
URL: https://cafferyang.com/mLLMCelltype/
Encoding: UTF-8
Imports: dplyr, httr (>= 1.4.0), jsonlite (>= 1.7.0), R6 (>= 2.5.0), digest (>= 0.6.25), magrittr, stats, tools, utils
Suggests: knitr, rmarkdown, Seurat
RoxygenNote: 7.3.3
Config/build/clean-inst-doc: TRUE
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2026-02-08 02:07:33 UTC; apple
Repository: CRAN
Date/Publication: 2026-02-08 10:50:09 UTC

mLLMCelltype: Cell Type Annotation Using Large Language Models

Description

Automated cell type annotation for single-cell RNA sequencing data using consensus predictions from multiple large language models. Integrates with Seurat objects and provides uncertainty quantification for annotations. Supports various LLM providers including OpenAI, Anthropic, and Google. For details see Yang et al. (2025) doi:10.1101/2025.04.10.647852.

Author(s)

Maintainer: Chen Yang cafferychen777@tamu.edu [copyright holder]

See Also

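Useful links:

  • https://cafferyang.com/mLLMCelltype/

  • Report bugs at https://github.com/cafferychen777/mLLMCelltype/issues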


Package startup message

Description

Package startup message

Usage

.onAttach(libname, pkgname)

Package load message

Description

Package load message

Usage

.onLoad(libname, pkgname)

Qwen API Processor

Description

Concrete implementation of BaseAPIProcessor for Qwen models. Handles Qwen-specific API calls, authentication, and response parsing.

Usage

.qwen_endpoint_cache

Format

An object of class environment of length 0.


Anthropic API Processor

Description

Anthropic API Processor

Details

Concrete implementation of BaseAPIProcessor for Anthropic models. Handles Anthropic-specific API calls, authentication, and response parsing.

Super class

mLLMCelltype::BaseAPIProcessor -> AnthropicProcessor

Methods

Public methods

Inherited methods

  • mLLMCelltype::BaseAPIProcessor$get_api_url()

  • mLLMCelltype::BaseAPIProcessor$process_request()

Method new()

Initialize Anthropic processor

Usage
AnthropicProcessor$new(base_url = NULL)

Method get_default_api_url()

Get default Anthropic API URL

Usage
AnthropicProcessor$get_default_api_url()

Method make_api_call()

Make API call to Anthropic

Usage
AnthropicProcessor$make_api_call(chunk_content, model, api_key)

Method extract_response_content()

Extract response content from Anthropic API response

Usage
AnthropicProcessor$extract_response_content(response, model)

Method clone()

The objects of this class are cloneable with this method.

Usage
AnthropicProcessor$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


Base API Processor Class

Description

Base API Processor Class

Details

Abstract base class for API processors that provides common functionality including unified logging, error handling, input processing, and response validation. This eliminates code duplication across all provider-specific processors.

Public fields

provider_name

Name of the API provider

logger

Unified logger instance

base_url

Custom base URL for API endpoints

Methods

Public methods


Method new()

Initialize the base API processor

Usage
BaseAPIProcessor$new(provider_name, base_url = NULL)

Method process_request()

Main entry point for processing API requests

Usage
BaseAPIProcessor$process_request(prompt, model, api_key)

Method get_api_url()

Get the API URL to use for requests

Usage
BaseAPIProcessor$get_api_url()

Method get_default_api_url()

Abstract method to be implemented by subclasses for getting default API URL

Usage
BaseAPIProcessor$get_default_api_url()

Method make_api_call()

Abstract method to be implemented by subclasses for making the actual API call

Usage
BaseAPIProcessor$make_api_call(chunk_content, model, api_key)

Method extract_response_content()

Abstract method to be implemented by subclasses for extracting content from the API response

Usage
BaseAPIProcessor$extract_response_content(response, model)

Method clone()

The objects of this class are cloneable with this method.

Usage
BaseAPIProcessor$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


Cache Manager Class

Description

Manages caching of consensus analysis results

Public fields

cache_dir

Directory to store cache files. Options:

  • NULL (default): Uses system cache directory

  • "local": Uses .mllmcelltype_cache in current directory

  • "temp": Uses temporary directory

  • Custom path: Any other string is used as directory path

cache_version

Current cache version

Methods

Public methods


Method new()

Initialize cache manager

Usage
CacheManager$new(cache_dir = NULL)

Method get_cache_dir()

Get actual cache directory path

Usage
CacheManager$get_cache_dir()
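
The cache_dir options listed under Public fields could be resolved to an actual path roughly as follows. This is a minimal sketch, not the package's implementation: resolve_cache_dir() is a hypothetical helper, the use of tools::R_user_dir() for the "system cache directory" is an assumption, and so is the subfolder name used for the "temp" option.

```r
# Hypothetical helper illustrating the documented cache_dir options.
resolve_cache_dir <- function(cache_dir = NULL) {
  if (is.null(cache_dir)) {
    # Assumption: the system cache is R's per-user cache directory (R >= 4.0).
    tools::R_user_dir("mLLMCelltype", which = "cache")
  } else if (identical(cache_dir, "local")) {
    # Documented: .mllmcelltype_cache in the current directory.
    file.path(getwd(), ".mllmcelltype_cache")
  } else if (identical(cache_dir, "temp")) {
    # Assumption: the subfolder name inside tempdir() is illustrative.
    file.path(tempdir(), "mllmcelltype_cache")
  } else {
    # Any other string is used as the directory path.
    cache_dir
  }
}

resolve_cache_dir("local")
resolve_cache_dir("temp")
```

The sketch only computes a path; creating the directory is left out.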

Method generate_key()

Generate cache key from input parameters (improved version)

Usage
CacheManager$generate_key(
  input,
  models,
  cluster_id,
  tissue_name = "",
  top_gene_count = 10
)
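
One way to build a key that is stable under reordering of genes or models is sketched below. It is an illustration under stated assumptions, not the method used by generate_key(): make_cache_key() is a hypothetical function that takes a plain gene vector rather than the package's input object, and it hashes with tools::md5sum() on a temporary file so that the example needs only base R.

```r
# Hypothetical sketch of an order-insensitive cache key.
make_cache_key <- function(genes, models, cluster_id,
                           tissue_name = "", top_gene_count = 10) {
  # Sort genes and models so that input order does not change the key.
  parts <- c(
    paste(sort(unique(genes)), collapse = ","),
    paste(sort(unique(models)), collapse = ","),
    as.character(cluster_id),
    tissue_name,
    as.character(top_gene_count)
  )
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeLines(paste(parts, collapse = "|"), tmp)
  # md5sum() returns a named vector; drop the file name.
  unname(tools::md5sum(tmp))
}

k1 <- make_cache_key(c("CD3D", "CD3E"), c("model-a", "model-b"), "0", "human PBMC")
k2 <- make_cache_key(c("CD3E", "CD3D"), c("model-b", "model-a"), "0", "human PBMC")
identical(k1, k2)
```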

Method save_to_cache()

Save results to cache

Usage
CacheManager$save_to_cache(key, data)

Method load_from_cache()

Load results from cache

Usage
CacheManager$load_from_cache(key)

Method has_cache()

Check if results exist in cache

Usage
CacheManager$has_cache(key)

Method get_cache_stats()

Get cache statistics

Usage
CacheManager$get_cache_stats()

Method clear_cache()

Clear all cache

Usage
CacheManager$clear_cache(confirm = FALSE)

Method validate_cache()

Validate cache content. Related internal helpers extract genes from the input in a standardized way and create stable hashes from the genes list, the models list, tissue_name and top_gene_count, and the cluster ID.

Usage
CacheManager$validate_cache(key)

Method clone()

The objects of this class are cloneable with this method.

Usage
CacheManager$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


DeepSeek API Processor

Description

DeepSeek API Processor

Details

Concrete implementation of BaseAPIProcessor for DeepSeek models. Handles DeepSeek-specific API calls, authentication, and response parsing.

Super class

mLLMCelltype::BaseAPIProcessor -> DeepSeekProcessor

Methods

Public methods

Inherited methods

  • mLLMCelltype::BaseAPIProcessor$get_api_url()

  • mLLMCelltype::BaseAPIProcessor$process_request()

Method new()

Initialize DeepSeek processor

Usage
DeepSeekProcessor$new(base_url = NULL)

Method get_default_api_url()

Get default DeepSeek API URL

Usage
DeepSeekProcessor$get_default_api_url()

Method make_api_call()

Make API call to DeepSeek

Usage
DeepSeekProcessor$make_api_call(chunk_content, model, api_key)

Method extract_response_content()

Extract response content from DeepSeek API response

Usage
DeepSeekProcessor$extract_response_content(response, model)

Method clone()

The objects of this class are cloneable with this method.

Usage
DeepSeekProcessor$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


Gemini API Processor

Description

Gemini API Processor

Details

Concrete implementation of BaseAPIProcessor for Gemini models. Handles Gemini-specific API calls, authentication, and response parsing.

Super class

mLLMCelltype::BaseAPIProcessor -> GeminiProcessor

Methods

Public methods

Inherited methods

  • mLLMCelltype::BaseAPIProcessor$get_api_url()

  • mLLMCelltype::BaseAPIProcessor$process_request()

Method new()

Initialize Gemini processor

Usage
GeminiProcessor$new(base_url = NULL)

Method get_default_api_url()

Get default Gemini API URL template

Usage
GeminiProcessor$get_default_api_url()

Method get_api_url_for_model()

Get API URL for specific model

Usage
GeminiProcessor$get_api_url_for_model(model)

Method make_api_call()

Make API call to Gemini

Usage
GeminiProcessor$make_api_call(chunk_content, model, api_key)

Method extract_response_content()

Extract response content from Gemini API response

Usage
GeminiProcessor$extract_response_content(response, model)

Method clone()

The objects of this class are cloneable with this method.

Usage
GeminiProcessor$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


Grok API Processor

Description

Grok API Processor

Details

Concrete implementation of BaseAPIProcessor for Grok models. Handles Grok-specific API calls, authentication, and response parsing.

Super class

mLLMCelltype::BaseAPIProcessor -> GrokProcessor

Methods

Public methods

Inherited methods

  • mLLMCelltype::BaseAPIProcessor$get_api_url()

  • mLLMCelltype::BaseAPIProcessor$process_request()

Method new()

Initialize Grok processor

Usage
GrokProcessor$new(base_url = NULL)

Method get_default_api_url()

Get default Grok API URL

Usage
GrokProcessor$get_default_api_url()

Method make_api_call()

Make API call to Grok

Usage
GrokProcessor$make_api_call(chunk_content, model, api_key)

Method extract_response_content()

Extract response content from Grok API response

Usage
GrokProcessor$extract_response_content(response, model)

Method clone()

The objects of this class are cloneable with this method.

Usage
GrokProcessor$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


Minimax API Processor

Description

Minimax API Processor

Details

Concrete implementation of BaseAPIProcessor for Minimax models. Handles Minimax-specific API calls, authentication, and response parsing.

Super class

mLLMCelltype::BaseAPIProcessor -> MinimaxProcessor

Methods

Public methods

Inherited methods

  • mLLMCelltype::BaseAPIProcessor$get_api_url()

  • mLLMCelltype::BaseAPIProcessor$process_request()

Method new()

Initialize Minimax processor

Usage
MinimaxProcessor$new(base_url = NULL)

Method get_default_api_url()

Get default Minimax API URL

Usage
MinimaxProcessor$get_default_api_url()

Method make_api_call()

Make API call to Minimax

Usage
MinimaxProcessor$make_api_call(chunk_content, model, api_key)

Method extract_response_content()

Extract response content from Minimax API response

Usage
MinimaxProcessor$extract_response_content(response, model)

Method clone()

The objects of this class are cloneable with this method.

Usage
MinimaxProcessor$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


OpenAI API Processor

Description

OpenAI API Processor

Details

Concrete implementation of BaseAPIProcessor for OpenAI models. Handles OpenAI-specific API calls, authentication, and response parsing.

Super class

mLLMCelltype::BaseAPIProcessor -> OpenAIProcessor

Methods

Public methods

Inherited methods

  • mLLMCelltype::BaseAPIProcessor$get_api_url()

  • mLLMCelltype::BaseAPIProcessor$process_request()

Method new()

Initialize OpenAI processor

Usage
OpenAIProcessor$new(base_url = NULL)

Method get_default_api_url()

Get default OpenAI API URL

Usage
OpenAIProcessor$get_default_api_url()

Method make_api_call()

Make API call to OpenAI

Usage
OpenAIProcessor$make_api_call(chunk_content, model, api_key)

Method extract_response_content()

Extract response content from OpenAI API response

Usage
OpenAIProcessor$extract_response_content(response, model)

Method clone()

The objects of this class are cloneable with this method.

Usage
OpenAIProcessor$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


OpenRouter API Processor

Description

OpenRouter API Processor

Details

Concrete implementation of BaseAPIProcessor for OpenRouter models. Handles OpenRouter-specific API calls, authentication, and response parsing.

Super class

mLLMCelltype::BaseAPIProcessor -> OpenRouterProcessor

Methods

Public methods

Inherited methods

  • mLLMCelltype::BaseAPIProcessor$get_api_url()

  • mLLMCelltype::BaseAPIProcessor$process_request()

Method new()

Initialize OpenRouter processor

Usage
OpenRouterProcessor$new(base_url = NULL)

Method get_default_api_url()

Get default OpenRouter API URL

Usage
OpenRouterProcessor$get_default_api_url()

Method make_api_call()

Make API call to OpenRouter

Usage
OpenRouterProcessor$make_api_call(chunk_content, model, api_key)

Method extract_response_content()

Extract response content from OpenRouter API response

Usage
OpenRouterProcessor$extract_response_content(response, model)

Method clone()

The objects of this class are cloneable with this method.

Usage
OpenRouterProcessor$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


StepFun API Processor

Description

StepFun API Processor

Details

Concrete implementation of BaseAPIProcessor for StepFun models. Handles StepFun-specific API calls, authentication, and response parsing.

Super class

mLLMCelltype::BaseAPIProcessor -> StepFunProcessor

Methods

Public methods

Inherited methods

  • mLLMCelltype::BaseAPIProcessor$get_api_url()

  • mLLMCelltype::BaseAPIProcessor$process_request()

Method new()

Initialize StepFun processor

Usage
StepFunProcessor$new(base_url = NULL)

Method get_default_api_url()

Get default StepFun API URL

Usage
StepFunProcessor$get_default_api_url()

Method make_api_call()

Make API call to StepFun

Usage
StepFunProcessor$make_api_call(chunk_content, model, api_key)

Method extract_response_content()

Extract response content from StepFun API response

Usage
StepFunProcessor$extract_response_content(response, model)

Method clone()

The objects of this class are cloneable with this method.

Usage
StepFunProcessor$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


Unified Logger for mLLMCelltype Package

Description

Unified Logger for mLLMCelltype Package

Details

This logger provides centralized, multi-level logging with structured output, log rotation, and performance monitoring capabilities.

Public fields

log_dir

Directory for storing log files

log_level

Current logging level

session_id

Unique identifier for the current session

max_log_size

Maximum log file size in MB (default: 10MB)

max_log_files

Maximum number of log files to keep (default: 5)

enable_console

Whether to output to console (default: TRUE)

enable_json

Whether to use JSON format (default: TRUE)

performance_stats

Performance monitoring statistics

Methods

Public methods


Method new()

Initialize the unified logger

Usage
UnifiedLogger$new(
  base_dir = "logs",
  level = "INFO",
  max_size = 10,
  max_files = 5,
  console_output = TRUE,
  json_format = TRUE
)

Method debug()

Log a debug message

Usage
UnifiedLogger$debug(message, context = NULL)

Method info()

Log an info message

Usage
UnifiedLogger$info(message, context = NULL)

Method warn()

Log a warning message

Usage
UnifiedLogger$warn(message, context = NULL)

Method error()

Log an error message

Usage
UnifiedLogger$error(message, context = NULL)

Method log_api_call()

Log API call performance

Usage
UnifiedLogger$log_api_call(
  provider,
  model,
  duration,
  success = TRUE,
  tokens = NULL
)

Method log_api_request_response()

Log complete API request and response for debugging and audit

Usage
UnifiedLogger$log_api_request_response(
  provider,
  model,
  prompt_content,
  response_content,
  request_metadata = NULL,
  response_metadata = NULL
)

Method log_cache_operation()

Log cache operations

Usage
UnifiedLogger$log_cache_operation(operation, key, size = NULL)

Method log_cluster_progress()

Log cluster annotation progress

Usage
UnifiedLogger$log_cluster_progress(cluster_id, stage, progress = NULL)

Method log_discussion()

Log detailed cluster discussion with complete model conversations

Usage
UnifiedLogger$log_discussion(cluster_id, event_type, data = NULL)

Method log_model_response()

Log model response with concise summary in main log and full text in file

Usage
UnifiedLogger$log_model_response(
  provider,
  model,
  response,
  stage = "annotation",
  cluster_id = NULL
)

Method get_performance_summary()

Get performance summary

Usage
UnifiedLogger$get_performance_summary()

Method cleanup_logs()

Clean up old log files

Usage
UnifiedLogger$cleanup_logs(force = FALSE)

Method set_level()

Set logging level

Usage
UnifiedLogger$set_level(level)

Method clone()

The objects of this class are cloneable with this method.

Usage
UnifiedLogger$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


Zhipu API Processor

Description

Zhipu API Processor

Details

Concrete implementation of BaseAPIProcessor for Zhipu models. Handles Zhipu-specific API calls, authentication, and response parsing.

Super class

mLLMCelltype::BaseAPIProcessor -> ZhipuProcessor

Methods

Public methods

Inherited methods

  • mLLMCelltype::BaseAPIProcessor$get_api_url()

  • mLLMCelltype::BaseAPIProcessor$process_request()

Method new()

Initialize Zhipu processor

Usage
ZhipuProcessor$new(base_url = NULL)

Method get_default_api_url()

Get default Zhipu API URL

Usage
ZhipuProcessor$get_default_api_url()

Method make_api_call()

Make API call to Zhipu

Usage
ZhipuProcessor$make_api_call(chunk_content, model, api_key)

Method extract_response_content()

Extract response content from Zhipu API response

Usage
ZhipuProcessor$extract_response_content(response, model)

Method clone()

The objects of this class are cloneable with this method.

Usage
ZhipuProcessor$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


Cell Type Annotation with Multi-LLM Framework

Description

Automated cell type annotation using a large language model (LLM). The function accepts either Seurat's differential gene expression results or custom gene lists as input, and asks the selected model to identify cell types from marker gene expression patterns.

The API key can be provided directly or stored in environment variables:

# Direct API key
result <- annotate_cell_types(input, tissue_name, model="gpt-5.2",
                             api_key="sk-...")

# Using environment variables
Sys.setenv(OPENAI_API_KEY="sk-...")
Sys.setenv(ANTHROPIC_API_KEY="sk-ant-...")
Sys.setenv(OPENROUTER_API_KEY="sk-or-...")

# Then use with environment variables
result <- annotate_cell_types(input, tissue_name, model="claude-sonnet-4-5-20250929",
                             api_key=Sys.getenv("ANTHROPIC_API_KEY"))

If api_key is NA, the function returns the generated prompt without making an API call, which is useful for reviewing the prompt before sending it to the API. The top_gene_count argument applies when input is from Seurat's FindAllMarkers(); its default is 10.

Usage

annotate_cell_types(
  input,
  tissue_name,
  model = "gpt-5.2",
  api_key = NA,
  top_gene_count = 10,
  debug = FALSE,
  base_urls = NULL
)

Arguments

input

Either a data frame from Seurat's FindAllMarkers() containing columns 'cluster', 'gene', and 'avg_log2FC', or a list with 'genes' field for each cluster

tissue_name

Optional tissue context (e.g., 'human PBMC', 'mouse brain') for more accurate annotations

model

Model name to use. Default: 'gpt-5.2'. See details for supported models

api_key

API key for the selected model provider as a non-empty character scalar. If NA, returns prompt only.

top_gene_count

Number of top genes to use per cluster when input is from Seurat. Default: 10

debug

Logical indicating whether to enable debug output. Default: FALSE

base_urls

Optional base URLs for API endpoints. Can be a string or named list for custom endpoints

Value

When api_key is provided: Vector of cell type annotations per cluster. When api_key is NA: The generated prompt string
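
The effect of top_gene_count on a FindAllMarkers()-style data frame can be illustrated with base R. This is a sketch under assumptions: top_genes_per_cluster() is a hypothetical helper, the marker table is toy data, and ranking by descending avg_log2FC is assumed rather than taken from the package source.

```r
# Toy marker table with the three columns the documentation requires.
markers <- data.frame(
  cluster    = c("0", "0", "0", "1", "1"),
  gene       = c("CD3D", "CD3E", "IL7R", "MS4A1", "CD79A"),
  avg_log2FC = c(2.1, 3.0, 1.2, 2.5, 4.0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the top n genes per cluster,
# ranked by descending avg_log2FC (assumed ranking).
top_genes_per_cluster <- function(markers, n = 10) {
  by_cluster <- split(markers, markers$cluster)
  lapply(by_cluster, function(df) {
    df <- df[order(df$avg_log2FC, decreasing = TRUE), ]
    head(df$gene, n)
  })
}

top2 <- top_genes_per_cluster(markers, n = 2)
top2
```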

Examples

# Example 1: Using custom gene lists, returning prompt only (no API call)
annotate_cell_types(
  input = list(
    t_cells = list(genes = c('CD3D', 'CD3E', 'CD3G', 'CD28')),
    b_cells = list(genes = c('CD19', 'CD79A', 'CD79B', 'MS4A1')),
    monocytes = list(genes = c('CD14', 'CD68', 'CSF1R', 'FCGR3A'))
  ),
  tissue_name = 'human PBMC',
  model = 'gpt-5.2',
  api_key = NA  # Returns prompt only without making API call
)

# Example 2: Using with Seurat pipeline and OpenAI model
## Not run: 
library(Seurat)

# Load example data
data("pbmc_small")

# Find marker genes
all.markers <- FindAllMarkers(
  object = pbmc_small,
  only.pos = TRUE,
  min.pct = 0.25,
  logfc.threshold = 0.25
)

# Set API key in environment variable (recommended approach)
Sys.setenv(OPENAI_API_KEY = "your-openai-api-key")

# Get cell type annotations using OpenAI model
openai_annotations <- annotate_cell_types(
  input = all.markers,
  tissue_name = 'human PBMC',
  model = 'gpt-5.2',
  api_key = Sys.getenv("OPENAI_API_KEY"),
  top_gene_count = 15
)

# Example 3: Using Anthropic Claude model
Sys.setenv(ANTHROPIC_API_KEY = "your-anthropic-api-key")

claude_annotations <- annotate_cell_types(
  input = all.markers,
  tissue_name = 'human PBMC',
  model = 'claude-opus-4-6-20260205',
  api_key = Sys.getenv("ANTHROPIC_API_KEY"),
  top_gene_count = 15
)

# Example 4: Using OpenRouter to access multiple models
Sys.setenv(OPENROUTER_API_KEY = "your-openrouter-api-key")

# Access OpenAI models through OpenRouter
openrouter_gpt4_annotations <- annotate_cell_types(
  input = all.markers,
  tissue_name = 'human PBMC',
  model = 'openai/gpt-5.2',  # Note the provider/model format
  api_key = Sys.getenv("OPENROUTER_API_KEY"),
  top_gene_count = 15
)

# Access Anthropic models through OpenRouter
openrouter_claude_annotations <- annotate_cell_types(
  input = all.markers,
  tissue_name = 'human PBMC',
  model = 'anthropic/claude-opus-4.6',  # Note the provider/model format
  api_key = Sys.getenv("OPENROUTER_API_KEY"),
  top_gene_count = 15
)

# Example 5: Using with mouse brain data
mouse_annotations <- annotate_cell_types(
  input = mouse_markers,  # Your mouse marker genes
  tissue_name = 'mouse brain',  # Specify correct tissue for context
  model = 'gpt-5.2',
  api_key = Sys.getenv("OPENAI_API_KEY"),
  top_gene_count = 20,  # Use more genes for complex tissues
  debug = TRUE  # Enable debug output
)

## End(Not run)


Calculate simple consensus without LLM

Description

Calculate simple consensus without LLM

Usage

calculate_simple_consensus(round_responses)

Check if consensus is reached among models

Description

Check if consensus is reached among models

Usage

check_consensus(
  round_responses,
  api_keys = NULL,
  controversy_threshold = 2/3,
  entropy_threshold = 1,
  consensus_check_model = NULL,
  base_urls = NULL
)

Note

This function uses create_consensus_check_prompt from prompt_templates.R
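
The two thresholds can be read as a consensus proportion and a Shannon entropy over the models' predictions for one cluster. The sketch below shows only that arithmetic and rests on assumptions: consensus_metrics() and is_consensus() are hypothetical helpers, predictions are compared as exact strings, entropy is taken in base 2, and the rule "proportion at or above the threshold and entropy at or below the threshold" is assumed. The package itself builds an LLM prompt for this check, so its results may differ.

```r
# Hypothetical helpers illustrating consensus proportion and entropy.
consensus_metrics <- function(predictions) {
  predictions <- predictions[!is.na(predictions)]
  counts <- table(predictions)
  p <- as.numeric(counts) / sum(counts)
  list(
    majority   = names(counts)[which.max(counts)],
    proportion = max(p),
    entropy    = -sum(p * log2(p))  # Shannon entropy in bits (assumed base)
  )
}

is_consensus <- function(predictions,
                         controversy_threshold = 2/3,
                         entropy_threshold = 1) {
  m <- consensus_metrics(predictions)
  # Assumed rule: enough agreement and low enough disagreement.
  m$proportion >= controversy_threshold && m$entropy <= entropy_threshold
}

m <- consensus_metrics(c("T cell", "T cell", "T cell", "B cell"))
m$proportion
m$entropy
is_consensus(c("T cell", "T cell", "T cell", "B cell"))
```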


Clean annotation text by removing prefixes and extra whitespace

Description

Clean annotation text by removing prefixes and extra whitespace

Usage

clean_annotation(annotation)
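
A cleaning step of this kind can be sketched with base R regular expressions. The prefixes handled here (list numbering such as "1." and labels such as "Cluster 3:") are assumptions chosen for illustration, and clean_annotation_sketch() is a hypothetical stand-in, not the package function.

```r
# Hypothetical sketch: strip assumed prefixes and normalize whitespace.
clean_annotation_sketch <- function(annotation) {
  x <- trimws(annotation)
  # Assumed prefix 1: list numbering such as "1.", "2)" or "3:".
  x <- sub("^[0-9]+\\s*[.):]\\s*", "", x)
  # Assumed prefix 2: labels such as "Cluster 3:".
  x <- sub("^[Cc]luster\\s*[0-9]+\\s*[:.-]\\s*", "", x)
  # Collapse runs of whitespace into single spaces.
  gsub("\\s+", " ", trimws(x))
}

clean_annotation_sketch("  1.  CD4+   T cells ")
clean_annotation_sketch("Cluster 3:  B cells")
```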

Combine results from all phases of consensus annotation

Description

Combine results from all phases of consensus annotation

Usage

combine_results(initial_results, controversy_results, discussion_results)

Compare predictions from different models

Description

This function runs the same input through multiple models and compares their predictions. It provides both individual predictions and a consensus analysis.

Usage

compare_model_predictions(
  input,
  tissue_name,
  models = c("claude-opus-4-6-20260205", "gpt-5.2", "gemini-3-pro", "deepseek-r1",
    "o3-pro", "grok-4.1"),
  api_keys,
  top_gene_count = 10,
  consensus_threshold = 0.5,
  base_urls = NULL
)

Arguments

input

Either a data frame from Seurat's FindAllMarkers() containing columns 'cluster', 'gene', and 'avg_log2FC', or a list with 'genes' field for each cluster

tissue_name

Tissue context (e.g., 'human PBMC', 'mouse brain') for more accurate annotations

models

Vector of model names to use for comparison. Default includes top models from each provider

api_keys

Named list of API keys for the models, with provider or model names as keys. Every model in models must resolve to a non-NULL API key.

top_gene_count

Number of top genes to use per cluster when input is from Seurat. Default: 10

consensus_threshold

Minimum agreement threshold for consensus (0-1). Default: 0.5. Consensus is only evaluated when at least two non-missing model predictions are available for a cluster.

base_urls

Optional base URLs for API endpoints. Can be a string or named list for provider-specific custom endpoints.

Value

List containing individual model predictions and consensus analysis. If a cluster has fewer than two valid predictions after alignment/padding, its consensus-related outputs are NA.
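
The rule that consensus is evaluated only with at least two non-missing predictions can be sketched as follows. The prediction matrix and model names are toy data, cluster_consensus() is a hypothetical helper, and returning NA when agreement falls below consensus_threshold is an assumption made for this illustration.

```r
# Rows are clusters, columns are models; NA marks a missing prediction.
predictions <- rbind(
  "0" = c(model_a = "T cells",   model_b = "T cells",     model_c = "NK cells"),
  "1" = c(model_a = "B cells",   model_b = NA,            model_c = NA),
  "2" = c(model_a = "Monocytes", model_b = "Macrophages", model_c = "Dendritic cells")
)

# Hypothetical helper: majority label per cluster, or NA.
cluster_consensus <- function(row, consensus_threshold = 0.5) {
  valid <- row[!is.na(row)]
  # Documented: fewer than two valid predictions gives NA.
  if (length(valid) < 2) return(NA_character_)
  counts <- table(valid)
  agreement <- max(counts) / length(valid)
  # Assumption: below-threshold agreement is also reported as NA.
  if (agreement >= consensus_threshold) {
    names(counts)[which.max(counts)]
  } else {
    NA_character_
  }
}

consensus <- apply(predictions, 1, cluster_consensus)
consensus
```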

Note

This function uses create_standardization_prompt from prompt_templates.R.

The api_keys list can be keyed in two ways:

  1. With provider names as keys: list("openai" = "sk-...", "anthropic" = "sk-ant-...", "openrouter" = "sk-or-...")

  2. With model names as keys: list("gpt-5" = "sk-...", "claude-sonnet-4-5-20250929" = "sk-ant-...")

The system first tries to find the API key using the provider name. If not found, it then tries using the model name. Example:

api_keys <- list(
  "openai" = Sys.getenv("OPENAI_API_KEY"),
  "anthropic" = Sys.getenv("ANTHROPIC_API_KEY"),
  "openrouter" = Sys.getenv("OPENROUTER_API_KEY"),
  "claude-opus-4-6-20260205" = "sk-ant-api03-specific-key-for-opus"
)

Examples

## Not run: 
# Compare predictions using different models
api_keys <- list(
  "claude-sonnet-4-5-20250929" = "your-anthropic-key",
  "deepseek-reasoner" = "your-deepseek-key",
  "gemini-3-pro" = "your-gemini-key",
  "qwen3-max" = "your-qwen-key"
)

results <- compare_model_predictions(
  input = list(gs1=c('CD4','CD3D'), gs2='CD14'),
  tissue_name = 'PBMC',
  models = names(api_keys),  # Every model in models needs a matching key
  api_keys = api_keys
)

## End(Not run)

Set global logger configuration

Description

Set global logger configuration

Usage

configure_logger(level = "INFO", console_output = TRUE, json_format = TRUE)

Arguments

level

Logging level: "DEBUG", "INFO", "WARN", or "ERROR". Default: "INFO"

console_output

Whether to enable console output. Default: TRUE

json_format

Whether to use JSON format for log messages. Default: TRUE

Value

Invisible logger object
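
Level-based filtering of the kind implied by the level argument can be sketched as below. The ordering DEBUG < INFO < WARN < ERROR and the helper should_log() are assumptions for illustration, not the logger's internals.

```r
# Assumed severity ordering of the four documented levels.
log_levels <- c(DEBUG = 1, INFO = 2, WARN = 3, ERROR = 4)

# Hypothetical helper: is a message at message_level emitted
# when the logger is configured at current_level?
should_log <- function(message_level, current_level = "INFO") {
  stopifnot(message_level %in% names(log_levels),
            current_level %in% names(log_levels))
  log_levels[[message_level]] >= log_levels[[current_level]]
}

should_log("DEBUG", "INFO")
should_log("ERROR", "INFO")
```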


Create prompt for cell type annotation

Description

Create prompt for cell type annotation

Usage

create_annotation_prompt(input, tissue_name, top_gene_count = 10)

Arguments

input

Either a data frame from Seurat's FindAllMarkers() or a list for each cluster where each element is either a character vector of genes or a list containing a genes field. Cluster IDs in named inputs are preserved as-is; unnamed list input receives sequential IDs starting at "0".

tissue_name

Tissue context for the annotation (e.g., 'human PBMC', 'mouse brain')

top_gene_count

Number of top genes to use per cluster when input is from Seurat. Default: 10

Value

Character string containing the formatted prompt
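
The list-input rules described for the input argument can be sketched in base R. normalize_list_input() is a hypothetical helper written for this illustration; it covers only list input, not the FindAllMarkers() data frame case.

```r
# Hypothetical helper: normalize list input to a named list of gene vectors.
normalize_list_input <- function(input) {
  # Each element is either a character vector or a list with a genes field.
  genes <- lapply(input, function(el) if (is.list(el)) el$genes else el)
  # Unnamed input receives sequential IDs starting at "0".
  if (is.null(names(genes))) {
    names(genes) <- as.character(seq_along(genes) - 1L)
  }
  genes
}

unnamed <- normalize_list_input(list(c("CD3D", "CD3E"), list(genes = "CD14")))
named   <- normalize_list_input(list(t_cells = c("CD3D", "CD3E")))
names(unnamed)
names(named)
```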


Create prompt for checking consensus among model predictions

Description

Create prompt for checking consensus among model predictions

Usage

create_consensus_check_prompt(
  round_responses,
  controversy_threshold = 2/3,
  entropy_threshold = 1
)

Create prompt for additional discussion rounds

Description

Create prompt for additional discussion rounds

Usage

create_discussion_prompt(
  cluster_id,
  cluster_genes,
  tissue_name,
  previous_rounds,
  round_number
)

Create prompt for the initial round of discussion

Description

Create prompt for the initial round of discussion

Usage

create_initial_discussion_prompt(
  cluster_id,
  cluster_genes,
  tissue_name,
  initial_predictions
)

Create prompt for standardizing cell type names

Description

Create prompt for standardizing cell type names

Usage

create_standardization_prompt(all_cell_types)

Custom model manager for mLLMCelltype

Description

This module provides functionality to register and manage custom LLM providers and models. It allows users to integrate their own LLM services with the mLLMCelltype framework.

Usage

custom_providers

Format

An object of class environment of length 0.


Execute consensus check with retry logic

Description

Execute consensus check with retry logic

Usage

execute_consensus_check(
  formatted_responses,
  api_keys,
  models_to_try,
  base_urls = NULL
)
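
Retry logic over a list of candidate models can be sketched with tryCatch(). Everything named here is hypothetical: try_models_in_order(), the max_attempts parameter, and the stub fake_call() that stands in for a real API call. The package's actual retry policy is not documented in this entry.

```r
# Hypothetical sketch: try each model in order, retrying on errors.
try_models_in_order <- function(models_to_try, call_model, max_attempts = 2) {
  for (model in models_to_try) {
    for (attempt in seq_len(max_attempts)) {
      # Convert an error into NULL so the loop can continue.
      result <- tryCatch(call_model(model), error = function(e) NULL)
      if (!is.null(result)) {
        return(list(model = model, response = result))
      }
    }
  }
  NULL  # every model failed on every attempt
}

# Stub standing in for an API call: "model-a" always fails.
fake_call <- function(model) {
  if (model == "model-a") stop("simulated API failure")
  paste("response from", model)
}

out <- try_models_in_order(c("model-a", "model-b"), fake_call)
out$model
```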

Extract numeric value from line containing a label

Description

Extract numeric value from line containing a label

Usage

extract_labeled_value(lines, pattern, value_pattern)

Facilitate discussion for a controversial cluster

Description

Facilitate discussion for a controversial cluster

Usage

facilitate_cluster_discussion(
  cluster_id,
  input,
  tissue_name,
  models,
  api_keys,
  initial_predictions,
  top_gene_count,
  max_rounds = 3,
  controversy_threshold = 0.7,
  entropy_threshold = 1,
  consensus_check_model = NULL,
  base_urls = NULL
)

Note

This function uses create_initial_discussion_prompt and create_discussion_prompt from prompt_templates.R


Filter out error responses from model round responses

Description

Filter out error responses from model round responses

Usage

filter_valid_responses(responses, cluster_id, round = NULL)

Find majority prediction from response lines

Description

Find majority prediction from response lines

Usage

find_majority_prediction(lines)

Utility functions for API key management

Description

Get the API key for a specific model. This file contains utility functions for managing API keys and related operations.

Usage

get_api_key(model, api_keys)

Arguments

model

Model name to get API key for

api_keys

Named list of API keys with provider or model names as keys

Details

This function retrieves the appropriate API key for a given model by first checking the provider name and then the model name in the provided API keys list.

Value

API key string for the specified model
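
The lookup order described in Details (provider name first, then model name) can be sketched as follows. lookup_api_key() and toy_provider_of() are hypothetical helpers, and the key strings are placeholders.

```r
# Hypothetical sketch of the two-step lookup: provider name, then model name.
lookup_api_key <- function(model, api_keys, provider_of) {
  provider <- provider_of(model)
  key <- api_keys[[provider]]
  if (is.null(key)) key <- api_keys[[model]]
  key
}

# Toy provider detection used only for this example.
toy_provider_of <- function(model) {
  if (startsWith(model, "gpt-")) {
    "openai"
  } else if (startsWith(model, "claude-")) {
    "anthropic"
  } else {
    "unknown"
  }
}

api_keys <- list(
  openai = "sk-openai-placeholder",
  "claude-sonnet-4-5-20250929" = "sk-ant-placeholder"
)

lookup_api_key("gpt-5.2", api_keys, toy_provider_of)
lookup_api_key("claude-sonnet-4-5-20250929", api_keys, toy_provider_of)
```

In this sketch a model with no matching entry yields NULL; how the package reports a missing key is not covered here.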


Get initial predictions from all models

Description

This function retrieves initial cell type predictions from all specified models. It is an internal helper function used by the interactive_consensus_annotation function.

Usage

get_initial_predictions(
  input,
  tissue_name,
  models,
  api_keys,
  top_gene_count,
  base_urls = NULL
)

Get the global logger instance

Description

Get the global logger instance

Usage

get_logger()

Get response from a specific model

Description

Get response from a specific model

Usage

get_model_response(prompt, model, api_key, base_urls = NULL)

Determine provider from model name

Description

This function determines the appropriate provider (e.g., OpenAI, Anthropic, Google, OpenRouter) based on the model name. Uses prefix-based matching for efficient and maintainable provider detection. New models following existing naming conventions are automatically supported.

Usage

get_provider(model)

Arguments

model

Character string specifying the model name (e.g., "gpt-5.2", "claude-sonnet-4.5").

Details

Supported providers and model prefixes:

Value

Character string of the provider name (e.g., "openai", "anthropic").
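
Prefix-based provider detection can be sketched with startsWith(). The prefix table below is deliberately partial and illustrative, covering only the two provider names quoted above; detect_provider() is a hypothetical helper, and routing "provider/model" names to "openrouter" is an assumption based on the naming format used in the OpenRouter examples.

```r
# Illustrative, partial prefix table (not the package's full list).
provider_prefixes <- c(
  "gpt-"    = "openai",
  "claude-" = "anthropic"
)

# Hypothetical sketch of prefix-based matching.
detect_provider <- function(model) {
  # Assumption: names in provider/model format go through OpenRouter.
  if (grepl("/", model, fixed = TRUE)) return("openrouter")
  hit <- startsWith(tolower(model), names(provider_prefixes))
  if (!any(hit)) stop("Unsupported model: ", model)
  unname(provider_prefixes[hit][1])
}

detect_provider("gpt-5.2")
detect_provider("claude-sonnet-4.5")
detect_provider("openai/gpt-5.2")
```

Because matching is by prefix, a new model name that follows an existing convention needs no change to the table.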


Identify controversial clusters based on consensus analysis

Description

Identify controversial clusters based on consensus analysis

Usage

identify_controversial_clusters(
  input,
  individual_predictions,
  controversy_threshold,
  entropy_threshold,
  api_keys,
  consensus_check_model = NULL,
  base_urls = NULL
)

Reinitialize global logger with a specific directory

Description

Preserves the current logger configuration (level, size, retention, console/json) while changing the log directory for a new annotation session.

Usage

initialize_logger(log_dir = "logs")

Arguments

log_dir

Directory for log files

Value

Invisible logger object


Interactive consensus building for cell type annotation

Description

This function implements an interactive voting and discussion mechanism where multiple LLMs collaborate to reach a consensus on cell type annotations, particularly focusing on clusters with low agreement. The process includes:

  1. Initial voting by all LLMs

  2. Identification of controversial clusters

  3. Detailed discussion for controversial clusters

  4. Final summary by a designated LLM (default: Claude)

Usage

interactive_consensus_annotation(
  input,
  tissue_name,
  models = c("claude-opus-4-6-20260205", "gpt-5.2", "gemini-3-pro", "deepseek-r1",
    "grok-4.1"),
  api_keys,
  top_gene_count = 10,
  controversy_threshold = 0.7,
  entropy_threshold = 1,
  max_discussion_rounds = 3,
  consensus_check_model = NULL,
  log_dir = "logs",
  cache_dir = NULL,
  use_cache = TRUE,
  base_urls = NULL,
  clusters_to_analyze = NULL,
  force_rerun = FALSE
)

Arguments

input

Either a data frame from Seurat's FindAllMarkers() function containing differential gene expression results (must have columns: 'cluster', 'gene', and 'avg_log2FC'), or a list where each element is either a character vector of genes or a list containing a genes field.

tissue_name

Character string specifying the tissue type for context-aware cell type annotation (e.g., 'human PBMC', 'mouse brain'). Required.

models

Character vector of model names to use for consensus annotation. Minimum 2 models required. Supports models from OpenAI, Anthropic, DeepSeek, Google, Alibaba, Stepfun, Zhipu, MiniMax, X.AI, and OpenRouter.

api_keys

Named, non-empty list of API keys. Can use provider names as keys (e.g., "openai", "anthropic") or model names as keys (e.g., "gpt-5").

top_gene_count

Integer specifying the number of top marker genes to use for annotation per cluster (default: 10).

controversy_threshold

Numeric value between 0 and 1 for consensus proportion threshold. Clusters below this threshold are considered controversial (default: 0.7).

entropy_threshold

Numeric value for entropy threshold. Higher entropy indicates more disagreement among models (default: 1.0).

max_discussion_rounds

Integer specifying maximum number of discussion rounds for controversial clusters (default: 3).

consensus_check_model

Character string specifying which model to use for consensus checking. If NULL, uses the first model from the models list.

log_dir

Character scalar specifying directory for log files (default: "logs"). This function reinitializes the session logger with this directory at the start of each call.

cache_dir

Character string or NULL. Cache directory for storing results. NULL uses the system cache, "local" uses the current directory, "temp" uses a temporary directory, and any other string is treated as a custom path.

use_cache

Logical indicating whether to use caching (default: TRUE).

base_urls

Named list or character string specifying custom API base URLs. Useful for proxies or alternative endpoints. If NULL, uses official endpoints.

clusters_to_analyze

Character or numeric vector specifying which clusters to analyze. If NULL (default), all clusters are analyzed.

force_rerun

Logical indicating whether to force rerun of all specified clusters, ignoring cache. Only affects controversial cluster discussions (default: FALSE).
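
As an illustration of the data-frame form of input and the top_gene_count argument, the sketch below builds a small marker table with the three required columns and keeps the highest-ranked genes per cluster. The helper `top_genes_per_cluster` is hypothetical, and ranking by descending avg_log2FC is an assumption about how the top genes are chosen.

```r
# Toy marker table with the columns required for data-frame input
markers <- data.frame(
  cluster = c("0", "0", "0", "1", "1", "1"),
  gene = c("CD3D", "CD3E", "LTB", "MS4A1", "CD79A", "HLA-DRA"),
  avg_log2FC = c(2.1, 1.8, 0.9, 3.0, 2.5, 1.1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the n genes with the largest avg_log2FC per cluster
top_genes_per_cluster <- function(df, n = 10) {
  df <- df[order(df$cluster, -df$avg_log2FC), ]
  lapply(split(df$gene, df$cluster), head, n = n)
}

top2 <- top_genes_per_cluster(markers, n = 2)
```

The result is a named list of character vectors, one per cluster, which is also the shape accepted by the list form of input.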

Value

A list containing:


Get list of registered custom models

Description

Get list of registered custom models

Usage

list_custom_models()

Get list of registered custom providers

Description

Get list of registered custom providers

Usage

list_custom_providers()

Convenience functions for logging

Description

Convenience functions for logging

Usage

log_debug(message, context = NULL)

log_info(message, context = NULL)

log_warn(message, context = NULL)

log_error(message, context = NULL)

Arguments

message

Log message string

context

Optional context information (list or character)

Value

Invisible NULL


Get mLLMCelltype cache location

Description

Display the cache directory location

Usage

mllmcelltype_cache_dir(cache_dir = NULL)

Arguments

cache_dir

Cache directory specification. NULL uses the system default, "local" uses the current directory, "temp" uses a temporary directory, and any other string is treated as a custom path.

Value

Invisible cache directory path

Examples

## Not run: 
mllmcelltype_cache_dir()
mllmcelltype_cache_dir("local")

## End(Not run)
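
The four forms of cache_dir described above could be resolved along the following lines. This is a sketch only: the helper `resolve_cache_dir_sketch`, the subdirectory names, and the use of `tools::R_user_dir()` for the system default are assumptions, and the actual paths chosen by the package may differ.

```r
# Hypothetical resolver for the documented cache_dir forms
resolve_cache_dir_sketch <- function(cache_dir = NULL) {
  if (is.null(cache_dir)) {
    # Assumed system default: per-user cache directory (R >= 4.0)
    tools::R_user_dir("mLLMCelltype", which = "cache")
  } else if (identical(cache_dir, "local")) {
    # Assumed folder name inside the current working directory
    file.path(getwd(), ".mllmcelltype_cache")
  } else if (identical(cache_dir, "temp")) {
    # Assumed folder name inside the session temporary directory
    file.path(tempdir(), "mllmcelltype_cache")
  } else {
    # Anything else is taken as a custom path
    cache_dir
  }
}

temp_path <- resolve_cache_dir_sketch("temp")
custom_path <- resolve_cache_dir_sketch("/data/annotation_cache")
```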

Clear mLLMCelltype cache

Description

Clear the mLLMCelltype cache

Usage

mllmcelltype_clear_cache(cache_dir = NULL)

Arguments

cache_dir

Cache directory specification. NULL uses the system default, "local" uses the current directory, "temp" uses a temporary directory, and any other string is treated as a custom path.

Value

Invisible NULL

Examples

## Not run: 
mllmcelltype_clear_cache()
mllmcelltype_clear_cache("local")

## End(Not run)

Normalize annotation for comparison

Description

Normalize annotation for comparison

Usage

normalize_annotation(annotation)

Prompt templates for mLLMCelltype

Description

This file contains all prompt template functions used in mLLMCelltype. These functions create various prompts for different stages of the cell type annotation process.

Normalize list input into a canonical cluster->genes mapping.

Usage

normalize_cluster_gene_list(input)

Arguments

input

List input for cluster annotation

Details

For list input, each element can be either:

  1. a list containing a genes field, or

  2. a character vector of genes.

Naming rules:

Value

Named list of character vectors (cluster_id -> genes)
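
The two accepted element shapes and the returned mapping can be illustrated with a small stand-in. The helper `to_gene_map` below is hypothetical and is not the package's implementation; it assumes the input list is already named and does not attempt to reproduce the naming rules for unnamed input.

```r
# One element per accepted shape: a list with a genes field, and a character vector
input <- list(
  "0" = list(genes = c("CD3D", "CD3E", "CD2")),
  "1" = c("MS4A1", "CD79A", "CD79B")
)

# Hypothetical helper: flatten both shapes to character vectors, keeping names
to_gene_map <- function(x) {
  stopifnot(is.list(x), !is.null(names(x)))
  lapply(x, function(el) {
    genes <- if (is.list(el)) el$genes else el
    as.character(genes)
  })
}

gene_map <- to_gene_map(input)
```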


Parse consensus response from model

Description

Parse consensus response from model

Usage

parse_consensus_response(response)

Parse flexible format consensus response

Description

Parse flexible format consensus response

Usage

parse_flexible_format(lines)

Parse standard 4-line consensus response format

Description

Parse standard 4-line consensus response format

Usage

parse_standard_format(result_lines)

Parse text-format model predictions into a named list

Description

Handles multiple output formats from LLMs:

Usage

parse_text_predictions(model_preds, all_clusters = NULL)

Arguments

model_preds

Character vector of prediction lines from a model

all_clusters

Optional character vector of cluster IDs for positional fallback

Value

Named list mapping cluster_id -> cell_type
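
To illustrate the positional fallback enabled by all_clusters, the sketch below parses lines of the form "cluster_id: cell type" and assigns any line without that prefix to the cluster at the same position. The helper `parse_lines_sketch` and the "id: type" format are assumptions made for this example; the formats actually handled by the package are not reproduced here.

```r
# Hypothetical parser: "id: type" lines, with positional fallback
parse_lines_sketch <- function(lines, all_clusters = NULL) {
  out <- list()
  pos <- 0
  for (line in trimws(lines)) {
    if (!nzchar(line)) next          # skip blank lines
    pos <- pos + 1
    m <- regmatches(
      line,
      regexec("^(\\S+)\\s*:\\s*(.+)$", line, perl = TRUE)
    )[[1]]
    if (length(m) == 3) {
      out[[m[2]]] <- m[3]            # explicit cluster id found
    } else if (!is.null(all_clusters) && pos <= length(all_clusters)) {
      out[[all_clusters[pos]]] <- line  # fall back to position
    }
  }
  out
}

parsed <- parse_lines_sketch(
  c("0: T cells", "", "B cells"),
  all_clusters = c("0", "1")
)
```

Here the second non-empty line carries no cluster id, so it is assigned to the second entry of all_clusters.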


Prepare list of models to try for consensus checking

Description

Prepare list of models to try for consensus checking

Usage

prepare_models_list(consensus_check_model = NULL)

Print summary of consensus building process

Description

This function prints a detailed summary of the consensus building process, including initial predictions from all models, uncertainty metrics, and final consensus for each controversial cluster.

Usage

print_consensus_summary(results)

Details


Process request using Anthropic models

Description

Process request using Anthropic models

Usage

process_anthropic(prompt, model, api_key, base_url = NULL)

Process controversial clusters through discussion

Description

Process controversial clusters through discussion

Usage

process_controversial_clusters(
  controversial_clusters,
  input,
  tissue_name,
  successful_models,
  api_keys,
  individual_predictions,
  top_gene_count,
  controversy_threshold,
  entropy_threshold,
  max_discussion_rounds,
  cache_manager,
  use_cache,
  consensus_check_model = NULL,
  force_rerun = FALSE,
  base_urls = NULL
)

Process request using custom provider

Description

Process request using custom provider

Usage

process_custom(prompt, model, api_key)

Process request using DeepSeek models

Description

Process request using DeepSeek models

Usage

process_deepseek(prompt, model, api_key, base_url = NULL)

Process request using Gemini models

Description

Process request using Gemini models

Usage

process_gemini(prompt, model, api_key, base_url = NULL)

Process request using Grok models

Description

Process request using Grok models

Usage

process_grok(prompt, model, api_key, base_url = NULL)

Process request using MiniMax models

Description

Process request using MiniMax models

Usage

process_minimax(prompt, model, api_key, base_url = NULL)

Process request using OpenAI models

Description

Process request using OpenAI models

Usage

process_openai(prompt, model, api_key, base_url = NULL)

Process request using OpenRouter models

Description

Process request using OpenRouter models

Usage

process_openrouter(prompt, model, api_key, base_url = NULL)

Process request using Qwen models

Description

Process request using Qwen models

Usage

process_qwen(prompt, model, api_key, base_url = NULL)

Process request using StepFun models

Description

Process request using StepFun models

Usage

process_stepfun(prompt, model, api_key, base_url = NULL)

Process request using Zhipu models

Description

Process request using Zhipu models

Usage

process_zhipu(prompt, model, api_key, base_url = NULL)

Register a custom model for a provider

Description

Register a custom model for a provider

Usage

register_custom_model(model_name, provider_name, model_config = list())

Arguments

model_name

Unique name for the custom model

provider_name

Name of the provider this model belongs to

model_config

List of configuration parameters for the model (e.g., temperature, max_tokens)

Value

Invisible TRUE on success

Examples

## Not run: 
register_custom_model(
  model_name = "my_model",
  provider_name = "my_provider",
  model_config = list(
    temperature = 0.7,
    max_tokens = 2000
  )
)

## End(Not run)

Register a custom LLM provider

Description

Register a custom LLM provider

Usage

register_custom_provider(provider_name, process_fn, description = NULL)

Arguments

provider_name

Unique name for the custom provider

process_fn

Function that processes LLM requests. Must accept parameters: prompt, model, api_key

description

Optional description of the provider

Value

Invisible NULL

Examples

## Not run: 
register_custom_provider(
  provider_name = "my_provider",
  process_fn = function(prompt, model, api_key) {
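    # Custom implementation: replace the URL, headers, and request body
    # with whatever your endpoint expects
    response <- httr::POST(
      url = "your_api_endpoint",
      httr::add_headers(Authorization = paste("Bearer", api_key)),
      body = list(model = model, prompt = prompt),
      encode = "json"
    )
    # Stop on HTTP errors instead of parsing an error payload
    httr::stop_for_status(response)
    # Assumes the response carries the text under choices[[1]]$text
    return(httr::content(response)$choices[[1]]$text)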
  }
)

## End(Not run)
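
Before registering a provider, it can help to check the processing function offline. The sketch below defines a mock function with the three required parameters and verifies its signature; `mock_process_fn` and the model name "mock-1" are hypothetical, and returning a single character string is an assumption based on the example above.

```r
# Hypothetical offline stand-in for a real provider call
mock_process_fn <- function(prompt, model, api_key) {
  stopifnot(is.character(prompt), length(prompt) == 1)
  paste0("[", model, "] received ", nchar(prompt), " characters")
}

# Check that the function exposes the three required parameters
required <- c("prompt", "model", "api_key")
has_required_args <- all(required %in% names(formals(mock_process_fn)))

reply <- mock_process_fn("Annotate cluster 0: CD3D, CD3E", "mock-1", "unused")
```

A function that passes this check could then be supplied as process_fn in place of the HTTP-based example.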

URL Utilities for Base URL Resolution

Description

This file contains utility functions for resolving custom base URLs for different API providers.

Resolve provider-specific base URL.

Usage

resolve_provider_base_url(provider, base_urls)

Arguments

provider

Provider name (e.g., "openai", "anthropic")

base_urls

User-provided base URLs: NULL, a single string, or a named list

Details

This is the single entry point for all base URL resolution. It resolves the appropriate URL and normalizes it (strips trailing slashes).

Value

Resolved and normalized base URL, or NULL if not specified
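
The documented behaviour (NULL, a single string, or a named list as input, with trailing slashes stripped) can be sketched as follows. The helper `resolve_base_url_sketch` is a hypothetical stand-in, not the package's code, and the example URLs are placeholders.

```r
# Hypothetical stand-in mirroring the documented resolution rules
resolve_base_url_sketch <- function(provider, base_urls) {
  if (is.null(base_urls)) return(NULL)
  url <- if (is.character(base_urls) && length(base_urls) == 1) {
    base_urls                 # one URL applied to every provider
  } else if (is.list(base_urls)) {
    base_urls[[provider]]     # NULL when the provider has no entry
  } else {
    NULL
  }
  if (is.null(url)) return(NULL)
  sub("/+$", "", url)         # strip trailing slashes
}

single <- resolve_base_url_sketch("openai", "https://proxy.example.com/v1/")
named <- resolve_base_url_sketch(
  "anthropic",
  list(openai = "https://proxy.example.com/openai/")
)
```

In this sketch a single string applies to any provider, while a named list returns NULL for providers it does not mention.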


Select the best prediction from consensus results

Description

Select the best prediction from consensus results

Usage

select_best_prediction(consensus_result, valid_predictions)

Standardize cell type names using a language model

Description

This function takes predictions from multiple models and standardizes the cell type nomenclature to ensure consistent naming across different models' outputs.

Usage

standardize_cell_type_names(
  predictions,
  models,
  api_keys,
  standardization_model = "claude-sonnet-4-20250514",
  base_urls = NULL
)

Details

  1. With provider names as keys: list("openai" = "sk-...", "anthropic" = "sk-ant-...", "openrouter" = "sk-or-...")

  2. With model names as keys: list("gpt-5" = "sk-...", "claude-sonnet-4-5-20250929" = "sk-ant-...")
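
A lookup over the two key styles listed above might look like the sketch below. The helper `get_key_sketch` is hypothetical, the key strings are placeholders, and giving a model-name entry precedence over a provider-name entry is an assumption rather than documented behaviour.

```r
# Placeholder keys mixing both naming styles
api_keys <- list(
  "openai" = "sk-provider-level",
  "gpt-5" = "sk-model-level",
  "anthropic" = "sk-ant-provider-level"
)

# Hypothetical lookup: try the model name first, then the provider name
get_key_sketch <- function(model, provider, api_keys) {
  key <- api_keys[[model]]
  if (is.null(key)) key <- api_keys[[provider]]
  key
}

k1 <- get_key_sketch("gpt-5", "openai", api_keys)
k2 <- get_key_sketch("claude-sonnet-4-5-20250929", "anthropic", api_keys)
k3 <- get_key_sketch("deepseek-r1", "deepseek", api_keys)
```

In this sketch the lookup returns NULL when neither the model nor its provider has an entry.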