Help for package mLLMCelltype

Type:

Package

Title:

Cell Type Annotation Using Large Language Models

Version:

2.0.0

Author:

Chen Yang [aut, cre, cph]

Maintainer:

Chen Yang <cafferychen777@tamu.edu>

Description:

Automated cell type annotation for single-cell RNA sequencing data using consensus predictions from multiple large language models. Integrates with Seurat objects and provides uncertainty quantification for annotations. Supports various LLM providers including OpenAI, Anthropic, and Google. For details see Yang et al. (2025) <doi:10.1101/2025.04.10.647852>.

License:

MIT + file LICENSE

BugReports:

https://github.com/cafferychen777/mLLMCelltype/issues

URL:

https://cafferyang.com/mLLMCelltype/

Encoding:

UTF-8

Imports:

dplyr, httr (≥ 1.4.0), jsonlite (≥ 1.7.0), R6 (≥ 2.5.0), digest (≥ 0.6.25), magrittr, stats, tools, utils

Suggests:

knitr, rmarkdown, Seurat

RoxygenNote:

7.3.3

Config/build/clean-inst-doc:

TRUE

VignetteBuilder:

knitr

NeedsCompilation:

Packaged:

2026-02-08 02:07:33 UTC; apple

Repository:

CRAN

Date/Publication:

2026-02-08 10:50:09 UTC

mLLMCelltype: Cell Type Annotation Using Large Language Models

Description

Author(s)

Maintainer: Chen Yang <cafferychen777@tamu.edu> [copyright holder]

Package startup message

Description

Package startup message

Usage

.onAttach(libname, pkgname)

Package load message

Description

Package load message

Usage

.onLoad(libname, pkgname)

Qwen API Processor

Description

Concrete implementation of BaseAPIProcessor for Qwen models. Handles Qwen-specific API calls, authentication, and response parsing.

Usage

.qwen_endpoint_cache

Format

An object of class environment of length 0.

Anthropic API Processor

Description

Anthropic API Processor

Details

Concrete implementation of BaseAPIProcessor for Anthropic models. Handles Anthropic-specific API calls, authentication, and response parsing.

Super class

mLLMCelltype::BaseAPIProcessor -> AnthropicProcessor

Methods

Public methods

AnthropicProcessor$new()
AnthropicProcessor$get_default_api_url()
AnthropicProcessor$make_api_call()
AnthropicProcessor$extract_response_content()
AnthropicProcessor$clone()

Inherited methods

mLLMCelltype::BaseAPIProcessor$process_request()
mLLMCelltype::BaseAPIProcessor$get_api_url()

Method `new()`

Initialize Anthropic processor

Usage

AnthropicProcessor$new(base_url = NULL)

Method `get_default_api_url()`

Get default Anthropic API URL

Usage

AnthropicProcessor$get_default_api_url()

Method `make_api_call()`

Make API call to Anthropic

Usage

AnthropicProcessor$make_api_call(chunk_content, model, api_key)

Method `extract_response_content()`

Extract response content from Anthropic API response

Usage

AnthropicProcessor$extract_response_content(response, model)

Method `clone()`

The objects of this class are cloneable with this method.

Usage

AnthropicProcessor$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Base API Processor Class

Description

Base API Processor Class

Details

Abstract base class for API processors that provides common functionality including unified logging, error handling, input processing, and response validation. This eliminates code duplication across all provider-specific processors.

Public fields

provider_name: Name of the API provider
logger: Unified logger instance
base_url: Custom base URL for API endpoints

Methods

Public methods

BaseAPIProcessor$new()
BaseAPIProcessor$process_request()
BaseAPIProcessor$get_api_url()
BaseAPIProcessor$get_default_api_url()
BaseAPIProcessor$make_api_call()
BaseAPIProcessor$extract_response_content()
BaseAPIProcessor$clone()

Method `new()`

Initialize the base API processor

Usage

BaseAPIProcessor$new(provider_name, base_url = NULL)

Method `process_request()`

Main entry point for processing API requests

Usage

BaseAPIProcessor$process_request(prompt, model, api_key)

Method `get_api_url()`

Get the API URL to use for requests

Usage

BaseAPIProcessor$get_api_url()

Method `get_default_api_url()`

Abstract method to be implemented by subclasses for getting default API URL

Usage

BaseAPIProcessor$get_default_api_url()

Method `make_api_call()`

Abstract method to be implemented by subclasses for making the actual API call

Usage

BaseAPIProcessor$make_api_call(chunk_content, model, api_key)

Method `extract_response_content()`

Abstract method to be implemented by subclasses for extracting content from the API response

Usage

BaseAPIProcessor$extract_response_content(response, model)

Method `clone()`

The objects of this class are cloneable with this method.

Usage

BaseAPIProcessor$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Cache Manager Class

Description

Manages caching of consensus analysis results

Public fields

cache_dir

Directory to store cache files. Options:

NULL (default): Uses system cache directory
"local": Uses .mllmcelltype_cache in current directory
"temp": Uses temporary directory
Custom path: Any other string is used as directory path

cache_version

Current cache version

Methods

Method `new()`

Initialize cache manager. The cache_dir argument accepts:

NULL (default): Uses system cache directory via tools::R_user_dir()
"local": Uses .mllmcelltype_cache in current directory
"temp": Uses temporary directory (cleared on R restart)
Custom path: Any other string is used as directory path

Usage

CacheManager$new(cache_dir = NULL)
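
The four cache_dir options listed above can be pictured with a small resolver. This is only a sketch: resolve_cache_dir() is a hypothetical helper, not part of the package, and the subdirectory name used for the "temp" case is an assumption. tools::R_user_dir() is the base R function the documentation mentions for the default case.

```r
# Hypothetical helper that mirrors the documented cache_dir options.
resolve_cache_dir <- function(cache_dir = NULL) {
  if (is.null(cache_dir)) {
    # System cache directory for the package (needs R >= 4.0.0)
    tools::R_user_dir("mLLMCelltype", which = "cache")
  } else if (identical(cache_dir, "local")) {
    # Hidden folder in the current working directory
    file.path(getwd(), ".mllmcelltype_cache")
  } else if (identical(cache_dir, "temp")) {
    # Session temporary directory; folder name is an assumption
    file.path(tempdir(), "mllmcelltype_cache")
  } else {
    # Any other string is used as the directory path itself
    cache_dir
  }
}

resolve_cache_dir("results/cache")
```

Keeping the resolution in one place makes it easy to see that only NULL, "local", and "temp" are treated specially; every other string is taken literally as a path.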

Method `get_cache_dir()`

Get actual cache directory path

Usage

CacheManager$get_cache_dir()

Method `generate_key()`

Generate cache key from input parameters (improved version)

Usage

CacheManager$generate_key(
  input,
  models,
  cluster_id,
  tissue_name = "",
  top_gene_count = 10
)
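
A cache key of this kind is useful only if it is stable, meaning the same genes and models in a different order give the same key. The sketch below illustrates that idea with base R only. make_cache_key() is a hypothetical function, it takes a gene vector rather than the package's input argument, and it hashes with tools::md5sum() on a temporary file; the package imports digest and may well build its keys differently.

```r
# Hypothetical sketch of an order-independent cache key.
make_cache_key <- function(genes, models, cluster_id,
                           tissue_name = "", top_gene_count = 10) {
  parts <- c(
    paste(sort(unique(genes)), collapse = ","),   # gene order does not matter
    paste(sort(unique(models)), collapse = ","),  # model order does not matter
    as.character(cluster_id),
    tissue_name,
    as.character(top_gene_count)
  )
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeLines(paste(parts, collapse = "|"), tmp)
  unname(tools::md5sum(tmp))  # 32-character hexadecimal MD5 string
}

k1 <- make_cache_key(c("CD3D", "CD3E"), c("gpt-5.2", "gemini-3-pro"), "0")
k2 <- make_cache_key(c("CD3E", "CD3D"), c("gemini-3-pro", "gpt-5.2"), "0")
identical(k1, k2)
```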

Method `save_to_cache()`

Save results to cache

Usage

CacheManager$save_to_cache(key, data)

Method `load_from_cache()`

Load results from cache

Usage

CacheManager$load_from_cache(key)

Method `has_cache()`

Check if results exist in cache

Usage

CacheManager$has_cache(key)

Method `get_cache_stats()`

Get cache statistics

Usage

CacheManager$get_cache_stats()

Method `clear_cache()`

Clear all cache

Usage

CacheManager$clear_cache(confirm = FALSE)

Method `validate_cache()`

Validate cache content

Internal helpers of this class extract genes from the input in a standardized way and create stable hashes from the genes list, the models list, tissue_name and top_gene_count, and the cluster ID.

Usage

CacheManager$validate_cache(key)

Method `clone()`

The objects of this class are cloneable with this method.

Usage

CacheManager$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

DeepSeek API Processor

Description

DeepSeek API Processor

Details

Concrete implementation of BaseAPIProcessor for DeepSeek models. Handles DeepSeek-specific API calls, authentication, and response parsing.

Super class

mLLMCelltype::BaseAPIProcessor -> DeepSeekProcessor

Methods

Public methods

DeepSeekProcessor$new()
DeepSeekProcessor$get_default_api_url()
DeepSeekProcessor$make_api_call()
DeepSeekProcessor$extract_response_content()
DeepSeekProcessor$clone()

Inherited methods

mLLMCelltype::BaseAPIProcessor$process_request()
mLLMCelltype::BaseAPIProcessor$get_api_url()

Method `new()`

Initialize DeepSeek processor

Usage

DeepSeekProcessor$new(base_url = NULL)

Method `get_default_api_url()`

Get default DeepSeek API URL

Usage

DeepSeekProcessor$get_default_api_url()

Method `make_api_call()`

Make API call to DeepSeek

Usage

DeepSeekProcessor$make_api_call(chunk_content, model, api_key)

Method `extract_response_content()`

Extract response content from DeepSeek API response

Usage

DeepSeekProcessor$extract_response_content(response, model)

Method `clone()`

The objects of this class are cloneable with this method.

Usage

DeepSeekProcessor$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Gemini API Processor

Description

Gemini API Processor

Details

Concrete implementation of BaseAPIProcessor for Gemini models. Handles Gemini-specific API calls, authentication, and response parsing.

Super class

mLLMCelltype::BaseAPIProcessor -> GeminiProcessor

Methods

Public methods

GeminiProcessor$new()
GeminiProcessor$get_default_api_url()
GeminiProcessor$get_api_url_for_model()
GeminiProcessor$make_api_call()
GeminiProcessor$extract_response_content()
GeminiProcessor$clone()

Inherited methods

mLLMCelltype::BaseAPIProcessor$process_request()
mLLMCelltype::BaseAPIProcessor$get_api_url()

Method `new()`

Initialize Gemini processor

Usage

GeminiProcessor$new(base_url = NULL)

Method `get_default_api_url()`

Get default Gemini API URL template

Usage

GeminiProcessor$get_default_api_url()

Method `get_api_url_for_model()`

Get API URL for specific model

Usage

GeminiProcessor$get_api_url_for_model(model)

Method `make_api_call()`

Make API call to Gemini

Usage

GeminiProcessor$make_api_call(chunk_content, model, api_key)

Method `extract_response_content()`

Extract response content from Gemini API response

Usage

GeminiProcessor$extract_response_content(response, model)

Method `clone()`

The objects of this class are cloneable with this method.

Usage

GeminiProcessor$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Grok API Processor

Description

Grok API Processor

Details

Concrete implementation of BaseAPIProcessor for Grok models. Handles Grok-specific API calls, authentication, and response parsing.

Super class

mLLMCelltype::BaseAPIProcessor -> GrokProcessor

Methods

Public methods

GrokProcessor$new()
GrokProcessor$get_default_api_url()
GrokProcessor$make_api_call()
GrokProcessor$extract_response_content()
GrokProcessor$clone()

Inherited methods

mLLMCelltype::BaseAPIProcessor$process_request()
mLLMCelltype::BaseAPIProcessor$get_api_url()

Method `new()`

Initialize Grok processor

Usage

GrokProcessor$new(base_url = NULL)

Method `get_default_api_url()`

Get default Grok API URL

Usage

GrokProcessor$get_default_api_url()

Method `make_api_call()`

Make API call to Grok

Usage

GrokProcessor$make_api_call(chunk_content, model, api_key)

Method `extract_response_content()`

Extract response content from Grok API response

Usage

GrokProcessor$extract_response_content(response, model)

Method `clone()`

The objects of this class are cloneable with this method.

Usage

GrokProcessor$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Minimax API Processor

Description

Minimax API Processor

Details

Concrete implementation of BaseAPIProcessor for Minimax models. Handles Minimax-specific API calls, authentication, and response parsing.

Super class

mLLMCelltype::BaseAPIProcessor -> MinimaxProcessor

Methods

Public methods

MinimaxProcessor$new()
MinimaxProcessor$get_default_api_url()
MinimaxProcessor$make_api_call()
MinimaxProcessor$extract_response_content()
MinimaxProcessor$clone()

Inherited methods

mLLMCelltype::BaseAPIProcessor$process_request()
mLLMCelltype::BaseAPIProcessor$get_api_url()

Method `new()`

Initialize Minimax processor

Usage

MinimaxProcessor$new(base_url = NULL)

Method `get_default_api_url()`

Get default Minimax API URL

Usage

MinimaxProcessor$get_default_api_url()

Method `make_api_call()`

Make API call to Minimax

Usage

MinimaxProcessor$make_api_call(chunk_content, model, api_key)

Method `extract_response_content()`

Extract response content from Minimax API response

Usage

MinimaxProcessor$extract_response_content(response, model)

Method `clone()`

The objects of this class are cloneable with this method.

Usage

MinimaxProcessor$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

OpenAI API Processor

Description

OpenAI API Processor

Details

Concrete implementation of BaseAPIProcessor for OpenAI models. Handles OpenAI-specific API calls, authentication, and response parsing.

Super class

mLLMCelltype::BaseAPIProcessor -> OpenAIProcessor

Methods

Public methods

OpenAIProcessor$new()
OpenAIProcessor$get_default_api_url()
OpenAIProcessor$make_api_call()
OpenAIProcessor$extract_response_content()
OpenAIProcessor$clone()

Inherited methods

mLLMCelltype::BaseAPIProcessor$process_request()
mLLMCelltype::BaseAPIProcessor$get_api_url()

Method `new()`

Initialize OpenAI processor

Usage

OpenAIProcessor$new(base_url = NULL)

Method `get_default_api_url()`

Get default OpenAI API URL

Usage

OpenAIProcessor$get_default_api_url()

Method `make_api_call()`

Make API call to OpenAI

Usage

OpenAIProcessor$make_api_call(chunk_content, model, api_key)

Method `extract_response_content()`

Extract response content from OpenAI API response

Usage

OpenAIProcessor$extract_response_content(response, model)

Method `clone()`

The objects of this class are cloneable with this method.

Usage

OpenAIProcessor$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

OpenRouter API Processor

Description

OpenRouter API Processor

Details

Concrete implementation of BaseAPIProcessor for OpenRouter models. Handles OpenRouter-specific API calls, authentication, and response parsing.

Super class

mLLMCelltype::BaseAPIProcessor -> OpenRouterProcessor

Methods

Public methods

OpenRouterProcessor$new()
OpenRouterProcessor$get_default_api_url()
OpenRouterProcessor$make_api_call()
OpenRouterProcessor$extract_response_content()
OpenRouterProcessor$clone()

Inherited methods

mLLMCelltype::BaseAPIProcessor$process_request()
mLLMCelltype::BaseAPIProcessor$get_api_url()

Method `new()`

Initialize OpenRouter processor

Usage

OpenRouterProcessor$new(base_url = NULL)

Method `get_default_api_url()`

Get default OpenRouter API URL

Usage

OpenRouterProcessor$get_default_api_url()

Method `make_api_call()`

Make API call to OpenRouter

Usage

OpenRouterProcessor$make_api_call(chunk_content, model, api_key)

Method `extract_response_content()`

Extract response content from OpenRouter API response

Usage

OpenRouterProcessor$extract_response_content(response, model)

Method `clone()`

The objects of this class are cloneable with this method.

Usage

OpenRouterProcessor$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

StepFun API Processor

Description

StepFun API Processor

Details

Concrete implementation of BaseAPIProcessor for StepFun models. Handles StepFun-specific API calls, authentication, and response parsing.

Super class

mLLMCelltype::BaseAPIProcessor -> StepFunProcessor

Methods

Public methods

StepFunProcessor$new()
StepFunProcessor$get_default_api_url()
StepFunProcessor$make_api_call()
StepFunProcessor$extract_response_content()
StepFunProcessor$clone()

Inherited methods

mLLMCelltype::BaseAPIProcessor$process_request()
mLLMCelltype::BaseAPIProcessor$get_api_url()

Method `new()`

Initialize StepFun processor

Usage

StepFunProcessor$new(base_url = NULL)

Method `get_default_api_url()`

Get default StepFun API URL

Usage

StepFunProcessor$get_default_api_url()

Method `make_api_call()`

Make API call to StepFun

Usage

StepFunProcessor$make_api_call(chunk_content, model, api_key)

Method `extract_response_content()`

Extract response content from StepFun API response

Usage

StepFunProcessor$extract_response_content(response, model)

Method `clone()`

The objects of this class are cloneable with this method.

Usage

StepFunProcessor$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Unified Logger for mLLMCelltype Package

Description

Unified Logger for mLLMCelltype Package

Details

This logger provides centralized, multi-level logging with structured output, log rotation, and performance monitoring capabilities.

Public fields

log_dir: Directory for storing log files
log_level: Current logging level
session_id: Unique identifier for the current session
max_log_size: Maximum log file size in MB (default: 10MB)
max_log_files: Maximum number of log files to keep (default: 5)
enable_console: Whether to output to console (default: TRUE)
enable_json: Whether to use JSON format (default: TRUE)
performance_stats: Performance monitoring statistics

Methods

Public methods

UnifiedLogger$new()
UnifiedLogger$debug()
UnifiedLogger$info()
UnifiedLogger$warn()
UnifiedLogger$error()
UnifiedLogger$log_api_call()
UnifiedLogger$log_api_request_response()
UnifiedLogger$log_cache_operation()
UnifiedLogger$log_cluster_progress()
UnifiedLogger$log_discussion()
UnifiedLogger$log_model_response()
UnifiedLogger$get_performance_summary()
UnifiedLogger$cleanup_logs()
UnifiedLogger$set_level()
UnifiedLogger$clone()

Method `new()`

Initialize the unified logger

Usage

UnifiedLogger$new(
  base_dir = "logs",
  level = "INFO",
  max_size = 10,
  max_files = 5,
  console_output = TRUE,
  json_format = TRUE
)

Method `debug()`

Log a debug message

Usage

UnifiedLogger$debug(message, context = NULL)

Method `info()`

Log an info message

Usage

UnifiedLogger$info(message, context = NULL)

Method `warn()`

Log a warning message

Usage

UnifiedLogger$warn(message, context = NULL)

Method `error()`

Log an error message

Usage

UnifiedLogger$error(message, context = NULL)

Method `log_api_call()`

Log API call performance

Usage

UnifiedLogger$log_api_call(
  provider,
  model,
  duration,
  success = TRUE,
  tokens = NULL
)

Method `log_api_request_response()`

Log complete API request and response for debugging and audit

Usage

UnifiedLogger$log_api_request_response(
  provider,
  model,
  prompt_content,
  response_content,
  request_metadata = NULL,
  response_metadata = NULL
)

Method `log_cache_operation()`

Log cache operations

Usage

UnifiedLogger$log_cache_operation(operation, key, size = NULL)

Method `log_cluster_progress()`

Log cluster annotation progress

Usage

UnifiedLogger$log_cluster_progress(cluster_id, stage, progress = NULL)

Method `log_discussion()`

Log detailed cluster discussion with complete model conversations

Usage

UnifiedLogger$log_discussion(cluster_id, event_type, data = NULL)

Method `log_model_response()`

Log model response with concise summary in main log and full text in file

Usage

UnifiedLogger$log_model_response(
  provider,
  model,
  response,
  stage = "annotation",
  cluster_id = NULL
)

Method `get_performance_summary()`

Get performance summary

Usage

UnifiedLogger$get_performance_summary()

Method `cleanup_logs()`

Clean up old log files

Usage

UnifiedLogger$cleanup_logs(force = FALSE)

Method `set_level()`

Set logging level

Usage

UnifiedLogger$set_level(level)

Method `clone()`

The objects of this class are cloneable with this method.

Usage

UnifiedLogger$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Zhipu API Processor

Description

Zhipu API Processor

Details

Concrete implementation of BaseAPIProcessor for Zhipu models. Handles Zhipu-specific API calls, authentication, and response parsing.

Super class

mLLMCelltype::BaseAPIProcessor -> ZhipuProcessor

Methods

Public methods

ZhipuProcessor$new()
ZhipuProcessor$get_default_api_url()
ZhipuProcessor$make_api_call()
ZhipuProcessor$extract_response_content()
ZhipuProcessor$clone()

Inherited methods

mLLMCelltype::BaseAPIProcessor$process_request()
mLLMCelltype::BaseAPIProcessor$get_api_url()

Method `new()`

Initialize Zhipu processor

Usage

ZhipuProcessor$new(base_url = NULL)

Method `get_default_api_url()`

Get default Zhipu API URL

Usage

ZhipuProcessor$get_default_api_url()

Method `make_api_call()`

Make API call to Zhipu

Usage

ZhipuProcessor$make_api_call(chunk_content, model, api_key)

Method `extract_response_content()`

Extract response content from Zhipu API response

Usage

ZhipuProcessor$extract_response_content(response, model)

Method `clone()`

The objects of this class are cloneable with this method.

Usage

ZhipuProcessor$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Cell Type Annotation with Multi-LLM Framework

Description

A comprehensive function for automated cell type annotation using multiple Large Language Models (LLMs). This function supports both Seurat's differential gene expression results and custom gene lists as input. It implements a sophisticated annotation pipeline that leverages state-of-the-art LLMs to identify cell types based on marker gene expression patterns.

The input argument accepts either of the following:

A data frame from Seurat's FindAllMarkers() function containing differential gene expression results (must have columns: 'cluster', 'gene', and 'avg_log2FC'). The function will select the top genes based on avg_log2FC for each cluster.
A list where each element has a 'genes' field containing marker genes for a cluster. This can be in one of these formats:
- Named with cluster IDs: list("0" = list(genes = c(...)), "1" = list(genes = c(...)))
- Named with cell type names: list(t_cells = list(genes = c(...)), b_cells = list(genes = c(...)))
- Unnamed list: list(list(genes = c(...)), list(genes = c(...)))

Cluster IDs are preserved as-is. The function does not modify or re-index cluster IDs.

The tissue_name argument gives the tissue context (e.g., 'human PBMC', 'mouse brain'). This helps provide context for more accurate annotations.
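
For data-frame input, the per-cluster selection of top genes by avg_log2FC can be sketched in base R as follows. The toy data frame stands in for FindAllMarkers() output with made-up values, and top_genes_per_cluster() is a hypothetical helper; the package's own selection may filter or break ties differently.

```r
# Toy stand-in for FindAllMarkers() output (values are invented).
markers <- data.frame(
  cluster = c("0", "0", "0", "1", "1"),
  gene = c("CD3D", "CD3E", "IL7R", "CD79A", "MS4A1"),
  avg_log2FC = c(2.1, 3.0, 1.2, 2.5, 2.8),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rank genes within each cluster and keep the top n.
top_genes_per_cluster <- function(markers, top_gene_count = 10) {
  by_cluster <- split(markers, as.character(markers$cluster))
  lapply(by_cluster, function(df) {
    ranked <- df[order(df$avg_log2FC, decreasing = TRUE), ]
    head(ranked$gene, top_gene_count)
  })
}

top2 <- top_genes_per_cluster(markers, top_gene_count = 2)
top2[["0"]]  # "CD3E" "CD3D"
```

The result is a list named by cluster ID, the same shape as the named gene-list input format.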
Supported models for the model argument:

OpenAI: 'gpt-5.2', 'gpt-5.1', 'gpt-5', 'gpt-4.1', 'gpt-4o', 'o3-pro', 'o3', 'o4-mini', 'o1', 'o1-pro'
Anthropic: 'claude-opus-4-6-20260205', 'claude-opus-4-5-20251101', 'claude-sonnet-4-5-20250929', 'claude-haiku-4-5-20251001', 'claude-opus-4-1-20250805', 'claude-sonnet-4-20250514', 'claude-3-7-sonnet-20250219'
DeepSeek: 'deepseek-chat', 'deepseek-reasoner', 'deepseek-r1'
Google: 'gemini-3-pro', 'gemini-3-flash', 'gemini-2.5-pro', 'gemini-2.5-flash', 'gemini-2.0-flash'
Alibaba: 'qwen3-max', 'qwen-max-2025-01-25', 'qwen-plus'
Stepfun: 'step-3', 'step-2-16k', 'step-2-mini'
Zhipu: 'glm-4.7', 'glm-4-plus'
MiniMax: 'minimax-m2.1', 'minimax-m2', 'MiniMax-Text-01'
X.AI: 'grok-4', 'grok-4.1', 'grok-4-heavy', 'grok-3', 'grok-3-fast', 'grok-3-mini'
OpenRouter: Provides access to models from multiple providers through a single API. Format: 'provider/model-name'
- OpenAI models: 'openai/gpt-5.2', 'openai/gpt-5', 'openai/o3-pro', 'openai/o4-mini'
- Anthropic models: 'anthropic/claude-opus-4.5', 'anthropic/claude-sonnet-4.5', 'anthropic/claude-haiku-4.5'
- Meta models: 'meta-llama/llama-4-maverick', 'meta-llama/llama-4-scout', 'meta-llama/llama-3.3-70b-instruct'
- Google models: 'google/gemini-3-pro', 'google/gemini-3-flash', 'google/gemini-2.5-pro'
- Mistral models: 'mistralai/mistral-large', 'mistralai/magistral-medium-2506'
- Other models: 'deepseek/deepseek-r1', 'deepseek/deepseek-chat-v3.1', 'microsoft/mai-ds-r1'

Each provider requires a specific API key format and authentication method:
OpenAI: "sk-..." (obtain from OpenAI platform)
Anthropic: "sk-ant-..." (obtain from Anthropic console)
Google: A Google API key for Gemini models (obtain from Google AI)
DeepSeek: API key from DeepSeek platform
Qwen: API key from Alibaba Cloud
Stepfun: API key from Stepfun AI
Zhipu: API key from Zhipu AI
MiniMax: API key from MiniMax
X.AI: API key for Grok models
OpenRouter: "sk-or-..." (obtain from OpenRouter). OpenRouter provides access to multiple models through a single API key.

The API key can be provided directly or stored in environment variables:

# Direct API key
result <- annotate_cell_types(input, tissue_name, model="gpt-5.2",
                             api_key="sk-...")

# Using environment variables
Sys.setenv(OPENAI_API_KEY="sk-...")
Sys.setenv(ANTHROPIC_API_KEY="sk-ant-...")
Sys.setenv(OPENROUTER_API_KEY="sk-or-...")

# Then use with environment variables
result <- annotate_cell_types(input, tissue_name, model="claude-sonnet-4-5-20250929",
                             api_key=Sys.getenv("ANTHROPIC_API_KEY"))

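If api_key is NA, the function returns the generated prompt without making an API call, which is useful for reviewing the prompt before sending it to the API.

The top_gene_count argument sets the number of top genes to use per cluster when input is from Seurat's FindAllMarkers(). Default: 10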

The base_urls argument accepts:

A single character string: Applied to all providers (e.g., "https://api.proxy.com/v1")
A named list: Provider-specific URLs (e.g., list(openai = "https://openai-proxy.com/v1", anthropic = "https://anthropic-proxy.com/v1"))

This is useful for:
- Users accessing international APIs through proxies
- Enterprise users with internal API gateways
- Development/testing with local or alternative endpoints

If NULL (default), uses official API endpoints for each provider.
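
The three accepted shapes of base_urls (NULL, one string, or a named list) suggest a per-provider lookup along these lines. resolve_base_url() is a hypothetical helper written for illustration; returning NULL to mean "use the official endpoint" is an assumption about how a caller would treat the result.

```r
# Hypothetical helper: pick the custom base URL for one provider, if any.
resolve_base_url <- function(provider, base_urls = NULL) {
  if (is.null(base_urls)) {
    return(NULL)  # caller falls back to the provider's official endpoint
  }
  if (is.character(base_urls) && length(base_urls) == 1) {
    return(base_urls)  # a single string applies to every provider
  }
  if (is.list(base_urls)) {
    return(base_urls[[provider]])  # NULL when the provider has no entry
  }
  stop("base_urls must be NULL, a single string, or a named list")
}

urls <- list(openai = "https://openai-proxy.com/v1")
resolve_base_url("openai", urls)     # custom proxy URL
resolve_base_url("anthropic", urls)  # NULL, so the official endpoint applies
```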

Usage

annotate_cell_types(
  input,
  tissue_name,
  model = "gpt-5.2",
  api_key = NA,
  top_gene_count = 10,
  debug = FALSE,
  base_urls = NULL
)

Arguments

input

Either a data frame from Seurat's FindAllMarkers() containing columns 'cluster', 'gene', and 'avg_log2FC', or a list with 'genes' field for each cluster

tissue_name

Optional tissue context (e.g., 'human PBMC', 'mouse brain') for more accurate annotations

model

Model name to use. Default: 'gpt-5.2'. See details for supported models

api_key

API key for the selected model provider as a non-empty character scalar. If NA, returns prompt only.

top_gene_count

Number of top genes to use per cluster when input is from Seurat. Default: 10

debug

Logical indicating whether to enable debug output. Default: FALSE

base_urls

Optional base URLs for API endpoints. Can be a string or named list for custom endpoints

Value

When api_key is provided: Vector of cell type annotations per cluster. When api_key is NA: The generated prompt string

Examples

# Example 1: Using custom gene lists, returning prompt only (no API call)
annotate_cell_types(
  input = list(
    t_cells = list(genes = c('CD3D', 'CD3E', 'CD3G', 'CD28')),
    b_cells = list(genes = c('CD19', 'CD79A', 'CD79B', 'MS4A1')),
    monocytes = list(genes = c('CD14', 'CD68', 'CSF1R', 'FCGR3A'))
  ),
  tissue_name = 'human PBMC',
  model = 'gpt-5.2',
  api_key = NA  # Returns prompt only without making API call
)

# Example 2: Using with Seurat pipeline and OpenAI model
## Not run: 
library(Seurat)

# Load example data
data("pbmc_small")

# Find marker genes
all.markers <- FindAllMarkers(
  object = pbmc_small,
  only.pos = TRUE,
  min.pct = 0.25,
  logfc.threshold = 0.25
)

# Set API key in environment variable (recommended approach)
Sys.setenv(OPENAI_API_KEY = "your-openai-api-key")

# Get cell type annotations using OpenAI model
openai_annotations <- annotate_cell_types(
  input = all.markers,
  tissue_name = 'human PBMC',
  model = 'gpt-5.2',
  api_key = Sys.getenv("OPENAI_API_KEY"),
  top_gene_count = 15
)

# Example 3: Using Anthropic Claude model
Sys.setenv(ANTHROPIC_API_KEY = "your-anthropic-api-key")

claude_annotations <- annotate_cell_types(
  input = all.markers,
  tissue_name = 'human PBMC',
  model = 'claude-opus-4-6-20260205',
  api_key = Sys.getenv("ANTHROPIC_API_KEY"),
  top_gene_count = 15
)

# Example 4: Using OpenRouter to access multiple models
Sys.setenv(OPENROUTER_API_KEY = "your-openrouter-api-key")

# Access OpenAI models through OpenRouter
openrouter_gpt4_annotations <- annotate_cell_types(
  input = all.markers,
  tissue_name = 'human PBMC',
  model = 'openai/gpt-5.2',  # Note the provider/model format
  api_key = Sys.getenv("OPENROUTER_API_KEY"),
  top_gene_count = 15
)

# Access Anthropic models through OpenRouter
openrouter_claude_annotations <- annotate_cell_types(
  input = all.markers,
  tissue_name = 'human PBMC',
  model = 'anthropic/claude-opus-4.6',  # Note the provider/model format
  api_key = Sys.getenv("OPENROUTER_API_KEY"),
  top_gene_count = 15
)

# Example 5: Using with mouse brain data
mouse_annotations <- annotate_cell_types(
  input = mouse_markers,  # Your mouse marker genes
  tissue_name = 'mouse brain',  # Specify correct tissue for context
  model = 'gpt-5.2',
  api_key = Sys.getenv("OPENAI_API_KEY"),
  top_gene_count = 20,  # Use more genes for complex tissues
  debug = TRUE  # Enable debug output
)

## End(Not run)

Calculate simple consensus without LLM

Description

Calculate simple consensus without LLM

Usage

calculate_simple_consensus(round_responses)

Check if consensus is reached among models

Description

Check if consensus is reached among models

Usage

check_consensus(
  round_responses,
  api_keys = NULL,
  controversy_threshold = 2/3,
  entropy_threshold = 1,
  consensus_check_model = NULL,
  base_urls = NULL
)

Note

This function uses create_consensus_check_prompt from prompt_templates.R
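
The defaults controversy_threshold = 2/3 and entropy_threshold = 1 read naturally as a minimum agreement proportion and a maximum Shannon entropy over the models' answers. The sketch below computes both for one cluster under that reading. Everything here is an assumption made for illustration: consensus_metrics() is hypothetical, the entropy uses log base 2, labels are compared after lower-casing and trimming, and the final rule combining the two thresholds is a guess rather than the package's documented behaviour.

```r
# Hypothetical sketch: agreement proportion and Shannon entropy (bits)
# for one cluster's predictions across models.
consensus_metrics <- function(predictions) {
  predictions <- predictions[!is.na(predictions)]
  counts <- table(tolower(trimws(predictions)))
  p <- as.numeric(counts) / sum(counts)
  list(
    majority = names(counts)[which.max(counts)],
    proportion = max(p),
    entropy = -sum(p * log2(p))
  )
}

m <- consensus_metrics(c("T cell", "T cell", "t cell ", "B cell"))

# Assumed decision rule: enough agreement and low enough disorder.
reached <- m$proportion >= 2/3 && m$entropy <= 1
```

With three of four models agreeing, the proportion is 0.75 and the entropy is about 0.81 bits, so this assumed rule would report consensus.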

Clean annotation text by removing prefixes and extra whitespace

Description

Clean annotation text by removing prefixes and extra whitespace

Usage

clean_annotation(annotation)
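
The help page does not say which prefixes are removed. As a rough picture of this kind of cleanup, the sketch below strips a leading "Cluster N:" or "N." label and collapses repeated whitespace. clean_annotation_sketch() is hypothetical and the prefix pattern is an assumption; it is not the package's implementation.

```r
# Hypothetical sketch of annotation cleanup.
clean_annotation_sketch <- function(annotation) {
  x <- trimws(annotation)
  # Drop a leading "Cluster 3:", "3." or "3)" style label (assumed shapes)
  x <- sub("^(cluster\\s*)?\\d+\\s*[.:)]\\s*", "", x, ignore.case = TRUE)
  # Collapse runs of whitespace into single spaces
  gsub("\\s+", " ", x)
}

clean_annotation_sketch("  Cluster 0:  CD4+   T cells ")
```

The pattern deliberately leaves a hyphen after a leading number alone, so a label such as "5-HT neurons" is not truncated.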

Combine results from all phases of consensus annotation

Description

Combine results from all phases of consensus annotation

Usage

combine_results(initial_results, controversy_results, discussion_results)

Compare predictions from different models

Description

This function runs the same input through multiple models and compares their predictions. It provides both individual predictions and a consensus analysis.

Usage

compare_model_predictions(
  input,
  tissue_name,
  models = c("claude-opus-4-6-20260205", "gpt-5.2", "gemini-3-pro", "deepseek-r1",
    "o3-pro", "grok-4.1"),
  api_keys,
  top_gene_count = 10,
  consensus_threshold = 0.5,
  base_urls = NULL
)

Arguments

input

Either a data frame from Seurat's FindAllMarkers() containing columns 'cluster', 'gene', and 'avg_log2FC', or a list with 'genes' field for each cluster

tissue_name

Tissue context (e.g., 'human PBMC', 'mouse brain') for more accurate annotations

models

Vector of model names to use for comparison. Default includes top models from each provider

api_keys

Named list of API keys for the models, with provider or model names as keys. Every model in models must resolve to a non-NULL API key.

top_gene_count

Number of top genes to use per cluster when input is from Seurat. Default: 10

consensus_threshold

Minimum agreement threshold for consensus (0-1). Default: 0.5. Consensus is only evaluated when at least two non-missing model predictions are available for a cluster.

base_urls

Optional base URLs for API endpoints. Can be a string or named list for provider-specific custom endpoints.

Value

List containing individual model predictions and consensus analysis. If a cluster has fewer than two valid predictions after alignment/padding, its consensus-related outputs are NA.
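
The interaction between consensus_threshold and the two-prediction minimum can be illustrated for a single cluster. This is a sketch under stated assumptions: cluster_consensus() is hypothetical, agreement is taken to be the share of valid predictions matching the most common label, and comparing with >= (rather than >) is a guess.

```r
# Hypothetical sketch: consensus for one cluster across several models.
cluster_consensus <- function(predictions, consensus_threshold = 0.5) {
  valid <- predictions[!is.na(predictions) & nzchar(predictions)]
  if (length(valid) < 2) {
    # Too few valid predictions: consensus outputs are NA
    return(list(consensus = NA_character_, agreement = NA_real_))
  }
  counts <- sort(table(valid), decreasing = TRUE)
  agreement <- as.numeric(counts[1]) / length(valid)
  list(
    consensus = if (agreement >= consensus_threshold) names(counts)[1] else NA_character_,
    agreement = agreement
  )
}

res <- cluster_consensus(c("T cells", "T cells", "B cells", NA))
res$consensus  # "T cells"
```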

Note

This function uses create_standardization_prompt from prompt_templates.R

Supported models:

OpenAI: 'gpt-5.2', 'gpt-5.1', 'gpt-5', 'gpt-4.1', 'gpt-4o', 'o3-pro', 'o3', 'o4-mini', 'o1', 'o1-pro'
Anthropic: 'claude-opus-4-6-20260205', 'claude-opus-4-5-20251101', 'claude-sonnet-4-5-20250929', 'claude-haiku-4-5-20251001', 'claude-opus-4-1-20250805', 'claude-sonnet-4-20250514', 'claude-3-7-sonnet-20250219'
DeepSeek: 'deepseek-chat', 'deepseek-reasoner', 'deepseek-r1'
Google: 'gemini-3-pro', 'gemini-3-flash', 'gemini-2.5-pro', 'gemini-2.5-flash', 'gemini-2.0-flash'
Alibaba: 'qwen3-max', 'qwen-max-2025-01-25', 'qwen-plus'
Stepfun: 'step-3', 'step-2-16k', 'step-2-mini'
Zhipu: 'glm-4.7', 'glm-4-plus'
MiniMax: 'minimax-m2.1', 'minimax-m2', 'MiniMax-Text-01'
X.AI: 'grok-4', 'grok-4.1', 'grok-4-heavy', 'grok-3', 'grok-3-fast', 'grok-3-mini'
OpenRouter: Provides access to models from multiple providers through a single API. Format: 'provider/model-name'
- OpenAI models: 'openai/gpt-5.2', 'openai/gpt-5', 'openai/o3-pro', 'openai/o4-mini'
- Anthropic models: 'anthropic/claude-opus-4.5', 'anthropic/claude-sonnet-4.5', 'anthropic/claude-haiku-4.5'
- Meta models: 'meta-llama/llama-4-maverick', 'meta-llama/llama-4-scout', 'meta-llama/llama-3.3-70b-instruct'
- Google models: 'google/gemini-3-pro', 'google/gemini-3-flash', 'google/gemini-2.5-pro'
- Mistral models: 'mistralai/mistral-large', 'mistralai/magistral-medium-2506'
- Other models: 'deepseek/deepseek-r1', 'deepseek/deepseek-chat-v3.1', 'microsoft/mai-ds-r1'

The api_keys list can be supplied in either of two formats:

With provider names as keys: list("openai" = "sk-...", "anthropic" = "sk-ant-...", "openrouter" = "sk-or-...")
With model names as keys: list("gpt-5" = "sk-...", "claude-sonnet-4-5-20250929" = "sk-ant-...")

The system first tries to find the API key using the provider name. If not found, it then tries using the model name. Example:

api_keys <- list(
  "openai" = Sys.getenv("OPENAI_API_KEY"),
  "anthropic" = Sys.getenv("ANTHROPIC_API_KEY"),
  "openrouter" = Sys.getenv("OPENROUTER_API_KEY"),
  "claude-opus-4-6-20260205" = "sk-ant-api03-specific-key-for-opus"
)
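
The lookup order described above (provider name first, then model name) amounts to a two-step list lookup. The helper below is a hypothetical sketch: lookup_api_key() is not a package function, and it assumes the provider for a model is already known.

```r
# Hypothetical sketch of the documented lookup order.
lookup_api_key <- function(model, provider, api_keys) {
  key <- api_keys[[provider]]                 # 1. try the provider name
  if (is.null(key)) key <- api_keys[[model]]  # 2. fall back to the model name
  key                                         # NULL when neither is present
}

keys <- list(openai = "sk-openai", "deepseek-reasoner" = "sk-deepseek")
lookup_api_key("gpt-5.2", "openai", keys)
lookup_api_key("deepseek-reasoner", "deepseek", keys)
```

A NULL result corresponds to the case the api_keys argument description rules out: every model passed in models must resolve to a key.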

Examples

## Not run: 
# Compare predictions using different models
api_keys <- list(
  "claude-sonnet-4-5-20250929" = "your-anthropic-key",
  "deepseek-reasoner" = "your-deepseek-key",
  "gemini-3-pro" = "your-gemini-key",
  "qwen3-max" = "your-qwen-key"
)

results <- compare_model_predictions(
  input = list(gs1=c('CD4','CD3D'), gs2='CD14'),
  tissue_name = 'PBMC',
  models = names(api_keys),  # only compare models that have a key above
  api_keys = api_keys
)

## End(Not run)

Set global logger configuration

Description

Set global logger configuration

Usage

configure_logger(level = "INFO", console_output = TRUE, json_format = TRUE)

Arguments

level

Logging level: "DEBUG", "INFO", "WARN", or "ERROR". Default: "INFO"

console_output

Whether to enable console output. Default: TRUE

json_format

Whether to use JSON format for log messages. Default: TRUE

Value

Invisible logger object

Create prompt for cell type annotation

Description

Create prompt for cell type annotation

Usage

create_annotation_prompt(input, tissue_name, top_gene_count = 10)

Arguments

input

Either a data frame from Seurat's FindAllMarkers() or a list with one element per cluster, where each element is either a character vector of genes or a list containing a genes field. Cluster IDs in named inputs are preserved as-is; unnamed list input receives sequential IDs starting at "0".

tissue_name

Tissue context for the annotation (e.g., 'human PBMC', 'mouse brain')

top_gene_count

Number of top genes to use per cluster when input is from Seurat. Default: 10

Value

Character string containing the formatted prompt
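
As a rough illustration of what top_gene_count controls, the sketch below picks the highest-ranked genes per cluster from a FindAllMarkers()-style table. Ranking by avg_log2FC is an assumption here, and top_genes_per_cluster is a hypothetical helper, not a package function.

```r
# Toy marker table with the columns named in this manual.
markers <- data.frame(
  cluster = c("0", "0", "0", "1", "1"),
  gene = c("CD3D", "CD4", "IL7R", "CD14", "LYZ"),
  avg_log2FC = c(2.1, 1.4, 1.8, 3.0, 2.5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the n genes with the largest avg_log2FC per cluster.
# (Assumption: the package may rank or filter genes differently.)
top_genes_per_cluster <- function(markers, n = 10) {
  by_cluster <- split(markers, markers$cluster)
  lapply(by_cluster, function(df) {
    df <- df[order(df$avg_log2FC, decreasing = TRUE), ]
    head(df$gene, n)
  })
}

top2 <- top_genes_per_cluster(markers, n = 2)
```

The result is a named list (cluster ID to genes), which has the same shape as the list form of input accepted by the function.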

Create prompt for checking consensus among model predictions

Description

Create prompt for checking consensus among model predictions

Usage

create_consensus_check_prompt(
  round_responses,
  controversy_threshold = 2/3,
  entropy_threshold = 1
)

Create prompt for additional discussion rounds

Description

Create prompt for additional discussion rounds

Usage

create_discussion_prompt(
  cluster_id,
  cluster_genes,
  tissue_name,
  previous_rounds,
  round_number
)

Create prompt for the initial round of discussion

Description

Create prompt for the initial round of discussion

Usage

create_initial_discussion_prompt(
  cluster_id,
  cluster_genes,
  tissue_name,
  initial_predictions
)

Create prompt for standardizing cell type names

Description

Create prompt for standardizing cell type names

Usage

create_standardization_prompt(all_cell_types)

Custom model manager for mLLMCelltype

Description

This module provides functionality to register and manage custom LLM providers and models. It allows users to integrate their own LLM services with the mLLMCelltype framework.

Usage

custom_providers

Format

An object of class environment of length 0.

Execute consensus check with retry logic

Description

Execute consensus check with retry logic

Usage

execute_consensus_check(
  formatted_responses,
  api_keys,
  models_to_try,
  base_urls = NULL
)

Extract numeric value from line containing a label

Description

Extract numeric value from line containing a label

Usage

extract_labeled_value(lines, pattern, value_pattern)

Facilitate discussion for a controversial cluster

Description

Facilitate discussion for a controversial cluster

Usage

facilitate_cluster_discussion(
  cluster_id,
  input,
  tissue_name,
  models,
  api_keys,
  initial_predictions,
  top_gene_count,
  max_rounds = 3,
  controversy_threshold = 0.7,
  entropy_threshold = 1,
  consensus_check_model = NULL,
  base_urls = NULL
)

Note

This function uses create_initial_discussion_prompt and create_discussion_prompt from prompt_templates.R

Filter out error responses from model round responses

Description

Filter out error responses from model round responses

Usage

filter_valid_responses(responses, cluster_id, round = NULL)

Find majority prediction from response lines

Description

Find majority prediction from response lines

Usage

find_majority_prediction(lines)

Utility functions for API key management

Description

This file contains utility functions for managing API keys and related operations. get_api_key() returns the API key for a specific model.

Usage

get_api_key(model, api_keys)

Arguments

model

Model name to get API key for

api_keys

Named list of API keys with provider or model names as keys

Details

This function retrieves the appropriate API key for a given model by first checking the provider name and then the model name in the provided API keys list.

Value

API key string for the specified model
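
The lookup order described in Details (provider name first, then model name) can be sketched as follows. lookup_api_key is a hypothetical stand-in written for illustration; the provider is passed in explicitly, and returning NULL when nothing matches is an assumption about the failure case.

```r
# Hypothetical helper, not the package's get_api_key().
lookup_api_key <- function(model, provider, api_keys) {
  if (!is.null(api_keys[[provider]])) {
    return(api_keys[[provider]])   # 1. provider-level key
  }
  if (!is.null(api_keys[[model]])) {
    return(api_keys[[model]])      # 2. model-level key
  }
  NULL                             # assumption: no key configured
}

# Placeholder keys for illustration only.
keys <- list(
  "openai" = "sk-openai-placeholder",
  "claude-sonnet-4-5-20250929" = "sk-ant-placeholder"
)

openai_key <- lookup_api_key("gpt-5", "openai", keys)
claude_key <- lookup_api_key("claude-sonnet-4-5-20250929", "anthropic", keys)
```

Because the provider entry is checked first, a model-level key only takes effect when no key is stored under that model's provider name.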

Get initial predictions from all models

Description

This function retrieves initial cell type predictions from all specified models. It is an internal helper function used by the interactive_consensus_annotation function.

Usage

get_initial_predictions(
  input,
  tissue_name,
  models,
  api_keys,
  top_gene_count,
  base_urls = NULL
)

Get the global logger instance

Description

Get the global logger instance

Usage

get_logger()

Get response from a specific model

Description

Get response from a specific model

Usage

get_model_response(prompt, model, api_key, base_urls = NULL)

Determine provider from model name

Description

This function determines the appropriate provider (e.g., OpenAI, Anthropic, Google, OpenRouter) based on the model name. Uses prefix-based matching for efficient and maintainable provider detection. New models following existing naming conventions are automatically supported.

Usage

get_provider(model)

Arguments

model

Character string specifying the model name (e.g., "gpt-5.2", "claude-sonnet-4.5").

Details

Supported providers and model prefixes:

OpenAI: gpt-*, o1*, o3*, o4*, chatgpt-*, codex-* (e.g., 'gpt-5.2', 'o3-pro', 'o4-mini')
Anthropic: claude-* (e.g., 'claude-opus-4.6', 'claude-sonnet-4.5')
DeepSeek: deepseek-* (e.g., 'deepseek-chat', 'deepseek-r1')
Google: gemini-* (e.g., 'gemini-3-pro', 'gemini-2.5-flash')
Qwen: qwen*, qwq-* (e.g., 'qwen3-max', 'qwq-32b')
Stepfun: step-* (e.g., 'step-2-mini', 'step-2-16k')
Zhipu: glm-*, chatglm* (e.g., 'glm-4.7', 'glm-4-plus')
MiniMax: minimax-* (e.g., 'minimax-m2.1', 'minimax-m1')
Grok: grok-* (e.g., 'grok-4', 'grok-4-heavy')
OpenRouter: Any model with '/' in the name (e.g., 'openai/gpt-5.2', 'anthropic/claude-sonnet-4.5')

Value

Character string of the provider name (e.g., "openai", "anthropic").
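
A minimal sketch of prefix-based detection, built only from the prefix list in Details. guess_provider is hypothetical; the provider labels other than "openai" and "anthropic", the lower-casing of the model name, and the NA fallback are all assumptions rather than documented behaviour.

```r
# Hypothetical re-implementation for illustration; not the package's get_provider().
guess_provider <- function(model) {
  # Routed names such as 'openai/gpt-5.2' are checked first, so a name whose
  # provider part also matches a prefix still resolves to OpenRouter.
  if (grepl("/", model, fixed = TRUE)) return("openrouter")

  prefixes <- list(
    openai    = c("gpt-", "o1", "o3", "o4", "chatgpt-", "codex-"),
    anthropic = "claude-",
    deepseek  = "deepseek-",
    gemini    = "gemini-",          # assumed label
    qwen      = c("qwen", "qwq-"),
    stepfun   = "step-",
    zhipu     = c("glm-", "chatglm"),
    minimax   = "minimax-",
    grok      = "grok-"
  )

  m <- tolower(model)  # assumption: matching is case-insensitive
  for (provider in names(prefixes)) {
    if (any(startsWith(m, prefixes[[provider]]))) return(provider)
  }
  NA_character_        # assumption: unknown models yield NA
}

providers <- vapply(
  c("gpt-5.2", "claude-sonnet-4.5", "qwq-32b", "openai/gpt-5.2"),
  guess_provider,
  character(1)
)
```

Keeping the prefixes in a single table is what lets new models that follow an existing naming convention resolve without further changes.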

Identify controversial clusters based on consensus analysis

Description

Identify controversial clusters based on consensus analysis

Usage

identify_controversial_clusters(
  input,
  individual_predictions,
  controversy_threshold,
  entropy_threshold,
  api_keys,
  consensus_check_model = NULL,
  base_urls = NULL
)

Reinitialize global logger with a specific directory

Description

Preserves the current logger configuration (level, size, retention, console/json) while changing the log directory for a new annotation session.

Usage

initialize_logger(log_dir = "logs")

Arguments

log_dir

Directory for log files

Value

Invisible logger object

Interactive consensus building for cell type annotation

Description

This function implements an interactive voting and discussion mechanism where multiple LLMs collaborate to reach a consensus on cell type annotations, particularly focusing on clusters with low agreement. The process includes:

1. Initial voting by all LLMs
2. Identification of controversial clusters
3. Detailed discussion for controversial clusters
4. Final summary by a designated LLM (default: Claude)

Usage

interactive_consensus_annotation(
  input,
  tissue_name,
  models = c("claude-opus-4-6-20260205", "gpt-5.2", "gemini-3-pro", "deepseek-r1",
    "grok-4.1"),
  api_keys,
  top_gene_count = 10,
  controversy_threshold = 0.7,
  entropy_threshold = 1,
  max_discussion_rounds = 3,
  consensus_check_model = NULL,
  log_dir = "logs",
  cache_dir = NULL,
  use_cache = TRUE,
  base_urls = NULL,
  clusters_to_analyze = NULL,
  force_rerun = FALSE
)

Arguments

input

Either a data frame from Seurat's FindAllMarkers() function containing differential gene expression results (must have columns: 'cluster', 'gene', and 'avg_log2FC'), or a list where each element is either a character vector of genes or a list containing a genes field.

tissue_name

Character string specifying the tissue type for context-aware cell type annotation (e.g., 'human PBMC', 'mouse brain'). Required.

models

Character vector of model names to use for consensus annotation. Minimum 2 models required. Supports models from OpenAI, Anthropic, DeepSeek, Google, Alibaba, Stepfun, Zhipu, MiniMax, X.AI, and OpenRouter.

api_keys

Named, non-empty list of API keys. Can use provider names as keys (e.g., "openai", "anthropic") or model names as keys (e.g., "gpt-5").

top_gene_count

Integer specifying the number of top marker genes to use for annotation per cluster (default: 10).

controversy_threshold

Numeric value between 0 and 1 for consensus proportion threshold. Clusters below this threshold are considered controversial (default: 0.7).

entropy_threshold

Numeric value for entropy threshold. Higher entropy indicates more disagreement among models (default: 1.0).

max_discussion_rounds

Integer specifying maximum number of discussion rounds for controversial clusters (default: 3).

consensus_check_model

Character string specifying which model to use for consensus checking. If NULL, uses the first model from the models list.

log_dir

Character scalar specifying directory for log files (default: "logs"). This function reinitializes the session logger with this directory at the start of each call.

cache_dir

Character string or NULL. Cache directory for storing results. NULL uses system cache, "local" uses current directory, "temp" uses temporary directory, or specify custom path.

use_cache

Logical indicating whether to use caching (default: TRUE).

base_urls

Named list or character string specifying custom API base URLs. Useful for proxies or alternative endpoints. If NULL, uses official endpoints.

clusters_to_analyze

Character or numeric vector specifying which clusters to analyze. If NULL (default), all clusters are analyzed.

force_rerun

Logical indicating whether to force rerun of all specified clusters, ignoring cache. Only affects controversial cluster discussions (default: FALSE).

Value

A list containing:

initial_results: Initial voting results, consensus checks, and controversial cluster IDs
final_annotations: Final annotations keyed by cluster ID
controversial_clusters: Clusters identified as controversial
discussion_logs: Detailed discussion logs for controversial clusters
session_id: Logger session identifier
voting_results: Backward-compatible alias of initial_results
discussion_results: Backward-compatible alias of discussion_logs
final_consensus: Backward-compatible alias of final_annotations
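
To make controversy_threshold and entropy_threshold more concrete, the sketch below computes a consensus proportion and a Shannon entropy for one cluster's predictions. The exact formulas used by the package, and how it combines the two thresholds, are not stated in this manual; agreement_metrics and the either-threshold rule are assumptions for illustration.

```r
# Hypothetical helper: summarise agreement among model predictions for one cluster.
agreement_metrics <- function(predictions) {
  counts <- table(predictions)
  p <- as.numeric(counts) / length(predictions)
  list(
    consensus_proportion = max(p),    # share of models backing the top label
    entropy = -sum(p * log2(p))       # Shannon entropy in bits
  )
}

# Five models: three agree, two dissent with different labels.
m <- agreement_metrics(c("T cell", "T cell", "T cell", "NK cell", "B cell"))

# Assumed rule: flag the cluster if either metric crosses its default threshold.
controversial <- m$consensus_proportion < 0.7 || m$entropy > 1
```

With unanimous predictions the proportion is 1 and the entropy is 0, so such a cluster would not be flagged under this rule.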

Get list of registered custom models

Description

Get list of registered custom models

Usage

list_custom_models()

Get list of registered custom providers

Description

Get list of registered custom providers

Usage

list_custom_providers()

Convenience functions for logging

Description

Convenience functions for logging

Usage

log_debug(message, context = NULL)

log_info(message, context = NULL)

log_warn(message, context = NULL)

log_error(message, context = NULL)

Arguments

message

Log message string

context

Optional context information (list or character)

Value

Invisible NULL

Get mLLMCelltype cache location

Description

Display the cache directory location

Usage

mllmcelltype_cache_dir(cache_dir = NULL)

Arguments

cache_dir

Cache directory specification. NULL uses the system default, "local" uses the current directory, "temp" uses a temporary directory, and any other value is treated as a custom path.

Value

Invisible cache directory path

Examples

## Not run: 
mllmcelltype_cache_dir()
mllmcelltype_cache_dir("local")

## End(Not run)

Clear mLLMCelltype cache

Description

Clear the mLLMCelltype cache

Usage

mllmcelltype_clear_cache(cache_dir = NULL)

Arguments

cache_dir

Cache directory specification. NULL uses the system default, "local" uses the current directory, "temp" uses a temporary directory, and any other value is treated as a custom path.

Value

Invisible NULL

Examples

## Not run: 
mllmcelltype_clear_cache()
mllmcelltype_clear_cache("local")

## End(Not run)

Normalize annotation for comparison

Description

Normalize annotation for comparison

Usage

normalize_annotation(annotation)

Prompt templates for mLLMCelltype

Description

This file contains all prompt template functions used in mLLMCelltype. These functions create various prompts for different stages of the cell type annotation process. normalize_cluster_gene_list() normalizes list input into a canonical cluster->genes mapping.

Usage

normalize_cluster_gene_list(input)

Arguments

input

List input for cluster annotation

Details

For list input, each element can be either:

a list containing a genes field, or
a character vector of genes.

Naming rules:

unnamed lists are assigned 0-based IDs ("0", "1", ...)
numeric names are preserved as-is (e.g., "1", "2", "3" stays unchanged)
non-numeric names are preserved as-is

Value

Named list of character vectors (cluster_id -> genes)
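
The naming rules above can be sketched in a few lines of base R. normalize_clusters_sketch is a hypothetical illustration, not the package function, and it does not cover partially named lists, which this manual does not describe.

```r
# Hypothetical sketch of the documented naming rules.
normalize_clusters_sketch <- function(input) {
  stopifnot(is.list(input))

  # Accept either list(genes = ...) or a bare character vector per cluster.
  genes <- lapply(input, function(el) {
    if (is.list(el)) as.character(el$genes) else as.character(el)
  })

  # Unnamed input gets 0-based IDs; existing names are kept untouched.
  if (is.null(names(genes))) {
    names(genes) <- as.character(seq_along(genes) - 1L)
  }
  genes
}

unnamed <- normalize_clusters_sketch(list(c("CD4", "CD3D"), list(genes = "CD14")))
named   <- normalize_clusters_sketch(list("1" = c("CD4", "CD3D"), "2" = "CD14"))
```

Here unnamed is keyed by "0" and "1", while named keeps its "1" and "2" keys, matching the rule that numeric names are not shifted.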

Parse consensus response from model

Description

Parse consensus response from model

Usage

parse_consensus_response(response)

Parse flexible format consensus response

Description

Parse flexible format consensus response

Usage

parse_flexible_format(lines)

Parse standard 4-line consensus response format

Description

Parse standard 4-line consensus response format

Usage

parse_standard_format(result_lines)

Parse text-format model predictions into a named list

Description

Handles multiple output formats from LLMs:

"cluster_id: cell_type" format
"1. cell_type" numeric index format
Positional fallback (line index maps to cluster index)

Usage

parse_text_predictions(model_preds, all_clusters = NULL)

Arguments

model_preds

Character vector of prediction lines from a model

all_clusters

Optional character vector of cluster IDs for positional fallback

Value

Named list mapping cluster_id -> cell_type
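
The three formats listed in the Description can be handled roughly as below. This is a sketch under stated assumptions, not the package's parser: parse_predictions_sketch is hypothetical, and it treats the number in "1. cell_type" as the cluster ID itself, which may differ from how the package maps numeric indices.

```r
# Hypothetical parser illustrating the three documented formats.
parse_predictions_sketch <- function(lines, all_clusters = NULL) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]
  out <- list()

  for (i in seq_along(lines)) {
    line <- lines[i]
    if (grepl("^[^:]+:\\s*\\S", line)) {
      # "cluster_id: cell_type"
      id <- trimws(sub(":.*$", "", line))
      type <- trimws(sub("^[^:]+:", "", line))
    } else if (grepl("^[0-9]+\\.\\s+\\S", line)) {
      # "1. cell_type" (assumption: the number is used as the cluster ID)
      id <- sub("^([0-9]+)\\..*$", "\\1", line)
      type <- trimws(sub("^[0-9]+\\.\\s+", "", line))
    } else if (!is.null(all_clusters) && i <= length(all_clusters)) {
      # Positional fallback: the i-th line belongs to the i-th cluster
      id <- all_clusters[i]
      type <- line
    } else {
      next  # nothing usable on this line
    }
    out[[id]] <- type
  }
  out
}

parsed <- parse_predictions_sketch(
  c("0: T cells", "1. Monocytes", "B cells"),
  all_clusters = c("0", "1", "2")
)
```

The positional branch is only reached when a line matches neither labelled format, which is why all_clusters is optional.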

Prepare list of models to try for consensus checking

Description

Prepare list of models to try for consensus checking

Usage

prepare_models_list(consensus_check_model = NULL)

Print summary of consensus results

Description

This function prints a detailed summary of the consensus building process, including initial predictions from all models, uncertainty metrics, and final consensus for each controversial cluster.

Usage

print_consensus_summary(results)

Details

The results argument is expected to be a list with the following elements:

initial_results: A list containing individual_predictions, consensus_results, and controversial_clusters
final_annotations: A list of final cell type annotations for each cluster
controversial_clusters: A character vector of cluster IDs that were controversial
discussion_logs: A list of discussion logs for each controversial cluster

Process request using Anthropic models

Description

Process request using Anthropic models

Usage

process_anthropic(prompt, model, api_key, base_url = NULL)

Process controversial clusters through discussion

Description

Process controversial clusters through discussion

Usage

process_controversial_clusters(
  controversial_clusters,
  input,
  tissue_name,
  successful_models,
  api_keys,
  individual_predictions,
  top_gene_count,
  controversy_threshold,
  entropy_threshold,
  max_discussion_rounds,
  cache_manager,
  use_cache,
  consensus_check_model = NULL,
  force_rerun = FALSE,
  base_urls = NULL
)

Process request using custom provider

Description

Process request using custom provider

Usage

process_custom(prompt, model, api_key)

Process request using DeepSeek models

Description

Process request using DeepSeek models

Usage

process_deepseek(prompt, model, api_key, base_url = NULL)

Process request using Gemini models

Description

Process request using Gemini models

Usage

process_gemini(prompt, model, api_key, base_url = NULL)

Process request using Grok models

Description

Process request using Grok models

Usage

process_grok(prompt, model, api_key, base_url = NULL)

Process request using MiniMax models

Description

Process request using MiniMax models

Usage

process_minimax(prompt, model, api_key, base_url = NULL)

Process request using OpenAI models

Description

Process request using OpenAI models

Usage

process_openai(prompt, model, api_key, base_url = NULL)

Process request using OpenRouter models

Description

Process request using OpenRouter models

Usage

process_openrouter(prompt, model, api_key, base_url = NULL)

Process request using Qwen models

Description

Process request using Qwen models

Usage

process_qwen(prompt, model, api_key, base_url = NULL)

Process request using StepFun models

Description

Process request using StepFun models

Usage

process_stepfun(prompt, model, api_key, base_url = NULL)

Process request using Zhipu models

Description

Process request using Zhipu models

Usage

process_zhipu(prompt, model, api_key, base_url = NULL)

Register a custom model for a provider

Description

Usage

register_custom_model(model_name, provider_name, model_config = list())

Arguments

model_name

Unique name for the custom model

provider_name

Name of the provider this model belongs to

model_config

List of configuration parameters for the model (e.g., temperature, max_tokens)

Value

Invisible TRUE on success

Examples

## Not run: 
register_custom_model(
  model_name = "my_model",
  provider_name = "my_provider",
  model_config = list(
    temperature = 0.7,
    max_tokens = 2000
  )
)

## End(Not run)

Register a custom LLM provider

Description

Usage

register_custom_provider(provider_name, process_fn, description = NULL)

Arguments

provider_name

Unique name for the custom provider

process_fn

Function that processes LLM requests. Must accept parameters: prompt, model, api_key

description

Optional description of the provider

Value

Invisible NULL

Examples

## Not run: 
register_custom_provider(
  provider_name = "my_provider",
  process_fn = function(prompt, model, api_key) {
    # Custom implementation
    response <- httr::POST(
      url = "your_api_endpoint",
      body = list(prompt = prompt),
      encode = "json"
    )
    return(httr::content(response)$choices[[1]]$text)
  }
)

## End(Not run)
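
Before registering a provider, it can help to test process_fn offline. The sketch below defines a stub that makes no HTTP request and a small signature check; echo_process_fn and has_required_args are hypothetical names introduced here, not package functions.

```r
# Offline stand-in for a provider: returns a canned answer, no network access.
echo_process_fn <- function(prompt, model, api_key) {
  paste0("[", model, "] received ", nchar(prompt), " characters")
}

# Hypothetical pre-flight check that a function accepts the three
# parameters required of process_fn.
has_required_args <- function(fn) {
  all(c("prompt", "model", "api_key") %in% names(formals(fn)))
}

ok <- has_required_args(echo_process_fn)
reply <- echo_process_fn("CD3D, CD4, IL7R", "my_model", "unused-key")
```

A stub like this can be passed as process_fn while wiring up a pipeline, then swapped for a function that calls the real endpoint.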

URL Utilities for Base URL Resolution

Description

This file contains utility functions for resolving custom base URLs for different API providers. resolve_provider_base_url() resolves the provider-specific base URL.

Usage

resolve_provider_base_url(provider, base_urls)

Arguments

provider

Provider name (e.g., "openai", "anthropic")

base_urls

User-provided base URLs: NULL, a single string, or a named list

Details

This is the single entry point for all base URL resolution. It resolves the appropriate URL and normalizes it (strips trailing slashes).

Value

Resolved and normalized base URL, or NULL if not specified
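
The three accepted shapes of base_urls and the trailing-slash normalization can be sketched as follows. resolve_base_url_sketch is hypothetical; treating a single string as applying to every provider is an assumption, and the URLs are placeholders.

```r
# Hypothetical sketch, not the package's resolve_provider_base_url().
resolve_base_url_sketch <- function(provider, base_urls) {
  if (is.null(base_urls)) return(NULL)

  # Named list: look up this provider. Single string: assumed to apply to all.
  url <- if (is.list(base_urls)) base_urls[[provider]] else base_urls
  if (is.null(url) || !nzchar(url)) return(NULL)

  sub("/+$", "", url)  # strip any trailing slashes
}

from_string <- resolve_base_url_sketch("openai", "https://proxy.example.com/v1/")
from_list   <- resolve_base_url_sketch(
  "openai",
  list(openai = "https://gateway.example.com//")
)
```

Normalizing in one place means callers can append request paths without worrying about doubled slashes.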

Select the best prediction from consensus results

Description

Select the best prediction from consensus results

Usage

select_best_prediction(consensus_result, valid_predictions)

Standardize cell type names using a language model

Description

This function takes predictions from multiple models and standardizes the cell type nomenclature to ensure consistent naming across different models' outputs.

Usage

standardize_cell_type_names(
  predictions,
  models,
  api_keys,
  standardization_model = "claude-sonnet-4-20250514",
  base_urls = NULL
)

Details

API keys can be supplied in either of two forms:

With provider names as keys: list("openai" = "sk-...", "anthropic" = "sk-ant-...", "openrouter" = "sk-or-...")
With model names as keys: list("gpt-5" = "sk-...", "claude-sonnet-4-5-20250929" = "sk-ant-...")
