Type: Package
Title: Native R Implementation of an Efficient BLAST-Like Algorithm
Version: 1.0.7
Date: 2023-08-22
Maintainer: Manu Tamminen <mavatam@utu.fi>
Description: Implementation of an efficient BLAST-like sequence comparison algorithm, written in 'C++11' and using native R datatypes. Blaster is based on 'nsearch' - Schmid et al (2018) <doi:10.1101/399782>.
License: BSD_3_clause + file LICENSE
Imports: Rcpp (≥ 1.0.5)
LinkingTo: Rcpp
SystemRequirements: C++
RoxygenNote: 7.2.3
URL: https://github.com/tamminenlab/blaster
NeedsCompilation: yes
Packaged: 2023-08-22 13:12:19 UTC; mavatam
Author: Manu Tamminen [aut, cre], Timothy Julian [aut], Aditya Jeevennavar [aut], Steven Schmid [aut]
Repository: CRAN
Date/Publication: 2023-08-22 14:40:09 UTC

blaster: Native R Implementation of an Efficient BLAST-Like Algorithm

Description

Implementation of an efficient BLAST-like sequence comparison algorithm, written in 'C++11' and using native R datatypes. Blaster is based on 'nsearch' - Schmid et al (2018) doi:10.1101/399782.

Author(s)

Maintainer: Manu Tamminen mavatam@utu.fi (ORCID)

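Authors:

Timothy Julian (ORCID)
Aditya Jeevennavar (ORCID)
Steven Schmid

See Also

Useful links:

https://github.com/tamminenlab/blaster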

Runs BLAST sequence comparison algorithm.

Description

Runs a BLAST-like sequence comparison of the query sequences against the database sequences.

Usage

blast(
  query,
  db,
  maxAccepts = 1,
  maxRejects = 16,
  minIdentity = 0.75,
  alphabet = "nucleotide",
  strand = "both",
  output_to_file = FALSE
)

Arguments

query

A dataframe of the query sequences (containing Id and Seq columns) or a string specifying the FASTA file of the query sequences.

db

A dataframe of the database sequences (containing Id and Seq columns) or a string specifying the FASTA file of the database sequences.

maxAccepts

A number specifying the maximum number of accepted hits. Defaults to 1.

maxRejects

A number specifying the maximum number of rejected hits. Defaults to 16.

minIdentity

A number specifying the minimum accepted sequence identity between the query and hit sequences. Defaults to 0.75.

alphabet

A string specifying the query and database alphabet: 'nucleotide' or 'protein'. Defaults to 'nucleotide'.

strand

A string specifying the strand to search: 'plus', 'minus' or 'both'. Defaults to 'both'. Only affects nucleotide searches.

output_to_file

A boolean specifying the output type. If TRUE, the results are written to a temporary file and a string containing the file name and location is returned. Otherwise, a dataframe of the results is returned. Defaults to FALSE.

Value

A dataframe or a string. A dataframe is returned by default, containing the BLAST output in columns QueryId, TargetId, QueryMatchStart, QueryMatchEnd, TargetMatchStart, TargetMatchEnd, QueryMatchSeq, TargetMatchSeq, NumColumns, NumMatches, NumMismatches, NumGaps, Identity and Alignment. A string is returned if 'output_to_file' is set to TRUE. This string points to the file containing the output table.
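
As a minimal sketch of working with the returned dataframe, the base R code below filters hits by Identity and keeps the best hit per query. The table is a hypothetical stand-in that carries only a few of the documented columns, and its values are made up for illustration. The sketch also assumes that Identity is reported on the same 0 to 1 scale as 'minIdentity'.

```r
# Hypothetical toy results with a subset of the documented columns;
# the values are invented for illustration, not real blast() output.
hits <- data.frame(
  QueryId    = c("q1", "q1", "q2", "q3"),
  TargetId   = c("t1", "t2", "t2", "t3"),
  NumColumns = c(100, 100, 100, 100),
  NumMatches = c(95, 80, 99, 76),
  Identity   = c(0.95, 0.80, 0.99, 0.76),
  stringsAsFactors = FALSE
)

# Keep only hits at or above a stricter identity threshold
# (assumes Identity is a fraction between 0 and 1).
strong <- hits[hits$Identity >= 0.9, ]

# Keep the highest-identity hit for each query.
ordered <- hits[order(hits$QueryId, -hits$Identity), ]
best <- ordered[!duplicated(ordered$QueryId), ]
```

The same subsetting applies to a table read back from the output file when 'output_to_file' is TRUE, once it has been loaded into a dataframe.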

Examples


# Search using the paths of the bundled FASTA files
query <- system.file("extdata", "query.fasta", package = "blaster")
db <- system.file("extdata", "db.fasta", package = "blaster")

blast_table <- blast(query = query, db = db)

# Search using dataframes read with read_fasta()
query <- read_fasta(filename = query)
db <- read_fasta(filename = db)
blast_table <- blast(query = query, db = db)

# Protein search of a FASTA file against itself
prot <- system.file("extdata", "prot.fasta", package = "blaster")
prot_blast_table <- blast(query = prot, db = prot, alphabet = "protein")
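
The following sketch shows one way the query and database could be supplied as dataframes built in memory, with Id and Seq columns as described under Arguments. The sequences and identifiers are made up for illustration, and the parameter values are arbitrary choices rather than recommendations. The calls are guarded so that the code still runs when the package is not installed.

```r
# Made-up database sequences, for illustration only.
db <- data.frame(
  Id = c("target1", "target2"),
  Seq = c(
    "ATGGCGTACGTTAGCCTAGGATCCGATTACGCTAGCTAGGCTTAACGGATCCGTTAGCAT",
    "TTGACCGGTATCGATCGGCTAAGCTTGGCCATAGCGTATCGGATTCAGGCTAACGTTGCA"
  ),
  stringsAsFactors = FALSE
)

# The query is a 40-character substring of the first database sequence.
query <- data.frame(
  Id = "query1",
  Seq = substr(db$Seq[1], 11, 50),
  stringsAsFactors = FALSE
)

hits <- NULL
hits_file <- NULL
if (requireNamespace("blaster", quietly = TRUE)) {
  # Plus strand only, up to two accepted hits, stricter identity threshold.
  hits <- blaster::blast(
    query = query, db = db,
    maxAccepts = 2, minIdentity = 0.9, strand = "plus"
  )
  # Same inputs, but return the path of a results file instead of a dataframe.
  hits_file <- blaster::blast(query = query, db = db, output_to_file = TRUE)
}
```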


Reads the contents of a nucleotide or protein FASTA file into a dataframe.

Description

Reads the contents of a nucleotide or protein FASTA file into a dataframe.

Usage

read_fasta(
  filename,
  filter = "",
  non_standard_chars = "error",
  alphabet = "nucleotide"
)

Arguments

filename

A string specifying the name of the FASTA file to be imported.

filter

An optional string specifying a sequence motif for filtering. Only sequences containing this motif are kept. Each matched sequence is also split, and the split parts are returned in two additional columns (Part1 and Part2).

non_standard_chars

A string specifying how to handle non-standard nucleotide or amino acid characters. Options are 'remove', 'ignore' or 'error'. Defaults to 'error'.

alphabet

A string specifying the alphabet of the sequences in the FASTA file: 'nucleotide' or 'protein'. Defaults to 'nucleotide'.

Value

A dataframe containing FASTA ids (Id column) and sequences (Seq column). If 'filter' is specified, the split sequences are stored in additional columns Part1 and Part2.

Examples


query <- system.file("extdata", "query.fasta", package = "blaster")

query <- read_fasta(filename = query)
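
As a further sketch, the code below writes a small FASTA file to a temporary location and reads it back, once without and once with the 'filter' argument. The sequences and the motif GGATCC are made up for illustration. The read_fasta() calls are guarded so that the code still runs when the package is not installed.

```r
# Write a small FASTA file to a temporary location (made-up sequences).
fasta_path <- tempfile(fileext = ".fasta")
writeLines(
  c(
    ">seq1", "ACGTTGCAACGTTGCAGGATCCACGTTGCAACGTTGCA",
    ">seq2", "TTGACCAGTTGACCAGTTGACCAGTTGACCAGTTGACC"
  ),
  fasta_path
)

seqs <- NULL
with_motif <- NULL
if (requireNamespace("blaster", quietly = TRUE)) {
  # Read all sequences into a dataframe with Id and Seq columns.
  seqs <- blaster::read_fasta(filename = fasta_path)
  # Keep only sequences containing the motif; per the Value section,
  # the split parts are stored in the Part1 and Part2 columns.
  with_motif <- blaster::read_fasta(filename = fasta_path, filter = "GGATCC")
}
```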