Type: | Package |
Title: | Native R Implementation of an Efficient BLAST-Like Algorithm |
Version: | 1.0.7 |
Date: | 2023-08-22 |
Maintainer: | Manu Tamminen <mavatam@utu.fi> |
Description: | Implementation of an efficient BLAST-like sequence comparison algorithm, written in 'C++11' and using native R datatypes. Blaster is based on 'nsearch' - Schmid et al (2018) <doi:10.1101/399782>. |
License: | BSD_3_clause + file LICENSE |
Imports: | Rcpp (≥ 1.0.5) |
LinkingTo: | Rcpp |
SystemRequirements: | C++ |
RoxygenNote: | 7.2.3 |
URL: | https://github.com/tamminenlab/blaster |
NeedsCompilation: | yes |
Packaged: | 2023-08-22 13:12:19 UTC; mavatam |
Author: | Manu Tamminen |
Repository: | CRAN |
Date/Publication: | 2023-08-22 14:40:09 UTC |
blaster: Native R Implementation of an Efficient BLAST-Like Algorithm
Description
Implementation of an efficient BLAST-like sequence comparison algorithm, written in 'C++11' and using native R datatypes. Blaster is based on 'nsearch' - Schmid et al (2018) doi:10.1101/399782.
Author(s)
Maintainer: Manu Tamminen mavatam@utu.fi (ORCID)
Authors:
Timothy Julian tim.julian@eawag.ch (ORCID)
Aditya Jeevennavar aditya.a.jeevannavar@utu.fi (ORCID)
Steven Schmid stevschmid@gmail.com
See Also
Useful links: https://github.com/tamminenlab/blaster
Runs the BLAST-like sequence comparison algorithm.
Description
Runs the BLAST-like sequence comparison algorithm, comparing the query sequences against the database sequences.
Usage
blast(
query,
db,
maxAccepts = 1,
maxRejects = 16,
minIdentity = 0.75,
alphabet = "nucleotide",
strand = "both",
output_to_file = FALSE
)
Arguments
query |
A dataframe of the query sequences (containing Id and Seq columns) or a string specifying the FASTA file of the query sequences. |
db |
A dataframe of the database sequences (containing Id and Seq columns) or a string specifying the FASTA file of the database sequences. |
maxAccepts |
A number specifying the maximum number of accepted hits. Defaults to 1. |
maxRejects |
A number specifying the maximum number of rejected hits. Defaults to 16. |
minIdentity |
A number specifying the minimum sequence identity between the query and hit sequences that is required for a hit to be accepted. Defaults to 0.75. |
alphabet |
A string specifying the query and database alphabet: 'nucleotide' or 'protein'. Defaults to 'nucleotide'. |
strand |
A string specifying the strand to search: 'plus', 'minus' or 'both'. Defaults to 'both'. Only affects nucleotide searches. |
output_to_file |
A boolean specifying the output type. If TRUE, the results are written into a temporary file and a string containing the file name and location is returned. Otherwise, a dataframe of the results is returned. Defaults to FALSE. |
Value
A dataframe or a string. A dataframe is returned by default, containing the BLAST output in columns QueryId, TargetId, QueryMatchStart, QueryMatchEnd, TargetMatchStart, TargetMatchEnd, QueryMatchSeq, TargetMatchSeq, NumColumns, NumMatches, NumMismatches, NumGaps, Identity and Alignment. A string is returned if 'output_to_file' is set to TRUE. This string points to the file containing the output table.
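As a rough illustration of working with the returned dataframe, the sketch below filters hits by identity and keeps the best hit per query. The table is a hand-made stand-in with invented values and only a few of the documented columns, not real blast() output. It also assumes that Identity is on the same 0 to 1 scale as minIdentity, which is not stated explicitly above.

```r
# Hypothetical stand-in for a blast() result; all values are invented.
hits <- data.frame(
  QueryId    = c("q1", "q1", "q2"),
  TargetId   = c("t1", "t2", "t3"),
  NumColumns = c(100L, 100L, 80L),
  NumMatches = c(98L, 91L, 80L),
  Identity   = c(0.98, 0.91, 1.00),
  stringsAsFactors = FALSE
)

# Keep only strong hits (assumes Identity is a fraction between 0 and 1).
strong <- hits[hits$Identity >= 0.95, ]

# For each query, keep the row with the highest Identity:
# sort by query and descending identity, then drop later rows per query.
ord  <- order(hits$QueryId, -hits$Identity)
best <- hits[ord, ]
best <- best[!duplicated(best$QueryId), ]
```

The same subsetting applies unchanged to a real result table, since it relies only on the QueryId and Identity columns listed above.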
Examples
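# Paths to the example FASTA files shipped with the package
query <- system.file("extdata", "query.fasta", package = "blaster")
db <- system.file("extdata", "db.fasta", package = "blaster")
# Run the search directly on the FASTA files
blast_table <- blast(query = query, db = db)
# Alternatively, read the FASTA files into dataframes first
query <- read_fasta(filename = query)
db <- read_fasta(filename = db)
blast_table <- blast(query = query, db = db)
# Protein search: compare the example protein sequences against themselves
prot <- system.file("extdata", "prot.fasta", package = "blaster")
prot_blast_table <- blast(query = prot, db = prot, alphabet = "protein")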
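The inputs can also be built in memory rather than read from FASTA files. The following is a minimal sketch under stated assumptions: the ids and sequences are invented for illustration, and the blast() calls are guarded so that the snippet still runs when the package is not installed. Whether such short toy sequences produce any hits is not guaranteed.

```r
# Query and database tables with the required Id and Seq columns.
# All ids and sequences below are made up for illustration.
query_df <- data.frame(
  Id  = c("q1", "q2"),
  Seq = c("ACGTTGCAAGGCTTAACCGGATCGATCGGCTAAGCTTAGC",
          "TTGACCATGGCATTGACCATAGGCTTAACGGATCCGATTA"),
  stringsAsFactors = FALSE
)
db_df <- data.frame(
  Id  = c("t1", "t2"),
  Seq = c("ACGTTGCAAGGCTTAACCGGATCGATCGGCTAAGCTTAGC",
          "GGGCCCAAATTTGGGCCCAAATTTGGGCCCAAATTTGGGC"),
  stringsAsFactors = FALSE
)

if (requireNamespace("blaster", quietly = TRUE)) {
  # Stricter search on the plus strand only; returns a dataframe.
  res <- blaster::blast(query = query_df, db = db_df,
                        minIdentity = 0.9, strand = "plus")

  # With output_to_file = TRUE a file path is returned instead.
  out_path <- blaster::blast(query = query_df, db = db_df,
                             output_to_file = TRUE)
  print(file.exists(out_path))
}
```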
Reads the contents of a nucleotide or protein FASTA file into a dataframe.
Description
Reads the contents of a nucleotide or protein FASTA file into a dataframe.
Usage
read_fasta(
filename,
filter = "",
non_standard_chars = "error",
alphabet = "nucleotide"
)
Arguments
filename |
A string specifying the name of the FASTA file to be imported. |
filter |
An optional string specifying a sequence motif for sequence filtering. Only keeps those sequences containing this motif. Also splits the matched sequences and provides the split parts in two additional columns. |
non_standard_chars |
A string specifying how to handle non-standard nucleotide or amino acid characters: 'remove', 'ignore' or 'error'. Defaults to 'error'. |
alphabet |
A string specifying the sequence alphabet of the FASTA file: 'nucleotide' or 'protein'. Defaults to 'nucleotide'. |
Value
A dataframe containing FASTA ids (Id column) and sequences (Seq column). If 'filter' is specified, the split sequences are stored in additional columns Part1 and Part2.
Examples
query <- system.file("extdata", "query.fasta", package = "blaster")
query <- read_fasta(filename = query)
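To try the filter argument without the bundled files, a small FASTA file can be written to a temporary location first. This is only a sketch: the sequences and the motif "GGATCC" are invented for illustration, and the read_fasta() calls are guarded so that the snippet still runs when the package is not installed.

```r
# Write a tiny, made-up FASTA file to a temporary location.
fasta_path <- tempfile(fileext = ".fasta")
writeLines(
  c(">seq1", "ACGTACGTGGATCCACGTACGT",
    ">seq2", "TTTTACGTACGTACGTAAAA"),
  fasta_path
)

if (requireNamespace("blaster", quietly = TRUE)) {
  # All sequences, with Id and Seq columns.
  all_seqs <- blaster::read_fasta(filename = fasta_path)

  # Only sequences containing the motif; per the Value section,
  # the split parts are stored in columns Part1 and Part2.
  with_motif <- blaster::read_fasta(filename = fasta_path,
                                    filter = "GGATCC")
}
```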