Type: | Package |
Title: | Parse xlsx Files |
Version: | 1.2.1 |
Date: | 2024-05-30 |
Description: | Uses C++ via the 'Rcpp' package to parse modern Excel files ('.xlsx'). Memory usage is kept minimal by decompressing only parts of the file at a time, while employing multiple threads to achieve significant runtime reduction. Uses https://github.com/richgel999/miniz and https://github.com/lemire/fast_double_parser. |
License: | MIT + file LICENSE |
Imports: | Rcpp (≥ 1.0.5) |
LinkingTo: | Rcpp |
URL: | https://github.com/fhenz/SheetReader-r |
BugReports: | https://github.com/fhenz/SheetReader-r/issues |
Encoding: | UTF-8 |
NeedsCompilation: | yes |
Packaged: | 2024-05-30 17:46:00 UTC; Felix |
Author: | Felix Henze [aut, cre], Rich Geldreich [ctb, cph] (Author of included miniz code), Daniel Lemire [ctb, cph] (Author of included fast_double_parser code) |
Maintainer: | Felix Henze <felixhenze0@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2024-05-30 20:40:02 UTC |
Fast and efficient xlsx parsing
Description
Uses C++ via the 'Rcpp' package to parse modern Excel files ('.xlsx'). Memory usage is kept minimal by decompressing only parts of the file at a time, while employing multiple threads to achieve significant runtime reduction.
Details
The only function provided by this package is read_xlsx(), with options that control parsing behaviour.
Author(s)
Felix Henze
Maintainer: Felix Henze <felixhenze0@gmail.com>
Parse data from an xlsx file
Description
Parse tabular data from a sheet inside an xlsx file into a data.frame.
Usage
read_xlsx(
path,
sheet = NULL,
headers = TRUE,
skip_rows = 0,
skip_columns = 0,
num_threads = -1,
col_types = NULL
)
Arguments
path | The path to the xlsx file to parse. |
sheet | The sheet to parse, given either as its index/position (1 = first sheet) or as its name. By default, the first sheet is parsed. |
headers | Whether to interpret the first row as column names. |
skip_rows | The number of rows to skip before values are read. |
skip_columns | The number of columns to skip before values are read. |
num_threads | The number of threads to use for parsing. Determined automatically if not provided. |
col_types | A named or unnamed character vector whose elements are each one of "guess", "logical", "numeric", "date", "text". If unnamed, the types are assigned by column index (after |
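The col_types argument accepts two shapes of character vector. The sketch below builds one of each and checks them against the five documented type strings before use. The helper check_col_types() and the constant allowed_col_types are hypothetical illustrations, not part of SheetReader; the read_xlsx() call mirrors the package's own named col_types example (with headers = TRUE) and only runs if SheetReader is installed.

```r
# The five type strings listed in the col_types documentation.
allowed_col_types <- c("guess", "logical", "numeric", "date", "text")

# Hypothetical helper (NOT part of SheetReader): rejects unsupported type
# strings before the vector is handed to read_xlsx().
check_col_types <- function(types) {
  stopifnot(is.character(types))
  bad <- setdiff(unname(types), allowed_col_types)
  if (length(bad) > 0) {
    stop("unsupported column type(s): ", paste(bad, collapse = ", "))
  }
  types
}

# Unnamed vector: types are assigned by column index.
by_position <- check_col_types(c("text", "numeric", "guess"))

# Named vector: the type is matched to the column named "Integer".
by_name <- check_col_types(c("Integer" = "text"))

# Only attempt the actual parse when the package is available.
if (requireNamespace("SheetReader", quietly = TRUE)) {
  exampleFile <- system.file("extdata", "multi-test.xlsx", package = "SheetReader")
  df <- SheetReader::read_xlsx(exampleFile, sheet = 1, headers = TRUE,
                               col_types = by_name)
  str(df)
}
```

Validating up front is a design choice for the sketch only: a misspelled type such as "integer" fails early with a readable message instead of surfacing later during parsing.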
Value
data.frame
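Because the result is an ordinary data.frame, base R tools can inspect it directly. The sketch below, which assumes SheetReader and its bundled example file are installed, reports the first class of each parsed column. Only the first class is kept because some column types (date-time classes, for instance) carry more than one class. The variable names are illustrative.

```r
if (requireNamespace("SheetReader", quietly = TRUE)) {
  exampleFile <- system.file("extdata", "multi-test.xlsx", package = "SheetReader")
  df <- SheetReader::read_xlsx(exampleFile, sheet = 1)

  # The documented return value is a data.frame.
  stopifnot(is.data.frame(df))

  # First class of each column, named by column.
  col_classes <- vapply(df, function(x) class(x)[1], character(1))
  print(col_classes)

  # Number of rows and columns that were parsed.
  print(dim(df))
}
```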
Examples
exampleFile <- system.file("extdata", "multi-test.xlsx", package = "SheetReader")
# Read first sheet of the file, using first row as column names
df1 <- read_xlsx(exampleFile, sheet = 1, headers = TRUE)
head(df1)
# Read the "encoding" sheet, skipping 1 row and not using the next row as column names
df2 <- read_xlsx(exampleFile, sheet = "encoding", headers = FALSE, skip_rows = 1)
head(df2)
# Coerce the column with header "Integer" to text
df3 <- read_xlsx(exampleFile, sheet = 1, headers = TRUE, col_types = c("Integer" = "text"))
head(df3)
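The examples above do not exercise num_threads or skip_columns. The following sketch, assuming SheetReader and its example file are installed, compares the default automatic thread count (num_threads = -1) with a single-threaded parse, and then skips the first column. It assumes that the thread count does not change the parsed result, so the two data frames should have the same dimensions. The variable names are illustrative.

```r
if (requireNamespace("SheetReader", quietly = TRUE)) {
  exampleFile <- system.file("extdata", "multi-test.xlsx", package = "SheetReader")

  # Default: the package chooses the number of threads.
  df_auto <- SheetReader::read_xlsx(exampleFile, sheet = 1)

  # Force a single thread, e.g. to compare timings with system.time().
  df_single <- SheetReader::read_xlsx(exampleFile, sheet = 1, num_threads = 1)

  # The thread count is expected to affect speed, not the shape of the result.
  stopifnot(identical(dim(df_auto), dim(df_single)))

  # Skip the first column before reading values.
  df_skip <- SheetReader::read_xlsx(exampleFile, sheet = 1, skip_columns = 1)
  print(ncol(df_auto) - ncol(df_skip))
}
```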