Title: | USGS INL Project Office Publications |
Version: | 1.3.0 |
Description: | Contains bibliographic information for the U.S. Geological Survey (USGS) Idaho National Laboratory (INL) Project Office. |
Depends: | R (≥ 4.1) |
Imports: | checkmate, stats, tm |
Suggests: | chromote, connectapi, covr, cyclocomp, graphics, htmltools, htmlwidgets, jsonlite, kableExtra, knitr, lintr, magick, markdown, png, pkgbuild, pkgdown, pkgload, pdftools, reactable, renv, rmarkdown, rsconnect, RWeka, stringi, tesseract, textutils, tinytest, utils, webshot2, wordcloud2 |
License: | CC0 |
URL: | https://rconnect.usgs.gov/INLPO/inlpubs-main/, https://code.usgs.gov/inl/inlpubs |
BugReports: | https://code.usgs.gov/inl/inlpubs/-/issues |
Copyright: | This software is in the public domain because it contains materials that originally came from the United States Geological Survey (USGS), an agency of the United States Department of Interior. For more information, see the official USGS copyright policy at https://www.usgs.gov/information-policies-and-instructions/copyrights-and-credits |
Encoding: | UTF-8 |
SystemRequirements: | Complete functionality necessitates Amazon Corretto (win), and default-jre, pandoc, libxml2-dev, libpoppler-cpp-dev, libmagick++-dev, optipng, libtesseract-dev, libleptonica-dev, tesseract-ocr-eng (deb) |
LazyData: | true |
LazyDataCompression: | xz |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | no |
Packaged: | 2025-06-25 20:39:49 UTC; jfisher |
Author: | Jason C. Fisher |
Maintainer: | Jason C. Fisher <jfisher@usgs.gov> |
Repository: | CRAN |
Date/Publication: | 2025-06-25 23:10:02 UTC |
Add Content from PDF Documents
Description
Incorporate the text or cover image from a PDF document into the inlpubs package.
Usage
add_content(
pub_id,
year,
type = c("text", "image"),
...,
srcdir = "archive",
destdir = tempdir(),
ignore = NULL,
pubs = inlpubs::pubs,
overwrite = FALSE
)
Arguments
pub_id |
'character' vector.
Unique identifier for the publication.
May also be specified using the |
year |
'integer' vector. Year of publication. |
type |
'character' string. Type of content to obtain from the PDF file. Specify as either "text" (the default) or "image". |
... |
Arguments to be passed to the function used to obtain the context,
|
srcdir |
'character' string. The PDF document is located in a subdirectory of the source directory, and this subdirectory is named after the publication year. It is set to default to the 'archive' directory, which is found in the working directory. |
destdir |
'character' string. Target folder for the cover image that is saved in JPEG format. Defaults to the temporary directory. |
ignore |
'character' vector. Publication identifier(s) to ignore. |
pubs |
'pub' table.
Publications of the INLPO, see |
overwrite |
'logical' flag. Whether to overwrite an existing text or image file. |
Value
Returns the path to the saved text or image file, invisibly.
Author(s)
J.C. Fisher, U.S. Geological Survey, Idaho Water Science Center
Contributing Authors to INLPO Publications
Description
Authors who have contributed to the publications by the U.S. Geological Survey (USGS), Idaho Water Science Center, Idaho National Laboratory Project Office (INLPO).
Usage
authors
Format
An object of class 'author' that inherits behavior from the 'data.frame' class and includes the following columns:
author_id
Unique identifier for the author.
name
Name of author, surname first and initials or given name.
person
Information about the person like email address and ORCiD identifier.
pub_id
Identifier(s) of the publication(s) the author has contributed to, referes to the primry key of the
pubs
data table.total_pub
Total number of publications.
single_authored
Number of single-authored publications.
multi_authored
Number of multi-authored publications.
first_authored
Number of multi-authored publications where the researcher appears as first author.
first_year
First year author published.
last_year
Last year author published.
Source
Curated by INLPO staff.
Examples
# Subset Jason Fisher's information and display structure:
author <- authors["jfisher", ]
str(author, max.level = 3, width = 75, strict.width = "cut")
# Print author's given name:
author$person |> format(include = "given")
Filter Data List Column
Description
Create a data list column filter for a React Table. Requires that the htmltools packages is available.
Usage
filter_data_list(table_id, style = "width: 100%; height: 28px;")
Arguments
table_id |
'character' string. Unique table identifier. |
style |
'character' string. CSS style applied to input HTML tag. |
Value
Returns a function to perform filtering.
Examples
f <- filter_data_list("table-id")
Obtain Image from a PDF Document
Description
Obtain an image from any PDF document. Requires that the pdftools and magick packages are available.
Usage
get_pdf_image(
input,
output = tempfile(fileext = ".jpg"),
page = 1,
width = 300,
depth = 8,
quality = 70
)
Arguments
input |
'character' string. File path to PDF document. |
output |
'character' string. Location to write the JPEG image file. |
page |
'integer' number. Page number in the document. Defaults to page 1. |
width |
'integer' number. Image width in pixels. |
depth |
'integer' number. Image color depth (either 8 or 16). Defaults to 8. |
quality |
'integer' number. JPEG quality, a number between 0 and 100. Defaults to 70. |
Value
Returns the path to the image file.
Author(s)
J.C. Fisher, U.S. Geological Survey, Idaho Water Science Center
See Also
add_content
function to add cover images to the inlpubs package.
Examples
input <- system.file("extdata", "test.pdf", package = "inlpubs")
path <- get_pdf_image(input)
unlink(path)
Obtain Text from a PDF Document
Description
Obtain text from any PDF document. Requires that the pdftools and tesseract packages are available.
Usage
get_pdf_text(input, output = tempfile(fileext = ".txt"), dpi = 600, psm = 1)
Arguments
input |
'character' string. File path to PDF document. |
output |
'character' string. Location to write the text file. |
dpi |
'integer' number between 100 and 1200. Dots per inch (DPI). The resolution of an image, specifically the number of pixels per inch. For optimal optical character recognition (OCR) accuracy, 600 DPI (the default) is recommended. |
psm |
|
Value
Returns the path to the text file. Each page from the PDF is transcribed as a separate line in the file.
Author(s)
J.C. Fisher, U.S. Geological Survey, Idaho Water Science Center
See Also
add_content
function to add texts to the inlpubs-package corpus.
Examples
## Not run:
input <- system.file("extdata", "test.pdf", package = "inlpubs")
path <- get_pdf_text(input)
unlink(path)
## End(Not run)
Get Person(s)
Description
Filter a list of individuals based on their distinct identifiers.
Usage
get_person(x, persons)
Arguments
x |
'character' vector. Identifier for one or more persons. |
persons |
'person' named list. Information about an arbitrary number of persons. Each element in the list is assigned a name, which uniquely identifies a person. |
Value
A subset of persons
.
Author(s)
J.C. Fisher, U.S. Geological Survey, Idaho Water Science Center
Examples
get_person("jfisher", persons = inlpubs::authors$person)
Create Author and Publication Webpages
Description
Creates a webpage for each author, listing their publications. Each webpage is saved as an R Markdown file.
Usage
make_webpages(authors = NULL, pubs = NULL, destdir = tempdir(), quiet = FALSE)
Arguments
authors |
'author' data frame.
Contributing authors to the INLPO publications, see |
pubs |
'pub' data frame.
Publications of the INLPO, see |
destdir |
'character' string. Destination directory to write files, with tilde-expansion performed. Defaults to a temporary directory. |
quiet |
'logical' flag. Whether to suppress printing of debugging information. |
Value
NULL
invisibly.
Author(s)
J.C. Fisher, U.S. Geological Survey, Idaho Water Science Center
Examples
destdir <- tempfile("")
make_webpages(
authors = inlpubs::authors,
pubs = inlpubs::pubs,
destdir = destdir,
quiet = TRUE
)
unlink(destdir, recursive = TRUE)
Create Word Cloud
Description
Create a word cloud from a frequency table of words, and save to a PNG file.
Requires R-packages htmltools, htmlwidgets, magick, webshot2,
and wordcloud2 are available.
System dependencies include the the following:
ImageMagick for displaying the PNG image,
OptiPNG for PNG file compression, and
Chrome- or a Chromium-based browser
with support for the Chrome DevTools protocol.
Use find_chromate
function to find the path to the Chrome browser.
Usage
make_wordcloud(
x,
max_terms = 200,
size = 1,
shape = "circle",
ellipticity = 0.65,
...,
width = 910,
output = NULL,
display = FALSE
)
Arguments
x |
'data.frame'. A frequency table of terms that includes "term" and "freq" in each column. |
max_terms |
'integer' number. Maximum number of terms to include in the word cloud. |
size |
'numeric' number. Font size. |
shape |
'character' string. Shape of the “cloud” to draw. Possible shapes include a "circle", "cardioid", "diamond", "triangle-forward", "triangle", "pentagon", and "star". |
ellipticity |
'numeric' number. Degree of “flatness” of the shape to draw, a value between 0 and 1. |
... |
Additional arguments to be passed to the |
width |
'integer' number. Desired image width in pixels. |
output |
'character' string. Path to the output file, by default the word cloud is copied to a temporary file. |
display |
'logical' flag. Whether to display the saved PNG file in a graphics window. Requires access to the magick package. |
Value
File path to the word cloud plot in PNG format.
Author(s)
J.C. Fisher, U.S. Geological Survey, Idaho Water Science Center
See Also
mine_text
function to perform a term frequency text analysis.
Examples
## Not run:
d <- wordcloud2::demoFreq |> head(n = 10)
colnames(d) <- c("term", "freq")
file <- make_wordcloud(d, display = interactive())
unlink(file)
## End(Not run)
Mine Text
Description
Performs a term frequency text analysis. A term is defined as a word or group of words.
Usage
mine_text(docs, ngmin = 1, ngmax = ngmin, sparse = NULL)
Arguments
docs |
'list' or 'character' vector. Document text to analyze. Each list item contains the extracted text from a single document. |
ngmin , ngmax |
integer number. Splits strings into n-grams with given minimal and maximal numbers of grams. An n-gram is an ordered sequence of n words taken from the body of a text. Requires the RWeka package is available and that the environment variable JAVA_HOME points to where the Java software is located. Recommended for single text compoents only. |
sparse |
'numeric' number that is greater than 0 and less than 1.
A threshold of relative document frequency for a term.
It specifies the proportion of documents in which a term must appear to be retained.
For example if you specify |
Details
HTML entities are decoded when the textutils package is available.
Value
A term-frequency data table giving the number of times each word occurs in the text.
A column in the table represents a single component in the docs
argument,
and each row provides frequency counts for a particular word (also known as a 'term').
Author(s)
J.C. Fisher, U.S. Geological Survey, Idaho Water Science Center
See Also
search_terms
function to search for terms within the resulting term-frequency data table.
make_wordcloud
function to create a word cloud.
Examples
d <- c(
"The quick brown fox jumps over the lazy lazy dog.",
"Pack my brown box.",
"Jazz fly brown dog."
) |>
mine_text()
d <- list(
"A" = "The quick brown fox jumps over the lazy lazy dog.",
"B" = c("Pack my brown box.", NA, "Jazz fly brown dog."),
"C" = NA_character_
) |>
mine_text()
Publications of the INLPO
Description
Bibliographic information for reports, articles, maps, and theses related to scientific monitoring and research conducted by the U.S. Geological Survey (USGS), Idaho Water Science Center, Idaho National Laboratory Project Office (INLPO).
Usage
pubs
Format
An object of class 'pub' that inherits behavior from the 'data.frame' class and includes the following columns:
pub_id
Unique identifier for the publication.
institution
Name of the institution that published and/or sponsored the report.
type
Type of publication.
text_ref
Text reference (also known as the in-text citation) that excludes the year of publication.
year
Year of publication.
author_id
Identifier(s) of the author(s), referes to the primry key of the
authors
data table.title
Title of publication.
bibentry
Bibliographic entry of class
bibentry
.abstract
Abstract of publication.
annotation
Annotation of publication.
annotation_src
Identifier for the annotation source publication (Knobel and others, 2005; Bartholomay, 2022).
files
File names associated with the publication.
Source
Many of these publications are available through the USGS Publications Warehouse.
References
Bartholomay, R.C., 2022, Historical development of the U.S. Geological Survey hydrological monitoring and investigative programs at the Idaho National Laboratory, Idaho, 2002-2020: U.S. Geological Survey Open-File Report 2022-1027 (DOE/ID-22256), 54 p., doi:10.3133/ofr20221027.
Knobel, L.L., Bartholomay, R.C., and Rousseau, J.P., 2005, Historical development of the U.S. Geological Survey hydrologic monitoring and investigative programs at the Idaho National Engineering and Environmental Laboratory, Idaho, 1949 to 2001: U.S. Geological Survey Open-File Report 2005–1223 (DOE/ID–22195), 93 p., doi:10.3133/ofr20051223.
Examples
# Subset Fisher and others (2012) and display structure:
id <- "FisherOthers2012"
pub <- pubs[id, ]
str(pub, max.level = 3, width = 75, strict.width = "cut")
# Print suggested citation:
attr(unclass(pub$bibentry[[1]])[[1]], which = "textVersion")
# Print authors full name:
format(pub$bibentry[[1]]$author, include = c("given", "family"))
# Print abstract:
pub$abstract
Search Terms
Description
Pattern matches a search term within the term-frequency data table.
Usage
search_terms(
x,
data = inlpubs::terms,
ignore_case = TRUE,
...,
low_freq = 1,
high_freq = Inf,
simplify = TRUE
)
Arguments
x |
'character' string. Term searched for in the term-frequency data table. |
data |
'term' and 'data.frame' class.
Term-frequency data table.
Defaults to using the term frequencies from the INLPO publications,
see |
ignore_case |
'logical' flag. Whether to ignore character case during pattern matching. |
... |
Additional arguments passed to the |
low_freq |
'numeric' number. Lower frequency bound. |
high_freq |
'numeric' number. Upper frequency bound. |
simplify |
'logical' flag. Whether to return only the unique publication identifiers. |
Value
A subset of the data table sorted by decreasing frequency.
Author(s)
J.C. Fisher, U.S. Geological Survey, Idaho Water Science Center
See Also
mine_text
function to perform a term frequency text analysis.
Examples
search_terms("mlms")
out <- search_terms("mlms", simplify = FALSE)
head(out)
Term Frequency from INLPO Publications
Description
Term frequency from publications by the U.S. Geological Survey (USGS), Idaho Water Science Center, Idaho National Laboratory Project Office (INLPO).
Usage
terms
Format
An object of class 'term' that inherits behavior from the 'data.frame' class and includes the following columns:
term
Term, a word or group of words, represented by an ASCII character string in lowercase.
pub_id
Identifier for a publication, referes to the primry key of the
pubs
data table.freq
Frequency count from text analysis.
Source
The publication text was sourced from the original PDF documents using the get_pdf_text
function,
and term frequencies were extracted from the text using the mine_text
function.
Examples
str(terms, max.level = 3, width = 75, strict.width = "cut")