Title: An Interface to Google's 'BigQuery' 'API'
Version: 1.5.1
Description: Easily talk to Google's 'BigQuery' database from R.
License: MIT + file LICENSE
URL: https://bigrquery.r-dbi.org, https://github.com/r-dbi/bigrquery
BugReports: https://github.com/r-dbi/bigrquery/issues
Depends: R (≥ 3.6)
Imports: bit64, brio, cli, clock, curl, DBI, gargle (≥ 1.5.0), httr, jsonlite, lifecycle, methods, prettyunits, rlang (≥ 1.1.0), tibble
Suggests: blob, covr, dbplyr (≥ 2.4.0), dplyr (≥ 1.1.0), hms, readr, sodium, testthat (≥ 3.1.5), wk (≥ 0.3.2), withr
LinkingTo: cli, cpp11, rapidjsonr
Config/Needs/website: tidyverse/tidytemplate
Config/testthat/edition: 3
Config/testthat/parallel: TRUE
Config/testthat/start-first: bq-table, dplyr
Encoding: UTF-8
RoxygenNote: 7.3.1
Collate: 'bigrquery-package.R' 'bq-auth.R' 'bq-dataset.R' 'bq-download.R' 'bq-field.R' 'bq-job.R' 'bq-param.R' 'bq-parse.R' 'bq-perform.R' 'bq-project.R' 'bq-projects.R' 'bq-query.R' 'bq-refs.R' 'bq-request.R' 'bq-table.R' 'bq-test.R' 'camelCase.R' 'connections-page.R' 'cpp11.R' 'dbi-driver.R' 'dbi-connection.R' 'dbi-result.R' 'dplyr.R' 'gs-object.R' 'import-standalone-obj-type.R' 'import-standalone-s3-register.R' 'import-standalone-types-check.R' 'utils.R' 'zzz.R'
NeedsCompilation: yes
Packaged: 2024-03-14 16:42:02 UTC; hadleywickham
Author: Hadley Wickham
Maintainer: Hadley Wickham <hadley@posit.co>
Repository: CRAN
Date/Publication: 2024-03-14 17:10:02 UTC
bigrquery: An Interface to Google's 'BigQuery' 'API'
Description
Easily talk to Google's 'BigQuery' database from R.
Package options
bigrquery.quiet
Verbose output during processing? The default value, NA, turns on verbose output for queries that run longer than two seconds. Use FALSE for immediate verbose output, TRUE for quiet operation.
bigrquery.page.size
Default page size for fetching data; defaults to 1e4.
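For example, to set both options at once (a minimal sketch; these are ordinary R options):
options(
  bigrquery.quiet = TRUE,      # suppress query progress output
  bigrquery.page.size = 5000   # fetch 5000 rows per page
)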
Author(s)
Maintainer: Hadley Wickham hadley@posit.co (ORCID)
Authors:
Jennifer Bryan jenny@posit.co (ORCID)
Other contributors:
Posit Software, PBC [copyright holder, funder]
See Also
Useful links:
https://bigrquery.r-dbi.org
https://github.com/r-dbi/bigrquery
Report bugs at https://github.com/r-dbi/bigrquery/issues
DBI methods
Description
Implementations of pure virtual functions defined in the DBI
package.
Usage
## S4 method for signature 'bq_dataset'
dbConnect(drv, ...)
## S4 method for signature 'BigQueryDriver'
show(object)
## S4 method for signature 'BigQueryDriver'
dbGetInfo(dbObj, ...)
## S4 method for signature 'BigQueryDriver'
dbIsValid(dbObj, ...)
## S4 method for signature 'BigQueryDriver'
dbDataType(dbObj, obj, ...)
## S4 method for signature 'BigQueryConnection'
show(object)
## S4 method for signature 'BigQueryConnection'
dbIsValid(dbObj, ...)
## S4 method for signature 'BigQueryConnection'
dbDisconnect(conn, ...)
## S4 method for signature 'BigQueryConnection,character'
dbSendQuery(conn, statement, ..., params = NULL)
## S4 method for signature 'BigQueryConnection,character'
dbExecute(conn, statement, ...)
## S4 method for signature 'BigQueryConnection,character'
dbQuoteString(conn, x, ...)
## S4 method for signature 'BigQueryConnection,SQL'
dbQuoteString(conn, x, ...)
## S4 method for signature 'BigQueryConnection,character'
dbQuoteIdentifier(conn, x, ...)
## S4 method for signature 'BigQueryConnection,SQL'
dbQuoteIdentifier(conn, x, ...)
## S4 method for signature 'BigQueryConnection,logical'
dbQuoteLiteral(conn, x, ...)
## S4 method for signature 'BigQueryConnection'
dbDataType(dbObj, obj, ...)
## S4 method for signature 'BigQueryConnection,character,data.frame'
dbWriteTable(
conn,
name,
value,
...,
overwrite = FALSE,
append = FALSE,
field.types = NULL,
temporary = FALSE,
row.names = NA
)
## S4 method for signature 'BigQueryConnection,Id,data.frame'
dbWriteTable(
conn,
name,
value,
...,
overwrite = FALSE,
append = FALSE,
field.types = NULL,
temporary = FALSE,
row.names = NA
)
## S4 method for signature 'BigQueryConnection,AsIs,data.frame'
dbWriteTable(
conn,
name,
value,
...,
overwrite = FALSE,
append = FALSE,
field.types = NULL,
temporary = FALSE,
row.names = NA
)
## S4 method for signature 'BigQueryConnection,character,data.frame'
dbAppendTable(conn, name, value, ..., row.names = NULL)
## S4 method for signature 'BigQueryConnection,Id,data.frame'
dbAppendTable(conn, name, value, ..., row.names = NULL)
## S4 method for signature 'BigQueryConnection,AsIs,data.frame'
dbAppendTable(conn, name, value, ..., row.names = NULL)
## S4 method for signature 'BigQueryConnection'
dbCreateTable(conn, name, fields, ..., row.names = NULL, temporary = FALSE)
## S4 method for signature 'BigQueryConnection,character'
dbReadTable(conn, name, ...)
## S4 method for signature 'BigQueryConnection,Id'
dbReadTable(conn, name, ...)
## S4 method for signature 'BigQueryConnection,AsIs'
dbReadTable(conn, name, ...)
## S4 method for signature 'BigQueryConnection'
dbListTables(conn, ...)
## S4 method for signature 'BigQueryConnection,character'
dbExistsTable(conn, name, ...)
## S4 method for signature 'BigQueryConnection,Id'
dbExistsTable(conn, name, ...)
## S4 method for signature 'BigQueryConnection,AsIs'
dbExistsTable(conn, name, ...)
## S4 method for signature 'BigQueryConnection,character'
dbListFields(conn, name, ...)
## S4 method for signature 'BigQueryConnection,Id'
dbListFields(conn, name, ...)
## S4 method for signature 'BigQueryConnection,AsIs'
dbListFields(conn, name, ...)
## S4 method for signature 'BigQueryConnection,character'
dbRemoveTable(conn, name, ...)
## S4 method for signature 'BigQueryConnection,Id'
dbRemoveTable(conn, name, ...)
## S4 method for signature 'BigQueryConnection,AsIs'
dbRemoveTable(conn, name, ...)
## S4 method for signature 'BigQueryConnection'
dbGetInfo(dbObj, ...)
## S4 method for signature 'BigQueryConnection'
dbBegin(conn, ...)
## S4 method for signature 'BigQueryConnection'
dbCommit(conn, ...)
## S4 method for signature 'BigQueryConnection'
dbRollback(conn, ...)
## S4 method for signature 'BigQueryResult'
show(object)
## S4 method for signature 'BigQueryResult'
dbIsValid(dbObj, ...)
## S4 method for signature 'BigQueryResult'
dbClearResult(res, ...)
## S4 method for signature 'BigQueryResult'
dbFetch(res, n = -1, ...)
## S4 method for signature 'BigQueryResult'
dbHasCompleted(res, ...)
## S4 method for signature 'BigQueryResult'
dbGetStatement(res, ...)
## S4 method for signature 'BigQueryResult'
dbColumnInfo(res, ...)
## S4 method for signature 'BigQueryResult'
dbGetRowCount(res, ...)
## S4 method for signature 'BigQueryResult'
dbGetRowsAffected(res, ...)
## S4 method for signature 'BigQueryResult'
dbBind(res, params, ...)
Arguments
...
Other arguments to methods.
object
Any R object.
dbObj
An object inheriting from DBIObject, i.e. DBIDriver, DBIConnection, or DBIResult.
obj
An R object whose SQL type we want to determine.
conn
A DBIConnection object, as returned by dbConnect().
statement
A character string containing SQL.
params
A list of query parameters to bind; see dbBind().
x
A character vector to quote as string.
name
The table name, passed on to dbQuoteIdentifier().
value
A data.frame (or object coercible to data.frame).
overwrite
A logical specifying whether to overwrite an existing table or not. Its default is FALSE.
append
A logical specifying whether to append to an existing table in the DBMS. Its default is FALSE.
field.types, temporary
Ignored. Included for compatibility with the generic.
row.names
A logical specifying whether the row.names should be written to the output table.
fields
Either a character vector or a data frame. For a named character vector, names are column names and values are types; names are escaped with dbQuoteIdentifier(). For a data frame, field types are generated using dbDataType().
res
An object inheriting from DBIResult.
n
Maximum number of records to retrieve per fetch. Use n = -1 or n = Inf to retrieve all pending records.
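Examples
# A minimal sketch of a parameterized query through the DBI interface;
# assumes a billable project is available via bq_test_project()
con <- DBI::dbConnect(
  bigquery(),
  project = "publicdata",
  dataset = "samples",
  billing = bq_test_project()
)
DBI::dbGetQuery(
  con,
  "SELECT count(*) AS n FROM natality WHERE state = @state",
  params = list(state = "KS")
)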
BigQuery datasets
Description
Basic create-read-update-delete verbs for datasets.
Usage
bq_dataset_create(x, location = "US", ...)
bq_dataset_meta(x, fields = NULL)
bq_dataset_exists(x)
bq_dataset_update(x, ...)
bq_dataset_delete(x, delete_contents = FALSE)
bq_dataset_tables(x, page_size = 50, max_pages = Inf, warn = TRUE, ...)
Arguments
x
A bq_dataset, or an object coercible to one.
location
Dataset location.
...
Additional arguments passed on to the underlying API call. snake_case names are automatically converted to camelCase.
fields
An optional field specification for partial response.
delete_contents
If TRUE, will recursively delete all tables in the dataset. Set to FALSE by default for safety.
page_size
Number of items per page.
max_pages
Maximum number of pages to retrieve. Use Inf to retrieve all pages (this may take a long time!).
warn
If TRUE, warn when there are unretrieved pages.
Google BigQuery API documentation
Examples
ds <- bq_dataset(bq_test_project(), "dataset_api")
bq_dataset_exists(ds)
bq_dataset_create(ds)
bq_dataset_exists(ds)
str(bq_dataset_meta(ds))
bq_dataset_delete(ds)
bq_dataset_exists(ds)
# Use bq_test_dataset() to create a temporary dataset that will
# be automatically deleted
ds <- bq_test_dataset()
bq_table_create(bq_table(ds, "x1"))
bq_table_create(bq_table(ds, "x2"))
bq_table_create(bq_table(ds, "x3"))
bq_dataset_tables(ds)
BigQuery job: retrieve metadata
Description
To perform a job, see api-perform. These functions all retrieve metadata (in various forms) about an existing job.
Usage
bq_job_meta(x, fields = NULL)
bq_job_status(x)
bq_job_show_statistics(x)
bq_job_wait(
x,
quiet = getOption("bigrquery.quiet"),
pause = 0.5,
call = caller_env()
)
Arguments
x
A bq_job.
fields
An optional field specification for partial response.
quiet
If FALSE, displays progress bar; if TRUE is silent; if NA displays progress bar only for long-running jobs.
pause
Amount of time (in seconds) to wait between status requests.
call
The execution environment of a currently running function, e.g. caller_env(); it is mentioned in error messages as the source of the error.
Google BigQuery API documentation
Examples
jobs <- bq_project_jobs(bq_test_project())
jobs[[1]]
# Show statistics about job
bq_job_show_statistics(jobs[[1]])
# Wait for job to complete
bq_job_wait(jobs[[1]])
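# Retrieve just the status via a partial response (a sketch; the
# fields specification follows the BigQuery API's partial-response
# syntax and is an assumption here)
bq_job_status(jobs[[1]])
str(bq_job_meta(jobs[[1]], fields = "status"))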
BigQuery jobs: perform a job
Description
These functions are low-level functions designed to be used by experts. Each of these low-level functions is paired with a high-level function that you should use instead:
bq_perform_copy(): bq_table_copy().
bq_perform_query(): bq_dataset_query(), bq_project_query().
bq_perform_upload(): bq_table_upload().
bq_perform_load(): bq_table_load().
bq_perform_extract(): bq_table_save().
Usage
bq_perform_extract(
x,
destination_uris,
destination_format = "NEWLINE_DELIMITED_JSON",
compression = "NONE",
...,
print_header = TRUE,
billing = x$project
)
bq_perform_upload(
x,
values,
fields = NULL,
create_disposition = "CREATE_IF_NEEDED",
write_disposition = "WRITE_EMPTY",
...,
billing = x$project
)
bq_perform_load(
x,
source_uris,
billing = x$project,
source_format = "NEWLINE_DELIMITED_JSON",
fields = NULL,
nskip = 0,
create_disposition = "CREATE_IF_NEEDED",
write_disposition = "WRITE_EMPTY",
...
)
bq_perform_query(
query,
billing,
...,
parameters = NULL,
destination_table = NULL,
default_dataset = NULL,
create_disposition = "CREATE_IF_NEEDED",
write_disposition = "WRITE_EMPTY",
use_legacy_sql = FALSE,
priority = "INTERACTIVE"
)
bq_perform_query_dry_run(
query,
billing,
...,
default_dataset = NULL,
parameters = NULL,
use_legacy_sql = FALSE
)
bq_perform_copy(
src,
dest,
create_disposition = "CREATE_IF_NEEDED",
write_disposition = "WRITE_EMPTY",
...,
billing = NULL
)
Arguments
x
A bq_table.
destination_uris
A character vector of fully-qualified Google Cloud Storage URIs where the extracted table should be written. Can export up to 1 Gb of data per file. Use a wild card URI (e.g. gs://[YOUR_BUCKET]/file-name-*.json) to automatically create any number of files.
destination_format
The exported file format. Possible values include "CSV", "NEWLINE_DELIMITED_JSON" and "AVRO". Tables with nested or repeated fields cannot be exported as CSV.
compression
The compression type to use for exported files. Possible values include "GZIP", "DEFLATE", "SNAPPY", and "NONE". "DEFLATE" and "SNAPPY" are only supported for Avro.
...
Additional arguments passed on to the underlying API call. snake_case names are automatically converted to camelCase.
print_header
Whether to print out a header row in the results.
billing
Identifier of project to bill.
values
Data frame of values to insert.
fields
A bq_fields specification, or something coercible to it (like a data frame). Leave as NULL to allow BigQuery to auto-detect the fields.
create_disposition
Specifies whether the job is allowed to create new tables. The following values are supported: "CREATE_IF_NEEDED" (if the table does not exist, BigQuery creates it) and "CREATE_NEVER" (the table must already exist; if it does not, a "notFound" error is returned in the job result).
write_disposition
Specifies the action that occurs if the destination table already exists. The following values are supported: "WRITE_EMPTY" (only write if the table is empty), "WRITE_TRUNCATE" (overwrite the existing contents), and "WRITE_APPEND" (append to the existing contents).
source_uris
The fully-qualified URIs that point to your data in Google Cloud. For Google Cloud Storage URIs: each URI can contain one '*' wildcard character, and it must come after the bucket name. Size limits related to load jobs apply to external data sources. For Google Cloud Bigtable URIs: exactly one URI can be specified, and it has to be a fully specified and valid HTTPS URL for a Google Cloud Bigtable table. For Google Cloud Datastore backups: exactly one URI can be specified, and the '*' wildcard character is not allowed.
source_format
The format of the data files. Possible values include "CSV", "NEWLINE_DELIMITED_JSON" and "AVRO".
nskip
For source_format = "CSV", the number of header rows to skip.
query
SQL query string.
parameters
Named list of parameters to match to query parameters. Parameter x will be matched to placeholder @x. Generally, you can supply R vectors and they will be automatically converted to the correct type. If you need greater control, you can call bq_param_scalar() or bq_param_array() explicitly. See https://cloud.google.com/bigquery/docs/parameterized-queries for more details.
destination_table
A bq_table where results should be stored. If not supplied, results will be saved to a temporary table that lives in a special dataset. You must supply this parameter for large queries (> 128 MB compressed).
default_dataset
A bq_dataset used to automatically qualify table names.
use_legacy_sql
If TRUE will use BigQuery's legacy SQL format.
priority
Specifies a priority for the query. Possible values include "INTERACTIVE" and "BATCH". Batch queries do not start immediately, but are not rate-limited in the same way as interactive queries.
Value
A bq_job.
Google BigQuery API documentation
Examples
ds <- bq_test_dataset()
bq_mtcars <- bq_table(ds, "mtcars")
job <- bq_perform_upload(bq_mtcars, mtcars)
bq_table_exists(bq_mtcars)
bq_job_wait(job)
bq_table_exists(bq_mtcars)
head(bq_table_download(bq_mtcars))
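# A sketch of the low-level query pair: estimate the cost with a dry
# run, then run the job explicitly (the dry run's return value is
# assumed to report the bytes the query would process)
bq_perform_query_dry_run(
  "SELECT count(*) FROM publicdata.samples.natality",
  billing = bq_test_project()
)
job <- bq_perform_query(
  "SELECT count(*) FROM publicdata.samples.natality",
  billing = bq_test_project()
)
bq_job_wait(job)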
BigQuery project methods
Description
Projects have two primary components: datasets and jobs. Unlike other
BigQuery objects, there is no accompanying bq_project S3 class because a
project is a simple string.
Usage
bq_project_datasets(x, page_size = 100, max_pages = 1, warn = TRUE)
bq_project_jobs(x, page_size = 100, max_pages = 1, warn = TRUE)
Arguments
x
A string giving a project name.
page_size
Number of items per page.
max_pages
Maximum number of pages to retrieve. Use Inf to retrieve all pages (this may take a long time!).
warn
If TRUE, warn when there are unretrieved pages.
Value
bq_project_datasets(): a list of bq_datasets.
bq_project_jobs(): a list of bq_jobs.
Google BigQuery API documentation
One day we might also expose the general project metadata.
Examples
bq_project_datasets("bigquery-public-data")
bq_project_datasets("githubarchive")
bq_project_jobs(bq_test_project(), page_size = 10)
BigQuery tables
Description
Basic create-read-update-delete verbs for tables, as well as functions
for uploading data (bq_table_upload()), saving to/loading from Google
Cloud Storage (bq_table_load(), bq_table_save()), and getting
various values from the metadata.
Usage
bq_table_create(x, fields = NULL, ...)
bq_table_meta(x, fields = NULL)
bq_table_fields(x)
bq_table_size(x)
bq_table_nrow(x)
bq_table_exists(x)
bq_table_delete(x)
bq_table_copy(x, dest, ..., quiet = NA)
bq_table_upload(x, values, ..., quiet = NA)
bq_table_save(x, destination_uris, ..., quiet = NA)
bq_table_load(x, source_uris, ..., quiet = NA)
bq_table_patch(x, fields)
Arguments
x
A bq_table, or an object coercible to a bq_table.
fields
A bq_fields specification, or something coercible to it (like a data frame).
...
Additional arguments passed on to the underlying API call. snake_case names are automatically converted to camelCase.
dest
The destination bq_table.
quiet
If FALSE, displays progress bar; if TRUE is silent; if NA displays progress bar only for long-running jobs.
values
Data frame of values to insert.
destination_uris
A character vector of fully-qualified Google Cloud Storage URIs where the extracted table should be written. Can export up to 1 Gb of data per file. Use a wild card URI (e.g. gs://[YOUR_BUCKET]/file-name-*.json) to automatically create any number of files.
source_uris
The fully-qualified URIs that point to your data in Google Cloud. For Google Cloud Storage URIs: each URI can contain one '*' wildcard character, and it must come after the bucket name. Size limits related to load jobs apply to external data sources. For Google Cloud Bigtable URIs: exactly one URI can be specified, and it has to be a fully specified and valid HTTPS URL for a Google Cloud Bigtable table. For Google Cloud Datastore backups: exactly one URI can be specified, and the '*' wildcard character is not allowed.
Value
bq_table_copy(), bq_table_create(), bq_table_delete(), bq_table_upload(): an invisible bq_table.
bq_table_exists(): either TRUE or FALSE.
bq_table_size(): the size of the table in bytes.
bq_table_fields(): a bq_fields.
Google BigQuery API documentation
Examples
ds <- bq_test_dataset()
bq_mtcars <- bq_table(ds, "mtcars")
bq_table_exists(bq_mtcars)
bq_table_create(
bq_mtcars,
fields = mtcars,
friendly_name = "Motor Trend Car Road Tests",
description = "The data was extracted from the 1974 Motor Trend US magazine",
labels = list(category = "example")
)
bq_table_exists(bq_mtcars)
bq_table_upload(bq_mtcars, mtcars)
bq_table_fields(bq_mtcars)
bq_table_size(bq_mtcars)
str(bq_table_meta(bq_mtcars))
bq_table_delete(bq_mtcars)
bq_table_exists(bq_mtcars)
my_natality <- bq_table(ds, "mynatality")
bq_table_copy("publicdata.samples.natality", my_natality)
BigQuery DBI driver
Description
Creates a BigQuery DBI driver for use in DBI::dbConnect()
.
Usage
## S4 method for signature 'BigQueryDriver'
dbConnect(
drv,
project,
dataset = NULL,
billing = project,
page_size = 10000,
quiet = NA,
use_legacy_sql = FALSE,
bigint = c("integer", "integer64", "numeric", "character"),
...
)
Arguments
drv
An object that inherits from DBIDriver, or an existing DBIConnection object (in order to clone an existing connection).
project, dataset
Project and dataset identifiers.
billing
Identifier of project to bill.
page_size
Number of items per page.
quiet
If FALSE, displays progress bar; if TRUE is silent; if NA displays progress bar only for long-running jobs.
use_legacy_sql
If TRUE will use BigQuery's legacy SQL format.
bigint
The R type that BigQuery's 64-bit integer types should be mapped to. The default is "integer", which returns R's integer type, but results in NA for values above/below +/- 2147483647. "integer64" returns a bit64::integer64, which allows the full range of 64 bit integers.
...
Other arguments for compatibility with generic; currently ignored.
Examples
con <- DBI::dbConnect(
bigquery(),
project = "publicdata",
dataset = "samples",
billing = bq_test_project()
)
con
DBI::dbListTables(con)
DBI::dbReadTable(con, "natality", n_max = 10)
# Create a temporary dataset to explore
ds <- bq_test_dataset()
con <- DBI::dbConnect(
bigquery(),
project = ds$project,
dataset = ds$dataset
)
DBI::dbWriteTable(con, "mtcars", mtcars)
DBI::dbReadTable(con, "mtcars")[1:6, ]
DBI::dbGetQuery(con, "SELECT count(*) FROM mtcars")
res <- DBI::dbSendQuery(con, "SELECT cyl, mpg FROM mtcars")
dbColumnInfo(res)
dbFetch(res, 10)
dbFetch(res, -1)
DBI::dbHasCompleted(res)
Authorize bigrquery
Description
Authorize bigrquery to view and manage your BigQuery projects. This function is a
wrapper around gargle::token_fetch()
.
By default, you are directed to a web browser, asked to sign in to your Google account, and to grant bigrquery permission to operate on your behalf with Google BigQuery. By default, with your permission, these user credentials are cached in a folder below your home directory, from where they can be automatically refreshed, as necessary. Storage at the user level means the same token can be used across multiple projects and tokens are less likely to be synced to the cloud by accident.
Usage
bq_auth(
email = gargle::gargle_oauth_email(),
path = NULL,
scopes = c("https://www.googleapis.com/auth/bigquery",
"https://www.googleapis.com/auth/cloud-platform"),
cache = gargle::gargle_oauth_cache(),
use_oob = gargle::gargle_oob_default(),
token = NULL
)
Arguments
email
Optional. If specified, email is used to target a specific Google identity. Defaults to the option named "gargle_oauth_email", retrieved by gargle::gargle_oauth_email().
path
JSON identifying the service account, in one of the forms supported for the txt argument of jsonlite::fromJSON() (typically, a file path or JSON string).
scopes
A character vector of scopes to request. Pick from those listed at https://developers.google.com/identity/protocols/oauth2/scopes.
cache
Specifies the OAuth token cache. Defaults to the option named "gargle_oauth_cache", retrieved via gargle::gargle_oauth_cache().
use_oob
Whether to use out-of-band authentication (or, perhaps, a variant implemented by gargle and known as "pseudo-OOB") when first acquiring the token. Defaults to the value returned by gargle::gargle_oob_default(). If the OAuth client is provided implicitly by a wrapper package, its type probably defaults to the value returned by gargle::gargle_oauth_client_type().
token
A token with class Token2.0 or an object of httr's class request, i.e. a token that has been prepared with httr::config() and has a Token2.0 in the auth_token component.
Details
Most users, most of the time, do not need to call bq_auth()
explicitly – it is triggered by the first action that requires
authorization. Even when called, the default arguments often suffice.
However, when necessary, bq_auth()
allows the user to explicitly:
Declare which Google identity to use, via an email specification.
Use a service account token or workload identity federation via path.
Bring your own token.
Customize scopes.
Use a non-default cache folder or turn caching off.
Explicitly request out-of-band (OOB) auth via use_oob.
If you are interacting with R within a browser (applies to RStudio
Server, Posit Workbench, Posit Cloud, and Google Colaboratory), you need
OOB auth or the pseudo-OOB variant. If this does not happen
automatically, you can request it explicitly with use_oob = TRUE
or,
more persistently, by setting an option via
options(gargle_oob_default = TRUE)
.
The choice between conventional OOB or pseudo-OOB auth is determined
by the type of OAuth client. If the client is of the "installed" type,
use_oob = TRUE
results in conventional OOB auth. If the client is of
the "web" type, use_oob = TRUE
results in pseudo-OOB auth. Packages
that provide a built-in OAuth client can usually detect which type of
client to use. But if you need to set this explicitly, use the
"gargle_oauth_client_type"
option:
options(gargle_oauth_client_type = "web")        # pseudo-OOB
# or, alternatively
options(gargle_oauth_client_type = "installed")  # conventional OOB
For details on the many ways to find a token, see
gargle::token_fetch()
. For deeper control over auth, use
bq_auth_configure()
to bring your own OAuth client or API key.
To learn more about gargle options, see gargle::gargle_options.
See Also
Other auth functions:
bq_auth_configure()
,
bq_deauth()
Examples
## Not run:
## load/refresh existing credentials, if available
## otherwise, go to browser for authentication and authorization
bq_auth()
## force use of a token associated with a specific email
bq_auth(email = "jenny@example.com")
## force a menu where you can choose from existing tokens or
## choose to get a new one
bq_auth(email = NA)
## use a 'read only' scope, so it's impossible to change data
bq_auth(
scopes = "https://www.googleapis.com/auth/devstorage.read_only"
)
## use a service account token
bq_auth(path = "foofy-83ee9e7c9c48.json")
## End(Not run)
Edit and view auth configuration
Description
These functions give more control over and visibility into the auth
configuration than bq_auth()
does. bq_auth_configure()
lets the user specify their own:
OAuth client, which is used when obtaining a user token.
See the vignette("get-api-credentials", package = "gargle")
for more.
If the user does not configure these settings, internal defaults
are used.
bq_oauth_client()
retrieves the currently configured OAuth client.
Usage
bq_auth_configure(client, path, app = deprecated())
bq_oauth_client()
Arguments
client
A Google OAuth client, presumably constructed via gargle::gargle_oauth_client_from_json(). Note, however, that it is preferred to specify the client with JSON, using the path argument.
path
JSON downloaded from Google Cloud Console, containing a client id and secret, in one of the forms supported for the txt argument of jsonlite::fromJSON() (typically, a file path or JSON string).
app
Deprecated in favor of client.
Value
bq_auth_configure(): An object of R6 class gargle::AuthState, invisibly.
bq_oauth_client(): the current user-configured OAuth client.
See Also
Other auth functions:
bq_auth()
,
bq_deauth()
Examples
# see and store the current user-configured OAuth client (probably `NULL`)
(original_client <- bq_oauth_client())
# the preferred way to configure your own client is via a JSON file
# downloaded from Google Developers Console
# this example JSON is indicative, but fake
path_to_json <- system.file(
"extdata", "data", "client_secret_123.googleusercontent.com.json",
package = "bigrquery"
)
bq_auth_configure(path = path_to_json)
# confirm the changes
bq_oauth_client()
# restore original auth config
bq_auth_configure(client = original_client)
Clear current token
Description
Clears any currently stored token. The next time bigrquery needs a token, the
token acquisition process starts over, with a fresh call to bq_auth()
and,
therefore, internally, a call to gargle::token_fetch()
. Unlike some other
packages that use gargle, bigrquery is not usable in a de-authorized state.
Therefore, calling bq_deauth()
only clears the token, i.e. it does NOT
imply that subsequent requests are made with an API key in lieu of a token.
Usage
bq_deauth()
See Also
Other auth functions:
bq_auth()
,
bq_auth_configure()
Examples
## Not run:
bq_deauth()
## End(Not run)
BigQuery field (and fields) class
Description
bq_field()
and bq_fields()
create; as_bq_field()
and as_bq_fields()
coerce from lists.
Usage
bq_field(name, type, mode = "NULLABLE", fields = list(), description = NULL)
bq_fields(x)
as_bq_field(x)
as_bq_fields(x)
Arguments
name
The field name. The name must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_), and must start with a letter or underscore. The maximum length is 300 characters.
type
The field data type. Possible values include "STRING", "BYTES", "INTEGER", "FLOAT", "BOOLEAN", "TIMESTAMP", "DATE", "TIME", "DATETIME", "NUMERIC", "GEOGRAPHY", and "RECORD".
mode
The field mode. Possible values include "NULLABLE", "REQUIRED", and "REPEATED".
fields
For a field of type "record", a list of sub-fields.
description
The field description. The maximum length is 1,024 characters.
x
A list of bq_field objects, or something coercible to one.
See Also
bq_field()
corresponds to a TableFieldSchema
, see
https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#TableFieldSchema
for more details.
Examples
bq_field("name", "string")
as_bq_fields(list(
list(name = "name", type = "string"),
bq_field("age", "integer")
))
# as_bq_fields() can also take a data frame
as_bq_fields(mtcars)
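# A nested record field built from sub-fields (a sketch of the fields
# argument for type "record")
bq_field(
  "address", "record",
  fields = list(
    bq_field("street", "string"),
    bq_field("city", "string")
  )
)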
Is there a token on hand?
Description
Reports whether bigrquery has stored a token, ready for use in downstream requests.
Usage
bq_has_token()
Value
Logical.
See Also
Other low-level API functions:
bq_token()
Examples
bq_has_token()
Get currently configured OAuth app (deprecated)
Description
In light of the new gargle::gargle_oauth_client()
constructor and class of
the same name, bq_oauth_app()
is being replaced by
bq_oauth_client()
.
Usage
bq_oauth_app()
Explicitly define query parameters
Description
By default, bigrquery will assume vectors of length 1 are scalars,
and longer vectors are arrays. If you need to pass a length-1 array,
you'll need to explicitly use bq_param_array()
.
Usage
bq_param(value, type = NULL, name = NULL)
bq_param_scalar(value, type = NULL, name = NULL)
bq_param_array(value, type = NULL, name = NULL)
Arguments
value
Vector of parameter values.
type
BigQuery type of the parameter.
name
Name of the parameter in the query, omitting the @.
Examples
# bq_param() automatically picks scalar vs array based on length
bq_param("a")
bq_param(c("a", "b", "c"))
# use bq_param_array() to create a length-1 array
bq_param_array("a")
List available projects
Description
List all projects that you have access to. You can also work with
public datasets,
but you will need to provide a billing
project whenever you perform
any non-free operation.
Usage
bq_projects(page_size = 100, max_pages = 1, warn = TRUE)
Arguments
page_size |
Number of items per page. |
max_pages |
Maximum number of pages to retrieve. Use |
warn |
If |
Value
A character vector.
Google BigQuery API documentation
Examples
bq_projects()
Submit query to BigQuery
Description
These submit a query (using bq_perform_query()) and then wait for it
to complete (with bq_job_wait()). All BigQuery queries save their results
into a table (temporary or otherwise), so these functions return a bq_table
which you can then query for more information.
Usage
bq_project_query(x, query, destination_table = NULL, ..., quiet = NA)
bq_dataset_query(
x,
query,
destination_table = NULL,
...,
billing = NULL,
quiet = NA
)
Arguments
x
Either a project (a string) or a bq_dataset.
query
SQL query string.
destination_table
A bq_table where results should be stored. If not supplied, results will be saved to a temporary table that lives in a special dataset. You must supply this parameter for large queries (> 128 MB compressed).
...
Passed on to bq_perform_query().
quiet
If FALSE, displays progress bar; if TRUE is silent; if NA displays progress bar only for long-running jobs.
billing
If you query a dataset that you only have read access for, such as a public dataset, you must also submit a billing project.
Value
A bq_table
Examples
# Querying a project requires full name in query
tb <- bq_project_query(
bq_test_project(),
"SELECT count(*) FROM publicdata.samples.natality"
)
bq_table_fields(tb)
bq_table_download(tb)
# Querying a dataset sets default dataset so you can use bare table name,
# but for public data, you'll need to set a project to bill.
ds <- bq_dataset("publicdata", "samples")
tb <- bq_dataset_query(ds,
query = "SELECT count(*) FROM natality",
billing = bq_test_project()
)
bq_table_download(tb)
tb <- bq_dataset_query(ds,
query = "SELECT count(*) FROM natality WHERE state = @state",
parameters = list(state = "KS"),
billing = bq_test_project()
)
bq_table_download(tb)
S3 classes for BigQuery datasets, tables and jobs
Description
Create references to BigQuery datasets, jobs, and tables. Each class
has a constructor function (bq_dataset()
, bq_table()
, bq_job()
)
and a coercion function (as_bq_dataset()
, as_bq_table()
, as_bq_job()
).
The coercion functions come with methods for strings (which find components
by splitting on .
), and lists (which look for named components like
projectId
or project_id
).
All bq_table_
, bq_dataset_
and bq_job_
functions call the appropriate
coercion functions on their first argument, allowing you to flexibly specify
their inputs.
Usage
bq_dataset(project, dataset)
as_bq_dataset(x, ..., error_arg = caller_arg(x), error_call = caller_env())
bq_table(project, dataset, table = NULL, type = "TABLE")
as_bq_table(x, ..., error_arg = caller_arg(x), error_call = caller_env())
bq_job(project, job, location = "US")
as_bq_job(x, ..., error_arg = caller_arg(x), error_call = caller_env())
Arguments
project, dataset, table, job, type
Individual project, dataset, table, and job identifiers and the table type (strings). For bq_table(), you can instead supply a bq_dataset as the first argument, in which case the second argument is interpreted as the table name.
x
An object to coerce to a bq_job, bq_dataset, or bq_table.
...
Other arguments passed on to methods.
error_arg
An argument name as a string. This argument will be mentioned in error messages as the input that is at the origin of a problem.
error_call
The execution environment of a currently running function, e.g. caller_env().
location
Job location.
See Also
api-job, api-perform, api-dataset, and api-table for functions that work with these objects.
Examples
# Creation ------------------------------------------------
samples <- bq_dataset("publicdata", "samples")
natality <- bq_table("publicdata", "samples", "natality")
natality
# Or
bq_table(samples, "natality")
bq_job("bigrquery-examples", "m0SgFu2ycbbge6jgcvzvflBJ_Wft")
# Coercion ------------------------------------------------
as_bq_dataset("publicdata.shakespeare")
as_bq_table("publicdata.samples.natality")
as_bq_table(list(
project_id = "publicdata",
dataset_id = "samples",
table_id = "natality"
))
as_bq_job(list(
projectId = "bigrquery-examples",
jobId = "job_m0SgFu2ycbbge6jgcvzvflBJ_Wft",
location = "US"
))
Download table data
Description
This retrieves rows in chunks of page_size. It is most suitable for results
of smaller queries (<100 MB, say). For larger queries, it is better to
export the results to a CSV file stored on Google Cloud Storage and use the
bq command line tool to download locally.
Usage
bq_table_download(
x,
n_max = Inf,
page_size = NULL,
start_index = 0L,
max_connections = 6L,
quiet = NA,
bigint = c("integer", "integer64", "numeric", "character"),
max_results = deprecated()
)
Arguments
x
A bq_table.
n_max
Maximum number of results to retrieve. Use Inf to retrieve all rows.
page_size
The number of rows requested per chunk. It is recommended to leave this unspecified until you have evidence that the page_size selected automatically by bq_table_download() is problematic. When page_size = NULL, bigrquery determines a conservative, natural chunk size empirically. If you specify the page_size, it is important that each chunk fits on one page, i.e. that the requested row limit is low enough to prevent the API from paginating based on response size.
start_index
Starting row index (zero-based).
max_connections
Number of maximum simultaneous connections to BigQuery servers.
quiet
If FALSE, displays progress bar; if TRUE is silent; if NA displays progress bar only for long-running jobs.
bigint
The R type that BigQuery's 64-bit integer types should be mapped to. The default is "integer", which returns R's integer type, but results in NA for values above/below +/- 2147483647. "integer64" returns a bit64::integer64, which allows the full range of 64 bit integers.
max_results
Deprecated. Please use n_max instead.
Value
Because data retrieval may generate list-columns and the data.frame
print method can have problems with list-columns, this method returns
a tibble. If you need a data.frame
, coerce the results with
as.data.frame()
.
Complex data
bigrquery will retrieve nested and repeated columns into list-columns as follows:
Repeated values (arrays) will become a list-column of vectors.
Records will become list-columns of named lists.
Repeated records will become list-columns of data frames.
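For instance, a minimal sketch (assuming a billable project from bq_test_project()) in which an array column arrives as a list-column of vectors:
tb <- bq_project_query(bq_test_project(), "SELECT [1, 2, 3] AS x")
df <- bq_table_download(tb)
df$x[[1]]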
Larger datasets
In my timings, this code takes around 1 minute per 100 MB of data. If you need to download considerably more than this, I recommend:
Export a .csv file to Cloud Storage using bq_table_save().
Use the gsutil command line utility to download it.
Read the csv file into R with readr::read_csv() or data.table::fread().
Unfortunately you cannot export nested or repeated data to CSV, and the formats that BigQuery does support for nested/repeated values (Avro and NDJSON) are not well supported in R.
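A minimal sketch of that workflow (the bucket and file names are placeholders):
bq_table_save(tb, "gs://my-bucket/extract-*.csv", destination_format = "CSV")
# in a shell: gsutil cp "gs://my-bucket/extract-*.csv" .
readr::read_csv("extract-000000000000.csv")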
Google BigQuery API documentation
Examples
df <- bq_table_download("publicdata.samples.natality", n_max = 35000)
Project to use for testing bigrquery
Description
You'll need to set the BIGQUERY_TEST_PROJECT
(name of a project) and
BIGQUERY_TEST_BUCKET
(name of bucket) env vars in order to run bigrquery
tests locally. I recommend creating a new project because the tests involve
both reading and writing in BigQuery and Cloud Storage.
The BIGQUERY_TEST_PROJECT
must have billing enabled. While
logged in, via bq_auth()
, as a user with permission to work in
BIGQUERY_TEST_PROJECT
, run bq_test_init()
once to perform some setup.
Usage
bq_test_project()
bq_test_init(name = "basedata")
bq_test_dataset(name = random_name(), location = "US")
bq_testable()
bq_authable()
gs_test_bucket()
gs_test_object(name = random_name())
Arguments
name
Dataset name; used only for testing.
Value
bq_test_project()
returns the name of a project suitable for use in
testing. bq_test_dataset()
creates a temporary dataset whose lifetime is
tied to the lifetime of the object that it returns.
Testing
In tests, bq_test_project()
(and hence bq_test_dataset()
) will
automatically skip if auth and a test project are not available.
Examples
ds <- bq_test_dataset()
bq_mtcars <- bq_table_upload(bq_table(ds, "mtcars"), mtcars)
# dataset and table will be automatically deleted when ds is GC'd
Produce configured token
Description
For internal use or for those programming around the BigQuery API.
Returns a token pre-processed with httr::config(). Most users
do not need to handle tokens "by hand"; even those who need some
control will find that bq_auth() is all they need. If there is no current
token, bq_auth() is called to either load from cache or
initiate the OAuth 2.0 flow.
token, bq_auth()
is called to either load from cache or
initiate OAuth2.0 flow.
If auth has been deactivated via bq_deauth()
, bq_token()
returns NULL
.
Usage
bq_token()
Value
A request
object (an S3 class provided by httr).
See Also
Other low-level API functions:
bq_has_token()
Examples
## Not run:
bq_token()
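# A sketch of using the token directly with httr against the REST API;
# the projects endpoint of the public BigQuery v2 API is assumed here
req <- httr::GET(
  "https://bigquery.googleapis.com/bigquery/v2/projects",
  bq_token()
)
httr::content(req)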
## End(Not run)
Get info on current user
Description
Reveals the email address of the user associated with the current token. If no token has been loaded yet, this function does not initiate auth.
Usage
bq_user()
Value
An email address or, if no token has been loaded, NULL
.
See Also
gargle::token_userinfo()
, gargle::token_email()
,
gargle::token_tokeninfo()
Examples
## Not run:
bq_user()
## End(Not run)
A BigQuery data source for dplyr.
Description
Create the connection to the database with DBI::dbConnect(), then
use dplyr::tbl() to connect to tables within that database. Generally,
it's best to provide the fully qualified name of the table (i.e.
project.dataset.table), but if you supply a default dataset in the
connection, you can use just the table name. (This, however, will
prevent you from making joins across datasets.)
Usage
src_bigquery(project, dataset, billing = project, max_pages = 10)
Arguments
project
Project id or name.
dataset
Dataset name.
billing
Billing project, if different to project.
max_pages
(IGNORED) maximum pages returned by a query.
Examples
## Not run:
library(dplyr)
# To run this example, replace billing with the id of one of your projects
# set up for billing
con <- DBI::dbConnect(bigquery(), project = bq_test_project())
shakespeare <- con %>% tbl("publicdata.samples.shakespeare")
shakespeare
shakespeare %>%
group_by(word) %>%
summarise(n = sum(word_count, na.rm = TRUE)) %>%
arrange(desc(n))
## End(Not run)