Type: | Package |
Title: | Interface to the 'Azure Machine Learning' 'SDK' |
Version: | 1.10.0 |
URL: | https://github.com/azure/azureml-sdk-for-r |
BugReports: | https://github.com/azure/azureml-sdk-for-r/issues |
Description: | Interface to the 'Azure Machine Learning' Software Development Kit ('SDK'). Data scientists can use the 'SDK' to train, deploy, automate, and manage machine learning models on the 'Azure Machine Learning' service. To learn more about 'Azure Machine Learning' visit the website: https://docs.microsoft.com/en-us/azure/machine-learning/service/overview-what-is-azure-ml. |
Encoding: | UTF-8 |
License: | MIT + file LICENSE |
RoxygenNote: | 7.1.1 |
Depends: | R (≥ 3.5.0) |
Imports: | ggplot2, reticulate (≥ 1.12), plyr (≥ 1.8), DT, rstudioapi (≥ 0.7), htmltools, servr, shiny, shinycssloaders |
Suggests: | rmarkdown, knitr, testthat, dplyr, jsonlite, foreach, iterators, utils |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2020-09-16 21:35:24 UTC; dipeck |
Author: | Diondra Peck [cre, aut], Minna Xiao [aut], AzureML R SDK Team [ctb], Microsoft [cph, fnd], Google Inc. [cph] (Examples and Tutorials), The TensorFlow Authors [cph] (Examples and Tutorials), RStudio Inc. [cph] (Examples and Tutorials) |
Maintainer: | Diondra Peck <Diondra.Peck@microsoft.com> |
Repository: | CRAN |
Date/Publication: | 2020-09-22 15:40:07 UTC |
Create a deployment config for deploying an ACI web service
Description
Deploy a web service to Azure Container Instances for testing or debugging. Use ACI for low-scale CPU-based workloads that require less than 48 GB of RAM.
Deploy to ACI if one of the following conditions is true:
You need to quickly deploy and validate your model. You do not need to create ACI containers ahead of time. They are created as part of the deployment process.
You are testing a model that is under development.
Usage
aci_webservice_deployment_config(
cpu_cores = NULL,
memory_gb = NULL,
tags = NULL,
properties = NULL,
description = NULL,
location = NULL,
auth_enabled = NULL,
ssl_enabled = NULL,
enable_app_insights = NULL,
ssl_cert_pem_file = NULL,
ssl_key_pem_file = NULL,
ssl_cname = NULL,
dns_name_label = NULL
)
Arguments
cpu_cores |
The number of cpu cores to allocate for
the web service. Can be a decimal. Defaults to |
memory_gb |
The amount of memory (in GB) to allocate for
the web service. Can be a decimal. Defaults to |
tags |
A named list of key-value tags for the web service,
e.g. |
properties |
A named list of key-value properties for the web
service, e.g. |
description |
A string of the description to give the web service. |
location |
A string of the Azure region to deploy the web service to. If not specified the workspace location will be used. More details on available regions can be found here. |
auth_enabled |
If |
ssl_enabled |
If |
enable_app_insights |
If |
ssl_cert_pem_file |
A string of the cert file needed if SSL is enabled. |
ssl_key_pem_file |
A string of the key file needed if SSL is enabled. |
ssl_cname |
A string of the cname if SSL is enabled. |
dns_name_label |
A string of the dns name label for the scoring endpoint. If not specified a unique dns name label will be generated for the scoring endpoint. |
Value
The AciServiceDeploymentConfiguration
object.
See Also
deploy_model()
Examples
## Not run:
deployment_config <- aci_webservice_deployment_config(cpu_cores = 1, memory_gb = 1)
## End(Not run)
Create a deployment config for deploying an AKS web service
Description
Deploy a web service to Azure Kubernetes Service for high-scale production deployments. Provides fast response time and autoscaling of the deployed service. Using GPU for inference when deployed as a web service is only supported on AKS.
Deploy to AKS if you need one or more of the following capabilities:
Fast response time
Autoscaling of the deployed service
Hardware acceleration options
Usage
aks_webservice_deployment_config(
autoscale_enabled = NULL,
autoscale_min_replicas = NULL,
autoscale_max_replicas = NULL,
autoscale_refresh_seconds = NULL,
autoscale_target_utilization = NULL,
auth_enabled = NULL,
cpu_cores = NULL,
memory_gb = NULL,
enable_app_insights = NULL,
scoring_timeout_ms = NULL,
replica_max_concurrent_requests = NULL,
max_request_wait_time = NULL,
num_replicas = NULL,
primary_key = NULL,
secondary_key = NULL,
tags = NULL,
properties = NULL,
description = NULL,
gpu_cores = NULL,
period_seconds = NULL,
initial_delay_seconds = NULL,
timeout_seconds = NULL,
success_threshold = NULL,
failure_threshold = NULL,
namespace = NULL,
token_auth_enabled = NULL
)
Arguments
autoscale_enabled |
If |
autoscale_min_replicas |
An int of the minimum number of containers
to use when autoscaling the web service. Defaults to |
autoscale_max_replicas |
An int of the maximum number of containers
to use when autoscaling the web service. Defaults to |
autoscale_refresh_seconds |
An int of how often in seconds the
autoscaler should attempt to scale the web service. Defaults to |
autoscale_target_utilization |
An int of the target utilization
(in percent out of 100) the autoscaler should attempt to maintain for
the web service. Defaults to |
auth_enabled |
If |
cpu_cores |
The number of cpu cores to allocate for
the web service. Can be a decimal. Defaults to |
memory_gb |
The amount of memory (in GB) to allocate for
the web service. Can be a decimal. Defaults to |
enable_app_insights |
If |
scoring_timeout_ms |
An int of the timeout (in milliseconds) to
enforce for scoring calls to the web service. Defaults to |
replica_max_concurrent_requests |
An int of the number of maximum
concurrent requests per node to allow for the web service. Defaults to |
max_request_wait_time |
An int of the maximum amount of time a request
will stay in the queue (in milliseconds) before returning a 503 error.
Defaults to |
num_replicas |
An int of the number of containers to allocate for the web service. If this parameter is not set then the autoscaler is enabled by default. |
primary_key |
A string of the primary auth key to use for the web service. |
secondary_key |
A string of the secondary auth key to use for the web service. |
tags |
A named list of key-value tags for the web service,
e.g. |
properties |
A named list of key-value properties for the web
service, e.g. |
description |
A string of the description to give the web service |
gpu_cores |
An int of the number of gpu cores to allocate for the
web service. Defaults to |
period_seconds |
An int of how often in seconds to perform the
liveness probe. Default to |
initial_delay_seconds |
An int of the number of seconds after
the container has started before liveness probes are initiated.
Defaults to |
timeout_seconds |
An int of the number of seconds after which the
liveness probe times out. Defaults to |
success_threshold |
An int of the minimum consecutive successes
for the liveness probe to be considered successful after having failed.
Defaults to |
failure_threshold |
An int of the number of times Kubernetes will try
the liveness probe when a Pod starts and the probe fails, before giving up.
Defaults to |
namespace |
A string of the Kubernetes namespace in which to deploy the web service: up to 63 lowercase alphanumeric ('a'-'z', '0'-'9') and hyphen ('-') characters. The first and last characters cannot be hyphens. |
token_auth_enabled |
If |
Details
AKS compute target
When deploying to AKS, you deploy to an AKS cluster that is connected to your workspace. There are two ways to connect an AKS cluster to your workspace:
Create the AKS cluster using Azure ML (see
create_aks_compute()
).Attach an existing AKS cluster to your workspace (see
attach_aks_compute()
).
Pass the AksCompute
object to the deployment_target
parameter of deploy_model()
.
Token-based authentication
We strongly recommend that you create your Azure ML workspace in the same region as your AKS cluster. To authenticate with a token, the web service will make a call to the region in which your workspace is created. If your workspace's region is unavailable, then you will not be able to fetch a token for your web service, even if your cluster is in a different region than your workspace. This effectively results in token-based auth being unavailable until your workspace's region is available again. In addition, the greater the distance between your cluster's region and your workspace's region, the longer it will take to fetch a token.
Value
The AksServiceDeploymentConfiguration
object.
See Also
deploy_model()
Examples
## Not run:
deployment_config <- aks_webservice_deployment_config(cpu_cores = 1, memory_gb = 1)
## End(Not run)
Attach an existing AKS cluster to a workspace
Description
If you already have an AKS cluster in your Azure subscription, and it is version 1.12.##, you can attach it to your workspace to use for deployments. The existing AKS cluster can be in a different Azure region than your workspace.
If you want to secure your AKS cluster using an Azure Virtual Network, you must create the virtual network first. For more information, see Secure Azure ML experimentation and inference jobs within an Azure Virtual Network.
If you want to re-attach an AKS cluster, for example to change SSL or other
cluster configuration settings, you must first remove the existing attachment
with detach_aks_compute()
.
Attaching a cluster will take approximately 5 minutes.
Usage
attach_aks_compute(
workspace,
resource_group,
cluster_name,
cluster_purpose = c("FastProd", "DevTest")
)
Arguments
workspace |
The |
resource_group |
A string of the resource group in which the AKS cluster is located. |
cluster_name |
A string of the name of the AKS cluster. |
cluster_purpose |
The targeted usage of the cluster. The possible values are "DevTest" or "FastProd". This is used to provision Azure Machine Learning components to ensure the desired level of fault-tolerance and QoS. If your cluster has less than 12 virtual CPUs, you will need to specify "DevTest" for this argument. We recommend that your cluster have at least 2 virtual CPUs for dev/test usage. |
Value
The AksCompute
object.
Examples
ws <- load_workspace_from_config()
compute_target <- attach_aks_compute(ws,
                                     resource_group = 'myresourcegroup',
                                     cluster_name = 'myakscluster')
If the cluster has less than 12 virtual CPUs, you will need to also specify the
cluster_purpose
parameter in the attach_aks_compute()
call: cluster_purpose = 'DevTest'
.
See Also
detach_aks_compute()
azureml module: users can access functions/modules in azureml that are not exposed through the exported R functions.
Description
Users can access functions/modules in the azureml Python module that are not exposed through the exported R functions.
Usage
azureml
Format
An object of class python.builtin.module
(inherits from python.builtin.object
) of length 5.
Define a Bandit policy for early termination of HyperDrive runs
Description
Bandit is an early termination policy based on slack factor/slack amount and evaluation interval. The policy early terminates any runs where the primary metric is not within the specified slack factor/slack amount with respect to the best performing training run.
Usage
bandit_policy(
slack_factor = NULL,
slack_amount = NULL,
evaluation_interval = 1L,
delay_evaluation = 0L
)
Arguments
slack_factor |
A double of the ratio of the allowed distance from the best performing run. |
slack_amount |
A double of the absolute distance allowed from the best performing run. |
evaluation_interval |
An integer of the frequency for applying policy. |
delay_evaluation |
An integer of the number of intervals for which to delay the first evaluation. |
Value
The BanditPolicy
object.
Details
The Bandit policy takes the following configuration parameters:
- slack_factor or slack_amount: The slack allowed with respect to the best performing training run. slack_factor specifies the allowable slack as a ratio. slack_amount specifies the allowable slack as an absolute amount, instead of a ratio.
- evaluation_interval: Optional. The frequency for applying the policy. Each time the training script logs the primary metric counts as one interval.
- delay_evaluation: Optional. The number of intervals to delay the policy evaluation. Use this parameter to avoid premature termination of training runs. If specified, the policy applies every multiple of evaluation_interval that is greater than or equal to delay_evaluation.
Any run that doesn't fall within the slack factor or slack amount of the evaluation metric with respect to the best performing run will be terminated.
Consider a Bandit policy with slack_factor = 0.2
and
evaluation_interval = 100
. Assume that run X is the currently best
performing run with an AUC (performance metric) of 0.8 after 100 intervals.
Further, assume the best AUC reported for a run is Y. This policy compares
the value (Y + Y * 0.2)
to 0.8, and if smaller, cancels the run.
If delay_evaluation = 200
, then the first time the policy will be applied
is at interval 200.
Now, consider a Bandit policy with slack_amount = 0.2
and
evaluation_interval = 100
. If run 3 is the currently best performing run
with an AUC (performance metric) of 0.8 after 100 intervals, then any run
with an AUC less than 0.6 (0.8 - 0.2
) after 100 iterations will be
terminated. Similarly, the delay_evaluation
can also be used to delay the
first termination policy evaluation for a specific number of sequences.
Examples
# In this example, the early termination policy is applied at every interval
# when metrics are reported, starting at evaluation interval 5. Any run whose
# best metric is less than (1 / (1 + 0.1)), or about 91%, of the best performing run will
# be terminated
## Not run:
early_termination_policy = bandit_policy(slack_factor = 0.1,
evaluation_interval = 1L,
delay_evaluation = 5L)
## End(Not run)
Define Bayesian sampling over a hyperparameter search space
Description
Bayesian sampling is based on the Bayesian optimization algorithm and makes intelligent choices on the hyperparameter values to sample next. It picks the sample based on how the previous samples performed, such that the new sample improves the reported primary metric.
Usage
bayesian_parameter_sampling(parameter_space)
Arguments
parameter_space |
A named list containing each parameter and its
distribution, e.g. |
Value
The BayesianParameterSampling
object.
Details
When you use Bayesian sampling, the number of concurrent runs has an impact on the effectiveness of the tuning process. Typically, a smaller number of concurrent runs can lead to better sampling convergence, since the smaller degree of parallelism increases the number of runs that benefit from previously completed runs.
Bayesian sampling only supports choice()
, uniform()
, and quniform()
distributions over the search space.
Bayesian sampling does not support any early termination policy. When
using Bayesian parameter sampling, early_termination_policy
must be
NULL
.
See Also
choice()
, uniform()
, quniform()
Examples
## Not run:
param_sampling <- bayesian_parameter_sampling(list("learning_rate" = uniform(0.05, 0.1),
"batch_size" = choice(c(16, 32, 64, 128))))
## End(Not run)
Cancel a run
Description
Cancel an ongoing run.
Usage
cancel_run(run)
Arguments
run |
The |
Value
TRUE
if cancellation was successful, else FALSE
.
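Examples
A minimal sketch; `exp` and `est` below are assumed to be an existing Experiment and Estimator created elsewhere.
## Not run:
run <- submit_experiment(exp, est)
# ... decide the run is no longer needed ...
if (cancel_run(run)) {
  message("Run was cancelled")
}
## End(Not run)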
Specify a discrete set of options to sample from
Description
Specify a discrete set of options to sample the hyperparameters from.
Usage
choice(options)
Arguments
options |
A vector of values to choose from. |
Value
A list of the stochastic expression.
See Also
random_parameter_sampling()
, grid_parameter_sampling()
,
bayesian_parameter_sampling()
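Examples
A minimal sketch of using choice() inside a sampling space (the rest of the HyperDrive configuration is assumed to be defined elsewhere):
## Not run:
param_sampling <- random_parameter_sampling(list(
    "batch_size" = choice(c(16, 32, 64, 128)),
    "number_of_hidden_layers" = choice(c(1, 2, 3))
))
## End(Not run)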
Mark a run as completed.
Description
Mark the run as completed. Use for an interactive logging run.
Usage
complete_run(run)
Arguments
run |
The |
Value
None
See Also
Specify Azure Container Registry details
Description
Returns a ContainerRegistry
object with the details for an
Azure Container Registry (ACR). This is needed when a custom
Docker image used for training or deployment is located in
a private image registry. Provide a ContainerRegistry
object
to the image_registry_details
parameter of either r_environment()
or estimator()
.
Usage
container_registry(address = NULL, username = NULL, password = NULL)
Arguments
address |
A string of the DNS name or IP address of the Azure Container Registry (ACR). |
username |
A string of the username for ACR. |
password |
A string of the password for ACR. |
Value
The ContainerRegistry
object.
See Also
r_environment()
, estimator()
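Examples
A minimal sketch; the registry address, image name, and credentials below are placeholders:
## Not run:
registry <- container_registry(address = "myregistry.azurecr.io",
                               username = "myusername",
                               password = "mypassword")
est <- estimator(source_directory = ".",
                 entry_script = "train.R",
                 custom_docker_image = "myrepo/myimage:latest",
                 image_registry_details = registry)
## End(Not run)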
Convert the current dataset into a FileDataset containing CSV files.
Description
Convert the current dataset into a FileDataset containing CSV files.
Usage
convert_to_dataset_with_csv_files(dataset, separator = ",")
Arguments
dataset |
The Tabular Dataset object. |
separator |
The separator to use to separate values in the resulting file. |
Value
A new FileDataset object with a set of CSV files containing the data in this dataset.
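Examples
A minimal sketch; `tabular_ds` is assumed to be an existing Tabular Dataset, e.g. one created with create_tabular_dataset_from_parquet_files():
## Not run:
file_ds <- convert_to_dataset_with_csv_files(tabular_ds, separator = ",")
## End(Not run)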
Convert the current dataset into a FileDataset containing Parquet files.
Description
Convert the current dataset into a FileDataset containing Parquet files. The resulting dataset will contain one or more Parquet files, each corresponding to a partition of data from the current dataset. These files are not materialized until they are downloaded or read from.
Usage
convert_to_dataset_with_parquet_files(dataset)
Arguments
dataset |
The Tabular Dataset object. |
Value
A new FileDataset object with a set of Parquet files containing the data in this dataset.
Specifies a CRAN package to install in environment
Description
Specifies a CRAN package to install in run environment
Usage
cran_package(name, version = NULL, repo = "https://cloud.r-project.org")
Arguments
name |
The package name |
version |
A string of the package version. If not provided, the version will default to the latest. |
repo |
The base URL of the repository to use, e.g., the URL of a CRAN mirror. If not provided, the package will be pulled from "https://cloud.r-project.org". |
Value
A named list containing the package specifications
Examples
pkg1 <- cran_package("ggplot2", version = "3.3.0") pkg2 <- cran_package("stringr") pkg3 <- cran_package("ggplot2", version = "0.9.1", repo = "http://cran.us.r-project.org") env <- r_environment(name = "r_env", cran_packages = list(pkg1, pkg2, pkg3))
See Also
Create an AksCompute cluster
Description
Provision an Azure Kubernetes Service instance (AksCompute) as a compute target for web service deployment. AksCompute is recommended for high-scale production deployments and provides fast response time and autoscaling of the deployed service. Cluster autoscaling isn't supported through the Azure ML R SDK. To change the nodes in the AksCompute cluster, use the UI for the cluster in the Azure portal. Once created, the cluster can be reused for multiple deployments.
Usage
create_aks_compute(
workspace,
cluster_name,
agent_count = NULL,
vm_size = NULL,
ssl_cname = NULL,
ssl_cert_pem_file = NULL,
ssl_key_pem_file = NULL,
location = NULL,
vnet_resourcegroup_name = NULL,
vnet_name = NULL,
subnet_name = NULL,
service_cidr = NULL,
dns_service_ip = NULL,
docker_bridge_cidr = NULL,
cluster_purpose = c("FastProd", "DevTest")
)
Arguments
workspace |
The |
cluster_name |
A string of the name of the cluster. |
agent_count |
An integer of the number of agents (VMs) to host
containers. Defaults to |
vm_size |
A string of the size of agent VMs. More details can be found
here.
Note that not all sizes are available in all regions, as detailed in the
aforementioned link. Defaults to |
ssl_cname |
A string of a CName to use if enabling SSL validation on the cluster. Must provide all three - CName, cert file, and key file - to enable SSL validation. |
ssl_cert_pem_file |
A string of a file path to a file containing cert information for SSL validation. Must provide all three - CName, cert file, and key file - to enable SSL validation. |
ssl_key_pem_file |
A string of a file path to a file containing key information for SSL validation. Must provide all three - CName, cert file, and key file - to enable SSL validation. |
location |
A string of the location to provision the cluster in. If not specified, defaults to the workspace location. Available regions for this compute can be found here: "https://azure.microsoft.com/global-infrastructure/services/?regions=all&products=kubernetes-service". |
vnet_resourcegroup_name |
A string of the name of the resource group where the virtual network is located. |
vnet_name |
A string of the name of the virtual network. |
subnet_name |
A string of the name of the subnet inside the vnet. |
service_cidr |
A string of a CIDR notation IP range from which to assign service cluster IPs. |
dns_service_ip |
A string of the container's DNS server IP address. |
docker_bridge_cidr |
A string of a CIDR notation IP for Docker bridge. |
cluster_purpose |
A string describing targeted usage of the cluster. This is used to provision Azure Machine Learning components to ensure the desired level of fault-tolerance and QoS. 'FastProd' will provision components to handle higher levels of traffic with production quality fault-tolerance. This will default the AKS cluster to have 3 nodes. 'DevTest' will provision components at a minimal level for testing. This will default the AKS cluster to have 1 node. 'FastProd' is the default value. |
Value
An AksCompute
object.
Details
For more information on using an AksCompute resource within a virtual network, see Secure Azure ML experimentation and inference jobs within an Azure Virtual Network.
Examples
# Create an AksCompute cluster using the default configuration (you can also
# provide parameters to customize this)
ws <- load_workspace_from_config()
compute_target <- create_aks_compute(ws, cluster_name = 'mycluster')
wait_for_provisioning_completion(compute_target)
Create an AmlCompute cluster
Description
Provision Azure Machine Learning Compute (AmlCompute) as a compute target
for training. AmlCompute is a managed-compute infrastructure that allows the
user to easily create a single or multi-node compute. To create a persistent
AmlCompute resource that can be reused across jobs, make sure to specify the
vm_size
and max_nodes
parameters. The compute can then be shared with
other users in the workspace and is kept between jobs. If min_nodes = 0
,
the compute autoscales down to zero nodes when it isn't used, and scales up
automatically when a job is submitted.
AmlCompute has default limits, such as the number of cores that can be allocated. For more information, see Manage and request quotas for Azure resources.
Usage
create_aml_compute(
workspace,
cluster_name,
vm_size,
vm_priority = "dedicated",
min_nodes = 0,
max_nodes = NULL,
idle_seconds_before_scaledown = NULL,
admin_username = NULL,
admin_user_password = NULL,
admin_user_ssh_key = NULL,
vnet_resourcegroup_name = NULL,
vnet_name = NULL,
subnet_name = NULL,
tags = NULL,
description = NULL
)
Arguments
workspace |
The |
cluster_name |
A string of the name of the cluster. |
vm_size |
A string of the size of agent VMs. More details can be found
here.
Note that not all sizes are available in all regions, as detailed in the
aforementioned link. Defaults to |
vm_priority |
A string of either |
min_nodes |
An integer of the minimum number of nodes to use on the
cluster. If not specified, will default to |
max_nodes |
An integer of the maximum number of nodes to use on the cluster. |
idle_seconds_before_scaledown |
An integer of the node idle time in
seconds before scaling down the cluster. Defaults to |
admin_username |
A string of the name of the administrator user account that can be used to SSH into nodes. |
admin_user_password |
A string of the password of the administrator user account. |
admin_user_ssh_key |
A string of the SSH public key of the administrator user account. |
vnet_resourcegroup_name |
A string of the name of the resource group where the virtual network is located. |
vnet_name |
A string of the name of the virtual network. |
subnet_name |
A string of the name of the subnet inside the vnet. |
tags |
A named list of tags for the cluster, e.g.
|
description |
A string of the description for the cluster. |
Value
The AmlCompute
object.
Details
For more information on using an Azure Machine Learning Compute resource in a virtual network, see Secure Azure ML experimentation and inference jobs within an Azure Virtual Network.
See Also
wait_for_provisioning_completion()
Examples
## Not run:
ws <- load_workspace_from_config()
compute_target <- create_aml_compute(ws,
cluster_name = 'mycluster',
vm_size = 'STANDARD_D2_V2',
max_nodes = 1)
wait_for_provisioning_completion(compute_target, show_output = TRUE)
## End(Not run)
Create a child run
Description
Create a child run. This is used to isolate part of a run into a subsection.
Usage
create_child_run(parent_run, name = NULL, run_id = NULL, outputs = NULL)
Arguments
parent_run |
The parent |
name |
An optional name for the child run, typically specified for a "part" |
run_id |
An optional run ID for the child, otherwise it is auto-generated. Typically this parameter is not set. |
outputs |
Optional outputs directory to track for the child. |
Value
The child run, a Run
object.
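Examples
A minimal sketch; `run` is assumed to be the current parent Run object:
## Not run:
child <- create_child_run(run, name = "data-prep")
# ... perform and log the isolated sub-step against `child` ...
complete_run(child)
## End(Not run)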
Create one or many child runs
Description
Create one or many child runs.
Usage
create_child_runs(parent_run, count = NULL, tag_key = NULL, tag_values = NULL)
Arguments
parent_run |
The parent |
count |
An optional number of children to create. |
tag_key |
An optional key to populate the Tags entry in all created children. |
tag_values |
An optional list of values that will map onto Tags for the list of runs created. |
Value
The list of child runs, Run
objects.
Create a FileDataset to represent file streams.
Description
Create a FileDataset to represent file streams.
Usage
create_file_dataset_from_files(path, validate = TRUE)
Arguments
path |
A data path in a registered datastore or a local path. |
validate |
Indicates whether to validate if data can be loaded from the returned dataset. Defaults to True. Validation requires that the data source is accessible from the current compute. |
Value
The FileDataset object
See Also
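Examples
A minimal sketch; `ws` is assumed to be a loaded Workspace with a registered datastore named 'mydatastore' that contains an 'images/' folder:
## Not run:
datastore <- get_datastore(ws, 'mydatastore')
datapath <- data_path(datastore, 'images/')
file_ds <- create_file_dataset_from_files(datapath)
## End(Not run)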
Create an unregistered, in-memory Dataset from delimited files.
Description
Create an unregistered, in-memory Dataset from delimited files. Use this method to read delimited text files when you want to control the options used.
Usage
create_tabular_dataset_from_delimited_files(
path,
validate = TRUE,
include_path = FALSE,
infer_column_types = TRUE,
set_column_types = NULL,
separator = ",",
header = TRUE,
partition_format = NULL,
support_multi_line = FALSE,
empty_as_string = FALSE
)
Arguments
path |
A data path in a registered datastore, a local path, or an HTTP URL. |
validate |
Boolean to validate if data can be loaded from the returned dataset. Defaults to True. Validation requires that the data source is accessible from the current compute. |
include_path |
Whether to include a column containing the path of the file from which the data was read. This is useful when you are reading multiple files, and want to know which file a particular record originated from, or to keep useful information in file path. |
infer_column_types |
Indicates whether column data types are inferred. |
set_column_types |
A named list to set column data type, where key is column name and value is data type. |
separator |
The separator used to split columns. |
header |
Controls how column headers are promoted when reading from files. Defaults to TRUE, which assumes all
files have the same header. When header = FALSE, files are read as having no header. More options can
be specified using |
partition_format |
Specify the partition format in path and create string columns from format 'x' and a datetime column from format 'x:yyyy/MM/dd/HH/mm/ss', where 'yyyy', 'MM', 'dd', 'HH', 'mm' and 'ss' are used to extract year, month, day, hour, minute and second for the datetime type. The format should start from the position of the first partition key and continue to the end of the file path. For example, given the file path '../USA/2019/01/01/data.csv', where the data is partitioned by country and time, we can define '/Country/PartitionDate:yyyy/MM/dd/data.csv' to create a column 'Country' of string type and a column 'PartitionDate' of datetime type. |
support_multi_line |
By default (support_multi_line=FALSE), all line breaks, including those in quoted field values, will be interpreted as a record break. Reading data this way is faster and more optimized for parallel execution on multiple CPU cores. However, it may result in silently producing more records with misaligned field values. This should be set to TRUE when the delimited files are known to contain quoted line breaks. |
empty_as_string |
Specify if empty field values should be loaded as empty strings. The default (FALSE) will read empty field values as nulls. Passing this as TRUE will read empty field values as empty strings. If the values are converted to numeric or datetime then this has no effect, as empty values will be converted to nulls. |
Value
The Tabular Dataset object.
See Also
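Examples
A minimal sketch; `ws` is assumed to be a loaded Workspace whose datastore contains 'iris/iris.csv':
## Not run:
datastore <- get_datastore(ws, 'workspaceblobstore')
datapath <- data_path(datastore, 'iris/iris.csv')
tabular_ds <- create_tabular_dataset_from_delimited_files(datapath,
                                                          separator = ",",
                                                          header = TRUE)
## End(Not run)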
Create a TabularDataset to represent tabular data in JSON Lines files (http://jsonlines.org/).
Description
Create a TabularDataset to represent tabular data in JSON Lines files (http://jsonlines.org/).
create_tabular_dataset_from_json_lines_files() creates a Tabular Dataset object, which defines the operations to
load data from JSON Lines files into a tabular representation. For the data to be accessible
by Azure Machine Learning, the JSON Lines files specified by path must be located in
a Datastore or behind public web URLs. Column data types are read from the data types saved
in the JSON Lines files. Providing 'set_column_types' will override the data type
for the specified columns in the returned Tabular Dataset.
Usage
create_tabular_dataset_from_json_lines_files(
path,
validate = TRUE,
include_path = FALSE,
set_column_types = NULL,
partition_format = NULL
)
Arguments
path |
The path to the source files, which can be single value or list of http url string or tuple of Datastore and relative path. |
validate |
Boolean to validate if data can be loaded from the returned dataset. Defaults to True. Validation requires that the data source is accessible from the current compute. |
include_path |
Boolean to keep path information as column in the dataset. Defaults to False. This is useful when reading multiple files, and want to know which file a particular record originated from, or to keep useful information in file path. |
set_column_types |
A named list to set column data type, where key is column name and value is data type. |
partition_format |
Specify the partition format in path and create string columns from format 'x' and a datetime column from format 'x:yyyy/MM/dd/HH/mm/ss', where 'yyyy', 'MM', 'dd', 'HH', 'mm' and 'ss' are used to extract year, month, day, hour, minute and second for the datetime type. The format should start from the position of the first partition key and continue to the end of the file path. For example, given the file path '../USA/2019/01/01/data.csv', where the data is partitioned by country and time, we can define '/Country/PartitionDate:yyyy/MM/dd/data.csv' to create a column 'Country' of string type and a column 'PartitionDate' of datetime type. |
Value
The Tabular Dataset object.
See Also
Create an unregistered, in-memory Dataset from parquet files.
Description
Create an unregistered, in-memory Dataset from parquet files.
Usage
create_tabular_dataset_from_parquet_files(
path,
validate = TRUE,
include_path = FALSE,
set_column_types = NULL,
partition_format = NULL
)
Arguments
path |
A data path in a registered datastore or a local path. |
validate |
Boolean to validate if data can be loaded from the returned dataset. Defaults to True. Validation requires that the data source is accessible from the current compute. |
include_path |
Whether to include a column containing the path of the file from which the data was read. This is useful when you are reading multiple files, and want to know which file a particular record originated from, or to keep useful information in file path. |
set_column_types |
A named list to set column data type, where key is column name and value is data type. |
partition_format |
Specify the partition format in path and create string columns from format 'x' and a datetime column from format 'x:yyyy/MM/dd/HH/mm/ss', where 'yyyy', 'MM', 'dd', 'HH', 'mm' and 'ss' are used to extract year, month, day, hour, minute and second for the datetime type. The format should start from the position of the first partition key and continue to the end of the file path. For example, given the file path '../USA/2019/01/01/data.csv', where the data is partitioned by country and time, we can define '/Country/PartitionDate:yyyy/MM/dd/data.csv' to create a column 'Country' of string type and a column 'PartitionDate' of datetime type. |
Value
The Tabular Dataset object.
See Also
Create a TabularDataset to represent tabular data in SQL databases.
Description
Create a TabularDataset to represent tabular data in SQL databases.
create_tabular_dataset_from_sql_query() creates a Tabular Dataset object, which defines the operations to
load data from SQL databases into a tabular representation. For the data to be accessible
by Azure Machine Learning, the SQL database specified by query must be located in
a Datastore and the datastore type must be of a SQL kind. Column data types are
read from the data types in the SQL query result. Providing 'set_column_types' will
override the data type for the specified columns in the returned Tabular Dataset.
Usage
create_tabular_dataset_from_sql_query(
query,
validate = TRUE,
set_column_types = NULL,
query_timeout = 30L
)
Arguments
query |
A SQL-kind datastore and a query |
validate |
Boolean to validate if data can be loaded from the returned dataset. Defaults to True. Validation requires that the data source is accessible from the current compute. |
set_column_types |
A named list to set column data type, where key is column name and value is data type. |
query_timeout |
Sets the wait time (as an int, in seconds) before terminating the attempt to execute a command and generating an error. The default is 30 seconds. |
Value
A TabularDataset
object
Examples
# create tabular dataset from a SQL database in datastore
datastore <- get_datastore(ws, 'sql-db')
query <- data_path(datastore, 'SELECT * FROM my_table')
tab_ds <- create_tabular_dataset_from_sql_query(query, query_timeout = 10)

# use `set_column_types` param to set column data types
data_types <- list(ID = data_type_string(),
                   Date = data_type_datetime('%d/%m/%Y %I:%M:%S %p'),
                   Count = data_type_long(),
                   Latitude = data_type_double(),
                   Found = data_type_bool())
set_tab_ds <- create_tabular_dataset_from_sql_query(query, set_column_types = data_types)
See Also
data_path()
data_type_datetime()
data_type_bool()
data_type_double()
data_type_string()
data_type_long()
Create a new Azure Machine Learning workspace
Description
Create a new Azure Machine Learning workspace. Throws an exception if the workspace already exists or any of the workspace requirements are not satisfied. When you create a new workspace, it automatically creates several Azure resources that are used in the workspace:
Azure Container Registry: Registers Docker containers that you use during training and when you deploy a model. To minimize costs, ACR is lazy-loaded until deployment images are created.
Azure Storage account: Used as the default datastore for the workspace.
Azure Application Insights: Stores monitoring information about your models.
Azure Key Vault: Stores secrets that are used by compute targets and other sensitive information that's needed by the workspace.
Usage
create_workspace(
name,
auth = NULL,
subscription_id = NULL,
resource_group = NULL,
location = NULL,
create_resource_group = TRUE,
friendly_name = NULL,
storage_account = NULL,
key_vault = NULL,
app_insights = NULL,
container_registry = NULL,
cmk_keyvault = NULL,
resource_cmk_uri = NULL,
hbi_workspace = FALSE,
exist_ok = FALSE,
show_output = TRUE,
sku = "basic"
)
Arguments
name |
A string of the new workspace name. Workspace name has to be between 2 and 32 characters of letters and numbers. |
auth |
The |
subscription_id |
A string of the subscription ID of the containing subscription for the new workspace. The parameter is required if the user has access to more than one subscription. |
resource_group |
A string of the Azure resource group that is containing the workspace. The parameter defaults to a mutation of the workspace name. |
location |
A string of the location of the workspace. The parameter defaults to the resource group location. The location has to be a supported region for Azure Machine Learning Services. |
create_resource_group |
If |
friendly_name |
A string of the friendly name for the workspace that can be displayed in the UI. |
storage_account |
A string of an existing storage account in the Azure
resource ID format. The storage will be used by the workspace to save run
outputs, code, logs etc. If |
key_vault |
A string of an existing key vault in the Azure resource ID
format. The key vault will be used by the workspace to store credentials
added to the workspace by the users. If |
app_insights |
A string of an existing Application Insights in the Azure
resource ID format. The Application Insights will be used by the workspace to
log webservices events. If |
container_registry |
A string of an existing container registry in the
Azure resource ID format. The container registry will be used by the
workspace to pull and push both experimentation and webservices images. If
|
cmk_keyvault |
A string representing the key vault containing the customer managed key in the Azure resource ID format: '/subscriptions/<subscription-id>/resourcegroups/<resource-group>/providers/microsoft.keyvault/vaults/<vault-name>'. For example: '/subscriptions/d139f240-94e6-4175-87a7-954b9d27db16/resourcegroups/myresourcegroup/providers/microsoft.keyvault/vaults/mykeyvault'. |
resource_cmk_uri |
The key URI of the customer managed key to encrypt the data at rest. The URI format is: 'https://<keyvault-dns-name>/keys/<key-name>/<key-version>'. For example, 'https://mykeyvault.vault.azure.net/keys/mykey/bc5dce6d01df49w2na7ffb11a2ee008b'. Refer to https://docs.microsoft.com/azure-stack/user/azure-stack-key-vault-manage-portal for steps on how to create a key and get its URI. |
hbi_workspace |
Specifies whether the customer data is of High Business Impact (HBI), i.e., contains sensitive business information. The default value is FALSE. When set to TRUE, downstream services will selectively disable logging. |
exist_ok |
If |
show_output |
If |
sku |
A string indicating if the workspace will be "basic" or "enterprise" edition. |
Value
The Workspace
object.
Examples
This example requires only minimal specification, and all dependent resources as well as the resource group will be created automatically.
ws <- create_workspace(name = 'myworkspace',
                       subscription_id = '<azure-subscription-id>',
                       resource_group = 'myresourcegroup',
                       location = 'eastus2')
This example shows how to reuse existing Azure resources by making use of all parameters utilizing the Azure resource ID format. The specific Azure resource IDs can be retrieved through the Azure Portal or SDK. This assumes that the resource group, storage account, key vault, App Insights and container registry already exist.
prefix = "subscriptions/<azure-subscription-id>/resourcegroups/myresourcegroup/providers/" ws <- create_workspace( name = 'myworkspace', subscription_id = '<azure-subscription-id>', resource_group = 'myresourcegroup', create_resource_group = FALSE, location = 'eastus2', friendly_name = 'My workspace', storage_account = paste0(prefix, 'microsoft.storage/storageaccounts/mystorageaccount'), key_vault = paste0(prefix, 'microsoft.keyvault/vaults/mykeyvault'), app_insights = paste0(prefix, 'microsoft.insights/components/myappinsights'), container_registry = paste0( prefix, 'microsoft.containerregistry/registries/mycontainerregistry'))
See Also
get_workspace()
service_principal_authentication()
interactive_login_authentication()
Represents a path to data in a datastore.
Description
The path represented by DataPath object can point to a directory or a data artifact (blob, file).
Usage
data_path(datastore, path_on_datastore = NULL, name = NULL)
Arguments
datastore |
The Datastore to reference. |
path_on_datastore |
The relative path in the backing storage for the data reference. |
name |
An optional name for the DataPath. |
Value
The DataPath
object.
Examples
my_data <- register_azure_blob_container_datastore(
    workspace = ws,
    datastore_name = blob_datastore_name,
    container_name = ws_blob_datastore$container_name,
    account_name = ws_blob_datastore$account_name,
    account_key = ws_blob_datastore$account_key,
    create_if_not_exists = TRUE)

datapath <- data_path(my_data, <path_on_my_datastore>)
dataset <- create_file_dataset_from_files(datapath)
See Also
create_file_dataset_from_files
create_tabular_dataset_from_parquet_files
create_tabular_dataset_from_delimited_files
create_tabular_dataset_from_json_lines_files
create_tabular_dataset_from_sql_query
Configure conversion to bool.
Description
Configure conversion to bool.
Usage
data_type_bool()
Value
Converted DataType object.
Configure conversion to datetime.
Description
Configure conversion to datetime.
Usage
data_type_datetime(formats = NULL)
Arguments
formats |
Formats to try for datetime conversion. For example
Format specifiers will be inferred if not specified. Inference requires that the data source is accessible from current compute. |
Value
Converted DataType object.
Configure conversion to 53-bit double.
Description
Configure conversion to 53-bit double.
Usage
data_type_double()
Value
Converted DataType object.
Configure conversion to 64-bit integer.
Description
Configure conversion to 64-bit integer.
Usage
data_type_long()
Value
Converted DataType object.
Configure conversion to string.
Description
Configure conversion to string.
Usage
data_type_string()
Value
Converted DataType object.
Represent how to deliver the dataset to a compute target.
Description
Represent how to deliver the dataset to a compute target.
Usage
dataset_consumption_config(
name,
dataset,
mode = "direct",
path_on_compute = NULL
)
Arguments
name |
The name of the dataset in the run, which can be different to the registered name. The name will be registered as environment variable and can be used in data plane. |
dataset |
The dataset that will be consumed in the run. |
mode |
Defines how the dataset should be delivered to the compute target. There are three modes: 'direct': consume the dataset as dataset. 'download': download the dataset and consume the dataset as downloaded path. 'mount': mount the dataset and consume the dataset as mount path. |
path_on_compute |
The target path on the compute to make the data available at. The folder structure of the source data will be kept, however, we might add prefixes to this folder structure to avoid collision. |
Value
The DatasetConsumptionConfig
object.
Examples
est <- estimator(source_directory = ".",
                 entry_script = "train.R",
                 inputs = list(dataset_consumption_config('mydataset',
                                                          dataset,
                                                          mode = 'download')),
                 compute_target = compute_target)
See Also
Define timestamp columns for the dataset.
Description
Define timestamp columns for the dataset.
The method defines columns to be used as timestamps. Timestamp columns on a dataset
make it possible to treat the data as time-series data and enable additional capabilities.
When a dataset has both fine_grain_timestamp and coarse_grain_timestamp
specified, the two columns should represent the same timeline.
Usage
define_timestamp_columns_for_dataset(
dataset,
fine_grain_timestamp,
coarse_grain_timestamp = NULL,
validate = FALSE
)
Arguments
dataset |
The Tabular Dataset object. |
fine_grain_timestamp |
The name of the column to use as the fine grain timestamp. Use NULL to clear it. |
coarse_grain_timestamp |
The name of the column to use as the coarse grain timestamp (optional). The default is NULL. |
validate |
Indicates whether to validate if specified columns exist in dataset. The default is False. Validation requires that the data source is accessible from the current compute. |
Value
The Tabular Dataset with timestamp columns defined.
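Examples
A minimal sketch; `tabular_ds` is assumed to be a Tabular Dataset that contains a 'timestamp' column:
## Not run:
ts_ds <- define_timestamp_columns_for_dataset(tabular_ds,
                                              fine_grain_timestamp = 'timestamp')
# The returned dataset can now be used with the filter_dataset_*_time() functions.
## End(Not run)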
Delete a cluster
Description
Remove the compute object from its associated workspace and delete the corresponding cloud-based resource.
Usage
delete_compute(cluster)
Arguments
cluster |
The |
Value
None
Examples
## Not run:
ws <- load_workspace_from_config()
compute_target <- get_compute(ws, cluster_name = 'mycluster')
delete_compute(compute_target)
## End(Not run)
Delete a local web service from the local machine
Description
Delete a local web service from the local machine. This function call is not asynchronous; it runs until the service is deleted.
Usage
delete_local_webservice(webservice, delete_cache = TRUE, delete_image = FALSE)
Arguments
webservice |
The |
delete_cache |
If |
delete_image |
If |
Value
None
Delete a model from its associated workspace
Description
Delete the registered model from its associated workspace. Note that you cannot delete a registered model that is being used by an active web service deployment.
Usage
delete_model(model)
Arguments
model |
The |
Value
None
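Examples
A minimal sketch:
## Not run:
ws <- load_workspace_from_config()
model <- get_model(ws, name = "my_model", version = 1)
delete_model(model)
## End(Not run)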
Delete secrets from a keyvault
Description
Delete secrets from the keyvault associated with the workspace for a specified set of secret names.
Usage
delete_secrets(keyvault, secrets)
Arguments
keyvault |
The |
secrets |
A vector of secret names. |
Value
None
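Examples
A minimal sketch; get_default_keyvault() is assumed here as the way to retrieve the workspace's default Keyvault, and the secret names are placeholders:
## Not run:
ws <- load_workspace_from_config()
keyvault <- get_default_keyvault(ws)   # assumed helper to fetch the workspace keyvault
delete_secrets(keyvault, secrets = c("mysecret1", "mysecret2"))
## End(Not run)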
Delete a web service from a given workspace
Description
Delete a deployed ACI or AKS web service from the given workspace. This function call is not asynchronous; it runs until the resource is deleted.
To delete a LocalWebservice
see delete_local_webservice()
.
Usage
delete_webservice(webservice)
Arguments
webservice |
The |
Value
None
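Examples
A minimal sketch; get_webservice() is assumed here as the way to retrieve the deployed service by name:
## Not run:
ws <- load_workspace_from_config()
service <- get_webservice(ws, name = "my_webservice")   # assumed lookup by name
delete_webservice(service)
## End(Not run)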
Delete a workspace
Description
Delete the Azure Machine Learning workspace resource. delete_workspace()
can also delete the workspace's associated resources.
Usage
delete_workspace(
workspace,
delete_dependent_resources = FALSE,
no_wait = FALSE
)
Arguments
workspace |
The |
delete_dependent_resources |
If |
no_wait |
If |
Value
None
Deploy a web service from registered model(s)
Description
Deploy a web service from zero or more registered models. Types of web
services that can be deployed are LocalWebservice
, which will deploy
a model locally, and AciWebservice
and AksWebservice
, which will
deploy a model to Azure Container Instances (ACI) and Azure Kubernetes
Service (AKS), respectively. The type of web service deployed will be
determined by the deployment_config
specified. Returns a Webservice
object corresponding to the deployed web service.
Usage
deploy_model(
workspace,
name,
models,
inference_config,
deployment_config = NULL,
deployment_target = NULL
)
Arguments
workspace |
The |
name |
A string of the name to give the deployed service. Must be unique to the workspace, only consist of lowercase letters, numbers, or dashes, start with a letter, and be between 3 and 32 characters long. |
models |
A list of |
inference_config |
The |
deployment_config |
The deployment configuration of type
|
deployment_target |
The compute target to deploy the model to.
You only need to specify this parameter if you are deploying to AKS,
in which case provide an |
Value
The LocalWebservice
, AciWebservice
, or AksWebservice
object.
Details
If you encounter any issue in deploying your web service, please visit this troubleshooting guide.
See Also
inference_config()
, aci_webservice_deployment_config()
,
aks_webservice_deployment_config()
, local_webservice_deployment_config()
Examples
## Not run:
ws <- load_workspace_from_config()
model <- get_model(ws, name = "my_model")
r_env <- r_environment(name = "r_env")
inference_config <- inference_config(entry_script = "score.R",
source_directory = ".",
environment = r_env)
deployment_config <- aci_webservice_deployment_config(cpu_cores = 1, memory_gb = 1)
service <- deploy_model(ws,
name = "my_webservice",
models = list(model),
inference_config = inference_config,
deployment_config = deployment_config)
wait_for_deployment(service, show_output = TRUE)
## End(Not run)
Detach an AksCompute cluster from its associated workspace
Description
Detach the AksCompute cluster from its associated workspace. No underlying cloud resource will be deleted; the association will just be removed.
Usage
detach_aks_compute(cluster)
Arguments
cluster |
The |
Value
None
Download a file from a run
Description
Download a file from the run record. You can download any file that
was uploaded to the run record via upload_files_to_run()
or
upload_folder_to_run()
, or any file that was written out to
the ./outputs
or ./logs
folders during a run.
You can see what files are available to download from the run record
by calling get_run_file_names()
.
Usage
download_file_from_run(run, name, output_file_path = NULL)
Arguments
run |
The |
name |
A string of the name of the artifact to be downloaded. |
output_file_path |
A string of the local path where to download the artifact to. |
Value
None
See Also
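Examples
A minimal sketch; `run` is assumed to be a completed Run that wrote 'outputs/model.rds' during training:
## Not run:
get_run_file_names(run)   # list the artifacts available on the run record
download_file_from_run(run,
                       name = "outputs/model.rds",
                       output_file_path = "model.rds")
## End(Not run)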
Download files from a run
Description
Download files from the run record. You can download any files that
were uploaded to the run record via upload_files_to_run()
or
upload_folder_to_run()
, or any files that were written out to
the ./outputs
or ./logs
folders during a run.
Usage
download_files_from_run(
run,
prefix = NULL,
output_directory = NULL,
output_paths = NULL,
batch_size = 100L
)
Arguments
run |
The |
prefix |
A string of the filepath prefix (folder name) from which to download all artifacts. If not specified, all the artifacts in the run record will be downloaded. |
output_directory |
(Optional) A string of the directory that all artifact paths use as a prefix. |
output_paths |
(Optional) A list of strings of the local filepaths where the artifacts will be downloaded to. |
batch_size |
An int of the number of files to download per batch. |
Value
None
See Also
Download data from a datastore to the local file system
Description
Download data from the datastore to the local file system.
Usage
download_from_datastore(
datastore,
target_path,
prefix = NULL,
overwrite = FALSE,
show_progress = TRUE
)
Arguments
datastore |
The |
target_path |
A string of the local directory to download the file to. |
prefix |
A string of the path to the folder in the blob container
or file store to download. If |
overwrite |
If |
show_progress |
If |
Value
An integer of the number of files successfully downloaded.
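Examples
A minimal sketch; the datastore name and prefix below are placeholders:
## Not run:
ws <- load_workspace_from_config()
datastore <- get_datastore(ws, 'workspaceblobstore')
download_from_datastore(datastore,
                        target_path = "data",
                        prefix = "iris",
                        overwrite = TRUE)
## End(Not run)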
Download file streams defined by the dataset as local files.
Description
Download file streams defined by the dataset as local files. If target_path starts with a /, then it will be treated as an absolute path. If it doesn't start with a /, then it will be treated as a path relative to the current working directory.
Usage
download_from_file_dataset(dataset, target_path = NULL, overwrite = FALSE)
Arguments
dataset |
The Dataset object |
target_path |
The local directory to download the files to. If NULL, the data will be downloaded into a temporary directory. |
overwrite |
Indicates whether to overwrite existing files. The default
is FALSE. Existing files will be overwritten if |
Value
A list of file paths for each file downloaded.
Download a model to the local file system
Description
Download a registered model to the target_dir
of your local file
system.
Usage
download_model(model, target_dir = ".", exist_ok = FALSE)
Arguments
model |
The |
target_dir |
A string of the path to the directory on your local file system to download the model to. Defaults to ".". |
exist_ok |
If |
Value
A string of the path to the file or folder of the downloaded model.
Examples
## Not run:
ws <- load_workspace_from_config()
model <- get_model(ws, name = "my_model", version = 2)
download_model(model, target_dir = tempdir(), exist_ok = TRUE)
## End(Not run)
Drop the specified columns from the dataset.
Description
Drop the specified columns from the dataset. If a timeseries column is dropped, the corresponding capabilities will be dropped for the returned dataset as well.
Usage
drop_columns_from_dataset(dataset, columns)
Arguments
dataset |
The Tabular Dataset object. |
columns |
A name or a list of names for the columns to drop. |
Value
A new TabularDataset object with the specified columns dropped.
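Examples
A minimal sketch; `tabular_ds` is assumed to be an existing Tabular Dataset containing 'id' and 'comments' columns:
## Not run:
trimmed_ds <- drop_columns_from_dataset(tabular_ds, columns = list("id", "comments"))
## End(Not run)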
Create an estimator
Description
An Estimator wraps run configuration information for specifying details
of executing an R script. Running an Estimator experiment
(using submit_experiment()
) will return a ScriptRun
object and
execute your training script on the specified compute target.
To define the environment to use for training, you can either directly
provide the environment-related parameters (e.g. cran_packages
,
custom_docker_image
) to estimator()
, or you can provide an
Environment
object to the environment
parameter. For more information
on the predefined Docker images that are used for training if
custom_docker_image
is not specified, see the documentation
here.
Usage
estimator(
source_directory,
compute_target = NULL,
vm_size = NULL,
vm_priority = NULL,
entry_script = NULL,
script_params = NULL,
cran_packages = NULL,
github_packages = NULL,
custom_url_packages = NULL,
custom_docker_image = NULL,
image_registry_details = NULL,
use_gpu = FALSE,
environment_variables = NULL,
shm_size = NULL,
max_run_duration_seconds = NULL,
environment = NULL,
inputs = NULL
)
Arguments
source_directory |
A string of the local directory containing experiment configuration and code files needed for the training job. |
compute_target |
The |
vm_size |
A string of the VM size of the compute target that will be
created for the training job. The list of available VM sizes
are listed here.
Provide this parameter if you want to create AmlCompute as the compute target
at run time, instead of providing an existing cluster to the |
vm_priority |
A string of either |
entry_script |
A string representing the relative path to the file used to start training. |
script_params |
A named list of the command-line arguments to pass to
the training script specified in |
cran_packages |
A list of |
github_packages |
A list of |
custom_url_packages |
A character vector of packages to be installed from local directory or custom URL. |
custom_docker_image |
A string of the name of the Docker image from
which the image to use for training will be built. If not set, a predefined
image will be used as the base image. To use an image from a
private Docker repository, you will also have to specify the
|
image_registry_details |
A |
use_gpu |
Indicates whether the environment to run the experiment should
support GPUs. If |
environment_variables |
A named list of environment variables names and values. These environment variables are set on the process where the user script is being executed. |
shm_size |
A string for the size of the Docker container's shared
memory block. For more information, see
Docker run reference.
If not set, a default value of |
max_run_duration_seconds |
An integer of the maximum allowed time for the run. Azure ML will attempt to automatically cancel the run if it takes longer than this value. |
environment |
The |
inputs |
A list of DataReference objects or DatasetConsumptionConfig objects to use as input. |
Value
The Estimator
object.
Examples
r_env <- r_environment(name = "r-env", cran_packages = list(cran_package("dplyr"), cran_package("ggplot2"))) est <- estimator(source_directory = ".", entry_script = "train.R", compute_target = compute_target, environment = r_env)
See Also
r_environment()
, container_registry()
, submit_experiment()
,
dataset_consumption_config()
, cran_package()
Create an Azure Machine Learning experiment
Description
An experiment is a grouping of many runs from a specified script.
Usage
experiment(workspace, name)
Arguments
workspace |
The |
name |
A string of the experiment name. The name must be between 3-36 characters, start with a letter or number, and can only contain letters, numbers, underscores, and dashes. |
Value
The Experiment
object.
See Also
submit_experiment()
Examples
## Not run:
ws <- load_workspace_from_config()
exp <- experiment(ws, name = 'myexperiment')
## End(Not run)
Filter Tabular Dataset with time stamp columns after a specified start time.
Description
Filter Tabular Dataset with time stamp columns after a specified start time.
Usage
filter_dataset_after_time(dataset, start_time, include_boundary = TRUE)
Arguments
dataset |
The Tabular Dataset object |
start_time |
The lower bound for filtering data. |
include_boundary |
Boolean indicating if the row associated with the
boundary time ( |
Value
The filtered Tabular Dataset
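Examples
A minimal sketch; `ts_ds` is assumed to be a Tabular Dataset with timestamp columns defined via define_timestamp_columns_for_dataset(), and passing a POSIXct value for the boundary is an assumption here:
## Not run:
recent_ds <- filter_dataset_after_time(ts_ds,
                                       start_time = as.POSIXct("2020-01-01",
                                                               tz = "UTC"))
## End(Not run)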
Filter Tabular Dataset with time stamp columns before a specified end time.
Description
Filter Tabular Dataset with time stamp columns before a specified end time.
Usage
filter_dataset_before_time(dataset, end_time, include_boundary = TRUE)
Arguments
dataset |
The Tabular Dataset object |
end_time |
The upper bound for filtering data. |
include_boundary |
Boolean indicating if the row associated with the
boundary time ( |
Value
The filtered Tabular Dataset
Filter Tabular Dataset between a specified start and end time.
Description
Filter Tabular Dataset between a specified start and end time.
Usage
filter_dataset_between_time(
dataset,
start_time,
end_time,
include_boundary = TRUE
)
Arguments
dataset |
The Tabular Dataset object |
start_time |
The lower bound for filtering data. |
end_time |
The upper bound for filtering data. |
include_boundary |
Boolean indicating if the row associated with the
boundary time ( |
Value
The filtered Tabular Dataset
Filter Tabular Dataset to contain only the specified duration (amount) of recent data.
Description
Filter Tabular Dataset to contain only the specified duration (amount) of recent data.
Usage
filter_dataset_from_recent_time(dataset, time_delta, include_boundary = TRUE)
Arguments
dataset |
The Tabular Dataset object |
time_delta |
The duration (amount) of recent data to retrieve. |
include_boundary |
Boolean indicating if the row associated with the
boundary time ( |
Value
The filtered Tabular Dataset
Generates the control script for the experiment.
Description
Generates the control script for the experiment.
Usage
generate_entry_script(source_directory)
Arguments
source_directory |
The directory which contains all the files needed for the experiment. |
Regenerate one of a web service's keys
Description
Regenerate either the primary or secondary authentication key for
an AciWebservice
or AksWebservice
. The web service must have
been deployed with key-based authentication enabled.
Not supported for LocalWebservice
deployments.
Usage
generate_new_webservice_key(webservice, key_type)
Arguments
webservice |
The |
key_type |
A string of which key to regenerate. Options are "Primary" or "Secondary". |
Value
None
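Examples
An illustrative sketch (not from the original manual); 'my-service' is a hypothetical service name:
## Not run:
ws <- load_workspace_from_config()
service <- get_webservice(ws, name = 'my-service')
generate_new_webservice_key(service, key_type = "Primary")
## End(Not run)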
Get the credentials for an AksCompute cluster
Description
Retrieve the credentials for an AksCompute cluster.
Usage
get_aks_compute_credentials(cluster)
Arguments
cluster |
The |
Value
Named list of the cluster details.
Return the best performing run amongst all completed runs
Description
Find and return the run that corresponds to the best performing run amongst all the completed runs.
The best performing run is identified solely based on the primary metric
parameter specified in the HyperDriveConfig
(primary_metric_name
).
The PrimaryMetricGoal
governs whether the minimum or maximum of the
primary metric is used. To do a more detailed analysis of all the
run metrics launched by this HyperDrive run, use get_child_run_metrics()
.
Only one of the runs is returned from get_best_run_by_primary_metric()
,
even if several of the runs launched by this HyperDrive run reached
the same best metric.
Usage
get_best_run_by_primary_metric(
hyperdrive_run,
include_failed = TRUE,
include_canceled = TRUE
)
Arguments
hyperdrive_run |
The |
include_failed |
If |
include_canceled |
If |
Value
The Run
object.
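Examples
An illustrative sketch (not from the original manual), assuming hyperdrive_run is a completed HyperDrive run returned by submit_experiment():
## Not run:
best_run <- get_best_run_by_primary_metric(hyperdrive_run)
best_run_metrics <- get_run_metrics(best_run)
## End(Not run)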
Get the hyperparameters for all child runs
Description
Return the hyperparameters for all the child runs of the HyperDrive run.
Usage
get_child_run_hyperparameters(hyperdrive_run)
Arguments
hyperdrive_run |
The |
Value
The named list of hyperparameters where element name
is the run_id, e.g. list("run_id" = hyperparameters)
.
Get the metrics from all child runs
Description
Return the metrics from all the child runs of the HyperDrive run.
Usage
get_child_run_metrics(hyperdrive_run)
Arguments
hyperdrive_run |
The |
Value
The named list of metrics where element name is
the run_id, e.g. list("run_id" = metrics)
.
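Examples
An illustrative sketch (not from the original manual), assuming hyperdrive_run is a completed HyperDrive run:
## Not run:
child_metrics <- get_child_run_metrics(hyperdrive_run)
child_hyperparameters <- get_child_run_hyperparameters(hyperdrive_run)
## End(Not run)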
Get all children for the current run selected by specified filters
Description
Get all children for the current run selected by specified filters.
Usage
get_child_runs(
parent_run,
recursive = FALSE,
tags = NULL,
properties = NULL,
type = NULL,
status = NULL
)
Arguments
parent_run |
The parent |
recursive |
Boolean indicating whether to recurse through all descendants. |
tags |
If specified, returns runs matching specified "tag" or list(tag = value). |
properties |
If specified, returns runs matching specified "property" or list(property = value). |
type |
If specified, returns runs matching this type. |
status |
If specified, returns only runs with the specified status. |
Value
A list of child runs, Run
objects.
Get the child runs sorted in descending order by best primary metric
Description
Return a list of child runs of the HyperDrive run sorted by their best
primary metric. The sorting is done according to the primary metric and
its goal: if it is maximize, then the child runs are returned in descending
order of their best primary metric. If reverse = TRUE
, the order is
reversed. Each child in the result has run id, hyperparameters, best primary
metric value, and status.
Child runs without the primary metric are discarded when
discard_no_metric = TRUE
. Otherwise, they are appended to the list behind
other child runs with the primary metric. Note that the reverse option has no
impact on them.
Usage
get_child_runs_sorted_by_primary_metric(
hyperdrive_run,
top = 0L,
reverse = FALSE,
discard_no_metric = FALSE
)
Arguments
hyperdrive_run |
The |
top |
An integer of the number of top child runs to be returned. If |
reverse |
If |
discard_no_metric |
If |
Value
The named list of child runs.
Get an existing compute cluster
Description
Returns an AmlCompute
or AksCompute
object for an existing compute
resource. If the compute target doesn't exist, the function will return
NULL
.
Usage
get_compute(workspace, cluster_name)
Arguments
workspace |
The |
cluster_name |
A string of the name of the cluster. |
Value
The AmlCompute
or AksCompute
object.
Examples
## Not run:
ws <- load_workspace_from_config()
compute_target <- get_compute(ws, cluster_name = 'mycluster')
## End(Not run)
Get the context object for a run
Description
This function is commonly used to retrieve the authenticated
run object inside of a script to be submitted for execution
via submit_experiment()
. Note that the logging functions
(log_*
methods, upload_files_to_run()
, upload_folder_to_run()
)
will by default log the specified metrics or files to the
run returned from get_current_run()
.
Usage
get_current_run(allow_offline = TRUE)
Arguments
allow_offline |
If |
Value
The Run
object.
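Examples
An illustrative sketch (not from the original manual) of typical usage inside a training script submitted with submit_experiment():
## Not run:
# Inside the submitted training script
run <- get_current_run()
log_metric_to_run("Accuracy", 0.95, run = run)
## End(Not run)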
Get Dataset by ID.
Description
Get a Dataset which is saved to the workspace using its ID.
Usage
get_dataset_by_id(workspace, id)
Arguments
workspace |
The existing AzureML workspace in which the Dataset is saved. |
id |
The ID of the dataset |
Value
The Dataset object
Get a registered Dataset from the workspace by its registration name.
Description
Get a registered Dataset from the workspace by its registration name.
Usage
get_dataset_by_name(workspace, name, version = "latest")
Arguments
workspace |
The existing AzureML workspace in which the Dataset was registered. |
name |
The registration name. |
version |
The registration version. Defaults to "latest". |
Value
The registered Dataset object.
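Examples
An illustrative sketch (not from the original manual); 'mydataset' is a hypothetical registration name:
## Not run:
ws <- load_workspace_from_config()
dataset <- get_dataset_by_name(ws, name = 'mydataset', version = 'latest')
## End(Not run)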
Get an existing datastore
Description
Get the corresponding datastore object for an existing datastore by name from the given workspace.
Usage
get_datastore(workspace, datastore_name)
Arguments
workspace |
The |
datastore_name |
A string of the name of the datastore. |
Value
The azureml.data.azure_storage_datastore.AzureBlobDatastore
,
azureml.data.azure_storage_datastore.AzureFileDatastore
,
azureml.data.azure_sql_database.AzureSqlDatabaseDatastore
,
azureml.data.azure_data_lake_datastore.AzureDataLakeGen2Datastore
,
azureml.data.azure_postgre_sql_datastore.AzurePostgreSqlDatastore
, or
azureml.data.azure_sql_database.AzureSqlDatabaseDatastore
object.
Get the default datastore for a workspace
Description
Returns the default datastore associated with the workspace.
When you create a workspace, an Azure blob container and Azure file share
are registered to the workspace with the names workspaceblobstore
and
workspacefilestore
, respectively. They store the connection information
of the blob container and the file share that is provisioned in the storage
account attached to the workspace. The workspaceblobstore
is set as the
default datastore, and remains the default datastore unless you set a new
datastore as the default with set_default_datastore()
.
Usage
get_default_datastore(workspace)
Arguments
workspace |
The |
Value
The default Datastore
object.
Examples
Get the default datastore for the workspace:
ws <- load_workspace_from_config()
ds <- get_default_datastore(ws)
If you have not changed the default datastore for the workspace, the following code will return the same datastore object as the above example:
ws <- load_workspace_from_config()
ds <- get_datastore(ws, datastore_name = 'workspaceblobstore')
See Also
Get the default keyvault for a workspace
Description
Returns a Keyvault
object representing the default
Azure Key Vault
associated with the workspace.
Usage
get_default_keyvault(workspace)
Arguments
workspace |
The |
Value
The Keyvault
object.
See Also
set_secrets()
get_secrets()
list_secrets()
delete_secrets()
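Examples
An illustrative sketch (not from the original manual); 'mysecret' is a hypothetical secret name:
## Not run:
ws <- load_workspace_from_config()
keyvault <- get_default_keyvault(ws)
list_secrets(keyvault)
get_secrets(keyvault, secrets = c('mysecret'))
## End(Not run)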
Get an existing environment
Description
Returns an Environment
object for an existing environment in
the workspace.
Usage
get_environment(workspace, name, version = NULL)
Arguments
workspace |
The |
name |
A string of the name of the environment. |
version |
A string of the version of the environment. |
Value
The Environment
object.
Examples
## Not run:
ws <- load_workspace_from_config()
env <- get_environment(ws, name = 'myenv', version = '1')
## End(Not run)
Get a list of file paths for each file stream defined by the dataset.
Description
Get a list of file paths for each file stream defined by the dataset. The file paths are relative paths for local files when the file streams are downloaded or mounted. A common prefix will be removed from the file paths based on how the data source was specified to create the dataset.
Usage
get_file_dataset_paths(dataset)
Arguments
dataset |
The Dataset object. |
Value
A list of file paths.
Return the named list for input datasets.
Description
Return the named list for input datasets.
Usage
get_input_dataset_from_run(name, run = NULL)
Arguments
name |
The name of the input dataset |
run |
The run taking the dataset as input |
Value
A dataset object corresponding to the "name"
Get a registered model
Description
Returns a Model
object for an existing model that has been
previously registered to the given workspace.
Usage
get_model(
workspace,
name = NULL,
id = NULL,
tags = NULL,
properties = NULL,
version = NULL,
run_id = NULL
)
Arguments
workspace |
The |
name |
Retrieve the latest model with the corresponding name (a string), if it exists. |
id |
Retrieve the model with the corresponding ID (a string), if it exists. |
tags |
(Optional) Retrieve the model filtered based on the provided tags (a list), searching by either 'key' or 'list(key, value)'. |
properties |
(Optional) Retrieve the model filtered based on the provided properties (a list), searching by either 'key' or 'list(key, value)'. |
version |
(Optional) An int of the version of a model to
retrieve, when provided along with |
run_id |
(Optional) Retrieve the model filtered by the provided run ID (a string) the model was registered from, if it exists. |
Value
The Model
object.
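Examples
An illustrative sketch (not from the original manual); "my_model" is a hypothetical model name:
## Not run:
ws <- load_workspace_from_config()
model <- get_model(ws, name = "my_model", version = 1L)
## End(Not run)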
Get the Azure container registry that a packaged model uses
Description
Return a ContainerRegistry
object for where the image
(or base image, for Dockerfile packages) is stored in an
Azure container registry.
Usage
get_model_package_container_registry(package)
Arguments
package |
The |
Value
The ContainerRegistry
object.
See Also
container_registry()
Examples
# Given a ModelPackage object,
# get the container registry information
## Not run:
container_registry <- get_model_package_container_registry(package)
address <- container_registry$address
username <- container_registry$username
password <- container_registry$password
## End(Not run)
# To then authenticate Docker with the Azure container registry from
# a shell or command-line session, use the following command, replacing
# <address>, <username>, and <password> with the values retrieved
# from above:
# ```bash
# docker login <address> -u <username> -p <password>
# ```
Get the model package creation logs
Description
Retrieve the creation logs from packaging a model with
package_model()
.
Usage
get_model_package_creation_logs(package, decode = TRUE, offset = 0)
Arguments
package |
The |
decode |
If |
offset |
An int of the byte offset from which to start reading the logs. |
Value
A string or character vector of package creation logs.
Get an experiment run
Description
Given the associated experiment and run ID, return the run object for a previously submitted/tracked run.
Usage
get_run(experiment, run_id)
Arguments
experiment |
The |
run_id |
A string of the run ID for the run. |
Value
The Run
object.
Get the details of a run
Description
Get the definition, status information, current log files, and other details of the run.
Usage
get_run_details(run)
Arguments
run |
The |
Details
The returned list contains the following named elements:
-
runId: ID of the run.
-
target: The compute target of the run.
-
status: The run's current status.
-
startTimeUtc: UTC time of when the run was started, in ISO8601.
-
endTimeUtc: UTC time of when the run was finished (either Completed or Failed), in ISO8601. This element does not exist if the run is still in progress.
-
properties: Immutable key-value pairs associated with the run.
-
logFiles: Log files from the run.
Value
A named list of the details for the run.
See Also
Get the details of a run along with the log files' contents
Description
Get the details of a run along with the log files' contents
Usage
get_run_details_with_logs(run)
Arguments
run |
The |
Value
A named list of the run details and log file contents.
See Also
List the files that are stored in association with a run
Description
Get the list of files stored in a run record.
Usage
get_run_file_names(run)
Arguments
run |
The |
Value
A list of strings of the paths for existing artifacts in the run record.
See Also
download_file_from_run()
download_files_from_run()
Get the metrics logged to a run
Description
Retrieve the metrics logged to a run that were logged with
the log_*()
methods.
Usage
get_run_metrics(
run,
name = NULL,
recursive = FALSE,
run_type = NULL,
populate = FALSE
)
Arguments
run |
The |
name |
The name of the metric. |
recursive |
Boolean indicating whether to recurse through all descendants. |
run_type |
run type |
populate |
Boolean indicating whether to fetch the contents of external data linked to the metric. |
Value
A named list of the metrics associated with the run,
e.g. list("metric_name" = metric)
.
Examples
ws <- load_workspace_from_config()
exp <- experiment(ws, name = 'myexperiment')
run <- get_run(exp, run_id = "myrunid")
metrics <- get_run_metrics(run)
Return a generator of the runs for an experiment
Description
Return a generator of the runs for an experiment, in reverse chronological order.
Usage
get_runs_in_experiment(
experiment,
type = NULL,
tags = NULL,
properties = NULL,
include_children = FALSE
)
Arguments
experiment |
The |
type |
Filter the returned generator of runs by the provided type. |
tags |
Filter runs by tags. A named list, e.g. list("tag" = "value"). |
properties |
Filter runs by properties. A named list, e.g. list("property" = "value"). |
include_children |
By default, fetch only top-level runs. Set to TRUE to list all runs. |
Value
The list of runs matching supplied filters.
Get secrets from a keyvault
Description
Returns the secret values from the keyvault associated with the
workspace for a given set of secret names. For runs submitted using
submit_experiment()
, you can use get_secrets_from_run()
instead,
as that method shortcuts workspace instantiation (since a submitted
run is aware of its workspace).
Usage
get_secrets(keyvault, secrets)
Arguments
keyvault |
The |
secrets |
A vector of secret names. |
Value
A named list of found and not found secrets, where element
name corresponds to the secret name. If a secret was not found, the
corresponding element will be NULL
.
Get secrets from the keyvault associated with a run's workspace
Description
From within the script of a run submitted using
submit_experiment()
, you can use get_secrets_from_run()
to get secrets that are stored in the keyvault of the associated
workspace.
Note that this method is slightly different than get_secrets()
,
which first requires you to instantiate the workspace object.
Since a submitted run is aware of its workspace,
get_secrets_from_run()
shortcuts workspace instantiation and
returns the secret value directly.
Be careful not to expose the secret values by writing or printing them out.
Usage
get_secrets_from_run(run, secrets)
Arguments
run |
The |
secrets |
A vector of strings of secret names to retrieve the values for. |
Value
A named list of found and not found secrets.
If a secret was not found, the corresponding element will be NULL
.
See Also
Get a deployed web service
Description
Return the corresponding Webservice object of a deployed web service from a given workspace.
Usage
get_webservice(workspace, name)
Arguments
workspace |
The |
name |
A string of the name of the web service to retrieve. |
Value
The LocalWebservice
, AciWebservice
, or AksWebservice
object.
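Examples
An illustrative sketch (not from the original manual); "my-service" is a hypothetical service name:
## Not run:
ws <- load_workspace_from_config()
service <- get_webservice(ws, name = "my-service")
## End(Not run)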
Retrieve auth keys for a web service
Description
Get the authentication keys for a web service that is deployed
with key-based authentication enabled. In order to enable
key-based authentication, set the auth_enabled = TRUE
parameter
when you are creating or updating a deployment (either
aci_webservice_deployment_config()
or
aks_webservice_deployment_config()
for creation and
update_aci_webservice()
or update_aks_webservice()
for updating).
Note that key-based auth is enabled by default for AksWebservice
but not for AciWebservice
.
To check if a web service has key-based auth enabled, you can
access the following boolean property from the Webservice object:
service$auth_enabled
Not supported for LocalWebservice
deployments.
Usage
get_webservice_keys(webservice)
Arguments
webservice |
The |
Value
A list of two strings corresponding to the primary and secondary authentication keys.
See Also
generate_new_webservice_key()
Retrieve the logs for a web service
Description
You can get the detailed Docker engine log messages from your web service deployment. You can view the logs for local, ACI, and AKS deployments.
For example, if your web service deployment fails, you can inspect the logs to help troubleshoot.
Usage
get_webservice_logs(webservice, num_lines = 5000L)
Arguments
webservice |
The |
num_lines |
An int of the maximum number of log lines to retrieve. |
Value
A string of the logs for the web service.
Retrieve the auth token for a web service
Description
Get the authentication token, scoped to the current user, for a web service that was deployed with token-based authentication enabled. Token-based authentication requires clients to use an Azure Active Directory account to request an authentication token, which is used to make requests to the deployed service. Only available for AKS deployments.
In order to enable token-based authentication, set the
token_auth_enabled = TRUE
parameter when you are creating or
updating a deployment (aks_webservice_deployment_config()
for creation
or update_aks_webservice()
for updating). Note that you cannot have both
key-based authentication and token-based authentication enabled.
Token-based authentication is not enabled by default.
To check if a web service has token-based auth enabled, you can
access the following boolean property from the Webservice object:
service$token_auth_enabled
Usage
get_webservice_token(webservice)
Arguments
webservice |
The |
Value
An AksServiceAccessToken
object.
Get an existing workspace
Description
Returns a Workspace
object for an existing Azure Machine Learning
workspace. Throws an exception if the workspace doesn't exist or the
required fields don't lead to a uniquely identifiable workspace.
Usage
get_workspace(name, auth = NULL, subscription_id = NULL, resource_group = NULL)
Arguments
name |
A string of the workspace name to get. |
auth |
The |
subscription_id |
A string of the subscription ID to use. The parameter is required if the user has access to more than one subscription. |
resource_group |
A string of the resource group to use. If |
Value
The Workspace
object.
See Also
create_workspace()
service_principal_authentication()
interactive_login_authentication()
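Examples
An illustrative sketch (not from the original manual); the subscription ID and resource group are placeholders:
## Not run:
ws <- get_workspace("myworkspace",
                    subscription_id = "<your subscription ID>",
                    resource_group = "<your resource group>")
## End(Not run)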
Get the details of a workspace
Description
Returns the details of the workspace.
Usage
get_workspace_details(workspace)
Arguments
workspace |
The |
Value
Named list of the workspace details.
Details
The returned named list contains the following elements:
-
id: URI pointing to the workspace resource, containing subscription ID, resource group, and workspace name.
-
name: Workspace name.
-
location: Workspace region.
-
type: URI of the format "{providerName}/workspaces".
-
workspaceid: Workspace ID.
-
description: Workspace description.
-
friendlyName: Workspace friendly name.
-
creationTime: Time the workspace was created, in ISO8601.
-
containerRegistry: Workspace container registry.
-
keyVault: Workspace key vault.
-
applicationInsights: Workspace App Insights.
-
identityPrincipalId: Workspace identity principal ID.
-
identityTenantId: Workspace tenant ID.
-
identityType: Workspace identity type.
-
storageAccount: Workspace storage account.
Specifies a GitHub package to install in the environment
Description
Specifies a GitHub package to install in the run environment.
Usage
github_package(repository, auth_token = NULL)
Arguments
repository |
Repository address of the GitHub package. |
auth_token |
Personal access token to install from a private repo. |
Value
A named list containing the package specifications
Examples
pkg1 <- github_package("Azure/azureml-sdk-for-r")
env <- r_environment(name = "r_env",
                     github_packages = list(pkg1))
See Also
Define grid sampling over a hyperparameter search space
Description
Grid sampling performs a simple grid search over all feasible values in
the defined search space. It can only be used with hyperparameters
specified using choice()
.
Usage
grid_parameter_sampling(parameter_space)
Arguments
parameter_space |
A named list containing each parameter and its
distribution, e.g. |
Value
The GridParameterSampling
object.
See Also
choice()
Examples
## Not run:
param_sampling <- grid_parameter_sampling(list("num_hidden_layers" = choice(c(1, 2, 3)),
"batch_size" = choice(c(16, 32))))
## End(Not run)
Create a configuration for a HyperDrive run
Description
The HyperDrive configuration includes information about hyperparameter space sampling, termination policy, primary metric, estimator, and the compute target to execute the experiment runs on.
To submit the HyperDrive experiment, pass the HyperDriveConfig
object
returned from this method to submit_experiment()
.
Usage
hyperdrive_config(
hyperparameter_sampling,
primary_metric_name,
primary_metric_goal,
max_total_runs,
max_concurrent_runs = NULL,
max_duration_minutes = 10080L,
policy = NULL,
estimator = NULL
)
Arguments
hyperparameter_sampling |
The hyperparameter sampling space.
Can be a |
primary_metric_name |
A string of the name of the primary metric reported by the experiment runs. |
primary_metric_goal |
The |
max_total_runs |
An integer of the maximum total number of runs
to create. This is the upper bound; there may be fewer runs when the
sample space is smaller than this value. If both |
max_concurrent_runs |
An integer of the maximum number of runs to
execute concurrently. If |
max_duration_minutes |
An integer of the maximum duration of the
HyperDrive run. Once this time is exceeded, any runs still executing are
cancelled. If both |
policy |
The early termination policy to use. Can be either a
The |
estimator |
The |
Value
The HyperDriveConfig
object.
See Also
submit_experiment()
Examples
## Not run:
# Load the workspace
ws <- load_workspace_from_config()
# Get the compute target
compute_target <- get_compute(ws, cluster_name = 'mycluster')
# Define the primary metric goal
goal = primary_metric_goal("MAXIMIZE")
# Define the early termination policy
early_termination_policy = median_stopping_policy(evaluation_interval = 1L,
delay_evaluation = 5L)
# Create the estimator
est <- estimator(source_directory = '.',
entry_script = 'train.R',
compute_target = compute_target)
# Create the HyperDrive configuration
hyperdrive_run_config = hyperdrive_config(
hyperparameter_sampling = param_sampling,
primary_metric_name = 'accuracy',
primary_metric_goal = goal,
max_total_runs = 100,
max_concurrent_runs = 4,
policy = early_termination_policy,
estimator = est)
# Submit the HyperDrive experiment
exp <- experiment(ws, name = 'myexperiment')
run = submit_experiment(exp, hyperdrive_run_config)
## End(Not run)
Create an inference configuration for model deployments
Description
The inference configuration describes how to configure the model to make
predictions. It references your scoring script (entry_script
) and is
used to locate all the resources required for the deployment. Inference
configurations use Azure Machine Learning environments (see r_environment()
)
to define the software dependencies needed for your deployment.
Usage
inference_config(
entry_script,
source_directory = ".",
description = NULL,
environment = NULL
)
Arguments
entry_script |
A string of the path to the local file that contains the code to run for making predictions. |
source_directory |
A string of the path to the local folder
that contains the files to package and deploy alongside your model, such as
helper files for your scoring script ( |
description |
(Optional) A string of the description to give this configuration. |
environment |
An |
Value
The InferenceConfig
object.
Defining the entry script
To deploy a model, you must provide an entry script that accepts requests, scores the requests by using the model, and returns the results. The entry script is specific to your model. It must understand the format of the incoming request data, the format of the data expected by your model, and the format of the data returned to clients. If the request data is in a format that is not usable by your model, the script can transform it into an acceptable format. It can also transform the response before returning it to the client.
The entry script must contain an init()
method that loads your model and
then returns a function that uses the model to make a prediction based on
the input data passed to the function. Azure ML runs the init()
method
once, when the Docker container for your web service is started. The
prediction function returned by init()
will be run every time the service
is invoked to make a prediction on some input data. The inputs and outputs
of this prediction function typically use JSON for serialization and
deserialization.
To locate the model in your entry script (when you load the model in the
script's init()
method), use AZUREML_MODEL_DIR
, an environment variable
containing the path to the model location. The environment variable is
created during service deployment, and you can use it to find the location
of your deployed model(s).
To get the path to a file in a model, combine the environment variable with the filename you're looking for. The filenames of the model files are preserved during registration and deployment.
Single model example:
model_path <- file.path(Sys.getenv("AZUREML_MODEL_DIR"), "my_model.rds")
Multiple model example:
model1_path <- file.path(Sys.getenv("AZUREML_MODEL_DIR"), "my_model/1/my_model.rds")
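The following is a minimal entry script sketch (an illustrative addition, not from the original manual); it assumes the model was registered as an RDS file named "my_model.rds" and that requests arrive as JSON (jsonlite is just one possible choice for parsing):
## Not run:
# score.R
library(jsonlite)

init <- function() {
  # Load the registered model from the deployment's model directory
  model_path <- file.path(Sys.getenv("AZUREML_MODEL_DIR"), "my_model.rds")
  model <- readRDS(model_path)

  # Return the prediction function that is invoked for each request
  function(data) {
    input <- fromJSON(data)
    prediction <- predict(model, newdata = as.data.frame(input))
    toJSON(prediction)
  }
}
## End(Not run)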
See Also
r_environment()
, deploy_model()
Install azureml sdk package
Description
Install azureml sdk package
Usage
install_azureml(
version = "1.10.0",
envname = "r-reticulate",
conda_python_version = "3.6",
restart_session = TRUE,
remove_existing_env = FALSE
)
Arguments
version |
azureml sdk package version |
envname |
name of environment to create, if environment other than default is desired |
conda_python_version |
version of python for conda environment |
restart_session |
restart R session after installation |
remove_existing_env |
delete the conda environment if already exists |
Value
None
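Examples
An illustrative call using the defaults shown above:
## Not run:
install_azureml(version = "1.10.0", envname = "r-reticulate")
## End(Not run)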
Manages authentication and acquires an authorization token in interactive login workflows.
Description
Interactive login authentication is suitable for local experimentation on your own computer, and is the default authentication model when using the Azure Machine Learning SDK. The constructor of the class will prompt you to log in, and will then save the credentials for any subsequent attempts. If you are already logged in with the Azure CLI or have logged in before, the constructor will load the existing credentials without prompting.
Usage
interactive_login_authentication(
force = FALSE,
tenant_id = NULL,
cloud = "AzureCloud"
)
Arguments
force |
Indicates whether "az login" will be run even if the old "az login" is still valid. |
tenant_id |
The string ID of the Active Directory tenant that the service identity belongs to. This can be used to specify a specific tenant when you have access to multiple tenants. If unspecified, the default tenant will be used. |
cloud |
The name of the target cloud. Can be one of "AzureCloud", "AzureChinaCloud", or "AzureUSGovernment". If no cloud is specified, "AzureCloud" is used. |
Value
InteractiveLoginAuthentication
object
Examples
interactive_auth <- interactive_login_authentication(tenant_id = "your-tenant-id")
ws <- get_workspace("<your workspace name>",
                    "<your subscription ID>",
                    "<your resource group>",
                    auth = interactive_auth)
See Also
get_workspace()
service_principal_authentication()
Call a web service with the provided input
Description
Invoke the web service with the provided input to receive
predictions from the deployed model. The structure of the
provided input data needs to match what the service's scoring
script and model expect. See the "Details" section of
inference_config()
.
Usage
invoke_webservice(webservice, input_data)
Arguments
webservice |
The |
input_data |
The input data to invoke the web service with. This is the data your model expects as an input to run predictions. |
Details
Instead of invoking the web service using invoke_webservice()
, you can
also consume the web service using the service's REST API. If you've
enabled key-based authentication for your service, you will need to provide
a service key as a token in your request header
(see get_webservice_keys()
). If you've enabled token-based
authentication, you will need to provide a JWT token as a bearer
token in your request header (see get_webservice_token()
).
To get the REST API address for the service's scoring endpoint, you can
access the following property from the Webservice object:
service$scoring_uri
Value
A named list of the result of calling the web service. This will return the predictions run from your model.
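Examples
An illustrative sketch (not from the original manual); it assumes the deployed scoring script expects a JSON payload and uses jsonlite to build it:
## Not run:
library(jsonlite)
ws <- load_workspace_from_config()
service <- get_webservice(ws, name = "my-service")
input_data <- toJSON(list(data = list(c(5.1, 3.5, 1.4, 0.2))))
predictions <- invoke_webservice(service, input_data)
## End(Not run)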
Keep the specified columns and drop all others from the dataset.
Description
Keep the specified columns and drop all others from the dataset. If a timeseries column is dropped, the corresponding capabilities will be dropped for the returned dataset as well. |
Usage
keep_columns_from_dataset(dataset, columns, validate = FALSE)
Arguments
dataset |
The Tabular Dataset object |
columns |
The name or a list of names for the columns to keep. |
validate |
Indicates whether to validate if data can be loaded from the returned dataset. The default is FALSE. Validation requires that the data source is accessible from the current compute. |
Value
A new Tabular Dataset object with only the specified columns kept.
Get the details (e.g. IP address, port, etc.) of all the compute nodes in the compute target
Description
Get the details (e.g. IP address, port, etc.) of all the compute nodes in the compute target.
Usage
list_nodes_in_aml_compute(cluster)
Arguments
cluster |
The cluster object. |
Value
A data frame of the details of all the compute nodes in the cluster.
List the secrets in a keyvault
Description
Returns the list of secret names for all the secrets in the keyvault associated with the workspace.
Usage
list_secrets(keyvault)
Arguments
keyvault |
The |
Value
A list of secret names.
List the supported VM sizes in a region
Description
List the supported VM sizes in a region
Usage
list_supported_vm_sizes(workspace, location = NULL)
Arguments
workspace |
The |
location |
A string of the location of the cluster. If not specified, will default to the workspace location. |
Value
A data frame of the supported VM sizes in a region, including the VM name, number of vCPUs, and RAM.
List all workspaces that the user has access to in a subscription ID
Description
List all workspaces that the user has access to in the specified
subscription_id
parameter. The list of workspaces can be filtered
based on the resource group.
Usage
list_workspaces(subscription_id, resource_group = NULL)
Arguments
subscription_id |
A string of the specified subscription ID to list the workspaces in. |
resource_group |
A string of the specified resource group to list
the workspaces. If |
Value
A named list of Workspace
objects where element name corresponds
to the workspace name.
Load all records from the dataset into a dataframe.
Description
Load all records from the dataset into a dataframe.
Usage
load_dataset_into_data_frame(
dataset,
on_error = "null",
out_of_range_datetime = "null"
)
Arguments
dataset |
The Tabular Dataset object. |
on_error |
How to handle any error values in the dataset, such as those produced by an error while parsing values. Valid values are 'null' which replaces them with NULL; and 'fail' which will result in an exception. |
out_of_range_datetime |
How to handle date-time values that are outside the range supported by Pandas. Valid values are 'null' which replaces them with NULL; and 'fail' which will result in an exception. |
Value
A data.frame.
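Examples
An illustrative sketch (not from the original manual); 'mydataset' is a hypothetical registration name:
## Not run:
ws <- load_workspace_from_config()
dataset <- get_dataset_by_name(ws, name = 'mydataset')
df <- load_dataset_into_data_frame(dataset)
## End(Not run)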
Load workspace configuration details from a config file
Description
Returns a Workspace
object for an existing Azure Machine Learning
workspace by reading the workspace configuration from a file. The method
provides a simple way of reusing the same workspace across multiple files or
projects. Users can save the workspace ARM properties using
write_workspace_config()
, and use this method to load the same workspace
in different files or projects without retyping the workspace ARM properties.
Usage
load_workspace_from_config(path = NULL, file_name = NULL)
Arguments
path |
A string of the path to the config file or starting directory for search. The parameter defaults to starting the search in the current directory. |
file_name |
A string that will override the config file name to search for when path is a directory path. |
Value
The Workspace
object.
See Also
Create a deployment config for deploying a local web service
Description
You can deploy a model locally for limited testing and troubleshooting. To do so, you will need to have Docker installed on your local machine.
If you are using an Azure Machine Learning Compute Instance for development, you can also deploy locally on your compute instance.
Usage
local_webservice_deployment_config(port = NULL)
Arguments
port |
An int of the local port on which to expose the service's HTTP endpoint. |
Value
The LocalWebserviceDeploymentConfiguration
object.
Examples
## Not run:
deployment_config <- local_webservice_deployment_config(port = 8890)
## End(Not run)
Log an accuracy table metric to a run
Description
The accuracy table metric is a multi-use non-scalar metric that can be used to produce multiple types of line charts that vary continuously over the space of predicted probabilities. Examples of these charts are ROC, precision-recall, and lift curves.
Usage
log_accuracy_table_to_run(name, value, description = "", run = NULL)
Arguments
name |
A string of the name of the metric. |
value |
A named list containing name, version, and data properties. |
description |
(Optional) A string of the metric description. |
run |
The |
Details
The calculation of the accuracy table is similar to the calculation of an ROC curve. An ROC curve stores true positive rates and false positive rates at many different probability thresholds. The accuracy table stores the raw number of true positives, false positives, true negatives, and false negatives at many probability thresholds.
There are two methods used for selecting thresholds: "probability" and "percentile." They differ in how they sample from the space of predicted probabilities.
Probability thresholds are uniformly spaced thresholds between 0 and 1. If NUM_POINTS were 5 the probability thresholds would be c(0.0, 0.25, 0.5, 0.75, 1.0).
Percentile thresholds are spaced according to the distribution of predicted probabilities. Each threshold corresponds to the percentile of the data at a probability threshold. For example, if NUM_POINTS were 5, then the first threshold would be at the 0th percentile, the second at the 25th percentile, the third at the 50th, and so on.
The probability tables and percentile tables are both 3D lists where the first dimension represents the class label, the second dimension represents the sample at one threshold (scales with NUM_POINTS), and the third dimension always has 4 values: TP, FP, TN, FN, and always in that order.
The confusion values (TP, FP, TN, FN) are computed with the one vs. rest strategy. See the following link for more details: https://en.wikipedia.org/wiki/Multiclass_classification.
N = # of samples in validation dataset (200 in example), M = # thresholds = # samples taken from the probability space (5 in example), C = # classes in full dataset (3 in example)
Some invariants of the accuracy table:
TP + FP + TN + FN = N for all thresholds for all classes
TP + FN is the same at all thresholds for any class
TN + FP is the same at all thresholds for any class
Probability tables and percentile tables have shape (C, M, 4)
Note: M can be any value and controls the resolution of the charts. This is independent of the dataset, is defined when calculating metrics, and trades off storage space, computation time, and resolution.
Class labels should be strings, confusion values should be integers, and thresholds should be doubles.
Value
None
Log a confusion matrix metric to a run
Description
Log a confusion matrix metric to a run
Usage
log_confusion_matrix_to_run(name, value, description = "", run = NULL)
Arguments
name |
A string of the name of the metric. |
value |
A named list containing name, version, and data properties. |
description |
(Optional) A string of the metric description. |
run |
The |
Value
None
Log an image metric to a run
Description
Log an image to the run with the given metric name. Use
log_image_to_run()
to log an image file or ggplot2 plot to the
run. These images will be visible and comparable in the run
record.
Usage
log_image_to_run(name, path = NULL, plot = NULL, description = "", run = NULL)
Arguments
name |
A string of the name of the metric. |
path |
A string of the path or stream of the image. |
plot |
The ggplot2 plot to log as an image. |
description |
(Optional) A string of the metric description. |
run |
The |
Value
None
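Examples
An illustrative sketch (not from the original manual) that logs a ggplot2 plot to the current run:
## Not run:
library(ggplot2)
plt <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
log_image_to_run("mpg_vs_wt", plot = plt)
## End(Not run)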
Log a vector metric value to a run
Description
Log a vector with the given metric name to the run.
Usage
log_list_to_run(name, value, description = "", run = NULL)
Arguments
name |
A string of the name of metric. |
value |
The vector of elements to log. |
description |
(Optional) A string of the metric description. |
run |
The |
Value
None
Examples
log_list_to_run("Accuracies", c(0.6, 0.7, 0.87))
Log a metric to a run
Description
Log a numerical or string value with the given metric name to the run. Logging a metric to a run causes that metric to be stored in the run record in the experiment. You can log the same metric multiple times within a run, the result being considered a vector of that metric.
Usage
log_metric_to_run(name, value, run = NULL)
Arguments
name |
A string of the name of the metric. |
value |
The value of the metric. |
run |
The |
Value
None
Examples
log_metric_to_run("Accuracy", 0.95)
Log a predictions metric to a run
Description
log_predictions_to_run()
logs a metric score that can be used to
compare the distributions of true target values to the distribution
of predicted values for a regression task.
The predictions are binned and standard deviations are calculated for error bars on a line chart.
Usage
log_predictions_to_run(name, value, description = "", run = NULL)
Arguments
name |
A string of the name of the metric. |
value |
A named list containing name, version, and data properties. |
description |
(Optional) A string of the metric description. |
run |
The |
Value
None
Examples
data <- list("bin_averages" = c(0.25, 0.75), "bin_errors" = c(0.013, 0.042), "bin_counts" = c(56, 34), "bin_edges" = c(0.0, 0.5, 1.0)) predictions <- list("schema_type" = "predictions", "schema_version" = "v1", "data" = data) log_predictions_to_run("mypredictions", predictions)
Log a residuals metric to a run
Description
log_residuals_to_run()
logs the data needed to display a histogram
of residuals for a regression task. The residuals are predicted - actual
.
There should be one more edge than the number of counts.
Usage
log_residuals_to_run(name, value, description = "", run = NULL)
Arguments
name |
A string of the name of the metric. |
value |
A named list containing name, version, and data properties. |
description |
(Optional) A string of the metric description. |
run |
The |
Value
None
Examples
data <- list("bin_edges" = c(50, 100, 200, 300, 350), "bin_counts" = c(0.88, 20, 30, 50.99)) residuals <- list("schema_type" = "residuals", "schema_version" = "v1", "data" = data) log_predictions_to_run("myresiduals", predictions)
Log a row metric to a run
Description
Using log_row_to_run()
creates a metric with multiple columns
as described in ...
. Each named parameter generates a column
with the value specified. log_row_to_run()
can be called once
to log an arbitrary tuple, or multiple times in a loop to generate
a complete table.
Usage
log_row_to_run(name, description = "", run = NULL, ...)
Arguments
name |
A string of the name of metric. |
description |
(Optional) A string of the metric description. |
run |
The |
... |
Each named parameter generates a column with the value specified. |
Value
None
Examples
Log an arbitrary tuple:
log_row_to_run("Y over X", x = 1, y = 0.4)
Log the complete table:
citrus <- c("orange", "lemon", "lime") sizes <- c(10, 7, 3) for (i in seq_along(citrus)) { log_row_to_run("citrus", fruit = citrus[i], size = sizes[i]) }
Log a table metric to a run
Description
Log a table metric with the given metric name to the run. The table value is a named list where each element corresponds to a column of the table.
Usage
log_table_to_run(name, value, description = "", run = NULL)
Arguments
name |
A string of the name of metric. |
value |
The table value of the metric (a named list where the element name corresponds to the column name). |
description |
(Optional) A string of the metric description. |
run |
The |
Value
None
Examples
log_table_to_run("Y over X", list("x" = c(1, 2, 3), "y" = c(0.6, 0.7, 0.89)))
Specify a normal distribution of the form exp(normal(mu, sigma))
Description
Specify a normal distribution of the form exp(normal(mu, sigma))
.
The logarithm of the return value is normally distributed. When optimizing, this variable is constrained to be positive.
Usage
lognormal(mu, sigma)
Arguments
mu |
A double of the mean of the normal distribution. |
sigma |
A double of the standard deviation of the normal distribution. |
Value
A list of the stochastic expression.
See Also
random_parameter_sampling()
, grid_parameter_sampling()
,
bayesian_parameter_sampling()
Specify a log uniform distribution
Description
Specify a log uniform distribution.
A value is drawn according to exp(uniform(min_value, max_value))
so that
the logarithm of the return value is uniformly distributed. When optimizing,
this variable is constrained to the interval
[exp(min_value), exp(max_value)]
.
Usage
loguniform(min_value, max_value)
Arguments
min_value |
A double where the minimum value in the range will be
|
max_value |
A double where the maximum value in the range will be
|
Value
A list of the stochastic expression.
See Also
random_parameter_sampling()
, grid_parameter_sampling()
,
bayesian_parameter_sampling()
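Examples
An illustrative sketch (not from the original manual), sampling a hypothetical learning_rate hyperparameter on a log scale:
## Not run:
param_sampling <- random_parameter_sampling(
  list("learning_rate" = loguniform(-6, -1))
)
## End(Not run)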
Define a median stopping policy for early termination of HyperDrive runs
Description
Median stopping is an early termination policy based on running averages of primary metrics reported by the runs. This policy computes running averages across all training runs and terminates runs whose performance is worse than the median of the running averages. Specifically, a run will be canceled at interval N if its best primary metric reported up to interval N is worse than the median of the running averages for intervals 1:N across all runs.
Usage
median_stopping_policy(evaluation_interval = 1L, delay_evaluation = 0L)
Arguments
evaluation_interval |
An integer of the frequency for applying policy. |
delay_evaluation |
An integer of the number of intervals for which to delay the first evaluation. |
Value
The MedianStoppingPolicy
object.
Details
The median stopping policy takes the following optional configuration parameters:
-
evaluation_interval: Optional. The frequency for applying the policy. Each time the training script logs the primary metric counts as one interval.
-
delay_evaluation: Optional. The number of intervals to delay the policy evaluation. Use this parameter to avoid premature termination of training runs. If specified, the policy applies every multiple of evaluation_interval that is greater than or equal to delay_evaluation.
This policy is inspired from the research publication Google Vizier: A Service for Black-Box Optimization.
If you are looking for a conservative policy that provides savings without
terminating promising jobs, you can use a MedianStoppingPolicy
with
evaluation_interval = 1
and delay_evaluation = 5
. These are conservative
settings that can provide approximately 25%-35% savings with no loss on
the primary metric (based on our evaluation data).
Examples
# In this example, the early termination policy is applied at every
# interval starting at evaluation interval 5. A run will be terminated at
# interval 5 if its best primary metric is worse than the median of the
# running averages over intervals 1:5 across all training runs
## Not run:
early_termination_policy = median_stopping_policy(evaluation_interval = 1L,
delay_evaluation = 5L)
## End(Not run)
Combine the results from the parallel training.
Description
Combine the results from the parallel training.
Usage
merge_results(node_count, process_count_per_node, run, source_directory)
Arguments
node_count |
Number of nodes in the AmlCompute cluster. |
process_count_per_node |
Number of processes per node. |
run |
The run object whose output needs to be combined. |
source_directory |
The directory where the output from the run would be downloaded. |
Create a context manager for mounting file streams defined by the dataset as local files.
Description
Create a context manager for mounting file streams defined by the dataset as local files.
A context manager will be returned to manage the lifecycle of the mount.
To mount, you will need to enter the context manager and to unmount, exit from
the context manager. Mount is only supported on Unix or Unix-like operating systems
and libfuse must be present. If you are running inside a docker container, the docker
container must be started with the --privileged
flag or started with
--cap-add SYS_ADMIN --device /dev/fuse
.
Usage
mount_file_dataset(dataset, mount_point = NULL)
Arguments
dataset |
The Dataset object. |
mount_point |
The local directory to mount the files to. If NULL, the data will be mounted into a temporary directory. |
Value
Returns a context manager for managing the lifecycle of the mount of
type azureml.dataprep.fuse.daemon.MountContext
.
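Examples
An illustrative sketch (not from the original manual); it assumes the returned context object exposes start() and stop() methods, as the underlying Python MountContext does, and uses a hypothetical mount point:
## Not run:
mount_context <- mount_file_dataset(dataset, mount_point = "/tmp/mydata")
mount_context$start()    # mount the file streams
files <- list.files("/tmp/mydata")
mount_context$stop()     # unmount
## End(Not run)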
Specify a real value that is normally-distributed with mean mu
and standard
deviation sigma
Description
Specify a real value that is normally-distributed with mean mu
and
standard deviation sigma
.
When optimizing, this is an unconstrained variable.
Usage
normal(mu, sigma)
Arguments
mu |
A double of the mean of the normal distribution. |
sigma |
A double of the standard deviation of the normal distribution. |
Value
A list of the stochastic expression.
See Also
random_parameter_sampling()
, grid_parameter_sampling()
,
bayesian_parameter_sampling()
Create a model package that packages all the assets needed to host a model as a web service
Description
In some cases, you might want to create a Docker image without deploying the model (for example, if you plan to deploy to Azure App Service). Or you might want to download the image and run it on a local Docker installation. You might even want to download the files used to build the image, inspect them, modify them, and build the image manually.
Model packaging enables you to do these things. package_model()
packages all
the assets needed to host a model as a web service and allows you to download
either a fully built Docker image or the files needed to build one. There are
two ways to use model packaging:
-
Download a packaged model: Download a Docker image that contains the model and other files needed to host it as a web service.
-
Generate a Dockerfile: Download the Dockerfile, model, entry script, and other assets needed to build a Docker image. You can then inspect the files or make changes before you build the image locally. To use this method, make sure to set generate_dockerfile = TRUE.
In either scenario, you will need to have Docker installed in your development environment.
Usage
package_model(workspace, models, inference_config, generate_dockerfile = FALSE)
Arguments
workspace |
The |
models |
A list of |
inference_config |
The |
generate_dockerfile |
If |
Value
The ModelPackage
object.
See Also
wait_for_model_package_creation()
, get_model_package_container_registry()
,
get_model_package_creation_logs()
, pull_model_package_image()
,
save_model_package_files()
Examples
# Package a registered model
## Not run:
ws <- load_workspace_from_config()
model <- get_model(ws, name = "my_model")
r_env <- r_environment(name = "r_env")
inference_config <- inference_config(entry_script = "score.R",
source_directory = ".",
environment = r_env)
package <- package_model(ws,
models = list(model),
inference_config = inference_config)
wait_for_model_package_creation(show_output = TRUE)
## End(Not run)
Generate table of run details
Description
Plot a table of run details including: ID, Status, Start Time, Duration, Script Name, Arguments, Link to Web Portal view, and Errors.
Usage
plot_run_details(run)
Arguments
run |
The |
Value
A datatable containing the run details.
Define supported metric goals for hyperparameter tuning
Description
A metric goal is used to determine whether a higher value for a metric is better or worse. Metric goals are used when comparing runs based on the primary metric. For example, you may want to maximize accuracy or minimize error.
The primary metric name and goal are specified to hyperdrive_config()
when you configure a HyperDrive run.
Usage
primary_metric_goal(goal)
Arguments
goal |
A string of the metric goal (either "MAXIMIZE" or "MINIMIZE"). |
Value
The PrimaryMetricGoal
object.
Defines options for how column headers are processed when reading data from files to create a dataset.
Description
Defines options for how column headers are processed when reading data from files to create a dataset. These enumeration values are used in the Dataset class method.
Usage
promote_headers_behavior(option)
Arguments
option |
An integer corresponding to an option for how column headers are to be processed
|
Value
The PromoteHeadersBehavior object.
Pull the Docker image from a ModelPackage
to your local
Docker environment
Description
Pull the Docker image from a created ModelPackage
to your
local Docker environment. The output of this call will
display the name of the image. For example:
Status: Downloaded newer image for myworkspacef78fd10.azurecr.io/package:20190822181338
.
This can only be used with a Docker image ModelPackage
(where
package_model()
was called with generate_dockerfile = FALSE
).
After you've pulled the image, you can start a local container based on this image using Docker commands.
Usage
pull_model_package_image(package)
Arguments
package |
The |
Value
None
See Also
package_model()
Specify a normal distribution of the form
round(exp(normal(mu, sigma)) / q) * q
Description
Specify a normal distribution of the form
round(exp(normal(mu, sigma)) / q) * q
.
Suitable for a discrete variable with respect to which the objective is smooth and gets smoother with the size of the variable, which is bounded from one side.
Usage
qlognormal(mu, sigma, q)
Arguments
mu |
A double of the mean of the normal distribution. |
sigma |
A double of the standard deviation of the normal distribution. |
q |
An integer of the smoothing factor. |
Value
A list of the stochastic expression.
See Also
random_parameter_sampling()
, grid_parameter_sampling()
,
bayesian_parameter_sampling()
Specify a uniform distribution of the form
round(exp(uniform(min_value, max_value)) / q) * q
Description
Specify a uniform distribution of the form
round(exp(uniform(min_value, max_value)) / q) * q
.
This is suitable for a discrete variable with respect to which the objective is "smooth", and gets smoother with the size of the value, but which should be bounded both above and below.
Usage
qloguniform(min_value, max_value, q)
Arguments
min_value |
A double of the minimum value in the range (inclusive). |
max_value |
A double of the maximum value in the range (inclusive). |
q |
An integer of the smoothing factor. |
Value
A list of the stochastic expression.
See Also
random_parameter_sampling()
, grid_parameter_sampling()
,
bayesian_parameter_sampling()
Specify a normal distribution of the form round(normal(mu, sigma) / q) * q
Description
Specify a normal distribution of the form round(normal(mu, sigma) / q) * q
.
Suitable for a discrete variable that probably takes a value around mu
,
but is fundamentally unbounded.
Usage
qnormal(mu, sigma, q)
Arguments
mu |
A double of the mean of the normal distribution. |
sigma |
A double of the standard deviation of the normal distribution. |
q |
An integer of the smoothing factor. |
Value
A list of the stochastic expression.
See Also
random_parameter_sampling()
, grid_parameter_sampling()
,
bayesian_parameter_sampling()
Specify a uniform distribution of the form
round(uniform(min_value, max_value) / q) * q
Description
Specify a uniform distribution of the form
round(uniform(min_value, max_value) / q) * q
.
This is suitable for a discrete value with respect to which the objective is still somewhat "smooth", but which should be bounded both above and below.
Usage
quniform(min_value, max_value, q)
Arguments
min_value |
A double of the minimum value in the range (inclusive). |
max_value |
A double of the maximum value in the range (inclusive). |
q |
An integer of the smoothing factor. |
Value
A list of the stochastic expression.
See Also
random_parameter_sampling()
, grid_parameter_sampling()
,
bayesian_parameter_sampling()
Create an environment
Description
Configure the R environment to be used for training or web service
deployments. When you submit a run or deploy a model, Azure ML builds a
Docker image and creates a conda environment with your specifications from
your Environment
object within that Docker container.
If the custom_docker_image
parameter
is not set, Azure ML will build a predefined base image (CPU or GPU
depending on the use_gpu
flag) and install any R packages specified in the
cran_packages
, github_packages
, or custom_url_packages
parameters.
Usage
r_environment(
name,
version = NULL,
environment_variables = NULL,
r_version = NULL,
rscript_path = NULL,
snapshot_date = NULL,
cran_packages = NULL,
github_packages = NULL,
custom_url_packages = NULL,
bioconductor_packages = NULL,
custom_docker_image = NULL,
image_registry_details = NULL,
use_gpu = FALSE,
shm_size = NULL
)
Arguments
name |
A string of the name of the environment. |
version |
A string of the version of the environment. |
environment_variables |
A named list of environment variables names and values. These environment variables are set on the process where the user script is being executed. |
r_version |
The version of R to be installed. |
rscript_path |
The Rscript path to use if an environment build is not required. The path specified gets used to call the user script. |
snapshot_date |
Date of MRAN snapshot to use. |
cran_packages |
A list of |
github_packages |
A list of |
custom_url_packages |
A character vector of packages to be installed from local directory or custom URL. |
bioconductor_packages |
A character vector of packages to be installed from Bioconductor. |
custom_docker_image |
A string of the name of the Docker image from
which the image to use for training or deployment will be built. If not set,
a predefined Docker image will be used. To use an image from a private Docker
repository, you will also have to specify the |
image_registry_details |
A |
use_gpu |
Indicates whether the environment should support GPUs.
If |
shm_size |
A string for the size of the Docker container's shared
memory block. For more information, see
the Docker run reference.
If not set, a default value of |
Value
The Environment
object.
Details
Once built, the Docker image appears in the Azure Container Registry associated with your workspace, by default. The repository name has the form azureml/azureml_<uuid>. The unique identifier (uuid) part corresponds to a hash computed from the environment configuration. This allows the service to determine whether an image corresponding to the given environment already exists for reuse.
If you make changes to an existing environment, such as adding an R package, a new version of the environment is created when you either submit a run, deploy a model, or manually register the environment. The versioning allows you to view changes to the environment over time.
Predefined Docker images
When submitting a training job or deploying a model, Azure ML runs your
training script or scoring script within a Docker container. If no custom
Docker image is specified with the custom_docker_image
parameter, Azure
ML will build a predefined CPU or GPU Docker image. The predefined images extend
the Ubuntu 16.04 Azure ML base images
and include the following dependencies:
Dependencies | Version | Remarks |
azuremlsdk | latest | (from GitHub) |
R | 3.6.0 | - |
Commonly used R packages | - | 80+ of the most popular R packages for data science, including the IRKernel, dplyr, shiny, ggplot2, tidyr, caret, and nnet. For the full list of packages included, see here. |
Python | 3.7.0 | - |
azureml-defaults | latest | azureml-defaults contains the
azureml-core and applicationinsights packages of the Python SDK that
are required for tasks such as logging metrics, uploading artifacts, and
deploying models. (from pip) |
rpy2 | latest | (from conda) |
CUDA (GPU image only) | 10.0 | CuDNN (version 7) is also included |
See Also
estimator()
, inference_config()
Examples
# The following example defines an environment that will build the default
# base CPU image.
## Not run:
r_env <- r_environment(name = 'myr_env',
version = '1')
## End(Not run)
Specify a set of random integers in the range [0, upper)
Description
Specify a set of random integers in the range [0, upper)
to sample the hyperparameters from.
The semantics of this distribution are that the loss function is no
more correlated between nearby integer values than between more
distant integer values. This is an appropriate distribution for
describing random seeds, for example.
If the loss function is likely to be more correlated for nearby integer
values, you should instead use one of the "quantized" continuous
distributions, such as either quniform()
, qloguniform()
, qnormal()
,
or qlognormal()
.
Usage
randint(upper)
Arguments
upper |
An integer of the upper bound for the range of integers (exclusive). |
Value
A list of the stochastic expression.
See Also
random_parameter_sampling()
, grid_parameter_sampling()
,
bayesian_parameter_sampling()
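Examples
A minimal illustrative sketch of using randint() within a sampling space; the hyperparameter name and upper bound below are placeholders.
## Not run:
param_sampling <- random_parameter_sampling(list(
    "random_seed" = randint(1000L)))
## End(Not run)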
Define random sampling over a hyperparameter search space
Description
In random sampling, hyperparameter values are randomly selected from the defined search space. Random sampling allows the search space to include both discrete and continuous hyperparameters.
Usage
random_parameter_sampling(parameter_space, properties = NULL)
Arguments
parameter_space |
A named list containing each parameter and its
distribution, e.g. |
properties |
A named list of additional properties for the algorithm. |
Value
The RandomParameterSampling
object.
Details
In this sampling algorithm, parameter values are chosen from a set of
discrete values or a distribution over a continuous range. Functions you can
use include:
choice()
, randint()
, uniform()
, quniform()
, loguniform()
,
qloguniform()
, normal()
, qnormal()
, lognormal()
, and qlognormal()
.
See Also
choice()
, randint()
, uniform()
, quniform()
, loguniform()
,
qloguniform()
, normal()
, qnormal()
, lognormal()
, qlognormal()
Examples
## Not run:
param_sampling <- random_parameter_sampling(list("learning_rate" = normal(10, 3),
"keep_probability" = uniform(0.05, 0.1),
"batch_size" = choice(c(16, 32, 64, 128))))
## End(Not run)
Split file streams in the dataset into two parts randomly and approximately by the percentage specified.
Description
Split file streams in the dataset into two parts randomly and approximately by the percentage specified.
Usage
random_split_dataset(dataset, percentage, seed = NULL)
Arguments
dataset |
The Dataset object. |
percentage |
The approximate percentage to split the Dataset by. This must be a number between 0.0 and 1.0. |
seed |
An optional seed to use for the random generator. |
Value
A new Dataset object representing the two datasets after the split.
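Examples
A minimal sketch; file_dataset stands in for a Dataset object obtained elsewhere (it is not defined here), and the 80/20 split and seed are illustrative.
## Not run:
splits <- random_split_dataset(file_dataset, percentage = 0.8, seed = 123L)
## End(Not run)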
Register an Azure blob container as a datastore
Description
Register an Azure blob container as a datastore. You can choose to use either the SAS token or the storage account key.
Usage
register_azure_blob_container_datastore(
workspace,
datastore_name,
container_name,
account_name,
sas_token = NULL,
account_key = NULL,
protocol = NULL,
endpoint = NULL,
overwrite = FALSE,
create_if_not_exists = FALSE,
skip_validation = FALSE,
blob_cache_timeout = NULL,
grant_workspace_access = FALSE,
subscription_id = NULL,
resource_group = NULL
)
Arguments
workspace |
The |
datastore_name |
A string of the name of the datastore. The name is case insensitive and can only contain alphanumeric characters and underscores. |
container_name |
A string of the name of the Azure blob container. |
account_name |
A string of the storage account name. |
sas_token |
A string of the account SAS token. |
account_key |
A string of the storage account key. |
protocol |
A string of the protocol to use to connect to the
blob container. If |
endpoint |
A string of the endpoint of the blob container.
If |
overwrite |
If |
create_if_not_exists |
If |
skip_validation |
If |
blob_cache_timeout |
An integer of the cache timeout in seconds
when this blob is mounted. If |
grant_workspace_access |
If |
subscription_id |
A string of the subscription id of the storage account. |
resource_group |
A string of the resource group of the storage account. |
Value
The AzureBlobDatastore
object.
Details
In general we recommend Azure Blob storage over Azure File storage. Both standard and premium storage are available for blobs. Although more expensive, we suggest premium storage due to faster throughput speeds that may improve the speed of your training runs, particularly if you train against a large dataset.
Examples
## Not run:
ws <- load_workspace_from_config()
ds <- register_azure_blob_container_datastore(ws,
datastore_name = 'mydatastore',
container_name = 'myazureblobcontainername',
account_name = 'mystorageaccountname',
account_key = 'mystorageaccountkey')
## End(Not run)
Initialize a new Azure Data Lake Gen2 Datastore.
Description
Initialize a new Azure Data Lake Gen2 Datastore.
Usage
register_azure_data_lake_gen2_datastore(
workspace,
datastore_name,
filesystem,
account_name,
tenant_id,
client_id,
client_secret,
resource_url = NULL,
authority_url = NULL,
protocol = NULL,
endpoint = NULL,
overwrite = FALSE
)
Arguments
workspace |
The workspace this datastore belongs to. |
datastore_name |
The datastore name. |
filesystem |
The name of the Data Lake Gen2 filesystem. |
account_name |
The storage account name. |
tenant_id |
The Directory ID/Tenant ID of the service principal. |
client_id |
The Client ID/Application ID of the service principal. |
client_secret |
The secret of the service principal. |
resource_url |
The resource URL, which determines what operations will be performed on the data lake store, defaults to https://storage.azure.com/ which allows us to perform filesystem operations. |
authority_url |
The authority URL used to authenticate the user, defaults to "https://login.microsoftonline.com". |
protocol |
Protocol to use to connect to the blob container. If NULL, defaults to "https". |
endpoint |
The endpoint of the blob container. If NULL, defaults to "core.windows.net". |
overwrite |
Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. The default is FALSE. |
Value
The azureml.data.azure_data_lake_datastore.AzureDataLakeGen2Datastore
object.
Examples
# Create and register an Azure Data Lake Gen2 Datastore to a workspace.
my_adlsgen2_ds <- register_azure_data_lake_gen2_datastore(
    workspace = your_workspace,
    datastore_name = '<name for this datastore>',
    filesystem = 'test',
    account_name = '<storage account name>',
    tenant_id = your_workspace$auth$tenant_id,
    client_id = your_workspace$auth$service_principal_id,
    client_secret = your_workspace$auth$service_principal_password)
See Also
unregister_datastore()
, get_datastore()
Register an Azure file share as a datastore
Description
Register an Azure file share as a datastore. You can choose to use either the SAS token or the storage account key.
Usage
register_azure_file_share_datastore(
workspace,
datastore_name,
file_share_name,
account_name,
sas_token = NULL,
account_key = NULL,
protocol = NULL,
endpoint = NULL,
overwrite = FALSE,
create_if_not_exists = FALSE,
skip_validation = FALSE
)
Arguments
workspace |
The |
datastore_name |
A string of the name of the datastore. The name is case insensitive and can only contain alphanumeric characters and underscores. |
file_share_name |
A string of the name of the Azure file share. |
account_name |
A string of the storage account name. |
sas_token |
A string of the account SAS token. |
account_key |
A string of the storage account key. |
protocol |
A string of the protocol to use to connect to the
file store. If |
endpoint |
A string of the endpoint of the file store.
If |
overwrite |
If |
create_if_not_exists |
If |
skip_validation |
If |
Value
The AzureFileDatastore
object.
Details
In general we recommend Azure Blob storage over Azure File storage. Both standard and premium storage are available for blobs. Although more expensive, we suggest premium storage due to faster throughput speeds that may improve the speed of your training runs, particularly if you train against a large dataset.
Examples
## Not run:
ws <- load_workspace_from_config()
ds <- register_azure_file_share_datastore(ws,
datastore_name = 'mydatastore',
file_share_name = 'myazurefilesharename',
account_name = 'mystorageaccountname',
account_key = 'mystorageaccountkey')
## End(Not run)
Initialize a new Azure PostgreSQL Datastore.
Description
Initialize a new Azure PostgreSQL Datastore.
Usage
register_azure_postgre_sql_datastore(
workspace,
datastore_name,
server_name,
database_name,
user_id,
user_password,
port_number = NULL,
endpoint = NULL,
overwrite = FALSE
)
Arguments
workspace |
The workspace this datastore belongs to. |
datastore_name |
The datastore name. |
server_name |
The PostgreSQL server name. |
database_name |
The PostgreSQL database name. |
user_id |
The User ID of the PostgreSQL server. |
user_password |
The User Password of the PostgreSQL server. |
port_number |
The Port Number of the PostgreSQL server. |
endpoint |
The endpoint of the PostgreSQL server. If NULL, defaults to postgres.database.azure.com. |
overwrite |
Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. The default is FALSE. |
Value
The azureml.data.azure_postgre_sql_datastore.AzurePostgreSqlDatastore
object.
Initialize a new Azure SQL database Datastore.
Description
Initialize a new Azure SQL database Datastore.
Usage
register_azure_sql_database_datastore(
workspace,
datastore_name,
server_name,
database_name,
tenant_id,
client_id,
client_secret,
resource_url = NULL,
authority_url = NULL,
endpoint = NULL,
overwrite = FALSE,
username = NULL,
password = NULL
)
Arguments
workspace |
The workspace this datastore belongs to. |
datastore_name |
The datastore name. |
server_name |
The SQL server name. |
database_name |
The SQL database name. |
tenant_id |
The Directory ID/Tenant ID of the service principal. |
client_id |
The Client ID/Application ID of the service principal. |
client_secret |
The secret of the service principal. |
resource_url |
The resource URL, which determines what operations will be performed on the SQL database store, if NULL, defaults to https://database.windows.net/. |
authority_url |
The authority URL used to authenticate the user, defaults to https://login.microsoftonline.com. |
endpoint |
The endpoint of the SQL server. If NULL, defaults to database.windows.net. |
overwrite |
Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. The default is FALSE. |
username |
The username of the database user to access the database. |
password |
The password of the database user to access the database. |
Value
The azureml.data.azure_sql_database_datastore.AzureSqlDatabaseDatastore
object.
Register a Dataset in the workspace
Description
Register the Dataset in the workspace, making it available to other users of the workspace.
Usage
register_dataset(
workspace,
dataset,
name,
description = NULL,
tags = NULL,
create_new_version = FALSE
)
Arguments
workspace |
The AzureML workspace in which the Dataset is to be registered. |
dataset |
The dataset to be registered. |
name |
The name of the Dataset in the workspace. |
description |
A description of the Dataset. |
tags |
Named list of tags to give the Dataset. Defaults to NULL. |
create_new_version |
Boolean to register the dataset as a new version under the specified name. |
Value
The registered Dataset object.
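Examples
A minimal sketch; my_dataset stands in for a Dataset object created elsewhere and is not defined here, and the name and description are placeholders.
## Not run:
ws <- load_workspace_from_config()
registered_ds <- register_dataset(ws,
                                  dataset = my_dataset,
                                  name = 'mydataset',
                                  description = 'my training data',
                                  create_new_version = TRUE)
## End(Not run)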
Registers AMLCompute as a parallel backend with the foreach package.
Description
Registers AMLCompute as a parallel backend with the foreach package.
Usage
register_do_azureml_parallel(workspace, compute_target)
Arguments
workspace |
The Workspace object which has the compute_target. |
compute_target |
The AMLCompute target to use for parallelization. |
Register an environment in the workspace
Description
The environment is automatically registered with your workspace when you
submit an experiment or deploy a web service. You can also manually register
the environment with register_environment()
. This operation makes the
environment into an entity that is tracked and versioned in the cloud, and
can be shared between workspace users.
When used for the first time in training or deployment, the environment is registered with the workspace, built, and deployed on the compute target. Environments are cached by the service. Reusing a cached environment takes much less time than using a new environment or one that has been updated.
Usage
register_environment(environment, workspace)
Arguments
environment |
The |
workspace |
The |
Value
The Environment
object.
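Examples
A minimal sketch that defines an environment with r_environment() and registers it manually; the environment name and version are placeholders.
## Not run:
ws <- load_workspace_from_config()
env <- r_environment(name = 'myr_env', version = '1')
registered_env <- register_environment(env, ws)
## End(Not run)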
Register a model to a given workspace
Description
Register a model to the given workspace. A registered model is a logical container for one or more files that make up your model. For example, if you have a model that's stored in multiple files, you can register them as a single model in your workspace. After registration, you can then download or deploy the registered model and receive all the files that were registered.
Models are identified by name and version. Each time you register a model with the same name as an existing one, your workspace's model registry assumes that it's a new version. The version is incremented, and the new model is registered under the same name.
Usage
register_model(
workspace,
model_path,
model_name,
datasets = NULL,
tags = NULL,
properties = NULL,
description = NULL,
child_paths = NULL,
sample_input_dataset = NULL,
sample_output_dataset = NULL,
resource_configuration = NULL
)
Arguments
workspace |
The |
model_path |
A string of the path on the local file system where
the model assets are located. This can be a direct pointer to a single
file or folder. If pointing to a folder, the |
model_name |
A string of the name to register the model with. |
datasets |
A list of two-element lists where the first element is the
dataset-model relationship and the second is the corresponding dataset, e.g.
|
tags |
A named list of key-value tags to give the model, e.g.
|
properties |
A named list of key-value properties to give the model,
e.g. |
description |
A string of the text description of the model. |
child_paths |
A list of strings of child paths of a folder specified
by |
sample_input_dataset |
Sample input dataset for the registered model. |
sample_output_dataset |
Sample output dataset for the registered model. |
resource_configuration |
A ResourceConfiguration object to run the registered model. |
Value
The Model
object.
Examples
Registering a model from a single file:
ws <- load_workspace_from_config()
model <- register_model(ws,
                        model_path = "my_model.rds",
                        model_name = "my_model",
                        datasets = list(list("training", train_dataset)))
See Also
Register a model for operationalization.
Description
Register a model for operationalization.
Usage
register_model_from_run(
run,
model_name,
model_path = NULL,
tags = NULL,
properties = NULL,
description = NULL,
datasets = NULL,
sample_input_dataset = NULL,
sample_output_dataset = NULL,
resource_configuration = NULL
)
Arguments
run |
The |
model_name |
The name of the model. |
model_path |
The relative cloud path to the model, for example,
"outputs/modelname". When not specified, |
tags |
A named list of key-value tags to assign to the model. |
properties |
A named list of key-value properties to assign to the model. These properties cannot be changed after model creation, however new key-value pairs can be added. |
description |
An optional description of the model. |
datasets |
A list of two-element lists where the first element is the
dataset-model relationship and the second is the corresponding dataset, e.g.
|
sample_input_dataset |
Sample input dataset for the registered model. |
sample_output_dataset |
Sample output dataset for the registered model. |
resource_configuration |
A ResourceConfiguration object to run the registered model. |
Value
The registered Model.
Examples
registered_model <- register_model_from_run(run = run,
                                            model_name = "my model",
                                            model_path = 'outputs/model.rds',
                                            tags = list("version" = "0"),
                                            datasets = list(list("training", train_dataset),
                                                            list("validation", validation_dataset)),
                                            resource_configuration = resource_configuration(2, 2, 0))
See Also
Reload a local web service's entry script and dependencies
Description
This restarts the service's container with copies of updated assets,
including the entry script and local dependencies, but it does not
rebuild the underlying image. Accordingly, changes to the environment
will not be reflected in the reloaded local web service. To handle those
changes call update_local_webservice()
instead.
Usage
reload_local_webservice_assets(webservice, wait = FALSE)
Arguments
webservice |
The |
wait |
If |
Value
None
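Examples
A minimal sketch; local_service stands in for a LocalWebservice object returned by an earlier local deployment and is not defined here.
## Not run:
reload_local_webservice_assets(local_service, wait = TRUE)
## End(Not run)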
Initialize the ResourceConfiguration.
Description
Initialize the ResourceConfiguration.
Usage
resource_configuration(cpu = NULL, memory_in_gb = NULL, gpu = NULL)
Arguments
cpu |
The number of CPU cores to allocate for this resource. Can be a decimal. |
memory_in_gb |
The amount of memory (in GB) to allocate for this resource.
Can be a decimal. If |
gpu |
The number of GPUs to allocate for this resource. |
Value
The ResourceConfiguration
object.
See Also
Examples
## Not run:
rc <- resource_configuration(2, 2, 0)
registered_model <- register_model_from_run(run, "my_model_name",
"path_to_my_model",
resource_configuration = rc)
## End(Not run)
Save a Dockerfile and dependencies from a ModelPackage
to
your local file system
Description
Download the Dockerfile, model, and other assets needed to build
an image locally from a created ModelPackage
.
This can only be used with a Dockerfile ModelPackage
(where
package_model()
was called with generate_dockerfile = TRUE
to
indicate that you want only the files and not a fully built image).
save_model_package_files()
downloads the files needed to build the
image to the output_directory
. The Dockerfile included in the saved
files references a base image stored in an Azure container registry.
When you build the image on your local Docker installation, you will
need the address, username, and password to authenticate to the registry.
You can get this information using get_model_package_container_registry()
.
Usage
save_model_package_files(package, output_directory)
Arguments
package |
The |
output_directory |
A string of the local directory that will be created to contain the contents of the package. |
Value
None
See Also
package_model()
, get_model_package_container_registry()
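Examples
A minimal sketch; package stands in for a ModelPackage created with package_model(..., generate_dockerfile = TRUE) and is not defined here, and the output directory name is a placeholder.
## Not run:
save_model_package_files(package, output_directory = 'my_package_files')
# Use get_model_package_container_registry() to retrieve the address, username,
# and password needed to pull the base image referenced in the Dockerfile.
## End(Not run)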
Manages authentication using a service principal instead of a user identity.
Description
Service principal authentication is suitable for automated workflows, such as CI/CD scenarios. This type of authentication decouples the authentication process from any specific user login and allows for managed access control.
Usage
service_principal_authentication(
tenant_id,
service_principal_id,
service_principal_password,
cloud = "AzureCloud"
)
Arguments
tenant_id |
The string id of the active directory tenant that the service identity belongs to. |
service_principal_id |
The service principal ID string. |
service_principal_password |
The service principal password/key string. |
cloud |
The name of the target cloud. Can be one of "AzureCloud", "AzureChinaCloud", or "AzureUSGovernment". If no cloud is specified, "AzureCloud" is used. |
Value
ServicePrincipalAuthentication
object
Examples
Service principal authentication involves creating an App Registration in
Azure Active Directory. First, you generate a client secret, and then you grant
your service principal role access to your machine learning workspace. Then,
you use the ServicePrincipalAuthentication
object to manage your authentication flow.
svc_pr_password <- Sys.getenv("AZUREML_PASSWORD")
svc_pr <- service_principal_authentication(tenant_id = "my-tenant-id",
                                           service_principal_id = "my-application-id",
                                           service_principal_password = svc_pr_password)
ws <- get_workspace("<your workspace name>",
                    "<your subscription ID>",
                    "<your resource group>",
                    auth = svc_pr)
See Also
get_workspace()
interactive_login_authentication()
Set the default datastore for a workspace
Description
Set the default datastore associated with the workspace.
Usage
set_default_datastore(workspace, datastore_name)
Arguments
workspace |
The |
datastore_name |
The name of the datastore to be set as default. |
Value
None
See Also
Add secrets to a keyvault
Description
Add a named list of secrets into the keyvault associated with the workspace.
Usage
set_secrets(keyvault, secrets)
Arguments
keyvault |
The |
secrets |
The named list of secrets to be added to the keyvault, where element name corresponds to the secret name. |
Value
None
Examples
## Not run:
ws <- load_workspace_from_config()
my_secret <- Sys.getenv("MY_SECRET")
keyvault <- get_default_keyvault(ws)
set_secrets(keyvault, secrets = list("mysecret" = my_secret))
## End(Not run)
Skip file streams from the top of the dataset by the specified count.
Description
Skip file streams from the top of the dataset by the specified count.
Usage
skip_from_dataset(dataset, count)
Arguments
dataset |
The Dataset object. |
count |
The number of file streams to skip. |
Value
A new Dataset object representing the dataset with file streams skipped.
Splits the job into parallel tasks.
Description
Splits the job into parallel tasks.
Usage
split_tasks(args_list, node_count, process_count_per_node)
Arguments
args_list |
The list of arguments which are distributed across all the processes. |
node_count |
Number of nodes in the AmlCompute cluster. |
process_count_per_node |
Number of processes per node. |
Create an interactive logging run
Description
Create an interactive run that allows the user to log metrics and artifacts to a run locally.
Any metrics that are logged during the interactive run session are added to the run record in the experiment. If an output directory is specified, the contents of that directory are uploaded as run artifacts upon run completion.
This method is useful if you would like to add experiment
tracking and artifact logging to the corresponding run record
in Azure ML for local runs without having to submit an experiment
run to a compute target with submit_experiment()
.
Usage
start_logging_run(experiment, outputs = NULL, snapshot_directory = NULL)
Arguments
experiment |
The |
outputs |
(Optional) A string of the local path to an outputs directory to track. |
snapshot_directory |
(Optional) Directory to take snapshot of.
Setting to |
Value
The Run
object of the started run.
See Also
complete_run()
Examples
## Not run:
ws <- load_workspace_from_config()
exp <- experiment(ws, name = 'myexperiment')
run <- start_logging_run(exp)
log_metric_to_run("Accuracy", 0.9)
complete_run(run)
## End(Not run)
Submit an experiment and return the active child run
Description
Submit an experiment and return the active child run.
Usage
submit_child_run(parent_run, config = NULL, tags = NULL)
Arguments
parent_run |
The parent |
config |
The |
tags |
Tags to be added to the submitted run, e.g. list("tag" = "value"). |
Value
A Run
object.
Submit an experiment and return the active created run
Description
submit_experiment()
is an asynchronous call to Azure Machine Learning
service to execute a trial on local or remote compute. Depending on the
configuration, submit_experiment()
will automatically prepare your
execution environments, execute your code, and capture your source code
and results in the experiment's run history.
To submit an experiment you first need to create a configuration object
describing how the experiment is to be run. The configuration depends on
the type of trial required. For a script run, provide an Estimator
object
to the config
parameter. For a HyperDrive run for hyperparameter tuning,
provide a HyperDriveConfig
to config
.
Usage
submit_experiment(experiment, config, tags = NULL)
Arguments
experiment |
The |
config |
The |
tags |
A named list of tags for the submitted run, e.g.
|
Value
The ScriptRun
or HyperDriveRun
object.
See Also
estimator()
, hyperdrive_config()
Examples
# This example submits an Estimator experiment
## Not run:
ws <- load_workspace_from_config()
compute_target <- get_compute(ws, cluster_name = 'mycluster')
exp <- experiment(ws, name = 'myexperiment')
est <- estimator(source_directory = '.',
entry_script = 'train.R',
compute_target = compute_target)
run <- submit_experiment(exp, est)
## End(Not run)
Take a sample of file streams from top of the dataset by the specified count.
Description
Take a sample of file streams from top of the dataset by the specified count.
Usage
take_from_dataset(dataset, count)
Arguments
dataset |
The Dataset object. |
count |
The number of file streams to take. |
Value
A new Dataset object representing the sampled dataset.
Take a random sample of file streams in the dataset approximately by the probability specified.
Description
Take a random sample of file streams in the dataset approximately by the probability specified.
Usage
take_sample_from_dataset(dataset, probability, seed = NULL)
Arguments
dataset |
The Dataset object. |
probability |
The probability of a file stream being included in the sample. |
seed |
An optional seed to use for the random generator. |
Value
A new Dataset object representing the sampled dataset.
Define a truncation selection policy for early termination of HyperDrive runs
Description
Truncation selection cancels a given percentage of lowest performing runs at each evaluation interval. Runs are compared based on their performance on the primary metric and the lowest X% are terminated.
Usage
truncation_selection_policy(
truncation_percentage,
evaluation_interval = 1L,
delay_evaluation = 0L
)
Arguments
truncation_percentage |
An integer of the percentage of lowest performing runs to terminate at each interval. |
evaluation_interval |
An integer of the frequency for applying policy. |
delay_evaluation |
An integer of the number of intervals for which to delay the first evaluation. |
Value
The TruncationSelectionPolicy
object.
Details
This policy periodically cancels the given percentage of runs that rank the lowest for their performance on the primary metric. The policy strives for fairness in ranking the runs by accounting for improving model performance with training time. When ranking a relatively young run, the policy uses the corresponding (and earlier) performance of older runs for comparison. Therefore, runs aren't terminated for having a lower performance because they have run for less time than other runs.
The truncation selection policy takes the following configuration parameters:
- truncation_percentage: An integer of the percentage of lowest performing runs to terminate at each evaluation interval.
- evaluation_interval: Optional. The frequency for applying the policy. Each time the training script logs the primary metric counts as one interval.
- delay_evaluation: Optional. The number of intervals to delay the policy evaluation. Use this parameter to avoid premature termination of training runs. If specified, the policy applies every multiple of evaluation_interval that is greater than or equal to delay_evaluation.
For example, when evaluating a run at interval N, its performance is only compared with the performance of other runs up to interval N, even if they reported metrics for intervals greater than N.
Examples
# In this example, the early termination policy is applied at every interval
# starting at evaluation interval 5. A run will be terminated at interval 5
# if its performance at interval 5 is in the lowest 20% of performance of all
# runs at interval 5
## Not run:
early_termination_policy = truncation_selection_policy(
truncation_percentage = 20L,
evaluation_interval = 1L,
delay_evaluation = 5L)
## End(Not run)
Specify a uniform distribution of options to sample from
Description
Specify a uniform distribution of options to sample the hyperparameters from.
Usage
uniform(min_value, max_value)
Arguments
min_value |
A double of the minimum value in the range (inclusive). |
max_value |
A double of the maximum value in the range (inclusive). |
Value
A list of the stochastic expression.
See Also
random_parameter_sampling()
, grid_parameter_sampling()
,
bayesian_parameter_sampling()
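Examples
A minimal illustrative sketch of using uniform() within a sampling space; the hyperparameter name and bounds below are placeholders.
## Not run:
param_sampling <- random_parameter_sampling(list(
    "dropout_rate" = uniform(0.1, 0.5)))
## End(Not run)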
Unregister all versions under the registration name of this dataset from the workspace.
Description
Unregister all versions under the registration name of this dataset from the workspace.
Usage
unregister_all_dataset_versions(dataset)
Arguments
dataset |
The dataset to be unregistered. |
Value
None
Unregister a datastore from its associated workspace
Description
Unregister the datastore from its associated workspace. The underlying Azure storage will not be deleted.
Usage
unregister_datastore(datastore)
Arguments
datastore |
The |
Value
None
Update a deployed ACI web service
Description
Update an ACI web service with the provided properties. You can update the web service to use a new model, a new entry script, or new dependencies that can be specified in an inference configuration.
Values left as NULL
will remain unchanged in the web service.
Usage
update_aci_webservice(
webservice,
tags = NULL,
properties = NULL,
description = NULL,
auth_enabled = NULL,
ssl_enabled = NULL,
ssl_cert_pem_file = NULL,
ssl_key_pem_file = NULL,
ssl_cname = NULL,
enable_app_insights = NULL,
models = NULL,
inference_config = NULL
)
Arguments
webservice |
The |
tags |
A named list of key-value tags for the web service,
e.g. |
properties |
A named list of key-value properties to add for the web
service, e.g. |
description |
A string of the description to give the web service. |
auth_enabled |
If |
ssl_enabled |
Whether or not to enable SSL for this Webservice. |
ssl_cert_pem_file |
A string of the cert file needed if SSL is enabled. |
ssl_key_pem_file |
A string of the key file needed if SSL is enabled. |
ssl_cname |
A string of the cname if SSL is enabled. |
enable_app_insights |
If |
models |
A list of |
inference_config |
An |
Value
None
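Examples
A minimal sketch; service stands in for an AciWebservice object (for example, one returned by deploy_model()) and new_model for a registered Model object; neither is defined here, and the tag values are placeholders.
## Not run:
update_aci_webservice(service,
                      tags = list("stage" = "staging"),
                      models = list(new_model))
## End(Not run)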
Update a deployed AKS web service
Description
Update an AKS web service with the provided properties. You can update the web service to use a new model, a new entry script, or new dependencies that can be specified in an inference configuration.
Values left as NULL
will remain unchanged in the web service.
Usage
update_aks_webservice(
webservice,
autoscale_enabled = NULL,
autoscale_min_replicas = NULL,
autoscale_max_replicas = NULL,
autoscale_refresh_seconds = NULL,
autoscale_target_utilization = NULL,
auth_enabled = NULL,
cpu_cores = NULL,
memory_gb = NULL,
enable_app_insights = NULL,
scoring_timeout_ms = NULL,
replica_max_concurrent_requests = NULL,
max_request_wait_time = NULL,
num_replicas = NULL,
tags = NULL,
properties = NULL,
description = NULL,
models = NULL,
inference_config = NULL,
gpu_cores = NULL,
period_seconds = NULL,
initial_delay_seconds = NULL,
timeout_seconds = NULL,
success_threshold = NULL,
failure_threshold = NULL,
namespace = NULL,
token_auth_enabled = NULL
)
Arguments
webservice |
The |
autoscale_enabled |
If |
autoscale_min_replicas |
An int of the minimum number of containers to use when autoscaling the web service. |
autoscale_max_replicas |
An int of the maximum number of containers to use when autoscaling the web service. |
autoscale_refresh_seconds |
An int of how often in seconds the autoscaler should attempt to scale the web service. |
autoscale_target_utilization |
An int of the target utilization (in percent out of 100) the autoscaler should attempt to maintain for the web service. |
auth_enabled |
If |
cpu_cores |
The number of cpu cores to allocate for
the web service. Can be a decimal. Defaults to |
memory_gb |
The amount of memory (in GB) to allocate for
the web service. Can be a decimal. Defaults to |
enable_app_insights |
If |
scoring_timeout_ms |
An int of the timeout (in milliseconds) to enforce for scoring calls to the web service. |
replica_max_concurrent_requests |
An int of the number of maximum concurrent requests per node to allow for the web service. |
max_request_wait_time |
An int of the maximum amount of time a request will stay in the queue (in milliseconds) before returning a 503 error. |
num_replicas |
An int of the number of containers to allocate for the web service. If this parameter is not set then the autoscaler is enabled by default. |
tags |
A named list of key-value tags for the web service,
e.g. |
properties |
A named list of key-value properties to add for the web
service, e.g. |
description |
A string of the description to give the web service. |
models |
A list of |
inference_config |
An |
gpu_cores |
An int of the number of gpu cores to allocate for the web service. |
period_seconds |
An int of how often in seconds to perform the
liveness probe. Minimum value is |
initial_delay_seconds |
An int of the number of seconds after the container has started before liveness probes are initiated. |
timeout_seconds |
An int of the number of seconds after which the
liveness probe times out. Minimum value is |
success_threshold |
An int of the minimum consecutive successes
for the liveness probe to be considered successful after having failed.
Minimum value is |
failure_threshold |
An int of the number of times Kubernetes will try
the liveness probe when a Pod starts and the probe fails, before giving up.
Minimum value is |
namespace |
A string of the Kubernetes namespace in which to deploy the web service: up to 63 lowercase alphanumeric ('a'-'z', '0'-'9') and hyphen ('-') characters. The first and last characters cannot be hyphens. |
token_auth_enabled |
If |
Value
None
Update scale settings for an AmlCompute cluster
Description
Update the scale settings for an existing AmlCompute cluster.
Usage
update_aml_compute(
cluster,
min_nodes = NULL,
max_nodes = NULL,
idle_seconds_before_scaledown = NULL
)
Arguments
cluster |
The |
min_nodes |
An integer of the minimum number of nodes to use on the cluster. |
max_nodes |
An integer of the maximum number of nodes to use on the cluster. |
idle_seconds_before_scaledown |
An integer of the node idle time in seconds before scaling down the cluster. |
Value
None
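Examples
A minimal sketch; the cluster name and scale settings are illustrative placeholders, and the cluster is assumed to have been created earlier.
## Not run:
ws <- load_workspace_from_config()
cluster <- get_compute(ws, cluster_name = 'mycluster')
update_aml_compute(cluster,
                   min_nodes = 0L,
                   max_nodes = 4L,
                   idle_seconds_before_scaledown = 600L)
## End(Not run)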
Update a local web service
Description
Update a local web service with the provided properties. You can update the web service to use a new model, a new entry script, or new dependencies that can be specified in an inference configuration.
Values left as NULL
will remain unchanged in the service.
Usage
update_local_webservice(
webservice,
models = NULL,
deployment_config = NULL,
wait = FALSE,
inference_config = NULL
)
Arguments
webservice |
The |
models |
A list of |
deployment_config |
A |
wait |
If |
inference_config |
An |
Value
None
Upload files to the Azure storage a datastore points to
Description
Upload the data from the local file system to the Azure storage that the datastore points to.
Usage
upload_files_to_datastore(
datastore,
files,
relative_root = NULL,
target_path = NULL,
overwrite = FALSE,
show_progress = TRUE
)
Arguments
datastore |
The |
files |
A character vector of the absolute path to files to upload. |
relative_root |
A string of the base path that is used to
determine the path of the files in the Azure storage. For example, if
we upload |
target_path |
A string of the location in the blob container or file
share to upload the data to. Defaults to |
overwrite |
If |
show_progress |
If |
Value
The DataReference
object for the target path uploaded.
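Examples
A minimal sketch; the datastore name, file paths, and target path are placeholders, and get_datastore() is assumed here to accept the workspace and datastore name.
## Not run:
ws <- load_workspace_from_config()
ds <- get_datastore(ws, datastore_name = 'mydatastore')
upload_files_to_datastore(ds,
                          files = c('./data/train.csv', './data/test.csv'),
                          target_path = 'train-dataset',
                          overwrite = TRUE)
## End(Not run)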
Upload files to a run
Description
Upload files to the run record.
Note: Runs automatically capture files in the specified output
directory, which defaults to "./outputs". Use upload_files_to_run()
only when additional files need to be uploaded or an output directory
is not specified.
Usage
upload_files_to_run(names, paths, timeout_seconds = NULL, run = NULL)
Arguments
names |
A character vector of the names of the files to upload. |
paths |
A character vector of relative local paths to the files to be uploaded. |
timeout_seconds |
An int of the timeout in seconds for uploading the files. |
run |
The |
Value
None
Examples
ws <- load_workspace_from_config()
exp <- experiment(ws, name = 'myexperiment')

# Start an interactive logging run
run <- start_logging_run(exp)

# Upload files to the run record
filename1 <- "important_file_1"
filename2 <- "important_file_2"
upload_files_to_run(names = c(filename1, filename2),
                    paths = c("path/on/disk/file_1.txt",
                              "other/path/on/disk/file_2.txt"))

# Download a file from the run record
download_file_from_run(filename1, "file_1.txt")
See Also
upload_folder_to_run()
download_file_from_run()
download_files_from_run()
Upload a folder to a run
Description
Upload the specified folder to the given prefix name to the run record.
Note: Runs automatically capture files in the specified output
directory, which defaults to "./outputs". Use upload_folder_to_run()
only when additional files need to be uploaded or an output directory
is not specified.
Usage
upload_folder_to_run(name, path, run = NULL)
Arguments
name |
A string of the name of the folder of files to upload. |
path |
A string of the relative local path to the folder to upload. |
run |
The |
Value
None
Examples
ws <- load_workspace_from_config()
exp <- experiment(ws, name = 'myexperiment')

# Start an interactive logging run
run <- start_logging_run(exp)

# Upload folder to the run record
upload_folder_to_run(name = "important_files",
                     path = "path/on/disk")

# Download a file from the run record
download_file_from_run("important_files/existing_file.txt", "local_file.txt")
See Also
upload_files_to_run()
download_file_from_run()
download_files_from_run()
Upload a local directory to the Azure storage a datastore points to
Description
Upload a local directory to the Azure storage the datastore points to.
Usage
upload_to_datastore(
datastore,
src_dir,
target_path = NULL,
overwrite = FALSE,
show_progress = TRUE
)
Arguments
datastore |
The |
src_dir |
A string of the local directory to upload. |
target_path |
A string of the location in the blob container or
file share to upload the data to. Defaults to |
overwrite |
If |
show_progress |
If |
Value
The DataReference
object for the target path uploaded.
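Examples
A minimal sketch; the datastore name, source directory, and target path are placeholders, and get_datastore() is assumed here to accept the workspace and datastore name.
## Not run:
ws <- load_workspace_from_config()
ds <- get_datastore(ws, datastore_name = 'mydatastore')
upload_to_datastore(ds,
                    src_dir = './data',
                    target_path = 'train-dataset',
                    overwrite = TRUE)
## End(Not run)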
Initialize run details widget
Description
Initializes a ShinyApp in RStudio Viewer (or the default browser if Viewer is unavailable) showing details of the submitted run. If using RStudio, the plot will auto-update with information collected from the server. For more details about the run, click the web view link. The widget will stop running once the run has reached a terminal state: "Failed", "Completed", or "Canceled".
If you are running this method from an RMarkdown file, the run details table will show up in the code chunk output instead of the Viewer.
Usage
view_run_details(run, auto_refresh = TRUE)
Arguments
run |
Run object |
auto_refresh |
Boolean indicating whether or not widget should update run details automatically. The default is TRUE when using RStudio. |
Wait for a web service to finish deploying
Description
Automatically poll on the running web service deployment and wait for the web service to reach a terminal state. Will throw an exception if it reaches a non-successful terminal state.
Typically called after running deploy_model()
.
Usage
wait_for_deployment(webservice, show_output = FALSE)
Arguments
webservice |
The |
show_output |
If |
Value
None
See Also
deploy_model()
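Examples
A minimal sketch; service stands in for the Webservice object returned by an earlier call to deploy_model() and is not defined here.
## Not run:
wait_for_deployment(service, show_output = TRUE)
## End(Not run)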
Wait for a model package to finish creating
Description
Wait for a model package creation to reach a terminal state.
Usage
wait_for_model_package_creation(package, show_output = FALSE)
Arguments
package |
The |
show_output |
If |
Value
None
Wait for a cluster to finish provisioning
Description
Wait for a cluster to finish provisioning. Typically invoked after a
create_aml_compute()
or create_aks_compute()
call.
Usage
wait_for_provisioning_completion(cluster, show_output = FALSE)
Arguments
cluster |
The |
show_output |
If |
Value
None
See Also
create_aml_compute()
, create_aks_compute()
Wait for the completion of a run
Description
Wait for the run to reach a terminal state. Typically called
after submitting an experiment run with submit_experiment()
.
Usage
wait_for_run_completion(run, show_output = TRUE)
Arguments
run |
The |
show_output |
If |
Value
None
See Also
Write out the workspace configuration details to a config file
Description
Write out the workspace ARM properties to a config file. Workspace ARM
properties can be loaded later using load_workspace_from_config()
.
The method provides a simple way of reusing the same workspace across
multiple files or projects. Users can save the workspace ARM properties
using this function, and use load_workspace_from_config()
to load the
same workspace in different files or projects without retyping the
workspace ARM properties.
Usage
write_workspace_config(workspace, path = NULL, file_name = NULL)
Arguments
workspace |
The |
path |
A string of the location to write the config.json file. The config file will be located in a directory called '.azureml'. The parameter defaults to the current working directory, so by default config.json will be located at '.azureml/'. |
file_name |
A string of the name to use for the config file. The
parameter defaults to |
Value
None
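Examples
A minimal sketch that writes the config file to the default '.azureml/' location and loads the workspace back in a later session; the workspace details are placeholders.
## Not run:
ws <- get_workspace("<your workspace name>",
                    "<your subscription ID>",
                    "<your resource group>")
write_workspace_config(ws)

# Later, in another script or project:
ws <- load_workspace_from_config()
## End(Not run)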