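This article describes how to create an Azure Machine Learning workspace by using Terraform configuration files. Terraform's template-based configuration files let you define, create, and configure Azure resources in a repeatable and predictable manner. Terraform tracks resource state and can clean up and destroy resources.
A Terraform configuration file is a document that defines the resources needed for a deployment. A Terraform configuration can also specify deployment variables, which provide input values when you apply the configuration.
Prerequisites
Limitations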
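When you create a new workspace, you can either have the services that the workspace needs created automatically or use existing services. If you want to use existing services from an Azure subscription other than the one that contains the workspace, you must register the Azure Machine Learning namespace in the subscription that contains those services. For example, if you create a workspace in subscription A that uses a storage account in subscription B, the Azure Machine Learning namespace must be registered in subscription B before the workspace can use that storage account.
The resource provider for Azure Machine Learning is Microsoft.MachineLearningServices. For information on how to check whether the namespace is registered, or how to register it, see Azure resource providers and types.
Important
This information applies only to resources provided during workspace creation: Azure Storage accounts, Azure Container Registry, Azure Key Vault, and Application Insights.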
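As a hedged illustration of the cross-subscription case above, the namespace could also be registered from Terraform itself. The sketch below assumes the azurerm 3.x provider that this article pins; the alias subscription_b, the resource label ml_subscription_b, and the placeholder subscription ID are hypothetical and not part of the original configuration. If the namespace is already registered in that subscription, Terraform may ask you to import the registration instead of creating it.

```hcl
# Hypothetical aliased provider that targets subscription B, the subscription
# that owns the existing storage account. Replace the placeholder ID.
provider "azurerm" {
  alias           = "subscription_b"
  subscription_id = "00000000-0000-0000-0000-000000000000"

  # Assumption: turn off automatic provider registration for this alias so the
  # explicit registration below is the only one Terraform manages.
  skip_provider_registration = true

  features {}
}

# Registers the Azure Machine Learning namespace in subscription B.
resource "azurerm_resource_provider_registration" "ml_subscription_b" {
  provider = azurerm.subscription_b
  name     = "Microsoft.MachineLearningServices"
}
```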
Create the workspace
Create a file named main.tf that contains the following code.
data "azurerm_client_config" "current" {}

resource "azurerm_resource_group" "default" {
  name     = "${random_pet.prefix.id}-rg"
  location = var.location
}

resource "random_pet" "prefix" {
  prefix = var.prefix
  length = 2
}

resource "random_integer" "suffix" {
  min = 10000000
  max = 99999999
}
Declare the Azure provider in a file named providers.tf that contains the following code.
terraform {
  required_version = ">= 1.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 3.0, < 4.0"
    }
    random = {
      source  = "hashicorp/random"
      version = ">= 3.0"
    }
  }
}

provider "azurerm" {
  features {
    key_vault {
      recover_soft_deleted_key_vaults    = false
      purge_soft_delete_on_destroy       = false
      purge_soft_deleted_keys_on_destroy = false
    }
    resource_group {
      prevent_deletion_if_contains_resources = false
    }
  }
}
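To create an Azure Machine Learning workspace, use one of the following Terraform configurations. An Azure Machine Learning workspace requires several other services as dependencies, and the template specifies these associated resources. Depending on your needs, you can choose a template that creates resources with either public or private network connectivity.
Note
Some resources in Azure require globally unique names. Before you deploy your resources, make sure to set the name variable to a unique value.
The following configuration creates a workspace with public network connectivity.
Define the following variables in a file named variables.tf.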
variable "environment" {
  type        = string
  description = "Name of the environment"
  default     = "dev"
}

variable "location" {
  type        = string
  description = "Location of the resources"
  default     = "eastus"
}

variable "prefix" {
  type        = string
  description = "Prefix of the resource name"
  default     = "ml"
}
Define the following workspace configuration in a file named workspace.tf:
# Dependent resources for Azure Machine Learning
resource "azurerm_application_insights" "default" {
  name                = "${random_pet.prefix.id}-appi"
  location            = azurerm_resource_group.default.location
  resource_group_name = azurerm_resource_group.default.name
  application_type    = "web"
}

resource "azurerm_key_vault" "default" {
  name                     = "${var.prefix}${var.environment}${random_integer.suffix.result}kv"
  location                 = azurerm_resource_group.default.location
  resource_group_name      = azurerm_resource_group.default.name
  tenant_id                = data.azurerm_client_config.current.tenant_id
  sku_name                 = "premium"
  purge_protection_enabled = false
}

resource "azurerm_storage_account" "default" {
  name                            = "${var.prefix}${var.environment}${random_integer.suffix.result}st"
  location                        = azurerm_resource_group.default.location
  resource_group_name             = azurerm_resource_group.default.name
  account_tier                    = "Standard"
  account_replication_type        = "GRS"
  allow_nested_items_to_be_public = false
}

resource "azurerm_container_registry" "default" {
  name                = "${var.prefix}${var.environment}${random_integer.suffix.result}cr"
  location            = azurerm_resource_group.default.location
  resource_group_name = azurerm_resource_group.default.name
  sku                 = "Premium"
  admin_enabled       = true
}

# Machine Learning workspace
resource "azurerm_machine_learning_workspace" "default" {
  name                          = "${random_pet.prefix.id}-mlw"
  location                      = azurerm_resource_group.default.location
  resource_group_name           = azurerm_resource_group.default.name
  application_insights_id       = azurerm_application_insights.default.id
  key_vault_id                  = azurerm_key_vault.default.id
  storage_account_id            = azurerm_storage_account.default.id
  container_registry_id         = azurerm_container_registry.default.id
  public_network_access_enabled = true

  identity {
    type = "SystemAssigned"
  }
}
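Because the resource group and workspace names in this configuration come from random_pet, it can be handy to print them after deployment. The following is a minimal sketch of an output file; the file name outputs.tf and the output names are assumptions for illustration and aren't part of the original configuration.

```hcl
# outputs.tf (hypothetical file; the output names are illustrative)

output "resource_group_name" {
  description = "Name of the resource group that holds the workspace"
  value       = azurerm_resource_group.default.name
}

output "workspace_name" {
  description = "Name of the Azure Machine Learning workspace"
  value       = azurerm_machine_learning_workspace.default.name
}

output "workspace_id" {
  description = "Azure resource ID of the workspace"
  value       = azurerm_machine_learning_workspace.default.id
}
```

After the configuration is applied, running terraform output workspace_name would show the generated workspace name.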
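The following configuration creates a workspace in an isolated network environment by using Azure Private Link endpoints. The template includes private Domain Name System (DNS) zones to resolve domain names inside the virtual network.
If you use private link endpoints for both Azure Container Registry and Azure Machine Learning, you can't use Container Registry tasks to build environment images. Instead, you must build images by using an Azure Machine Learning compute cluster.
To configure the name of the cluster to use, set the image_build_compute_name argument. You can use public_network_access_enabled to allow public access to a workspace that has a private link endpoint.
Define the following variables in a file named variables.tf.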
variable "name" {
  type        = string
  description = "Name of the deployment"
  default     = "examplehost"
}

variable "environment" {
  type        = string
  description = "Name of the environment"
  default     = "dev"
}

variable "location" {
  type        = string
  description = "Location of the resources"
  default     = "East US"
}

variable "vnet_address_space" {
  type        = list(string)
  description = "Address space of the virtual network"
  default     = ["10.0.0.0/16"]
}

variable "training_subnet_address_space" {
  type        = list(string)
  description = "Address space of the training subnet"
  default     = ["10.0.1.0/24"]
}

variable "aks_subnet_address_space" {
  type        = list(string)
  description = "Address space of the AKS subnet"
  default     = ["10.0.2.0/23"]
}

variable "ml_subnet_address_space" {
  type        = list(string)
  description = "Address space of the ML workspace subnet"
  default     = ["10.0.0.0/24"]
}

variable "dsvm_subnet_address_space" {
  type        = list(string)
  description = "Address space of the DSVM subnet"
  default     = ["10.0.4.0/24"]
}

variable "bastion_subnet_address_space" {
  type        = list(string)
  description = "Address space of the bastion subnet"
  default     = ["10.0.5.0/24"]
}

variable "image_build_compute_name" {
  type        = string
  description = "Name of the compute cluster to be created and set to build Docker images"
  default     = "image-builder"
}

# DSVM variables
variable "dsvm_name" {
  type        = string
  description = "Name of the Data Science VM"
  default     = "vmdsvm01"
}

variable "dsvm_admin_username" {
  type        = string
  description = "Admin username of the Data Science VM"
  default     = "azureadmin"
}

variable "dsvm_host_password" {
  type        = string
  description = "Password for the admin username of the Data Science VM"
  default     = "ChangeMe123!"
  sensitive   = true
}
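The main.tf file shown earlier builds the resource group name from random_pet.prefix, which reads var.prefix, and this variables.tf doesn't declare that variable. If you reuse that main.tf unchanged with this configuration, one option is to carry over the declaration from the public-network variables.tf. Treating this as the intended fix is an assumption, not something the original configuration states.

```hcl
# Carried over from the public-network variables.tf so that main.tf
# (random_pet.prefix) can resolve var.prefix in this configuration too.
variable "prefix" {
  type        = string
  description = "Prefix of the resource name"
  default     = "ml"
}
```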
Define the following workspace configuration in a file named workspace.tf:
# Dependent resources for Azure Machine Learning
resource "azurerm_application_insights" "default" {
  name                = "appi-${var.name}-${var.environment}"
  location            = azurerm_resource_group.default.location
  resource_group_name = azurerm_resource_group.default.name
  application_type    = "web"
}

resource "random_string" "kv_prefix" {
  length  = 4
  upper   = false
  special = false
  numeric = false
}

resource "azurerm_key_vault" "default" {
  name                     = "kv-${random_string.kv_prefix.result}-${var.environment}"
  location                 = azurerm_resource_group.default.location
  resource_group_name      = azurerm_resource_group.default.name
  tenant_id                = data.azurerm_client_config.current.tenant_id
  sku_name                 = "premium"
  purge_protection_enabled = true

  network_acls {
    default_action = "Deny"
    bypass         = "AzureServices"
  }
}

resource "random_string" "sa_prefix" {
  length  = 4
  upper   = false
  special = false
  numeric = false
}

resource "azurerm_storage_account" "default" {
  name                     = "st${random_string.sa_prefix.result}${var.environment}"
  location                 = azurerm_resource_group.default.location
  resource_group_name      = azurerm_resource_group.default.name
  account_tier             = "Standard"
  account_replication_type = "GRS"

  network_rules {
    default_action = "Deny"
    bypass         = ["AzureServices"]
  }
}

resource "azurerm_container_registry" "default" {
  name                = "cr${var.name}${var.environment}"
  location            = azurerm_resource_group.default.location
  resource_group_name = azurerm_resource_group.default.name
  sku                 = "Premium"
  admin_enabled       = true

  network_rule_set {
    default_action = "Deny"
  }

  public_network_access_enabled = false
}

# Machine Learning workspace
resource "azurerm_machine_learning_workspace" "default" {
  name                    = "mlw-${var.name}-${var.environment}"
  location                = azurerm_resource_group.default.location
  resource_group_name     = azurerm_resource_group.default.name
  application_insights_id = azurerm_application_insights.default.id
  key_vault_id            = azurerm_key_vault.default.id
  storage_account_id      = azurerm_storage_account.default.id
  container_registry_id   = azurerm_container_registry.default.id

  identity {
    type = "SystemAssigned"
  }

  # Arguments used with an Azure Private Link configuration
  public_network_access_enabled = false
  image_build_compute_name      = var.image_build_compute_name

  depends_on = [
    azurerm_private_endpoint.kv_ple,
    azurerm_private_endpoint.st_ple_blob,
    azurerm_private_endpoint.storage_ple_file,
    azurerm_private_endpoint.cr_ple,
    azurerm_subnet.snet-training
  ]
}
# Private endpoints
resource "azurerm_private_endpoint" "kv_ple" {
  name                = "ple-${var.name}-${var.environment}-kv"
  location            = azurerm_resource_group.default.location
  resource_group_name = azurerm_resource_group.default.name
  subnet_id           = azurerm_subnet.snet-workspace.id

  private_dns_zone_group {
    name                 = "private-dns-zone-group"
    private_dns_zone_ids = [azurerm_private_dns_zone.dnsvault.id]
  }

  private_service_connection {
    name                           = "psc-${var.name}-kv"
    private_connection_resource_id = azurerm_key_vault.default.id
    subresource_names              = ["vault"]
    is_manual_connection           = false
  }
}

resource "azurerm_private_endpoint" "st_ple_blob" {
  name                = "ple-${var.name}-${var.environment}-st-blob"
  location            = azurerm_resource_group.default.location
  resource_group_name = azurerm_resource_group.default.name
  subnet_id           = azurerm_subnet.snet-workspace.id

  private_dns_zone_group {
    name                 = "private-dns-zone-group"
    private_dns_zone_ids = [azurerm_private_dns_zone.dnsstorageblob.id]
  }

  private_service_connection {
    name                           = "psc-${var.name}-st"
    private_connection_resource_id = azurerm_storage_account.default.id
    subresource_names              = ["blob"]
    is_manual_connection           = false
  }
}

resource "azurerm_private_endpoint" "storage_ple_file" {
  name                = "ple-${var.name}-${var.environment}-st-file"
  location            = azurerm_resource_group.default.location
  resource_group_name = azurerm_resource_group.default.name
  subnet_id           = azurerm_subnet.snet-workspace.id

  private_dns_zone_group {
    name                 = "private-dns-zone-group"
    private_dns_zone_ids = [azurerm_private_dns_zone.dnsstoragefile.id]
  }

  private_service_connection {
    name                           = "psc-${var.name}-st"
    private_connection_resource_id = azurerm_storage_account.default.id
    subresource_names              = ["file"]
    is_manual_connection           = false
  }
}

resource "azurerm_private_endpoint" "cr_ple" {
  name                = "ple-${var.name}-${var.environment}-cr"
  location            = azurerm_resource_group.default.location
  resource_group_name = azurerm_resource_group.default.name
  subnet_id           = azurerm_subnet.snet-workspace.id

  private_dns_zone_group {
    name                 = "private-dns-zone-group"
    private_dns_zone_ids = [azurerm_private_dns_zone.dnscontainerregistry.id]
  }

  private_service_connection {
    name                           = "psc-${var.name}-cr"
    private_connection_resource_id = azurerm_container_registry.default.id
    subresource_names              = ["registry"]
    is_manual_connection           = false
  }
}

resource "azurerm_private_endpoint" "mlw_ple" {
  name                = "ple-${var.name}-${var.environment}-mlw"
  location            = azurerm_resource_group.default.location
  resource_group_name = azurerm_resource_group.default.name
  subnet_id           = azurerm_subnet.snet-workspace.id

  private_dns_zone_group {
    name                 = "private-dns-zone-group"
    private_dns_zone_ids = [azurerm_private_dns_zone.dnsazureml.id, azurerm_private_dns_zone.dnsnotebooks.id]
  }

  private_service_connection {
    name                           = "psc-${var.name}-mlw"
    private_connection_resource_id = azurerm_machine_learning_workspace.default.id
    subresource_names              = ["amlworkspace"]
    is_manual_connection           = false
  }
}
# A compute cluster for image building is required because the workspace is behind a virtual network.
# For more details, see https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-create-secure-workspace#configure-image-builds.
resource "azurerm_machine_learning_compute_cluster" "image-builder" {
  name                          = var.image_build_compute_name
  location                      = azurerm_resource_group.default.location
  vm_priority                   = "LowPriority"
  vm_size                       = "Standard_DS2_v2"
  machine_learning_workspace_id = azurerm_machine_learning_workspace.default.id
  subnet_resource_id            = azurerm_subnet.snet-training.id

  scale_settings {
    min_node_count                       = 0
    max_node_count                       = 3
    scale_down_nodes_after_idle_duration = "PT15M" # 15 minutes
  }

  identity {
    type = "SystemAssigned"
  }
}
Define the following network configuration in a file named network.tf:
# Virtual network
resource "azurerm_virtual_network" "default" {
  name                = "vnet-${var.name}-${var.environment}"
  address_space       = var.vnet_address_space
  location            = azurerm_resource_group.default.location
  resource_group_name = azurerm_resource_group.default.name
}

resource "azurerm_subnet" "snet-training" {
  name                                           = "snet-training"
  resource_group_name                            = azurerm_resource_group.default.name
  virtual_network_name                           = azurerm_virtual_network.default.name
  address_prefixes                               = var.training_subnet_address_space
  enforce_private_link_endpoint_network_policies = true
}

resource "azurerm_subnet" "snet-aks" {
  name                                           = "snet-aks"
  resource_group_name                            = azurerm_resource_group.default.name
  virtual_network_name                           = azurerm_virtual_network.default.name
  address_prefixes                               = var.aks_subnet_address_space
  enforce_private_link_endpoint_network_policies = true
}

resource "azurerm_subnet" "snet-workspace" {
  name                                           = "snet-workspace"
  resource_group_name                            = azurerm_resource_group.default.name
  virtual_network_name                           = azurerm_virtual_network.default.name
  address_prefixes                               = var.ml_subnet_address_space
  enforce_private_link_endpoint_network_policies = true
}
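The private endpoints in workspace.tf reference private DNS zones such as azurerm_private_dns_zone.dnsvault, which the network.tf listing above doesn't define. The following is a minimal sketch of what one such zone and its virtual network link could look like for Key Vault. It assumes the zone name privatelink.vaultcore.azure.net used in the Azure public cloud; the link resource label vnetlinkvault and the link name are hypothetical. The other referenced zones (dnsstorageblob, dnsstoragefile, dnscontainerregistry, dnsazureml, dnsnotebooks) would presumably follow the same pattern with their own zone names.

```hcl
# Private DNS zone that the Key Vault private endpoint (kv_ple) refers to.
# Assumption: zone name for Key Vault in the Azure public cloud.
resource "azurerm_private_dns_zone" "dnsvault" {
  name                = "privatelink.vaultcore.azure.net"
  resource_group_name = azurerm_resource_group.default.name
}

# Links the zone to the virtual network so that resources inside the network
# resolve the vault's private address. The label and name are illustrative.
resource "azurerm_private_dns_zone_virtual_network_link" "vnetlinkvault" {
  name                  = "dnsvaultlink"
  resource_group_name   = azurerm_resource_group.default.name
  private_dns_zone_name = azurerm_private_dns_zone.dnsvault.name
  virtual_network_id    = azurerm_virtual_network.default.id
}
```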
Create and apply the plan
To create the workspace, run the following code:
terraform init

# Optionally add -var "<name>=<value>" for any of the variables set in variables.tf.
terraform plan -out demo.tfplan

terraform apply "demo.tfplan"
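Rather than passing each value with -var, you can also keep variable values in a terraform.tfvars file, which Terraform loads automatically from the working directory. The following is a sketch for the public-network configuration; the values are illustrative only.

```hcl
# terraform.tfvars (illustrative values; adjust them to your deployment)
environment = "test"
prefix      = "mlw"
```

With this file in place, terraform plan -out demo.tfplan picks up the values without any extra flags.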
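Troubleshoot resource provider errors
When you create an Azure Machine Learning workspace, or a resource used by the workspace, you might receive an error similar to the following messages:
No registered resource provider found for location {location}
The subscription is not registered to use namespace {resource-provider-namespace}
Most resource providers are registered automatically, but not all. If you receive this message, you need to register the provider that it mentions.
The following table lists the resource providers that Azure Machine Learning requires: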
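| Resource provider | Why it's needed |
| --- | --- |
| Microsoft.MachineLearningServices | Creating the Azure Machine Learning workspace. |
| Microsoft.Storage | An Azure Storage account is used as the default storage for the workspace. |
| Microsoft.ContainerRegistry | Azure Container Registry is used by the workspace to build Docker images. |
| Microsoft.KeyVault | Azure Key Vault is used by the workspace to store secrets. |
| Microsoft.Notebooks | Integrated notebooks on Azure Machine Learning compute instances. |
| Microsoft.ContainerService | Needed if you plan to deploy trained models to Azure Kubernetes Service. |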
If you plan to use a customer-managed key with Azure Machine Learning, the following resource providers must also be registered:
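| Resource provider | Why it's needed |
| --- | --- |
| Microsoft.DocumentDB | Azure Cosmos DB instance that logs metadata for the workspace. |
| Microsoft.Search | Azure Search provides indexing capabilities for the workspace. |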
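If you plan to use a managed virtual network with Azure Machine Learning, the Microsoft.Network resource provider must be registered. The workspace uses this resource provider when it creates private endpoints for the managed virtual network.
For information on registering resource providers, see Resolve errors for resource provider registration.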