Important
Items marked (preview) in this article are currently in public preview. This preview is provided without a service-level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
In this article, you learn how to deploy an Azure AI Foundry Model as a serverless API deployment. Certain models in the model catalog can be deployed as a serverless API deployment. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need. This deployment option doesn't require quota from your subscription.
Although serverless API deployment is one option for deploying Azure AI Foundry Models, we recommend that you deploy Foundry Models to Azure AI Foundry resources.
Note
We recommend that you deploy Azure AI Foundry Models to Azure AI Foundry resources so that you can consume your deployments in the resource via a single endpoint with the same authentication and schema to generate inference. The endpoint follows the Azure AI Model Inference API, which all Foundry Models support. To learn how to deploy a Foundry Model to the Azure AI Foundry resources, see Add and configure models to Azure AI Foundry Models.
Prerequisites
- An Azure subscription with a valid payment method. Free or trial Azure subscriptions won't work. If you don't have an Azure subscription, create a paid Azure account to begin.
- An Azure AI Foundry hub-based project. If you don't have one, create a hub-based project.
- Ensure that the Deploy models to Azure AI Foundry resources (preview) feature is turned off in the Azure AI Foundry portal. When this feature is on, serverless API deployments aren't available from the portal.
- Foundry Models from Partners and Community require access to Azure Marketplace, while Foundry Models Sold Directly by Azure don't have this requirement. Ensure that you have the permissions required to subscribe to model offerings in Azure Marketplace.
- Azure role-based access control (Azure RBAC) is used to grant access to operations in the Azure AI Foundry portal. To perform the steps in this article, your user account must be assigned the Azure AI Developer role on the resource group. For more information on permissions, see Role-based access control in Azure AI Foundry portal.
- You can use any compatible web browser to navigate Azure AI Foundry.
Find your model in the model catalog
- Sign in to Azure AI Foundry.
- If you’re not already in your project, select it.
- Select Model catalog from the left pane.
- Select the model card of the model you want to deploy. In this article, you select a DeepSeek-R1 model.
- Select Use this model to open the Serverless API deployment window, where you can view the Pricing and terms tab.
- In the deployment wizard, name the deployment. The Content filter (preview) option is enabled by default; leave the default setting so that the service detects harmful content such as hate, self-harm, sexual, and violent content. For more information about content filtering, see Content filtering in Azure AI Foundry portal.
Deploy the model to a serverless API
In this section, you create an endpoint for your model.
- In the deployment wizard, select Deploy. Wait until the deployment is ready and you're redirected to the Deployments page.
- To see the endpoints deployed to your project, in the My assets section of the left pane, select Models + endpoints.

The created endpoint uses key authentication for authorization. To get the keys associated with a given endpoint, follow these steps:

- Select the deployment, and note the endpoint's Target URI and Key.
- Use these credentials to call the deployment and generate predictions.
If you need to consume this deployment from a different project or hub, or you plan to use Prompt flow to build intelligent applications, you need to create a connection to the serverless API deployment. To learn how to configure an existing serverless API deployment on a new project or hub, see Consume deployed serverless API deployment from a different project or from Prompt flow.
Tip
If you're using Prompt flow in the same project or hub where the model was deployed, you still need to create the connection.
Use the serverless API deployment
Models deployed in Azure Machine Learning and Azure AI Foundry as serverless API deployments support the Azure AI Foundry Models API. This API exposes a common set of capabilities for foundational models, so developers can consume predictions from a diverse set of models in a uniform and consistent way.
Read more about the capabilities of this API and how you can use it when building applications.
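For example, here's a minimal sketch of calling a deployment through this common API by using the azure-ai-inference Python package. The endpoint URL and key are placeholders; substitute the Target URI and Key from your own deployment:

# pip install azure-ai-inference
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential

# Placeholders: replace with the Target URI and Key shown for your deployment.
client = ChatCompletionsClient(
    endpoint="https://<your-deployment>.<region>.models.ai.azure.com",
    credential=AzureKeyCredential("<your-key>"),
)

response = client.complete(
    messages=[UserMessage(content="Summarize serverless API deployments in one sentence.")]
)
print(response.choices[0].message.content)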
Delete endpoints and subscriptions
Tip
Because you can customize the left pane in the Azure AI Foundry portal, you might see different items than shown in these steps. If you don't see what you're looking for, select ... More at the bottom of the left pane.
You can delete model subscriptions and endpoints. Deleting a model subscription causes any associated endpoint to become Unhealthy and unusable.
To delete a serverless API deployment:
- Go to the Azure AI Foundry portal.
- Go to your project.
- In the My assets section, select Models + endpoints.
- Open the deployment you want to delete.
- Select Delete.
To delete the associated model subscription:
- Go to the Azure portal.
- Navigate to the resource group where the project belongs.
- On the Type filter, select SaaS.
- Select the subscription you want to delete.
- Select Delete.
To work with Azure AI Foundry, install the Azure CLI and the ml extension for Azure Machine Learning.
az extension add -n ml
If you already have the extension installed, ensure the latest version is installed.
az extension update -n ml
Once the extension is installed, configure it:
az account set --subscription <subscription>
az configure --defaults workspace=<project-name> group=<resource-group> ___location=<___location>
Find your model in the model catalog
- Sign in to Azure AI Foundry.
- If you’re not already in your project, select it.
- Select Model catalog from the left pane.
Select the model card of the model you want to deploy. In this article, you select a DeepSeek-R1 model.
Copy the Model ID, without including the model version; serverless API deployments always deploy the model's latest available version. For example, for the model ID azureml://registries/azureml-deepseek/models/DeepSeek-R1/versions/1, copy azureml://registries/azureml-deepseek/models/DeepSeek-R1.
The steps in this section of the article use the DeepSeek-R1 model for illustration. The steps are the same, whether you're using Foundry Models sold directly by Azure or Foundry Models from partners and community. For example, if you choose to deploy the Cohere-command-r-08-2024 model instead, you can replace the model credentials in the code snippets with the credentials for Cohere.
Deploy the model to a serverless API
In this section, you create an endpoint for your model. Name the endpoint DeepSeek-R1-qwerty.
Create the serverless endpoint.
endpoint.yml
name: DeepSeek-R1-qwerty
model_id: azureml://registries/azureml-deepseek/models/DeepSeek-R1
Use the endpoint.yml file to create the endpoint:
az ml serverless-endpoint create -f endpoint.yml
At any point, you can see the endpoints deployed to your project:
az ml serverless-endpoint list
The created endpoint uses key authentication for authorization. Use the following command to get the keys associated with a given endpoint:
az ml serverless-endpoint get-credentials -n DeepSeek-R1-qwerty
If you need to consume this deployment from a different project or hub, or you plan to use Prompt flow to build intelligent applications, you need to create a connection to the serverless API deployment. To learn how to configure an existing serverless API deployment on a new project or hub, see Consume deployed serverless API deployment from a different project or from Prompt flow.
Tip
If you're using Prompt flow in the same project or hub where the model was deployed, you still need to create the connection.
Use the serverless API deployment
Models deployed in Azure Machine Learning and Azure AI Foundry as serverless API deployments support the Azure AI Foundry Models API. This API exposes a common set of capabilities for foundational models, so developers can consume predictions from a diverse set of models in a uniform and consistent way.
Read more about the capabilities of this API and how you can use it when building applications.
Delete endpoints and subscriptions
You can delete model subscriptions and endpoints. Deleting a model subscription causes any associated endpoint to become Unhealthy and unusable.
To delete a serverless API deployment:
az ml serverless-endpoint delete \
--name "DeepSeek-R1-qwerty"
To delete the associated model subscription:
az ml marketplace-subscription delete \
--name "DeepSeek-R1"
To work with Azure AI Foundry, install the Azure Machine Learning SDK for Python.
pip install -U azure-ai-ml
Once installed, import necessary namespaces and create a client connected to your project:
from azure.ai.ml import MLClient
from azure.identity import InteractiveBrowserCredential
from azure.ai.ml.entities import MarketplaceSubscription, ServerlessEndpoint

client = MLClient(
    credential=InteractiveBrowserCredential(tenant_id="<tenant-id>"),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<project-name>",
)
Find your model in the model catalog
- Sign in to Azure AI Foundry.
- If you’re not already in your project, select it.
- Select Model catalog from the left pane.
Select the model card of the model you want to deploy. In this article, you select a DeepSeek-R1 model.
Copy the Model ID, without including the model version; serverless API deployments always deploy the model's latest available version. For example, for the model ID azureml://registries/azureml-deepseek/models/DeepSeek-R1/versions/1, copy azureml://registries/azureml-deepseek/models/DeepSeek-R1.
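If you want to confirm which version the latest model ID resolves to, here's a minimal sketch that lists the model's versions by using a registry-scoped MLClient (this assumes you have read access to the azureml-deepseek registry):

# Hedged sketch: list the available versions of the model in its registry.
registry_client = MLClient(
    credential=InteractiveBrowserCredential(tenant_id="<tenant-id>"),
    registry_name="azureml-deepseek",
)
for model in registry_client.models.list(name="DeepSeek-R1"):
    print(model.name, model.version)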
The steps in this section of the article use the DeepSeek-R1 model for illustration. The steps are the same, whether you're using Foundry Models sold directly by Azure or Foundry Models from partners and community. For example, if you choose to deploy the Cohere-command-r-08-2024 model instead, you can replace the model credentials in the code snippets with the credentials for Cohere.
Deploy the model to a serverless API
In this section, you create an endpoint for your model. Name the endpoint DeepSeek-R1-qwerty.
Create the serverless endpoint.
# The model ID you copied earlier, without the version suffix.
model_id = "azureml://registries/azureml-deepseek/models/DeepSeek-R1"
endpoint_name = "DeepSeek-R1-qwerty"

serverless_endpoint = ServerlessEndpoint(
    name=endpoint_name,
    model_id=model_id
)

created_endpoint = client.serverless_endpoints.begin_create_or_update(
    serverless_endpoint
).result()
At any point, you can see the endpoints deployed to your project:
for endpoint in client.serverless_endpoints.list():
    print(endpoint.name)
The created endpoint uses key authentication for authorization. Use the following code to get the keys associated with a given endpoint:
endpoint_keys = client.serverless_endpoints.get_keys(endpoint_name) print(endpoint_keys.primary_key) print(endpoint_keys.secondary_key)
If you need to consume this deployment from a different project or hub, or you plan to use Prompt flow to build intelligent applications, you need to create a connection to the serverless API deployment. To learn how to configure an existing serverless API deployment on a new project or hub, see Consume deployed serverless API deployment from a different project or from Prompt flow.
Tip
If you're using Prompt flow in the same project or hub where the model was deployed, you still need to create the connection.
Use the serverless API deployment
Models deployed in Azure Machine Learning and Azure AI Foundry as serverless API deployments support the Azure AI Foundry Models API. This API exposes a common set of capabilities for foundational models, so developers can consume predictions from a diverse set of models in a uniform and consistent way.
Read more about the capabilities of this API and how you can use it when building applications.
Delete endpoints and subscriptions
You can delete model subscriptions and endpoints. Deleting a model subscription causes any associated endpoint to become Unhealthy and unusable.

To delete a serverless API deployment:

client.serverless_endpoints.begin_delete(endpoint_name).wait()
To delete the associated model subscription:
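# subscription_name is the name of the marketplace subscription created when the
# project first subscribed to the model offer; in this article's example, "DeepSeek-R1".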
client.marketplace_subscriptions.begin_delete(subscription_name).wait()
To work with Azure AI Foundry, install the Azure CLI as described at Azure CLI.
Configure the following environment variables according to your settings:
RESOURCE_GROUP="serverless-models-dev"
LOCATION="eastus2"
Find your model in the model catalog
- Sign in to Azure AI Foundry.
- If you’re not already in your project, select it.
- Select Model catalog from the left pane.
Select the model card of the model you want to deploy. In this article, you select a DeepSeek-R1 model.
Copy the Model ID, without including the model version; serverless API deployments always deploy the model's latest available version. For example, for the model ID azureml://registries/azureml-deepseek/models/DeepSeek-R1/versions/1, copy azureml://registries/azureml-deepseek/models/DeepSeek-R1.
The steps in this section of the article use the DeepSeek-R1 model for illustration. The steps are the same, whether you're using Foundry Models sold directly by Azure or Foundry Models from partners and community. For example, if you choose to deploy the Cohere-command-r-08-2024 model instead, you can replace the model credentials in the code snippets with the credentials for Cohere.
Deploy the model to a serverless API
In this section, you create an endpoint for your model. Name the endpoint myserverless-text-1234ss.
Create the serverless endpoint by using the following template:
serverless-endpoint.bicep
param projectName string = 'my-project'
param endpointName string = 'myserverless-text-1234ss'
param ___location string = resourceGroup().___location
param modelId string = 'azureml://registries/azureml-deepseek/models/DeepSeek-R1'

var modelName = substring(modelId, (lastIndexOf(modelId, '/') + 1))
var subscriptionName = '${modelName}-subscription'

// Reconstructed: the marketplace subscription that the endpoint's dependsOn references.
resource projectName_subscription 'Microsoft.MachineLearningServices/workspaces/marketplaceSubscriptions@2024-04-01-preview' = {
  name: '${projectName}/${subscriptionName}'
  properties: {
    modelId: modelId
  }
}

resource projectName_endpoint 'Microsoft.MachineLearningServices/workspaces/serverlessEndpoints@2024-04-01-preview' = {
  name: '${projectName}/${endpointName}'
  ___location: ___location
  sku: {
    name: 'Consumption'
  }
  properties: {
    modelSettings: {
      modelId: modelId
    }
  }
  dependsOn: [
    projectName_subscription
  ]
}

output endpointUri string = projectName_endpoint.properties.inferenceEndpoint.uri
Create the deployment as follows:
az deployment group create --resource-group $RESOURCE_GROUP --template-file serverless-endpoint.bicep
At any point, you can see the endpoints deployed to your project:
You can use the resource management tools to query the resources. The following code uses Azure CLI:
az resource list \
  --query "[?type=='Microsoft.MachineLearningServices/workspaces/serverlessEndpoints']"
The created endpoint uses key authentication for authorization. To get the keys associated with a given endpoint, query them by using the REST API.
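Here's a minimal sketch of that query in Python. The listKeys action and API version shown are assumptions based on the Azure Machine Learning REST API; verify the current values in the REST reference:

import requests
from azure.identity import DefaultAzureCredential

# Acquire an ARM token for the management plane.
token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

# Assumed route and API version; check the Azure ML REST reference for the current ones.
url = (
    "https://management.azure.com/subscriptions/<subscription-id>"
    "/resourceGroups/<resource-group>"
    "/providers/Microsoft.MachineLearningServices/workspaces/<project-name>"
    "/serverlessEndpoints/myserverless-text-1234ss/listKeys"
    "?api-version=2024-04-01-preview"
)
response = requests.post(url, headers={"Authorization": f"Bearer {token}"})
print(response.json())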
If you need to consume this deployment from a different project or hub, or you plan to use Prompt flow to build intelligent applications, you need to create a connection to the serverless API deployment. To learn how to configure an existing serverless API deployment on a new project or hub, see Consume deployed serverless API deployment from a different project or from Prompt flow.
Tip
If you're using Prompt flow in the same project or hub where the model was deployed, you still need to create the connection.
Use the serverless API deployment
Models deployed in Azure Machine Learning and Azure AI Foundry as serverless API deployments support the Azure AI Foundry Models API. This API exposes a common set of capabilities for foundational models, so developers can consume predictions from a diverse set of models in a uniform and consistent way.
Read more about the capabilities of this API and how you can use it when building applications.
Delete endpoints and subscriptions
You can delete model subscriptions and endpoints. Deleting a model subscription causes any associated endpoint to become Unhealthy and unusable.
You can use the resource management tools to manage the resources. The following code uses Azure CLI:
az resource delete --name <resource-name>
Cost and quota considerations for Foundry Models deployed as a serverless API deployment
Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. Additionally, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
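Because these limits apply per deployment, it's worth handling throttling (HTTP 429 responses) in client code. Here's a minimal sketch with exponential backoff, assuming the azure-ai-inference client shown earlier:

import time
from azure.core.exceptions import HttpResponseError

def complete_with_retry(client, messages, max_retries=5):
    """Call the deployment, backing off exponentially when throttled (HTTP 429)."""
    for attempt in range(max_retries):
        try:
            return client.complete(messages=messages)
        except HttpResponseError as err:
            if err.status_code != 429 or attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # back off 1s, 2s, 4s, ...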
You can find pricing information for Models Sold Directly by Azure on the Pricing and terms tab of the Serverless API deployment window.
Models from Partners and Community are offered through Azure Marketplace and integrated with Azure AI Foundry for use. You can find Azure Marketplace pricing when deploying or fine-tuning these models. Each time a project subscribes to a given offer from Azure Marketplace, a new resource is created to track the costs associated with its consumption. The same resource is used to track costs associated with inference and fine-tuning; however, multiple meters are available to track each scenario independently. For more information on how to track costs, see Monitor costs for models offered through Azure Marketplace.
Permissions required to subscribe to model offerings
Azure role-based access control (Azure RBAC) is used to grant access to operations in the Azure AI Foundry portal. To perform the steps in this article, your user account must be assigned the Owner, Contributor, or Azure AI Developer role for the Azure subscription. Alternatively, your account can be assigned a custom role that has the following permissions:
On the Azure subscription, to subscribe the workspace to the Azure Marketplace offering (once for each workspace, per offering):
Microsoft.MarketplaceOrdering/agreements/offers/plans/read
Microsoft.MarketplaceOrdering/agreements/offers/plans/sign/action
Microsoft.MarketplaceOrdering/offerTypes/publishers/offers/plans/agreements/read
Microsoft.Marketplace/offerTypes/publishers/offers/plans/agreements/read
Microsoft.SaaS/register/action
On the resource group, to create and use the SaaS resource:
Microsoft.SaaS/resources/read
Microsoft.SaaS/resources/write
On the workspace, to deploy endpoints (the Azure Machine Learning data scientist role already contains these permissions):
Microsoft.MachineLearningServices/workspaces/marketplaceModelSubscriptions/*
Microsoft.MachineLearningServices/workspaces/serverlessEndpoints/*
For more information on permissions, see Role-based access control in Azure AI Foundry portal.