Azure Machine Learning SDK for Python: Setup, Usage & Best Practices
Azure Machine Learning SDK v2 for Python provides comprehensive ML lifecycle management through a unified client (MLClient) that orchestrates workspaces, training jobs, model registries, datasets, compute clusters, and pipelines. This skill enables Python-based ML workflows with infrastructure-as-code, versioned assets, and reproducible experiments without manual Azure Portal configuration.
What This Skill Does
This SDK abstracts Azure ML operations through a single client interface with property-based access to different resource types: ml_client.jobs for training runs, ml_client.models for model registry, ml_client.data for datasets, ml_client.compute for infrastructure, and ml_client.environments for containerized runtimes. Each operation supports CRUD patterns—create/update, get, list, delete—enabling programmatic control over the entire ML lifecycle from data preparation through model deployment.
The workflow centers on versioned assets and declarative job definitions. Register datasets with versions (my-dataset:1, my-dataset:2), train models referencing specific dataset versions, register trained models with lineage tracking, then deploy models from the registry. Jobs execute as command scripts (single-file training), pipelines (multi-step workflows), or sweeps (hyperparameter tuning). Compute auto-scales based on load, environments ensure reproducible dependencies, and all operations emit telemetry for debugging and cost tracking.
Unlike Azure ML SDK v1's fragmented APIs (Workspace, Experiment, Run classes), v2 unifies everything under MLClient with consistent patterns. This reduces boilerplate, simplifies authentication (one credential for all operations), and enables infrastructure-as-code workflows where workspace configuration lives in Python scripts or Jupyter notebooks version-controlled alongside training code.
Getting Started
Install the SDK:
pip install azure-ai-ml azure-identity
Configure environment variables:
export AZURE_SUBSCRIPTION_ID=<subscription-id>
export AZURE_RESOURCE_GROUP=<resource-group>
export AZURE_ML_WORKSPACE_NAME=<workspace-name>
Create an MLClient:
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
import os
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id=os.environ["AZURE_SUBSCRIPTION_ID"],
    resource_group_name=os.environ["AZURE_RESOURCE_GROUP"],
    workspace_name=os.environ["AZURE_ML_WORKSPACE_NAME"]
)
For local development, DefaultAzureCredential picks up your Azure CLI session (az login). In production it falls back automatically to managed identities and service principal environment variables.
Key Features
Unified Client Interface: Single client for all Azure ML operations. No separate classes for workspaces, experiments, runs—everything through MLClient properties.
Versioned Assets: Data, models, and environments use semantic versioning. Reference specific versions in jobs for reproducibility or use latest for development.
Declarative Jobs: Define training jobs as Python objects with the command() builder (no YAML required), submit with ml_client.jobs.create_or_update(), and monitor via streaming logs.
Auto-Scaling Compute: Create clusters with min/max instance counts. Compute scales to zero when idle (configurable delay), reducing costs automatically.
Pipeline Orchestration: Build multi-step workflows with the @dsl.pipeline decorator. Steps declare inputs/outputs, SDK handles data flow and parallelization.
Environment Management: Package dependencies in Docker images or conda files. Reuse environments across jobs for consistency and faster startup times.
Model Registry: Central repository for trained models with lineage (which job produced it), metrics, and deployment history.
Usage Examples
Register Dataset:
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes
dataset = Data(
    name="training-data",
    version="1",
    path="azureml://datastores/workspaceblobstore/paths/data/train.csv",
    type=AssetTypes.URI_FILE
)
ml_client.data.create_or_update(dataset)
Submit Training Job:
from azure.ai.ml import command, Input
job = command(
    code="./src",
    command="python train.py --data ${{inputs.data}} --lr ${{inputs.lr}}",
    inputs={
        "data": Input(type="uri_file", path="azureml:training-data:1"),
        "lr": 0.01
    },
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
    compute="cpu-cluster"
)
submitted_job = ml_client.jobs.create_or_update(job)
print(f"Job: {submitted_job.studio_url}")
Register Model:
from azure.ai.ml.entities import Model
model = Model(
    name="my-classifier",
    version="1",
    path="./outputs/model/",
    type=AssetTypes.CUSTOM_MODEL
)
ml_client.models.create_or_update(model)
Create Compute Cluster:
from azure.ai.ml.entities import AmlCompute
cluster = AmlCompute(
    name="gpu-cluster",
    size="Standard_NC6",
    min_instances=0,
    max_instances=4,
    idle_time_before_scale_down=300
)
ml_client.compute.begin_create_or_update(cluster).result()
Best Practices
Version Everything: Give every data, model, and environment asset an explicit version and pin those versions in production jobs (training-data:1, not @latest). Enables rollback and reproducibility.
Tag Resources: Add tags ({"project": "fraud-detection", "env": "prod"}) to all resources for cost tracking and organization.
Set Idle Scale-Down: Configure idle_time_before_scale_down to balance responsiveness (short delay) versus cost (long delay). Typical: 120-300 seconds.
Stream Job Logs: Use ml_client.jobs.stream(job_name) to monitor training in real-time. Catches errors early without waiting for completion.
Use Environments: Don't inline pip install in job commands. Create environments from a conda_file plus a base image, or from a Docker build context, for reproducibility.
Store Credentials Securely: Use Azure Key Vault or environment variables. Never hardcode subscription IDs or keys in code.
Clean Up Experiments: Delete failed jobs and unused compute to avoid clutter and reduce costs.
When to Use / When NOT to Use
Use this skill when:
- You're building Python-based ML pipelines
- You need versioned datasets and models
- You want infrastructure-as-code for ML workflows
- You're orchestrating multi-step training pipelines
- You need auto-scaling compute for training
- You're implementing MLOps practices in Azure
- You want unified API for all Azure ML operations
Avoid this skill when:
- You're on AWS or GCP (use SageMaker/Vertex AI SDKs)
- You need real-time inference only (deploy once to a managed online endpoint, then call it with a plain HTTP client)
- You're training on local hardware without cloud (use vanilla Python ML libraries)
- You prefer Azure ML Studio UI over programmatic control
- Your models fit in notebooks without productionization needs
Related Skills
- azure-ai-projects-py: Azure AI project orchestration
- agents-v2-py: Container-based AI agents
- azure-ai-openai-dotnet: Azure OpenAI integration
Source
Maintained by Microsoft. View on GitHub