
Azure Monitor Pipeline Is GA: A Practical Lab Guide

I’ll be honest: when I saw the Azure Monitor Pipeline go GA, my first reaction was “oh, another collector agent.” I almost scrolled past it.

That would have been a mistake.

After spending time with the docs and actually running the lab (including several dead ends I’ll tell you about), I think this is one of the more interesting GA announcements in the Azure observability space this year. Not because of any individual feature, but because of what it changes about where you process telemetry and why that matters more than it sounds.

The real problem it’s solving

Most coverage of Azure Monitor Pipeline leads with scale numbers. “Handle millions of events per second!” Which, fine, that’s real. But the more interesting story is architectural.

Here’s a pattern I’ve seen more times than I can count: a team connects their firewalls and Linux servers to Sentinel or Log Analytics, and six months later they’re staring at a bill that’s double what they expected. The logs are flowing. The dashboards look impressive. Half of what’s being ingested is noise they never query.

The problem isn’t the cloud. The problem is that by the time telemetry hits Azure, it’s too late to do anything useful with it. You’ve already paid for the ingestion. You’ve already stored the junk. All the filtering and parsing happens downstream, after the cost is incurred.

Azure Monitor Pipeline flips that. It sits between your telemetry sources and Azure, running on an Arc-enabled Kubernetes cluster in your local environment. The processing (filtering, transformation, schema normalisation) happens before the data leaves your site.

That shift in where things happen is the whole point.

What it actually is

The pipeline is a containerised solution built on OpenTelemetry Collector, deployed on an Arc-enabled Kubernetes cluster. It’s not another agent. The Azure Monitor Agent collects telemetry from individual machines. The pipeline answers a different question: how do you ingest telemetry from across an environment through a single centralised point, while keeping control over reliability, security, and cost?

The GA feature set covers:

  • Automatic schematisation: logs land in standard Azure tables like Syslog and CommonSecurityLog without custom parsing downstream
  • Local buffering with backfill: if connectivity drops, the pipeline queues locally and syncs when it’s restored
  • Edge filtering and aggregation: drop what you don’t need before it hits Azure
  • mTLS with automated cert provisioning: cert-manager handles certificate lifecycle, including zero-downtime rotation
  • Pipeline health monitoring: CPU, memory, and throughput visible via Azure Monitor itself

And it’s included at no additional cost for ingesting telemetry into Azure Monitor and Microsoft Sentinel.

A different way to think about it

Every serious observability practice eventually develops informal rules: we don’t ship debug logs from prod, we sample noisy services at 10%, syslog from network devices goes through a normaliser before Sentinel. These rules usually live in scripts, a custom forwarder someone wrote three years ago that nobody wants to touch, or a Logstash config that’s become load-bearing.

Azure Monitor Pipeline is a first-class place to put those rules. Structured, monitored, centrally managed, and version-controllable as configuration. For anyone running Sentinel in a regulated environment, that framing makes it more interesting than the scale numbers alone.
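If you want to ground those rules in data before you configure anything, Log Analytics can already tell you where the volume is. A query along these lines (assuming you have Syslog data in a workspace; _BilledSize is the per-row billed size in bytes) surfaces what's worth dropping at the edge:

Syslog
| where TimeGenerated > ago(7d)
| summarize Events = count(), BilledMB = sum(_BilledSize) / (1024.0 * 1024.0) by Facility, SeverityLevel
| order by BilledMB desc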

The lab

I want to be direct about what this lab shows and what it doesn’t. Getting data flowing into Log Analytics involves more moving parts than the docs initially suggest. I’ll walk through what actually works, including the parts that surprised me.

What you need:

  • An Azure subscription
  • A Linux VM (I used Ubuntu 22.04, Standard_B2s)
  • Azure CLI (Cloud Shell works fine)

Important before you start: region support is limited. At the time of writing, the pipeline extension only works in East US2, Canada Central, Italy North, and West US2. West Europe is listed in some docs but doesn't work. Use East US2.

The architecture

Linux VM
├── K3s (single-node Kubernetes)
├── Azure Arc (connects K3s to Azure)
├── Pipeline extension (cert-manager + pipeline operator)
└── Pipeline group (syslog receiver → processor → Azure Monitor exporter)

Azure Monitor (DCE → DCR → Log Analytics)

Phase 1: Set variables

Work in Bash throughout. Set these once at the start of your Cloud Shell session:

RG="rg-amp-lab"
LOCATION="eastus2"
VM_NAME="vm-amp-lab"
ARC_CLUSTER="arc-amp-lab"
WORKSPACE_NAME="law-amp-lab"
PIPELINE_NAME="pipeline-amp-lab"

Phase 2: Deploy the VM and workspace

az group create --name $RG --location $LOCATION
az monitor log-analytics workspace create \
--resource-group $RG \
--workspace-name $WORKSPACE_NAME \
--location $LOCATION &
az vm create \
--resource-group $RG \
--name $VM_NAME \
--image Ubuntu2204 \
--size Standard_B2s \
--admin-username azureuser \
--generate-ssh-keys \
--public-ip-sku Standard

The & runs the workspace creation in the background while the VM deploys.
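If you'd rather be certain the workspace finished before a later phase needs it, bash's wait blocks until background jobs complete:

wait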

Phase 3: Install K3s on the VM

SSH into the VM using the public IP from the previous output:

ssh azureuser@<public-ip>

Then install K3s and set up the kubeconfig:

curl -sfL https://get.k3s.io | sh -
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown $USER ~/.kube/config
export KUBECONFIG=~/.kube/config
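The export only lasts for the current shell session. You'll be back on this VM in later phases, so persist it:

echo 'export KUBECONFIG=~/.kube/config' >> ~/.bashrc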

Verify it’s running:

kubectl get nodes

You should see one node in Ready state.

Do this now while you’re on the VM: raise the file descriptor limits. K3s plus the pipeline collector will hit the default inotify limits on a B2s VM, and the pipeline pod will crash with “too many open files” if you skip this:

sudo tee /etc/sysctl.d/99-pipeline.conf > /dev/null <<EOF
fs.inotify.max_user_instances=8192
fs.inotify.max_user_watches=524288
fs.file-max=131072
EOF
sudo sysctl --system
sudo mkdir -p /etc/systemd/system/k3s.service.d
sudo tee /etc/systemd/system/k3s.service.d/limits.conf > /dev/null <<EOF
[Service]
LimitNOFILE=131072
EOF
sudo systemctl daemon-reload
sudo systemctl restart k3s
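To confirm the new limits took effect:

sysctl fs.inotify.max_user_instances fs.inotify.max_user_watches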

Phase 4: Connect to Arc

Still on the VM, install the Azure CLI and connect:

curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
az login --use-device-code
az config set extension.dynamic_install_allow_preview=true
az connectedk8s connect \
--name arc-amp-lab \
--resource-group rg-amp-lab \
--location eastus2 \
--kube-config ~/.kube/config

This takes 3-5 minutes. Verify:

az connectedk8s show \
--name arc-amp-lab \
--resource-group rg-amp-lab \
--query "connectivityStatus"

Wait for "Connected".
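Or poll it in a loop instead of re-running the command by hand (plain bash, adjust the sleep to taste):

until [ "$(az connectedk8s show --name arc-amp-lab --resource-group rg-amp-lab --query connectivityStatus -o tsv)" = "Connected" ]; do
echo "waiting for Arc..."; sleep 20
done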

Phase 5: Install cert-manager, then the pipeline extension

Back in Cloud Shell. cert-manager must go in first — the pipeline operator depends on it and will crash-loop without it:

az k8s-extension create \
--name azure-cert-management \
--cluster-name $ARC_CLUSTER \
--resource-group $RG \
--cluster-type connectedClusters \
--extension-type microsoft.certmanagement \
--release-train stable \
--config subcharts.zdtrcontroller.enabled=true

Wait for it to show Succeeded. You can poll the provisioning state from Cloud Shell:
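az k8s-extension show \
--name azure-cert-management \
--cluster-name $ARC_CLUSTER \
--resource-group $RG \
--cluster-type connectedClusters \
--query "provisioningState" -o tsv

Once it reports Succeeded, install the pipeline controller. The docs have two different extension type names floating around and only this one works: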

az k8s-extension create \
--name azuremonitor-pipeline \
--extension-type microsoft.monitor.pipelinecontroller \
--scope cluster \
--cluster-name $ARC_CLUSTER \
--resource-group $RG \
--cluster-type connectedClusters \
--release-train Preview

Grab the extension's principal ID; you'll need it for RBAC:

PRINCIPAL_ID=$(az k8s-extension show \
--name azuremonitor-pipeline \
--cluster-name $ARC_CLUSTER \
--resource-group $RG \
--cluster-type connectedClusters \
--query "identity.principalId" -o tsv)
echo $PRINCIPAL_ID

Phase 6: Create a custom location

The pipeline group resource deploys to a custom location. This is how Azure targets your Arc cluster as a deployment destination:

ARC_CLUSTER_ID=$(az connectedk8s show \
--name $ARC_CLUSTER \
--resource-group $RG \
--query "id" -o tsv)
EXTENSION_ID=$(az k8s-extension show \
--name azuremonitor-pipeline \
--cluster-name $ARC_CLUSTER \
--resource-group $RG \
--cluster-type connectedClusters \
--query "id" -o tsv)
az customlocation create \
--name "cl-amp-lab" \
--resource-group $RG \
--location $LOCATION \
--host-resource-id $ARC_CLUSTER_ID \
--namespace kube-system \
--cluster-extension-ids $EXTENSION_ID
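Confirm it provisioned before moving on:

az customlocation show \
--name cl-amp-lab \
--resource-group $RG \
--query "provisioningState" -o tsv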

Phase 7: Create DCE and DCR

Create the Data Collection Endpoint first:

WORKSPACE_ID=$(az monitor log-analytics workspace show \
--resource-group $RG \
--workspace-name $WORKSPACE_NAME \
--query "id" -o tsv)
az monitor data-collection endpoint create \
--name dce-amp-lab \
--resource-group $RG \
--location $LOCATION \
--public-network-access Enabled
DCE_URL=$(az monitor data-collection endpoint show \
--name dce-amp-lab \
--resource-group $RG \
--query "logsIngestion.endpoint" -o tsv)
DCE_ID=$(az monitor data-collection endpoint show \
--name dce-amp-lab \
--resource-group $RG \
--query "id" -o tsv)

The CLI doesn’t accept Microsoft-Syslog-FullyFormed as a stream name, so create the DCR from a JSON file instead:

cat > /tmp/dcr.json <<EOF
{
  "properties": {
    "dataCollectionEndpointId": "$DCE_ID",
    "streamDeclarations": {},
    "destinations": {
      "logAnalytics": [
        {
          "workspaceResourceId": "$WORKSPACE_ID",
          "name": "law-amp-lab"
        }
      ]
    },
    "dataFlows": [
      {
        "streams": ["Microsoft-Syslog-FullyFormed"],
        "destinations": ["law-amp-lab"]
      }
    ]
  }
}
EOF
az monitor data-collection rule create \
--name dcr-amp-lab \
--resource-group $RG \
--location $LOCATION \
--rule-file /tmp/dcr.json

Grab the rule's immutable ID and resource ID:

DCR_IMMUTABLE_ID=$(az monitor data-collection rule show \
--name dcr-amp-lab \
--resource-group $RG \
--query "immutableId" -o tsv)
DCR_ID=$(az monitor data-collection rule show \
--name dcr-amp-lab \
--resource-group $RG \
--query "id" -o tsv)
echo $DCR_IMMUTABLE_ID

Assign RBAC on the DCR and DCE. Monitoring Metrics Publisher on the workspace alone isn’t enough — the pipeline identity needs it scoped to the DCR and DCE directly:

az role assignment create \
--assignee $PRINCIPAL_ID \
--role "Monitoring Metrics Publisher" \
--scope $DCR_ID
az role assignment create \
--assignee $PRINCIPAL_ID \
--role "Monitoring Metrics Publisher" \
--scope $DCE_ID
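Role assignments can take a couple of minutes to propagate. Confirm they landed:

az role assignment list --assignee $PRINCIPAL_ID --scope $DCR_ID -o table
az role assignment list --assignee $PRINCIPAL_ID --scope $DCE_ID -o table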

Phase 8: Deploy the pipeline group

The pipeline supports three TLS modes: mutualTls (default), serverOnly, and disabled. For this lab, TLS is disabled to keep the rsyslog configuration simple. For production deployments, see the TLS configuration docs to set up server-only or full mTLS.

CUSTOM_LOCATION_ID=$(az customlocation show \
--name cl-amp-lab \
--resource-group $RG \
--query "id" -o tsv)
az deployment group create \
--resource-group $RG \
--template-file /dev/stdin <<EOF
{
  "\$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "resources": [
    {
      "type": "Microsoft.monitor/pipelineGroups",
      "apiVersion": "2026-04-01",
      "name": "pipeline-amp-lab",
      "location": "eastus2",
      "extendedLocation": {
        "name": "$CUSTOM_LOCATION_ID",
        "type": "CustomLocation"
      },
      "properties": {
        "tlsConfigurations": [
          { "name": "tls-disabled", "mode": "disabled" }
        ],
        "receivers": [
          {
            "type": "Syslog",
            "name": "syslog-receiver",
            "tlsConfiguration": "tls-disabled",
            "syslog": { "endpoint": "0.0.0.0:514" }
          }
        ],
        "processors": [
          { "type": "MicrosoftSyslog", "name": "ms-syslog-processor" }
        ],
        "exporters": [
          {
            "type": "AzureMonitorWorkspaceLogs",
            "name": "syslog-exporter",
            "azureMonitorWorkspaceLogs": {
              "api": {
                "dataCollectionEndpointUrl": "$DCE_URL",
                "dataCollectionRule": "$DCR_IMMUTABLE_ID",
                "stream": "Microsoft-Syslog-FullyFormed",
                "schema": {
                  "recordMap": [
                    { "from": "attributes.Computer", "to": "Computer" },
                    { "from": "attributes.HostName", "to": "HostName" },
                    { "from": "attributes.Facility", "to": "Facility" },
                    { "from": "attributes.SeverityLevel", "to": "SeverityLevel" },
                    { "from": "attributes.SyslogMessage", "to": "SyslogMessage" },
                    { "from": "attributes.ProcessName", "to": "ProcessName" },
                    { "from": "attributes.ProcessID", "to": "ProcessID" },
                    { "from": "attributes.TimeGenerated", "to": "TimeGenerated" },
                    { "from": "attributes.CollectorHostName", "to": "CollectorHostName" },
                    { "from": "attributes.SourceSystem", "to": "SourceSystem" }
                  ]
                }
              }
            }
          }
        ],
        "service": {
          "pipelines": [
            {
              "name": "syslog-pipeline",
              "type": "Logs",
              "receivers": ["syslog-receiver"],
              "processors": ["ms-syslog-processor"],
              "exporters": ["syslog-exporter"]
            }
          ]
        }
      }
    }
  ]
}
EOF
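There's no dedicated CLI command group for pipeline groups as far as I can tell, so the generic az resource show does the checking:

az resource show \
--resource-group $RG \
--name pipeline-amp-lab \
--resource-type "Microsoft.monitor/pipelineGroups" \
--query "properties.provisioningState" -o tsv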

Phase 9: Configure rsyslog on the VM

Back on the VM, get the ClusterIP of the pipeline service:

kubectl get svc -n kube-system | grep pipeline-amp-lab-service

Then configure rsyslog to forward to it over plain TCP. The IP below (10.43.255.205) is the ClusterIP from my run; substitute yours. The @@ prefix means TCP in rsyslog:

sudo tee /etc/rsyslog.d/90-azure-pipeline.conf > /dev/null <<EOF
*.* @@10.43.255.205:514
EOF
sudo systemctl restart rsyslog

Send some test messages:

logger "Azure Monitor Pipeline test - $(date)"
logger -p kern.warn "Kernel warning test"
logger -p auth.info "Auth info test"
logger -p daemon.err "Daemon error test"
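If you want more than a trickle for the verification step, a quick loop generates some volume:

for i in $(seq 1 50); do logger -p local0.info "pipeline volume test $i"; done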

Phase 10: Verify

Check the pipeline pod is running:

kubectl get pods -n kube-system | grep pipeline

You want pipeline-amp-lab-statefulset-0 showing 3/3 Running. Then check the collector logs to confirm the receiver is up and there are no errors:

kubectl logs -n kube-system pipeline-amp-lab-statefulset-0 -c collector --tail=10 | grep -v heartbeat

You want to see syslog_cef_receiver.start: [protocol=TCP, listening_addr=0.0.0.0:514] and auth.token_acquired with no export.failed lines.

Wait 5 minutes, then query Log Analytics:

Syslog
| where TimeGenerated > ago(15m)
| project TimeGenerated, Computer, Facility, SeverityLevel, SyslogMessage
| order by TimeGenerated desc

When rows start appearing with Computer: vm-amp-lab and your test messages in SyslogMessage, the lab is complete. You’ll also notice real system logs flowing through automatically — SSH connection attempts, daemon activity, K3s internal logs — none of which you triggered manually. That’s the pipeline doing its job.
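To see that mix at a glance, summarize by process and severity:

Syslog
| where TimeGenerated > ago(1h)
| summarize Events = count() by ProcessName, SeverityLevel
| order by Events desc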

Also worth checking the Heartbeat table to confirm the full export path:

Heartbeat
| where TimeGenerated > ago(15m)
| order by TimeGenerated desc

About TLS in production

The pipeline defaults to full mTLS, which means client certificates are required. This is the right posture for production: only sources presenting a cert signed by the pipeline’s internal CA can inject data.

For a lab, disabling TLS is fine. For production, the pipeline supports three modes:

  • mutualTls (default): full mTLS, client cert required
  • serverOnly: TLS encryption, no client cert validation
  • disabled: plain TCP

To use serverOnly with rsyslog, you’d configure the tlsConfiguration in the pipeline group ARM template and point rsyslog at the pipeline using the ossl stream driver with StreamDriverAuthMode="anon". The pipeline’s server certificate is managed automatically by cert-manager and rotated without downtime. Full mTLS requires generating a client certificate signed by the pipeline CA. The cert-manager extension handles the CA, but client cert provisioning for external sources (anything outside the K3s cluster) is a manual step. The official TLS docs cover this at learn.microsoft.com/azure/azure-monitor/data-collection/pipeline-tls.
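For reference, a serverOnly rsyslog forward would look roughly like this (a sketch, not tested end-to-end here, assuming a serverOnly tlsConfiguration on the receiver and the same ClusterIP as before):

sudo tee /etc/rsyslog.d/90-azure-pipeline-tls.conf > /dev/null <<EOF
# ossl stream driver, TLS on (mode 1), anonymous auth: encrypted, no client cert
action(type="omfwd" target="10.43.255.205" port="514" protocol="tcp"
StreamDriver="ossl" StreamDriverMode="1" StreamDriverAuthMode="anon")
EOF
sudo systemctl restart rsyslog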

Things worth knowing before you commit

Arc is a hard dependency. The pipeline runs on Arc-enabled Kubernetes. If you don’t have Arc today, that’s the first investment.

Region support is limited at GA. East US2, Canada Central, Italy North, West US2. West Europe is listed in some docs but the extension registration isn’t complete there yet.

The correct extension type is microsoft.monitor.pipelinecontroller. Various docs show Microsoft.Monitor.Pipeline which doesn’t work. Use the pipelinecontroller type on the Preview release train.

cert-manager must be installed first. The pipeline operator depends on it. Skip this step and the pipeline pod will crash-loop with a certificate manager error.

File descriptor limits matter on small VMs. A Standard_B2s running K3s will hit the default inotify limits. Set the sysctl values before deploying the extension or the collector pod will fail with “too many open files”.

The DCR stream must be Microsoft-Syslog-FullyFormed. Using Microsoft-Syslog causes 400 errors on export. The CLI rejects this stream name, so create the DCR from a JSON file as shown above.

RBAC scope matters. Monitoring Metrics Publisher needs to be assigned on the DCR and DCE resources directly, not just the Log Analytics workspace.

The MicrosoftSyslog processor handles field mapping automatically. Use the attribute names it produces (Computer, HostName, SyslogMessage, Facility, SeverityLevel) in your record map, not raw OTel attribute names.

Is it worth looking at?

If you’re running Sentinel and have ever had a conversation about ingestion costs, yes. The edge filtering story is real. Dropping high-volume noise before it leaves your site is where the cost reduction comes from.

If you’re in a hybrid or multicloud environment with patchy connectivity, the local buffering is worth evaluating. Data loss during a connectivity outage is the kind of thing that shows up in audit findings months later.

If you’re already running a well-tuned Logstash or Cribl pipeline, this is worth watching rather than migrating to immediately. The GA status means the API is stable, but the tooling around it (CLI commands, region availability, documentation accuracy) is still catching up to the announcement.

The OpenTelemetry foundation is a genuine positive. This isn’t going to be a dead-end proprietary thing in five years.

Cleanup

When you’re done:

az group delete --name rg-amp-lab --yes --no-wait

This removes everything: VM, Arc cluster registration, workspace, pipeline, DCE, DCR.

Official docs: learn.microsoft.com/azure/azure-monitor/data-collection/pipeline-overview
