Why most Azure environments fail at governance

As I progressed from “keeping Azure running” to actually owning platform decisions, one thing became painfully clear:
most Azure governance failures are caused not by missing tools but by weak technical foundations and late architectural decisions.

I’ll walk through a few very realistic situations I’ve personally encountered over the years. They’re anonymised, but the technical problems are absolutely real.

Situation 1 – Governance added after the subscription model is already wrong

In one environment, governance “existed”. There were policies, RBAC assignments, even management groups. But the subscription design had evolved organically:

  • Production and non-production mixed
  • Shared services deployed directly into workload subscriptions
  • No clear separation between platform and application ownership

When governance policies were introduced, they had to be:

  • Full of exclusions
  • Scoped inconsistently
  • Relaxed to avoid breaking existing workloads

At that point, Azure Policy became reactive instead of preventative.

Technical root cause:
Governance was designed after the subscription and management group hierarchy, instead of being embedded into it.

Lesson learned:
If your management group structure doesn’t clearly separate platform, landing zones, and workloads, no amount of policy will save you later.
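To make that separation checkable rather than aspirational, here is a minimal Python sketch that models a hierarchy and flags the two smells from this situation: subscriptions that mix environments, and shared services living outside a platform group. The class names, the `kind` labels and the sample subscriptions are all hypothetical; this is an in-memory model for illustration, not a call into any Azure API.

```python
from dataclasses import dataclass, field


@dataclass
class Subscription:
    name: str
    environments: set[str]            # e.g. {"prod"} or {"prod", "dev"}
    hosts_shared_services: bool = False


@dataclass
class ManagementGroup:
    name: str
    kind: str                         # hypothetical label: "platform" or "landing-zone"
    subscriptions: list[Subscription] = field(default_factory=list)


def find_structure_issues(groups: list[ManagementGroup]) -> list[str]:
    """Return readable findings for subscriptions that blur boundaries."""
    issues = []
    for mg in groups:
        for sub in mg.subscriptions:
            # Production and non-production sharing one subscription.
            if len(sub.environments) > 1:
                issues.append(
                    f"{sub.name}: mixes environments {sorted(sub.environments)}"
                )
            # Shared services deployed into a workload subscription.
            if sub.hosts_shared_services and mg.kind != "platform":
                issues.append(
                    f"{sub.name}: shared services outside a platform group"
                )
    return issues


# Made-up sample hierarchy, for illustration only.
groups = [
    ManagementGroup("mg-platform", "platform", [
        Subscription("sub-connectivity", {"prod"}, hosts_shared_services=True),
    ]),
    ManagementGroup("mg-landing-zones", "landing-zone", [
        Subscription("sub-app-a", {"prod", "dev"}),
        Subscription("sub-app-b", {"prod"}, hosts_shared_services=True),
    ]),
]

issues = find_structure_issues(groups)
for issue in issues:
    print(issue)
```

Running a check like this before any policy is assigned shows how many exclusions the policies would need from day one.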

Situation 2 – Azure Policy used as a control plane instead of a guardrail

I’ve seen environments where Azure Policy was treated almost like a firewall for engineers:

  • Deny policies everywhere
  • Required tags enforced globally
  • Region restrictions applied without workload awareness

Technically, everything was “correct”. Operationally, it was a nightmare.

Deployments failed in pipelines. IaC had to include policy-specific workarounds. Teams started requesting Owner permissions “just to get things done”.

Technical root cause:
Policies were written without aligning to deployment flows, CI/CD pipelines, or exception handling models.

Lesson learned:
Deny policies should protect platform invariants, not encode business logic.
If a policy breaks automation, the policy is wrong — not the pipeline.
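One way to keep that distinction honest is to review assignments against an explicit list of invariants. The sketch below is a simplified illustration under my own assumptions: the `PolicyAssignment` record, the invariant names and the exclusion threshold are hypothetical, and `excluded_scopes` simply stands in for the number of exclusions (the `notScopes` entries) on an assignment.

```python
from dataclasses import dataclass

# Hypothetical allow-list: the only controls this platform team
# treats as hard invariants that justify a deny effect.
PLATFORM_INVARIANTS = {"deny-public-ip-on-nic", "deny-unencrypted-storage"}

# Assumed threshold; above this, the policy probably does not match reality.
MAX_EXCLUSIONS = 3


@dataclass
class PolicyAssignment:
    name: str
    effect: str            # "Deny", "Audit", ...
    excluded_scopes: int   # how many scopes are excluded from the assignment


def review(assignments: list[PolicyAssignment]) -> list[tuple[str, str]]:
    """Flag deny policies outside the invariant list and exclusion-heavy ones."""
    findings = []
    for a in assignments:
        if a.effect.lower() == "deny" and a.name not in PLATFORM_INVARIANTS:
            findings.append((a.name, "deny outside platform invariants; consider audit first"))
        if a.excluded_scopes > MAX_EXCLUSIONS:
            findings.append((a.name, f"{a.excluded_scopes} exclusions; revisit scope or intent"))
    return findings


# Made-up sample assignments, for illustration only.
assignments = [
    PolicyAssignment("deny-public-ip-on-nic", "Deny", 0),
    PolicyAssignment("require-cost-center-tag", "Deny", 1),
    PolicyAssignment("audit-diagnostic-settings", "Audit", 5),
]

findings = review(assignments)
for name, reason in findings:
    print(f"{name}: {reason}")
```

The useful part is the allow-list itself: if a deny policy cannot be justified as an invariant, it is a candidate for an audit effect instead.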

Situation 3 – RBAC explosion without role strategy

This is one I still see far too often.

RBAC assignments were technically valid:

  • Contributors at subscription level
  • Custom roles created ad-hoc
  • Permissions granted directly to users

Over time, access reviews became impossible. No one could confidently answer why a user had access, or what would break if it were removed.

Technical root cause:
RBAC was applied tactically, not strategically — without role boundaries or identity lifecycle integration.

Lesson learned:
Azure RBAC must be designed around:

  • Role scopes (management group → subscription → resource group)
  • Group-based assignments
  • Clear separation between platform, workload, and operational access

Without that, governance collapses into entitlement sprawl.
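As a rough illustration of what auditing against those three principles could look like, here is a small Python sketch over exported role assignments. The field names (`principalName`, `principalType`, `roleDefinitionName`, `scope`) are modelled loosely on what a role assignment export typically contains and should be treated as assumptions, as should the sample data and the choice of which roles count as "broad".

```python
# Roles considered too broad for subscription scope and above (an assumption).
BROAD_ROLES = {"Owner", "Contributor"}


def scope_level(scope: str) -> str:
    """Classify an Azure scope string by its level in the hierarchy."""
    s = scope.lower()
    if "/resourcegroups/" in s:
        return "resource-group"          # resource group or anything below it
    if s.startswith("/subscriptions/"):
        return "subscription"
    if s.startswith("/providers/microsoft.management/managementgroups/"):
        return "management-group"
    return "other"


def audit(assignments: list[dict]) -> list[str]:
    """Flag direct user assignments and broad roles at high scopes."""
    findings = []
    for a in assignments:
        level = scope_level(a["scope"])
        if a["principalType"] == "User":
            findings.append(f"{a['principalName']}: direct user assignment, prefer a group")
        if a["roleDefinitionName"] in BROAD_ROLES and level in ("management-group", "subscription"):
            findings.append(f"{a['principalName']}: {a['roleDefinitionName']} at {level} scope, review")
    return findings


SUB = "/subscriptions/00000000-0000-0000-0000-000000000000"

# Made-up sample assignments, for illustration only.
assignments = [
    {"principalName": "alice@example.com", "principalType": "User",
     "roleDefinitionName": "Contributor", "scope": SUB},
    {"principalName": "grp-app-a-operators", "principalType": "Group",
     "roleDefinitionName": "Reader", "scope": SUB + "/resourceGroups/rg-app-a"},
    {"principalName": "grp-platform-admins", "principalType": "Group",
     "roleDefinitionName": "Owner",
     "scope": "/providers/Microsoft.Management/managementGroups/mg-platform"},
]

findings = audit(assignments)
for finding in findings:
    print(finding)
```

A finding here is a prompt for a conversation, not a verdict: a platform admin group with Owner at management group scope may be entirely intentional, but someone should be able to say why.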

Situation 4 – Secure Score looks great, governance still weak

I’ve walked into tenants with:

  • Defender for Cloud fully enabled
  • Secure Score in the high 90s
  • Compliance dashboards everywhere

Yet basic governance questions had no clear answers:

  • What defines a production workload?
  • Which subscriptions are business-critical?
  • What is the accepted baseline for risk?

Technical root cause:
Security tooling was enabled without defining governance intent.

Lesson learned:
Secure Score measures configuration, not maturity.
Governance is about decision-making, not checkbox optimisation.

Situation 5 – IaC without governance drift control

In more mature environments, everything was deployed via Terraform or Bicep. On the surface, this looked like governance done right.

But drift existed everywhere:

  • Manual changes in production
  • Policies excluded to “unblock” deployments
  • No enforcement of state consistency

Technical root cause:
Infrastructure as Code was treated as a deployment tool, not as a governance enforcement mechanism.

Lesson learned:
Without:

  • Clear ownership of state
  • Policy alignment with IaC
  • Drift detection processes

IaC accelerates inconsistency instead of preventing it.
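The core idea behind drift detection is simple enough to sketch in a few lines of Python: compare what the code declares with what actually exists. This is a toy model under my own assumptions (resource IDs and properties are plain dictionaries with made-up values); in practice, tools such as `terraform plan` or the Bicep what-if operation do the real comparison.

```python
def detect_drift(declared: dict, actual: dict) -> list[tuple[str, str, list[str]]]:
    """Compare declared (IaC) state with observed state, resource by resource.

    Returns (resource_id, kind, changed_properties) tuples where kind is
    "missing", "unmanaged" or "changed".
    """
    findings = []
    for rid in sorted(declared.keys() | actual.keys()):
        if rid not in actual:
            # Declared in code but not found in the environment.
            findings.append((rid, "missing", []))
        elif rid not in declared:
            # Exists in the environment but nobody owns it in code.
            findings.append((rid, "unmanaged", []))
        else:
            want, have = declared[rid], actual[rid]
            # Only compare properties the code declares; the platform
            # usually reports extra properties that are not drift.
            changed = sorted(k for k in want if have.get(k) != want[k])
            if changed:
                findings.append((rid, "changed", changed))
    return findings


# Made-up sample state, for illustration only.
declared = {
    "rg-app/storage1": {"sku": "Standard_LRS", "public_access": False},
    "rg-app/vnet1": {"address_space": "10.0.0.0/16"},
}
actual = {
    "rg-app/storage1": {"sku": "Standard_LRS", "public_access": True, "created_by": "portal"},
    "rg-app/vm-debug": {"size": "Standard_B2s"},
}

drift = detect_drift(declared, actual)
for rid, kind, props in drift:
    print(rid, kind, props)
```

The three kinds of finding map onto the three gaps above: "unmanaged" is an ownership question, "changed" is usually a manual change or a relaxed policy, and "missing" means the state no longer describes reality.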

What changed my view on Azure governance

The turning point for me was realising that governance is a platform engineering problem, not a compliance one.

Strong Azure governance requires:

  • A deliberate management group and subscription architecture
  • Clear boundaries between platform and workload responsibilities
  • Policies that reflect operational reality
  • Identity and access designed for scale, not convenience

Most importantly, governance has to be boring, predictable, and invisible.
If engineers constantly feel it, something is wrong.

Good governance doesn’t stop people deploying.
It stops them deploying the wrong things in the wrong way.

And every time I see governance fail in Azure, it’s almost never because Azure lacked a feature — it’s because the architecture didn’t respect how governance actually works at scale.

Author: João Paulo Costa

Microsoft MVP, MCT, MCSA, MCITP, MCTS, MS, Azure Solutions Architect, Azure Administrator, Azure Network Engineer, Azure Fundamentals, Microsoft 365 Enterprise Administrator Expert, Microsoft 365 Messaging Administrator, ITIL v3.
