Fix: Azurerm Policy Update Fails With InvalidPolicy Error

by Dimemap Team

Hey guys! Today, we're diving deep into a tricky issue that many of you might have encountered while working with Azure Role Management Policies in Terraform or Pulumi. Specifically, we're talking about the dreaded InvalidPolicy error that pops up when trying to update azurerm_role_management_policy. Let's break down the problem, understand the root cause, and explore a workaround and a suggested fix. So, buckle up, and let's get started!

Understanding the Bug Report: A Deep Dive

Before we jump into the nitty-gritty, let's set the stage. This article is inspired by a detailed bug report highlighting a critical issue with updating PIM (Privileged Identity Management) Role Management Policies using Pulumi and Terraform. The core problem? Updates fail with an InvalidPolicy error, even though the same updates work flawlessly when performed directly via the Azure REST API with a complete rule structure. Let's unravel this mystery!

The Environment Setup

First, let's paint a picture of the environment where this issue occurs:

  • Pulumi Version: v3.200.0
  • Pulumi Azure Provider: v6.26.0 (@pulumi/azure)
  • Underlying Terraform Provider: azurerm (version wrapped by Pulumi Azure 6.26.0)
  • Azure Subscription: Production environments (multiple subscriptions tested)
  • Affected Resource: azure.pim.RoleManagementPolicy (Pulumi) / azurerm_role_management_policy (Terraform)

The Problem Unveiled

The main issue is that when you try to update an existing azurerm_role_management_policy resource—for instance, changing the maximumDuration from PT1H to PT2H—the update fails with the following error:

error: sdk-v2/provider2.go:572: sdk.helper_schema: updating Scoped Role Management Policy (Scope: "/subscriptions/{subscription-id}"
Role Management Policy Name: "{role-definition-id}"): unexpected status 400 (400 Bad Request) with error: InvalidPolicy: The policy is invalid.

This issue isn't isolated to a single role; it affects various Azure built-in roles, including:

  • Contributor (b24988ac-6180-42a0-ab88-20f7382dd24c)
  • AKS RBAC Admin (3498e952-d568-435e-9b2c-8d77e338d7f7)
  • Role Based Access Control Administrator (f58310d9-a9f6-439a-9e8d-f62e7b41a168)
  • Key Vault Secrets User (4633458b-17de-408a-b874-0445c86b69e6)
  • Key Vault Secrets Officer (b86a8fe4-44ce-4948-aee5-eccb2c155cd7)

Root Cause Analysis: Why the Updates Fail

Okay, so why is this happening? The root cause lies in how Terraform/Pulumi providers abstract Azure's Role Management Policy API. The provider exposes a simplified version that includes:

  • activation_rules
  • eligible_assignment_rules
  • notification_rules (partial)

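However, here's the catch: a policy in the Azure Role Management Policy API is made up of 17 distinct rules, spanning eligible assignments, active assignments, and end-user activation, and the API expects the complete set on update. This is where things get tricky.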

The Complete Azure API Rule Structure

To give you a clearer picture, here are the 17 rules that Azure's API expects:

  1. Enablement_Admin_Eligibility (RoleManagementPolicyEnablementRule)
  2. Expiration_Admin_Eligibility (RoleManagementPolicyExpirationRule)
  3. Notification_Admin_Admin_Eligibility (RoleManagementPolicyNotificationRule)
  4. Notification_Requestor_Admin_Eligibility (RoleManagementPolicyNotificationRule)
  5. Notification_Approver_Admin_Eligibility (RoleManagementPolicyNotificationRule)
  6. Enablement_Admin_Assignment (RoleManagementPolicyEnablementRule)
  7. Expiration_Admin_Assignment (RoleManagementPolicyExpirationRule)
  8. Notification_Admin_Admin_Assignment (RoleManagementPolicyNotificationRule)
  9. Notification_Requestor_Admin_Assignment (RoleManagementPolicyNotificationRule)
  10. Notification_Approver_Admin_Assignment (RoleManagementPolicyNotificationRule)
  11. Approval_EndUser_Assignment (RoleManagementPolicyApprovalRule)
  12. AuthenticationContext_EndUser_Assignment (RoleManagementPolicyAuthenticationContextRule)
  13. Enablement_EndUser_Assignment (RoleManagementPolicyEnablementRule)
  14. Expiration_EndUser_Assignment (RoleManagementPolicyExpirationRule)
  15. Notification_Admin_EndUser_Assignment (RoleManagementPolicyNotificationRule)
  16. Notification_Requestor_EndUser_Assignment (RoleManagementPolicyNotificationRule)
  17. Notification_Approver_EndUser_Assignment (RoleManagementPolicyNotificationRule)

When the Terraform/Pulumi provider attempts to update a policy, it sends only the subset of rules defined in its schema. Azure, expecting all 17 rules, rejects the update as invalid. Think of it like ordering a pizza with only a few toppings when the pizzeria expects the whole shebang!
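To make the gap concrete, here is a minimal TypeScript sketch that compares a partial payload against the 17 rule IDs listed above. The helper name `missingRuleIds` and the sample subset are made up for illustration; the bug report doesn't say exactly which rules the provider sends.

```typescript
// The 17 rule IDs from the list above.
const EXPECTED_RULE_IDS: readonly string[] = [
  "Enablement_Admin_Eligibility",
  "Expiration_Admin_Eligibility",
  "Notification_Admin_Admin_Eligibility",
  "Notification_Requestor_Admin_Eligibility",
  "Notification_Approver_Admin_Eligibility",
  "Enablement_Admin_Assignment",
  "Expiration_Admin_Assignment",
  "Notification_Admin_Admin_Assignment",
  "Notification_Requestor_Admin_Assignment",
  "Notification_Approver_Admin_Assignment",
  "Approval_EndUser_Assignment",
  "AuthenticationContext_EndUser_Assignment",
  "Enablement_EndUser_Assignment",
  "Expiration_EndUser_Assignment",
  "Notification_Admin_EndUser_Assignment",
  "Notification_Requestor_EndUser_Assignment",
  "Notification_Approver_EndUser_Assignment",
];

// Hypothetical helper: returns the expected rule IDs that a payload leaves out.
function missingRuleIds(sentRuleIds: readonly string[]): string[] {
  return EXPECTED_RULE_IDS.filter((id) => sentRuleIds.indexOf(id) === -1);
}

// Illustrative subset only (an assumption, not the provider's real payload).
const partialPayloadIds: string[] = [
  "Expiration_Admin_Eligibility",
  "Expiration_EndUser_Assignment",
  "Approval_EndUser_Assignment",
  "Notification_Approver_EndUser_Assignment",
];

const missingFromPayload = missingRuleIds(partialPayloadIds);
console.log(`${missingFromPayload.length} of ${EXPECTED_RULE_IDS.length} rules missing:`);
console.log(missingFromPayload.join("\n"));
```

A check like this could run against the output of a GET on the policy to confirm that a payload you build by hand carries every rule before you send it.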

Reproducing the Issue: A Step-by-Step Guide

To illustrate the problem, let's look at a Pulumi/TypeScript code snippet that triggers this issue:

Pulumi/TypeScript Code

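import * as pulumi from '@pulumi/pulumi';
import * as azure from '@pulumi/azure';

// Assumption: both IDs come from stack config; the key names are placeholders.
const config = new pulumi.Config();
const subscriptionId = config.require('subscriptionId');
const approverGroupId = config.require('approverGroupId');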

new azure.pim.RoleManagementPolicy(
  'contributor-pim-policy',
  {
    scope: `/subscriptions/${subscriptionId}`,
    roleDefinitionId: '/providers/Microsoft.Authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c',
    eligibleAssignmentRules: {
      expirationRequired: false,
    },
    activationRules: {
      maximumDuration: 'PT2H', // Attempting to change from PT1H to PT2H
      requireApproval: true,
      approvalStage: {
        primaryApprovers: [
          {
            objectId: approverGroupId,
            type: 'Group',
          },
        ],
      },
    },
    notificationRules: {
      activeAssignments: {
        approverNotifications: {
          notificationLevel: 'Critical',
          defaultRecipients: true,
        },
      },
    },
  }
);

Expected vs. Actual Behavior

The expected behavior is that the policy should update successfully with the new maximumDuration value. However, the actual behavior is that the update fails with the InvalidPolicy: The policy is invalid error. Bummer!

Workaround: A Manual Fix

Fear not! There's a workaround to bypass this issue. You can update the policy directly via Azure CLI/REST API with the complete 17-rule structure. Here's how:

Azure CLI/REST API Workaround

SUBSCRIPTION="subscription-id"
ROLE_ID="role-definition-id"

# Fetch current policy with all 17 rules
az rest --method GET \
  --uri "/subscriptions/$SUBSCRIPTION/providers/Microsoft.Authorization/roleManagementPolicies?api-version=2020-10-01" \
  | jq ".value[] | select(.name == \"$ROLE_ID\")" > policy.json

# Update the specific rule
jq '.properties.rules |= map(
  if .id == "Expiration_EndUser_Assignment"
  then .maximumDuration = "PT2H"
  else .
  end
)' policy.json > policy_updated.json

# Apply update with complete rule structure
az rest --method PATCH \
  --uri "/subscriptions/$SUBSCRIPTION/providers/Microsoft.Authorization/roleManagementPolicies/$ROLE_ID?api-version=2020-10-01" \
  --body "$(jq '{properties: {rules: .properties.rules}}' policy_updated.json)"

This workaround proves a few crucial points:

  1. The update operation itself is valid.
  2. Azure accepts PT2H as a valid duration.
  3. The issue stems from the incomplete rule structure sent by the provider.
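If you'd rather script the workaround in TypeScript than in jq, the same three steps (select the policy, change one rule, build the PATCH body) can be sketched as pure functions. The interface and function names are hypothetical, and the sample response is a trimmed, made-up stand-in for the real one with all 17 rules.

```typescript
// Minimal shapes for the parts of the REST response used here.
// Any other fields on a rule pass through unchanged.
interface PimRule {
  id: string;
  ruleType: string;
  maximumDuration?: string;
  [key: string]: unknown;
}

interface PimPolicy {
  name: string;
  properties: { rules: PimRule[] };
}

// Step 1: pick one policy out of the list response, like the jq select().
function selectPolicy(list: { value: PimPolicy[] }, policyName: string): PimPolicy {
  const match = list.value.filter((p) => p.name === policyName)[0];
  if (!match) {
    throw new Error(`policy ${policyName} not found`);
  }
  return match;
}

// Step 2: change maximumDuration on one rule and keep every other rule as-is.
function setMaximumDuration(rules: PimRule[], ruleId: string, duration: string): PimRule[] {
  return rules.map((rule) =>
    rule.id === ruleId ? { ...rule, maximumDuration: duration } : rule
  );
}

// Step 3: wrap the complete rule list in the PATCH body shape.
function buildPatchBody(rules: PimRule[]): { properties: { rules: PimRule[] } } {
  return { properties: { rules } };
}

// Made-up sample data; a real response would carry all 17 rules.
const sampleListResponse: { value: PimPolicy[] } = {
  value: [
    {
      name: "example-policy-name",
      properties: {
        rules: [
          {
            id: "Expiration_EndUser_Assignment",
            ruleType: "RoleManagementPolicyExpirationRule",
            maximumDuration: "PT1H",
          },
          {
            id: "Enablement_EndUser_Assignment",
            ruleType: "RoleManagementPolicyEnablementRule",
          },
        ],
      },
    },
  ],
};

const currentPolicy = selectPolicy(sampleListResponse, "example-policy-name");
const patchBody = buildPatchBody(
  setMaximumDuration(currentPolicy.properties.rules, "Expiration_EndUser_Assignment", "PT2H")
);
console.log(JSON.stringify(patchBody, null, 2));
```

The functions never mutate the fetched policy, so you can diff the original against the patch body before sending anything to Azure.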

Impact: The Ripple Effects

This issue has several significant impacts:

  • Users can't update PIM policies via Terraform/Pulumi for duration changes or other modifications.
  • Manual Azure CLI operations are required, which defeats the purpose of infrastructure-as-code.
  • State drift occurs between Terraform state and the actual Azure configuration when policies are manually updated.
  • Multiple production environments are affected across different Azure regions.

Suggested Fix: A Path Forward

So, how can we fix this? The provider should:

  1. Fetch the complete existing policy before performing updates (similar to a READ operation).
  2. Merge user-specified changes with the complete rule structure.
  3. Send all 17 rules in the PATCH request to Azure.

Alternatively, the provider could expose all rule types in its schema, allowing users to manage the complete policy explicitly.

Either approach mirrors how the Azure Portal and Azure CLI handle policy updates: they always work with the complete rule structure.
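The fetch, merge, and send steps from the first option could look roughly like the sketch below. This is not the provider's actual code (the provider is written in Go); `mergeRules` and the sample rules are hypothetical and only show the idea of overlaying the rules a user manages onto the complete set Azure returns.

```typescript
// A rule is identified by its id; all other fields are carried along untouched.
interface MergeableRule {
  id: string;
  [key: string]: unknown;
}

// Hypothetical merge step: start from the complete rule set fetched from Azure
// and overlay the rules the provider manages, matched by id.
function mergeRules(existing: MergeableRule[], managed: MergeableRule[]): MergeableRule[] {
  const merged = existing.map((rule) => {
    const override = managed.filter((m) => m.id === rule.id)[0];
    // Fields in the override win; fields it omits keep their fetched values.
    return override ? { ...rule, ...override } : rule;
  });
  // Managed rules that Azure did not return are appended rather than dropped.
  const knownIds = existing.map((rule) => rule.id);
  const added = managed.filter((m) => knownIds.indexOf(m.id) === -1);
  return merged.concat(added);
}

// Made-up stand-in for the fetched policy; a real one would have 17 rules.
const fetchedRules: MergeableRule[] = [
  { id: "Expiration_EndUser_Assignment", isExpirationRequired: true, maximumDuration: "PT1H" },
  { id: "Enablement_EndUser_Assignment", enabledRules: ["Justification"] },
  { id: "Notification_Admin_EndUser_Assignment", notificationLevel: "All" },
];

// The only change the user asked for.
const managedRules: MergeableRule[] = [
  { id: "Expiration_EndUser_Assignment", maximumDuration: "PT2H" },
];

const mergedRules = mergeRules(fetchedRules, managedRules);
console.log(JSON.stringify(mergedRules, null, 2));
```

Because the merge starts from what Azure returned, the rules the user never mentions go back in the request unchanged, which is the property the bug report says the current update path lacks.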

Related Issues: A Web of Connections

It's worth noting that this issue isn't entirely isolated. It's connected to other related problems, such as:

  • #26377 - Constant drift with approval_stage (related but different issue)
  • #26481 - Timeout issues during policy creation (resolved)

Where to File the Bug: Spreading the Word

This bug should primarily be filed in the hashicorp/terraform-provider-azurerm repository, as it's an issue with the underlying Terraform provider that Pulumi wraps. However, a corresponding issue could also be filed in pulumi/pulumi-azure for visibility.

Recommended Repositories:

  1. Primary: https://github.com/hashicorp/terraform-provider-azurerm/issues (Terraform provider)
  2. Secondary: https://github.com/pulumi/pulumi-azure/issues (Pulumi wrapper)

Additional Context: The Devil's in the Details

To provide further context, here are some additional details:

Environment Details:

  • Operating System: macOS (Darwin 24.6.0)
  • Pulumi CLI: v3.200.0
  • Node.js/TypeScript project
  • Multiple Azure production subscriptions tested (US, EU, AP regions)

Conclusion: Wrapping It Up

So, there you have it, guys! We've dissected the InvalidPolicy error when updating azurerm_role_management_policy, uncovered the root cause, provided a workaround, and suggested a fix. This issue highlights the importance of understanding the underlying API requirements and how providers abstract them. By working with the complete rule structure, we can ensure smooth and reliable updates to our PIM policies. Keep an eye on the related issue trackers, and let's hope for a permanent fix soon! Happy coding, and stay tuned for more deep dives into the world of IaC and Azure!