Introduction
The day-to-day work of an MSP DMARC practice is mostly small changes — a new SaaS sender added to a client's SPF, a DKIM key rotated on schedule, a policy nudged from pct=50 to pct=75, a new sub-domain delegated and configured. Each individual change is low-risk; the aggregate of dozens of changes per week across a portfolio of clients is where the real risk lives. The wrong change to the wrong client at the wrong moment can bounce mail for a day; a pattern of unstructured changes across many clients can erode the operational discipline that justifies the recurring engagement.
This article is the change-management companion to the DMARC for MSPs pillar — how to write a change-control policy that protects the portfolio, the approval gates that matter, the rollback windows that keep mistakes small, and the audit trail that holds up under scrutiny.
Why this topic matters
Most MSP practices start without a formal change-management policy. The first client is one engineer's responsibility; that engineer makes changes thoughtfully; nothing goes wrong; the pattern looks like it scales. Then the practice grows to ten engineers and fifty clients, and the same loose discipline produces a steady stream of small incidents — the wrong selector rotated, the wrong policy escalated, an SPF change pushed without notifying the client.
A formal change-management policy is the antidote. Not bureaucracy for its own sake — concrete gates that prevent the specific failure modes that the loose-discipline model produces. The policy pays for itself the first time it catches a mistake that would have escalated.
What counts as a change
The first decision is scope. Too narrow and the policy ignores risky changes; too broad and routine work becomes unworkable. The shape that holds up:
In scope (change-controlled):
- Any modification to a _dmarc TXT record (policy progression, RUA changes, format changes).
- DKIM key rotations (publishing a new selector, switching the active selector, retiring an old selector).
- SPF record changes (adding or removing includes, switching SPF flattening provider).
- BIMI record publication or modification.
- MTA-STS or TLS-RPT record changes.
- Tenant-level platform changes that affect client DNS behavior.
Out of scope (no change control needed):
- Reading aggregate reports.
- Running diagnostic checks.
- Internal MSP-side configuration that doesn't touch client DNS or platform tenant.
- Documentation updates and runbook edits.
- Client communication that doesn't accompany a technical change.
The line is "does this change something the client's mail receivers will observe?" If yes, change control. If no, routine work.
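The scope test can be sketched as a lookup on the DNS record being edited. This is a minimal sketch, not a complete policy engine: the function name `is_change_controlled` is hypothetical, and it covers only the DNS items on the in-scope list (tenant-level platform changes would need a separate check). The label conventions it relies on (`_dmarc`, `_domainkey`, `_bimi`, `_mta-sts`, `_smtp._tls`, and a TXT value starting with `v=spf1`) are the standard names for those protocols.

```python
def is_change_controlled(name: str, rtype: str, value: str = "") -> bool:
    """Return True if editing this DNS record falls under change control.

    Hypothetical helper: covers only the DNS items from the in-scope list.
    """
    labels = name.lower().rstrip(".").split(".")
    rtype = rtype.upper()
    if rtype not in {"TXT", "CNAME"}:
        return False
    # DMARC policy record: _dmarc.<domain>
    if labels[0] == "_dmarc":
        return True
    # DKIM selector: <selector>._domainkey.<domain>
    if "_domainkey" in labels:
        return True
    # BIMI: <selector>._bimi.<domain>
    if "_bimi" in labels:
        return True
    # MTA-STS indicator record and TLS-RPT record
    if labels[0] == "_mta-sts" or labels[:2] == ["_smtp", "_tls"]:
        return True
    # SPF lives in an ordinary TXT record whose value starts with v=spf1
    if rtype == "TXT" and value.strip().lower().startswith("v=spf1"):
        return True
    return False
```

A check like this could sit in front of whatever tooling pushes DNS edits, so that an in-scope edit without a change record is flagged before it is made.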
Approval gates by change type
Not every change needs the same level of review. The pattern that holds up:
Low-risk changes (analyst self-approve, log only)
- Adding a new sender to SPF when the sender is on the pre-approved list and the lookup budget has headroom.
- Routine DKIM rotation that follows the documented annual schedule.
- New RUA destination for a client switching DMARC platforms (after the MSA-level conversation has happened).
- Tenant-level platform setting changes that don't affect DNS.
The analyst makes the change, logs it in the change record, moves on. No peer review, no client approval-in-the-moment.
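The "lookup budget has headroom" condition refers to the SPF limit of 10 DNS-querying terms per evaluation (RFC 7208, section 4.6.4). A rough sketch of a pre-check is below. It counts only the terms in the top-level record; lookups hidden inside nested includes are passed in as `nested_lookups`, which the analyst would have to obtain from a resolver or the DMARC platform. The function names are illustrative.

```python
import re

# Terms that trigger a DNS query during SPF evaluation (RFC 7208, 4.6.4).
LOOKUP_TERMS = ("include", "a", "mx", "ptr", "exists", "redirect")
SPF_LOOKUP_LIMIT = 10


def count_top_level_lookups(spf: str) -> int:
    """Count DNS-querying terms in the top-level SPF record only."""
    count = 0
    for term in spf.split()[1:]:  # skip the leading "v=spf1"
        term = term.lstrip("+-~?").lower()  # drop any qualifier
        # The term name ends at ':', '/', or '=' (as in redirect=...).
        name = re.split(r"[:/=]", term, maxsplit=1)[0]
        if name in LOOKUP_TERMS:
            count += 1
    return count


def has_headroom(spf: str, nested_lookups: int = 0, needed: int = 1) -> bool:
    """True if adding `needed` more lookups stays within the SPF limit."""
    used = count_top_level_lookups(spf) + nested_lookups
    return used + needed <= SPF_LOOKUP_LIMIT
```

For example, a record with two includes and an `mx` term uses three top-level lookups; if the nested includes account for five more, one additional sender still fits.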
Medium-risk changes (peer review before execution)
- Policy progression to p=quarantine for any client.
- Policy progression from pct=50 to pct=100 inside p=quarantine.
- DKIM key length changes (1024 → 2048).
- SPF flattening that consolidates includes.
- New sender outside the pre-approved list.
The analyst proposes the change; a second team member reviews; both sign off; the change is executed during the change window. Peer review is light — a 10-minute review, not a formal CAB — but two pairs of eyes catch most mistakes.
High-risk changes (client approval + peer review)
- Policy escalation to p=reject.
- Removing senders from SPF (potentially affects legitimate mail).
- Domain-level RUA or SPF changes for tenant clients in a regulated industry.
- Anything that affects a client domain during a documented change-freeze window (their fiscal year end, their busy season).
The change requires written client approval (an emailed acknowledgment is usually enough) plus internal peer review. The client gets a rollback window; the analyst stays available during the window in case anything goes wrong.
Emergency changes (post-hoc approval, log immediately)
Sometimes mail is breaking now and the policy needs to be rolled back without waiting for the standard approval cycle. The exception is allowed; the discipline is that the change record is filed immediately and the post-hoc review happens within 24 hours. Emergency changes that aren't followed by post-hoc review create the gap where loose discipline regrows.
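The four gates above can be encoded as data so that tooling, not memory, decides whether a change may proceed. The following is a sketch under assumptions: the `Risk`, `Gate`, and `may_execute` names are hypothetical, and how approvals are actually captured (PSA ticket fields, emails) is left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    EMERGENCY = "emergency"


@dataclass(frozen=True)
class Gate:
    peer_review: bool
    client_approval: bool
    post_hoc_review_hours: Optional[int]  # set only for emergency changes


GATES = {
    Risk.LOW: Gate(False, False, None),       # self-approve, log only
    Risk.MEDIUM: Gate(True, False, None),     # peer review before execution
    Risk.HIGH: Gate(True, True, None),        # client approval + peer review
    Risk.EMERGENCY: Gate(False, False, 24),   # execute now, review within 24h
}


def may_execute(risk: Risk, proposer: str,
                reviewer: Optional[str] = None,
                client_approved: bool = False) -> bool:
    """Check whether the approvals on hand satisfy the gate for this risk."""
    gate = GATES[risk]
    # A reviewer must exist and must not be the proposer.
    if gate.peer_review and (reviewer is None or reviewer == proposer):
        return False
    if gate.client_approval and not client_approved:
        return False
    return True
```

Keeping the gates in one table makes the quarterly policy review a matter of editing data rather than hunting through scripts.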
Change windows and rollback windows
Two distinct concepts:
Change window. The time of day or week when changes are executed. Most MSPs settle on a 2-3 hour weekday morning window (analyst available, client team available, time to react if something goes wrong) and avoid Fridays and weekends. Emergency changes ignore the window; planned changes respect it.
Rollback window. The time after a change during which the analyst stays available to revert if something is breaking. For medium-risk changes, 4-8 hours is typical. For high-risk changes, 24 hours. For policy progression to p=reject, the rollback window is sometimes extended to 72 hours because spoof attempts cluster in bursts and you want one full business cycle to observe.
Pre-document both windows per client. Some clients have specific business cycles (the marketing team sends bulk mail every Tuesday at 9am, for instance) where the window has to shift. The change-window calendar lives in the client runbook.
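As an illustration, the end of a rollback window can be computed from the execution time and the risk level. This sketch assumes the upper end of the 4-8 hour range for medium-risk changes and treats the 72-hour p=reject window as the default for that case; the names and the `override_hours` parameter (for client-specific windows) are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed defaults drawn from the ranges described above.
ROLLBACK_HOURS = {"medium": 8, "high": 24}
REJECT_ROLLBACK_HOURS = 72


def rollback_window_end(executed_at: datetime, risk: str,
                        to_reject: bool = False,
                        override_hours: Optional[int] = None) -> datetime:
    """Return the UTC time until which the analyst stays available."""
    if executed_at.tzinfo is None:
        raise ValueError("executed_at must be timezone-aware")
    if override_hours is not None:
        hours = override_hours          # client-specific window from the runbook
    elif to_reject:
        hours = REJECT_ROLLBACK_HOURS   # escalation to p=reject
    else:
        hours = ROLLBACK_HOURS[risk]
    return executed_at.astimezone(timezone.utc) + timedelta(hours=hours)
```

Low-risk changes are deliberately absent from the table because no default is stated for them; a practice would add its own value.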
Change calendars per client
Each client gets a change calendar that lives in the runbook. The shape:
- Standing change windows. When are normal changes allowed? Default is Tuesday-Thursday mornings; client-specific exceptions documented.
- Change-freeze periods. Fiscal year-end, the client's "do not touch DNS" months (sometimes the financial-services holiday season, sometimes the retail Black Friday window), planned platform migrations on the client's side.
- Scheduled changes. Annual DKIM rotation, planned policy escalations, recurring sub-domain delegations. Each on the calendar with the planned date.
- Recently completed changes. A 60-day rolling log of what's been done. Useful for QBR narrative; useful for incident postmortem ("when did we last touch this domain?").
The calendar is a living artefact maintained by the analyst owning the client. The change record (audit trail) is downstream of the calendar — the calendar plans the work; the change record documents what actually happened.
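A per-client calendar of this shape is simple to check mechanically. The sketch below assumes a Tuesday-Thursday 09:00-12:00 UTC default window and inclusive freeze date ranges; the `ChangeCalendar` class and its field names are hypothetical, and scheduled and completed changes are omitted for brevity.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, time, timezone


@dataclass
class ChangeCalendar:
    # Monday is 0, so {1, 2, 3} means Tuesday through Thursday.
    allowed_weekdays: frozenset = frozenset({1, 2, 3})
    window_start: time = time(9, 0)    # assumed to be UTC
    window_end: time = time(12, 0)
    freezes: list = field(default_factory=list)  # (start, end) dates, inclusive

    def allows(self, when: datetime) -> bool:
        """True if a planned change may be executed at `when` (UTC)."""
        day = when.date()
        if any(start <= day <= end for start, end in self.freezes):
            return False  # inside a change-freeze period
        if when.weekday() not in self.allowed_weekdays:
            return False
        return self.window_start <= when.time() < self.window_end
```

Emergency changes would bypass this check by design; it applies to planned work only.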
The change-record template
Every change-controlled change produces a record. The template that holds up:
```
CHANGE RECORD #2026-1247

Client: clientco.com
Domain affected: clientco.com (root)
Record affected: _dmarc TXT
Change type: Policy progression
Risk level: Medium

Before: v=DMARC1; p=quarantine; pct=50; rua=mailto:reports@dmarc-platform…
After: v=DMARC1; p=quarantine; pct=100; rua=mailto:reports@dmarc-platform…

Rationale: Pass rate at pct=50 has been stable at 99.3% for the last 14 days across all known senders. Moving to pct=100 per the documented rollout phase 3 plan.

Approval:
  Analyst: J. Chen (proposing)
  Peer review: M. Hara (approved 2026-06-08 09:14 UTC)
  Client approval: not required (medium-risk)

Change window: 2026-06-09 09:00-12:00 UTC
Rollback window: 2026-06-09 09:00 to 2026-06-09 17:00 UTC

Executed: 2026-06-09 09:23 UTC by J. Chen
Verification: aggregate reports confirm propagation 2026-06-09 11:40 UTC

Outcome: No anomalies in 8-hour rollback window. Change accepted.
```
The template captures the before/after state, the rationale, the approval chain, the timing, and the outcome. Stored centrally — usually in the MSP's PSA, ITGlue, or a wiki — and searchable by client, by domain, by date, by analyst.
The discipline that makes the audit trail useful: never modify a change record after the outcome is logged. If a follow-up change is needed, file a new change record that references the original. This preserves the timeline for any later review.
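The "never modify, file a follow-up" rule can be modelled with an immutable record type. This is a sketch only: the `ChangeRecord` fields are a reduced, hypothetical subset of the template, the follow-up ID used in the usage note is made up, and a frozen dataclass guards only against in-process mutation. Real immutability depends on how the PSA or wiki stores records.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeRecord:
    record_id: str
    client: str
    before: str
    after: str
    rationale: str
    outcome: str
    follows_up: Optional[str] = None  # ID of the record this one follows


def file_follow_up(original: ChangeRecord, new_id: str, after: str,
                   rationale: str, outcome: str) -> ChangeRecord:
    """Create a new record that references the original instead of editing it."""
    return ChangeRecord(
        record_id=new_id,
        client=original.client,
        before=original.after,   # the follow-up starts from the prior end state
        after=after,
        rationale=rationale,
        outcome=outcome,
        follows_up=original.record_id,
    )
```

Assigning to a field of an existing record raises `dataclasses.FrozenInstanceError`, while `file_follow_up(record, "2026-1301", ...)` yields a second record whose `follows_up` points back to the first, which keeps the timeline intact.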
Ticket templates for routine changes
For frequently-recurring change types, pre-built ticket templates speed up the work and enforce the right approval chain. The ticket templates that earn their keep:
"New SaaS sender" ticket — for adding a marketing platform, CRM, transactional service to an existing SPF + DKIM setup. Pre-built fields: sender name, sender's published SPF include, DKIM selector to create, alignment requirements. Routes through low-risk approval.
"Policy progression" ticket — for moving a domain to the next policy phase. Pre-built fields: current policy, target policy, pass-rate evidence, planned date. Routes through medium-risk approval.
"DKIM rotation" ticket — for the annual rotation. Pre-built fields: client, domain, current selector, new selector name, planned date. Routes through low-risk approval if it follows the documented schedule; medium-risk if it's off-schedule.
"Emergency rollback" ticket — for when something is breaking. Pre-built fields: current state, target rollback state, why it's emergency, who was notified. Routes through emergency approval (post-hoc).
The templates aren't just paperwork — they encode the operational discipline. A junior analyst working from the template is operating with the same rigor as a senior one.
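One way to make the templates enforce the approval chain is to store the required fields and the routing together. The sketch below is illustrative: the template keys, field names, and `open_ticket` function are hypothetical and would map onto whatever the PSA's ticket forms actually call them.

```python
TEMPLATES = {
    "new_saas_sender": {
        "fields": ("sender_name", "spf_include", "dkim_selector", "alignment"),
        "route": "low",
    },
    "policy_progression": {
        "fields": ("current_policy", "target_policy", "pass_rate_evidence",
                   "planned_date"),
        "route": "medium",
    },
    "dkim_rotation": {
        "fields": ("client", "domain", "current_selector", "new_selector",
                   "planned_date"),
        "route": "low",
    },
    "emergency_rollback": {
        "fields": ("current_state", "target_state", "why_emergency",
                   "notified"),
        "route": "emergency",
    },
}


def open_ticket(kind: str, values: dict, on_schedule: bool = True) -> dict:
    """Validate required fields and attach the approval route."""
    template = TEMPLATES[kind]
    missing = [f for f in template["fields"] if not values.get(f)]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    route = template["route"]
    # An off-schedule DKIM rotation is bumped to medium-risk review.
    if kind == "dkim_rotation" and not on_schedule:
        route = "medium"
    return {"kind": kind, "route": route, **values}
```

Because the route comes from the template rather than from the analyst, a ticket cannot be quietly downgraded to a lighter approval path.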
Audit trail and compliance
For regulated-industry clients, the audit trail is a compliance artefact, not just an internal discipline. The properties that matter:
- Immutable record. Change records cannot be edited after closure.
- Searchable by client, domain, date, analyst. Audits need to pull specific records quickly.
- Approval chain documented. Who proposed, who reviewed, who approved.
- Justification documented. The rationale field is not optional.
- Timestamp accuracy. UTC, not local time. Real timestamps, not "approximately."
- Linkage to client communication. When client approval was required, the email or ticket where the client approved is referenced.
Many enterprise clients will audit the MSP's change-management discipline as part of their own SOC 2, ISO 27001, or industry-specific compliance work. A clean audit trail is the artefact that survives that audit; a sloppy one is the artefact that triggers a finding.
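Several of these properties can be checked automatically before a record is closed. The following sketch assumes records are available as dictionaries with hypothetical keys (`rationale`, `proposed_by`, `reviewed_by`, `client_approval_ref`, `risk`, `executed_at`); it reports findings rather than fixing anything.

```python
from datetime import datetime, timedelta, timezone


def audit_findings(record: dict) -> list:
    """Return a list of audit problems found in one change record."""
    findings = []
    if not record.get("rationale", "").strip():
        findings.append("rationale missing")
    if not record.get("proposed_by"):
        findings.append("proposer missing")
    risk = record.get("risk")
    if risk in ("medium", "high") and not record.get("reviewed_by"):
        findings.append("peer reviewer missing")
    if risk == "high" and not record.get("client_approval_ref"):
        findings.append("client approval reference missing")
    executed = record.get("executed_at")
    # Naive datetimes have utcoffset() of None, so they fail this check too.
    if not isinstance(executed, datetime) or executed.utcoffset() != timedelta(0):
        findings.append("executed_at is not a UTC timestamp")
    return findings
```

An empty list means the record passes these particular checks; it says nothing about immutability or searchability, which are properties of the storage system.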
Common change-management failure modes
- The "I'll just push it" change. An analyst makes a change without filing a ticket. It catches up with them when something goes wrong and there's no record.
- The "approval by silence" pattern. Client doesn't respond to an approval request; the analyst proceeds anyway. Creates ambiguity that benefits no one.
- The "this is routine" misclassification. A medium-risk change processed as low-risk because the analyst was confident. Confidence isn't a substitute for peer review.
- The "rollback window of zero" change. Change executed at 5pm Friday with no one watching. Even small changes need a rollback period.
- The "emergency that wasn't actually an emergency" pattern. Bypassing the standard approval cycle for convenience. Erodes the policy.
- No post-incident review. When a change does cause an issue, the team patches it but doesn't update the policy. The same incident recurs.
Step-by-step approach to building the policy
- Define the in-scope change types. Start with the list above; adjust to your practice.
- Map each type to a risk level. Low / medium / high / emergency. Be conservative — when in doubt, classify higher.
- Define approval chains per risk level. Self-approve / peer review / client approval.
- Define change and rollback windows per risk level. Standard windows plus client-specific overrides.
- Build the change-record template. Used by every change-controlled change.
- Build ticket templates for the top 5-10 recurring change types. Speeds up the work; enforces the discipline.
- Train the team. New analysts shadow change-controlled work for the first 30 days before executing changes independently.
- Run a quarterly review. What changes happened, where did the policy bend, what should change in the policy itself.
Best practices
- Make the policy short. A change-management policy people don't read isn't a policy.
- Templates over paperwork. Pre-built ticket templates reduce friction; ad-hoc paperwork breeds shortcuts.
- Peer review even for medium-risk. Costs 10 minutes; catches mistakes that cost hours.
- Document the rationale. "Because we agreed in the SOW" isn't a rationale; "pass rate at pct=50 has been stable for 14 days" is.
- Honor the change windows. No Friday-afternoon enforcement escalations.
- Audit the audit trail quarterly. Are change records being filed? Are they complete? Spot-check ten records at random.
Recommended next step
If your practice doesn't have a written change-management policy, draft one this week. Don't aim for the comprehensive version — aim for the version that captures the most common change types and the approval chain. Iterate over the first 90 days. By the end of the first quarter you'll have a policy that fits your practice; from there, refinement is incremental.
FAQ
How big does the practice need to be before formal change management matters?
It matters from day one, but the formalism scales with size. A one-engineer practice with three clients can run with light change records and informal review. A ten-engineer practice with fifty clients needs the formal structure. The transition usually happens around 15-20 clients or 4-5 engineers.
Can the analyst peer-review their own change?
No. The peer review is what catches the mistakes the analyst missed; a self-review preserves all the same blind spots.
What about urgent changes outside the change window?
Use the emergency change category: execute the change, log it immediately, and get post-hoc approval within 24 hours. The exception exists; the discipline is that exceptions are documented and reviewed.
How long should change records be retained?
Indefinitely for compliance-sensitive clients; 3-7 years for everyone else, depending on the MSA. Retain longer than you think you need; the cost of storage is trivial; the cost of not having a record when you need it is large.
Does the client see the change records?
The client sees their own change records on request. Internal records from the MSP side (peer review comments, analyst notes) may be filtered. The MSA should specify.
What if a client refuses to use change management?
Some clients want changes faster than the discipline allows. The right answer is to educate the client on why the discipline matters — the rollback window is for their protection, not the MSP's. If the client persists, document that the engagement is operating outside standard change management at the client's direction.
How does this work for clients on the self-serve tier?
Self-serve clients change their own records; the MSP isn't running change management for them. The MSP's responsibility is the monitoring and the recommendations; the client owns the changes.
Final thoughts
Change management isn't bureaucracy for its own sake. It's the discipline that keeps the operational pace from outrunning the operational quality. Small change records, fast peer review, honest rollback windows — done consistently, they produce a practice that scales without producing incidents.
The MSP practices that survive their fiftieth client are the ones that wrote the policy at their fifteenth.