Introduction
The single biggest fear in MSP-delivered DMARC rollouts is breaking legitimate client mail at the moment of enforcement. The fear is reasonable; the prevention is well-understood. This article covers the discipline that prevents incidents.
Why this topic matters
A broken rollout is the worst MSP outcome — client mail bouncing, sales calls coming in, account at risk. The good news: every incident traces to a skipped step in the playbook. Following the playbook reliably means uneventful rollouts.
The five safety checks before enforcement
- Sender inventory complete. Every row in aggregate reports attributed. No unknowns.
- All legitimate senders at ≥99% pass. For at least two consecutive weeks.
- Subdomain policy (sp=) deliberately set. Not just inherited.
- pct= used for the ramp. No jump to p=reject pct=100 from quarantine without staging.
- Internal stakeholders notified. Marketing, sales, IT operations get a heads-up the day before the move.
Skip any of these and the risk goes up. Hit all five and the move is uneventful.
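Two of the five checks (explicit sp= and a staged pct=) can be read straight off the DMARC record. The following is a minimal sketch of that record-level check, assuming records are available as plain TXT strings; the function names `parse_dmarc_record` and `check_move` are hypothetical, and the other three checks still need report data and human sign-off.

```python
def parse_dmarc_record(txt):
    """Split a DMARC TXT record into a lowercase tag -> value mapping."""
    tags = {}
    for part in txt.split(";"):
        name, sep, value = part.partition("=")
        if sep:  # ignore empty fragments such as a trailing ';'
            tags[name.strip().lower()] = value.strip()
    return tags


def check_move(current_txt, proposed_txt):
    """Return a list of problems with a proposed record (empty list = green).

    Hypothetical helper: covers only the sp= and pct= checks.
    """
    current = parse_dmarc_record(current_txt)
    proposed = parse_dmarc_record(proposed_txt)
    problems = []

    # Check: subdomain policy must be set on purpose, not inherited from p=.
    if "sp" not in proposed:
        problems.append("sp= not set explicitly; subdomains would inherit p=")

    # Check: a change of policy level must not land at pct=100 in one step.
    pct = int(proposed.get("pct", "100"))  # pct defaults to 100 when absent
    old_policy = current.get("p", "none").lower()
    new_policy = proposed.get("p", "none").lower()
    if old_policy != new_policy and pct == 100:
        problems.append("policy level changes at pct=100; stage it with a lower pct first")

    return problems


current = "v=DMARC1; p=quarantine; sp=quarantine; pct=100; rua=mailto:dmarc@example.com"
proposed = "v=DMARC1; p=reject; sp=reject; pct=10; rua=mailto:dmarc@example.com"
print(check_move(current, proposed))  # an empty list means both checks are green
```

Raising pct on an unchanged policy (for example p=reject from pct=50 to pct=100) is not flagged, since that is the final step of a ramp rather than a jump.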
What actually breaks rollouts
Three failure modes:
- Surprise senders. A SaaS marketing tool added by marketing without IT review fails alignment at enforcement. Solution: weekly new-sender review during monitoring.
- Forwarding/mailing-list breakage. Some legitimate mail is forwarded; forwarding breaks SPF, and mailing lists that rewrite the subject or body break DKIM. Solution: identify these senders in monitoring and either remediate or accept.
- DKIM key issues at the wrong moment. Key expired, rotation incomplete, public key not propagated. Solution: validate DKIM before each policy move.
Each is preventable; none is exotic.
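The weekly new-sender review can be partly automated from the aggregate (rua) reports. The sketch below assumes the report XML follows the RFC 7489 aggregate layout without an XML namespace, and that the sender inventory is a simple mapping of source IP to sender name; `summarize_report`, `review`, and the sample data are hypothetical. It flags source IPs that are missing from the inventory and known senders below a 99% pass rate.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SAMPLE = """<feedback>
<record><row><source_ip>192.0.2.10</source_ip><count>200</count>
<policy_evaluated><dkim>pass</dkim><spf>pass</spf></policy_evaluated></row></record>
<record><row><source_ip>203.0.113.5</source_ip><count>98</count>
<policy_evaluated><dkim>pass</dkim><spf>fail</spf></policy_evaluated></row></record>
<record><row><source_ip>203.0.113.5</source_ip><count>2</count>
<policy_evaluated><dkim>fail</dkim><spf>fail</spf></policy_evaluated></row></record>
<record><row><source_ip>198.51.100.7</source_ip><count>5</count>
<policy_evaluated><dkim>fail</dkim><spf>fail</spf></policy_evaluated></row></record>
</feedback>"""


def summarize_report(xml_text):
    """Sum message counts per source IP: total and DMARC-passing."""
    stats = defaultdict(lambda: {"total": 0, "passed": 0})
    for row in ET.fromstring(xml_text).iter("row"):
        ip = row.findtext("source_ip")
        count = int(row.findtext("count", default="0"))
        dkim = row.findtext("policy_evaluated/dkim")
        spf = row.findtext("policy_evaluated/spf")
        stats[ip]["total"] += count
        # DMARC passes when either aligned DKIM or aligned SPF passes.
        if "pass" in (dkim, spf):
            stats[ip]["passed"] += count
    return dict(stats)


def review(xml_text, inventory, threshold=0.99):
    """Return (unknown IPs, known IPs below the pass-rate threshold)."""
    unknown, below = [], []
    for ip, s in summarize_report(xml_text).items():
        if ip not in inventory:
            unknown.append(ip)
        elif s["total"] and s["passed"] / s["total"] < threshold:
            below.append(ip)
    return sorted(unknown), sorted(below)


# Hypothetical inventory: source IP -> attributed sender.
inventory = {"192.0.2.10": "primary mail platform", "203.0.113.5": "marketing tool"}
unknown, below = review(SAMPLE, inventory)
print("unattributed:", unknown)
print("below threshold:", below)
```

A single report covers only one reporting period, so the two-consecutive-weeks requirement would mean running this across every report in that window.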
Step-by-step approach for safe enforcement
- Run the five-check audit 7 days before the planned move.
- Address any failures. No move until each check is green.
- Communicate internally to the client. A one-paragraph email to stakeholders.
- Move to next policy level with pct= ramp. pct=10, then 25, 50, 100.
- Watch reports daily for the first week. Any new failures surface fast.
- Have a rollback plan. A 24-72 hour return to the previous policy while you remediate is acceptable; longer means the audit was insufficient.
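The ramp and its rollback can be written down ahead of time so nobody improvises a record during an incident. This is a minimal sketch, assuming the 10/25/50/100 sequence above and a single rua mailbox; `dmarc_record`, `ramp_plan`, `step_back`, and the example address are hypothetical.

```python
RAMP = (10, 25, 50, 100)  # pct steps from the approach above


def dmarc_record(policy, pct, rua, subdomain_policy=None):
    """Build a DMARC TXT value with sp= always written out explicitly."""
    sp = subdomain_policy or policy
    return f"v=DMARC1; p={policy}; sp={sp}; pct={pct}; rua=mailto:{rua}"


def ramp_plan(policy, rua, subdomain_policy=None):
    """Return the records to publish, in order, for one policy level."""
    return [dmarc_record(policy, pct, rua, subdomain_policy) for pct in RAMP]


def step_back(current_pct):
    """Return the previous pct level, or None when already at the first step.

    None means the rollback target is the previous policy level, not a lower pct.
    """
    index = RAMP.index(current_pct)
    return RAMP[index - 1] if index > 0 else None


for record in ramp_plan("quarantine", "dmarc@example.com"):
    print(record)
print("rollback from pct=50 goes to pct:", step_back(50))
```

Keeping the plan as data makes the rollback target for any step unambiguous: the previous entry in the list, or the prior policy level if the first step fails.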
Best practices
- Don't rush the calendar to satisfy clients. Better one week late than one incident.
- Document the safety checks in the runbook. Make it the standard process, not vibes.
- Pair every move with monitoring. Weekly review minimum during ramps.
- Use pct= for everything except the final lock-in. Belt and braces.
- Train the team on rollback. When to step back, when to push through.
Recommended next step
For every client on the cusp of p=quarantine or p=reject, run the five-check audit this week. If any check fails, that's your remediation priority.
FAQ
How long should I wait between policy moves?
7-14 days at each step. Long enough to see report patterns, short enough to keep momentum.
What if I see a new sender mid-ramp?
Step back to the previous pct= level, authenticate the new sender, re-advance.
What's an acceptable rollback duration?
24-72 hours for remediation. Longer means the audit was insufficient.
Should I tell the client about rollbacks?
Yes — proactive communication builds trust. Surprises don't.
How do I prevent surprise senders going forward?
Monitoring alerts. A new source IP appearing in aggregate reports should trigger an immediate Slack or email alert.
Final thoughts
The fear of breaking client email is reasonable; the prevention is well-defined. Five safety checks before every move, monitoring during, rollback plan ready. Done by the book, rollouts are uneventful.
The MSPs that build this discipline ship more rollouts than the ones that don't.