The Hidden Cost of Free TLS Everywhere
Certificates are free now. Operating them isn’t
TLS (Transport Layer Security) is the protocol that authenticates endpoints and encrypts traffic between them. It is the reason the browser padlock can mean “this is the site you meant to reach” and “someone on the network can’t read or casually modify this page.”
The promise: “free TLS” and the HTTPS-default internet
A decade ago, HTTPS was still treated like an upgrade, something you did for logins, payments, or “serious” apps. Everywhere else, plain HTTP was common, and the web mostly tolerated it.
Then “free certificates” became real. Let’s Encrypt, ACME clients, and better defaults turned issuance into a near-zero-cost, mostly-automated flow, and browsers followed by nudging (and later pushing) users away from HTTP with “Not Secure” UI, stricter cookie rules, and an ecosystem assumption that HTTPS is the baseline.
This was a genuine win. More encryption reduced passive surveillance, cut down trivial on-path tampering, and made “secure by default” feel normal.
But the big shift was not that TLS became free.
It is that TLS became mandatory.
Certificates are free; the lifecycle isn’t
Issuing a cert is the easy part. Operating certs is the hard part.
Every certificate has a lifecycle:
- Domain validation challenges must succeed (HTTP-01, DNS-01, or TLS-ALPN-01).
- Renewals must happen on time.
- Private keys must be protected and rotated.
- Chains must be correct (intermediates, root trust changes).
- OCSP stapling, if enabled, must keep serving fresh responses.
- Clocks must be sane (valid-from/valid-to, staple freshness).
- Deployments must not break clients (SNI, ALPN, legacy TLS versions).
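One small, concrete piece of that lifecycle is expiry monitoring. A minimal sketch, assuming `notAfter` strings in the format Python's `ssl.getpeercert()` returns; the threshold values are arbitrary examples, not a recommendation:

```python
import ssl
import time

# Example alert thresholds in days (tune to your escalation policy).
THRESHOLDS = (30, 14, 7, 1)

def days_until_expiry(not_after: str, now=None) -> float:
    """`not_after` is a notAfter string as returned by ssl.getpeercert(),
    e.g. 'Jan 15 00:00:00 2030 GMT' (GMT, C-locale month names)."""
    expiry = ssl.cert_time_to_seconds(not_after)
    return (expiry - (time.time() if now is None else now)) / 86400

def alert_level(days_left: float):
    """Return the tightest threshold crossed, or None if healthy."""
    crossed = [t for t in THRESHOLDS if days_left <= t]
    return min(crossed) if crossed else None
```

The real work, of course, is not the arithmetic: it is wiring the result into paging and making sure the check itself is monitored.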
The cost shows up in two places:
- Automation complexity (because manual renewals do not scale).
- Failure modes (because “it expired” is a production outage, not a warning).
Automation is a system, not a cron job
Many teams start with: “We’ll just run certbot in a cron.” Then reality hits:
- Where does the private key live?
- Which node owns renewal in a fleet?
- How do you atomically deploy to N load balancers?
- How do you avoid rate limits when you roll out new environments?
- What is your rollback plan if a renewal deploy breaks traffic?
TLS automation typically grows into a small distributed system:
- ACME client(s)
- Secrets storage
- Config templating / reload orchestration
- Health checks and alerting
- Runbooks for edge cases
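For instance, the "which node owns renewal" question usually ends up answered by a lock. A minimal single-host sketch using an advisory file lock; a real fleet would use its coordination layer (a lease in etcd, a Kubernetes controller, etc.), and the lock path here is made up:

```python
import fcntl
import os

def acquire_renewal_lock(path: str = "/tmp/cert-renew.lock"):
    """Return a file descriptor holding an exclusive advisory lock,
    or None if another holder already owns renewal."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd  # caller keeps fd open for the duration of the renewal
    except BlockingIOError:
        os.close(fd)
        return None
```

Even this toy version hints at the edge cases: what happens if the holder crashes mid-deploy, and who notices that nobody renewed this week?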
You can absolutely build this. But it is not “free,” and the blast radius is bigger than people expect.
Failure modes are nastier than they look
A few common “free cert” outages happen even to mature teams:
- DNS changes break DNS-01 renewals.
- CDN/WAF rules block HTTP-01 challenges.
- A mis-scoped IAM role cannot update the DNS provider API.
- A deploy reloads the proxy with an incomplete chain, causing trust failures.
- Clock drift makes “valid yet?” checks fail in weird ways.
- One service terminates TLS but upstream expects mTLS, and a cert rotation breaks mutual auth.
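Several of these failures, incomplete chains in particular, can be caught from the outside with a plain verifying handshake. A rough probe sketch, not production monitoring:

```python
import socket
import ssl

def probe_tls(host: str, port: int = 443, timeout: float = 5.0):
    """Attempt a full, verifying TLS handshake against `host`.
    An incomplete chain served by a proxy typically surfaces here
    as a certificate verification error."""
    ctx = ssl.create_default_context()  # uses the system trust store
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return True, tls.version()
    except ssl.SSLCertVerificationError as exc:
        return False, f"verify failed: {exc.verify_message}"
    except (ssl.SSLError, OSError) as exc:
        return False, str(exc)
```

Running this from a vantage point that does *not* share your servers' trust store and cached intermediates is the whole point; browsers and your own fleet often paper over chains that external clients reject.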
Certificates being free did not remove operational responsibility. It concentrated it.
TLS has a real CPU cost
Encryption is not free. Modern TLS is fast, but not zero-cost, and the cost is not evenly distributed.
The expensive parts are usually:
- The handshake, especially under high connection churn.
- Key exchange and signatures, depending on algorithms and hardware.
- Per-record encryption/decryption, which is typically cheaper than the handshake but adds up at high throughput.
If you have lots of short-lived connections (some mobile clients, bots, certain API patterns), you may feel TLS CPU overhead more than a long-lived connection workload.
Mitigations exist, and you should use them:
- Prefer TLS 1.3 (fewer round trips; cleaner negotiation).
- Enable session resumption (tickets) where appropriate.
- Tune keep-alives and connection pooling.
- Terminate TLS on hardware-accelerated edges when it makes sense.
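On the software side, the version floor is a one-liner in most stacks. A sketch with Python's `ssl` module; the policy values are illustrative, not a recommendation for your particular client base:

```python
import ssl

# Build a context that verifies peers and refuses anything below TLS 1.2.
# TLS 1.3 is negotiated automatically when both sides support it.
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2

# Whether the linked OpenSSL build supports TLS 1.3 at all:
supports_13 = ssl.HAS_TLSv1_3
```

The same idea applies to server configs (nginx's `ssl_protocols`, HAProxy's `ssl-min-ver`): raising the floor removes the slowest handshakes but can strand old clients, so measure before you tighten.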
The key point: “Free TLS” often becomes “paid CPU.”
TLS pushes complexity into load balancers and edges
In practice, “TLS everywhere” means “TLS termination somewhere.” Choosing where becomes architecture.
Common termination patterns:
- At the CDN/edge (Cloudflare, Fastly, Akamai)
- At the load balancer (ALB/ELB, NGINX, HAProxy)
- At the ingress (Kubernetes ingress controller)
- End-to-end TLS (encrypted all the way to the service)
- mTLS inside the mesh (service-to-service auth + encryption)
Each choice moves cost and complexity:
- Terminate at the edge:
  - Pros: offloads CPU, simplifies cert issuance, gets global DDoS/WAF features.
  - Cons: introduces a new trust boundary; you must secure origin traffic and headers.
- Terminate at the load balancer:
  - Pros: central control, fewer cert copies than per-service.
  - Cons: you’re now operating certs + reloads on a critical traffic choke point.
- End-to-end TLS:
  - Pros: reduces “decrypt here” trust issues; better internal threat model.
  - Cons: cert distribution becomes harder; debugging and observability change.
There is no universally correct answer, only trade-offs. The “hidden cost” is that TLS forces you to make these trade-offs explicitly.
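To make the edge-termination "secure origin traffic and headers" cost concrete: origins often verify a shared-secret header the CDN attaches. A toy sketch; the header name, secret handling, and signed fields are all assumptions, and real setups often use mTLS to origin or provider-specific features instead:

```python
import hashlib
import hmac

SECRET = b"rotate-me-via-your-secret-store"  # assumption: never hardcoded in real life

def sign(host: str, path: str) -> str:
    """Value the edge would place in a hypothetical X-Edge-Auth header."""
    return hmac.new(SECRET, f"{host}{path}".encode(), hashlib.sha256).hexdigest()

def verify(host: str, path: str, header_value: str) -> bool:
    """Origin-side check; constant-time compare avoids timing leaks."""
    return hmac.compare_digest(sign(host, path), header_value)
```

Note what this buys and what it doesn't: it stops random internet traffic from reaching your origin directly, but rotation of that secret is yet another lifecycle you now own.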
Debugging encrypted traffic is harder
Before TLS everywhere, debugging often meant:
- tcpdump or packet captures
- “Just curl the endpoint”
- Inspect the request/response on the wire
With TLS everywhere:
- You cannot passively read payloads without termination or key access.
- Middleboxes can’t “help” by peeking at HTTP headers (good for privacy; tricky for ops).
- Your observability shifts toward:
  - structured application logs
  - distributed tracing
  - proxy/edge logs
  - metrics about handshake failures, cipher negotiation, and cert validity
This is a net positive for security, but it requires discipline:
- better request IDs
- better redaction policies
- better sampling strategies
- better separation of “security data” vs “customer data”
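"Better redaction policies" eventually becomes code somewhere in the log pipeline. A deliberately tiny sketch; the header names and flat `key=value` log format are assumptions, and real redaction needs a maintained allow/deny list:

```python
import re

# Redact sensitive key=value pairs in flat log lines.
SENSITIVE = re.compile(r"\b(authorization|cookie|set-cookie|x-api-key)=\S+",
                       re.IGNORECASE)

def redact(line: str) -> str:
    return SENSITIVE.sub(r"\1=[REDACTED]", line)
```

The hard part is not the regex but the policy: deciding which fields are "security data" you must keep and which are "customer data" you must not.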
Compliance pressure makes “optional TLS” disappear
Even if you personally think a particular endpoint “doesn’t need HTTPS,” the world increasingly disagrees:
- Many security baselines treat encryption in transit as expected.
- Customers ask for it in vendor questionnaires.
- Auditors want to see it in scope definitions and control evidence.
Frameworks like ISO/IEC 27001 and SOC 2 do not say “use TLS exactly like this,” but they push you toward demonstrating that you manage risks with appropriate technical controls, especially around confidentiality, integrity, and access control. In practice, that translates into: encryption-in-transit isn’t optional, and a “we’ll do it later” stance becomes a business risk.
So the decision gets made for you:
- HTTPS becomes default.
- HSTS becomes expected.
- Internal encryption becomes part of the narrative.
- “No TLS” becomes an exception you must justify.
The economic shift: cost moved, not eliminated
“Free TLS everywhere” did not erase spending. It redistributed it.
Who pays now?
- Operators, in CPU cycles and edge bills.
- Engineers, in automation, maintenance, and on-call time.
- Teams, in added architectural complexity and more sophisticated observability.
And this cost is often invisible on paper because it is not a line item labeled “TLS license.” It shows up as:
- another Kubernetes controller
- another Terraform module
- another “renewal incident”
- another load balancer tier
- another compliance checkbox
- another week of debugging client handshake failures
It is still a win, but it is not free.
TLS everywhere is still absolutely worth it
Despite the operational cost, TLS everywhere remains one of the best trades the industry has made.
It delivers:
- Real-world protection against passive network monitoring.
- Reduced risk of trivial injection attacks and content tampering.
- Stronger default identity guarantees (via certificate validation).
- Better platform primitives (secure cookies, modern browser behaviors, safer APIs).
The question is not “should we do TLS everywhere?” The question is: how do we operate it like infrastructure, not like a one-time setup task?
Practical operator guidance
Design for certificate failure, because it will happen. Alert on expiry early (30/14/7/1-day thresholds) and — more importantly — alert on renewal failures, not just approaching dates. Keep a tested manual issuance path for emergencies. Use ACME staging in CI to validate automation without burning rate limits. Treat cert rollouts like deploys: canary, validate, be ready to roll back.
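"Alert on renewal failures, not just approaching dates" can be as simple as wrapping the renewal command so its exit status becomes a signal instead of a silently discarded cron result. A sketch; the command and the `emit` hook are placeholders for your ACME client and metrics pipeline:

```python
import subprocess

def renew_and_report(cmd, emit=print) -> bool:
    """Run a renewal command and emit a success/failure signal.
    `cmd` might be ("certbot", "renew", "--non-interactive") in practice;
    `emit` stands in for a metrics or alerting client."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    ok = proc.returncode == 0
    emit(f"cert_renewal_success={int(ok)}")
    if not ok:
        # Surface the failure reason; truncate so logs stay sane.
        emit(f"cert_renewal_error={proc.stderr.strip()[:500]}")
    return ok
```

Pair the metric with an absence alert ("no successful renewal signal in N days") so a dead cron job pages someone too.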
Choose termination points deliberately. Edge termination is simple and offloads CPU, but you need to secure the origin path (TLS to origin, authenticated headers, allowlisted IPs). Terminating closer to services makes sense when your threat model includes internal lateral movement. If you’re doing mTLS, decide upfront who owns identity — SPIFFE/SPIRE, a mesh CA, your own PKI — and document how rotation works before it matters.
Think about handshakes, not just throughput. Prefer TLS 1.3 where possible. Enable session resumption and keep-alives. Load test with realistic connection churn. Handshake error spikes during deploys usually mean a chain or SNI mistake, so make sure those are surfaced before users notice.
Make your observability TLS-aware. You’ll be relying on logs more than packet captures once everything is encrypted. Track handshake failures, cipher negotiation, cert validation errors, and client compatibility issues. Add per-hop tracing so “works at the edge, fails at origin” problems don’t require half a day to diagnose.
Keep the system boring. One ACME client approach per platform. One secret distribution pattern. One reload strategy. Boring is maintainable. Avoid snowflake cert setups for internal services unless there’s a clear reason, and prefer proven tooling over custom pipelines when the team is small.
Closing: TLS is infrastructure now
TLS stopped being a feature you add to websites a long time ago. It’s part of the plumbing now, which means the certificate got cheaper but the operational work didn’t disappear — it moved into CPU budgets, automation pipelines, edge configs, and the on-call rotation.
Most of these costs are manageable once you stop treating TLS as a one-time setup task. Solid automation, a deliberate termination strategy, and boring repeatable patterns make “TLS everywhere” something you can actually sustain. The complexity exists regardless — the goal is just putting it somewhere your team can see and handle it.
Written by the Infra Atlas author
I work on infrastructure and software systems across layers: writing code, shipping products, and dealing with the practical trade-offs of hosting, memory, and network behavior in production. When this site says it covers “layer 3 to layer 9,” it’s half a joke and half a truth: from routing and packets, up through operating systems, applications, and the human decisions that actually cause outages.
Infra Atlas is a collection of field notes from that work. Some pages may include affiliate or referral links as a low-key way to support the site. Think of it as buying me a coffee while I write about why systems behave the way they do.