We’ve all been there. It’s Monday morning, you haven’t had your coffee yet, and Slack is blowing up.

Users are seeing “Your connection is not private.” The CTO is asking why the site is “hacked.” Your boss’s boss is cc’d on something. You check the box. Nginx is running. The app is fine. Load averages are low.

Then it hits you: The Certbot cron job failed silently.

Maybe the ACME challenge timed out. Maybe Nginx didn’t reload to pick up the new cert. Maybe DNS propagation was slow. Maybe Mercury was in retrograde. Doesn’t matter—production is down because of a 90-day text file, and you look incompetent.

In the old days, we bought Verisign certs that lasted 3 years and cost as much as a used car. Now we’ve traded dollars for DevOps anxiety—we have to be right every 90 days, forever. Thanks, Let’s Encrypt. (I kid. Mostly.)

Here’s how to script your way out of that.


The “Hard Way”: Bash, OpenSSL, and Grit

I write a fair amount of Python and Rust these days, but when you need something to run on any box, anywhere, with zero dependencies? You go back to Bash. It’s the vi of scripting—ugly, ornery, and always there when you need it.

To monitor this properly, we can’t trust Certbot’s output. Certbot lies. Not maliciously—it’s just optimistic in a way that will ruin your weekend. We need to do what a user does: ask the web server to show us its papers.

1. The Core Command

We’re going to use openssl s_client. If you’ve ever debugged a TLS handshake manually, you know this tool is powerful and incredibly annoying—like a Swiss Army knife where half the blades are unlabeled. We pipe the handshake into x509 to extract the expiration date:

 echo | timeout 10 openssl s_client -servername example.com -connect example.com:443 2>/dev/null \
    | openssl x509 -noout -enddate

Three things to note:

  • The echo | pipe: Without input on stdin, s_client sits in interactive mode waiting for you to type. The echo sends an immediate EOF, so the handshake completes and the connection closes cleanly. Yes, you’re literally telling openssl to shut up and get on with it.
  • The -servername flag: Crucial. Without SNI, Nginx serves the default cert—probably the snake-oil one or whatever config it loaded first alphabetically. You’ll get a heart attack for no reason. (See the comparison sketch right after this list.)
  • The timeout: Without it, a hung remote host blocks your script indefinitely. The loop stalls on that one domain and the rest never get checked. How do I know this? Pain.
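
Don’t take my word on the SNI thing. Here’s a quick way to see it for yourself: pull the subject and expiry with and without -servername and compare. (A sketch only; swap example.com for one of your own vhosts.)

# With SNI: the cert for the vhost you actually asked about
echo | timeout 10 openssl s_client -servername example.com -connect example.com:443 2>/dev/null \
    | openssl x509 -noout -subject -enddate

# Without SNI: whatever default cert the server happens to send
echo | timeout 10 openssl s_client -connect example.com:443 2>/dev/null \
    | openssl x509 -noout -subject -enddate

Either way, the line you care about looks something like notAfter=Jun 10 12:00:00 2026 GMT, which is exactly what the script below parses.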

2. The Monitoring Script (ssl_check.sh)

I’m an Emacs guy (Evil mode, because hjkl is muscle memory I refuse to unlearn), but paste this into whatever editor brings you joy. Even nano. I won’t judge. (I will judge a little.)

This script iterates through your domains, checks expiration against the live server, and screams at you if you have a week or less left.

 #!/bin/bash

# Fail on undefined variables and pipe failures.
# If you aren't using this in bash scripts, start now. Future you will thank present you.
set -uo pipefail

# Configuration
DOMAINS=("example.com" "app.example.com" "api.example.com")
SLACK_WEBHOOK_URL="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
DAYS_THRESHOLD=7
TIMEOUT_SECONDS=10

send_alert() {
    local message="$1"
    # Keep curl quiet: -s hides the progress meter, -f fails on HTTP errors, /dev/null eats Slack's "ok" reply
    curl -sf -X POST -H 'Content-type: application/json' \
        --data "{\"text\":\"$message\"}" \
        "$SLACK_WEBHOOK_URL" > /dev/null
}

for DOMAIN in "${DOMAINS[@]}"; do
    # Grab the expiration date from the live cert
    EXP_DATE=$(echo | timeout "$TIMEOUT_SECONDS" openssl s_client \
        -servername "$DOMAIN" \
        -connect "$DOMAIN":443 2>/dev/null \
        | openssl x509 -noout -enddate 2>/dev/null \
        | cut -d= -f2)

    # Validate we got something that looks like a date.
    # Note: 'date -d' is GNU specific. On BSD/macOS you need 'date -j -f' (see the variant after the script).
    if [[ -z "$EXP_DATE" ]] || ! date -d "$EXP_DATE" &>/dev/null; then
        echo "PANIC: Could not retrieve valid certificate for $DOMAIN"
        send_alert "SSL PANIC: Cannot retrieve certificate for $DOMAIN. Check immediately."
        continue
    fi

    # Convert to epoch for math
    EXP_EPOCH=$(date -d "$EXP_DATE" +%s)
    NOW_EPOCH=$(date +%s)
    DAYS_REMAINING=$(( (EXP_EPOCH - NOW_EPOCH) / 86400 ))

    echo "$(date '+%Y-%m-%d %H:%M:%S') - $DOMAIN: $DAYS_REMAINING days remaining"

    if [[ "$DAYS_REMAINING" -le "$DAYS_THRESHOLD" ]]; then
        send_alert "SSL ALERT: $DOMAIN expires in $DAYS_REMAINING days. Fix it now."
    fi
done
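
As the comment in the script says, date -d is a GNU-ism. If some of your boxes are macOS or BSD, here’s a minimal sketch of a portable epoch conversion you could drop in place of the EXP_EPOCH line (assuming openssl’s usual enddate format):

# GNU date vs BSD date: detect which one we have and parse accordingly.
# Assumes the usual openssl format, e.g. "Jun 10 12:00:00 2026 GMT".
if date --version >/dev/null 2>&1; then
    EXP_EPOCH=$(date -d "$EXP_DATE" +%s)                      # GNU (Linux)
else
    EXP_EPOCH=$(date -j -f "%b %d %T %Y %Z" "$EXP_DATE" +%s)  # BSD (macOS, FreeBSD)
fi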

3. The Cron Job

chmod +x that bad boy and throw it in cron:

 0 6 * * * /usr/local/bin/ssl_check.sh >> /var/log/ssl_check.log 2>&1

Runs at 6 AM daily. Early enough to fix things before the US wakes up, late enough that you’re not debugging at 3 AM. Adjust to taste and timezone guilt.


Why This Script Is Going to Burn You (The “Turn”)

I’ve written variations of that script for 25 years. It’s better than nothing, and it proves you respect the fundamentals. But honestly? It’s got holes you could drive a production Kubernetes deployment through.

Here’s why relying solely on this gives you a false sense of security:

1. The “Localhost” Lie

The script runs on your server. If you have split-horizon DNS, internal routing, or a load balancer in front, the script might see a valid internal cert while the outside world sees a firewall block or a timeout. Your monitoring says green. Your customers see red. Your phone rings.
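
The cheapest fix I know: run the exact same script from a box that is not inside your network, so it sees what a customer sees. A sketch, assuming you have any outside VPS you can SSH into (the hostname here is a placeholder, and the remote box needs openssl and curl installed):

# Run the check from outside your LAN/VPC so split-horizon DNS can't fool you.
# "external-vps.example.net" is a placeholder for any machine outside your network.
ssh external-vps.example.net 'bash -s' < /usr/local/bin/ssl_check.sh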

2. The Silent Cron Failure

Who monitors the monitor?

I once had a cron daemon die after a botched OS upgrade. Didn’t notice for three weeks—until four certs expired on the same day. The script was perfect. It just wasn’t running.

And unless you configure Postfix (and who actually does that anymore?), cron errors go to /var/mail/root. Nobody reads /var/mail/root. That file is where error messages go to die.
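
The standard mitigation is a dead man’s switch: the script pings a heartbeat URL on every run, and an outside service alerts you when the pings stop arriving. A sketch you could bolt onto the end of ssl_check.sh, assuming a healthchecks.io-style endpoint (the URL is a placeholder):

# Last line of ssl_check.sh: prove to an external service that we actually ran.
# HEARTBEAT_URL is a placeholder; use your own healthchecks.io / cron-monitor ping URL.
HEARTBEAT_URL="https://hc-ping.com/your-uuid-here"
curl -fsS --retry 3 "$HEARTBEAT_URL" > /dev/null || true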

3. The Intermediate Chain Problem

The script as written only checks the leaf certificate. openssl s_client receives the full chain, but openssl x509 only parses the first PEM block it sees, which is the leaf.

If your intermediate cert expires—or Nginx isn’t sending the chain at all—older clients (especially Android, bless its fragmented heart) will reject the connection. Your script says “30 days remaining.” Half your mobile users see a warning page. Support tickets pile up. “Works on my machine” becomes your epitaph.
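
You can patch this in Bash, at least partially, by asking openssl what it thinks of the whole chain instead of just reading the leaf’s date. A rough sketch; note that depending on your OpenSSL build you may need -CAfile or -CApath pointed at your system trust store to get a meaningful verdict:

# Check the chain verification verdict, not just the leaf expiry.
VERIFY=$(echo | timeout 10 openssl s_client -servername example.com \
    -connect example.com:443 2>/dev/null | grep "Verify return code")

if ! echo "$VERIFY" | grep -q "0 (ok)"; then
    echo "CHAIN PROBLEM for example.com: ${VERIFY:-no verify result at all}"
fi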

4. The Wrong Cert Problem

Here’s a fun one: the cert is valid. Expiration looks great. It’s just not your cert.

Maybe someone renewed with a different CA. Maybe the wildcard got swapped for a single-domain cert during a late-night deploy. Maybe a junior admin grabbed the staging cert by mistake. (No shade—we’ve all been that junior admin.)

The expiration date looks fine. The site is broken anyway. Your script sees nothing wrong because technically nothing is wrong—except everything.
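
If you want the script to catch this, check the names on the cert as well as the date. A crude sketch that greps the SAN list (treat it as a smoke test, not real hostname verification):

# Make sure the cert the server presents actually names our domain.
SANS=$(echo | timeout 10 openssl s_client -servername example.com -connect example.com:443 2>/dev/null \
    | openssl x509 -noout -text \
    | grep -A1 "Subject Alternative Name")

if ! echo "$SANS" | grep -q "example.com"; then
    echo "WRONG CERT for example.com. SANs are: ${SANS:-none found}"
fi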

5. Maintenance Debt

Every time you spin up a new container, subdomain, or microservice, you have to SSH in and update the DOMAINS array. Forget once, and you’re not monitoring it.

“I’ll add it to the script later,” you say, quietly manufacturing a future 2 AM incident.
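
One way out, if Nginx is the source of truth anyway: build the DOMAINS array from the config instead of by hand. A sketch that assumes GNU grep, bash 4+, and a Debian-style sites-enabled layout; adjust the path and pattern for yours:

# Discover domains from Nginx instead of maintaining the list manually.
mapfile -t DOMAINS < <(grep -rhoP 'server_name\s+\K[^;]+' /etc/nginx/sites-enabled/ \
    | tr ' \t' '\n' | grep -vE '^_?$' | sort -u)

Same script, same cron job; it just stops silently missing the subdomain somebody added last Tuesday.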


The Solution: External Monitoring

I got tired of maintaining bespoke Bash scripts across different environments. I wanted something that checked my sites from the outside—like a real user—and handled the edge cases automatically.

That’s why I built SSLGuard.

It’s the tool I wish I had back when I was manually verifying cert chains on Sun boxes and writing Perl scripts I’m not proud of.

  • Checks from the outside: No more “works on my machine” false positives.
  • Validates the full chain: If the intermediate is busted, you’ll know before your users do.
  • Alerts to Slack, Teams, email: You get notified before the cert expires, not after.
  • Zero maintenance: No scripts to update, no cron to babysit, no /var/mail/root to ignore.

If you have zero budget, steal the Bash script above. It’s free, and it’s better than nothing.

But if you value your time—and your sleep—give SSLGuard a try. Takes 30 seconds to set up. You’ll never have that Monday morning panic again.

Your future self will thank you. Possibly with coffee.