Detect Silent Background Job Failures
Cron jobs run silently in the background, so when they fail you often don't know until the damage is done: database backups that never run, payment processing that stops, data syncs that quietly break. Heartbeat monitoring confirms that your critical scheduled tasks are actually executing.
Key Features
Heartbeat Detection
Jobs ping a unique URL when they run. Get alerted if an expected heartbeat is missed.
Flexible Schedules
Monitor jobs running every minute, hourly, daily, weekly, or on custom schedules.
Execution Tracking
Track execution history, duration, and success/failure patterns over time.
📊 Incident Analysis: 89-Day Backup Gap
Company: B2B SaaS platform (~340 business customers, MySQL 8.0 database)
Discovery date: March 12, 2024
Backup failure began: December 13, 2023 (89 days earlier)
Root cause: `/var/backups` partition reached 100% capacity, mysqldump silently exited with code 1
Their backup script ran at 2 AM daily via cron: `0 2 * * * /usr/local/bin/backup-db.sh >> /dev/null 2>&1`. Because both stdout and stderr were redirected to /dev/null, every error message was discarded. The disk filled after log rotation stopped working, while the database itself continued operating normally.
On March 12, corruption in the `invoices` table required a restore from backup. The team discovered that their most recent valid backup was from December 12, 2023: 89 days old and missing 14,223 transactions across 2,847 customer records.
Recovery: 47 hours reconstructing data from application logs, email receipts, and payment processor records
Data gaps: 89 invoices completely unrecoverable, ~$31,400 in billing discrepancies
Prevention: Heartbeat monitoring (checking for an HTTP ping from the backup script) would have alerted on December 13, shortly after the first missed 2 AM run. That limits the gap to about 1 day instead of 89.
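To make the prevention step concrete, here is a minimal sketch of a backup script that sends its heartbeat only after the dump has been written and checked. It is not the company's actual script: `PING_URL`, `BACKUP_DIR`, `DB_NAME`, and the `backup_db` function are hypothetical, and `mysqldump` is assumed to read its credentials from an option file such as `~/.my.cnf`.

```shell
#!/usr/bin/env bash
# Hypothetical placeholders; replace with your own values.
PING_URL="${PING_URL:-https://monitor.example.com/ping/your-unique-id}"
BACKUP_DIR="${BACKUP_DIR:-/var/backups/mysql}"
DB_NAME="${DB_NAME:-appdb}"

backup_db() {
  local outfile="$BACKUP_DIR/$DB_NAME-$(date +%F).sql"
  local tmpfile="$outfile.partial"

  # A full disk makes the write fail, so mysqldump (or the redirect) returns non-zero.
  if ! mysqldump --single-transaction "$DB_NAME" > "$tmpfile"; then
    echo "backup failed: dump did not complete" >&2
    rm -f "$tmpfile"
    return 1
  fi

  # Refuse to treat an empty file as a valid backup.
  if [ ! -s "$tmpfile" ]; then
    echo "backup failed: dump file is empty" >&2
    rm -f "$tmpfile"
    return 1
  fi

  mv "$tmpfile" "$outfile" || return 1

  # Heartbeat is sent only after every check above has passed.
  curl -fsS --retry 3 "$PING_URL" > /dev/null
}

# In a real script, call backup_db as the last line.
```

With this shape, a full `/var/backups` partition causes the function to return before the `curl` line, so the monitor sees a missed heartbeat even if the cron entry still discards output.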
Getting Started
Create a Heartbeat Monitor
Set the expected schedule (e.g., "daily at 2 AM"). Get a unique ping URL.
Add Ping to Your Job
Add a simple curl or wget command to ping the URL when your job completes successfully.
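As a rough illustration of this step, the helper below pings with `curl` when it is installed and falls back to `wget` otherwise. The `ping_heartbeat` name and the `PING_URL` value are assumptions made for the example; use the URL your monitor gives you.

```shell
#!/usr/bin/env bash
# Hypothetical ping URL; replace with the one issued for your monitor.
PING_URL="${PING_URL:-https://monitor.example.com/ping/your-unique-id}"

ping_heartbeat() {
  if command -v curl > /dev/null 2>&1; then
    # -f: fail on HTTP errors, -s: no progress meter, -S: still print errors
    curl -fsS --retry 3 --max-time 10 "$PING_URL" > /dev/null
  elif command -v wget > /dev/null 2>&1; then
    # -q: quiet, -O /dev/null: discard the response body
    wget -q --tries=3 --timeout=10 -O /dev/null "$PING_URL"
  else
    echo "neither curl nor wget is installed; heartbeat not sent" >&2
    return 1
  fi
}
```

For a job that is a single command, the same idea fits directly in the crontab entry, for example `0 2 * * * /usr/local/bin/backup-db.sh && curl -fsS --retry 3 https://your-monitor-url`, where `&&` ensures the ping runs only if the script exits with code 0.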
Get Alerted on Failures
If the expected heartbeat is missed, you'll be alerted within minutes of the missed execution.
Frequently Asked Questions
Your cron job pings a unique URL when it runs successfully. If the ping doesn't arrive within the expected window, you get alerted. This "heartbeat" approach catches jobs that never start, jobs that crash partway through, and jobs that run but exit with an error.
If your script exits with an error code but still sends the heartbeat ping, monitoring won't catch the failure. Best practice: only ping on successful completion (exit code 0). Alternatively, use monitoring that supports both start and finish pings to detect long-running or stuck jobs.
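One way to enforce "ping only on exit code 0" is a small wrapper that runs the job, inspects its exit status, and skips the heartbeat on failure. This is a sketch under assumptions: `run_with_heartbeat` and `PING_URL` are hypothetical names, not part of any particular monitoring product.

```shell
#!/usr/bin/env bash
# Hypothetical ping URL; replace with the one issued for your monitor.
PING_URL="${PING_URL:-https://monitor.example.com/ping/your-unique-id}"

# Run the given command; send the heartbeat only if it exits with code 0.
run_with_heartbeat() {
  "$@"
  local status=$?
  if [ "$status" -eq 0 ]; then
    curl -fsS --retry 3 "$PING_URL" > /dev/null
  else
    echo "job failed with exit code $status; heartbeat not sent" >&2
  fi
  # Pass the job's own exit code back to the caller (for example, cron).
  return "$status"
}
```

A call such as `run_with_heartbeat /usr/local/bin/backup-db.sh` then leaves the monitor without a ping whenever the script fails, which is what triggers the alert.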
Yes. Set up separate monitors for each schedule: hourly jobs ping every hour, daily at specific times, weekly on certain days. The monitoring system expects pings based on each job's unique schedule and alerts if any job misses its window.
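Add a curl or wget command at the end of your script, for example `curl -fsS --retry 3 https://your-monitor-url`. The `-f` flag makes curl return a non-zero exit code on HTTP errors, `-s` suppresses the progress meter, `-S` still prints an error message if the request fails, and `--retry 3` retries after transient network failures. Only ping after your job completes successfully.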
Use start/finish ping monitoring. Ping once when the job starts, again when it finishes. This detects jobs that never start, start but never finish (crashed/stuck), or exceed expected runtime. Set grace periods longer than your typical job duration.
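The start/finish pattern might look like the sketch below. The `/start` suffix on the ping URL is an assumption borrowed from how some heartbeat services mark a start signal; check your provider's documentation for the real endpoint. `run_with_start_finish` and `PING_URL` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical ping URL; the "/start" suffix below is an assumed convention.
PING_URL="${PING_URL:-https://monitor.example.com/ping/your-unique-id}"

run_with_start_finish() {
  # Signal that the job has started; a failed start ping should not block the job.
  curl -fsS --retry 3 "$PING_URL/start" > /dev/null || true

  "$@"
  local status=$?

  # Signal completion only on success, so a crashed or stuck job
  # shows up as "started but never finished".
  if [ "$status" -eq 0 ]; then
    curl -fsS --retry 3 "$PING_URL" > /dev/null
  fi
  return "$status"
}
```

Running `run_with_start_finish /usr/local/bin/backup-db.sh` gives the monitor two signals per run, which is what lets it measure duration and flag jobs that exceed the grace period.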
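Monitor critical jobs that affect data integrity, backups, billing, notifications, or data syncing. Lower-priority jobs such as cache clearing or log rotation can often go unmonitored if an occasional failure has no impact, but check for knock-on effects first: in the incident above, stalled log rotation is what filled the backup disk. Focus monitoring on jobs where failure causes immediate business problems.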