Table of Contents

The Hero Sysadmin

Analogous to the Hero Developer, a hero sysadmin is the single individual who understands the operational infrastructure so deeply – and often exclusively – that the organization's long-term stability depends on their presence. Their knowledge is undocumented, their tooling is non-standard, and their workflows are tightly coupled to personal habits rather than shared operational processes.

While the hero sysadmin may be efficient individually, their impact on long-term organizational health is usually negative. The bus factor drops to 1, onboarding becomes difficult, and system resilience declines over time.

Don't be fooled if a sysadmin left and everything still works – usually the systems only start breaking down due to hardware failure, software updates and seemingly insignificant changes.

Early Warning Signs

Here are some indicators that your team may have (or is creating) a hero sysadmin:

Real-World Examples

Works on my Machine™

A sysadmin maintains a set of provisioning scripts and services that only function correctly on their personal workstation, or if you run them in some arcane order which isn't documented properly. They might rely on environment variables and configuration files stored in their home directory.

When another team member attempts to reproduce the setup on a different machine, deployments inexplicably fail, services misbehave, or dependencies cannot be resolved. They need to create their environment either by reverse engineering the other sysadmin's environment or reading the scripts to figure out what to put in the config files.

This is the classic "works on my machine" anti-pattern: the sysadmin's environment has diverged from other machines. The organization becomes dependent on a fragile, not easily reproducible setup tied to a single person.

How to Prevent This: Ephemeral Environments

To avoid this failure mode, infrastructure should be reproducible, declarative, and environment-agnostic. One effective approach is using ephemeral environments, such as:

Ephemeral environments ensure:

Hidden Cron Job

A production synchronization process stops functioning three weeks after a sysadmin leaves. Investigation reveals a cron job running on an old utility server under the sysadmin's personal user account, which had been removed due to security guidelines in the offboarding procedure. No one knew it existed.

Custom Kernel Modules

A server suddenly fails to boot after a kernel update. The previous sysadmin had installed a custom kernel module without documentation, and no one knows how it was built or why it was needed.

The Magical Backup Script

Backups quietly fail for months because the "backup script" was actually a bash script maintained manually on one machine. It was never checked into Git and stopped working when a path changed, but didn't report this in monitoring. The script also doesn't have proper error codes, so debugging the wrong path is nontrivial.

Preventing Hero Sysadmins

To avoid creating or depending on hero sysadmins:

See Also