Analogous to the Hero Developer, a hero sysadmin is the single individual who understands the operational infrastructure so deeply – and often exclusively – that the organization's long-term stability depends on their presence. Their knowledge is undocumented, their tooling is non-standard, and their workflows are tightly coupled to personal habits rather than shared operational processes.
While the hero sysadmin may be efficient individually, their impact on long-term organizational health is usually negative. The bus factor drops to 1, onboarding becomes difficult, and system resilience declines over time.
Don't be fooled if a sysadmin leaves and everything still works – the systems usually only start breaking down later, when hardware fails, software gets updated, or seemingly insignificant changes are made.
Here are some indicators that your team may have (or is creating) a hero sysadmin:
A sysadmin maintains a set of provisioning scripts and services that only function correctly on their personal workstation, or only when run in some arcane order that isn't documented anywhere. They might rely on environment variables and configuration files stored in the sysadmin's home directory.
When another team member attempts to reproduce the setup on a different machine, deployments inexplicably fail, services misbehave, or dependencies cannot be resolved. The only way to recreate the environment is to reverse engineer the original sysadmin's workstation or to read through the scripts and guess what belongs in the config files.
This is the classic "works on my machine" anti-pattern: the sysadmin's environment has silently diverged from every other machine. The organization becomes dependent on a fragile, hard-to-reproduce setup tied to a single person.
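One way out is to make every implicit dependency explicit. The sketch below shows the idea: a provisioning step that declares the settings it needs and aborts with a clear message when they are missing, instead of silently reading whatever happens to be exported in one person's shell. The setting names are illustrative, not taken from any particular setup.

```python
import os
import sys

# Settings this provisioning step needs. Declaring them up front replaces the
# implicit dependency on whatever happens to be exported in one person's shell.
# (The names below are illustrative placeholders.)
REQUIRED_SETTINGS = ["DEPLOY_TARGET_HOST", "DEPLOY_SSH_USER", "ARTIFACT_REPO_URL"]


def load_settings() -> dict:
    """Read required settings from the environment and fail loudly if any are missing."""
    missing = [name for name in REQUIRED_SETTINGS if name not in os.environ]
    if missing:
        # A clear error beats a deployment that "inexplicably fails" on another machine.
        sys.exit(f"provisioning aborted, missing settings: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED_SETTINGS}


if __name__ == "__main__":
    settings = load_settings()
    print(f"provisioning {settings['DEPLOY_TARGET_HOST']} as {settings['DEPLOY_SSH_USER']}")
```

The exact mechanism matters less than the principle: every input the script depends on is named in the script itself, so a new machine either works or tells you precisely why it doesn't.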
To avoid this failure mode, infrastructure should be reproducible, declarative, and environment-agnostic. One effective approach is to use ephemeral environments: environments that are built on demand from a declarative definition kept in version control and thrown away after use, instead of accumulating state on one particular machine. Ephemeral environments ensure that the entire setup can be rebuilt from scratch, by anyone, at any time, which makes hidden dependencies on a single workstation impossible to sustain.
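To make this concrete, here is a minimal sketch of what a declarative, version-controlled environment definition might look like and how a tool could consume it. The file name environment.json and its schema are assumptions for illustration; any container, VM, or configuration-management tooling that rebuilds from a checked-in definition achieves the same goal.

```python
import json
import pathlib
import sys

# Hypothetical declarative description of an environment, kept in version control
# next to the code, e.g.:
#
#   {
#     "base_image": "debian:12",
#     "packages": ["rsync", "postgresql-client"],
#     "services": {"sync": {"command": "python sync.py", "schedule": "hourly"}}
#   }
#
# Anyone can rebuild the environment from this file; nothing lives only on one workstation.

def load_environment(path: str = "environment.json") -> dict:
    spec_file = pathlib.Path(path)
    if not spec_file.exists():
        sys.exit(f"{path} not found: the environment definition must live in the repository")
    spec = json.loads(spec_file.read_text())
    for key in ("base_image", "packages", "services"):
        if key not in spec:
            sys.exit(f"environment definition is missing '{key}'")
    return spec


if __name__ == "__main__":
    spec = load_environment()
    print(f"building ephemeral environment from {spec['base_image']} "
          f"with {len(spec['packages'])} packages and {len(spec['services'])} services")
```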
Some examples of how this dependence plays out in practice:

A production synchronization process stops functioning three weeks after a sysadmin leaves. Investigation reveals a cron job running on an old utility server under the sysadmin's personal user account, which had been removed during offboarding in line with the security guidelines. No one knew the job existed.
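Jobs like this are easy to surface before they bite. The rough sketch below walks every local account's crontab so that anything scheduled under a personal user account becomes visible; it assumes a cron implementation that supports `crontab -l -u`, needs to run as root, and the list of accepted service accounts is a placeholder.

```python
import pwd
import subprocess

# Rough audit sketch: list per-user crontabs so jobs hiding under personal accounts
# are at least visible. Must run as root; assumes `crontab -l -u <user>` is supported.
SERVICE_ACCOUNTS = {"root", "backup", "www-data"}  # placeholder: adjust for your environment

for user in pwd.getpwall():
    result = subprocess.run(
        ["crontab", "-l", "-u", user.pw_name],
        capture_output=True, text=True,
    )
    if result.returncode == 0 and result.stdout.strip():
        flag = "" if user.pw_name in SERVICE_ACCOUNTS else "  <-- personal account?"
        print(f"{user.pw_name}:{flag}")
        print(result.stdout)
```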
A server suddenly fails to boot after a kernel update. The previous sysadmin had installed a custom kernel module without documentation, and no one knows how it was built or why it was needed.
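A periodic audit can also catch hand-built components before they become archaeology. As a rough sketch, assuming a Debian-based system, the following flags loaded kernel modules whose files are not owned by any installed package, which usually means they were built and installed by hand.

```python
import subprocess

# Rough sketch for a Debian-based system: flag loaded kernel modules whose files
# are not owned by any installed package, i.e. likely hand-built and undocumented.
# Note: path aliasing (/lib vs /usr/lib on merged-usr systems) may need extra handling.
with open("/proc/modules") as f:
    modules = [line.split()[0] for line in f]

for name in modules:
    info = subprocess.run(["modinfo", "-n", name], capture_output=True, text=True)
    path = info.stdout.strip()
    if info.returncode != 0 or not path.startswith("/"):
        continue  # built-in module or lookup failed; skip
    owned = subprocess.run(["dpkg", "-S", path], capture_output=True, text=True)
    if owned.returncode != 0:
        print(f"{name}: {path} is not owned by any package")
```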
Backups quietly fail for months because the "backup script" was a bash script maintained by hand on one machine. It was never checked into Git, stopped working when a path changed, and never reported the failure to monitoring. Because the script didn't return proper exit codes, even tracking down the wrong path was nontrivial.
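The cure is a job that fails loudly and reports its outcome either way. Here is a sketch of such a wrapper; the rsync source and destination and the monitoring endpoint are placeholders for whatever backup command and health-check service you actually use.

```python
import subprocess
import sys
import urllib.request

# Sketch of a backup wrapper that fails loudly and reports its outcome.
# BACKUP_CMD and the health-check URL are placeholders for your own setup.
BACKUP_CMD = ["rsync", "-a", "/var/lib/app/", "/mnt/backup/app/"]
HEALTHCHECK_URL = "https://monitoring.example.com/ping/backup-app"  # hypothetical endpoint


def report(ok: bool) -> None:
    # Tell the monitoring system explicitly; silence should never mean "success".
    suffix = "" if ok else "/fail"
    try:
        urllib.request.urlopen(HEALTHCHECK_URL + suffix, timeout=10)
    except OSError:
        print("warning: could not reach monitoring endpoint", file=sys.stderr)


def main() -> int:
    result = subprocess.run(BACKUP_CMD)
    ok = result.returncode == 0
    report(ok)
    if not ok:
        print(f"backup failed with exit code {result.returncode}", file=sys.stderr)
    return result.returncode


if __name__ == "__main__":
    sys.exit(main())
```

A version-controlled wrapper like this also gives the next person something to read when the backup does break.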
To avoid creating or depending on hero sysadmins:

- Keep every script, job definition, and piece of configuration in version control, not in someone's home directory.
- Document how infrastructure is built and why non-standard components (custom kernel modules, one-off cron jobs, hand-rolled backup scripts) exist.
- Run automation under dedicated service accounts, never under a personal account that will be removed at offboarding.
- Make environments reproducible and declarative, so anyone can rebuild them from scratch without reverse engineering a workstation.
- Make scripts fail loudly with proper exit codes, and monitor scheduled jobs and backups so silent failures are impossible.
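One practice that ties these together is a recurring "cold start" check: a CI job that rebuilds the environment from nothing on a clean machine, proving that no step lives only in someone's head or home directory. The sketch below assumes a hypothetical infrastructure repository with a documented bootstrap entry point; the URL and command are placeholders.

```python
import subprocess
import sys
import tempfile

# "Cold start" check: prove the environment can be rebuilt by a machine that has
# none of the hero sysadmin's local state. Repository URL and bootstrap command
# are placeholders for your own setup.
INFRA_REPO = "https://git.example.com/ops/infrastructure.git"   # hypothetical
BOOTSTRAP = ["./bootstrap.sh", "--check"]                       # hypothetical entry point


def cold_start_check() -> int:
    with tempfile.TemporaryDirectory() as workdir:
        clone = subprocess.run(["git", "clone", "--depth=1", INFRA_REPO, workdir])
        if clone.returncode != 0:
            return clone.returncode
        # If this fails, the documented setup is incomplete and still depends on
        # someone's personal machine.
        return subprocess.run(BOOTSTRAP, cwd=workdir).returncode


if __name__ == "__main__":
    sys.exit(cold_start_check())
```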