# Toil

## Toil in SRE

* Toil refers to operational work that is manual, repetitive, automatable, tactical, and devoid of long-term value.
* It is tied to running production services and scales linearly with service growth.

## Characteristics of Toil

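* **Manual**: Tasks that need a human to carry them out, including manually running a script that automates part of the work.
* **Repetitive**: Tasks done over and over, not something performed once or twice.
* **Automatable**: Tasks a machine could do just as well; work that needs human judgment is probably not toil.
* **Tactical**: Interrupt-driven, reactive work (e.g., handling pager alerts).
* **No enduring value**: Tasks that leave the service in the same state as before, with no permanent improvement.
* **Scales linearly**: Effort grows in proportion to service size, traffic, or user count.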
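
The characteristics above can be treated as a rough checklist: the more of them a task matches, the more likely it is toil. A minimal sketch of that idea follows. The `TaskTraits` class, the `toil_score` function, and the two sample tasks are hypothetical names made up for illustration, and a plain count of matching traits is an assumption, not an established metric.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TaskTraits:
    """Hypothetical checklist: one flag per characteristic of toil."""
    manual: bool
    repetitive: bool
    automatable: bool
    tactical: bool
    no_enduring_value: bool
    scales_linearly: bool


def toil_score(traits: TaskTraits) -> int:
    """Count how many of the six characteristics a task exhibits (0 to 6)."""
    return sum(getattr(traits, f.name) for f in fields(traits))


# Illustrative tasks; the flag values are assumptions, not measurements.
restart_stuck_job = TaskTraits(
    manual=True, repetitive=True, automatable=True,
    tactical=True, no_enduring_value=True, scales_linearly=True,
)
design_review = TaskTraits(
    manual=True, repetitive=False, automatable=False,
    tactical=False, no_enduring_value=False, scales_linearly=False,
)

if __name__ == "__main__":
    print(toil_score(restart_stuck_job))  # matches all six traits
    print(toil_score(design_review))      # matches only "manual"
```

A high score marks a good automation candidate; a low score suggests the task is closer to engineering work or overhead.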

## Why Reducing Toil Matters

* SRE teams aim to keep toil below **50% of each engineer's time**, so that at least half goes to long-term engineering projects.
* Excessive toil leads to:
  * **Burnout** and **low morale**.
  * **Stagnation** in career growth.
  * **Slower progress** and productivity loss.
  * **Confusion** about SRE’s role as an engineering organization.
  * Risk of **attrition** among top engineers.

## Calculating Toil

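* **On-call shifts** set a floor on toil of roughly 25%-33% of an SRE's time, depending on rotation size (about 33% in a six-person rotation, 25% in an eight-person one).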
* Interrupts, urgent responses, and manual processes (e.g., releases) contribute significantly to toil.
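
The 25%-33% range can be reproduced with simple arithmetic. The sketch below assumes that each engineer spends one week as primary and one week as secondary on-call per cycle, and that the cycle length in weeks equals the number of people in the rotation. The function name `on_call_toil_floor` and its parameters are hypothetical.

```python
from fractions import Fraction


def on_call_toil_floor(rotation_size: int, on_call_weeks_per_cycle: int = 2) -> Fraction:
    """Lower bound on the share of time spent on on-call duty.

    Assumes one cycle lasts `rotation_size` weeks and each engineer is
    on call (primary or secondary) for `on_call_weeks_per_cycle` of them.
    """
    if rotation_size <= 0:
        raise ValueError("rotation_size must be positive")
    if not 0 <= on_call_weeks_per_cycle <= rotation_size:
        raise ValueError("on_call_weeks_per_cycle must be between 0 and rotation_size")
    return Fraction(on_call_weeks_per_cycle, rotation_size)


if __name__ == "__main__":
    for size in (6, 8):
        floor = on_call_toil_floor(size)
        print(f"{size}-person rotation: {floor} = {float(floor):.0%}")
```

Under these assumptions a six-person rotation gives 2/6 (about 33%) and an eight-person rotation gives 2/8 (25%), so larger rotations lower the floor.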

## Engineering vs. Toil

* **Engineering work**: Novel, strategic, and produces lasting value.
  * Includes coding, creating automation tools, and system configuration.
* **Overhead**: Administrative work not tied directly to running a service, such as HR paperwork or team meetings. It is neither toil nor engineering.
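
One way to check a team against the 50% target is to tag logged hours with these categories and compute the toil share. The sketch below is illustrative only: the category labels, the `toil_fraction` helper, and the sample week are assumptions, and it measures toil against all logged time, overhead included.

```python
from collections import defaultdict

# Hypothetical category labels matching the split described above.
CATEGORIES = {"engineering", "toil", "overhead"}
TOIL_TARGET = 0.50  # toil should stay below this share of total time


def toil_fraction(entries):
    """Return toil hours divided by all logged hours.

    `entries` is an iterable of (category, hours) pairs.
    """
    totals = defaultdict(float)
    for category, hours in entries:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        totals[category] += hours
    total = sum(totals.values())
    if total == 0:
        raise ValueError("no hours logged")
    return totals["toil"] / total


# Made-up sample week for one engineer (hours are illustrative).
week = [("engineering", 16), ("toil", 18), ("overhead", 6)]

if __name__ == "__main__":
    share = toil_fraction(week)
    status = "within target" if share < TOIL_TARGET else "over target"
    print(f"toil share: {share:.0%} ({status})")
```

Tracking this share over several weeks, not a single one, gives a fairer picture, since on-call weeks will naturally spike.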

## Toil's Impact on Teams

* Toil is not always bad; small amounts can be calming and provide quick wins.
* However, too much toil leads to inefficiency, slower feature delivery, and lower morale.

## Conclusion

* Reducing toil through automation and engineering helps scale services more efficiently and enables SREs to focus on high-value, strategic work.
