Service Level Objectives are one of those topics where the difference between textbook knowledge and operational knowledge is enormous. The textbook says “set a target, measure against it.” The reality is: SLOs are the single tool we’ve found that lets engineering and product teams have a non-emotional conversation about “ship faster” vs “stabilize.”
Done right, an SLO is a number you can defend with data, and an error budget you can spend. Done wrong, it’s a vanity dashboard that nobody acts on.

The three concepts (in plain English)
SLI — what you measure
A Service Level Indicator is the raw signal. “p99 latency on the checkout endpoint.” “Percentage of API requests that return 5xx.” “Time from page request to interactive.” Concrete, observable, comparable over time.
SLO — the target for that signal
The objective is the line you draw on the SLI graph: “p99 latency under 300ms, 99.5% of the time.” The percentage is a service-quality commitment you’re making to your users.
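To make “99.5% of the time” concrete, here is a minimal sketch. It assumes you already have one p99 latency reading per minute and that “of the time” means “fraction of one-minute windows”; both are our assumptions for illustration, and the function names are hypothetical.

```python
from typing import Sequence


def slo_compliance(per_minute_p99_ms: Sequence[float], threshold_ms: float = 300.0) -> float:
    """Fraction of one-minute windows whose p99 latency was under the threshold."""
    if not per_minute_p99_ms:
        raise ValueError("need at least one measurement window")
    good_windows = sum(1 for value in per_minute_p99_ms if value < threshold_ms)
    return good_windows / len(per_minute_p99_ms)


def meets_slo(
    per_minute_p99_ms: Sequence[float],
    threshold_ms: float = 300.0,
    target: float = 0.995,
) -> bool:
    """True when the share of good windows reaches the SLO target."""
    return slo_compliance(per_minute_p99_ms, threshold_ms) >= target


# Illustrative data only: 996 good minutes and 4 slow minutes out of 1,000.
sample = [250.0] * 996 + [400.0] * 4
print(slo_compliance(sample), meets_slo(sample))
```

The SLI is the per-minute reading, the 300ms line is the threshold, and the 99.5% is the objective. Keeping those three apart in code tends to keep them apart in conversation too.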
Error budget — the inverse
If your SLO is 99.9% success, your error budget is 0.1%. A 30-day window has 43,200 minutes, so that’s about 43 minutes of downtime (or the equivalent in failed requests). The budget is what you’re ALLOWED to spend, and the entire point is that you spend it deliberately, not accidentally.
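The arithmetic is worth doing once by hand and then never again. A small sketch, where the function name and the default 30-day window are our own choices for illustration:

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of full downtime an availability SLO allows over the window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    window_minutes = window_days * 24 * 60  # 30 days -> 43,200 minutes
    return (1.0 - slo_target) * window_minutes


# Compare a few common targets over the same 30-day window.
for target in (0.99, 0.999, 0.9999):
    print(f"{target:.2%} -> {error_budget_minutes(target):.1f} minutes")
```

For 99.9% this comes out at roughly 43.2 minutes. Each extra nine cuts the budget by a factor of ten, which is why the jump from 99.9% to 99.99% is a much bigger commitment than it looks on a slide.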
The conversation that SLOs unlock
Without an SLO, the “ship faster” vs “stabilize” conversation is a values argument. Product wants velocity; SRE wants reliability; nobody has a way to be objectively right.
With an SLO, the conversation is mechanical. Budget healthy? Ship faster — take more risks, roll out experiments aggressively. Budget depleted? Freeze risky work, pay down reliability debt. The data drives the decision; the team has cover for either path.
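“Mechanical” can be taken literally. The sketch below is one possible way to encode the rule, assuming the budget is counted in failed requests; the class, field, and function names are hypothetical, and a real policy would likely add nuance (for example, a warning band before the budget hits zero).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetStatus:
    allowed_failures: int   # error budget for the window, in failed requests
    observed_failures: int  # failures actually seen so far in the window

    @property
    def remaining_fraction(self) -> float:
        """Share of the budget still unspent; negative once it is overspent."""
        if self.allowed_failures <= 0:
            return 0.0
        return 1.0 - self.observed_failures / self.allowed_failures


def release_posture(status: BudgetStatus) -> str:
    """Budget left means ship faster; budget gone means stabilize."""
    return "ship faster" if status.remaining_fraction > 0 else "stabilize"


print(release_posture(BudgetStatus(allowed_failures=1000, observed_failures=200)))
print(release_posture(BudgetStatus(allowed_failures=1000, observed_failures=1200)))
```

The value is not in the ten lines of code. It is that the rule was agreed on before the incident, so nobody has to argue about it during one.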
Setting your first SLO
- Pick the user journey, not the system. “Can the user complete checkout?” is a journey. “Is the database up?” is a system. SLOs are about user-facing outcomes.
- Pick TWO indicators for that journey. Usually a latency SLI and a success-rate SLI. One alone is gameable; two together describe quality properly.
- Set the SLO slightly looser than your current performance. If your current p99 latency is 280ms, set 300ms. Start with targets you’re already meeting; tighten over time as you improve.
- Review monthly. Look at burn rate, missed minutes, and what consumed the budget. Adjust the target if reality has changed.
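The steps above can be sketched in a few lines. This assumes you can export request records for the journey with a latency and an HTTP status, that 5xx responses count as failures, and that a 5% buffer rounded up to the next 10ms is a reasonable starting point; all of those are assumptions for illustration, as are the names used here.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Request:
    latency_ms: float
    status: int  # HTTP status code


def p99_latency_ms(requests: Sequence[Request]) -> float:
    """Latency SLI: nearest-rank 99th percentile of request latency."""
    if not requests:
        raise ValueError("need at least one request")
    latencies = sorted(r.latency_ms for r in requests)
    rank = (99 * len(latencies) + 99) // 100  # ceil(0.99 * n) in integer math
    return latencies[rank - 1]


def success_rate(requests: Sequence[Request]) -> float:
    """Success-rate SLI: share of requests that did not return a 5xx."""
    if not requests:
        raise ValueError("need at least one request")
    ok = sum(1 for r in requests if r.status < 500)
    return ok / len(requests)


def initial_latency_target_ms(
    current_p99_ms: float, buffer_fraction: float = 0.05, round_to_ms: int = 10
) -> int:
    """Current performance plus a buffer, rounded up to a tidy number."""
    padded = current_p99_ms * (1.0 + buffer_fraction)
    return math.ceil(padded / round_to_ms) * round_to_ms


print(initial_latency_target_ms(280.0))  # 300, matching the example above
```

Computing both SLIs from the same set of records is deliberate: it keeps the latency and success-rate numbers describing the same journey over the same period.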
Mistakes that turn SLOs into vanity metrics
- Setting SLOs you can’t enforce. “99.99% uptime” looks great on a vendor pitch but means nothing if no one stops deploys when the budget is gone.
- Measuring infrastructure, not user experience. Database CPU is not an SLI; checkout success is.
- Setting one SLO for the whole product. Different surfaces have different reliability needs. Marketing pages can tolerate worse latency than the checkout endpoint.
- Not reviewing budget burn weekly. The budget is most useful as a leading indicator. If you’re only looking at it once a quarter, you’re looking too late.
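Burn rate is what makes the budget a leading indicator. A common definition is the observed error rate divided by the error rate the SLO allows, so a burn rate of 1 spends the budget exactly over the window. The sketch below assumes that definition and a constant burn from the start of a 30-day window; the function names are ours.

```python
import math


def burn_rate(failed: int, total: int, slo_target: float) -> float:
    """Observed error rate divided by the error rate the SLO allows."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    if total == 0:
        return 0.0  # no traffic, nothing burned
    observed_error_rate = failed / total
    allowed_error_rate = 1.0 - slo_target
    return observed_error_rate / allowed_error_rate


def days_to_exhaustion(rate: float, window_days: int = 30) -> float:
    """Days a full budget lasts if this burn rate holds from the window start."""
    if rate <= 0:
        return math.inf
    return window_days / rate


# Illustrative numbers: 20 failures in 10,000 requests against a 99.9% SLO.
rate = burn_rate(failed=20, total=10_000, slo_target=0.999)
print(f"burn rate {rate:.1f}, budget lasts about {days_to_exhaustion(rate):.0f} days")
```

With those numbers the burn rate is about 2, which means a 30-day budget is gone in about 15 days. A weekly look catches that in time; a quarterly one does not.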
How we approach this
For products under our Ongoing Maintenance engagement, we publish SLO dashboards as part of the operating cadence. The monthly review is grounded in budget burn, not gut feel — which is what makes the product/engineering conversation tractable.
Takeaways
- SLI is what you measure; SLO is your target; error budget is what you spend.
- Pick user-journey SLIs, not infrastructure metrics.
- Start at current performance + a buffer. Tighten over time.
- Budget healthy = ship faster. Budget depleted = stabilize. The data picks.







