-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support simplified syntax #160
Comments
I like the idea behind this, but I'm not 100% convinced that we can do this without dropping the flexibility that we get from OpenSLO vs the other formats you mention. We love Sloth and are using it in production with our clients, and the lack of support for certain parts of OpenSLO has indeed meant that we have gone with the Sloth native format. I'm wondering if there's a compromise to be had here by retaining the apiVersion: openslo/v1
kind: SLO
metadata:
name: shorter-slo
spec:
indicator:
spec:
ratioMetric:
good:
query: results{code="good"}
dataSource: prometheus # Optional Field
total:
query: results{}
dataSource: prometheus # Optional Field
budgetingMethod: Timeslices
targetPercent: 99.9
timeSliceTarget: 0.9
timeSliceWindow: 5m I only suggest this because I have a feeling there may be times where our larger customers may want an SLI that pulls the "good" events from one source and the "total" events from another. Granted, we don't actually have this use-case at the moment, but we do have systems where "success/failure" is logged to a file on disk and yet details of the actual execution of the task are stored in a SQL database. In this event, we could do something like: apiVersion: openslo/v1
kind: SLO
metadata:
name: shorter-slo
spec:
indicator:
spec:
ratioMetric:
good:
query: SELECT COUNT(id) FROM logs WHERE success == "1" AND system_id == "our_platform";
dataSource: SQLDB
total:
query: count_over_time({system_id="our_platform"}[1h])
dataSource: loki
budgetingMethod: Timeslices
targetPercent: 99.9
timeSliceTarget: 0.9
timeSliceWindow: 5m Sloth, Pyrra, and other tools could then just ignore the |
Indents in both your yaml examples are inconsistent, it makes it hard to follow. I like your suggestion, but at the same time the DataSource/metricSource bits of the spec are sufficiently unclear that I'm not sure how they're supposed to be used and therefore how to make changes compatible with them. |
I've added a separate short issue that implements a version of what you've proposed in #162 |
Odd, it was a cut/paste, I'll try and fix it shortly.
I've just seen your comments on #162 - I'm not opposed to that approach by any means, definitely feels like a discrepancy there, now I'm wondering whether we need #162 or whether it can just be pulled in to this issue? |
Both Sloth and Pyrra support only Prometheus which makes things much easier. OpenSLO aims to be flexible enough to support every data source. As described in #107, different data sources have different idiosyncracies in defining how to pull data from them. I think we're not able to come up with a spec that will be universal, so we made a decision to leave it to implementation. Being able to reference arbitrarily chosen data sources in different queries as a part of one SLO is needed to be able to implement powerful composite SLOs as described in #142. I favor simplicity, but I would rather avoid hurting flexibility because we want others to adopt OpenSLO and they may have different needs and we used to try to address them. I think it's very good to have a broader discussion and improve the current state, but we have to be cautious |
One approach, which @proffalken and I discussed at Monitorama, to help with simplification is a means of defining an SLO with a cucumber like syntax. All the details needed for an OpenSLO compliant document are then generated from the syntax. Granted, it would be very limiting as it would not be possible to have a detailed cucumber like syntax without it also becoming overly complex, but it could be used for the "simple" SLO cases to reduce what needs to be defined. |
It's a really interesting idea @kenfinnigan & @proffalken, we need to discuss this further. I've seen #146. It should help with SLO adoption and cross-department understanding in bigger organizations that may not be technological companies. It may give them simple guidelines/templates on how to define SLOs for new services, user journeys, etc. without bothering about every detail |
@programmer04 it was inspired by Sophia's talk at Monitorama. I still think a good "first step" is to move the datasource into the metric so tools like pyrra/sloth can just ignore it whilst others make use of it, but my bigger concern here is that in the pursuit of something more simplistic the wider spec and process of getting an easily adopted toolchain is neglected. I'm in favour of this PR (And of simplification in general), but I'd rather see the community focus on Olso / SloGen and getting them working with a larger number of platforms as a priority so we have an ecosystem that can easily be deployed to customers. |
I'm about to head out for a week of PTO, so can't really catch up on the discussions here (though I'll do that before the next call), but it just occurred to me that |
Per discussion of this issue during yesterday's community call, I've split all proposals into three separate issues, so that they can be worked on separately, and updated this issue to match. |
Since the 3 issues that you split from this one are closed @mmazur I'm closing this main issue :) |
Problem to solve
For the purpose of defining a simple SLO both pyrra's and sloth's syntax is much clearer and less verbose. While we can't get to the exact same level of simplicity due to OpenSLO having to support more use cases, we should at least try to get closer than we currently are.
As a rule, OpenSLO should aim for, whenever possible, adding new features in ways that do not add additional complexity to the simpler use cases.
Proposal
With the merging of #155 an SLO target can be defined as
target
using a float 0…1 or atargetPercent
using the more human-friendly and standard-in-slo-space 0…100. The reason the spec contains both options is to both provide backwards compatibility (target
) and be more user friendly and match typical user expectations (targetPercent
).I propose we take the same approach with other aspects of the spec. I'd like this to be valid OpenSLO alongside what's currently in the spec:
For comparison the
spec:
portion of this SLO definition is 12 lines in the example above and 19 lines in the spec as it currently is.Further details
Three changes are proposed, which, per discussion during August's community call, were split into three separate issues:
.objectives[].
#168.timeWindow[]
list #169Links / references
Original contents1. Make.objectives[].
optionalIt's my guess that the reason.objectives[].
exist in the way it does was that the initial OpenSLO spec was based on nobl9's experiences with their system and multiple objectives per SLO is something that system supports. And it made sense in that context.But the spec evolved beyond that, where nowadays a much more natural definition of multiple SLO targets would be to have a singleSLI
object and then multipleSLO
objects reference it, with eachSLO
having a single target only.Under which scenario.objectives[].
makes no sense and is just superfluous verbosity. Verbosity which would make using OpenSLO syntax with tools like pyrra or sloth an obviously worse choice compared to their native config formats. Not to mention it makes the current format a bit inconsistent – why isbudgetMethod: Timeslices
outside.objectives[].
? It'd make just as much sense if it was in, alongsidetimeSliceTarget
.I also strongly suspect most tools using OpenSLO spec won't consider supporting multiple targets per SLO, and if they do, it'll be using SLIs with multiple SLOs.(If we ever make an OpenSLO 2.0 release, the backwards-incompatible dropping of.objectives[].
would likely be on the list.)2. MakeDataSource
optional (#162)WhenDataSource
was added in PR #111 it was done to increase the flexibility of the spec, unfortunately at the cost of increased verbosity when a user of OpenSLO has no need for that flexibility. With pyrra or sloth the syntax currently would need to be:I'd like to correct that omission by at least partially restoring the original syntax and making it clear that one can useDataSource
if there's a need for that, otherwise it's fine to just usegood.query
like in the olden days. (There's likely more discussion needed on how to do it properly.)The text was updated successfully, but these errors were encountered: