Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support simplified syntax #160

Closed
mmazur opened this issue Jul 27, 2022 · 11 comments
Closed

Support simplified syntax #160

mmazur opened this issue Jul 27, 2022 · 11 comments
Labels
enhancement New feature or request schema Shape of YAMLs

Comments

@mmazur
Copy link
Collaborator

mmazur commented Jul 27, 2022

Problem to solve

For the purpose of defining a simple SLO both pyrra's and sloth's syntax is much clearer and less verbose. While we can't get to the exact same level of simplicity due to OpenSLO having to support more use cases, we should at least try to get closer than we currently are.

As a rule, OpenSLO should aim for, whenever possible, adding new features in ways that do not add additional complexity to the simpler use cases.

Proposal

With the merging of #155 an SLO target can be defined as target using a float 0…1 or a targetPercent using the more human-friendly and standard-in-slo-space 0…100. The reason the spec contains both options is to both provide backwards compatibility (target) and be more user friendly and match typical user expectations (targetPercent).

I propose we take the same approach with other aspects of the spec. I'd like this to be valid OpenSLO alongside what's currently in the spec:

apiVersion: openslo/v1
kind: SLO
metadata:
  name: shorter-slo
spec:
  indicator:
    spec:
      ratioMetric:
        good:
          query: results{code="good"}
        total:
          query: results{}
  window: 28d
  budgetingMethod: Timeslices
  targetPercent: 99.9
  timeSliceTarget: 0.9
  timeSliceWindow: 5m

For comparison the spec: portion of this SLO definition is 12 lines in the example above and 19 lines in the spec as it currently is.

Further details

Three changes are proposed, which, per discussion during August's community call, were split into three separate issues:

  1. Simplified Syntax: support defining a single SLO target without using .objectives[]. #168
  2. Simplified Syntax: make DataSource use optional #162
  3. Simplified Syntax: move away from the single-item .timeWindow[] list #169

Links / references

Original contents

1. Make .objectives[]. optional

It's my guess that the reason .objectives[]. exist in the way it does was that the initial OpenSLO spec was based on nobl9's experiences with their system and multiple objectives per SLO is something that system supports. And it made sense in that context.

But the spec evolved beyond that, where nowadays a much more natural definition of multiple SLO targets would be to have a single SLI object and then multiple SLO objects reference it, with each SLO having a single target only.

Under which scenario .objectives[]. makes no sense and is just superfluous verbosity. Verbosity which would make using OpenSLO syntax with tools like pyrra or sloth an obviously worse choice compared to their native config formats. Not to mention it makes the current format a bit inconsistent – why is budgetMethod: Timeslices outside .objectives[].? It'd make just as much sense if it was in, alongside timeSliceTarget.

I also strongly suspect most tools using OpenSLO spec won't consider supporting multiple targets per SLO, and if they do, it'll be using SLIs with multiple SLOs.

(If we ever make an OpenSLO 2.0 release, the backwards-incompatible dropping of .objectives[]. would likely be on the list.)

2. Make DataSource optional (#162)

When DataSource was added in PR #111 it was done to increase the flexibility of the spec, unfortunately at the cost of increased verbosity when a user of OpenSLO has no need for that flexibility. With pyrra or sloth the syntax currently would need to be:

ratioMetric:
  good:
    StringThatDoesntMatterBecauseItsNotUsedForAnythingYetRequired:
      spec:
        query: results{code="good"}

I'd like to correct that omission by at least partially restoring the original syntax and making it clear that one can use DataSource if there's a need for that, otherwise it's fine to just use good.query like in the olden days. (There's likely more discussion needed on how to do it properly.)

@mmazur mmazur added the enhancement New feature or request label Jul 27, 2022
@proffalken
Copy link
Collaborator

proffalken commented Jul 28, 2022

I like the idea behind this, but I'm not 100% convinced that we can do this without dropping the flexibility that we get from OpenSLO vs the other formats you mention.

We love Sloth and are using it in production with our clients, and the lack of support for certain parts of OpenSLO has indeed meant that we have gone with the Sloth native format.

I'm wondering if there's a compromise to be had here by retaining the dataSource but moving it down a level and making it optional so we get something like this:

apiVersion: openslo/v1
kind: SLO
metadata:
  name: shorter-slo
spec:
  indicator:
    spec:
      ratioMetric:
        good:
            query: results{code="good"}
            dataSource: prometheus  # Optional Field 
        total:
            query: results{}
            dataSource: prometheus  # Optional Field
  budgetingMethod: Timeslices
  targetPercent: 99.9
  timeSliceTarget: 0.9
  timeSliceWindow: 5m

I only suggest this because I have a feeling there may be times where our larger customers may want an SLI that pulls the "good" events from one source and the "total" events from another. Granted, we don't actually have this use-case at the moment, but we do have systems where "success/failure" is logged to a file on disk and yet details of the actual execution of the task are stored in a SQL database.

In this event, we could do something like:

apiVersion: openslo/v1
kind: SLO
metadata:
  name: shorter-slo
spec:
  indicator:
    spec:
      ratioMetric:
        good:
            query: SELECT COUNT(id) FROM logs WHERE success == "1" AND system_id == "our_platform";
            dataSource: SQLDB 
        total:
            query: count_over_time({system_id="our_platform"}[1h])
            dataSource: loki
  budgetingMethod: Timeslices
  targetPercent: 99.9
  timeSliceTarget: 0.9
  timeSliceWindow: 5m

Sloth, Pyrra, and other tools could then just ignore the dataSource for each query because they are focused on a single data storage layer, whereas tools such as Oslo or SloGen would observe the dataSource field for each query and act accordingly?

@mmazur
Copy link
Collaborator Author

mmazur commented Jul 29, 2022

Indents in both your yaml examples are inconsistent, it makes it hard to follow.

I like your suggestion, but at the same time the DataSource/metricSource bits of the spec are sufficiently unclear that I'm not sure how they're supposed to be used and therefore how to make changes compatible with them.

@mmazur
Copy link
Collaborator Author

mmazur commented Jul 29, 2022

I've added a separate short issue that implements a version of what you've proposed in #162

@proffalken
Copy link
Collaborator

proffalken commented Jul 29, 2022

Indents in both your yaml examples are inconsistent, it makes it hard to follow.

Odd, it was a cut/paste, I'll try and fix it shortly.

I like your suggestion, but at the same time the DataSource/metricSource bits of the spec are sufficiently unclear that I'm not sure how they're supposed to be used and therefore how to make changes compatible with them.

I've just seen your comments on #162 - I'm not opposed to that approach by any means, definitely feels like a discrepancy there, now I'm wondering whether we need #162 or whether it can just be pulled in to this issue?

@programmer04
Copy link
Member

Both Sloth and Pyrra support only Prometheus which makes things much easier. OpenSLO aims to be flexible enough to support every data source. As described in #107, different data sources have different idiosyncracies in defining how to pull data from them. I think we're not able to come up with a spec that will be universal, so we made a decision to leave it to implementation. Being able to reference arbitrarily chosen data sources in different queries as a part of one SLO is needed to be able to implement powerful composite SLOs as described in #142.

I favor simplicity, but I would rather avoid hurting flexibility because we want others to adopt OpenSLO and they may have different needs and we used to try to address them. I think it's very good to have a broader discussion and improve the current state, but we have to be cautious

@kenfinnigan
Copy link
Collaborator

One approach, which @proffalken and I discussed at Monitorama, to help with simplification is a means of defining an SLO with a cucumber like syntax. All the details needed for an OpenSLO compliant document are then generated from the syntax.

Granted, it would be very limiting as it would not be possible to have a detailed cucumber like syntax without it also becoming overly complex, but it could be used for the "simple" SLO cases to reduce what needs to be defined.

@programmer04
Copy link
Member

It's a really interesting idea @kenfinnigan & @proffalken, we need to discuss this further. I've seen #146. It should help with SLO adoption and cross-department understanding in bigger organizations that may not be technological companies. It may give them simple guidelines/templates on how to define SLOs for new services, user journeys, etc. without bothering about every detail

@proffalken
Copy link
Collaborator

@programmer04 it was inspired by Sophia's talk at Monitorama.

I still think a good "first step" is to move the datasource into the metric so tools like pyrra/sloth can just ignore it whilst others make use of it, but my bigger concern here is that in the pursuit of something more simplistic the wider spec and process of getting an easily adopted toolchain is neglected.

I'm in favour of this PR (And of simplification in general), but I'd rather see the community focus on Olso / SloGen and getting them working with a larger number of platforms as a priority so we have an ecosystem that can easily be deployed to customers.

@mmazur
Copy link
Collaborator Author

mmazur commented Aug 4, 2022

I'm about to head out for a week of PTO, so can't really catch up on the discussions here (though I'll do that before the next call), but it just occurred to me that timeWindow also suffers from the oddity of being a single object list. The really concise version could be just spec.window: 28d.

@mmazur
Copy link
Collaborator Author

mmazur commented Aug 19, 2022

Per discussion of this issue during yesterday's community call, I've split all proposals into three separate issues, so that they can be worked on separately, and updated this issue to match.

@fpiwowarczyk fpiwowarczyk added the schema Shape of YAMLs label Dec 9, 2022
@nieomylnieja
Copy link
Member

Since the 3 issues that you split from this one are closed @mmazur I'm closing this main issue :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request schema Shape of YAMLs
Projects
None yet
Development

No branches or pull requests

6 participants