-
Notifications
You must be signed in to change notification settings - Fork 810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don"t try to compact blocks marked for deletion. #4328
Don"t try to compact blocks marked for deletion. #4328
Conversation
@@ -599,11 +599,11 @@ func (c *Compactor) compactUser(ctx context.Context, userID string) error { | |||
deduplicateBlocksFilter := block.NewDeduplicateFilter() | |||
|
|||
// While fetching blocks, we filter out blocks that were marked for deletion by using IgnoreDeletionMarkFilter. | |||
// The delay of deleteDelay/2 is added to ensure we fetch blocks that are meant to be deleted but do not have a replacement yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uhm... this comment was actually explaining the reason why we did it. It was introduced here:
https://github.com/cortexproject/cortex/pull/2335/files#diff-adc81f6ffba52bc1871b0dac7c88237aa1cdea40e7691e35e834b40129ab723f
It was imported by Thanos (where the logic is still there):
https://github.com/thanos-io/thanos/blob/e53a1f70f947fe83d4d232c9e2887911d390d03c/cmd/thanos/compact.go#L224-L227
I think we should think a bit more about this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only result of using the delay here is that compactor will get to see blocks that are already marked for deletion. It will then use such blocks for grouping and planning.
If such block marked for deletion (let"s call it "B") is already "included" in some other block (as determined by compactor sources in meta.json), it will be ignored anyway.
If "B" is not yet included in any other block [*], and can be compacted together with other blocks, it will be. I think that is a mistake, because it would result in deleted block to be included in a new block, effectively resurrecting it.
There may be other blocks that are "source" blocks for our block "B". Presumably, those other blocks are also marked for deletion already, since "B" contains their data now. If compactor doesn"t see blocks marked for deletion, it will not see those source blocks either.
If those source blocks are not marked for deletion yet (but "B" is) and compactor doesn"t see "B" but sees source blocks, it would create new compacted block (basically "new B") from source blocks. This would be unfortunate, but this is a scenario which I don"t see how it could happen.
[*] Either because of scenario I"ve described with backfill, or very inconsistent storage, which makes new blocks invisible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After more thinking, I agree with you. I also tried to look at the Thanos history to see if I could find more information about why it was done like that in the compactor and looks it was picked by the Thanos bucket store, but I can"t see how the two are correlated here.
Not fixing this bug makes backfilling somewhat complicated. In our latest backfilling project, we start to push latest samples to target cluster sometime during day X, but then want to backfill this day X from scratch to have full 24h of data. To do that we need to delete (using deletion marker) half-empty block for day X (that resulted from push samples), and run backfill process. However we need to be careful, as compactor would still compact half-empty block and backfilled block together and they may not contain the same samples. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the follow up discussion! LGTM
@@ -599,11 +599,11 @@ func (c *Compactor) compactUser(ctx context.Context, userID string) error { | |||
deduplicateBlocksFilter := block.NewDeduplicateFilter() | |||
|
|||
// While fetching blocks, we filter out blocks that were marked for deletion by using IgnoreDeletionMarkFilter. | |||
// The delay of deleteDelay/2 is added to ensure we fetch blocks that are meant to be deleted but do not have a replacement yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After more thinking, I agree with you. I also tried to look at the Thanos history to see if I could find more information about why it was done like that in the compactor and looks it was picked by the Thanos bucket store, but I can"t see how the two are correlated here.
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
197fc91
to
58621b2
Compare
* Don"t try to compact blocks marked for deletion. * Update CHANGELOG.md Signed-off-by: Peter Štibraný <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
What this PR does: Before this PR, compactor would consider blocks that has been marked for deletion recently (within
-compactor.deletion-delay / 2
period) as eligible for compaction. This PR changes that, and makes compactor completely ignore blocks marked for deletion.Under normal Cortex run this is not a problem, because blocks only get marked for deletion after they are compacted to larger blocks (or when retention period hits), and compactor would also find this "larger" block and avoid compacting the smaller ones.
However there is a scenario, where this original behavior of compactor turned out problematic. Let"s say we have a block "A" for some day. After manually marking block "A" for deletion, and then running a backfill for the very same day, new block "B" is created for this day. Compactor would then find both "A" (recently marked for deletion) and "B" (newly backfilled block), and try to merge them together. That"s not desirable in this case.
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]