Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix DateHistogram bucket gap #2183

Merged
merged 2 commits into from
Sep 21, 2023
Merged

Fix DateHistogram bucket gap #2183

merged 2 commits into from
Sep 21, 2023

Conversation

PSeitz
Copy link
Contributor

@PSeitz PSeitz commented Sep 21, 2023

Fixes a computation issue of the number of buckets needed in the DateHistogram.

This is due to a missing normalization from request values (ms) to fast field
values (ns), when converting an intermediate result to the final result.
This results in a wrong computation by a factor 1_000_000.
The Histogram normalizes values to nanoseconds, to make the user input like
extended_bounds (ms precision) and the values from the fast field (ns precision for date type) compatible.
This normalization happens only for date type fields, as other field types don't have precision settings.
The normalization does not happen due a missing column_type, which is not
correctly passed after merging an empty aggregation (which does not have a column_type set), with a regular aggregation.

Another related issue is an empty aggregation, which will not have
column_type set, will not convert the result to human readable format.

This PR fixes the issue by:

  • Limit the allowed field types of DateHistogram to DateType
  • Instead of passing the column_type, which is only available on the segment level, we flag the aggregation as is_date_agg.
  • Fix the merge logic

Add a flag to do normalization only once. This is not an issue currently, but it could become easily one.

closes quickwit-oss/quickwit#3837

Fixes a computation issue of the number of buckets needed in the
DateHistogram.

This is due to a missing normalization from request values (ms) to fast field
values (ns), when converting an intermediate result to the final result.
This results in a wrong computation by a factor 1_000_000.
The Histogram normalizes values to nanoseconds, to make the user input like
extended_bounds (ms precision) and the values from the fast field (ns precision for date type) compatible.
This normalization happens only for date type fields, as other field types don't have precision settings.
The normalization does not happen due a missing `column_type`, which is not
correctly passed after merging an empty aggregation (which does not have a `column_type` set), with a regular aggregation.

Another related issue is an empty aggregation, which will not have
`column_type` set, will not convert the result to human readable format.

This PR fixes the issue by:
- Limit the allowed field types of DateHistogram to DateType
- Instead of passing the column_type, which is only available on the segment level, we flag the aggregation as `is_date_agg`.
- Fix the merge logic

Add a flag to to normalization only once. This is not an issue
currently, but it could become easily one.

closes quickwit-oss/quickwit#3837
@PSeitz
Copy link
Contributor Author

PSeitz commented Sep 21, 2023

Failing coverage depends on: time-rs/time#624

(or we could switch temporarily to an older nightly version)

@codecov-commenter
Copy link

Codecov Report

Patch coverage: 98.38% and project coverage change: 0.01% 🎉

Comparison is base (2d73903) 94.40% compared to head (1a51c5f) 94.41%.
Report is 3 commits behind head on main.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2183       /-   ##
==========================================
  Coverage   94.40%   94.41%    0.01%     
==========================================
  Files         322      322              
  Lines       63225    63486      261     
==========================================
  Hits        59688    59941      253     
- Misses       3537     3545        8     
Files Changed Coverage Δ
src/query/query_parser/query_parser.rs 92.42% <0.00%> (-0.40%) ⬇️
query-grammar/src/query_grammar.rs 98.02% <100.00%> ( 0.06%) ⬆️
query-grammar/src/user_input_ast.rs 98.48% <100.00%> ( 0.05%) ⬆️
src/aggregation/agg_limits.rs 100.00% <100.00%> (ø)
src/aggregation/agg_req_with_accessor.rs 98.25% <100.00%> ( 0.05%) ⬆️
src/aggregation/bucket/histogram/date_histogram.rs 97.25% <100.00%> ( <0.01%) ⬆️
src/aggregation/bucket/histogram/histogram.rs 99.47% <100.00%> ( <0.01%) ⬆️
src/aggregation/bucket/term_agg.rs 99.14% <100.00%> ( 0.03%) ⬆️
src/aggregation/intermediate_agg_result.rs 93.90% <100.00%> ( 0.04%) ⬆️

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@PSeitz PSeitz requested a review from fulmicoton September 21, 2023 05:57
@PSeitz PSeitz merged commit 34920d3 into main Sep 21, 2023
5 checks passed
@PSeitz PSeitz deleted the check_bug branch September 21, 2023 08:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Aborting aggregation because memory limit was exceeded
3 participants