Fix DateHistogram bucket gap #2183

PSeitz · 2023-09-21T03:30:23Z

Fixes a computation issue with the number of buckets needed in the DateHistogram.

This is caused by a missing normalization from request values (ms) to fast field
values (ns) when converting an intermediate result to the final result.
As a result, the bucket count is off by a factor of 1_000_000.
The Histogram normalizes values to nanoseconds to make user input such as
extended_bounds (ms precision) compatible with the values from the fast field (ns precision for the date type).
This normalization happens only for date type fields, as other field types don't have precision settings.
The normalization does not happen because of a missing column_type, which is not
passed on correctly after merging an empty aggregation (which does not have a column_type set) with a regular aggregation.
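
To make the factor concrete, here is a minimal sketch in Rust of a bucket-count computation. The names `NS_PER_MS`, `ms_to_ns`, and `num_buckets` are hypothetical and this is not tantivy's actual code; it only assumes, as described above, that request values are in milliseconds and date fast field values are in nanoseconds. Applying a millisecond interval to nanosecond values inflates the bucket count by 1_000_000:

```rust
// Hypothetical helpers for illustration only; not tantivy's actual code.

/// Date fast field values use nanosecond precision, while request values
/// such as `interval` and `extended_bounds` are given in milliseconds.
const NS_PER_MS: i64 = 1_000_000;

/// Converts a millisecond request value to the nanosecond fast field unit.
fn ms_to_ns(ms: i64) -> i64 {
    ms * NS_PER_MS
}

/// Number of buckets of width `interval` needed to cover `[min, max]`.
/// All three arguments must use the same unit.
fn num_buckets(min: i64, max: i64, interval: i64) -> i64 {
    max.div_euclid(interval) - min.div_euclid(interval) + 1
}

fn main() {
    // A 1-minute interval, as given in the request (ms).
    let interval_ms: i64 = 60_000;
    // Fast field values (ns) covering just under 10 minutes.
    let min_ns: i64 = 0;
    let max_ns: i64 = ms_to_ns(10 * 60_000) - 1;

    // Normalized: interval and values share the same unit.
    let correct = num_buckets(min_ns, max_ns, ms_to_ns(interval_ms));
    // Missing normalization: a ms interval applied to ns values.
    let buggy = num_buckets(min_ns, max_ns, interval_ms);

    // prints "normalized: 10 buckets, unnormalized: 10000000 buckets"
    println!("normalized: {correct} buckets, unnormalized: {buggy} buckets");
}
```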

A related issue is that an empty aggregation, which does not have
column_type set, does not convert the result to a human-readable format.
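
A sketch of the human-readable conversion, assuming the final bucket key is a millisecond UTC timestamp rendered as an RFC 3339 `key_as_string`. `BucketEntry`, `format_ms_rfc3339`, and `to_final_bucket` are hypothetical names, and the date arithmetic uses Howard Hinnant's dependency-free `civil_from_days` algorithm instead of a date library such as the `time` crate. The point it illustrates is that formatting is gated on a flag known from the request, so a result coming from an empty aggregation is formatted as well:

```rust
// Hypothetical sketch; tantivy's real result types and formatting differ.

/// Simplified final bucket: `key` is a UTC timestamp in milliseconds.
#[derive(Debug)]
struct BucketEntry {
    key: f64,
    key_as_string: Option<String>,
    doc_count: u64,
}

/// Converts days since 1970-01-01 to (year, month, day) in the proleptic
/// Gregorian calendar (Howard Hinnant's `civil_from_days`).
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468;
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097;
    let doe = (z - era * 146_097) as u64; // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, [0, 365]
    let mp = (5 * doy + 2) / 153; // month counted from March, [0, 11]
    let d = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let m = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let y = yoe as i64 + era * 400 + i64::from(m <= 2);
    (y, m, d)
}

/// Formats a millisecond UTC timestamp as RFC 3339, e.g. `2023-09-21T00:00:00Z`.
fn format_ms_rfc3339(ms: i64) -> String {
    let secs = ms.div_euclid(1_000);
    let (y, m, d) = civil_from_days(secs.div_euclid(86_400));
    let sod = secs.rem_euclid(86_400); // seconds since midnight
    let (hh, mm, ss) = (sod / 3_600, sod % 3_600 / 60, sod % 60);
    format!("{y:04}-{m:02}-{d:02}T{hh:02}:{mm:02}:{ss:02}Z")
}

/// Builds the final bucket. Formatting depends on a request-level flag,
/// not on a segment-level column type that an empty result lacks.
fn to_final_bucket(key_ms: f64, doc_count: u64, is_date_agg: bool) -> BucketEntry {
    let key_as_string = is_date_agg.then(|| format_ms_rfc3339(key_ms as i64));
    BucketEntry { key: key_ms, key_as_string, doc_count }
}

fn main() {
    let bucket = to_final_bucket(1_695_254_400_000.0, 3, true);
    println!("{bucket:?}");
}
```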

This PR fixes the issue by:

- Limiting the allowed field types of DateHistogram to DateType
- Flagging the aggregation as `is_date_agg`, instead of passing the column_type, which is only available on the segment level
- Fixing the merge logic
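
The merge problem and the field-type restriction can be sketched as follows. `ColumnType`, `IntermediateHistogram`, `merge_buggy`, `merge`, and `validate_date_histogram_field` are simplified, hypothetical stand-ins, not tantivy's types. The buggy merge keeps the left-hand side's segment-level column type, so merging an empty result first loses the date type; the fixed merge carries the request-level `is_date_agg` flag through (and, as an assumption of this sketch only, keeps whichever column type is known):

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified types for illustration; not tantivy's actual API.

/// Stand-in for a fast field column type.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ColumnType {
    I64,
    DateTime,
}

/// Rejects non-date fields for a DateHistogram.
fn validate_date_histogram_field(column_type: ColumnType) -> Result<(), String> {
    match column_type {
        ColumnType::DateTime => Ok(()),
        other => Err(format!("date_histogram requires a date field, got {other:?}")),
    }
}

/// Stand-in for an intermediate histogram result.
#[derive(Clone, Debug, PartialEq)]
struct IntermediateHistogram {
    /// Only known at the segment level; `None` for an empty result.
    column_type: Option<ColumnType>,
    /// Derived from the request (DateHistogram => true), so it is set
    /// even when a segment produced no buckets.
    is_date_agg: bool,
    /// Bucket key -> doc count.
    buckets: BTreeMap<i64, u64>,
}

impl IntermediateHistogram {
    fn empty(is_date_agg: bool) -> Self {
        Self { column_type: None, is_date_agg, buckets: BTreeMap::new() }
    }

    /// Adds doc counts of `other` into `self`, bucket by bucket.
    fn merge_buckets(&mut self, other: BTreeMap<i64, u64>) {
        for (key, count) in other {
            *self.buckets.entry(key).or_insert(0) += count;
        }
    }

    /// Buggy variant: keeps `self.column_type`, so an empty left-hand
    /// side drops the date type of the right-hand side.
    fn merge_buggy(mut self, other: Self) -> Self {
        self.merge_buckets(other.buckets);
        self
    }

    /// Fixed variant: keeps whichever column type is known and carries the
    /// request-level `is_date_agg` flag through the merge.
    fn merge(mut self, other: Self) -> Self {
        self.column_type = self.column_type.or(other.column_type);
        self.is_date_agg |= other.is_date_agg;
        self.merge_buckets(other.buckets);
        self
    }
}

fn main() {
    let empty = IntermediateHistogram::empty(true);
    let segment = IntermediateHistogram {
        column_type: Some(ColumnType::DateTime),
        is_date_agg: true,
        buckets: BTreeMap::from([(0, 3)]),
    };
    let buggy = empty.clone().merge_buggy(segment.clone());
    let fixed = empty.merge(segment);
    println!("buggy: {:?}, fixed: {:?}", buggy.column_type, fixed.column_type);
    println!("{:?}", validate_date_histogram_field(ColumnType::I64));
}
```

Deriving the flag from the request rather than from segment data means it is available even when no segment contributed any values.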

It also adds a flag so that normalization is done only once. This is not an issue currently, but it could easily become one.
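
A minimal sketch of a normalize-once flag, assuming a request that holds `interval` and `extended_bounds` in milliseconds. `HistogramReq`, `normalize_to_ns`, and `normalized_to_ns` are hypothetical names, not tantivy's API. Without the flag, a second call would scale the values by another 1_000_000:

```rust
// Hypothetical request type; field names are illustrative, not tantivy's API.

const NS_PER_MS: f64 = 1_000_000.0;

#[derive(Clone, Debug, PartialEq)]
struct HistogramReq {
    /// Bucket width: ms as sent by the user, ns after normalization.
    interval: f64,
    /// Optional (min, max) bounds, in the same unit as `interval`.
    extended_bounds: Option<(f64, f64)>,
    /// Set once the values have been converted.
    normalized_to_ns: bool,
}

impl HistogramReq {
    fn new(interval_ms: f64, extended_bounds_ms: Option<(f64, f64)>) -> Self {
        Self { interval: interval_ms, extended_bounds: extended_bounds_ms, normalized_to_ns: false }
    }

    /// Converts ms request values to ns. The flag makes this idempotent:
    /// a repeated call returns early instead of scaling again.
    fn normalize_to_ns(&mut self) {
        if self.normalized_to_ns {
            return;
        }
        self.interval *= NS_PER_MS;
        if let Some(bounds) = self.extended_bounds.as_mut() {
            bounds.0 *= NS_PER_MS;
            bounds.1 *= NS_PER_MS;
        }
        self.normalized_to_ns = true;
    }
}

fn main() {
    let mut req = HistogramReq::new(60_000.0, Some((0.0, 600_000.0)));
    req.normalize_to_ns();
    req.normalize_to_ns(); // no-op thanks to the flag
    println!("{req:?}");
}
```

Making the conversion idempotent lets any code path call it defensively without tracking whether another path already did.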

closes quickwit-oss/quickwit#3837

PSeitz · 2023-09-21T03:52:03Z

Failing coverage depends on: time-rs/time#624

(or we could temporarily switch to an older nightly version)

codecov-commenter · 2023-09-21T05:37:23Z

Codecov Report

Patch coverage: 98.38% and project coverage change: +0.01% 🎉

Comparison is base (2d73903) 94.40% compared to head (1a51c5f) 94.41%.
Report is 3 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2183      +/-   ##
==========================================
+ Coverage   94.40%   94.41%   +0.01%
==========================================
  Files         322      322
  Lines       63225    63486     +261
==========================================
+ Hits        59688    59941     +253
- Misses       3537     3545       +8

Files Changed	Coverage Δ
src/query/query_parser/query_parser.rs	`92.42% <0.00%> (-0.40%)`	⬇️
query-grammar/src/query_grammar.rs	`98.02% <100.00%> (+0.06%)`	⬆️
query-grammar/src/user_input_ast.rs	`98.48% <100.00%> (+0.05%)`	⬆️
src/aggregation/agg_limits.rs	`100.00% <100.00%> (ø)`
src/aggregation/agg_req_with_accessor.rs	`98.25% <100.00%> (+0.05%)`	⬆️
src/aggregation/bucket/histogram/date_histogram.rs	`97.25% <100.00%> (+<0.01%)`	⬆️
src/aggregation/bucket/histogram/histogram.rs	`99.47% <100.00%> (+<0.01%)`	⬆️
src/aggregation/bucket/term_agg.rs	`99.14% <100.00%> (+0.03%)`	⬆️
src/aggregation/intermediate_agg_result.rs	`93.90% <100.00%> (+0.04%)`	⬆️

... and 2 files with indirect coverage changes

use older nightly for time crate (breaks build)

1a51c5f

PSeitz mentioned this pull request Sep 21, 2023

Aggregation: Prune first before checking the bucket limit #2182

Closed

PSeitz requested a review from fulmicoton September 21, 2023 05:57

fulmicoton approved these changes Sep 21, 2023

PSeitz merged commit 34920d3 into main Sep 21, 2023
5 checks passed

PSeitz deleted the check_bug branch September 21, 2023 08:41
