Tags: linbo-lin/gporca
Limit number of bindings per group expression in Orca

In some cases with deeply nested subqueries, an expression can end up being bound many times. The expression is decorrelated, and each newly generated expression is then decorrelated again, over and over. While this does work and will properly decorrelate complex trees, it also results in high optimization times, or OOM in severe cases.

Looking at the memo, we are generating new and unique plans. However, the vast majority of the groups generated are duplicates. I'm not sure if there's a relatively simple way to prune these away in the current way we do decorrelation.

This commit adds a GUC, optimizer_xform_bind_threshold, to limit the number of bindings for each group expression to a certain value. By default, this is set to 0 (unlimited). A reasonable value for this would be 1000 or 10000, but it is query dependent.

Example query affected by this change:

```
create table t (a int) distributed by (a);

explain
select a in (
  select a from t as t1 where a in (
    select a from t as t2 where a in (
      select a from t as t3 where a in (
        select a from t as t4 join t as t5 using(a) group by t4.a
        union
        select a from t as t4 join t as t5 using(a) group by t4.a
        union
        select a from t as t4 join t as t5 using(a) group by t4.a
      )
    )
  )
) from t;
```

Previously this was bound hundreds of thousands of times, with an optimization time of 40s:

```
CXformLeftOuterApply2LeftOuterJoin: 5 calls, 433071 total bindings, 136085 alternatives generated, 34598ms
CXformLeftSemiApplyIn2LeftSemiJoin: 3 calls, 23345 total bindings, 9075 alternatives generated, 2073ms
```
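The idea of capping bindings can be illustrated with a minimal C++ sketch. This is not Orca's implementation: `Binding`, `ApplyWithBindThreshold`, and `CountProcessed` are hypothetical names invented for illustration. The only behavior taken from the commit message is that a threshold of 0 means unlimited.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one binding extracted from a group expression.
struct Binding
{
    int id;
};

// Hypothetical helper: apply a transformation to at most `threshold`
// bindings. A threshold of 0 means "unlimited", mirroring the default
// described for optimizer_xform_bind_threshold.
// Returns the number of bindings that were actually processed.
template <typename Xform>
std::size_t ApplyWithBindThreshold(const std::vector<Binding> &bindings,
                                   std::size_t threshold, Xform xform)
{
    std::size_t processed = 0;
    for (const Binding &b : bindings)
    {
        if (threshold != 0 && processed >= threshold)
        {
            break;  // cap reached: skip the remaining bindings
        }
        xform(b);
        ++processed;
    }
    return processed;
}

// Builds `total` dummy bindings and reports how many a no-op
// transformation gets to see under the given threshold.
std::size_t CountProcessed(std::size_t total, std::size_t threshold)
{
    std::vector<Binding> bindings;
    for (std::size_t i = 0; i < total; ++i)
    {
        bindings.push_back(Binding{static_cast<int>(i)});
    }
    return ApplyWithBindThreshold(bindings, threshold,
                                  [](const Binding &) {});
}
```

The trade-off is the one the commit message implies: a cap bounds optimization time and memory, but alternatives that would only come from bindings past the cap are never generated, which is why a suitable value is query dependent.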
Fix distribution spec used for CTE producer requirement

After optimizing with the "natural" distribution spec of a CTE producer, we try to translate the query's distribution spec to the columns of the producer and do another round of optimization.

This could lead to incorrect results when the query had a distribution spec (including equivalent distribution specs) that came from multiple CTE consumers, with some of these columns being equivalent because of "=" predicates between them.

For example:

    with cte as (select a,b from foo where b<10) select * from cte x1 join cte x2 on x1.a=x2.b

On the query side, columns x1.a and x2.b are equivalent, but we should NOT treat columns a and b of the producer as equivalent.

(backported from GPDB commit 607dd6402021101474887eb664f87fd2184b0bb2)
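The pitfall can be sketched in C++ as follows. This is only an illustration of the reasoning, not the actual fix: `ColumnMap`, `ProducerColumns`, and `SafeToTranslate` are hypothetical names, and columns are modeled as plain strings. The sketch maps a query-side equivalence class back to producer columns and flags the case where it reaches more than one distinct producer column.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical mapping: consumer column name -> producer column name.
using ColumnMap = std::map<std::string, std::string>;

// Collects the distinct producer columns reached by a query-side
// equivalence class of consumer columns.
std::set<std::string> ProducerColumns(
    const std::vector<std::string> &equiv_class,
    const ColumnMap &to_producer)
{
    std::set<std::string> result;
    for (const std::string &col : equiv_class)
    {
        ColumnMap::const_iterator it = to_producer.find(col);
        if (it != to_producer.end())
        {
            result.insert(it->second);
        }
    }
    return result;
}

// Assumption for this sketch: an equivalence class is only carried over
// to the producer when it maps to exactly one producer column. If it maps
// to several, the equivalence holds on the query side only.
bool SafeToTranslate(const std::vector<std::string> &equiv_class,
                     const ColumnMap &to_producer)
{
    return ProducerColumns(equiv_class, to_producer).size() == 1;
}
```

With the mapping x1.a to a and x2.b to b from the example query, the class {x1.a, x2.b} reaches both a and b, so `SafeToTranslate` returns false; the class {x1.a, x2.a} reaches only a and returns true.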
Migrate dockerhub images to gpc (#621)

- Update to pull images from GCP. https://github.com/pivotal/gp-image-baking#supported-images
- Replace docker-image with registry-image.

    find . -type f -exec sed -i -e 's/docker-image/registry-image/g' {} \;
[2X branch]: Migrate dockerhub images to gpc (#622)

- Update to pull images from GCP. https://github.com/pivotal/gp-image-baking#supported-images
- Replace docker-image with registry-image.

    find . -type f -exec sed -i -e 's/docker-image/registry-image/g' {} \;
Add error handling to PexprScalarExactChild()

This function is only called in places where the exact child must exist, and returns null otherwise. However, there have been some cases where we assume the child must exist, but due to another bug the child does not, and we don't handle the null.

This is a defensive change: if we get into this situation, we'll throw an exception and fall back to the planner instead of crashing.
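The defensive pattern described here can be sketched in C++ as below. This is a simplified illustration, not the Orca code: `Expr`, `OpId`, `ScalarExactChildOrNull`, and `ScalarExactChildOrThrow` are hypothetical, the choice of the last child as the scalar child is an assumption of the sketch, and `std::runtime_error` stands in for Orca's own exception mechanism.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical operator ids and expression node for illustration.
enum class OpId { Select, ScalarCmp, ScalarConst };

struct Expr
{
    OpId op;
    std::vector<const Expr *> children;
};

// Nullable lookup: returns the last child if its operator equals
// `expected`, otherwise nullptr. Callers must check the result.
const Expr *ScalarExactChildOrNull(const Expr &e, OpId expected)
{
    if (e.children.empty())
    {
        return nullptr;
    }
    const Expr *child = e.children.back();
    return (child != nullptr && child->op == expected) ? child : nullptr;
}

// Defensive variant: a missing child becomes an exception that a caller
// higher up can catch, instead of a null dereference later on.
const Expr &ScalarExactChildOrThrow(const Expr &e, OpId expected)
{
    const Expr *child = ScalarExactChildOrNull(e, expected);
    if (child == nullptr)
    {
        throw std::runtime_error("expected scalar child is missing");
    }
    return *child;
}

// Demonstrates the failure path: asking for a child that is not there.
bool ThrowsOnMissingChild()
{
    Expr cmp{OpId::ScalarCmp, {}};
    Expr select{OpId::Select, {&cmp}};
    try
    {
        ScalarExactChildOrThrow(select, OpId::ScalarConst);
    }
    catch (const std::runtime_error &)
    {
        return true;
    }
    return false;
}
```

Returning a reference from the throwing variant means call sites that assume the child exists no longer need a null check, while the exception gives the caller a clean point at which to abandon the optimization attempt.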