Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shortcut CASE evaluation (lazy scalar subquery evaluation) #16022

Open
hlcianfagna opened this issue May 23, 2024 · 3 comments
Open

Shortcut CASE evaluation (lazy scalar subquery evaluation) #16022

hlcianfagna opened this issue May 23, 2024 · 3 comments

Comments

@hlcianfagna
Copy link
Contributor

Problem Statement

I have a query where on a limited number of cases an expensive lookup needs to take place.
At the moment CASE, COALESCE, and IF all result on the lookup taking place for all rows.

Example:

CREATE FUNCTION sleep (ms int)
RETURNS int language javascript
as
$$ function sleep(ms){
var end = new Date().getTime()   ms;
var a =1;
while(new Date().getTime() < end) {a  ;}
return 1;}
$$;
	
SELECT COALESCE(0,sleep(5000));
--> SELECT 1 row in set (5.006 sec)	
SELECT CASE WHEN 0 IS NULL THEN sleep(5000) ELSE 0 END;
--> SELECT 1 row in set (5.007 sec)

Possible Solutions

Only evaluate "the arguments that are needed to determine the result" as in https://www.postgresql.org/docs/current/functions-conditional.html

Considered Alternatives

Store intermediate results on working tables and then use a query with an UNION considering the different cases.

@mfussenegger
Copy link
Member

Only evaluate "the arguments that are needed to determine the result" as in

This is actually kinda the case, except that we also have a constant-folding/normalization step to reduce operations like 1 1 to 2, to avoid having to repeat the step per row.

In your example the sleep parameter is literal, so this logic kicks in.

How does the expensive lookup in your actual use-case look like?

@hlcianfagna
Copy link
Contributor Author

How does the expensive lookup in your actual use-case look like?

It is along the lines of something like this:

CREATE TABLE invoices (
	some_data TEXT
	,customer_id INT
	,invoice_specific_payment_terms TEXT
);

INSERT INTO invoices
SELECT 'abc',1,'60 days from receipt of invoice';

REFRESH TABLE invoices;

--> NB in this example invoice_specific_payment_terms is never NULL

CREATE TABLE customers (
	customer_id INT
	,customer_default_payment_terms TEXT
);

INSERT INTO customers
SELECT 1,'30 days from receipt of invoice';

REFRESH TABLE customers;

CREATE TABLE distributed_summits AS
SELECT * FROM sys.summits;

ANALYZE;

SELECT some_data,
	CASE
		WHEN invoice_specific_payment_terms IS NOT NULL
			THEN invoice_specific_payment_terms
		ELSE (	SELECT customer_default_payment_terms
				FROM customers
				CROSS JOIN distributed_summits s1				
				CROSS JOIN distributed_summits s2
				CROSS JOIN distributed_summits s3
				WHERE customers.customer_id=invoices.customer_id
				ORDER BY s3.height DESC
				LIMIT 1
			)
	END 
FROM invoices;

@mfussenegger
Copy link
Member

That's a bigger topic then as subqueries are generally treated in a special way, independent of the evaluation logic of an expression.

@mfussenegger mfussenegger changed the title Shortcut CASE evaluation Shortcut CASE evaluation (lazy scalar subquery evaluation) Sep 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants