Convert nested join in Vector Queries to Pandas Merge. #1298

Chitti-Ankith · 2023-10-17T21:41:11Z

Profiling the vector scan showed that we spend a lot of time in the post-processing logic, which performs a nested join. This is an initial commit that replaces it with a join using pandas. The change showed ~50% improvement in similarity queries.
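
To illustrate the kind of change described above, here is a minimal sketch that is not taken from the PR itself. The frames `hits` and `table`, the `row_id` key, and both helper functions are hypothetical stand-ins; the real executor code may be structured quite differently.

```python
import pandas as pd

# Hypothetical data: `hits` mimics what a vector index returns (row ids and
# distances); `table` mimics the rows fetched from the base table.
hits = pd.DataFrame({"row_id": [2, 0, 3], "distance": [0.1, 0.4, 0.9]})
table = pd.DataFrame({"row_id": [0, 1, 2, 3], "name": ["a", "b", "c", "d"]})


def nested_join(hits: pd.DataFrame, table: pd.DataFrame) -> pd.DataFrame:
    """Nested-loop join: compares every hit against every table row in Python."""
    rows = []
    for hit in hits.itertuples(index=False):
        for row in table.itertuples(index=False):
            if row.row_id == hit.row_id:
                rows.append(
                    {"row_id": hit.row_id, "distance": hit.distance, "name": row.name}
                )
    return pd.DataFrame(rows)


def merge_join(hits: pd.DataFrame, table: pd.DataFrame) -> pd.DataFrame:
    """The same join expressed as a single pandas merge."""
    return pd.merge(hits, table, on="row_id", how="left")


slow = nested_join(hits, table)
fast = merge_join(hits, table)
```

Both functions return the same rows in the order of `hits`; the merge version hands the matching to pandas instead of looping row by row in Python, which is presumably where the reported gain comes from.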

jiashenC · 2023-10-18T15:59:20Z

For 20% speedup, how many rows does the table contain?

Chitti-Ankith · 2023-10-18T16:09:00Z

> For 20% speedup, how many rows does the table contain?

100k

jiashenC · 2023-10-19T02:23:46Z

evadb/executor/vector_index_scan_executor.py

-                        for col_name in column_list:
-                            res_row[col_name] = row[col_name]
-                        res_row_list[idx] = res_row
+            result_df = pd.merge(


Instead of doing O(n) merges, would we get better performance if we got all batches from the child and did the merge only once?

Chitti-Ankith · 2023-10-22

> Instead of doing O(n) merges, would we get better performance if we got all batches from the child and did the merge only once?

Thanks for the suggestion. I have also changed the code so that child frames are not added to the result df before merging, which avoids unnecessary processing. The speedup is now 2X.
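
The reviewer's suggestion can be sketched roughly as follows. This is an illustration under stated assumptions, not the code that was merged: `child_batches` is a hypothetical stand-in for the child executor, and `hits` is a made-up index-scan result keyed by row id.

```python
import pandas as pd

# Hypothetical index-scan result, indexed by row id.
hits = pd.DataFrame({"distance": [0.1, 0.4, 0.9]}, index=[5, 0, 3])
hits.index.name = "row_id"


def child_batches():
    """Stand-in for the child executor, yielding the table in small batches."""
    yield pd.DataFrame({"row_id": [0, 1, 2], "name": ["a", "b", "c"]})
    yield pd.DataFrame({"row_id": [3, 4, 5], "name": ["d", "e", "f"]})


# Per-batch approach: one merge call per batch, stitched together afterwards.
per_batch = pd.concat(
    [
        pd.merge(
            hits,
            batch.set_index("row_id"),
            left_index=True,
            right_index=True,
            how="inner",
        )
        for batch in child_batches()
    ]
)

# Single-merge approach: collect all batches first, then merge exactly once.
all_rows = pd.concat(list(child_batches()), ignore_index=True).set_index("row_id")
merged_once = pd.merge(
    hits, all_rows, left_index=True, right_index=True, how="left"
)
```

Both approaches match the same rows here; the second one pays the merge setup cost once instead of once per batch, at the price of holding all child rows in memory at the same time.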

jiashenC · 2023-10-26T00:27:12Z

evadb/executor/vector_index_scan_executor.py

+            left_index=True,
+            right_index=True,
+            how="left",
+            # sort=False


Just remove this?
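
For context on why the commented-out argument can simply be dropped: `sort` already defaults to `False` in `pd.merge`, so passing it explicitly changes nothing. A small sketch with made-up frames (`hits` and `rows` are hypothetical, not from the PR):

```python
import pandas as pd

# Hypothetical frames that share an index of row ids.
hits = pd.DataFrame({"distance": [0.1, 0.4, 0.9]}, index=[5, 0, 3])
rows = pd.DataFrame({"name": ["a", "d", "f"]}, index=[0, 3, 5])

# Index-aligned left join with sort=False spelled out.
explicit = pd.merge(
    hits, rows, left_index=True, right_index=True, how="left", sort=False
)

# The same call relying on the default value of `sort`.
default = pd.merge(hits, rows, left_index=True, right_index=True, how="left")
```

Both calls produce the same frame, and the left join keeps the rows in the order of `hits`.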

Chitti-Ankith added 2 commits October 17, 2023 17:38

index scan

b35b5fb

lint fixes

6cf6810

Chitti-Ankith mentioned this pull request Oct 17, 2023

use database native index scan to accelerate table scan during similarity search #1222 (Open, 2 tasks)

xzdandy assigned Chitti-Ankith Oct 18, 2023

xzdandy added the Optimizations label (Features/Bugs related to optimizations) Oct 18, 2023

gaurav274 requested a review from jiashenC October 18, 2023 15:53

jiashenC reviewed Oct 19, 2023


Chitti-Ankith added 3 commits October 22, 2023 17:49

Merge optimisations

a990ebc

lint fixes

c066d73

Indexed the columns for performing the join

7903b51

jiashenC approved these changes Oct 26, 2023


Minor change

aa34aec

Chitti-Ankith merged commit f420faa into georgia-tech-db:staging Oct 26, 2023

xzdandy added this to the v0.3.9 milestone Nov 2, 2023
