feat: Async Query Client from Hackathon #9353

Merged — 23 commits merged into master on Apr 28, 2022
Conversation

@fuziontech (Member) commented Apr 5, 2022

Async query client backend from hackathon

Instead of using a UUID for referencing the status/results of the query, we are using a hash of the query itself. This ensures we don't end up submitting the same query over and over in a bad situation. The downside of this method is that future use cases, where people may want to run DDLs that look the same within a 10-minute period, may no-op as they hit the cache.
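For illustration, a minimal sketch of the idea (not the PR's actual code): the Redis key for a query's status/results is derived from a hash of the query text, so resubmitting an identical query within the TTL window reuses the existing entry. The key prefix and TTL constant here are assumptions.

```python
import hashlib

REDIS_STATUS_TTL = 600  # assumed: 10 minutes, matching the window mentioned above


def generate_redis_key(query: str) -> str:
    # Hash the query itself instead of minting a fresh UUID per submission,
    # so identical queries map to the same status/results entry.
    query_hash = hashlib.md5(query.encode("utf-8")).hexdigest()
    return f"query_async:{query_hash}"  # "query_async:" prefix is an assumption
```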

Note:
mypy fixes are quite annoying. Supposedly there is a better fix in progress. But that fix has been underway for 3 years 😢
lidatong/dataclasses-json#31

@fuziontech fuziontech marked this pull request as draft April 5, 2022 19:14
@EDsCODE EDsCODE added highlight ⭐ Release highlight and removed highlight ⭐ Release highlight labels Apr 5, 2022
@fuziontech fuziontech changed the title [WIP] Async Query Client from Hackathon feat: Async Query Client from Hackathon Apr 7, 2022
@fuziontech fuziontech self-assigned this Apr 7, 2022
@fuziontech fuziontech marked this pull request as ready for review April 7, 2022 20:25
@rcmarron (Contributor) left a comment

So pumped to see this! A couple of thoughts:

Reload insight experience
It'd be great if we could enable the experience where a user refreshes the page for an insight that's already running and, instead of re-running the insight, we fetch the status of the already-running query.

Using the hash of the query as the key gets us close to this, but the problem is that most insight queries contain timestamps, so the hash will be different.

For the results cache today, we use a hash of the insight's filters. We probably want to do the same thing here. However, it feels wrong to have this querying function be so tightly coupled with insights. I wonder if we should just have an optional query_id param on enqueue_execute_with_progress. If it's set, we use that. If it's not set, then we generate a random uuid like before. That way the "application layer" can manage the querying however it needs to.

If it worked this way, I'd imagine the application side of the "kick off query" function would (see the sketch after this list):

  1. Generate a query_id based on the insight filters
  2. Check if the query is already running
  3. If it isn't running, kick off a new query
  4. Return the query_id to the client
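A rough sketch of that flow (generate_insight_cache_key and query_is_running are hypothetical helpers, and the enqueue_execute_with_progress signature is simplified; this is not the PR's actual code):

```python
import hashlib
import json
import uuid
from typing import Optional


def generate_insight_cache_key(insight_filters: dict) -> str:
    # Hypothetical: hash the insight's filters, which stay stable across page refreshes.
    return hashlib.md5(json.dumps(insight_filters, sort_keys=True).encode()).hexdigest()


def query_is_running(team_id: int, query_id: str) -> bool:
    # Hypothetical: would check Redis for an in-flight status entry.
    return False


def enqueue_execute_with_progress(team_id: int, query: str, query_id: Optional[str] = None) -> str:
    # If the caller supplies a query_id, use it; otherwise generate a random UUID as before.
    query_id = query_id or uuid.uuid4().hex
    # ... enqueue the Celery task keyed on (team_id, query_id) ...
    return query_id


def kick_off_insight_query(team_id: int, insight_filters: dict, query: str) -> str:
    query_id = generate_insight_cache_key(insight_filters)  # 1. query_id from insight filters
    if not query_is_running(team_id, query_id):             # 2. already running?
        enqueue_execute_with_progress(team_id, query, query_id=query_id)  # 3. kick off a new query
    return query_id                                         # 4. return the query_id to the client
```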

Restart query experience
I imagine we may want to have an explicit "restart the query" option for the customer, for cases where the query seems to be hanging.

I don't know if this is possible, but to enable this, it would be great if, when enqueue_execute_with_progress is called with a query_id that's already running, it cancelled the existing query and started a new one (sketched below).

If this were the case, then the flow here would be the same as above, but without the check for an already running query.

If this is adding too much complexity, we could also skip this case for now, but I think we'd still want to handle the case where 2 queries have the same query_id.
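A sketch of that restart behaviour, assuming the Celery task id of a running query is stored alongside its status in Redis (get_query_status and this QueryStatus are illustrative stand-ins; the merged PR ended up exposing an override along these lines as a force flag, per the commit messages at the end of this thread):

```python
from dataclasses import dataclass
from typing import Optional

from celery import current_app


@dataclass
class QueryStatus:
    task_id: str
    complete: bool = False


def get_query_status(team_id: int, query_id: str) -> Optional[QueryStatus]:
    # Hypothetical: would read and deserialize the status blob from Redis.
    return None


def enqueue_or_restart(team_id: int, query: str, query_id: str, force: bool = False) -> str:
    existing = get_query_status(team_id, query_id)
    if existing is not None and not existing.complete:
        if not force:
            # A query with this query_id is already in flight: hand back its id.
            return query_id
        # force=True: cancel the running Celery task, then fall through and resubmit.
        current_app.control.revoke(existing.task_id, terminate=True)
    # ... enqueue a fresh task under query_id ...
    return query_id
```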

Existing results cache

I was having some internal debate about how this should relate to the existing insight results cache. I keep going back and forth between "these are very different" and "it's the exact same thing"… 🤔

I think we should keep them separate, and just wanted to share the reasoning here to make sure others agree too:

  • The results cache is a long-term cache. Not all queries run with this function should be stored in a long-term cache (e.g. they might not all be insights)
  • The results cache doesn't store raw query results. Rather it stores the transformed data to be returned to the client.

posthog/client.py (resolved review thread)
Comment on lines 178 to 180
status: str = "submitted"
complete: bool = False
error: bool = False
A contributor commented:

Having status, complete, and error feels a bit odd because it's the same information expressed in two different ways. I'm wondering if we could have only status?
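One way to collapse the three fields into a single source of truth, as a sketch rather than the PR's code: make status an enum and derive complete/error from it.

```python
from dataclasses import dataclass
from enum import Enum


class Status(str, Enum):
    SUBMITTED = "submitted"
    RUNNING = "running"
    COMPLETE = "complete"
    ERRORED = "errored"


@dataclass
class QueryStatus:
    status: Status = Status.SUBMITTED

    @property
    def complete(self) -> bool:
        return self.status == Status.COMPLETE

    @property
    def error(self) -> bool:
        return self.status == Status.ERRORED
```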

@mariusandra (Collaborator) commented:

Adding query_id to the application layer is a legit fix here, though:

Using the hash of the query as the key gets us close to this, but the problem is that most insight queries contain timestamps, so the hash will be different.

Can we change the timestamps in the frontend? E.g. instead of $now_to_the_millisecond, use $now_to_the_next_hour. Could this provide a caching window large enough to be meaningful?
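For illustration only (the suggestion targets the frontend, and $now_to_the_next_hour is not an existing helper), rounding the timestamp up to the hour would keep the query text, and therefore its hash, stable for up to an hour:

```python
from datetime import datetime, timedelta, timezone


def now_to_the_next_hour() -> datetime:
    # Round "now" up to the next hour boundary so every request within the same
    # hour produces an identical timestamp (and therefore an identical query hash).
    now = datetime.now(timezone.utc)
    return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
```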

@mariusandra (Collaborator) commented:

Also, on the frontend, can we have a hedgehog-jumping-over-cactuses game when a query takes longer than 15 seconds?

@posthog-bot (Contributor) commented:

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.

@rcmarron (Contributor) left a comment

Looks great! One question and a couple nits

Very excited to get this one in!

posthog/client.py — resolved review threads (4 outdated, 1 current)
results=rv,
task_id=task_id,
)
redis_client.set(key, query_status.to_json(), ex=REDIS_STATUS_TTL) # type: ignore
A contributor commented:

Nit: Don't feel strongly about it, but I think we can get away without the new library by doing something like: json.dumps(dataclasses.asdict(query_status))

@fuziontech (Member, Author) replied:

100% agree, but the other thing that's kind of nice about the library is the ability to unmarshal JSON back into a dataclass: DataClassExample.from_json('{"totally": "json"}')
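A side-by-side sketch of the two approaches in this thread: plain json.dumps(dataclasses.asdict(...)) versus dataclasses-json, which also provides from_json for unmarshalling back into the dataclass (the QueryStatus fields here are illustrative, not the PR's exact class):

```python
import dataclasses
import json
from dataclasses import dataclass

from dataclasses_json import dataclass_json


@dataclass_json
@dataclass
class QueryStatus:
    status: str = "submitted"
    task_id: str = ""


qs = QueryStatus(status="running", task_id="abc123")

# Standard library only: serializing is easy, deserializing back is manual.
blob = json.dumps(dataclasses.asdict(qs))
qs_roundtrip = QueryStatus(**json.loads(blob))

# dataclasses-json: to_json/from_json handle both directions.
blob2 = qs.to_json()
qs2 = QueryStatus.from_json(blob2)
```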

@fuziontech fuziontech removed the stale label Apr 28, 2022
@fuziontech (Member, Author) commented:

Dude, excellent feedback here. Thanks!

* master: (137 commits)
  feat(cohorts): add cohort filter grammars (#9540)
  feat(cohorts): Backwards compatibility of groups and properties (#9462)
  perf(ingestion): unsubscribe from buffer topic while no events are produced to it (#9556)
  fix: Fix `Loading` positioning and `LemonButton` disabled state (#9554)
  test: Speed up backend tests (#9289)
  fix: LemonSpacer -> LemonDivider (#9549)
  feat(funnels): Highlight significant deviations in new funnel viz (#9536)
  docs(storybook): Lemon UI (#9426)
  feat: add support for list of teams to enable the conversion buffer for (#9542)
  chore(onboarding): cleanup framework grid experiment (#9527)
  fix(signup): domain provisioning on cloud (#9515)
  chore: split out async migrations ci (#9539)
  feat(ingestion): enable json ingestion for self-hosted by default (#9448)
  feat(cohort): add all cohort filter selectors to Storybook (#9492)
  feat(ingestion): conversion events buffer consumer (#9432)
  ci(run-backend-tests): remove CH version default (#9532)
  feat: Add person info to events (#9404)
  feat(ingestion): produce to buffer partitioned by team_id:distinct_id (#9518)
  fix: bring latest_migrations.manifest up to date (#9525)
  chore: removes unused feature flag (#9529)
  ...
@rcmarron rcmarron self-requested a review April 28, 2022 16:40
@rcmarron (Contributor) left a comment

Ship it!

@fuziontech fuziontech merged commit 2a0c69e into master Apr 28, 2022
@fuziontech fuziontech deleted the async_query branch April 28, 2022 18:21
alexkim205 pushed a commit that referenced this pull request May 23, 2022
* cherry pick client from metahog

* cherrypick celery and requirements from metahog

* Change key to be based on hash of query and add test

* test that caching works

* black formatting

* Remove last references to uuid since it's not a uuid anymore

* don't statically set CLICKHOUSE_DATABASE

* mypy fixes (oof)

* add more tests!

* last test

* black format again

* only a bit of feedback incorporated - more to come

* add query_id override, force, and tests

* black format

* Flake8 and test docs

* black format tests

* mypy fixes

* from_json typing pains

* Feedbacked

* mypy feedback

* pin redis to 6.x
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants