Dask divisions, npartitions and indexes after a groupby call #10459

simonykq · 2023-08-23T21:08:54Z

There is not much documentation I can find about how indexing and divisions behave after a ddf.groupby operation, other than trying things out myself. So far, this is what I have found:

As for divisions, it seems that the only case where Dask keeps the divisions after a groupby is when you group by the index column and then call apply, e.g. ddf.groupby(ddf.index.name).apply(fn). See: Groupby-agg on index with known divisions results in unknown divisions #2999. All other cases, such as groupby.transform and groupby.agg(), generate unknown divisions.
As for indexing, it seems that ddf.groupby(any_cols).apply() keeps whatever index ddf had before, although the order of the index values can change. The same goes for ddf.groupby(any_cols).transform(). The only case where the index gets reset is when you call a reduction on the groupby, such as ddf.groupby(any_cols).agg({}). You can even get a multi-index if you pass a list to groupby, although Dask does not generally support multi-indexes.
As for npartitions, based on the observations above, my general assumption is that groupby.apply() keeps the original index as well as the original npartitions. So ddf.groupby().apply() will have the same number of partitions as ddf? And ddf.groupby().agg() would return 1 partition unless you specify otherwise via the split_out parameter of agg()?

It would be good to have a general document that describes the behaviour of groupby in terms of the result's index, divisions, and npartitions.

Replies: 0 comments