Benefit of `Emit` Statements #63

susan-garry · 2023-04-19T20:42:33Z

susan-garry
Apr 19, 2023
Collaborator

As I understand it, the main theoretical benefit of emit statements is that they should make it easier to parallelize a program by ensuring that each element in a set is independent of the others (or at least by making it easier to run an analysis that determines whether these elements are independent). But using these special sets, for all their guarantees, is not sufficient to conclude that two elements in the set are independent, in the sense that the order of computation does not matter. Here is a very simple example in which the elements are not technically independent of each other, and where their computation cannot be parallelized as written:

i: int = 0;
s: ParSet<int> = new ParSet();
while i < 10 {
    emit i to s;
    i = i + 2;
}

Here, because each element is computed from the value of the previous element, the elements are not independent. Even though emit statements guarantee that we cannot access values already in s, we can still keep track of these values using state variables, so that the value of one element depends on the values of one or more previously computed elements.
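To make the dependence concrete, here is a rough Python analogue of the loop above. It is only a sketch: it assumes the increment in the example is `i + 2`, it models the `ParSet` as a plain Python `set`, and the function names are made up for illustration. The second function produces the same set by computing each element from its own index, which is the kind of rewrite an analysis would have to discover before the iterations could run in any order.

```python
def emit_sequential(limit: int = 10) -> set[int]:
    """Mirror of the loop above: each value is derived from the previous one."""
    s: set[int] = set()
    i = 0
    while i < limit:
        s.add(i)     # stands in for `emit i to s`
        i = i + 2    # loop-carried: the next i needs the current i
    return s


def emit_independent(limit: int = 10) -> set[int]:
    """Same set, but element k depends only on k, not on element k - 1."""
    count = (limit + 1) // 2  # number of even values in [0, limit)
    return {2 * k for k in range(count)}


print(emit_sequential() == emit_independent())  # True
```

The two functions agree on the result, but only the second has iterations that could be handed to separate workers without coordination.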

Here is an example of a silly but syntactically valid pangenomic graph transformation:

s: ParSet<Segment> = new ParSet();
prev_seq : Strand = new Strand();
for node in nodes {
    seq' : Strand = node.sequence + prev_seq;
    prev_seq = node.sequence;
    emit { node with sequence = seq' } to s;
}

By using prev_seq as a state variable that gives us information about the other nodes in the graph, the value of each node in s depends on the value of the previous node added to s. We may not want this to be a valid pollen program, but it is syntactically valid, and it's not clear to me how we would differentiate this as an invalid program. Perhaps we can check if a body that contains an emit statement is modifying any variables defined outside of the body to ensure that there aren't any variables being used to keep track of state. This way, analyses that use subset-paths can still be parallelized, and perhaps we can implement a pass that tells a user if their computation can or can't be parallelized and why or why not. But this does limit our ability to automatically parallelize programs (for example, it's not obvious to me how we could straightforwardly parallelize while loops). Has anyone already thought about this snag, and is there any existing literature on parallelizing programs that are written in a sequential manner?
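A minimal sketch of the check proposed above, written in Python over a made-up miniature IR (the `Assign` and `Emit` classes and both function names are hypothetical, not part of Pollen). It flags any variable that a loop body writes without declaring it in the body, and only when the body contains an emit. It is deliberately conservative and does not handle nested loops or aliasing.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Assign:
    target: str             # name of the variable being written
    declares: bool = False  # True when the statement also declares `target` in the body


@dataclass
class Emit:
    expr: str               # text of the emitted expression, kept only for readability


def outer_writes(body: list[Assign | Emit]) -> set[str]:
    """Variables written in the body that were not declared in the body."""
    local: set[str] = set()
    written: set[str] = set()
    for stmt in body:
        if isinstance(stmt, Assign):
            if stmt.declares:
                local.add(stmt.target)
            elif stmt.target not in local:
                written.add(stmt.target)
    return written


def flag_loop(body: list[Assign | Emit]) -> set[str]:
    """Return the suspect state variables of a loop body that contains an emit."""
    if not any(isinstance(stmt, Emit) for stmt in body):
        return set()
    return outer_writes(body)


# The pangenomic loop from this post: seq' is local, prev_seq is outer state.
graph_loop = [
    Assign("seq'", declares=True),
    Assign("prev_seq"),
    Emit("{ node with sequence = seq' }"),
]
print(flag_loop(graph_loop))  # {'prev_seq'}
```

The same check would also flag `i` in the while-loop example, since `i` is declared outside the loop and updated inside it.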

anshumanmohan · 2023-04-20T17:03:43Z

anshumanmohan
Apr 20, 2023
Maintainer

Thanks for this, Susan!

It seems to me that a large number of odgi algorithms can be run in parallel, or by sequencing together multiple parallel passes. I know we're at some point going to want to fly the odgi nest and let users program whatever they want, but it's probably still fair to assume that a large number of interesting pangenome-ey questions can be executed in this parallel pattern. We can shop this idea around and see what the odgi folks think. Unless proven otherwise, I think that the above should be an invalid program, at least for pollen v1.

A pass that flags unparallelizable programs sounds super, and we can see what info there is about that already. Suggesting reasons/fixes for non-parallelizability sounds super, but maybe that can be a future goal?


anshumanmohan · 2023-04-20T18:49:29Z

anshumanmohan
Apr 20, 2023
Maintainer

Just a reminder that the term @sampsyo told us to look out for is loop-carried dependencies.


anshumanmohan · 2023-04-21T00:05:49Z

anshumanmohan
Apr 21, 2023
Maintainer

I also jotted down the conclusion of a discussion re: emit preserving order.

The semantics of emit should not guarantee an order. A user is not allowed to say, "well I only wanted the depth of node 4; why have you gone and searched the whole graph of 10000 nodes?" Early return like that is not allowed, or is at least not exposed to the user.

A layer below, the compiler is allowed to stop such traversals early. It will do that by checking that traversing all the other nodes is indeed free of side-effects, for example.
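One way to see why an unordered emit and a state variable do not mix is to run the earlier `prev_seq` loop in two different visit orders. The Python below is an illustrative analogue only: nodes are modeled as hypothetical `(name, sequence)` pairs, the combining operator is assumed to be string concatenation, and `stateless` is an invented example of a body with no carried state.

```python
def with_state(nodes: list[tuple[str, str]]) -> set[tuple[str, str]]:
    """Analogue of the prev_seq loop: each output depends on the node visited before it."""
    out: set[tuple[str, str]] = set()
    prev_seq = ""
    for name, seq in nodes:
        out.add((name, seq + prev_seq))  # stands in for the emit
        prev_seq = seq                   # state carried to the next iteration
    return out


def stateless(nodes: list[tuple[str, str]]) -> set[tuple[str, str]]:
    """A body with no carried state: each output depends only on its own node."""
    return {(name, seq.lower()) for name, seq in nodes}


nodes = [("n1", "A"), ("n2", "C"), ("n3", "G")]
reordered = list(reversed(nodes))

print(with_state(nodes) == with_state(reordered))  # False
print(stateless(nodes) == stateless(reordered))    # True
```

If emit promises no order, the first program has no single well-defined result, while the second does.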


susan-garry · 2023-04-21T01:36:58Z

susan-garry
Apr 21, 2023
Collaborator Author

Just a quick summary of what our options are in dealing with loop-dependent variables before I close out the issue:

1. Don't check for them. Assume that all (for) loops that use emit statements can be parallelized, and if they contain loop-dependent variables that cause the program to crash, the program will simply crash.
2. Check for loop-dependent variables and assume that all loops which contain LDVs cannot be parallelized; then
   i. Compile loops that contain LDVs to run sequentially, or
   ii. Throw an error.

   We could also elect to give a warning or some other kind of explanation to the user about the loop that can't be parallelized and why. Additionally, in the future, perhaps users can decorate a loop with an attribute that tells the compiler to parallelize the loop computation even if it contains an LDV.
3. Come up with a way to determine which LDVs render a loop computation unparallelizable. We can treat loops that cannot be parallelized the same as in (2).
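The first two options can be pictured as a policy switch in the compiler. The Python below is a hedged sketch, not a design: `Policy`, `UnparallelizableLoop`, and `plan_loop` are hypothetical names, and the set of LDVs is assumed to come from some earlier analysis pass.

```python
from enum import Enum


class Policy(Enum):
    TRUST = "trust"        # never check; treat every emitting loop as parallel
    FALLBACK = "fallback"  # loops with LDVs are compiled to run sequentially
    REJECT = "reject"      # loops with LDVs are a compile-time error


class UnparallelizableLoop(Exception):
    """Raised under Policy.REJECT when a loop carries state between iterations."""


def plan_loop(ldvs: set[str], policy: Policy) -> str:
    """Decide how to compile one emitting loop, given the LDVs found in it."""
    if policy is Policy.TRUST or not ldvs:
        return "parallel"
    if policy is Policy.FALLBACK:
        return "sequential"
    raise UnparallelizableLoop(f"loop-dependent variables: {sorted(ldvs)}")


print(plan_loop({"prev_seq"}, Policy.TRUST))     # parallel
print(plan_loop({"prev_seq"}, Policy.FALLBACK))  # sequential
```

A more precise analysis, as in the third option, would slot in by shrinking the `ldvs` set to only those variables that really block parallel execution before `plan_loop` is called.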

1 reply

sampsyo Apr 21, 2023
Maintainer

Yes indeed! I suggested synchronously, FWIW, that option 1 is a simple way to get started that doesn't paint us into a corner, i.e., it won't prevent doing something more principled (such as 3(ii)) in the future.
