Benefit of Emit
Statements
#63
Replies: 4 comments 1 reply
-
Thanks for this, Susan! It seems to me that a large number of A pass that flags unparallelizable programs sounds super, and we can see what info there is about that already. Suggesting reasons/fixes for non-parallelizability sounds super, but maybe that can be a future goal? |
Beta Was this translation helpful? Give feedback.
-
Just a reminder that the term @sampsyo told us to look out for is loop-carried dependencies. |
Beta Was this translation helpful? Give feedback.
-
I also jotted down the conclusion of a discussion re: The semantics of A layer below, the compiler is allowed to stop such traversals early. It will do that by checking that traversing all the other nodes is indeed free of side-effects, for example. |
Beta Was this translation helpful? Give feedback.
-
Just a quick summary of what our options are in dealing with loop-dependent variables before I close out the issue:
|
Beta Was this translation helpful? Give feedback.
-
As I understand it, the main theoretical benefits of emit statements is that they should make it easier to parallelize a program by ensuring that each element in a set is independent of the others (or at least making it easier to run an analysis to determine the independence of these elements). But using these special sets, for all their guarantees, is not sufficient to say that two elements in the set are independent in that the order of computation does not matter. Here is a very simple example in which elements are not technically independent of each other, and where their computation cannot be parallelized:
Here, because each element is computed from the value of the previous element, they are not independent. Even though emit statements guarantee that we cannot access value already in
s
, we can still keep track of these values using state variables such that the value of one depends on the values of one or more of the previously computed element(s).Here is an example of a silly but syntactically valid pangenomic graph transformation:
By using
prev_seq
as a state variable that gives us information about the other nodes in the graph, the value of each node ins
depends on the value of the previous node added tos
. We may not want this to be a valid pollen program, but it is syntactically valid, and it's not clear to me how we would differentiate this as an invalid program. Perhaps we can check if a body that contains anemit
statement is modifying any variables defined outside of the body to ensure that there aren't any variables being used to keep track of state. This way, analyses that usesubset-paths
can still be parallelized, and perhaps we can implement a pass that tells a user if their computation can or can't be parallelized and why or why not. But this does limit our ability to automatically parallelize programs (for example, it's not obvious to me how we could straightforwardly parallelize while loops). Has anyone already thought about this snag, and is there any existing literature on parallelizing programs that are written in a sequential manner?Beta Was this translation helpful? Give feedback.
All reactions