Update extensions structure in JSON to explicitly define order in models #3

Open
dustine32 opened this issue Apr 4, 2018 · 4 comments


@dustine32
Contributor

Currently, the JSON structure for terms under a relation in the extensions field only allows a list, which may be treated as unordered. If we decide to change this spec to better represent the intended nesting of the SynGO GO-CAM models, I have some ideas.

The existing structure for SYNGO_2082 is:

"extensions": [
    {
        "occurs_in": [
            "UBERON:0001876",
            "GO:0098982"
        ]
    }
]

Perhaps, to represent the desired nesting order it should be:

"extensions": [
    {
        "occurs_in": {
            "GO:0098982": {
                "part_of": "UBERON:0001876"
            }
        }
    }
]

where the type of each value determines whether the chain should terminate? A dictionary would mean "keep going", whereas a string would mean "we're done".
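To make the proposed rule concrete, here is a minimal Python sketch (Python only because ontobio is a Python library; this is not code from SynGO or syngo2lego) that walks one extension in the proposed format and returns the chain as ordered (relation, term) pairs, outermost first. The name `walk` is hypothetical, and the sketch assumes exactly one term per nesting level.

```python
import json
from typing import Union

# In the proposed spec a value is either a str (terminal term: "we're done")
# or a dict {term: {next_relation: value}} ("keep going").
Value = Union[str, dict]


def walk(relation: str, value: Value) -> list[tuple[str, str]]:
    """Return the chain as ordered (relation, term) pairs, outermost first."""
    if isinstance(value, str):
        # A plain string terminates the chain.
        return [(relation, value)]
    # Assumption: exactly one term per nesting level; this unpacking
    # raises ValueError otherwise.
    ((term, rest),) = value.items()
    chain = [(relation, term)]
    for next_relation, next_value in rest.items():
        chain.extend(walk(next_relation, next_value))
    return chain


doc = json.loads("""
{"extensions": [
    {"occurs_in": {"GO:0098982": {"part_of": "UBERON:0001876"}}}
]}
""")

for extension in doc["extensions"]:
    for relation, value in extension.items():
        print(walk(relation, value))
# [('occurs_in', 'GO:0098982'), ('part_of', 'UBERON:0001876')]
```

Because order is carried by the nesting itself rather than by list position, a consumer never has to guess which term contains which.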

There's also the approach used in the ontobio Python library for parsing GAF/GPAD, which structures extensions as described here.

After adjusting @cmungall's example a bit (replacing the "union_of" key with "extensions"):

"extensions": [
    {
        "intersection_of": [
            {"property": P1, "filler": F1},
            ...
        ]
    },
    ...
]

And then putting in our example values:

"extensions": [
    {
        "intersection_of": [
            {"property": "occurs_in", "filler": "GO:0098982"},
            ...
        ]
    },
    {
        "intersection_of": [
            {"property": "part_of", "filler": "UBERON:0001876"},
            ...
        ]
    },
    ...
]
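For comparison, a hypothetical parser from a GAF-style extension column into this union-of-intersections shape might look like the sketch below. It is not ontobio's actual code; `parse_extensions` is an illustrative name, and it assumes the GAF convention that `|` separates alternative extensions and `,` separates conjoined `relation(filler)` units. Each conjunct comes out as a flat property/filler pair, so nothing in the result says which filler is nested inside which.

```python
import re

# One "relation(filler)" extension unit, e.g. occurs_in(GO:0098982).
UNIT = re.compile(r"^\s*(\w+)\((.+)\)\s*$")


def parse_extensions(column: str) -> list[dict]:
    """Parse a GAF-style extension column into a list of
    {"intersection_of": [{"property": ..., "filler": ...}, ...]} entries.

    '|' separates alternative extensions (the union);
    ',' separates units that must all hold (the intersection).
    """
    extensions = []
    for group in column.split("|"):
        if not group.strip():
            continue  # tolerate empty groups, e.g. an empty column
        conjuncts = []
        for unit in group.split(","):
            match = UNIT.match(unit)
            if match is None:
                raise ValueError(f"malformed extension unit: {unit!r}")
            conjuncts.append({"property": match.group(1), "filler": match.group(2)})
        extensions.append({"intersection_of": conjuncts})
    return extensions


print(parse_extensions("occurs_in(GO:0098982),part_of(UBERON:0001876)"))
# [{'intersection_of': [{'property': 'occurs_in', 'filler': 'GO:0098982'},
#                       {'property': 'part_of', 'filler': 'UBERON:0001876'}]}]
```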

Hm, though now I see that the ontobio solution doesn't really specify order either. Perhaps our use case in SynGO is specialized enough that we shouldn't worry about changing the spec and should just handle this in code? What do you think, @ftwkoopmans @thomaspd?

@cmungall
Member

cmungall commented Apr 4, 2018

Need a bit more context here.

The SynGO model correctly nests the synapse in the amygdala.

The export to GPAD is valid but lossy:
http://noctua-dev.berkeleybop.org/workbench/annpreview/?model_id=gomodel:SYNGO_2082

This is because GAF/GPAD extensions do not allow nesting. There is no way around this without extending GAF/GPAD.
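A small sketch of why a flat export is lossy: if nested extensions are rendered as a comma-separated conjunction of relation(term) units, two different nestings produce the same string. This is a hypothetical illustration, not Noctua's GPAD exporter; the `Ext` tuple layout and the `flatten` name are assumptions.

```python
from typing import NamedTuple


class Ext(NamedTuple):
    """One extension unit plus the units nested beneath it (hypothetical layout)."""
    relation: str
    term: str
    children: tuple = ()


def flatten(extensions) -> str:
    """Render nested extensions as a flat, comma-separated relation(term) list.

    Every unit is emitted at the same level, so the information about which
    term a child unit hangs off is discarded: this is the lossy step.
    """
    units = []

    def visit(ext: Ext) -> None:
        units.append(f"{ext.relation}({ext.term})")
        for child in ext.children:
            visit(child)

    for ext in extensions:
        visit(ext)
    return ",".join(units)


# Synapse (GO:0098982) nested inside the amygdala (UBERON:0001876) ...
nested = [Ext("occurs_in", "GO:0098982", (Ext("part_of", "UBERON:0001876"),))]
# ... versus two independent units attached directly to the annotated term.
side_by_side = [Ext("occurs_in", "GO:0098982"), Ext("part_of", "UBERON:0001876")]

assert flatten(nested) == flatten(side_by_side)
print(flatten(nested))  # occurs_in(GO:0098982),part_of(UBERON:0001876)
```

Once flattened, a reader cannot tell the two meanings apart, which is why recovering the nesting can only be heuristic.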

(The nesting is implicitly there, especially with the separate CC annotation with its amygdala extension. But it's not explicit and any attempt to recover the nesting is heuristic rather than 100% guaranteed valid)

I suggest we simply call this done. People rarely use the extensions, never mind care about nesting.

If people want the full info, it's there in the GO-CAM; we should focus our energy on making that more accessible.

Maybe I'm missing something, but where does the JSON you are generating fit in?

@dustine32
Contributor Author

OK @cmungall, so is the nesting implicit because of the ontology hierarchy between the CC term and the UBERON term? If so, I probably just forgot that, and I can handle this in code, since I can check the ontology when generating the model.
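Where the ontology does link two terms, the ordering check could look roughly like this. A minimal sketch, assuming the part_of edges have already been loaded into a plain dict (child to direct parents); the `EX:` IDs are placeholders, not real GO or UBERON terms, and `is_part_of`/`nest_order` are hypothetical helpers. When neither term reaches the other, it returns None so the caller can fall back to another rule.

```python
# Toy part_of graph: child -> direct parents. The IDs are placeholders, not
# real ontology terms; in practice the edges would come from the loaded ontology.
PART_OF = {
    "EX:synapse": ["EX:neuron_part"],
    "EX:neuron_part": ["EX:brain_region"],
}


def is_part_of(child: str, ancestor: str, graph: dict) -> bool:
    """True if `ancestor` is reachable from `child` via part_of edges."""
    stack, seen = [child], {child}
    while stack:
        node = stack.pop()
        for parent in graph.get(node, []):
            if parent == ancestor:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


def nest_order(term_a: str, term_b: str, graph: dict):
    """Return (inner, outer) when the graph decides the nesting, else None."""
    if is_part_of(term_a, term_b, graph):
        return (term_a, term_b)
    if is_part_of(term_b, term_a, graph):
        return (term_b, term_a)
    return None  # ontology is silent: fall back to another rule


print(nest_order("EX:brain_region", "EX:synapse", PART_OF))
# ('EX:synapse', 'EX:brain_region')
```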

As far as I know, the JSON that is specced here is the export from SynGO and the input for the syngo2lego Scala code to generate the GO-CAM TTL.

@cmungall
Member

cmungall commented Apr 4, 2018

The nesting is absolutely explicit in the GO-CAM.

In the GPAD, the nesting is strongly suggested by a number of factors: po CC, po AE (part_of a cellular component, part_of an anatomical entity) generally means po (CC, po(AE)), i.e. part of a CC that is itself part of the AE. But this is not 100% guaranteed; there are edge cases involving synapses that are very distant from the soma of the cell. Also, in the GPAD the annotations are not formally associated. So you could have a GPAD where ann1 says g1 is part-of a synapse and ann2 says g1 is part-of the liver. But these are totally different contexts, and it would be wrong to say synapse in liver. The fact that the publications are the same suggests the context is the same, but the GO-CAM is the only unambiguous way of saying that two statements share a context.
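The po CC, po AE heuristic could be sketched as follows: sort the flat units so that GO fillers (assumed to be cellular components) sit inside UBERON anatomical fillers, then build the nested dictionary format proposed at the top of this issue. The prefix ranking and the `nest_by_prefix` name are assumptions for illustration. As noted, this can be wrong (distant synapses, or annotations from unrelated contexts), so it is a fallback rather than a guarantee.

```python
# Assumed ranking: lower rank = smaller container, nested deeper in the chain.
# Only these two prefixes are handled; anything else raises KeyError rather
# than being silently guessed.
PREFIX_RANK = {"GO": 0, "UBERON": 1}


def nest_by_prefix(units: list[tuple[str, str]]) -> dict:
    """Turn flat (relation, term) units into the proposed nested format.

    Heuristic only: GO terms are assumed to sit inside UBERON terms.
    Raises ValueError on an empty list.
    """
    ordered = sorted(units, key=lambda unit: PREFIX_RANK[unit[1].split(":", 1)[0]])
    # The outermost container ends the chain as a plain string ("we're done").
    *inner, (relation, value) = ordered
    # Wrap from the outside in: each inner term becomes a dict ("keep going").
    for inner_relation, term in reversed(inner):
        value = {term: {relation: value}}
        relation = inner_relation
    return {relation: value}


flat = [("part_of", "UBERON:0001876"), ("occurs_in", "GO:0098982")]
print(nest_by_prefix(flat))
# {'occurs_in': {'GO:0098982': {'part_of': 'UBERON:0001876'}}}
```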

syngo2lego: OK, I get the context now. I don't know enough about the source SynGO representation and how flexible it is, so I can't say whether it reliably has nesting or whether this is something you must do heuristically when converting to GO-CAM.

@dustine32
Contributor Author

Thanks @cmungall for explaining the resulting GPAD limitations! I think for now we can just depend on the syngo2lego code I have to nest these in the model. If someone finds a model with incorrect nesting order I can address it in the syngo2lego code rather than adding flexibility/complexity to the JSON source spec.
