gametic heterozygosity_observed #279

Open
wants to merge 11 commits into
base: master

timothymillar
Contributor

@timothymillar timothymillar commented Jul 18, 2019

See #277 for earlier discussion

This updates heterozygosity_observed to use "gametic heterozygosity", which assumes polysomic inheritance (i.e. autopolyploidy).
Gametic heterozygosity is identical to the existing calculation (Nei's method) for the diploid case but generalises it to autopolyploids.

This implementation follows Hardy 2016 and Meirmans and Liu 2018.
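
As a rough illustration of the idea (not the code in this PR), gametic heterozygosity of a single genotype can be read as the fraction of allele pairs, drawn without replacement, that differ. The function name `gametic_heterozygosity` is hypothetical and the sketch assumes a complete genotype given as a plain list of allele indices:

```python
from itertools import combinations


def gametic_heterozygosity(alleles):
    """Fraction of unordered allele pairs within one genotype that differ.

    Hypothetical helper for illustration only; assumes the genotype is
    complete (no missing calls) and has at least two alleles.
    """
    pairs = list(combinations(alleles, 2))
    if not pairs:
        raise ValueError("need at least two alleles")
    return sum(a != b for a, b in pairs) / len(pairs)


# Diploids can only be 0 (hom) or 1 (het).
diploid_hom = gametic_heterozygosity([0, 0])
diploid_het = gametic_heterozygosity([0, 1])

# Polyploids take intermediate values between 0 and 1.
triploid = gametic_heterozygosity([0, 0, 1])        # 2 of 3 pairs differ
tetra_simplex = gametic_heterozygosity([0, 0, 0, 1])  # 3 of 6 pairs differ
tetra_full = gametic_heterozygosity([0, 1, 2, 3])     # all 6 pairs differ
```

Under this reading the diploid case reduces to the usual hom/het indicator, which is consistent with the statement above that the diploid result is unchanged.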

An additional argument, corrected, is added. It defaults to True, which corrects for the ploidy level.
If it is set to False, the uncorrected Ho is calculated instead; Meirmans and Liu 2018 discuss this variant for comparing across ploidy levels.

Note that the existing code is kept as a special case for diploids because it is faster, not because it produces a different result.

I updated the triploid test case though I'm not entirely sure about the applicability to odd-numbered ploidy levels (Edit: this method should be fine for odd ploidy levels).

@pep8speaks

pep8speaks commented Jul 18, 2019

Hello @timothymillar! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-08-04 07:23:26 UTC

@timothymillar
Contributor Author

@alimanfoo I'm having some second thoughts about this PR now.

heterozygosity_observed still requires a GenotypeArray and hence a single ploidy level for all samples.

If #287 were to be implemented then heterozygosity_observed could be updated for mixed ploidy, but #287 is just a suggestion at this point.

Alternatively, a new function could be implemented that takes a GenotypeAlleleCountsArray and assumes that the ploidy of each sample at each locus equals the sum of its allele counts (i.e. it assumes that all genotypes are complete). This would allow for mixed ploidy levels but would require users to remove any partial genotypes themselves.
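
A minimal sketch of the allele-counts idea above, using plain NumPy rather than the scikit-allel classes. The function name `heterozygosity_from_allele_counts` is hypothetical, and the input is assumed to be an integer array of shape (n_variants, n_samples, n_alleles) in which every genotype is complete:

```python
import numpy as np


def heterozygosity_from_allele_counts(counts):
    """Per-genotype heterozygosity from allele counts (illustrative only).

    Ploidy is inferred as the sum of allele counts for each genotype, so
    partial genotypes must be removed beforehand. Genotypes with fewer
    than two alleles yield NaN.
    """
    counts = np.asarray(counts, dtype=float)
    ploidy = counts.sum(axis=-1)
    # Ordered pairs of identical alleles, and all ordered pairs.
    same_pairs = (counts * (counts - 1)).sum(axis=-1)
    total_pairs = ploidy * (ploidy - 1)
    out = np.full(ploidy.shape, np.nan)
    valid = ploidy >= 2
    out[valid] = 1.0 - same_pairs[valid] / total_pairs[valid]
    return out


# One variant, four samples of differing ploidy, three possible alleles.
counts = np.array([[
    [1, 1, 0],  # diploid heterozygote
    [2, 2, 0],  # tetraploid, two copies of each of two alleles
    [3, 0, 0],  # triploid homozygote
    [1, 1, 2],  # tetraploid, three distinct alleles
]])
het = heterozygosity_from_allele_counts(counts)
```

Because ploidy is derived per genotype, no separate ploidy input is needed, but a genotype with a missing allele call would silently be treated as a lower ploidy level.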

This function is based on the definition in Hardy (2016),
"Population genetics of autopolyploids under a mixed mating
model and the estimation of selfing rate".
It calculates the 'level' of heterozygosity of individuals
in a consistent way across ploidy levels, such that a diploid
always has a heterozygosity of 0 (hom) or 1 (het) and
a polyploid always has a value from 0 (fully hom) to
1 (fully het).
This also enables using heterozygosity_observed with mixed-ploidy
data via the 'ploidy' argument.
@timothymillar
Contributor Author

@alimanfoo I have updated this with the following changes:

  • factored out a heterozygosity_individual function that calculates heterozygosity per individual (0 or 1 for diploids)
  • added a ploidy argument to heterozygosity_individual and heterozygosity_observed to allow for mixed ploidy genotypes (along either/both axes)
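
The two changes listed above could look roughly like the following NumPy sketch. This is not the PR's implementation: the `_sketch` function names are hypothetical, and it assumes calls are stored as an integer array of shape (n_variants, n_samples, max_ploidy) in which lower-ploidy genotypes are padded with -1 and `ploidy` is anything broadcastable to (n_variants, n_samples).

```python
import numpy as np


def heterozygosity_individual_sketch(calls, ploidy=None):
    """Per-genotype heterozygosity with optional mixed ploidy (illustrative)."""
    calls = np.asarray(calls)
    n_variants, n_samples, max_ploidy = calls.shape
    if ploidy is None:
        ploidy = max_ploidy
    ploidy = np.broadcast_to(np.asarray(ploidy), (n_variants, n_samples))

    # Slots beyond a genotype's ploidy are treated as padding and ignored.
    in_use = np.arange(max_ploidy) < ploidy[..., None]
    # A genotype is complete when every slot in use holds a called allele.
    complete = ((calls >= 0) | ~in_use).all(axis=-1)

    # Compare every unordered pair of slots within each genotype.
    i, j = np.triu_indices(max_ploidy, k=1)
    pair_used = in_use[..., i] & in_use[..., j]
    differ = (calls[..., i] != calls[..., j]) & pair_used
    n_pairs = pair_used.sum(axis=-1)

    out = np.full((n_variants, n_samples), np.nan)
    ok = complete & (n_pairs > 0)
    out[ok] = differ.sum(axis=-1)[ok] / n_pairs[ok]
    return out


def heterozygosity_observed_sketch(calls, ploidy=None):
    """Mean individual heterozygosity per variant, skipping NaN genotypes."""
    return np.nanmean(heterozygosity_individual_sketch(calls, ploidy), axis=1)


# Two variants, three samples (diploid, triploid, tetraploid), -1 padded.
calls = np.array([
    [[0, 1, -1, -1], [0, 0, 1, -1], [0, 0, 1, 1]],
    [[0, 0, -1, -1], [1, 1, 1, -1], [0, 1, 2, 3]],
])
sample_ploidy = np.array([2, 3, 4])  # broadcasts along the variants axis
hi = heterozygosity_individual_sketch(calls, ploidy=sample_ploidy)
ho = heterozygosity_observed_sketch(calls, ploidy=sample_ploidy)
```

Passing `ploidy` with shape (n_samples,), (n_variants, 1) or (n_variants, n_samples) covers ploidy varying along either or both axes, while the underlying call array keeps a single fixed shape.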

I think this is the correct approach for supporting mixed ploidy data as it makes it explicit which functions are supported and avoids complicating the base genotype model.
