Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optional disabling of statistics in poppr table #248

Open
2 of 5 tasks
Neato-Nick opened this issue Sep 13, 2021 · 6 comments
Open
2 of 5 tasks

Optional disabling of statistics in poppr table #248

Neato-Nick opened this issue Sep 13, 2021 · 6 comments

Comments

@Neato-Nick
Copy link

Please place an "x" in all the boxes that apply

  • I have the most recent version of poppr and R
  • I have found a bug
  • I want to request a new feature

Would it be possible to enable or disable some of the diversity statistics calculated in the poppr table? In a population I'm analyzing, .ia is getting an error because of negative values. It's already been shown that this population is very highly clonal, but I am interested in the other diversity indices aside from index of association. A workaround I see for this is to just not run the index of association, maybe this looks like a parameter that takes a character vector of 'calculations to run' and by default it just runs all of them?

@zkamvar
Copy link
Member

zkamvar commented Sep 13, 2021

In a population I'm analyzing, .ia is getting an error because of negative values.

That's.... odd. This function shouldn't be erroring on negative values of IA. Can you include a copy of this error?

Would it be possible to enable or disable some of the diversity statistics calculated in the poppr table?

I might be able to implement that. The poppr() function is one of those weird legacy functions whose scope changed over time.

FWIW, You can get most of the way to the table with the diversity_stats() function, which gives you clonality statistics from the MLG counts and I think I could write in the rarefaction in there (but I'd need to check). As for stats like Hexp, that one I really should just export since it calculates the corrected version (whereas summary() calculates the uncorrected version).

If all you need are MLG stats, then diversity_table() is your best bet.

@Neato-Nick
Copy link
Author

Neato-Nick commented Sep 13, 2021

That's.... odd. This function shouldn't be erroring on negative values of IA. Can you include a copy of this error?

I don't have a reprex for the data that caused the error, but this is the error I'm getting. I've gotten negative ia results before and poppr reported them, I'm guessing this is maybe a combination of large data (>=130 individuals w/40k snps) and very high clonality? Negative length vector is a weird error and my google-fu didn't get me very far. I kind of was just thinking this was due to my very highly clonal data.
Error in .Ia.Rd(popx, missing) : negative length vectors are not allowed

diversity_stats()

Thanks, this is helpful! Clonality stats are good. I do want Hexp eventually too though. Looks like it doesn't split out populations nicely like the poppr table does though, guess I'd need to seppop and run mlg.table() and diversity_stats() for each one?

@zkamvar
Copy link
Member

zkamvar commented Sep 13, 2021

Error in .Ia.Rd(popx, missing) : negative length vectors are not allowed

This is definitely a bug and I'm not sure where it's coming from. Would you be willing to send me a copy of your data so I could check it out?

@Neato-Nick
Copy link
Author

Neato-Nick commented Sep 27, 2021

Hi @zkamvar , sorry I ghosted you. I was tracking down other potential issues in my data to make sure it was not a bug upstream and I feel much more confident in my data now. I'm e-mailing you at the address in your bio.

@zkamvar
Copy link
Member

zkamvar commented Sep 28, 2021

I realize I responded via email and not on GitHub.

This is happening because there are too many loci being fed into the index of association function. It needs to construct a vector of n(n-1)/2 (where n is the number of loci) to calculate the denominator and the number of loci presented is just too large. The recommended alternative is to convert your source data to genlight or snpclone and use something like samp.ia() to calculate the index off of a random sample of loci.

Details for nerds

The reason this is happening is because there is an integer overflow in the construction of the covariance matrix (the count variable on line 111):

poppr/src/poppr_distance.c

Lines 95 to 116 in 94c745a

SEXP pairwise_covar(SEXP pair_vec)
{
int I;
int i;
int j;
int count;
SEXP Rout;
I = length(pair_vec);
PROTECT(pair_vec = coerceVector(pair_vec, REALSXP));
PROTECT(Rout = allocVector(REALSXP, (I*(I-1)/2) ));
count = 0;
for(i = 0; i < I-1; i )
{
R_CheckUserInterrupt();
for(j = i 1; j < I; j )
{
REAL(Rout)[count ] = sqrt(REAL(pair_vec)[i] * REAL(pair_vec)[j]);
}
}
UNPROTECT(2); // pair_vec; Rout
return Rout;
}

I can avoid this by specifying a long int (or whatever solution I came up with to solve #234), adding a warning about too many loci, and adding a meaningful error when the counter overflows.

@Neato-Nick
Copy link
Author

Ah, that totally makes sense. It worked for ~22k loci but crashed at ~56k loci, but I recognize that even ~22k loci is a lot to use. I'll definitely subset my data.

Thanks for digging into the weeds to figure out what's going on. I must be a huge nerd, 234 and #235 were interesting!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants