Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in sklearn.manifold tsne #3526

Closed
gchers opened this issue Aug 3, 2014 · 1 comment · Fixed by #3786
Closed

Bug in sklearn.manifold tsne #3526

gchers opened this issue Aug 3, 2014 · 1 comment · Fixed by #3786
Labels

Comments

@gchers
Copy link
Contributor

gchers commented Aug 3, 2014

It looks to me there's an issue when using some distance different from 'euclidean' in tsne implementation.
Example:

import numpy as np
from sklearn.manifold import TSNE
x=np.random.randn(10,10)
t=TSNE(metric='chebyshev')
z=t.fit_transform(x)
Produces:
TypeError: pdist() got an unexpected keyword argument 'squared'

In fact at lines 438-439 of sklearn/manifold/t_sne.py function _fit() I read:
distances = pairwise_distances(X, metric=self.metric, squared=True)
which means that it always provides 'squared' argument to pairwise_distances(). Now, not all the distances support this (see sklearn/metric/pairwise.py), which leads to an error.

I report this as an issue because in t_sne.py from line 338 it says:

If metric is a string, it must be one of the options
allowed by scipy.spatial.distance.pdist for its metric parameter, or
a metric listed in pairwise.PAIRWISE_DISTANCE_FUNCTIONS.

I'd suggest to substitute line 438 with something like:

if self.metric == 'euclidean':
    distances = pairwise_distances(X, metric=self.metric, squared=True)
else:
    distances = pairwise_distances(X, metric=self.metric)

but I'm quite sure it's not as simple as that: I don't, for example, understand why the author wanted squared=True.

Cheers

@larsmans larsmans added the Bug label Aug 4, 2014
@makokal
Copy link
Contributor

makokal commented Aug 5, 2014

I agree, the squared named parameter is annoying when specifying custom distance metrics. Also for squared euclidean distance which I believe was intended, we can use the sqeuclidean as in Scipy pdist.

AlexanderFabisch pushed a commit to AlexanderFabisch/scikit-learn that referenced this issue Oct 13, 2014
Some of the pairwise distances do not support the additional `squared` parameter. I suggest using `sqeuclidian` and such whenever this is required.
AlexanderFabisch pushed a commit to AlexanderFabisch/scikit-learn that referenced this issue Oct 13, 2014
New fix following the discussion on the previous pull request. Thanks to @jnothman,  @mblondel and others ..
AlexanderFabisch pushed a commit to AlexanderFabisch/scikit-learn that referenced this issue Oct 20, 2014
Some of the pairwise distances do not support the additional `squared` parameter. I suggest using `sqeuclidian` and such whenever this is required.
AlexanderFabisch pushed a commit to AlexanderFabisch/scikit-learn that referenced this issue Oct 20, 2014
New fix following the discussion on the previous pull request. Thanks to @jnothman,  @mblondel and others ..
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants