
[MRG]: add fast_dot function calling BLAS directly and consume only twice the memory of your data #2248

Closed · wants to merge 1 commit

Conversation

@dengemann (Contributor)

Hi there, I finally got it running.

This implements a feature "advocated" on this scipy page (section on large arrays and linalg):

http://wiki.scipy.org/PerformanceTips

When calling BLAS directly instead of np.dot, it's possible to avoid copying when data are passed in F-contiguous order. In addition, I've added chunking to the _logcosh function, which avoids an extra copy.
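For illustration, here is a minimal sketch of the trick, not the exact code in this PR: scipy exposes the BLAS gemm routine directly, and since the transpose of a C-contiguous array is an F-contiguous view, BLAS can be handed Fortran-ordered memory without any copy (the name blas_dot is mine, for illustration only):

import numpy as np
from scipy.linalg import get_blas_funcs

def blas_dot(A, B):
    """np.dot(A, B) via BLAS gemm, copy-free for C-contiguous inputs."""
    # Pick the gemm flavour matching the dtypes of A and B (sgemm, dgemm, ...).
    gemm, = get_blas_funcs(('gemm',), (A, B))
    # A.T and B.T are F-contiguous views of C-contiguous A and B (no copy);
    # telling gemm to transpose them back yields op(A.T) @ op(B.T) == A @ B.
    return gemm(1.0, A.T, B.T, trans_a=True, trans_b=True)

rng = np.random.RandomState(0)
A, B = rng.randn(300, 400), rng.randn(400, 200)
assert np.allclose(blas_dot(A, B), np.dot(A, B))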
This is now how it looks on 1 GB of test data:

[plot: fast_dot_chunking_logcosh]

This is how the same test looked on the current master (plot from the last memory PR):

[plot: memory_ica_par_w1_computation_del_gwx]

To make this functionality available for other use cases, I've added a fast_dot function to utils.extmath, with almost stupid but explicit tests that exemplify the mapping between np.dot and fast_dot, which can be hell.
Finally I"ve made sure that down-stream applications are still workin. For example with this local branch the mne-python ICA looks as good as it had looked before.

cc @agramfort @GaelVaroquaux @mblondel

"""Compute fast dot products directly calling blas.

This function calls BLAS directly while warranting
that Fortran contiguity.
Contributor: Fortran contiguity. [ no "that" ]

Contributor Author: Oops.

@pgervais (Contributor)

There are two fundamental flaws in the current design that I think should be addressed before merging:

  • fast_dot output depends on the ordering of the input arrays. This is highly implicit, and different from the usual NumPy behaviour.
  • fast_dot and numpy.dot must have the same signature (for consistency). fast_dot should just be a faster / leaner version of np.dot (see the sketch below).
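One way to meet both constraints, sketched under the assumption that gemm is available via scipy (this is not the PR's final code):

import numpy as np
from scipy.linalg import get_blas_funcs

def fast_dot(A, B):
    """Drop-in for np.dot(A, B): same signature, layout-independent output."""
    gemm, = get_blas_funcs(('gemm',), (A, B))
    # gemm returns Fortran-ordered results; computing B.T @ A.T and
    # transposing gives A @ B as a C-ordered array whatever the input
    # layout (F-ordered inputs cost an internal copy, but the output
    # no longer depends on how the inputs happened to be stored).
    return gemm(1.0, B.T, A.T).T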

@GaelVaroquaux (Member)

Agreed with @pgervais' comment on the design flaws.

In addition, the Travis tests fail.

@dengemann (Contributor, Author)

> Agreed with @pgervais' comment on the design flaws.

see inline discussion above ...

> In addition, the Travis tests fail.

It would be great to know more about the Travis environment; their BLAS module seems to be somewhat older / different from what I have on my machine, grrr.

@GaelVaroquaux (Member)

> their BLAS module seems to be somewhat older / different from what I have on my machine, grrr.

Yeah, users tend to do that :)

@dengemann (Contributor, Author)

> Yeah, users tend to do that :)

LOL

@dengemann (Contributor, Author)

@GaelVaroquaux @pgervais my recent commit addresses our discussion; tests are passing on my box. I now need to find a way to deal with BLAS versions.
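A hedged sketch of the version handling, with hypothetical names and not necessarily what the commit does: probe the scipy wrappers once at import time and degrade to plain np.dot when no usable gemm is found.

import numpy as np

try:
    from scipy.linalg import get_blas_funcs
    # Probe once: older scipy / BLAS builds may lack the wrapper we need.
    _gemm, = get_blas_funcs(('gemm',), (np.empty((2, 2)), np.empty((2, 2))))
except (ImportError, AttributeError, ValueError):
    _gemm = None

def fast_dot(A, B):
    if _gemm is None:
        return np.dot(A, B)  # safe fallback on exotic BLAS setups
    gemm, = get_blas_funcs(('gemm',), (A, B))  # match input dtypes
    return gemm(1.0, B.T, A.T).T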

if X.flags.c_contiguous:
    return array2d(X.T, copy=False, order='F'), True
else:
    return array2d(X, copy=False, order='F'), False
Contributor: array2d makes a copy if the order changes. Is that what you intend here?

Contributor: Sorry, I got it wrong. This code works.

@dengemann (Contributor, Author)

@agramfort addressed your comments.

@dengemann (Contributor, Author)

@agramfort seems we"re finally done here.

@dengemann (Contributor, Author)

@agramfort rebased. Travis won't be happy, though, unless master is fixed.

@dengemann (Contributor, Author)

Rebased, updating to MRG

def test_fast_dot():
    """Check fast dot blas wrapper function"""
    A = np.random.random([2, 10])
    B = np.random.random([2, 10])
Member:

Please never use the np.random singleton in tests; rather do:

rng = np.random.RandomState(42)
A = rng.random_sample([2, 10])
B = rng.random_sample([2, 10])

The goal is to have all the tests run free of side effects, to make them order-independent.

@ogrisel (Member) commented Jul 27, 2013

You have to check with numpy 1.7.1, but I think this optimization has already been included in upstream numpy:

>>> import numpy as np
>>> from sklearn.utils.extmath import fast_dot
>>> A, B = np.random.normal(size=(1000, 1000)), np.random.normal(size=(1000, 1000))
>>> %timeit _ = fast_dot(A, B)
10 loops, best of 3: 71.9 ms per loop
>>> %timeit _ = np.dot(A, B)
10 loops, best of 3: 66.5 ms per loop

I am using a numpy built against the OS X 10.8 system BLAS. Which BLAS do you have? MKL?

@@ -199,12 +199,12 @@ def test_fit_transform():
     X = rng.random_sample((100, 10))
     for whiten, n_components in [[True, 5], [False, 10]]:

-        ica = FastICA(n_components=5, whiten=whiten, random_state=0)
+        ica = FastICA(n_components=n_components, whiten=whiten, random_state=0)
Member:

This fix should be cherry-picked in another PR to be merged to master before the release.

@dengemann (Contributor, Author)

Thanks @ogrisel, I addressed the testing issue in a recent commit. My BLAS should be MKL, binaries from Canopy64.

@dengemann (Contributor, Author)

@pgervais @agramfort @ogrisel @GaelVaroquaux @vene / whoever has a few minutes to spend on this:

With this gist:

https://gist.github.com/dengemann/6094449

You can produce this plot:

[plot: fast_dot_profile]

It would be good to know whether you can get similar results on your machines across shapes / sizes.
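If you cannot run the gist, here is a rough stand-in sketch (the shapes are arbitrary, and blas_dot refers to the illustrative helper earlier in this thread, not the PR's code):

import timeit
import numpy as np

rng = np.random.RandomState(0)
for n_samples, n_features in [(1000, 1000), (5000, 500), (500, 5000)]:
    A = rng.randn(n_samples, n_features)
    B = rng.randn(n_features, n_samples)
    t_np = timeit.timeit(lambda: np.dot(A, B), number=5)
    t_fd = timeit.timeit(lambda: blas_dot(A, B), number=5)
    print('%5d x %5d  np.dot: %.3fs  blas_dot: %.3fs'
          % (n_samples, n_features, t_np, t_fd))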

@dengemann (Contributor, Author)

Here is my second benchmark, for n_features, n_samples = 5e3, 5e3:

[plot: fast_dot_square_matrix]

@agramfort (Member)

Can you open a new PR without fast_dot, with just the ICA improvements?

@dengemann (Contributor, Author)

> Can you open a new PR without fast_dot, with just the ICA improvements?

Yes, that"s possible. I think there aren"t so many in this PR. Oh yes, the logcosh ....
I fear this is to interwoven to cherry pick, need to do it manually

Commit messages:

  • FIXES: dot products
  • ENH: fix BLAS, get tests running, reduce MEM
  • COSMITS + FIX version issue
  • ENH: equalize call signatures, address API
  • ENH: improve impose f order
  • ENH: return np.dot if blas not available
  • ENH: catch attribute error
  • ENH: remove debugging support statements + checks
  • ENH: more fast dots
  • FIX/STY: addressing @agramfort's comments
  • ENH: better 2d handling
  • COSMIT
  • what's new
  • COSMITS
  • ENH: better tests
  • ENH: another fast_dot
@dengemann (Contributor, Author)

Closing this PR to create two separate ones: fast_dot only + ICA improvements.
