[MRG]: add fast_dot function calling BLAS directly and consume only twice the memory of your data #2248
Conversation
"""Compute fast dot products directly calling blas. | ||
|
||
This function calls BLAS directly while warranting | ||
that Fortran contiguity. |
Fortran contiguity. [ no "that" ]
ouups.
There are two fundamental flaws in the current design that I think should be addressed before merging:
Agreed with @pgervais' comment on the design flaws. In addition, the Travis tests fail.
see inline discussion above ...
It would be great to know more about the Travis environment; their BLAS module seems to be somewhat older / different from what I have on my machine, grrr.
Yeah, users tend to do that :)
LOL
@GaelVaroquaux @pgervais my recent commit addresses our discussion; tests passing on my box. I now need to find a way to deal with BLAS versions.
if X.flags.c_contiguous:
    return array2d(X.T, copy=False, order='F'), True
else:
    return array2d(X, copy=False, order='F'), False
array2d makes a copy if the order changes. Is that what you intend here?
Sorry, I got it wrong. This code works.
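For context (not part of the diff): the C-contiguous branch works because the transpose of a C-contiguous array is already an F-contiguous view, so asking for Fortran order on X.T does not trigger a copy, while converting X itself would. A minimal illustration:

```python
import numpy as np

# A C-contiguous array: each row is contiguous in memory.
X = np.zeros((1000, 500))
print(X.flags.c_contiguous)   # True

# Transposing only swaps shape and strides; no data is moved.
Xt = X.T
print(Xt.flags.f_contiguous)  # True
print(Xt.base is X)           # True: Xt is a view, not a copy
```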
@agramfort addressed your comments.
@agramfort seems we're finally done here.
@agramfort rebased. Travis won't be happy though unless master is fixed.
Rebased, updating to MRG
def test_fast_dot():
    """Check fast dot blas wrapper function"""
    A = np.random.random([2, 10])
    B = np.random.random([2, 10])
Please never use the np.random singleton in tests, rather do:
rng = np.random.RandomState(42)
A = rng.random([2, 10])
B = rng.random([2, 10])
The goal is to have all the tests run side-effect-free to make them order-independent.
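A hedged sketch of what the seeded, side-effect-free version of the test could look like, assuming fast_dot is importable from sklearn.utils.extmath as stated in the PR summary below:

```python
import numpy as np
from numpy.testing import assert_array_almost_equal
from sklearn.utils.extmath import fast_dot  # location assumed from the PR summary


def test_fast_dot():
    """Check the fast_dot BLAS wrapper against np.dot (seeded, side-effect free)."""
    rng = np.random.RandomState(42)  # local RNG; the np.random singleton stays untouched
    A = rng.random_sample([2, 10])
    B = rng.random_sample([2, 10])
    # fast_dot should reproduce np.dot for the same arguments.
    assert_array_almost_equal(fast_dot(A, B.T), np.dot(A, B.T))
```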
You have to check with numpy 1.7.1, but I think this optimization has already been included in upstream numpy:
I am using a numpy built against the OS X 10.8 system install of BLAS. Which BLAS do you have? MKL?
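For anyone wanting to answer the "which BLAS" question on their own box, printing the build configuration is usually enough (MKL, ATLAS, OpenBLAS, or the OS X Accelerate/vecLib framework show up there):

```python
import numpy as np
import scipy

# Show the BLAS/LAPACK libraries NumPy and SciPy were linked against.
np.show_config()
scipy.show_config()
```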
@@ -199,12 +199,12 @@ def test_fit_transform():
     X = rng.random_sample((100, 10))
     for whiten, n_components in [[True, 5], [False, 10]]:
-        ica = FastICA(n_components=5, whiten=whiten, random_state=0)
+        ica = FastICA(n_components=n_components, whiten=whiten, random_state=0)
This fix should be cherry-picked in another PR to be merged to master before the release.
Thanks @ogrisel, addressed the testing issue in a recent commit. My BLAS should be MKL, binaries from Canopy64.
@pgervais @agramfort @ogrisel @GaelVaroquaux @vene (whoever has got a few minutes to spend on this): with this gist https://gist.github.com/dengemann/6094449 you can produce this plot. It would be good to know whether you can get similar results on your machines across shapes / sizes.
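The gist above is the authoritative benchmark (it also tracks memory). As a rough, hedged sketch of the same kind of comparison, timing only and again assuming fast_dot lives in sklearn.utils.extmath, one could do something like:

```python
import timeit

import numpy as np
from sklearn.utils.extmath import fast_dot  # location assumed

rng = np.random.RandomState(0)

# Compare np.dot and fast_dot across a few shapes / sizes.
for n, m in [(500, 500), (2000, 500), (5000, 1000)]:
    A = rng.random_sample((n, m))
    B = rng.random_sample((m, n))
    t_np = min(timeit.repeat(lambda: np.dot(A, B), number=3, repeat=3))
    t_fd = min(timeit.repeat(lambda: fast_dot(A, B), number=3, repeat=3))
    print("%d x %d: np.dot %.3fs, fast_dot %.3fs" % (n, m, t_np, t_fd))
```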
Can you open a new PR without fast_dot but just the ICA improvements?
Yes, that's possible. I think there aren't so many in this PR. Oh yes, the logcosh ....
FIXES: dot products
ENH: fix BLAS, get tests running, reduce MEM
COSMITS + FIX version issue
ENH: equalize call signatures, address API
ENH: improve impose f order
ENH: return np.dot if blas not available
ENH catch attribute error
ENH: remove debugging support statements + checks
ENH: more fast dots
FIX/STY: addressing @agramfort's comments
ENH: better 2d handling
COSMIT what's new
COSMITS
ENH: better tests
ENH: another fast_dot
Closing this PR to create two separate ones: fast_dot only + ICA improvements.
Hi there, I finally got it running.
This implements a feature "advocated" on this scipy page (section on large arrays and linalg):
http://wiki.scipy.org/PerformanceTips
When directly calling BLAS instead of np.dot it's possible to avoid copying when data are passed in F-contiguous order. In addition, I've added chunking to the _logcosh function, which avoids an extra copy. This is now how it looks on 1 GB of testing data:
This was how the same test would have looked on the current master (plot from the last memory PR):
To make this functionality available for other use cases I've added a fast_dot function to utils.extmath, with almost stupid but explicit tests that exemplify the mapping between np.dot and fast_dot, which can be a hell. Finally, I've made sure that downstream applications still work. For example, with this local branch the mne-python ICA looks as good as it did before.
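For readers unfamiliar with the trick, here is a minimal sketch of the underlying idea using scipy's get_blas_funcs (an illustration under these assumptions, not the PR's exact fast_dot code): since the transpose of a C-contiguous array is an F-contiguous view, gemm can be fed the transposes together with the trans flags and compute the product without making Fortran-ordered copies of the inputs.

```python
import numpy as np
from scipy.linalg import get_blas_funcs

rng = np.random.RandomState(0)
A = rng.random_sample((1000, 200))  # C-contiguous, NumPy's default layout
B = rng.random_sample((200, 300))

# gemm computes alpha * op(a) @ op(b). A.T and B.T are F-contiguous views,
# so passing them with trans_a/trans_b avoids copying A and B to Fortran order.
gemm, = get_blas_funcs(('gemm',), (A, B))
C = gemm(1.0, A.T, B.T, trans_a=True, trans_b=True)

np.testing.assert_array_almost_equal(C, np.dot(A, B))
```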
cc @agramfort @GaelVaroquaux @mblondel