
[MRG]: add fast_dot function calling BLAS directly and consume only twice the memory of your data #2248

Closed · wants to merge 1 commit

Conversation

@dengemann (Contributor)

Hi there, I finally got it running.

This implements a feature "advocated" on this scipy page (section on large arrays and linalg):

http://wiki.scipy.org/PerformanceTips

When calling BLAS directly instead of np.dot, it's possible to avoid copying when data are passed in F-contiguous order. In addition, I've added chunking to the _logcosh function, which avoids an extra copy.
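For illustration, here is a minimal sketch of the trick, not the exact code in this PR: scipy exposes the BLAS gemm routine directly, and since the transpose of a C-contiguous array is an F-contiguous view, BLAS can be handed Fortran-ordered memory without any copy (the name blas_dot is mine, for illustration only):

import numpy as np
from scipy.linalg import get_blas_funcs

def blas_dot(A, B):
    """np.dot(A, B) via BLAS gemm, copy-free for C-contiguous inputs."""
    # Pick the gemm flavour matching the dtypes of A and B (sgemm, dgemm, ...).
    gemm, = get_blas_funcs(('gemm',), (A, B))
    # A.T and B.T are F-contiguous views of C-contiguous A and B (no copy);
    # telling gemm to transpose them back yields op(A.T) @ op(B.T) == A @ B.
    return gemm(1.0, A.T, B.T, trans_a=True, trans_b=True)

rng = np.random.RandomState(0)
A, B = rng.randn(300, 400), rng.randn(400, 200)
assert np.allclose(blas_dot(A, B), np.dot(A, B))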
This is now how it looks on 1 GB of test data:

[plot: fast_dot_chunking_logcosh]

This is how the same test looked on the current master (plot from the last memory PR):

[plot: memory_ica_par_w1_computation_del_gwx]

To make this functionality available for other use cases, I've added a fast_dot function to utils.extmath, with almost stupid but explicit tests that exemplify the mapping between np.dot and fast_dot, which can be hell.
Finally I"ve made sure that down-stream applications are still workin. For example with this local branch the mne-python ICA looks as good as it had looked before.

cc @agramfort @GaelVaroquaux @mblondel

"""Compute fast dot products directly calling blas.

This function calls BLAS directly while warranting
that Fortran contiguity.
Contributor: Fortran contiguity. [ no "that" ]

Contributor Author: Oops.

@pgervais (Contributor)

There are two fundamental flaws in the current design that I think should be addressed before merging:

  • fast_dot output depends on the ordering of the input arrays. This is highly implicit, and different from the usual NumPy behaviour.
  • fast_dot and numpy.dot must have the same signature (for consistency). fast_dot should just be a faster / leaner version of np.dot (see the sketch below).
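One way to meet both constraints, sketched under the assumption that gemm is available via scipy (this is not the PR's final code):

import numpy as np
from scipy.linalg import get_blas_funcs

def fast_dot(A, B):
    """Drop-in for np.dot(A, B): same signature, layout-independent output."""
    gemm, = get_blas_funcs(('gemm',), (A, B))
    # gemm returns Fortran-ordered results; computing B.T @ A.T and
    # transposing gives A @ B as a C-ordered array whatever the input
    # layout (F-ordered inputs cost an internal copy, but the output
    # no longer depends on how the inputs happened to be stored).
    return gemm(1.0, B.T, A.T).T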

@GaelVaroquaux (Member)

Agreed with @pgervais' comment on the design flaws.

In addition, the Travis tests fail.

@dengemann (Contributor, Author)

> Agreed with @pgervais' comment on the design flaws.

see inline discussion above ...

> In addition, the Travis tests fail.

It would be great to know more about the Travis environment; their BLAS module seems to be somewhat older / different from what I have on my machine, grrr.

@GaelVaroquaux (Member)

> their BLAS module seems to be somewhat older / different from what I have on my machine, grrr.

Yeah, users tend to do that :)

@dengemann (Contributor, Author)

> Yeah, users tend to do that :)

LOL

@dengemann (Contributor, Author)

@GaelVaroquaux @pgervais my recent commit addresses our discussion; tests are passing on my box. I now need to find a way to deal with BLAS versions.
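A hedged sketch of the version handling, with hypothetical names and not necessarily what the commit does: probe the scipy wrappers once at import time and degrade to plain np.dot when no usable gemm is found.

import numpy as np

try:
    from scipy.linalg import get_blas_funcs
    # Probe once: older scipy / BLAS builds may lack the wrapper we need.
    _gemm, = get_blas_funcs(('gemm',), (np.empty((2, 2)), np.empty((2, 2))))
except (ImportError, AttributeError, ValueError):
    _gemm = None

def fast_dot(A, B):
    if _gemm is None:
        return np.dot(A, B)  # safe fallback on exotic BLAS setups
    gemm, = get_blas_funcs(('gemm',), (A, B))  # match input dtypes
    return gemm(1.0, B.T, A.T).T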

if X.flags.c_contiguous:
    return array2d(X.T, copy=False, order='F'), True
else:
    return array2d(X, copy=False, order='F'), False
Contributor: array2d makes a copy if the order changes. Is that what you intend here?

Contributor: Sorry, I got it wrong. This code works.

@dengemann (Contributor, Author)

@agramfort addressed your comments.

@dengemann (Contributor, Author)

@agramfort seems we"re finally done here.

@dengemann (Contributor, Author)

@agramfort rebased. Travis won't be happy, though, unless master is fixed.

@dengemann (Contributor, Author)

Rebased, updating to MRG

def test_fast_dot():
    """Check fast dot blas wrapper function"""
    A = np.random.random([2, 10])
    B = np.random.random([2, 10])
Member:

Please never use the np.random singleton in tests; rather do:

rng = np.random.RandomState(42)
A = rng.random_sample([2, 10])
B = rng.random_sample([2, 10])

The goal is to have all the tests run free of side effects, to make them order-independent.

@ogrisel (Member) commented Jul 27, 2013

You have to check with numpy 1.7.1, but I think this optimization has already been included in upstream numpy:

>>> import numpy as np
>>> from sklearn.utils.extmath import fast_dot
>>> A, B = np.random.normal(size=(1000, 1000)), np.random.normal(size=(1000, 1000))
>>> %timeit _ = fast_dot(A, B)
10 loops, best of 3: 71.9 ms per loop
>>> %timeit _ = np.dot(A, B)
10 loops, best of 3: 66.5 ms per loop

I am using a numpy built against the OS X 10.8 system BLAS. Which BLAS do you have? MKL?

@@ -199,12 +199,12 @@ def test_fit_transform():
     X = rng.random_sample((100, 10))
     for whiten, n_components in [[True, 5], [False, 10]]:

-        ica = FastICA(n_components=5, whiten=whiten, random_state=0)
+        ica = FastICA(n_components=n_components, whiten=whiten, random_state=0)
Member:

This fix should be cherry-picked in another PR to be merged to master before the release.

@dengemann (Contributor, Author)

Thanks @ogrisel, I addressed the testing issue in a recent commit. My BLAS should be MKL, binaries from Canopy64.

@dengemann (Contributor, Author)

@pgervais @agramfort @ogrisel @GaelVaroquaux @vene / whoever has a few minutes to spend on this:

With this gist:

https://gist.github.com/dengemann/6094449

You can produce this plot:

[plot: fast_dot_profile]

It would be good to know whether you can get similar results on your machines across shapes / sizes.
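If you cannot run the gist, here is a rough stand-in sketch (the shapes are arbitrary, and blas_dot refers to the illustrative helper earlier in this thread, not the PR's code):

import timeit
import numpy as np

rng = np.random.RandomState(0)
for n_samples, n_features in [(1000, 1000), (5000, 500), (500, 5000)]:
    A = rng.randn(n_samples, n_features)
    B = rng.randn(n_features, n_samples)
    t_np = timeit.timeit(lambda: np.dot(A, B), number=5)
    t_fd = timeit.timeit(lambda: blas_dot(A, B), number=5)
    print('%5d x %5d  np.dot: %.3fs  blas_dot: %.3fs'
          % (n_samples, n_features, t_np, t_fd))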

@dengemann (Contributor, Author)

Here is my second benchmark, for n_features, n_samples = 5e3, 5e3:

[plot: fast_dot_square_matrix]

@agramfort (Member)

Can you open a new PR without fast_dot, with just the ICA improvements?

@dengemann (Contributor, Author)

> Can you open a new PR without fast_dot, with just the ICA improvements?

Yes, that"s possible. I think there aren"t so many in this PR. Oh yes, the logcosh ....
I fear this is to interwoven to cherry pick, need to do it manually

Commit messages:

  • FIXES: dot products
  • ENH: fix BLAS, get tests running, reduce MEM
  • COSMITS + FIX version issue
  • ENH: equalize call signatures, address API
  • ENH: improve impose f order
  • ENH: return np.dot if blas not available
  • ENH: catch attribute error
  • ENH: remove debugging support statements + checks
  • ENH: more fast dots
  • FIX/STY: addressing @agramfort's comments
  • ENH: better 2d handling
  • COSMIT
  • what's new
  • COSMITS
  • ENH: better tests
  • ENH: another fast_dot
@dengemann (Contributor, Author)

Closing this PR to create two separate ones: fast_dot only + ICA improvements.
