Improvement to reduce memory usage on large audio files #6

Mickle-Mouse · 2023-06-23T04:42:11Z

The np.diag(sigma2) call in __ndlp uses O(n^2) memory, where n (the length of sigma2) grows linearly with signal (audio) length. I believe this can be fixed by replacing a matrix multiply with an element-wise multiply. Two things to note:
(1) np.dot on two 2D arrays is a matrix multiply.
(2) np.dot(A, np.diag(B)) = matmul(A, np.diag(B)) = A * B[np.newaxis, :]
(2.1) To elaborate on why (2) holds: np.diag(B) is zero everywhere off the diagonal, so entry (i, j) of the product is sum_k A[i, k] * np.diag(B)[k, j] = A[i, j] * B[j]. In other words, column j of A is scaled by B[j], which is exactly what the element-wise (broadcast) multiply computes.
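
To put a number on the memory claim, here is a rough sketch comparing the storage of the np.diag form with the plain vector. The helper name diag_vs_vector_bytes and the size 2000 are made up for illustration; real values of n depend on the audio length and STFT settings.

```python
import numpy as np

def diag_vs_vector_bytes(n):
    """Return (bytes used by np.diag(sigma2), bytes used by sigma2) for n float64 weights."""
    sigma2 = np.ones(n)  # stand-in for the real per-frame weights
    return np.diag(sigma2).nbytes, sigma2.nbytes

# Hypothetical frame count, chosen only to make the quadratic growth visible.
dense_bytes, vector_bytes = diag_vs_vector_bytes(2000)
print(dense_bytes, vector_bytes)
```

The dense form stores n * n values while the vector stores n, so doubling the audio length roughly quadruples the memory held by np.diag(sigma2).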

Numerical check (run it as often as you want to verify):
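import numpy as np
tmp1 = np.random.rand(2, 8)
tmp2 = np.random.rand(8)
res1 = tmp1 @ np.diag(tmp2)      # matrix multiply with an explicit diagonal matrix
res2 = tmp1 * tmp2[None, :]      # element-wise multiply with broadcasting
print(np.allclose(res1, res2))   # True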
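
Since __ndlp works on complex time-frequency data, the same check can be repeated on complex input and on the full product that builds x_cor_m. This is only a sketch: the sizes (6 rows, 50 frames) and the random data are assumptions for illustration, not values taken from the library.

```python
import numpy as np

rng = np.random.default_rng(0)
rows, n = 6, 50  # hypothetical sizes: rows plays the role of channels * p, n the number of frames
frames = rng.standard_normal((rows, n)) + 1j * rng.standard_normal((rows, n))
sigma2 = rng.random(n) + 0.1  # positive real weights, like the inverse variances

# Original formulation: materializes an n-by-n diagonal matrix.
dense = np.dot(np.dot(frames, np.diag(sigma2)), np.conj(frames.T))
# Proposed formulation: scales each column of frames by its weight.
broadcast = np.dot(frames * sigma2[None, :], np.conj(frames.T))

same = np.allclose(dense, broadcast)
print(same)
```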

Code to be modified:

def __ndlp(self, xk):
        """Variance-normalized delayed linear prediction
        Here is the specific WPE algorithm implementation. The input should be
        the reverberant time-frequency signal in a single frequency bin and 
        the output will be the dereverberated signal in the corresponding 
        frequency bin.
        Args:
            xk: A 2-dimension numpy array with shape=(frames, input_channels)
        Returns:
            A 2-dimension numpy array with shape=(frames, output_channels)
        """
                
        cols = xk.shape[0] - self.d
        xk_buf = xk[:,0:self.out_num]
        xk = np.concatenate(
            (np.zeros((self.p - 1, self.channels)), xk),
            axis=0)
        xk_tmp = xk[:,::-1].copy()
        frames = stride_tricks.as_strided(
            xk_tmp,
            shape=(self.channels * self.p, cols),
            strides=(xk_tmp.strides[-1], xk_tmp.strides[-1]*self.channels))
        frames = frames[::-1]
        sigma2 = np.mean(1 / (np.abs(xk_buf[self.d:]) ** 2), axis=1)
        
        for _ in range(self.iterations):
            x_cor_m = np.dot(
                # Replaces np.dot(frames, np.diag(sigma2)), which builds an n-by-n matrix.
                frames * sigma2[None, :],
                np.conj(frames.T))
            x_cor_v = np.dot(
                frames, 
                np.conj(xk_buf[self.d:] * sigma2.reshape(-1, 1)))
            coeffs = np.dot(np.linalg.inv(x_cor_m), x_cor_v)
            dk = xk_buf[self.d:] - np.dot(frames.T, np.conj(coeffs))
            sigma2 = np.mean(1 / (np.abs(dk) ** 2), axis=1)
        return np.concatenate((xk_buf[0:self.d], dk))
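
As a further sanity check on the modified x_cor_m line, the sketch below compares the broadcast form against a slow reference that sums one weighted outer product per frame. Both function names (weighted_correlation, weighted_correlation_reference) and the test sizes are hypothetical and not part of the library.

```python
import numpy as np

def weighted_correlation(frames, sigma2):
    """Hypothetical helper: frames @ diag(sigma2) @ frames^H without forming diag(sigma2)."""
    return np.dot(frames * sigma2[None, :], np.conj(frames.T))

def weighted_correlation_reference(frames, sigma2):
    """Slow reference: sum over frames t of sigma2[t] * f_t f_t^H."""
    rows, n = frames.shape
    acc = np.zeros((rows, rows), dtype=complex)
    for t in range(n):
        f = frames[:, t]
        acc += sigma2[t] * np.outer(f, np.conj(f))
    return acc

rng = np.random.default_rng(1)
frames = rng.standard_normal((4, 30)) + 1j * rng.standard_normal((4, 30))
sigma2 = rng.random(30) + 0.1  # positive real weights

fast = weighted_correlation(frames, sigma2)
slow = weighted_correlation_reference(frames, sigma2)
print(np.allclose(fast, slow))
```

Note that frames * sigma2[None, :] still allocates a temporary with the same shape as frames, so the extra memory is linear in n rather than quadratic.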
