Improvement to reduce memory usage on large audio files #6

Open
Mickle-Mouse opened this issue Jun 23, 2023 · 0 comments

The np.diag(sigma2) call in __ndlp allocates a dense n x n matrix, so it uses O(n^2) memory, where n (the length of sigma2) grows linearly with the signal (audio) length. I believe this can be fixed by replacing a matrix multiply with an element-wise multiply. Two things to note:
(1) np.dot on two 2D arrays is interpreted as a matrix multiply.
(2) np.dot(A, np.diag(B)) = np.matmul(A, np.diag(B)) = A * B[np.newaxis, :]
(2.1) To elaborate on why the above holds: entry (i, j) of the product is the sum over k of A[i, k] * np.diag(B)[k, j], and np.diag(B)[k, j] is zero unless k == j. Only the term A[i, j] * B[j] survives, so column j of A is simply scaled by B[j]. That is exactly an element-wise multiply with B broadcast across the rows of A.

Numerical check (run it as often as you want to verify):
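import numpy as np

tmp1 = np.random.rand(2, 8)
tmp2 = np.random.rand(8)
res1 = tmp1 @ np.diag(tmp2)    # matrix multiply with a dense diagonal matrix
res2 = tmp1 * tmp2[None, :]    # element-wise multiply with broadcasting
print(np.allclose(res1, res2))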
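
To put a rough number on the memory difference, here is a small sketch comparing the size of the dense diagonal matrix with the broadcast view of the weight vector. The value of `n` is a made-up frame count for illustration only; it is not taken from the project.

```python
import numpy as np

n = 2_000  # hypothetical number of frames, chosen only for illustration
sigma2 = np.random.rand(n)  # float64 weights, one per frame

dense = np.diag(sigma2)   # (n, n) array that is almost entirely zeros
row = sigma2[None, :]     # (1, n) view of sigma2, no data is copied

print(dense.nbytes)  # n * n * 8 bytes
print(row.nbytes)    # n * 8 bytes
```

The dense matrix grows quadratically with `n`, while the broadcast view stays linear, which is why the element-wise form should scale much better on long recordings.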

Code to be modified (the changed line is marked with a comment):

def __ndlp(self, xk):
        """Variance-normalized delayed linear prediction
        Here is the specific WPE algorithm implementation. The input should be
        the reverberant time-frequency signal in a single frequency bin and 
        the output will be the dereverberated signal in the corresponding 
        frequency bin.
        Args:
            xk: A 2-dimension numpy array with shape=(frames, input_channels)
        Returns:
            A 2-dimension numpy array with shape=(frames, output_channels)
        """
                
        cols = xk.shape[0] - self.d
        xk_buf = xk[:,0:self.out_num]
        xk = np.concatenate(
            (np.zeros((self.p - 1, self.channels)), xk),
            axis=0)
        xk_tmp = xk[:,::-1].copy()
        frames = stride_tricks.as_strided(
            xk_tmp,
            shape=(self.channels * self.p, cols),
            strides=(xk_tmp.strides[-1], xk_tmp.strides[-1]*self.channels))
        frames = frames[::-1]
        sigma2 = np.mean(1 / (np.abs(xk_buf[self.d:]) ** 2), axis=1)
        
        for _ in range(self.iterations):
            x_cor_m = np.dot(
                    #np.dot(frames, np.diag(sigma2)),  # REPLACE THIS LINE WITH THE FOLLOWING
                    frames * sigma2[None, :],
                    np.conj(frames.T))
            x_cor_v = np.dot(
                frames, 
                np.conj(xk_buf[self.d:] * sigma2.reshape(-1, 1)))
            coeffs = np.dot(np.linalg.inv(x_cor_m), x_cor_v)
            dk = xk_buf[self.d:] - np.dot(frames.T, np.conj(coeffs))
            sigma2 = np.mean(1 / (np.abs(dk) ** 2), axis=1)
        return np.concatenate((xk_buf[0:self.d], dk))
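
As a quick sanity check of the changed line in isolation, the sketch below compares the old and new expressions for x_cor_m on complex input. A random complex array stands in for the strided `frames` array, and `rows` and `cols` are hypothetical sizes (playing the roles of `self.channels * self.p` and the frame count); none of these values come from the project.

```python
import numpy as np

rng = np.random.default_rng(0)
rows, cols = 6, 50  # hypothetical sizes: channels * p, and number of frames

# Random complex stand-in for the strided `frames` array in __ndlp
frames = rng.standard_normal((rows, cols)) + 1j * rng.standard_normal((rows, cols))
sigma2 = rng.random(cols)  # real-valued weights, one per frame

# Original expression: builds a dense (cols, cols) diagonal matrix
x_cor_m_old = np.dot(np.dot(frames, np.diag(sigma2)), np.conj(frames.T))

# Proposed expression: scales each column of frames by its weight
x_cor_m_new = np.dot(frames * sigma2[None, :], np.conj(frames.T))

print(np.allclose(x_cor_m_old, x_cor_m_new))
```

Both expressions produce the same (rows, rows) matrix; only the intermediate (cols, cols) allocation goes away.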