Implementation - Find and Fill Method for Dropout Layer #3684
Conversation
Should we split this one into two implementations, like #3683? The numbers in #3662 show that this implementation is actually a slowdown, although the implementation here is general enough to be used with Bandicoot too.
(Requesting changes just because I don't want to merge a slowdown without a bit of discussion)
It is up to you. To my understanding, dropout happens once per epoch; while there is a slowdown, it is not going to affect the overall speed of training.
It will happen every step of the forward pass, so the slowdown may be noticeable. One quick test would just be to run the
Okay, it makes sense then to do the benchmark.
@rcurtin @shrit Absolutely, I'm on the same page. I was actually leaning towards running additional benchmarks for that exact reason. I've got a few ideas to potentially optimize the algorithm, and I'm eager to test them out in some upcoming benchmarks. Fingers crossed, I should be able to run them later today :)
@rcurtin @shrit The new algorithm appears slightly faster than the old one when processing small matrices, but significantly slower with larger matrices (also visible in the graph in the discussion above). If it's not necessary to replace it, I wouldn't. However, if we need to eliminate the transform function, this could be a viable alternative. For mnist_simple:
For mnist_cnn:
@MarkFischinger Okay, I see the benchmark; in this case it would make sense to have both implementations. Please update this PR following a structure similar to what was done in #3683.
@shrit In the latest commit, I've implemented both algorithms using a structure similar to what you did in #3683. I have a question about that: why do we wrap the entire function within the #ifdef directive? Could it be placed inside the function instead? Is there a specific reason for this that I might not know about yet? I'd appreciate understanding your reasoning behind that :)
@MarkFischinger The other implementation that you added is for the Bandicoot library. It is a linear algebra library that we are developing (it has the same API as Armadillo) to provide GPU acceleration for matrix operations.
@shrit Thanks for your response! :) Sorry if my initial question wasn't clear. I've understood the general concept, but I'm specifically curious about the decision to place the #ifdef around the entire function. My first thought was to implement it directly inside the function, like this (for #3683, for example):

template<typename MatType>
void LogSoftMaxType<MatType>::ForwardImpl(const MatType& input,
                                          MatType& output)
{
  MatType maxInput = repmat(max(input), input.n_rows, 1);
  output = (maxInput - input);
  ApplyExponentialFunction(output);
  maxInput.each_row() = log(sum(output));
  output = input - maxInput;
}

template<typename MatType>
void LogSoftMaxType<MatType>::ApplyExponentialFunction(MatType& output)
{
#ifdef MLPACK_HAS_COOT // <-- Inside the function like this
  if constexpr (coot::is_coot_type<MatType>::value)
  {
    output = exp(output * -1);
  }
  else
#endif
  {
    // Approximation of the base-e exponential function. Credits to Leon Bottou.
    output.transform([](double x)
    {
      static constexpr double A0 = 1.0, A1 = 0.125, A2 = 0.0078125,
                              A3 = 0.00032552083, A4 = 1.0172526e-5;
      if (x < 13.0)
      {
        double y = A0 + x * (A1 + x * (A2 + x * (A3 + x * A4)));
        y *= y; y *= y; y *= y;
        return 1 / y;
      }
      return 0.0;
    });
  }
}

I couldn't find any best practices in C++ on this topic on the web. Do you know which is the preferred implementation?
@MarkFischinger we cannot use
Looking good, just some comments on the multiple implementations whenever you have a chance. 👍
 * @param input Input data used for evaluating the specified function.
 * @param output Resulting output activation.
 */
void ForwardImpl(const MatType& input, MatType& output);
You'll have to use SFINAE here, e.g. add an argument like const typename std::enable_if_t<arma::is_arma_type<MatType>::value>::type* = 0
(I just did that from memory, it may not be exactly right).
@@ -74,10 +74,18 @@ DropoutType<MatType>::operator=(DropoutType&& other)
  return *this;
}
No need for an extra line 👍
Also, this one is still missing.
 */
void ForwardImpl(const MatType& input, MatType& output);

#ifdef MLPACK_HAS_COOT
I don't think we actually need this guard here: we want to use the optimized implementation when we have an Armadillo matrix, and the general implementation otherwise. So long as the second implementation of ForwardImpl() doesn't use the name coot:: inside of it (and it is possible to avoid that), we should be able to avoid the need for MLPACK_HAS_COOT, and just use SFINAE here too with a negated condition (i.e. std::enable_if_t<!arma::is_arma_type< ... >).
mask.randu(input.n_rows, input.n_cols);
arma::uvec indices = arma::find(mask > ratio);
mask.zeros();
mask.elem(indices).fill(1);
Suggested change:
- mask.elem(indices).fill(1);
+ mask.elem(find(mask > ratio)).fill(1);
Would this work? It avoids the use of arma:: here, and would make this function fully general. But I think then we would need to find a way to get rid of the .zeros() call... maybe we can just do mask = (mask > ratio)? Or something along those lines? (It would be interesting to see the speed of that approach.)
@rcurtin You can take a look at my implementation in the latest commit. It should be a bit faster and is fully functional. I avoided using arma:: to keep it more general.
Nice, it looks good to me! Thanks for doing that.
@@ -74,10 +74,18 @@ DropoutType<MatType>::operator=(DropoutType&& other)
  return *this;
}
Also, this one is still missing.
template<typename T = MatType, typename std::enable_if_t<arma::is_arma_type<T>::value, int> = 0>
void ForwardImpl(const T& input, T& output);

/**
 * General implementation of the forward pass of the dropout layer.
 *
 * @param input Input data used for evaluating the specified function.
 * @param output Resulting output activation.
 */
template<typename T = MatType, typename std::enable_if_t<!arma::is_arma_type<T>::value, int> = 0>
void ForwardImpl(const T& input, T& output);
@MarkFischinger Looks better; just use MatType directly instead of T. It is better for readability, because T does not mean a lot. Also, could you break the line? The mlpack code base usually keeps lines under 80 characters. The syntax is correct, but I would remove the int and instead add a * after the >, so the signature looks as follows:

typename std::enable_if_t<!arma::is_arma_type<T>::value>* = 0>

The reason for this is to have the same signature across the whole code base, making it easier for everyone.
This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍
for (size_t i = 0; i < input.n_rows; ++i) {
  for (size_t j = 0; j < input.n_cols; ++j) {
    mask(i, j) = (mask(i, j) > this->ratio) ? 1.0 : 0.0;
  }
Suggested change:
- for (size_t i = 0; i < input.n_rows; ++i) {
-   for (size_t j = 0; j < input.n_cols; ++j) {
-     mask(i, j) = (mask(i, j) > this->ratio) ? 1.0 : 0.0;
-   }
+ for (size_t i = 0; i < input.n_rows; ++i)
+ {
+   for (size_t j = 0; j < input.n_cols; ++j)
+   {
+     mask(i, j) = (mask(i, j) > this->ratio) ? 1.0 : 0.0;
+   }
for (size_t i = 0; i < input.n_rows; ++i) {
  for (size_t j = 0; j < input.n_cols; ++j) {
    mask(i, j) = (mask(i, j) > this->ratio) ? 1.0 : 0.0;
  }
}
Could you update it as above to match the mlpack style?
mask.randu(input.n_rows, input.n_cols);
#pragma omp parallel for collapse(2)
for (size_t i = 0; i < input.n_rows; ++i)
{
  for (size_t j = 0; j < input.n_cols; ++j)
  {
    mask(i, j) = (mask(i, j) > this->ratio) ? 1.0 : 0.0;
  }
}
output = input % mask * this->scale;
}
}
@MarkFischinger It looks to me as if it is the same implementation when using Armadillo or Bandicoot. If this is the case, then it is fine, no worries; I would just keep one of them and remove the SFINAE and the other overload, since we no longer use the transform function. Let me know if I overlooked something.
This would be a really bad implementation for Bandicoot, but we can handle that some other time. (Elementwise accesses cause GPU/CPU transfers and are painful.)
Here are the benchmarks for the updated forward dropout layer implementation (OpenMP Find and Fill) on an MNIST benchmark:
Original Version
@MarkFischinger This is still failing; can you make sure that it compiles on your machine?
@MarkFischinger Can you check out master and pull the latest version, then go back to this branch and merge master into it? It will show you the conflicts with master, so you can resolve them and push here again.
Force-pushed (…and dropout_impl.hpp): compare 5167b23 to 5027de1
@rcurtin nothing to add on my end, let us get this one merged
Awesome, thanks for the fixes 👍
Following our discussion in issue #3662, I've implemented the new algorithm in the dropout layer.