Memory leak and crash, now and 2 years ago, tfjs-node #8326
Hi, @borodadada. I apologize for the delay in my response, and thank you for bringing this issue to our attention. As far as I know, to avoid a memory leak you'll have to use tf.tidy, which executes the provided function fn and, after it has executed, cleans up all intermediate tensors it allocated. Using this method helps avoid memory leaks; in general, wrap calls to operations in tf.tidy() for automatic memory cleanup. NOTE: variables do not get cleaned up inside a tidy. You can also use tf.memory, which returns memory info at the current point in the program. Could you please give it a try after adding tf.tidy and tf.dispose to your code and see whether the memory leak still happens? If I have missed something here, please let me know. Thank you for your cooperation and patience.
I don't understand where the memory leak comes from: all the algorithm needs to do is change the coefficients, then take the feedback and compare the result. It should run endlessly without leaking memory.
You have to dispose the xs and ys after the fit step. Otherwise tfjs creates new tensors in every loop iteration, and they fill up the memory with each step if you don't dispose them.
Using my example, can you show how it should be?
it should be like this
This way the tensors become unusable and are freed by tfjs. By the way, that doesn't mean there isn't some other memory leak. I stumbled across your comment because I'm also hunting a memory issue, but disposing unused tensors will at least rule out one possible cause.
I am writing to you about exactly this: execution never reaches that point.
That part of the code doesn't work, because it would only fire after the fit function returns.
If you have the opportunity to run the code, you will see it for yourself. I'll run the test with your amendments now and write back a little later.
Oh sorry, my fault. I didn't notice the number of epochs. Then you're right, and it looks like some internal problem. I'll try it out and see what happens.
Thank you, I'll wait for the result; this problem is bothering me a lot.
The problem has not gone anywhere: two years ago and now, it's the same thing. Back then you could see the leak visually in the task manager, with the process constantly growing in memory. Now that's no longer the case and everything looks fine, but the result is the same: after a while everything crashes. I managed to take screenshots at the moment it started. The screenshots show my working environment, NOT the test I posted; the test itself is as close to it and as simplified as possible, and I will post information on the test later.
I'm not doing anything, just taking screenshots.
There are 4 identical programs running on the computer, 4 copies, and one of them begins to fail. This happens when the number of epochs reaches the millions. For the test you can run just one copy. On a modern processor the procedure usually takes 4-6 hours; on an old one, more than a day.
The memory leak starts, and it happens quickly, as can be seen in the screenshot
The process; note the 3 other processes, usually 100 to 200 megabytes in size
Full memory
After
All Node.js processes have closed
Logs: there is nothing in them, they are empty; the editor is open
The test code is simple; just copy and paste:
TEST CODE
System information
Okay, these are the results from the test code:
modern PC, Intel 13700: crash after 4.4 million epochs
old PC, Intel 3770: crash after 4.4 million epochs - Windows 10 x64, Node.js 20.10.0
I can't do my calculations because the program always crashes, and I need many more epochs than this! I really hope you fix this; it's a disaster that this bug hasn't been fixed for years!