Handle network instability during file upload in the Editor #2134

Open
Tracked by #2245
vchendrix opened this issue May 17, 2023 · 11 comments
Labels: bug, editor, ESS-DIVE (Issues associated with the ESS-DIVE project)

@vchendrix
Collaborator

Describe the bug
There seems to be an issue with data file uploads: the upload fails consistently for files of 150 MB or larger. I tried this out on dev.nceas.ucsb.edu.

To Reproduce
Steps to reproduce the behavior:

  1. Go to dev.nceas.ucsb.edu
  2. Click on Submit
  3. Upload a 200 MB file over wifi with an upload speed of ~40 Mbps (with and without VPN)
  4. See error
  5. Upload a couple of smaller files (they should succeed)
  6. Save dataset

At this point your permissions are lost and you are no longer able to edit the dataset.

Expected behavior
I should be able to upload files of several GB with no problems over a strong wifi connection. I have not had any problems uploading files over wifi using other web applications. I do wireless backups to the cloud, upload to Google Drive, and submit data via curl to Metacat with no problems, all using the same wireless connection.

Screenshots
On NCEAS dev ui
Screenshot 2023-05-17 at 12 09 51 PM

On nceas dev during upload
Screenshot 2023-05-17 at 1 56 14 PM

On nceas after save
I was able to upload two 100 MB files but the 200 MB file failed
Screenshot 2023-05-17 at 12 27 34 PM

Desktop (please complete the following information):

  • OS: MacOS
  • Browser: Chrome
  • Version: Version 113.0.5672.92 (Official Build) (x86_64)

Additional context
I plugged my laptop directly into the router and had no problems uploading files. I was able to upload files of up to 1 GB (see https://dev.nceas.ucsb.edu/view/urn:uuid:052ac749-d225-40f4-a72a-6f2d5c3968ff).

So, this might be an issue in how metacatui handles uploading from flaky wifi connections. We are seeing an increasing number of users having this problem on ESS-DIVE.

I think there are at least two major issues:

  1. Not handling upload errors gracefully (missing resource map, loss of permissions)
  2. Unreliable uploads even over strong wifi connections
@vchendrix vchendrix added bug ESS-DIVE Issues associated with the ESS-DIVE project labels May 17, 2023
@mbjones
Member

mbjones commented May 17, 2023

I just tried this on dev.nceas.ucsb.edu, first with a single 1 GB file, and then with two 1 GB files. They all uploaded fine, with no errors, from my home wifi with a cable modem connection rated at 300 Mbps down/10 Mbps up. I did this on Mac with FF 112.0.2. Could be browser differences. I think figuring out the problem is going to take some sleuthing. @vchendrix are there any errors or warnings in the browser JavaScript console that give additional info we could explore?

image image

@vchendrix
Collaborator Author

vchendrix commented May 17, 2023

Using Safari (MacOS 13.2.1)

Screenshot 2023-05-17 at 2 44 37 PM Screenshot 2023-05-17 at 2 45 19 PM

@vchendrix
Collaborator Author

Firefox. Not really any interesting messages

Screenshot 2023-05-17 at 2 54 51 PM

@mbjones
Member

mbjones commented May 17, 2023

All of those console log errors are connection lost or connection reset types of messages. I think tracking down why the connection was reset will be critical to figuring out the cause. The app failure is a side effect of the connection reset.

@vchendrix
Collaborator Author

All of those console log errors are connection lost or connection reset types of messages. I think tracking down why the connection was reset will be critical to figuring out the cause. The app failure is a side effect of the connection reset.

Agreed. I am trying to see if there is something in my OS network settings. One thing to note is that only my wifi has the problem. A wired connection to my router works fine.

@robyngit
Member

@vchendrix Did you determine if this issue and issue #2135 are local network or VPN problems? It looks like MetacatUI gave an appropriate error message. How would you envision the Metadata Editor handling these issues more gracefully?

@mbjones
Member

mbjones commented May 25, 2023

@robyngit We discussed this in the ESS-DIVE meeting; let's chat tomorrow on our dev call about issues to consider.

@robyngit
Member

robyngit commented May 25, 2023

During today's dev call, Matt provided more details on this issue, including ways in which we could handle network instability better.

To lessen the likelihood of web connection errors in the first place, a longer-term solution could involve adopting newer technologies, such as HTTP/2 (which allows multiple data transfers to occur simultaneously on a single connection) or web sockets (which allow a more reliable connection between the user and the server). However, these improvements would require changes to both MetacatUI and our server-side systems.

At a minimum, we should improve MetacatUI's ability to handle interrupted file transfers more gracefully. The objective for this issue is to retain the state of the user's data package in the event of a network interruption during upload. Here's what that might look like:

  • Even if the upload fails, a user should still be able to save the metadata entered thus far, along with any files that have been successfully uploaded.
  • Failed uploads should be removed from the resource map, and the resource map should not get lost or become corrupted/invalid. Permissions should not be changed.
  • The user should have the option to retry failed uploads. (or perhaps MetacatUI should automatically retry failed uploads?)
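
A minimal sketch of the first two points, written in Python for brevity rather than in MetacatUI's own JavaScript. Every name here (`UploadStatus`, `PackageMember`, `split_members`, `resource_map_ids`) is hypothetical and used only for illustration; none of it is MetacatUI code. The idea is that only members whose upload completed are listed in the resource map, so a failed upload cannot leave a dangling entry, while the unfinished ones are kept aside for a later retry.

```python
from dataclasses import dataclass
from enum import Enum


class UploadStatus(Enum):
    PENDING = "pending"
    COMPLETE = "complete"
    FAILED = "failed"


@dataclass
class PackageMember:
    """One data file in the package (hypothetical model)."""
    identifier: str
    status: UploadStatus


def split_members(members):
    """Separate completed uploads from those that did not finish."""
    saved = [m for m in members if m.status is UploadStatus.COMPLETE]
    unsaved = [m for m in members if m.status is not UploadStatus.COMPLETE]
    return saved, unsaved


def resource_map_ids(metadata_id, members):
    """IDs to list in the resource map: the metadata plus completed uploads only."""
    saved, _ = split_members(members)
    return [metadata_id] + [m.identifier for m in saved]
```

With this split, saving the package never depends on every upload having succeeded: the metadata and the completed files are saved, and the unsaved list is what the editor would offer to retry.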

@robyngit robyngit changed the title from "Data file upload failure on 200mb size files results in permissions loss and missing resource map" to "Handle network instability during file upload in the Editor" May 25, 2023
@robyngit robyngit added this to the 2.27.0 milestone May 25, 2023
@mbjones
Member

mbjones commented Jun 6, 2023

@robyngit your proposed bulleted list seems like a good first pass to prevent data package corruption. I think we can pass on the longer-term solution, and simply work from your list of error-recovery fixes that would keep package editing working correctly. So, I repeat them here as a list to check off as we work through them. I think the first and second would need to be implemented together. I think the third item should be fine as is because the user can always add new files to a dataset -- but we should test that it works properly.

  • Even if the upload fails, a user should still be able to save the metadata entered thus far, along with the resource map entries for any files that have been successfully uploaded.
  • Failed uploads should be removed from the resource map, and the resource map should not get lost or become corrupted/invalid. Permissions should not be changed.
  • TEST: The user should have the option to retry failed uploads. (or perhaps MetacatUI should automatically retry failed uploads?)
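
The automatic-retry option in the third item could look roughly like the following. This is a sketch in Python under stated assumptions, not MetacatUI code: `TransientUploadError` and `upload_with_retry` are hypothetical names, the `upload` callable stands in for whatever actually sends the file, and the sketch assumes that repeating an upload after a dropped connection is safe on the server side.

```python
import time


class TransientUploadError(Exception):
    """Hypothetical error standing in for a dropped or reset connection."""


def upload_with_retry(upload, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call upload() until it succeeds or max_attempts is reached.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    Returns (succeeded, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            upload()
            return True, attempt
        except TransientUploadError:
            if attempt == max_attempts:
                break  # out of attempts; report failure to the caller
            sleep(base_delay * 2 ** (attempt - 1))
    return False, max_attempts
```

Only the transient error type is retried; any other exception propagates immediately, so a genuine server-side rejection is not retried blindly. The `sleep` parameter exists so the waiting can be replaced in tests.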

Could you work on prioritizing this in your bug fixes please? We can discuss whether this might be a good item for @rushirajnenuji when he is done his folder work.

@vchendrix
Collaborator Author

@rushirajnenuji @robyngit Looking at the Apache logs, I see the following for one of the files that failed:

2024-01-31 08:57:55.774 PST [2024-01-31 16:57:55.774713] [R:YXyYx3iJ8NI] Request 0 on C:1rzEx3iJ8NI pid:15 tid:134847804729024
2024-01-31 08:57:55.774 PST [2024-01-31 16:57:55.774721] [R:YXyYx3iJ8NI] UA:"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
2024-01-31 08:57:55.774 PST [2024-01-31 16:57:55.774726] [R:YXyYx3iJ8NI] Referer:"https://portal-demo.wfsi-data.org/submit/wfsi 20240131T143232469-15b3e9a99ea5230"
2024-01-31 08:57:55.775 PST [2024-01-31 16:57:55.774729] [proxy_ajp:error] [R:YXyYx3iJ8NI] (70008)Partial results are valid but processing is incomplete: AH02822: dialog with client 35.191.26.153:57200 failed
2024-01-31 08:57:55.776 PST 35.191.26.153 - - [31/Jan/2024:16:57:52 +0000] [R:YXyYx3iJ8NI] "POST /catalog/d1/mn/v2/object/ HTTP/1.1" 400 226 - "https://portal-demo.wfsi-data.org/submit/wfsi-20240131T143232469-15b3e9a99ea5230" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
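
To find other failed uploads in the same log, the access-log lines can be filtered by status code. The sketch below is in Python and assumes only the line layout visible in the excerpt above (a `[R:…]` request id followed by the quoted request line and the status); the pattern and the `failed_requests` helper are illustrative and may need adjusting for other log configurations.

```python
import re

# Matches: [R:<request id>] "<METHOD> <path> <protocol>" <status>
ACCESS_LINE = re.compile(
    r'\[R:(?P<request_id>[^\]]+)\]\s+'
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*"\s+'
    r'(?P<status>\d{3})\s'
)


def failed_requests(lines, min_status=400):
    """Yield (request_id, method, path, status) for requests that failed."""
    for line in lines:
        match = ACCESS_LINE.search(line)
        if match and int(match["status"]) >= min_status:
            yield (
                match["request_id"],
                match["method"],
                match["path"],
                int(match["status"]),
            )
```

Run over the five lines above, this yields a single entry: request `YXyYx3iJ8NI`, a `POST` to `/catalog/d1/mn/v2/object/` that returned 400. The request id can then be used to pull the matching error-log lines, such as the `proxy_ajp` message.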

@robyngit robyngit modified the milestones: 2.28.0, 2.29.0 Feb 8, 2024
@robyngit robyngit modified the milestones: 2.29.0, 2.30.0 Apr 18, 2024
@robyngit robyngit removed this from the 2.30.0 milestone Jun 20, 2024
@robyngit robyngit added this to the 2.31.0 milestone Jun 20, 2024
@robyngit
Member

From Slack:

We got three datasets today where PIs made updates through the website and the data files were removed. One of them got stuck on the spinning "Submitting..." button, and one knows that she had an internet connectivity issue while trying to submit. Justin and I haven't been able to recreate these types of problems so we don't really have more details, but it does seem to be happening at an increasing rate.
It also seems more common for data packages with lots of data files. But maybe that's just because there's a higher likelihood that they'll run into connectivity issues since they take longer to upload.

@robyngit robyngit modified the milestones: 2.31.0, 2.32.0 Sep 9, 2024
@robyngit robyngit self-assigned this Sep 9, 2024
@robyngit robyngit moved this from Todo to In Progress in MetacatUI submission & error handling Sep 9, 2024
@robyngit robyngit modified the milestones: 2.32.0, 2.33.0 Dec 17, 2024

4 participants