`sync` command copies object every time #676
Same.
Same with 2.2.2 against MinIO, even with `--size-only`:
But every time I sync:
ilkinulas pushed a commit that referenced this issue on Jul 12, 2024
The problem is mainly caused by the `compareObjects` function in `command/sync.go`, where `s5cmd` compares the source and destination paths and extracts the files that are present only in the source or only in the destination (counting nested folders, or rather names with their prefixes), along with the common objects.

If both sides are non-objects, such as a wildcard expression, prefix, or bucket, getting the relative paths of files with `src.URL.Relative()` yields compatible and comparable paths, so no problem arises, at least within the scope of this issue. However, when an object is selected as the source, it is not assigned a relative path using `func (u *url.URL) SetRelative(base *url.URL)`, so `src.URL.Relative()` returns its absolute path.

Let's say the source file has an absolute path of `folder/foo.txt`. The algorithm compares `folder/foo.txt` with `s3://bucket/remoteFolder/` and looks for the item `s3://bucket/remoteFolder/folder/foo.txt`. Except for the edge case where there is a duplicate item inside the searched path, the files never match. While copying files, `s5cmd` does not use relative paths, so `foo.txt` is written to the intended path in the remote. However, the copy is repeated on every sync operation, because the compared paths never match.

The problem is solved by using the source object's name as its relative path. The algorithm then simply looks for a file named `foo.txt` in the destination, as intended.

This PR adds new test cases to the sync command. Previously, the tests did not cover the case where the source is a single object inside a prefix, as opposed to an object directly in a bucket or multiple objects selected by a wildcard or prefix expression. If an object is directly in the `s3://bucket/` path, its relative path is the same as its absolute path, so the two match during comparison and the file is not copied every time. The new test cases cover all scenarios. Resolves: #676.
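The mismatch described in this commit message can be illustrated with a small, self-contained Go sketch. It is not the actual `compareObjects` code from `command/sync.go`; `diffKeys` and `sourceKey` are hypothetical helpers introduced only for illustration, and the destination is assumed to already hold `foo.txt` from an earlier sync.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// diffKeys is a simplified, hypothetical stand-in for the comparison step.
// Given the relative keys seen on each side, it reports which keys exist
// only in the source, only in the destination, or in both.
func diffKeys(src, dst []string) (onlySrc, onlyDst, common []string) {
	// Work on sorted copies so the caller's slices are left untouched.
	s := append([]string(nil), src...)
	d := append([]string(nil), dst...)
	sort.Strings(s)
	sort.Strings(d)

	i, j := 0, 0
	for i < len(s) && j < len(d) {
		switch {
		case s[i] == d[j]:
			common = append(common, s[i])
			i++
			j++
		case s[i] < d[j]:
			onlySrc = append(onlySrc, s[i])
			i++
		default:
			onlyDst = append(onlyDst, d[j])
			j++
		}
	}
	onlySrc = append(onlySrc, s[i:]...)
	onlyDst = append(onlyDst, d[j:]...)
	return onlySrc, onlyDst, common
}

// sourceKey returns the key used for comparison when the source is a single
// object. useName=false mimics the behaviour described before the fix (the
// full path is compared); useName=true mimics the fix (only the object name).
func sourceKey(objectPath string, useName bool) string {
	if useName {
		return path.Base(objectPath)
	}
	return objectPath
}

func main() {
	// Keys relative to s3://bucket/remoteFolder/ after a first sync (assumed).
	dst := []string{"foo.txt"}

	onlySrc, _, common := diffKeys([]string{sourceKey("folder/foo.txt", false)}, dst)
	fmt.Println("before fix: to copy =", onlySrc, "common =", common)

	onlySrc, _, common = diffKeys([]string{sourceKey("folder/foo.txt", true)}, dst)
	fmt.Println("after fix:  to copy =", onlySrc, "common =", common)
}
```

In the first call the source key `folder/foo.txt` has no counterpart in the destination, so the object is scheduled for copying on every run. In the second call the key is `foo.txt`, which lands in the common set, so further checks such as size or modification time can decide whether a copy is needed.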
tarikozyurtt added a commit to tarikozyurtt/s5cmd that referenced this issue on Jul 12, 2024
The native `aws s3 sync` command copies the file from the S3 bucket to local only once (as expected): `aws s3 sync s3://data/testfile .`
But `s5cmd sync` copies the file every time and overwrites it: `s5cmd sync s3://data/testfile .`