fix: pv cannot be restored correctly with one click #187

Open: njuptlzf wants to merge 2 commits into base: develop

njuptlzf commented Sep 26, 2023

Pull Request template

Why is this PR required? What issue does it fix?:
fix: #186
Prerequisites:

  1. OpenEBS is ready.
  2. The application namespace nginx-example exists.

Test scenario:

  1. Back up the namespace.
  2. Completely delete the namespace nginx-example.
  3. Restore from the backup.

Relevant error logs:

  1. velero-plugin creates the volume under a new, generated PV name:
time="2023-09-23T08:58:06Z" level=info msg="Creating PVC for snapshot:nginx-backup-with-pv-9 in namespace=nginx-example" cmd=/plugins/velero-blockstore-openebs logSource="/home/go/src/github.com/openebs/velero-plugin/pkg/cstor/pvc_operation.go:131" pluginName=velero-blockstore-openebs restore=velero/nginx-backup-with-pv-9-20230923085805
time="2023-09-23T08:58:11Z" level=info msg="PVC(nginx-test-logs) created.." cmd=/plugins/velero-blockstore-openebs logSource="/home/go/src/github.com/openebs/velero-plugin/pkg/cstor/pvc_operation.go:161" pluginName=velero-blockstore-openebs restore=velero/nginx-backup-with-pv-9-20230923085805
time="2023-09-23T08:58:11Z" level=info msg="Generated PV name is pvc-cb4a1a52-c9e5-4f30-b9b2-07c06e01cfc4" cmd=/plugins/velero-blockstore-openebs logSource="/home/go/src/github.com/openebs/velero-plugin/pkg/cstor/pv_operation.go:209" pluginName=velero-blockstore-openebs restore=velero/nginx-backup-with-pv-9-20230923085805
  2. Velero then tries to restore the PV under its old name:
time="2023-09-23T08:58:16Z" level=info msg="Attempting to restore PersistentVolume: pvc-9bbc68c4-c2d5-4ae6-97ee-e8bfaf296a44" logSource="pkg/restore/restore.go:1337" restore=velero/nginx-backup-with-pv-9-20230923085805
time="2023-09-23T08:58:16Z" level=error msg="Error retrieving in-cluster version of pvc-cb4a1a52-c9e5-4f30-b9b2-07c06e01cfc4: persistentvolumes \"pvc-9bbc68c4-c2d5-4ae6-97ee-e8bfaf296a44\" not found" logSource="pkg/restore/restore.go:1360" restore=velero/nginx-backup-with-pv-9-20230923085805

Old PV name: pvc-9bbc68c4-c2d5-4ae6-97ee-e8bfaf296a44. New PV name: pvc-cb4a1a52-c9e5-4f30-b9b2-07c06e01cfc4.
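
The mismatch can be confirmed mechanically from the two log excerpts. The following is a minimal Python sketch that assumes only the message formats visible in the logs above; the function name `find_pv_name_mismatch` is hypothetical and is not part of velero or velero-plugin.

```python
import re

# Message fragments copied from the log excerpts above.
GENERATED_RE = re.compile(r"Generated PV name is (pvc-[0-9a-f-]+)")
ATTEMPT_RE = re.compile(r"Attempting to restore PersistentVolume: (pvc-[0-9a-f-]+)")


def find_pv_name_mismatch(log_text):
    """Return (generated, attempted) PV names when they differ, else None."""
    generated = GENERATED_RE.search(log_text)
    attempted = ATTEMPT_RE.search(log_text)
    if not generated or not attempted:
        return None  # one of the two messages is missing from the log
    if generated.group(1) == attempted.group(1):
        return None  # names agree, nothing to report
    return generated.group(1), attempted.group(1)


# Shortened copies of the plugin line and the velero line quoted above.
sample = '''
level=info msg="Generated PV name is pvc-cb4a1a52-c9e5-4f30-b9b2-07c06e01cfc4"
level=info msg="Attempting to restore PersistentVolume: pvc-9bbc68c4-c2d5-4ae6-97ee-e8bfaf296a44"
'''
print(find_pv_name_mismatch(sample))
```

Run against a combined plugin and velero restore log, a non-None result would point at the same symptom reported here.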

I think there are two ways to fix this:

  1. In velero:
    Use the new PV name during restore.
  2. In velero-plugin:
    Have the plugin itself handle backup and restore of the PVC, PV, and CVC.

Modifying velero-plugin seems less likely to break compatibility with velero, so I am submitting this PR.

What this PR does?:

Does this PR require any upgrade changes?:
Configure the OpenEBS namespace and the Velero node name as environment variables, for example in the pod spec:

        env:
        - name: OPENEBS_NAMESPACE
          value: openebs
        - name: VELERO_NODE_ID
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName

If the changes in this PR are manually verified, list down the scenarios covered and commands you used for testing with logs:

Any additional information for your reviewer?:
Mention if this PR is part of any design or a continuation of previous PRs

Checklist:

@njuptlzf (Author)

@abhilashshetty04
Could you please assign relevant personnel to review this PR? Thanks. :)

njuptlzf (Author) commented Sep 26, 2023

test

create data

#  kubectl exec -n nginx-example $(kubectl get po -n nginx-example -l app=nginx -o name |head -n 1) -it bash
Defaulting container name to nginx.
Use 'kubectl describe pod/nginx-deployment-7dfcf8d8b5-sns9g -n nginx-example' to see all of the containers in this pod.
root@nginx-deployment-7dfcf8d8b5-sns9g:/# ls /var/log/nginx
access.log  error.log  lost+found
root@nginx-deployment-7dfcf8d8b5-sns9g:/# touch /var/log/nginx/1.log 
root@nginx-deployment-7dfcf8d8b5-sns9g:/# cat /var/log/nginx/1.log
root@nginx-deployment-7dfcf8d8b5-sns9g:/# cat /var/log/nginx/access.log
root@nginx-deployment-7dfcf8d8b5-sns9g:/# echo `date` >> /var/log/nginx/access.log
root@nginx-deployment-7dfcf8d8b5-sns9g:/# echo `date` >> /var/log/nginx/1.log     
root@nginx-deployment-7dfcf8d8b5-sns9g:/# 
root@nginx-deployment-7dfcf8d8b5-sns9g:/# cat /var/log/nginx/1.log
Tue Sep 26 11:03:46 UTC 2023
root@nginx-deployment-7dfcf8d8b5-sns9g:/# cat /var/log/nginx/access.log
Tue Sep 26 11:03:42 UTC 2023
root@nginx-deployment-7dfcf8d8b5-sns9g:/# exit

backup

# kubectl exec -n velero $(kubectl get po -n velero -l component=velero -oname | head -n 1)  -it -- /velero backup create nginx-backup-with-pv-0926 --include-namespaces nginx-example --csi-snapshot-timeout=20m
Backup request "nginx-backup-with-pv-0926" submitted successfully.
Run `velero backup describe nginx-backup-with-pv-0926` or `velero backup logs nginx-backup-with-pv-0926` for more details.
# kubectl exec -n velero $(kubectl get po -n velero -l component=velero -oname | head -n 1)  -it -- /velero backup get nginx-backup-with-pv-0926
NAME                        STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION      SELECTOR
nginx-backup-with-pv-0926   Completed   0        2          2023-09-26 12:30:16 +0000 UTC   29d       default-local-minio   <none>

simulate a disaster

  1. OpenEBS is ready.
  2. The application is not ready.
# kubectl get pvc -n nginx-example
NAME        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS               AGE
my-pvc-04   Bound    pvc-0890162d-d7b8-459e-b7d6-34ee1513930d   1Gi        RWO            cstor-csi-disk-immediate   13m
# kubectl get pv pvc-0890162d-d7b8-459e-b7d6-34ee1513930d
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                     STORAGECLASS               REASON   AGE
pvc-0890162d-d7b8-459e-b7d6-34ee1513930d   1Gi        RWO            Delete           Bound    nginx-example/my-pvc-04   cstor-csi-disk-immediate            13m
# kubectl delete ns nginx-example
namespace "nginx-example" deleted
# kubectl get pv pvc-0890162d-d7b8-459e-b7d6-34ee1513930d
Error from server (NotFound): persistentvolumes "pvc-0890162d-d7b8-459e-b7d6-34ee1513930d" not found

restore

# kubectl exec -n velero $(kubectl get po -n velero -l component=velero -oname | head -n 1)  -it -- /velero restore create --restore-volumes=true --from-backup nginx-backup-with-pv-0926
Restore request "nginx-backup-with-pv-0926-20230926123224" submitted successfully.
Run `velero restore describe nginx-backup-with-pv-0926-20230926123224` or `velero restore logs nginx-backup-with-pv-0926-20230926123224` for more details.
# kubectl exec -n velero $(kubectl get po -n velero -l component=velero -oname | head -n 1)  -it -- /velero restore get nginx-backup-with-pv-0926-20230926123224
NAME                                       BACKUP                      STATUS      STARTED                         COMPLETED                       ERRORS   WARNINGS   CREATED                         SELECTOR
nginx-backup-with-pv-0926-20230926123224   nginx-backup-with-pv-0926   Completed   2023-09-26 12:32:24 +0000 UTC   2023-09-26 12:32:35 +0000 UTC   0        3          2023-09-26 12:32:24 +0000 UTC   <none>
# kubectl get pvc -n nginx-example
NAME        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS               AGE
my-pvc-04   Bound    pvc-0890162d-d7b8-459e-b7d6-34ee1513930d   1Gi        RWO            cstor-csi-disk-immediate   2m44s
# kubectl get pv pvc-0890162d-d7b8-459e-b7d6-34ee1513930d
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                     STORAGECLASS               REASON   AGE
pvc-0890162d-d7b8-459e-b7d6-34ee1513930d   1Gi        RWO            Delete           Bound    nginx-example/my-pvc-04   cstor-csi-disk-immediate            2m49s
# kubectl get po -n nginx-example
NAME                                READY   STATUS    RESTARTS   AGE
nginx-deployment-7dfcf8d8b5-sns9g   2/2     Running   0          2m44s
# kubectl exec -n nginx-example nginx-deployment-7dfcf8d8b5-sns9g -it bash
Defaulting container name to nginx.
Use 'kubectl describe pod/nginx-deployment-7dfcf8d8b5-sns9g -n nginx-example' to see all of the containers in this pod.
root@nginx-deployment-7dfcf8d8b5-sns9g:/# ls /var/log/nginx
1.log  access.log  error.log  lost+found
root@nginx-deployment-7dfcf8d8b5-sns9g:/# cat /var/log/nginx/1.log
Tue Sep 26 11:03:46 UTC 2023
root@nginx-deployment-7dfcf8d8b5-sns9g:/# cat /var/log/nginx/access.log
Tue Sep 26 11:03:42 UTC 2023
root@nginx-deployment-7dfcf8d8b5-sns9g:/# exit
exit
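
The key result of this test is that, after the restore, the PVC is bound to a PV with the same name it had before the namespace was deleted. The comparison can be sketched in Python by parsing the two `kubectl get pvc` tables shown above. The helper names `parse_pvc_table` and `volumes_unchanged` are hypothetical, and the parser assumes every PVC is bound, so that the VOLUME column is never empty.

```python
def parse_pvc_table(text):
    """Map PVC name -> (status, volume) from `kubectl get pvc` table output.

    Assumes whitespace-separated columns with NAME, STATUS and VOLUME
    populated on every row (an unbound PVC would shift the columns).
    """
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    name_i = header.index("NAME")
    status_i = header.index("STATUS")
    volume_i = header.index("VOLUME")
    result = {}
    for row in lines[1:]:
        cols = row.split()
        result[cols[name_i]] = (cols[status_i], cols[volume_i])
    return result


def volumes_unchanged(before_text, after_text):
    """True if every PVC seen before has the same status and volume after."""
    before = parse_pvc_table(before_text)
    after = parse_pvc_table(after_text)
    return all(name in after and after[name] == before[name] for name in before)


# Tables copied from the transcript above (before deletion, after restore).
before = """
NAME        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS               AGE
my-pvc-04   Bound    pvc-0890162d-d7b8-459e-b7d6-34ee1513930d   1Gi        RWO            cstor-csi-disk-immediate   13m
"""
after = """
NAME        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS               AGE
my-pvc-04   Bound    pvc-0890162d-d7b8-459e-b7d6-34ee1513930d   1Gi        RWO            cstor-csi-disk-immediate   2m44s
"""
print(volumes_unchanged(before, after))
```

This checks only the binding; the file contents still have to be compared inside the pod, as done in the transcript.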

trunet commented Oct 3, 2023

I'm giving this a try, since I indeed can't restore PVs/PVCs here.

I'm getting this error when trying to restore a backup made with a previous version:

time="2023-10-03T23:30:16Z" level=error msg="Failed to read data from file{backups/k8s-20231001030525/-pvc-f99f8e44-0101-4f41-b56a-db958685f7af-k8s-20231001030525.pv} : blob (code=NotFound): NoSuchKey: The resource you requested does not exist\n\tstatus code: 404, request id: , host id: " cmd=/plugins/velero-blockstore-openebs logSource="/home/trunet/src/velero-plugin/pkg/clouduploader/operation.go:142" pluginName=velero-blockstore-openebs restore=velero/k8s-20231001030525-20231004012958
time="2023-10-03T23:30:16Z" level=error msg="CreatePVC returned error=failed to restore PV=: failed to download pv: failed to download PV file=backups/k8s-20231001030525/-pvc-f99f8e44-0101-4f41-b56a-db958685f7af-k8s-20231001030525.pv" cmd=/plugins/velero-blockstore-openebs logSource="/home/trunet/src/velero-plugin/pkg/cstor/pv_operation.go:207" pluginName=velero-blockstore-openebs restore=velero/k8s-20231001030525-20231004012958

To be clear, this backup was created with the current 3.5.0 release, and I'm trying to restore it with the code from this PR's branch.

I believe it will only work when the backup was made with this PR's code, since you're also backing up the PV and the CStorVolumeConfig (that's why you need the OpenEBS environment variable).

I think this approach is impractical, as it would break all previous backups. I'll leave it running and check tomorrow whether I can restore a new backup with it.
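
One way to tell in advance whether a given backup can be restored with this branch is to check whether the backup location holds any `.pv` object for it. The sketch below is illustrative only: it assumes the `backups/<backup name>/` prefix and the `.pv` suffix seen in the error message above, it takes the object listing as a plain list of keys (how the listing is obtained is left open), and the name `backup_has_pv_files` is hypothetical.

```python
def backup_has_pv_files(object_keys, backup_name):
    """True if any object under backups/<backup_name>/ ends in '.pv'."""
    prefix = "backups/" + backup_name + "/"
    return any(
        key.startswith(prefix) and key.endswith(".pv") for key in object_keys
    )


# Key copied from the error message above: the object the plugin looked for.
wanted = (
    "backups/k8s-20231001030525/"
    "-pvc-f99f8e44-0101-4f41-b56a-db958685f7af-k8s-20231001030525.pv"
)

# Hypothetical listings. "placeholder-object" is a made-up name standing in
# for whatever a 3.5.0 backup actually stores; it is not a real file name.
old_listing = ["backups/k8s-20231001030525/placeholder-object"]
new_listing = old_listing + [wanted]

print(backup_has_pv_files(old_listing, "k8s-20231001030525"))
print(backup_has_pv_files(new_listing, "k8s-20231001030525"))
```

A backup for which this returns False would be expected to hit the NoSuchKey error shown above, if the hypothesis in this comment is right.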

njuptlzf (Author) commented Oct 5, 2023

As described in this PR, when we back up a volume snapshot, completely delete the namespace, and then attempt a one-click restore, the restore does not work properly because of a compatibility issue between velero and velero-plugin.

This PR is my fix for the one-click restore failure; it does not change velero.

Once this PR is merged, backups do have to be taken again.

If merging this PR is not recommended, we can try this restore procedure instead, which is not one-click. Make two backups, A and B:

A backs up volume snapshots only.
B backs up all contents of the namespace except volume snapshots.
A and B are then restored one after the other.

I haven't verified this.
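
For illustration only, the A/B split could be expressed with existing velero CLI flags roughly as follows. This Python sketch just builds the command lines; it does not run them and, like the proposal itself, it is unverified. The mapping of A and B onto `--include-resources` and `--snapshot-volumes` is an assumption about what the split would look like, and the backup names `nginx-a` and `nginx-b` and the function name `split_backup_commands` are hypothetical.

```python
import shlex


def split_backup_commands(namespace, name_a, name_b):
    """Return velero command lines for the untested two-backup approach."""
    # A: only the PVC/PV objects, with volume snapshots enabled (assumption).
    backup_a = [
        "velero", "backup", "create", name_a,
        "--include-namespaces", namespace,
        "--include-resources", "persistentvolumeclaims,persistentvolumes",
        "--snapshot-volumes=true",
    ]
    # B: everything in the namespace, with volume snapshots disabled.
    backup_b = [
        "velero", "backup", "create", name_b,
        "--include-namespaces", namespace,
        "--snapshot-volumes=false",
    ]
    # Restore A first (volumes), then B (remaining resources).
    restore_a = ["velero", "restore", "create",
                 "--from-backup", name_a, "--restore-volumes=true"]
    restore_b = ["velero", "restore", "create",
                 "--from-backup", name_b, "--restore-volumes=false"]
    return [backup_a, backup_b, restore_a, restore_b]


for cmd in split_backup_commands("nginx-example", "nginx-a", "nginx-b"):
    print(shlex.join(cmd))
```

Whether restoring B on top of A leaves the PVC bound to the restored volume is exactly the part that still needs to be tested.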

njuptlzf changed the title from "fix: pv cannot be restored correctly" to "fix: pv cannot be restored correctly with one click" on Oct 5, 2023
Successfully merging this pull request may close these issues.

velero-plgun 3.5.0 failed to restore from minio: PVC{nginx-example/nginx-logs} is not bounded!
2 participants