artifact scan stuck in queue status unable to re-scan #20760

Closed
atchadwick opened this issue Jul 22, 2024 · 8 comments

@atchadwick

Hi,

I'm using Harbor 2.11, and some artifacts have been stuck in a queued state for the Trivy vulnerability scan for a few days. When I try to stop the scan, it fails with the error "no scan job for artifact". I have checked the job workers and there are currently no queued scans.

Because of this, I am unable to stop the scan and issue another one. Is it possible to force a stop somewhere? Ideally, if Harbor can't find the scan job for the artifact, it would reset the queued status so that the artifact can be scanned again.

@chlins
Member

chlins commented Jul 24, 2024

Maybe related to #19486.

@atchadwick
Author

atchadwick commented Jul 24, 2024

Even after the stuck artifacts are deleted and a GC is run, uploading the same artifact with a different tag gives this output:

624033d3e11d: Mounted from app12378/wiremock
c6e36f73d00a: Mounted from app12378/wiremock
501689fa7b84: Mounted from app12378/wiremock
37c89cb71f6f: Mounted from app12378/wiremock
0e0c99500308: Mounted from app12378/wiremock
08357c8b2220: Mounted from app12378/wiremock
f4462d5b2da2: Mounted from app12378/wiremock

and the scan is still stuck in the queued state.

Could it be that the layer it's trying to mount is stuck in a scan and is therefore blocking this one too?

@zyyw
Contributor

zyyw commented Jul 29, 2024

Sorry, I can't fully understand your problem. Could you please elaborate in more detail?

Could you please share the error message you get when you are unable to stop the scan, along with trivy.log and core.log?

@atchadwick
Author

It may be worth ignoring my last comment; I think it was a knock-on effect of the images stuck in queued. Deleting the artifact stuck in queued, running a GC, and uploading again resolved the upload issue for that image.

As for the error message, this is what I get when pressing the stop button while the scan is stuck in queued:

image

I'm running the v1.15.0 Helm chart in a k8s cluster. When I try to stop the stuck artifact's scan job and trigger the error, nothing appears in the trivy/core logs.

We had an issue where the bandwidth was slow and around 300 images were queued for scanning. During that time the Trivy and job service pods, which use ephemeral storage, restarted. Could this have caused scans to get stuck in the queue? Would it be possible to force a reset of a scan, even if only through an API call?

@atchadwick
Author

Just an update on this: I could see an error in the logs for the sweep:

2024-08-07T13:00:03Z [INFO] [/pkg/task/sweep_job.go:150]: [IMAGE_SCAN] start to sweep, retain latest 1 executions
2024-08-07T13:16:32Z [INFO] [/pkg/task/sweep_job.go:160]: [IMAGE_SCAN] listed 921 candidate executions for sweep
2024-08-07T13:16:33Z [ERROR] [/pkg/task/sweep_job.go:110]: [IMAGE_SCAN] failed to run sweep, error: failed to delete executions: ERROR: update or delete on table "execution" violates foreign key constraint "task_execution_id_fkey" on table "task" (SQLSTATE 23503)
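
For anyone reading the error above: SQLSTATE 23503 is PostgreSQL's foreign-key violation code, and the message says the sweep tried to delete rows from "execution" that rows in "task" still reference. The sketch below reproduces that class of failure on a toy in-memory SQLite database. The two tables are stand-ins containing only the columns mentioned in this thread; this is not Harbor's real schema, and SQLite words the error differently from PostgreSQL.

```python
import sqlite3


def demo_fk_violation():
    """Try to delete a parent row that a child row still references.

    Returns the database error message, or None if the delete succeeded.
    The schema is a hypothetical stand-in, not Harbor's actual tables.
    """
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE execution ("
        " id INTEGER PRIMARY KEY, vendor_type TEXT, status TEXT)"
    )
    conn.execute(
        "CREATE TABLE task ("
        " id INTEGER PRIMARY KEY,"
        " execution_id INTEGER NOT NULL REFERENCES execution(id))"
    )
    conn.execute("INSERT INTO execution VALUES (1, 'IMAGE_SCAN', 'Running')")
    conn.execute("INSERT INTO task VALUES (10, 1)")
    try:
        # The task row with execution_id = 1 still points at this execution.
        conn.execute("DELETE FROM execution WHERE id = 1")
    except sqlite3.IntegrityError as exc:
        return str(exc)
    finally:
        conn.close()
    return None


if __name__ == "__main__":
    print(demo_fk_violation())
```

This only illustrates why the database rejects the delete; it does not explain why the sweep job left task rows behind in the first place.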

I went back to the image from my last comment and it is now in an error state, which allowed me to run a scan again, and that scan completed.

I poked around in the DB and I can see about 1700 rows, mostly in Error and a few in Running:

SELECT * FROM task WHERE execution_id IN (SELECT id FROM execution WHERE vendor_type = 'IMAGE_SCAN' AND status IN ('Running', 'Error', 'Pending'));

(1713 rows)
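
A per-status breakdown of those rows can be easier to read than the full listing. The sketch below runs a read-only GROUP BY variant of the query above against a toy in-memory SQLite database. The table layout, the sample rows, and the 'OTHER_JOB' vendor type are made up for illustration; only the column names execution.id, execution.vendor_type, execution.status and task.execution_id come from the query in this comment. The SQL itself is plain enough that it should also run unchanged in PostgreSQL, though I have not checked it against a real Harbor database.

```python
import sqlite3

# Read-only variant of the query above: count tasks per execution status.
COUNT_BY_STATUS = """
SELECT e.status, COUNT(t.id) AS task_count
FROM execution AS e
JOIN task AS t ON t.execution_id = e.id
WHERE e.vendor_type = 'IMAGE_SCAN'
  AND e.status IN ('Running', 'Error', 'Pending')
GROUP BY e.status
ORDER BY e.status
"""


def build_toy_db():
    """Create a hypothetical stand-in for the two tables (not Harbor's schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE execution (id INTEGER PRIMARY KEY, vendor_type TEXT, status TEXT);
        CREATE TABLE task (id INTEGER PRIMARY KEY, execution_id INTEGER);
        INSERT INTO execution VALUES
            (1, 'IMAGE_SCAN', 'Running'),
            (2, 'IMAGE_SCAN', 'Error'),
            (3, 'IMAGE_SCAN', 'Error'),
            (4, 'IMAGE_SCAN', 'Success'),
            (5, 'OTHER_JOB',  'Running');
        INSERT INTO task VALUES (10, 1), (11, 2), (12, 3), (13, 3), (14, 4), (15, 5);
        """
    )
    return conn


def count_tasks_by_execution_status(conn):
    """Return {execution status: number of tasks} for the filtered scan executions."""
    return dict(conn.execute(COUNT_BY_STATUS).fetchall())


if __name__ == "__main__":
    print(count_tasks_by_execution_status(build_toy_db()))
```

On the toy data this counts one task under Running and three under Error; the Success execution and the non-scan execution are filtered out.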

I have 21 scans stuck in Running, which the GUI shows as queued. They are not actually running in the job queue, and when I look at them in the GUI I get the same error as in the screenshot above.

Output of a running example in the DB:

24096 | IMAGE_SCAN | 15688 | Running | | MANUAL | {"artifact":{"digest":"sha256:f6e30135a203881a0038f704aab515a664d9a9e786bc620f4918c9d0fb63f","id":15688,"project_id":273,"repository_name":"app06/martini/test-experience/test-experience-dr"},"operator":"robot app06 ab
","registration":{"id":1,"name":"Trivy"}} | 2024-07-18 09:08:28.902456 | | 5 | 2024-07-18 16:35:40

Is it okay/possible to set the Running ones to Error?

Do you have any recommendations for cleaning this up, please?

@snoop2048

Facing the exact same issue here.


github-actions bot commented Oct 7, 2024

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

@github-actions github-actions bot added the Stale label Oct 7, 2024

github-actions bot commented Nov 7, 2024

This issue was closed because it has been stalled for 30 days with no activity. If this issue is still relevant, please re-open a new issue.

@github-actions github-actions bot closed this as not planned Nov 7, 2024