Download to PDF: HTTP 500 error on some wikis for some users
Closed, Resolved; Public; BUG REPORT

Assigned To

Authored By

	Jtneill
	Oct 3 2024, 10:29 PM

Description

"Download to PDF" on en.wv is returning an error: {"name":"HTTPError","message":"500","status":500,"detail":"Internal Server Error"}

Client-Side workaround ---

As many users have been referred here, a potential workaround some clients may use is:

1. Install a virtual printer that creates PDF documents on your client
2. Use your browser's Print function
3. Select the virtual printer from step 1
4. Use the virtual printer's save function

Details

	Subject	Repo	Branch	Lines +/-
	chromium-render: Add cli flag to avoid flooding with crashpad processes	operations/deployment-charts	master	+2 -1

Related Objects

Mentioned Here: T375521: Download as PDF on meta not working - 500 internal service error

Event Timeline

Jtneill created this task. Oct 3 2024, 10:29 PM

Restricted Application added a subscriber: Aklapper. Oct 3 2024, 10:29 PM

I just downloaded https://en.wikiversity.org/w/index.php?title=Wikiversity:Colloquium&direction=prev&oldid=2661311 as a PDF and it worked just fine.

Hmmm - some pages working OK for me too.

What about this one? (I get the error)

https://en.wikiversity.org/wiki/Motivation_and_emotion/Book/2024/Dopamine_and_social_behaviour

https://en.wikiversity.org/w/index.php?title=Special:DownloadAsPdf&page=Motivation_and_emotion/Book/2024/Dopamine_and_social_behaviour&action=show-download-screen

gives:

{"name":"HTTPError","message":"500","status":500,"detail":"Internal Server Error"}

And I cannot download the PDF.

Koavf's link works for me.

In T376438#10201443, @Pppery wrote:

Koavf's link works for me.

Even when you click on "Download"? The *page* renders properly, but the *PDF* gives the HTTP error.

Yes. Here's the PDF it gave me.

Motivation_and_emotion_Book_2024_Dopamine_and_social_behaviour-2.pdf (445 KB)

Wow, weird. Thanks.

@Jtneill: Thanks for reporting this. For future reference, please use the bug report form (linked from the top of the task creation page) to create a bug report, and fill in all the sections in the template instead of deleting them, to avoid followup questions for more information and examples. Thanks.

If this is about "Download as PDF" on https://en.wikiversity.org/wiki/Motivation_and_emotion/Book/2024/Dopamine_and_social_behaviour , then it works for me...

Aklapper renamed this task from "Download to PDF - Error on en.wv" to "Download to PDF: HTTP 500 error on en.wikiversity for some folks". Oct 4 2024, 6:31 AM

@Aklapper Thanks for testing and clarifying about bug reporting.

To confirm, I tested 3 browsers on a Windows 10 device and each gave an HTTP 500 error when clicking "Download" from https://en.wikiversity.org/w/index.php?title=Special:DownloadAsPdf&page=Motivation_and_emotion/Book/2024/Dopamine_and_social_behaviour&action=show-download-screen:

Google Chrome
Mozilla Firefox
Microsoft Edge

However, downloading the PDF works using Chrome on an Android device.

A reader reported a similar problem on en.wikipedia in ticket 2024100510006742

"
{"name": "HTTPError", "message":"500", "status":500, "detail":"Internal Server Error"}

I'm using macOS Sequoia 15.0 on my 2019 MacBook Pro. I get the same results using either Google Chrome browser, Safari browser or Arc browser.
"

I was unable to duplicate.

I tried to download the same article PDF on zhwiki through several VPN IP addresses. I found that I could download it from some regions (e.g. Germany) but not from others (e.g. Hong Kong). Even after I switched to using another browser, I still couldn't download the PDF.

I can share which IPs could or couldn't download the PDF privately if anyone needs that information. Perhaps data centers in some regions have the issue?

SCP-2000 renamed this task from "Download to PDF: HTTP 500 error on en.wikiversity for some folks" to "Download to PDF: HTTP 500 error on some wikis for some folks". Oct 8 2024, 8:38 AM

ClaudineChionh subscribed. Oct 11 2024, 7:03 AM

Users are continuing to report this problem:

Screenshot_2024-10-11_104037_Wiki_Unable2Download.png (341×1 px, 16 KB)

Xaosflux renamed this task from "Download to PDF: HTTP 500 error on some wikis for some folks" to "Download to PDF: HTTP 500 error on some wikis for some users". Oct 11 2024, 6:15 PM

I also experience it for https://en.wikiversity.org/api/rest_v1/page/pdf/Motivation_and_emotion/Book/2024/Dopamine_and_social_behaviour

I don't see anything obvious in https://grafana.wikimedia.org/d/U4TuF-lMk/proton?orgId=1

The response headers list x-cache: cp3069 miss, cp3069 miss
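
For comparing reports from different regions, it can help to split the `x-cache` value into per-host entries. This is a minimal sketch, assuming the header is a comma-separated list of `host status` pairs as in the value quoted above; `parse_x_cache` is a hypothetical helper name, not part of any existing tool.

```python
def parse_x_cache(header_value):
    """Split an x-cache header value into (host, status) tuples.

    Assumes entries are comma-separated and each entry starts with the
    cache host name, followed by its status (e.g. "miss").
    """
    entries = []
    for part in header_value.split(","):
        fields = part.split()
        if len(fields) >= 2:
            entries.append((fields[0], " ".join(fields[1:])))
    return entries


# Header value copied from the comment above.
print(parse_x_cache("cp3069 miss, cp3069 miss"))
```

Two misses on the same host would be consistent with the request reaching the backend service rather than being served from cache.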

Logstash has lots of spawn process errors:

Error: Failed to launch the browser process!
/usr/bin/chromium: 5: /etc/chromium.d/extensions: Cannot fork
/usr/bin/chromium: 121: Cannot fork
/usr/bin/chromium: 123: Cannot fork
[826540:826540:1015/085235.180776:FATAL:spawn_subprocess.cc(237)] posix_spawn /usr/lib/chromium/chrome_crashpad_handler: Resource temporarily unavailable (11)
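
These lines all point at process or resource exhaustion: "Cannot fork" from the wrapper script, and errno 11 (EAGAIN on Linux, "Resource temporarily unavailable") from `posix_spawn`. A rough sketch of how such lines could be flagged when scanning logs; `is_fork_exhaustion` is a hypothetical helper, and the errno value is hard-coded on the assumption that the logs come from Linux hosts.

```python
import re

# Linux value of EAGAIN ("Resource temporarily unavailable").
EAGAIN_LINUX = 11

# Matches a trailing "(<errno>)" as printed at the end of the FATAL line.
ERRNO_SUFFIX = re.compile(r"\((\d+)\)\s*$")


def is_fork_exhaustion(line):
    """Return True if a log line suggests the host cannot spawn processes."""
    if "Cannot fork" in line:
        return True
    match = ERRNO_SUFFIX.search(line)
    return match is not None and int(match.group(1)) == EAGAIN_LINUX


# Log lines copied from the excerpt above.
lines = [
    "Error: Failed to launch the browser process!",
    "/usr/bin/chromium: 121: Cannot fork",
    "[826540:826540:1015/085235.180776:FATAL:spawn_subprocess.cc(237)] "
    "posix_spawn /usr/lib/chromium/chrome_crashpad_handler: "
    "Resource temporarily unavailable (11)",
]
print([is_fork_exhaustion(line) for line in lines])
```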

https://logstash.wikimedia.org/goto/ff20ff6a1109dc85f8a0fcd7cc2d9c0b

Adding content transform too.

This appears to be a rerun of T375521 - temporary fix last time was a roll restart, but there's clearly a deeper issue.

MSantos added a project: Essential-Work. Oct 17 2024, 2:29 PM

MSantos moved this task from Backlog to External Request on the Content-Transform-Team board.

I suspect the issue is that we don't close browser instances, or that we somehow end up in a situation with stale browser instances. Given the level of traffic/support of the PDF service, would it be enough to just restart the service?

Users continue to report this problem, such as in otrs ticket 2024101810000205. I was also able to reproduce this issue myself.

In T376438#10238271, @Jgiannelos wrote:

I suspect the issue is that we don't close browser instances, or that we somehow end up in a situation with stale browser instances. Given the level of traffic/support of the PDF service, would it be enough to just restart the service?

Works for me - the service can be restarted using the standard roll_restart pattern. We might see this come back soon, as this happened last month also.

Chromium is leaking processes, most likely leaving chromium_crashpads lying around after a failure:

root@wikikube-worker2070:/home/hnowlan# ps uax| grep chrome_crashpad | wc -l
115357
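
A `ps | grep | wc -l` pipeline can also count the `grep` process itself, which does not matter at this scale but can mislead on small counts. A minimal sketch of a stricter count, assuming the input is one command line per row (for example from `ps -eo args=`); `count_by_executable` is a hypothetical helper and the sample rows are made up for illustration.

```python
import os


def count_by_executable(ps_args_output, executable):
    """Count rows whose first token is the given executable name.

    Matching on the basename of the first token avoids counting
    unrelated commands (such as grep) that merely mention the name.
    """
    count = 0
    for line in ps_args_output.splitlines():
        tokens = line.split()
        if tokens and os.path.basename(tokens[0]) == executable:
            count += 1
    return count


# Illustrative input only; real rows would come from the ps output.
sample = """/usr/lib/chromium/chrome_crashpad_handler
/usr/lib/chromium/chrome_crashpad_handler
grep chrome_crashpad
/usr/bin/node server.js
"""
print(count_by_executable(sample, "chrome_crashpad_handler"))
```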

Done, I will keep an eye on the logs.

For what it's worth, I also updated the dashboard at https://grafana-rw.wikimedia.org/d/U4TuF-lMk/proton?orgId=1 to allow querying both DCs at once as well as selectively, and fixed the per-container saturation panels that were broken. I also removed the nodejs garbage collection panels, as those metrics aren't being emitted anymore (support has been dropped at the service runner level, IIRC).

Here are some people with similar experiences. A suggestion was to run with --single-process --disable-crashpad-for-testing (because crashpad disabling does not propagate to subprocesses)
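
As a sketch of what that suggestion could look like in a launcher, the snippet below appends the two flags named above to an existing argument list without duplicating any already present. This only illustrates the suggestion: `with_suggested_flags` is a hypothetical helper, the example argument list is assumed, and whether a given Chromium build honors these flags has not been verified here.

```python
# Flags taken verbatim from the suggestion above; not verified against
# any particular Chromium version.
SUGGESTED_FLAGS = ["--single-process", "--disable-crashpad-for-testing"]


def with_suggested_flags(existing_args):
    """Return a copy of existing_args with the suggested flags appended.

    Flags that are already present are left in place and not repeated.
    """
    merged = list(existing_args)
    for flag in SUGGESTED_FLAGS:
        if flag not in merged:
            merged.append(flag)
    return merged


# Hypothetical existing launch arguments.
print(with_suggested_flags(["--headless", "--single-process"]))
```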

Maybe this is related? https://github.com/GoogleChromeLabs/chrome-for-testing/issues/114

A similar issue was reported today by @Ederporto in #wikimedia-cloud for this url: https://pt.wikiversity.org/api/rest_v1/page/pdf/Wikidata_IOLab

Errors are clearly spiking again

https://grafana.wikimedia.org/d/U4TuF-lMk/proton?orgId=1&from=1730305791822&to=1730910591822

(I’d open a separate ticket, but I’m on my phone)

TheDJ triaged this task as Unbreak Now! priority. Wed, Nov 6, 4:31 PM

Looks like the same crashpad flood issue again. The service needs a restart, and I think we should implement the flags @TheDJ has mentioned.

Xaosflux updated the task description. Wed, Nov 6, 8:33 PM

Jgiannelos claimed this task. Thu, Nov 7, 11:00 AM

Jgiannelos edited projects, added Content-Transform-Team-WIP; removed Content-Transform-Team.

Jgiannelos moved this task from Backlog to In Progress on the Content-Transform-Team-WIP board.

I ran a rolling restart in k8s. Regarding the Chromium parameters, we already pass --single-process. I am taking a look at what the other parameter does in detail.

jijiki moved this task from Incoming 🐫 to Doing 😎 on the serviceops board. Thu, Nov 7, 11:39 AM

jijiki moved this task from Doing 😎 to serviceops-radar on the serviceops board.

jijiki edited projects, added serviceops-radar; removed serviceops.

Change #1088271 had a related patch set uploaded (by Jgiannelos; author: Jgiannelos):

[operations/deployment-charts@master] chromium-render: Add cli flag to avoid flooding with crashpad processes

https://gerrit.wikimedia.org/r/1088271

gerritbot added a project: Patch-For-Review. Thu, Nov 7, 11:43 AM

jijiki moved this task from serviceops-radar to Production Errors 🚜 on the serviceops board. Thu, Nov 7, 12:36 PM

jijiki edited projects, added serviceops; removed serviceops-radar.

Jgiannelos moved this task from In Progress to Code Review on the Content-Transform-Team-WIP board. Thu, Nov 7, 5:25 PM

Hmm, there are still some failures occurring above the average.

Screenshot 2024-11-08 at 10.14.42.png (536×1 px, 208 KB)

The error rate is quickly increasing again:

Screenshot 2024-11-19 at 14.37.52.png (554×1 px, 207 KB)

Change #1088271 merged by jenkins-bot:

[operations/deployment-charts@master] chromium-render: Add cli flag to avoid flooding with crashpad processes

https://gerrit.wikimedia.org/r/1088271

I got this on enwiki w/ Chromium v126 (Falklands War)

Reloading https://en.wikipedia.org/api/rest_v1/page/pdf/Falklands_War seemed to work.

@ihurbain just deployed the crashpad flag flip patch and (at least for now) Proton looks happier.

Maintenance_bot removed a project: Patch-For-Review. Tue, Nov 19, 2:30 PM

There have not been significant 5xx errors for 7 days now. Calling this fixed until proven otherwise. Thanks, everyone!

	F57721570: Screenshot 2024-11-19 at 14.37.52.png
	Tue, Nov 19, 1:38 PM

	F57689130: Screenshot 2024-11-08 at 10.14.42.png
	Fri, Nov 8, 9:15 AM

	F57615845: Screenshot 2024-10-15 at 10.53.45.png
	Oct 15 2024, 8:54 AM

	F57606666: Screenshot_2024-10-11_104037_Wiki_Unable2Download.png
	Oct 11 2024, 6:14 PM

	F57587955: Motivation_and_emotion_Book_2024_Dopamine_and_social_behaviour-2.pdf
	Oct 4 2024, 3:05 AM

	F57721863: image.png
	Tue, Nov 19, 1:58 PM

	F57685996: IMG_2006.png
	Wed, Nov 6, 4:31 PM
