PDF/A docs #35

Merged
merged 5 commits into from
Sep 4, 2024

Conversation

reknih
Member

@reknih reknih commented Jul 12, 2024

This PR is about adding PDF/A-oriented features to pdf-writer:

  • It adds comments about PDF/A requirements for attributes.
  • It adds compliance checks for some types that the consumer can opt in to.
  • It adds the WMode to the CMap and CMap dictionary writers to support a cautious interpretation of clause 6.2.11.3.3 in ISO 19005-2:2011.

Because of the last change, the other attributes in table 124 of the PDF 1.7 spec were also added.

The return type of Catalog::output_intents has been changed from TypedArray<'_, Dict> to TypedArray<'_, OutputIntent> which is a breaking change.

Laurenz and I discussed whether returning results is the correct way to proceed here, as very few of the PdfaErrors are actionable. For example, a consumer cannot fix a DeviceN color space with more than 8 colorants and will just unwrap. We discussed whether collecting the compliance errors in the Chunk would be a better idea; however, this would be difficult because Content and UnicodeCmap are stand-alone structs with no Chunk reference. Furthermore, should consumers be free to opt out of running these checks? Given these open questions, I have marked this PR as a draft and invite comments.
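
To make the trade-off concrete, here is a minimal, self-contained sketch of the two designs. Every name in it (`ComplianceError`, `check_colorants`, `ComplianceLog`) is hypothetical and chosen for illustration; none of it is pdf-writer's API or the `PdfaError` type from this PR. The limit of 8 colorants is taken from the DeviceN example above.

```rust
/// Hypothetical error type for illustration; not the PR's `PdfaError`.
#[derive(Debug, PartialEq)]
enum ComplianceError {
    /// A DeviceN color space declared more colorants than allowed.
    TooManyColorants(usize),
}

/// Colorant limit assumed from the DeviceN example in this discussion.
const MAX_COLORANTS: usize = 8;

/// Design A: return a `Result` at the call site.
/// The caller must handle (or unwrap) the error immediately.
fn check_colorants(count: usize) -> Result<(), ComplianceError> {
    if count > MAX_COLORANTS {
        Err(ComplianceError::TooManyColorants(count))
    } else {
        Ok(())
    }
}

/// Design B: collect violations in a central log and inspect them later.
/// This only works if every writer can reach the log, which is the
/// difficulty with stand-alone structs described above.
#[derive(Default)]
struct ComplianceLog {
    errors: Vec<ComplianceError>,
}

impl ComplianceLog {
    /// Records a violation instead of returning it to the caller.
    fn record_colorants(&mut self, count: usize) {
        if let Err(err) = check_colorants(count) {
            self.errors.push(err);
        }
    }

    /// True if no violation has been recorded so far.
    fn is_compliant(&self) -> bool {
        self.errors.is_empty()
    }
}
```

With design A, a consumer that cannot act on the error ends up writing `check_colorants(n).unwrap()`. With design B, writing never fails and the consumer decides once, at the end, whether to look at `ComplianceLog::errors` at all.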

- CIDFont `FontDescriptor` attributes
- Horizontal / Vertical writing mode in CMap and CMap Dictionary
- Font descriptor overrides for character classes
@reknih reknih requested a review from laurmaedje July 12, 2024 15:24
@reknih
Member Author

reknih commented Jul 12, 2024

cc @LaurenzV

@LaurenzV
Contributor

LaurenzV commented Jul 12, 2024

Disclaimer: Apart from the Typst issue where you list some of the points needed to achieve compliance, I'm not very familiar with the PDF/A requirements.

I personally feel like this crate is at the wrong level of abstraction to implement PDF/A compliance checks. pdf-writer operates at a very low level, and all it should do is provide the APIs necessary to write PDFs (including what is needed for certain compliance levels). Apart from some very basic validity checks that already exist (like duplicate references), I don't think it makes sense to implement validation at this level. For one thing, it could pollute the API for consumers that don't need it. For another, I don't think all of the checks can happen in this crate, although, as mentioned, I don't know the specifics of the PDF/A spec. For example, IIRC there is a clause that the .notdef glyph must not appear, and I don't think it's sensible to check in pdf-writer whether a glyph actually exists in a font, as this would require parsing the font here. And having some checks done in pdf-writer while others need to be done by the upstream consumer doesn't feel like a very clean separation and could lead to confusion.

So I think that for now, those checks should be performed in Typst, and in the future all of this will hopefully be moved into a higher-level PDF-drawing crate. But just my two cents!

Regarding q-nesting level, is it really as easy as checking the q's in a single content stream? I would have thought that the purpose of such a restriction is to prevent stack-explosion, but if the q-nesting level is just counted separately in each content stream, one could invoke an XObject instead and then basically reset the q-nesting level. Am I missing something here? As mentioned, unfortunately I don't have access to the spec.

@reknih
Member Author

reknih commented Jul 12, 2024

I definitely can see your points!

Clause 6.1.13 of the PDF/A-2 standard just says "A conforming file shall not nest q/Q pairs by more than 28 nesting levels."
As for the origin of this requirement, it states that "this requirement makes normative a recommendation from ISO 32000-1:2008, C.2."

If we have a look at that table in the PDF 1.7 spec, it has this to say on q/Q-nesting:

[Image: PDF 1.7 spec, Appendix C, Table 1, row on q/Q nesting]

So I don't think referenced XObjects count, although I am not completely sure. I tried to check with the PDF/A-4 spec for clarification but it dropped the whole implementation limits clause. The PDF 2.0 spec also just states in its annex C that previous versions were limited to nesting depth 28 but this one is not so...
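
For illustration, a check under the per-content-stream reading could look roughly like the sketch below. It assumes the stream has already been reduced to a list of operators; the `Op` enum and both functions are hypothetical and not part of pdf-writer. XObjects invoked through `Do` are deliberately not followed, which is exactly the point in question above.

```rust
/// Nesting limit from clause 6.1.13 of PDF/A-2, as quoted above.
const MAX_Q_NESTING: usize = 28;

/// Hypothetical, simplified view of a content stream operator.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    /// `q`: save the graphics state.
    Save,
    /// `Q`: restore the graphics state.
    Restore,
    /// Any other operator, including `Do` (XObjects are not followed).
    Other,
}

/// Returns the deepest q/Q nesting level reached within one content stream.
/// A `Q` without a matching `q` is ignored rather than treated as an error.
fn max_q_depth(ops: &[Op]) -> usize {
    let mut depth = 0usize;
    let mut deepest = 0usize;
    for op in ops {
        match op {
            Op::Save => {
                depth += 1;
                deepest = deepest.max(depth);
            }
            Op::Restore => depth = depth.saturating_sub(1),
            Op::Other => {}
        }
    }
    deepest
}

/// True if the stream, taken on its own, stays within the nesting limit.
fn within_q_limit(ops: &[Op]) -> bool {
    max_q_depth(ops) <= MAX_Q_NESTING
}
```

If referenced XObjects did count, the depth at each `Do` would have to be carried into the referenced stream, which needs access to the whole document and not just a single content stream.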

@NiklasEi
Sponsor Contributor

Most of these changes seem very valuable. I especially like all the new notes regarding PDF/A compliance.

But I agree with @LaurenzV that the compliance checks seem out of place here. In my ("PDF-inexperienced") opinion, the checks should be separated from the other additions and the added documentation. They could be discussed further, while the rest of this PR could, in my opinion, be merged as is.

Note though, that I am very new to PDF/A (and to be honest, to PDF in general).

@laurmaedje laurmaedje changed the title from "PDF/A" to "PDF/A docs" Sep 4, 2024
@laurmaedje laurmaedje marked this pull request as ready for review September 4, 2024 14:51
@laurmaedje laurmaedje merged commit e931e5d into main Sep 4, 2024
4 checks passed
@laurmaedje laurmaedje deleted the pdf-a branch September 4, 2024 14:51
4 participants