A way to redact text #4511

Open
lazyeights opened this issue Jul 6, 2024 · 7 comments
Labels
feature request New feature or request text Text layout, shaping, internationalization, etc.

Comments

@lazyeights

Description

In some documents, especially legal documents, redacted information must be removed from the output pipeline entirely so that it cannot be recovered. At the same time, the document must show visibly that something was redacted, and there are various formats for doing so. The most basic redaction is a black bar that starts within a paragraph and runs to the end of the line, giving no indication of the length of the redacted words.

Some prior solutions use the highlight function, but it does not remove the content (and currently does not combine well with hide), and it reveals the size of the redacted words, as shown here:

[Screenshot: CleanShot 2024-07-05 at 21 24 48@2x]

Use Case

The hide function seems to do almost exactly what is required, but its content is simply invisible, with no visual indication of a deliberate redaction. For legal text, it would help if hide accepted a customizable fill argument to provide one. #hide(fill: black, content) is the minimum needed, and should be easy to implement.

@lazyeights lazyeights added the feature request New feature or request label Jul 6, 2024
@lazyeights
Author

It would probably make sense to also include the other standard highlight arguments like stroke, etc. For example, a draft document could mark proposed redactions like this:

[Screenshot: CleanShot 2024-07-05 at 21 49 34@2x]

@ludwig-austermann

ludwig-austermann commented Jul 10, 2024

Isn't something like this already doing the job?

// simple redaction
#let redact(body, color: black) = box(hide(body), fill: color)
// redaction as in the issue
#let redactre(body, color: black) = {
  show regex("[^\s]"): it => redact(it, color: color)
  body
}

@laurmaedje laurmaedje added layout Related to layout, positioning, etc. text Text layout, shaping, internationalization, etc. proposal You think that something is a good idea or should work differently. and removed feature request New feature or request labels Jul 14, 2024
@laurmaedje laurmaedje changed the title from "Add fill argument to #hide" to "A way to redact text" Jul 14, 2024
@laurmaedje laurmaedje added feature request New feature or request and removed layout Related to layout, positioning, etc. proposal You think that something is a good idea or should work differently. labels Jul 14, 2024
@Enivex
Collaborator

Enivex commented Jul 14, 2024

> Isn't something like this already doing the job?
>
> // simple redaction
> #let redact(body, color: black) = box(hide(body), fill: color)
> // redaction as in the issue
> #let redactre(body, color: black) = {
>   show regex("[^\s]"): it => redact(it, color: color)
>   body
> }

That does not hide the size of redacted words

@ludwig-austermann

Then I do not understand what the intent is. Using "." as the regex, however, would mark everything.

@lazyeights
Author

The regex approach still leaks information in the PDF output because the boxes contain information about the size of the glyphs or words. See https://www.wired.com/story/redact-pdf-online-privacy/

“Even if you do the redaction, supposedly correctly, even if you remove the text, there’s a lot of latent information that is dependent on the content that was redacted, and even that can leak information,” Levchenko says. “If you redact a name in a PDF, if the attacker has any context—they know this is an American—they will be able to, with high probability, either recover that name or narrow it down to a very small list of candidates.”

Edact-Ray focuses on the size of glyphs (broadly, characters or letters) and their positioning. “It’s pretty clear to a lot of people that the letter ‘L’ is skinnier than a letter ‘M,’ and that if you redacted just the letter ‘L,’ then you might be able to tell it is different from a redaction with just the letter ‘M,’” Bland says. The tool is essentially able to automatically compare the size of the redaction and the position of the letters with a predefined “dictionary” of words to estimate what has been replaced.

The code below shows several issues. The first paragraph shows that #hide preserves spacing but does not communicate that there was a redaction. The second shows that #box breaks the line if the selected text overlaps into the next line. The third shows how #regex leaks the size of individual glyphs or words into the PDF file (even though the output appears as one continuous black bar).

// simple redaction
#let redact(body, color: black) = box(hide(body), fill: white, stroke: red)

// redaction as in the issue
#let redactre(body, color: black) = {
  show regex("."): it => redact(it, color: color)
  body
}

#box(stroke: black, inset: 1em)[#lorem(10) #lorem(10) #lorem(5)]

`#hide`
#box(stroke: black, inset: 1em)[#lorem(10) #hide(lorem(10)) #lorem(5)]

`#box`
#box(stroke: black, inset: 1em)[#lorem(10) #redact(lorem(10)) #lorem(5)]

`#regex`
#box(stroke: black, inset: 1em)[#lorem(10) #redactre(lorem(10)) #lorem(5)]

[Screenshot: CleanShot 2024-07-15 at 10 26 35@2x]
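
The glyph-size leak can be seen directly by measuring two single letters. This is only a minimal sketch, assuming a Typst version where measure can be called inside a context block (0.11 or later); the names narrow and wide are illustrative and not taken from the thread. A box wrapped around a hidden glyph takes that glyph's width, so per-character boxes carry these widths into the output.

```typst
// Sketch: per-glyph redaction boxes inherit content-dependent widths.
#context {
  // Measure two single letters in the current font and size.
  let narrow = measure([l]).width
  let wide = measure([m]).width
  [Width of "l": #narrow, width of "m": #wide]
  parbreak()
  // Each box is exactly as wide as the glyph it hides.
  box(fill: black, hide[l])
  h(1em)
  box(fill: black, hide[m])
}
```

In most proportional fonts the two widths differ, which is the kind of signal the quoted article describes.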

@ludwig-austermann

Well, I guess one could create a broken/bad implementation from start and end points, text height, content size and line spacing measurements, but a good solution does not seem possible to me with the current Typst version.

@mxmerz

mxmerz commented Jul 18, 2024

How about something like this?
The function redact measures the width of the text that is to be redacted (say, 3.213cm), then it truncates that width to a whole number of millimetres (32mm) and creates that many 1mm-wide black boxes to fill the space.

  • The output pdf should not contain the text (though I don't know enough about Typst's internals to guarantee that).
  • You have a black bar that breaks at lines, but only tells an adversary roughly how wide the redacted text was (no information about individual glyphs).
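
The arithmetic for the 3.213cm example can be checked on its own. A small sketch: the variable names are illustrative, and int() truncates rather than rounds, so the bar comes out slightly shorter than the measured text.

```typst
// Worked example of the width quantisation for a measured width of 3.213cm.
#let measured = 3.213cm
#let boxes = int(measured.mm())   // 32.13mm is truncated to 32 boxes
#let covered = boxes * 1mm        // total width of the black bar
#let uncovered = measured - covered

Boxes: #boxes, bar width: #covered, remainder not covered: #uncovered
```

An observer of the output learns only the box count, not the sub-millimetre remainder or any per-glyph widths.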

I also tried to measure how much space is left on the current line (with layout, here().position(), etc.) but didn't succeed. If this were known, we could output one appropriately sized black box per line (instead of one box per millimetre).

#set page(paper: "a6")

#let redact-box(width) = {
  box(rect(height: 1em, width: width, fill: black, stroke: none))
}

#let redact(size: none, body) = context {
  if size != none {
    redact-box(size)
  } else {
    let redact_size = measure(body) // size of the text that is to be redacted
    let redacted = int(redact_size.width.mm()) // width in number of mm's
    // create a 1mm-box for each mm
    for i in range(redacted) {
      redact-box(1mm)
    }
  }
}

Pursuant to Federal Circuit Rule 47.4, counsel for Petitioner certifies
the following:
1. The full name of the party represented by me: #redact[Public Co.], Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim aeque doleamus animo, cum corpore #redact(size: 2cm)[dolemus], fieri tamen permagna accessio potest, si aliquod aeternum et infinitum impendere malum nobis opinemur.
2. Quod idem licet transferre in voluptatem, #redact[ut postea variari voluptas distinguique possit, augeri amplificarique non possit. At etiam Athenis, ut e patre audiebam] facete et urbane Stoicos irridente, statua est in quo a nobis philosophia defensa et collaudata est, cum id, quod maxime placeat, facere possimus, omnis voluptas assumenda est, omnis dolor repellendus. Temporibus autem quibusdam et.
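
If the 1mm granularity still reveals more about the width than desired, the same idea works with a coarser grid. The following is a hedged variation on the function above, not something proposed in the thread: redact-coarse and its step parameter are hypothetical names, and it rounds up with calc.ceil so the bar is never shorter than the text.

```typst
// Variation (assumption): quantise the redacted width to a configurable step.
#let redact-coarse(step: 5mm, body) = context {
  let width = measure(body).width
  // Dividing two lengths gives a plain number; round up to whole cells.
  let cells = calc.ceil(width / step)
  for i in range(cells) {
    box(rect(height: 1em, width: step, fill: black, stroke: none))
  }
}

The full name of the party represented by me: #redact-coarse[Public Co.], Lorem ipsum dolor sit amet.
```

A larger step hides more about the original width, at the cost of a bar that can be noticeably longer than the redacted text.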

[Image: hide]

5 participants