Create template function to remove non-printable characters #11255

Open
Winterhuman opened this issue Jul 16, 2023 · 5 comments

Winterhuman commented Jul 16, 2023

What version of Hugo are you using (hugo version)?

$ hugo version
hugo v0.112.3 extended linux/amd64 BuildDate=unknown

Does this issue reproduce with the latest release?

Yes.

Issue description

As far as I can tell, no combination of plainify, safeHTML, markdownify, htmlEscape, or htmlUnescape can fully remove soft hyphens; at best they convert them to \uad. The only exception I found is anchorize, which does remove them.

I'm not sure which functions out of the above should be able to remove soft hyphens or not (my guess would be plainify, since it's meant to "strip HTML tags"), but for now, using replace .Var "­" "" is the only way other than anchorize that I've found.

jmooring (Member) commented Jul 16, 2023

{{ "_­_" | warnf "%[1]v (%[1]T)" }} --> ­ (string)
{{ "_­_" | htmlUnescape | warnf "%[1]v (%[1]T)" }}  --> _ ­_ (string)

If you need additional assistance, please use the forum (https://discourse.gohugo.io/) for questions and troubleshooting. We prefer to use GitHub for verified bugs and vetted enhancements. Thanks.

jmooring (Member):
OK, I see it now. Reopening.

jmooring reopened this Jul 16, 2023
jmooring (Member) commented Jul 16, 2023

First, the plainify function removes HTML tags (e.g., <strong>), not HTML entities (e.g., &amp;). It is doing the right thing.

Second, the htmlUnescape function "returns the given string with HTML escape codes un-escaped." This function is also doing the right thing.

So there's no bug here, but there is a possible enhancement: a template function that removes non-printable characters. This gets a bit tricky because sometimes you want to remove them, and at other times replace them with a space (e.g., a tab should be replaced with a space, but a zero-width space should simply be removed). I think these cases would have to be handled individually.
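A minimal sketch of what such a function might do, written in Go since that is Hugo's implementation language. The name stripNonPrintable and its policy are assumptions for illustration, not an existing Hugo API: any Unicode whitespace (tab, newline, no-break space, ...) becomes a plain space, and every other rune that unicode.IsPrint rejects (soft hyphen U+00AD, zero-width space U+200B, other control or format characters) is dropped.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripNonPrintable is a hypothetical helper, not part of Hugo.
// Whitespace maps to ' '; other non-printable runes are removed.
func stripNonPrintable(s string) string {
	return strings.Map(func(r rune) rune {
		switch {
		case unicode.IsSpace(r):
			return ' ' // e.g. '\t' or U+00A0 becomes a plain space
		case !unicode.IsPrint(r):
			return -1 // a negative value tells strings.Map to drop the rune
		default:
			return r
		}
	}, s)
}

func main() {
	fmt.Printf("%q\n", stripNonPrintable("soft\u00adhyphen")) // "softhyphen"
	fmt.Printf("%q\n", stripNonPrintable("tab\there"))        // "tab here"
	fmt.Printf("%q\n", stripNonPrintable("zero\u200bwidth"))  // "zerowidth"
}
```

Mapping every whitespace rune to a space also flattens newlines; a real implementation would need to decide which characters get special-cased, which is exactly the tricky part described above.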

For now I think you have three options:

  1. Use the replace function
  2. Use the replaceRE function
  3. Create a partial function that contains a slice of HTML entities to remove or replace.
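Hugo's replaceRE uses Go's RE2 regular-expression syntax, so a Unicode category class such as \p{Cf} (format characters, which include the soft hyphen and the zero-width space) can target a whole group at once. The Go sketch below mirrors options 2 and 3 outside of Hugo: a regexp for the category approach, and a strings.Replacer built from an explicit old/new list, similar in spirit to a partial holding a slice of characters to remove or replace. The specific characters in the list are an assumed selection; adjust them to your content.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Option 2: remove every Unicode "format" character (category Cf),
// which includes U+00AD SOFT HYPHEN and U+200B ZERO WIDTH SPACE.
var formatChars = regexp.MustCompile(`\p{Cf}`)

func removeFormatChars(s string) string {
	return formatChars.ReplaceAllString(s, "")
}

// Option 3: an explicit old/new list (hypothetical selection), applied in one pass.
var cleanup = strings.NewReplacer(
	"\u00ad", "", // soft hyphen: remove
	"\u200b", "", // zero-width space: remove
	"\t", " ", // tab: replace with a space
)

func main() {
	fmt.Println(removeFormatChars("co\u00adop\u200beration"))      // cooperation
	fmt.Println(cleanup.Replace("co\u00adop\u200beration\tdone")) // cooperation done
}
```

In a template, the regexp approach should correspond to something like {{ replaceRE "\\p{Cf}" "" .Title }}. Note that Cf also contains the zero-width joiner (U+200D) used in emoji sequences, so the explicit list is the safer choice when such content is possible.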

jmooring changed the title from "Few functions can remove &shy;" to "Create template function to remove non-printable characters" on Jul 16, 2023

This issue has been automatically marked as stale because it has not had recent activity. The resources of the Hugo team are limited, and so we are asking for your help.
If this is a bug and you can still reproduce this error on the master branch, please reply with all of the information you have about it in order to keep the issue open.
If this is a feature request, and you feel that it is still relevant and valuable, please tell us why.
This issue will automatically be closed in the near future if no further activity occurs. Thank you for all your contributions.

github-actions bot added the Stale label Jul 16, 2024
Winterhuman (Author):

Still relevant

github-actions bot removed the Stale label Jul 17, 2024