Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

util: Check bilingual_str format strings at compile time #31074

Closed
wants to merge 7 commits into from

Conversation

ryanofsky
Copy link
Contributor

@ryanofsky ryanofsky commented Oct 11, 2024

Check translated strprintf calls at compile time to prevent tinyformat::format_error exceptions when the original string does not contain format specifiers matching the number of formatted arguments. Exceptions are still thrown if the translated strings do not contain the right number of format specifiers.

Most of the commits in this PR are code cleanup needed to make the compile time checks apply, but the relevant commits implementing the checks are simple:


This PR is alternative to #31061, that has messier syntax than #31061, but a cleaner implementation and a less confusing API. Unlike #31061, the _ and Untranslated functions continue to return bilingual_str structs with original and translated members instead of returning different structs with lit and translate() members. This approach is less confusing because it means there's a single type for representing bilingual values, and means that unlike #31061, code that is not doing string formatting does not need to change. Only code actually using tinyformat changes in this PR.

A drawback to this change compared to #31061 is that syntax for translated format strings is messier:

-    return strprintf(_("%s is set very high!"), optname);
     return strprintf(_<"%s is set very high!">(), optname);

But that's also a good way of showing that the string is used at compile time, and without this, there's no way to get the _ function to return a type that is compatible with bilingual_str.

Marking this as a draft because I need to think about differences of the two approaches. But I am inclined to think this approach is better because it is not fighting with compiler to be able evaluate function arguments at compile time, when this is supposed to be what template arguments are for https://old.reddit.com/r/cpp/comments/lxmv2z/allowing_parameters_of_consteval_function_to_be/gpp40lf/

MarcoFalke and others added 7 commits October 9, 2024 11:50
Previously, the std::string overload of tfm::format was selected, which
does no checks. Now, the ConstevalFormatString overload is picked.
Previously, the std::string overload of tfm::format was selected, which
does no checks. Now, the ConstevalFormatString overload is picked.

Also get rid of similar format string concatenations in init.cpp.

Co-Authored-By: Ryan Ofsky <[email protected]>
This passes the return value of _() directly to strprintf so the format
string can be checked at compile time in a future commit.
This is required for a future commit. Can be reviewed via the git
options --color-moved=dimmed-zebra --color-moved-ws=ignore-all-space

Also move util::detail::Hex to a proper namespace instead of an inline
namespace so it doesn't conflict with the new util::detail namespace, and
won't create other problems for callers trying to use the inline namespaces.

Also fix a misleading comment in util_string_tests.cpp.

Co-Authored-By: Ryan Ofsky <[email protected]>
bilingual_lit is a subclass of the bilingual_str type and can be used in the
same way, but has additional information about the literal string embedded in
the type of the struct so it can be used for compile time checking.
-BEGIN VERIFY SCRIPT-
quote='"(?:[^"\\]|\\.)*"'
quotes="(?:$quote|\\s|[A-Z_]|\\\\\\n)*"
git grep -l '_('            | xargs perl -0777 -i -pe "s/strprintf\((\\W*)_\(($quotes)\)/strprintf(\1_<\2>()/gs"
git grep -l 'Untranslated(' | xargs perl -0777 -i -pe "s/strprintf\((\\W*)Untranslated\(($quotes)\)/strprintf(\1Untranslated<\2>()/gs"
-END VERIFY SCRIPT-
Replace tinyformat::format overload that used to accept bilingual_str with
tinyformat::format overload that only accepts bilingual_lit.
@DrahtBot
Copy link
Contributor

DrahtBot commented Oct 11, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #bitcoin-core/gui/762 (Update about logo icon (colour) to denote the chain type of the QT instance in About/ Help Message Window/ Dialog by pablomartin4btc)
  • #31072 (scripted-diff: Replace strprintf(Untranslated(...)) with Untranslated(strprintf(...)) by ryanofsky)
  • #31061 (refactor: Check translatable format strings at compile-time by maflcko)
  • #31042 (build: Rename PACKAGE_* variables to CLIENT_* by hebasto)
  • #30988 (Split CConnman by vasild)
  • #30928 (tinyformat: refactor: increase compile-time checks and don't throw for tfm::format_error by stickies-v)
  • #30221 (wallet: Ensure best block matches wallet scan state by achow101)
  • #29770 (index: Check all necessary block data is available before starting to sync by fjahr)
  • #29641 (scripted-diff: Use LogInfo over LogPrintf [WIP, NOMERGE, DRAFT] by maflcko)
  • #29492 (refactor: Remove redundant definitions by Empact)
  • #29278 (Wallet: Add maxfeerate wallet startup option by ismaelsadeeq)
  • #28792 (Embed default ASMap as binary dump header file by fjahr)
  • #28710 (Remove the legacy wallet and BDB dependency by achow101)
  • #28574 (wallet: optimize migration process, batch db transactions by furszy)
  • #27865 (wallet: Track no-longer-spendable TXOs separately by achow101)
  • #27286 (wallet: Keep track of the wallet's own transaction outputs in memory by achow101)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@ryanofsky
Copy link
Contributor Author

ryanofsky commented Oct 12, 2024

Will close this. There may be some useful ideas here but I don't think this is a clear win over #31061.

The specific thing that convinced me to close this PR was learning about constexpr parameter proposal P1045. Currently in c , it is possible to pass arguments known at compile-time to functions either as <> or () arguments, but there are significant benefits to using <> instead of () because:

  1. With () parameters, you can't use the parameter in any type expression or consteval expression, so you can only ever create values out of, it not types, and they can't be constexpr values.
  2. With () parameters, if the parameter is a template type like StringLiteral in this PR, the template arguments have have to be specified explicitly and can't reliably be deduced because function template argument deduction is less powerful than class template argument deduction.
  3. It may be better style to pass arguments that are only used at compile time as template rather than function parameters.

However if function arguments could be marked constexpr like the proposal allows, the first limitation would go away and the other two don't apply to this situation, so this PR would be complicating the syntax now because of a language limitation that might not exist in the future.

@ryanofsky ryanofsky closed this Oct 12, 2024
fanquake added a commit that referenced this pull request Dec 6, 2024
…ages

0184d33 scripted-diff: Replace strprintf(Untranslated) with Untranslated(strprintf) (Ryan Ofsky)
006e4d1 refactor: Use   instead of strformat to concatenate translated & untranslated strings (Ryan Ofsky)
831d2bf refactor: Don't embed translated string in untranslated string. (Ryan Ofsky)
0580219 refactor: Avoid concatenation of format strings (Ryan Ofsky)

Pull request description:

  This PR cleans up string formatting in the codebase so other PRs adding compile time checking can be simpler and easier to review (specifically #30928, #31061, #31074, and #31149).

  Currently these PRs are hard to review because in addition to changing formatting APIs, they have to update callers that are using the API's in unusual ways. Clean up these callers now so later PRs can be simpler. Specifically:

  - Use string literals instead of `std::string` format strings to enable more compile-time checking.
  - Avoid using untranslated bilingual strings as format strings. Use originals so they can by checked at compile time.
  - Favor `Untranslated(strprintf(...))` over `strprintf(Untranslated(...), ...)` for consistency and to prevent translated and untranslated strings from being unintentionally combined.

ACKs for top commit:
  maflcko:
    lgtm ACK 0184d33 🔹
  l0rinc:
    ACK 0184d33 - no overall difference because of the rebase

Tree-SHA512: 37eb771187d30977f5f054eddb82af6dd69878ace89cede72032bb389e57200898119f9fe486ce6903ebb00cb24648be215ab3e44842b3d206b35e26038da750
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants