Wikipedia:Lua string functions

This essay lists Lua string functions and describes the use of Lua script to handle text strings in Wikipedia pages.

Module:String

The main set of Lua string-handling functions is stored in Module:String.

Performance considerations

The string-search functions in Lua script can run extremely fast, comparing millions of characters per second. For example, a search of a 40,000-character article text for 99 separate words (passed as 99 parameters in a template) ran within one second of Lua CPU clock time. That total includes pinpointing the location of each of those 99 words, scattered across the long text.
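A minimal sketch of this kind of multi-word search, written in plain Lua rather than taken from Module:String. The helper name find_words and the sample text are hypothetical. It relies on the standard string.find function with its fourth argument set to true, so that each word is matched as plain text and not as a Lua pattern:

```lua
-- Hypothetical helper: locate the first occurrence of each word in a text.
-- Returns a table mapping each word to its 1-based start position,
-- or to 0 when the word does not occur (a choice made for this sketch).
local function find_words(text, words)
    local positions = {}
    for _, word in ipairs(words) do
        -- init = 1 (start of text), plain = true (no pattern matching)
        local start = string.find(text, word, 1, true)
        positions[word] = start or 0
    end
    return positions
end

local text = "The quick brown fox jumps over the lazy dog"
local found = find_words(text, { "quick", "lazy", "cat" })
print(found.quick, found.lazy, found.cat)  -- 5, 36, 0
```

Each word triggers one scan of the text, so the cost grows with the number of words times the text length, which is the kind of workload described above.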

Limitations on strings

Lua can process text strings in excess of 230,000 characters, which allows the formatted contents of article pages to be used as input to string searches. However, there have been some limitations in string contents, such as hidden strings enclosed in nowiki-tag or pre-tag elements.

There seem to have been "bugs" in Lua chopping text which contains wikitables, either with "{|" tokens or with "<table>" tags. Lua has been used to scan entire articles (with expansion of all infoboxes, span-tags, {convert}'s, category links, and navboxes). However, unless the wikitables were commented out by "<!--...-->" or noinclude'd, Lua's view of the article contents has seemed to stop at the first wikitable. That behavior seems like a bug, because Lua should allow all article-page data into a text string. Also, Lua cannot see inside a nowiki tag (nor inside a "<pre>" tag): the tag always counts as 34 characters and never reveals any contents between "<nowiki>...</nowiki>", only the text before and after the nowiki tags.

Example of <table>: Seems to work for small tables. Compare the effect of "<table>" with the Lua-based Template:Str_find, in searching the whole string length:

  • {{str_find|123456789012|78}} → 7
  • {{str_find|123456789012|90}} → 9
  • {{str_find|1234<table><tr><td>5678</td></tr></table>9012|78}} → 22
  • {{str_find|1234<table><tr><td>5678</td></tr></table>9012|90}} → 42
  • {{str_find|12345<span>67890</span>12|78}} → 13

Using the "<table>" tag formerly stopped the string. Inserting a wikitable "{|" in column 1 would produce a similar effect (since "{|" generates a "<table>" tag). However, the results in this example now appear correct.
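The positions in the list above can be checked against plain Lua, outside the wiki, where the tags are ordinary characters. This is only a sketch: the local function str_find here is a hypothetical stand-in built on string.find and is not the code behind Template:Str_find.

```lua
-- Hypothetical stand-in for the template: plain-text search that
-- returns the 1-based position of the first match, or 0 if none.
local function str_find(source, target)
    -- The fourth argument (true) disables Lua pattern matching.
    return string.find(source, target, 1, true) or 0
end

local plain   = "123456789012"
local tabled  = "1234<table><tr><td>5678</td></tr></table>9012"
local spanned = "12345<span>67890</span>12"

print(str_find(plain, "78"))    -- 7
print(str_find(plain, "90"))    -- 9
print(str_find(tabled, "78"))   -- 22
print(str_find(tabled, "90"))   -- 42
print(str_find(spanned, "78"))  -- 13
```

The results match the list when every character of the markup is counted, which is consistent with the whole string, tags included, reaching Lua in these cases.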

Example of <nowiki>: Compare the effect of "<nowiki>" with the Lua-based Template:Str_len, in getting the whole string length:

  • Nw1: {{str_len|123456789012}} → 12
  • Nw2: {{str_len|1234<nowiki>5678</nowiki>9012}} → 42
  • Nw3: {{str_len|1234<nowiki>567890</nowiki>12}} → 40

Using the "<nowiki>" tag hides the text but counts as 34 characters. So, for the case Nw2, the length is 4 + 34 + 4 = 42, and Nw3 yielded 4 + 34 + 2 = 40 characters.
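The arithmetic can be modelled in plain Lua under one assumption: that each nowiki element is replaced by an opaque placeholder of fixed length before Lua sees the string. The function name visible_length, the constant MARKER_LENGTH, and the "?" filler are all hypothetical; only the figure 34 comes from the counts above.

```lua
-- Assumption: every <nowiki>...</nowiki> element is swapped for an
-- opaque placeholder of fixed length; 34 is taken from the counts above.
local MARKER_LENGTH = 34

-- Hypothetical model of the length that Lua would report.
local function visible_length(wikitext)
    local marker = string.rep("?", MARKER_LENGTH)
    -- ".-" is non-greedy, so each nowiki pair is replaced separately.
    local stripped = string.gsub(wikitext, "<nowiki>.-</nowiki>", marker)
    return #stripped
end

print(visible_length("123456789012"))                   -- 12
print(visible_length("1234<nowiki>5678</nowiki>9012"))  -- 42
print(visible_length("1234<nowiki>567890</nowiki>12"))  -- 40
```

In this model the hidden text contributes nothing to the length, so Nw2 and Nw3 differ only in how many characters fall outside the nowiki tags.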

The mw.text.unstrip function described in MediaWiki's "Lua reference manual" (which is not yet live) may be of help with the nowiki tags. See also T47085.

See also

English Wikipedia-specific resources