Wiktionary:Grease pit/2020/January

Self-transclusion on Proto-West Germanic pages

If you look at the transclusion list for *geutan, you see that it transcludes itself. Other Proto-West Germanic pages seem to have this as well. The source seems to be in the descendants list. Any idea why this is, and is it a problem? —Rua (mew) 11:27, 1 January 2020 (UTC)[reply]

@Rua: Ahh, that's because {{reconstructed}} checks the headers on the page using Module:reconstruction. I'll add documentation for this. — Eru·tuon 18:49, 1 January 2020 (UTC)[reply]

Searching for hard-coded taxonomic binomials in entries

There's no doubt some elegant way to do this in the search box, but it involves integrating a literal search for very basic wiki syntax with a regex, so I haven't been able to figure this out yet. I need to look for:

'' (two single quotes/apostrophes)

followed by:

exactly 1 uppercase A-Z

followed by:

1 or more lowercase a-z

followed by:

exactly 1 space

followed by:

2 or more lowercase a-z

followed by:

'' (two single quotes/apostrophes)

My best (very amateur) guess at a regex:

''[A-Z][a-z]* [a-z]*''

Could someone either show me how to do this, or provide a list of entries that I can use? Chuck Entz (talk) 06:56, 2 January 2020 (UTC)[reply]

@Chuck Entz: Search functionality aside, the regex should be ''[A-Z][a-z]+ [a-z]{2,}'' based on your description. If you put the regex into https://regexr.com/, you can see an explanation. --DannyS712 (talk) 07:05, 2 January 2020 (UTC)

So this works, but doesn't complete. It would be easier if the words were longer, but it's a start. See mw:Help:CirrusSearch for the technical details behind regex searching. --DannyS712 (talk) 07:13, 2 January 2020 (UTC)

Here's a list of the possible taxonomic names from mainspace pages in the 2019-12-20 dump, using an equivalent Lua pattern, but excluding any matches preceded by three apostrophes, to exclude bolded text (imperfectly, as it doesn't account for combined bolding and italics). [Edit: For curiosity's sake, number of occurrences of each possible taxonomic binomial. The same census with hyphen-internal names included.] — Eru·tuon 07:44, 2 January 2020 (UTC)[reply]
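For readers who want to try the idea offline, here is a minimal Python sketch of the pattern described in the request, with lookarounds that reject a third apostrophe on either side (an imperfect way to skip bold text, as noted above). It uses Python's `re` syntax, not CirrusSearch or a Lua pattern, and the sample wikitext and the function name `count_binomials` are invented for illustration.

```python
import re
from collections import Counter

# ''Genus species'' in italics; the lookarounds reject a third apostrophe
# on either side, so '''Bold text''' is not matched.
BINOMIAL = re.compile(r"(?<!')''([A-Z][a-z]+ [a-z]{2,})''(?!')")

def count_binomials(wikitext):
    """Return a Counter of candidate binomials found in the wikitext."""
    return Counter(BINOMIAL.findall(wikitext))

# Invented sample text, not taken from any entry.
sample = (
    "# A tree, ''Quercus robur'', related to ''Quercus alba''.\n"
    "# '''Homo sapiens''' and ''Quercus robur'' again.\n"
)
counts = count_binomials(sample)
print(counts.most_common())
```

Like the census described above, this also drops italic text nested inside bold, so it undercounts slightly.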

I've lately just searched for species names for one genus at a time, a less-daunting, but inherently incomplete answer. And I haven't been leaving breadcrumbs, so I'll eventually be duplicating my efforts.

One can also improve the rate of yield for the search by including words like plant, species, and word fragments (using insource) like "-like". Sometimes there are specific contributors who have worked in specific languages so restricting searches to noun lemmas in those languages can also give a better yield. All such techniques that increase the search yield rate will omit many entries with species names.

Many 'naive' searches generate false positives, such as from filenames for images or {{trans-top}} headers. Trying to exclude all such things might be hard. But perhaps a dump run including preceding (and succeeding?) characters (12?, 20?, 40?) of the potential species name instances would help. DCDuring (talk) 18:35, 2 January 2020 (UTC)
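The context idea could be prototyped roughly as follows. This is only a sketch in Python: the window width, the sample line, and the function name `matches_with_context` are assumptions, and a real run would read the dump rather than a hard-coded string.

```python
import re

BINOMIAL = re.compile(r"(?<!')''[A-Z][a-z]+ [a-z]{2,}''(?!')")

def matches_with_context(text, width=20):
    """Yield (before, match, after) with up to `width` characters of context."""
    for m in BINOMIAL.finditer(text):
        before = text[max(0, m.start() - width):m.start()]
        after = text[m.end():m.end() + width]
        yield before, m.group(0), after

# Invented line showing the two kinds of false positive mentioned above.
line = "{{trans-top|a tree, ''Quercus robur''}} and [[File:Oak.jpg|''Quercus alba'' leaf]]"
for before, hit, after in matches_with_context(line, width=12):
    print(repr(before), hit, repr(after))
```

With the surrounding characters visible, hits inside `{{trans-top|` or `[[File:` can be filtered by eye or by a second pass.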

"nuqtaless form of" template

The template {{nuqtaless form of}} is very useful, but it should be limited to (I think) Devanagari script, specifically Hindi. It should also add entries to a category, such as Category:Hindi terms spelled without a nuqta (language is the second parameter), which should be a subcategory of Category:Hindi terms with irregular pronunciations (based on hi) because the spelling doesn't give enough information to pronounce the term correctly. If this is based on {{ru-noun-alt-ё}}, the idea is implemented there. Please also note that the Russian template transliterates as in the dictionary form: самолет -> самолёт (samoljót). To keep things simple, I think it should be just Hindi. No need to reuse it for other scripts. Arabic relaxed spelling forms (lack of hamza, etc.) could use a similar approach.

Here's an example: बगीचा (bagīcā) nuqtaless form of -> बग़ीचा (baġīcā). The dictionary form is "बग़ीचा" (with a nuqta (dot)) and the expected dictionary transliteration is "baġīcā" (not "bagīcā"). If there are too many cases of "spelling transliteration", then the transliteration should be left unchanged.

Calling @DerekWinters, AryamanA, Benwing2. --Anatoli T. (обсудить/вклад) 11:03, 3 January 2020 (UTC)

@Atitarev: It can also be used for Punjabi and a host of languages closely related to Hindi (like Bhojpuri, Dogri, etc.). —AryamanA (मुझसे बात करें • योगदान) 01:00, 4 January 2020 (UTC)

@AryamanA: I see, thanks. The template should only deal with Indo-Aryan languages that use the nuqta (dot) and may omit it in writing, where the words are pronounced either as if the nuqta were written or (alternatively) as they are spelled without it.

Two sets of pronunciations are automatically given by Module:hi-IPA, one resulting from the spelling with nuqta and one without:

बग़ीचा (baġīcā): /bə.ɡiː.t͡ʃɑː/ and /bə.ɣiː.t͡ʃɑː/, nuqtaless form: बगीचा (bagīcā)
ख़रगोश (xargoś): /kʰəɾ.ɡoːʃ/ and /xəɾ.ɡoːʃ/, nuqtaless form: खरगोश (khargoś)
क़लम (qalam): /kə.ləm/ and /qə.ləm/, nuqtaless form: कलम (kalam)
झ़ूझ़ (źūź): /zuːz/ and /ʒuːʒ/, nuqtaless form: झूझ (jhūjh)

However, these give only one pronunciation:

ज़रूर (zarūr): /zə.ɾuːɾ/, nuqtaless form: जरूर (jarūr)
फ़िल्म (film): /fɪlm/, nuqtaless form: फिल्म (philm)
बालतोड़ (bāltoṛ): /bɑːl.t̪oːɽ/, nuqtaless form: बालतोड (bāltoḍ)
अलीगढ़ (alīgaṛh): /ə.liː.ɡəɽʱ/, nuqtaless form: अलीगढ (alīgaḍh)

Only the nuqta'ed letters क़ (qa), ख़ (xa), ग़ (ġa) and झ़ (źa) are given dual pronunciations, but not ज़ (za), फ़ (fa), ड़ (ṛa) and ढ़ (ṛha). I think at least ज़ (za) and, less commonly, फ़ (fa) are sometimes pronounced as /d͡ʒ/ and /pʰ/ respectively. --Anatoli T. (обсудить/вклад) 11:32, 4 January 2020 (UTC)

Can the letters ज़ (za) and फ़ (fa) be given the Perso-Arabic treatment as well? (I don't know about ड़ (ṛa) and ढ़ (ṛha).) --Anatoli T. (обсудить/вклад) 11:42, 4 January 2020 (UTC)

edit-filter false-positive?

I just tried to create 2 new dictionary entries. The first page was created without incident. Then whenever I try to add another dictionary entry, I get the error “Warning: This action has been automatically identified as harmful”....“A brief description of the abuse rule which your action matched is: various specific spammer habits.” This happens no matter what definition I write for the term. I even tried using the new-entry (NEC) wizard. (Eventually it stopped even offering me the NEC wizard as an option.) (As a test, I tried creating a third entry, and even though it did not meet Wiktionary policy, it allowed me to create it. I have since requested deletion of my test edit, but I still want to know why it won't let me create the entry that I was actually trying to create.) 66.82.144.143 23:47, 3 January 2020 (UTC)[reply]

The "various specific spammer habits" filter covers a lot of unrelated things. Can you show us the exact text you were trying to create? Use a site like PasteBin if you can't save it here. Equinox ◑ 00:16, 4 January 2020 (UTC)[reply]

Well, the entry in question that the IP has failed to create is Active Directory according to the abuse filter log. That's a trademark so it might not merit inclusion. — Eru·tuon 00:25, 4 January 2020 (UTC)[reply]

I believe we had a lot of trouble with new IP users creating entries with fake phone numbers for Microsoft (so that ignorant users may pick up the number from a search engine and ring up the fraudsters for support). So that's probably the issue here. I agree, AD is a brand and probably not includable here anyway. Equinox ◑ 00:37, 4 January 2020 (UTC)[reply]

~~I believe that~~ “Active Directory” ~~has entered the lexicon unless I misunderstand what is meant by “entered the lexicon.”~~ is a commonly-used term. In any case, if “Active Directory” is not inclusion-worthy then neither is the acronym that I initially added. 66.82.144.143 01:14, 4 January 2020 (UTC)[reply]

Add pronunciations to /definitions api

Hi,

This is the api I'm talking about: https://en.wiktionary.org/api/rest_v1/#/Page content/get_page_definition__term_

Adding these would be very useful

1) pronunciations

2) urls to the related audio clips of said pronunciations

I probably should be posting this in phabricator but, if I'm being honest, I'd rather not learn to navigate that thing just for this, so I came to throw ball here instead.

88.193.149.0 14:15, 4 January 2020 (UTC)[reply]

I don't know who maintains this API, but probably someone in MediaWiki. If you don't want to post on Phabricator and nobody responds here, you might also try posting on the talk page of this page on the MediaWiki wiki. — Eru·tuon 19:49, 4 January 2020 (UTC)[reply]
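For context, a rough Python sketch of how a client might address this endpoint and read its result. The URL template follows the link in the request; the response shape used here (language codes mapping to lists of entries with `partOfSpeech` and `definitions`) is an assumption based on how the endpoint appears to behave, and the sample payload is invented.

```python
from urllib.parse import quote

BASE = "https://en.wiktionary.org/api/rest_v1/page/definition/"

def definition_url(term):
    """Build the request URL for a term (percent-encoding the title)."""
    return BASE + quote(term, safe="")

def parts_of_speech(payload):
    """Map each language code to the parts of speech listed for it."""
    return {
        code: [entry.get("partOfSpeech") for entry in entries]
        for code, entries in payload.items()
    }

# Invented sample in the assumed response shape; it carries no
# pronunciation or audio field, which is what the request asks for.
sample = {
    "en": [
        {"partOfSpeech": "Noun", "language": "English",
         "definitions": [{"definition": "A domestic feline."}]},
    ]
}
print(definition_url("hot dog"))
print(parts_of_speech(sample))
```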

Spitballing about unencoded characters in pages' display-titles

I suspect editors more technically adept than me have already thought about this, but: is it possible, either via MediaWiki:UnsupportedTitles.js or via an in-entry template or a Lua module (perhaps of the sort which we use to verticalize Mongolian entries' titles), to put images (directly or via templates like {{biang}}) into the displayed title of entries like ⿺辶⿳穴⿲月⿱⿲幺言幺⿲長馬長刂心⿺辶⿳穴⿲月⿱⿲幺言幺⿲長馬長刂心麵? I've played around and it doesn't seem to work, which leads to my next idea: could we use MediaWiki:UnsupportedTitles.js to replace long titles like ⿺辶⿳穴⿲月⿱⿲幺言幺⿲長馬長刂心⿺辶⿳穴⿲月⿱⿲幺言幺⿲長馬長刂心麵 with a blank space, and then float an image of the correct title over that space? Would that be too prone to displaying incorrectly on different browsers/skins/phones to be a good idea? - -sche (discuss) 20:02, 10 January 2020 (UTC)[reply]

Missing gender options in TranslationAdder

The translation box generated by MediaWiki:Gadget-TranslationAdder.js is missing options for the masculine personal, animate and inanimate genders used in Slavic languages. I don't have edit access to the file to add them. Can someone add them or give me access to that file? --Tweenk (talk) 00:05, 12 January 2020 (UTC)

@Tweenk: The gender data for the TranslationAdder is housed in MediaWiki:Gadget-TranslationAdder-Data.js. The current list for all or most of the Slavic languages seems to be ["m", "f", "n", "m-p", "f-p", "n-p", "impf", "pf"]. What should it be instead? — Eru·tuon 00:28, 12 January 2020 (UTC)[reply]

@Erutuon: For Polish it should be: ["m-pr", "m-an", "m-in", "f", "n", "m-p", "f-p", "n-p", "impf", "pf"]. The "m" option is ambiguous so it can be omitted. However, the gadget doesn't have support for these, they would have to be added there first. --Tweenk (talk) 09:14, 12 January 2020 (UTC)[reply]

@Erutuon, Tweenk: I considered requesting this in the past, just like perfective/imperfective verb distinctions (most but not all Slavic languages), but then thought it wasn't worth it. Here's why:

Animacy of Slavic common nouns is best kept to the entries themselves. A huge number of translations are done by anonymous IP users who are clueless or careless. They'll just use a default or a wrong gender. Too many options is worse.
Slavic noun animacy is mostly very predictable from the actual meaning: living things (creatures and humans, excluding plants) are animate; everything else is inanimate. There are peculiar and infrequent exceptions, e.g. Russian ро́бот (róbot, “robot”) and мертве́ц (mertvéc, “dead man, corpse”) are animate. There are cases when nouns can be used both ways, depending on the sense, or even the same sense can be used as both animate and inanimate. E.g. креве́тки (krevétki, “prawns”) can be inanimate if used as food.
Animacy of nouns in Slavic languages affects declensions, not the usage, unlike perfective/imperfective for verbs, which has a larger impact on how verbs are used but also their conjugation.

Please note that animacy is not implemented for all Slavic languages (in headword templates and modules), although it's there for almost all. I believe animacy has no consequence for Bulgarian/Macedonian. In translations, only Polish nouns feature animacy, which must have been added by Tweenk. I usually don't bother adding animacy to Slavic nouns in translations, but entries must have it. --Anatoli T. (обсудить/вклад) 01:18, 13 January 2020 (UTC)

Esperanto in Module:IPA/data

Can somebody add the following code to Module:IPA/data?

data.phonemes["eo"] = {
	"a", "b", "d", "d͡ʒ", "e", "f", "h", "i", "j", "k",
	"l", "m", "n", "o", "p", "r", "s", "t", "t͡s", "t͡ʃ",
	"u", "v", "w", "x", "z", "ɡ", "ʃ", "ʒ",
	"ˈ", ".", " ",
	}

Robin van der Vliet (talk) (contribs) 00:14, 13 January 2020 (UTC)[reply]

@Robin van der Vliet: Done. — Eru·tuon 00:29, 13 January 2020 (UTC)
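To illustrate what such a phoneme list makes possible, here is a small Python sketch that flags symbols in a transcription that are not in the inventory above. How Module:IPA actually consumes `data.phonemes` is not shown in this thread, so the greedy tokenizer and the function name `invalid_symbols` are assumptions.

```python
# Phoneme inventory copied from the request above.
EO_PHONEMES = [
    "a", "b", "d", "d͡ʒ", "e", "f", "h", "i", "j", "k",
    "l", "m", "n", "o", "p", "r", "s", "t", "t͡s", "t͡ʃ",
    "u", "v", "w", "x", "z", "ɡ", "ʃ", "ʒ",
    "ˈ", ".", " ",
]

def invalid_symbols(transcription, phonemes=EO_PHONEMES):
    """Return the characters that cannot be read as a listed phoneme.

    Tokenizes greedily, trying the longest phonemes first so that an
    affricate is consumed as one unit rather than as "t" plus leftovers.
    """
    by_length = sorted(phonemes, key=len, reverse=True)
    bad, i = [], 0
    while i < len(transcription):
        for p in by_length:
            if transcription.startswith(p, i):
                i += len(p)
                break
        else:
            bad.append(transcription[i])
            i += 1
    return bad

print(invalid_symbols("ˈt͡ʃe.val.o"))  # every symbol is in the inventory
print(invalid_symbols("ˈgɑ.to"))       # ASCII "g" and "ɑ" are not listed
```

Note that the list contains the IPA letter "ɡ" (U+0261), so the ASCII "g" is reported as invalid.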

Right-to-left beginning quotation and citation templates

It’s time for these templates {{cite-meta}} or {{quote-web}}, or whatever one now is eager to deploy to catch quotes in markup, to somehow detect the writing direction. If the author names in addition to the title are Hebrew or Arabic or Persian, like perhaps everything else in the reference, the references are not readable at all if one does not add a left-to-right mark after the author parameters. And if one does so, one has to start reading somewhere in the middle because that’s where the author names start now, going to the left, and then on the right follows the title, when what they should do is start on the right side of the text block (even if containing left-to-right text, e.g. taxons). Example 1, Hebrew: at دَيْر (dayr) (does not look too bad because of the Right-to-Left parts being very short but still wrong). Example 2, Arabic: at عُشَر (ʕušar). The journal piece linked in the second example, as a model, by the way shows in its bibliography how Right-to-Left titles appear correctly.
A parameter to set direction might solve it perhaps, and perhaps its default value can be inferred from |lang= or |worklang=, parameters often required anyway. Fay Freak (talk) 21:37, 13 January 2020 (UTC) Also concerns {{sic}}, which looks unnecessarily bad without left-to-right-mark. Fay Freak (talk) 22:37, 28 January 2020 (UTC)[reply]
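A rough Python sketch of the inference proposed here: pick a base direction from the language parameters, let an explicit direction override it, and isolate each field so that one cannot reorder its neighbour. The language set, the precedence of worklang over lang, and the function names are all assumptions; the HTML `dir` attribute and `<bdi>` element are standard, but the quotation templates do not necessarily emit markup like this.

```python
from html import escape

# Hypothetical, incomplete set of right-to-left language codes.
RTL_LANGS = {"ar", "he", "fa"}

def direction_for(lang, worklang=None):
    """Infer a default base direction; worklang wins if given (an assumption)."""
    code = worklang or lang
    return "rtl" if code in RTL_LANGS else "ltr"

def citation_html(author, title, lang, worklang=None, direction=None):
    """Render author and title inside a container with one base direction."""
    base = direction or direction_for(lang, worklang)
    # <bdi> isolates each field, so embedded left-to-right text (e.g. a
    # taxon name) does not disturb the order of the surrounding fields.
    parts = ", ".join(f"<bdi>{escape(p)}</bdi>" for p in (author, title))
    return f'<span dir="{base}">{parts}</span>'

print(citation_html("Author", "Title", "ar"))
```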

Requesting protection for Middle Chinese and Old Chinese pronunciation modules

I would like to request protection for all individual subpages placed under the following modules:

as well as the following pages:

The values in these pages should not be changed or modified unless there are typos or graphical variants, e.g. conversion of 爲 / 为 to 為 / 为.

If someone were to add, modify or remove any of the values, it would cause changes that would affect the output of several templates, and this is not easy to fix if multiple readings are involved, e.g.

{{zh-pron|mc=1 2|oc=1 2;2 3}}
{{ltc-l|id=2}}
{{och-l|id=2}}

It would be better for users to request for changes to be done on a case-by-case basis, like this Nov 2019 Tea room discussion. KevinUp (talk) 09:42, 14 January 2020 (UTC)[reply]

I've protected Module:zh-glyph/phonetic/list and Module:zh-glyph/phonetic. I can protect the rest with a Pywikibot script in about 10:15:49 (36949 protections at one operation per second), and add a notice to the documentation that directs users to propose changes in the Tea Room rather than on the talk pages. — Eru·tuon 22:55, 15 January 2020 (UTC)[reply]
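The time estimate can be checked with a few lines of Python; the helper name `hms` is arbitrary.

```python
def hms(seconds):
    """Format a number of seconds as H:MM:SS."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"

# 36949 protections at one operation per second
print(hms(36949 * 1))  # 10:15:49
```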

@Erutuon: Thanks, please run the script when you have the time. Adding a notice to the documentation page would be good as well. KevinUp (talk) 06:30, 16 January 2020 (UTC)[reply]

Done at last! — Eru·tuon 05:11, 20 January 2020 (UTC)

second attributive form parameter in af-adj

Would it be possible to include a second, optional parameter for an attributive form in {{af-adj}}, and to include a qualifier parameter for this second attributive-form parameter? Some adjectives have two attestable attributive forms, often with one being less common than the other. ~~←₰-→~~ Lingo Bingo Dingo (talk) 15:23, 14 January 2020 (UTC)

WOTD template edit link

Look at e.g. Wiktionary:Word of the day/Recycled pages/January. Each word has a pair of "edit, refresh" links, but clicking on the "edit" link takes you to the editing-view of a nonexistent page with one too many slashes, e.g. Wiktionary:Word_of_the_day//January_1 instead of Wiktionary:Word_of_the_day/January_1. - -sche (discuss) 09:54, 16 January 2020 (UTC)[reply]

Category:Chinese Han characters

Could someone remove ', B, K, W, and 🌶 from Category:Chinese Han characters? They are definitely not Han characters. --OctraBot (talk) 04:04, 17 January 2020 (UTC)

PS I am Octahedron80 in bot mission at the moment.

Just noting that Justinrleung has fixed this (thanks!) by categorizing them by part of speech. - -sche (discuss) 02:54, 18 January 2020 (UTC)[reply]

Font size of headword line of Translingual symbols

I changed clx from using {{mul-symbol}} to {{head|mul|symbol}}, because {{mul-symbol}} was causing the headword to display at an awkwardly large font size, different from the size of the other headword on the page, and different from e.g. mlx (which already used {{head|mul|symbol}}). We should probably either make {{mul-symbol}} not display headwords so large if they consist only of (Latin-script?) letters (and punctuation?), or perhaps make it not display headwords so large at all (but that might cause issues for small symbols). Failing that, we could mass-edit entries like clx to consistently use {{head|mul|symbol}} or, if we actually prefer the embiggened display, then edit them to consistently use {{mul-symbol}} instead of the current inconsistent state of affairs. - -sche (discuss) 19:42, 17 January 2020 (UTC)[reply]

~~I have had browser-specific font-size problems with Latin Translingual characters. What browser do you use?~~ nvm. DCDuring (talk) 20:05, 17 January 2020 (UTC)[reply]

@-sche: {{mul-symbol}} manually sets the script code to Zsym if it hasn't been specified in the |sc= parameter; .Zsym has font-size: 150%; in MediaWiki:Common.css. So currently this can be fixed by setting |sc=Latn; but it would be better if it were automatic. — Eru·tuon 20:14, 17 January 2020 (UTC)[reply]

Is there any way the template could do script detection? It's not good to ignore script formatting/tagging just because the script is used in a symbol. Chuck Entz (talk) 20:29, 17 January 2020 (UTC)[reply]

@Chuck Entz: Sure, automatic script detection would be turned on by deleting the default script code in the template. (I've made the template assign Latin script if the characters are only ASCII letters, which is at least better.) The template would then choose the best script from among the scripts listed for mul in Module:languages/data3/m (the same procedure used for other languages), and end up using Latn for clx. I don't know how satisfactory this would be; the script codes Zsym and Zmth aren't Unicode things, so someone here on Wiktionary had to decide what characters they should include (in Module:scripts/data). — Eru·tuon 20:43, 17 January 2020 (UTC)[reply]

Thanks for fixing the immediate issue, Erutuon. - -sche (discuss) 22:59, 17 January 2020 (UTC)

@Chuck Entz: So, if {{mul-symbol}} used automatic script detection, a script other than None would be assigned in all but 180 of the 896 titles that use the template:

!?
~
€
₦
₪
₥
¢
฿
?
∑
≈
;)
₤
₠
₡
₫
₭
₴
₲
₱
₣
₰
₵
৲
৳
₢
៛
₳
∧
シ
^ ^
ℂ
μΩ
ⅼ
℗
∋
∌
϶
^^
⊽
‒
ℶ
∥
‖
≷
⨌
∟
∠
∢
∶
∡
∷
∹
∮
∰
∯
∱
≃
≄
≌
≑
≞
≶
⊌
⊎
⊍
⋆
∲
∳
≉
≐
≙
≘
⊈
⊉
⊩
⊮
⊬
⊹
⊺
⊿
ʇ
ˀ
ˠ
؋
ꜛ
ꜜ
→
ಠ ಠ
ʼ
႟
႞
⁺
←
Unsupported titles/Number sign
Unsupported titles/Vertical line
¬¬
=^.^=
=3
₶
↓
‌
‍
ʗ
=)
‛
⁻
₊
₋
₌
ᵣ
ˤ
ʖ
↑
῭
΅
ᵊ
₽
⁀
⸺
⸻
᾽
᾿
ʽ
῾
̔
𑄷
𑄶
𑄸
𑄹
𑄺
𑄻
𑄼
𑄽
𑄾
𑄿
¯\ (ツ) /¯
ᴞ
↔
⟿
Ᵽ
̓
Ꞓ
ಥ ಥ
॰
ʾ
ʿ
₾
ˉ
;-)
𑅀
𑅁
𑅂
𑅃
₿
⇵
((( )))
⁐
^.^
Unsupported titles/Greater than low line less than
≗
≖
ᵻ
ᵿ
↚
≊
≂
↛
◻
;(
⟨ ⟩
⑨
‗
˞
ᴝ
⊷
ੴ
ᴟ
￻

This would be fine for the ASCII titles, but some of the others might display better if they have script classes applied to them. — Eru·tuon 00:39, 18 January 2020 (UTC)[reply]

The best of both worlds would be to have the symbol codes used if the script detection fails. Unfortunately, that probably means it would have to be done independently. Is there a way to call a script-detection function, and then pass the appropriate script to {{head}} in order to avoid the latter unnecessarily repeating the script detection? Chuck Entz (talk) 00:57, 18 January 2020 (UTC)[reply]

@Chuck Entz: Yes, the script detection function can be called through Module:scripts/templates. When the |sc= parameter is present, {{head}} won't do script detection for the first headword. That provides a good way to solve this. Now the template uses normal script detection, but, by means of {{mul-symbol/script}}, replaces None with Zsym. That should be an improvement. — Eru·tuon 03:00, 18 January 2020 (UTC)[reply]
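The fallback logic described here can be summarized in a short Python sketch. The detector below is a deliberately crude, hypothetical stand-in (ASCII letters only); the real procedure chooses among the scripts listed for the language, and the function names are invented.

```python
import string

def detect_script(title):
    """Hypothetical stand-in for Wiktionary's script detection.

    Returns "Latn" for titles made only of ASCII letters and "None"
    for everything else.
    """
    if title and all(ch in string.ascii_letters for ch in title):
        return "Latn"
    return "None"

def headword_script(title, sc=None):
    """Pick the script code to pass on as the |sc= parameter."""
    if sc:  # an explicit |sc= always wins
        return sc
    detected = detect_script(title)
    # Fall back to the symbol script only when detection finds nothing.
    return "Zsym" if detected == "None" else detected

print(headword_script("clx"), headword_script("€"))
```

Because the result is passed on explicitly, the headword template does not need to repeat the detection.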

Strangeness with loadData vs. require

@Rua, Erutuon Maybe one of you two can help me figure out some weirdness I observed. I have been gradually adding more places to Module:place/shared-data, and I've been worrying that this will eventually cause some pages like Washington (which has > 30 invocations of {{place}}, each of which loads the module with require()) to run out of memory. The main data table in Module:place/shared-data has handler functions mixed in with data, so I have experimented with moving the subtables that are pure data into Module:place/shared-data/tables. When I did this, however, and replaced the inline subtables with references to Module:place/shared-data/tables, the memory of Washington *increased* from 25MB to 29.some MB. To do this I had to add a call to mw.loadData("Module:place/shared-data/tables"), an extra require("Module:table") and some calls to m_table.shallowcopy() on some data that would originate from Module:place/shared-data/tables, because it gets in-place modified by subpolity_value_handler() and similar functions. However, these changes don't materially affect the memory usage. I've verified that making the single-line change of replacing data = export.US_states, on line 1528 of Module:place/shared-data with data = m_shared_tables.us_states, (i.e. changing the reference to the subtable listing US states from elsewhere in Module:place/shared-data to Module:place/shared-data/tables) increases the memory from 25MB to around 27MB. I can't understand why this would happen; in both the before and after scenarios, Module:place/shared-data/tables is loaded with mw.loadData(); all that's changing is a single pointer. The only thing I can think of is that the metatable functions used to access the table are somehow using up memory, but I can't see why that would be the case. Any thoughts?

BTW what prompted this was I'm planning on adding a bunch of data on major cities to Module:place/shared-data (see Module:User:Benwing2/place/cities, containing the data as I have it so far). I assumed this would significantly increase the memory usage of pages that call {{place}} a lot; but when I tried adding the data to Module:place/shared-data, it only increased the memory usage of Washington by about 100K. This leads me to conclude I don't understand Lua memory usage very well. Benwing2 (talk) 20:54, 19 January 2020 (UTC)[reply]

Well, accessing subtables of a table loaded with mw.loadData has a cost. data = m_shared_tables.us_states, is different from data = export.US_states,, if export.US_states is a plain table defined elsewhere in the same module, and m_shared_tables has been loaded with mw.loadData and has not been indexed by "us_states" before: it calls an __index metamethod of the table returned by mw.loadData, which in turn wraps the underlying data with the dataWrapper function, which generates new tables and functions each time. And if another function accesses fields of m_shared_tables.us_states, or iterates over it with pairs or ipairs, the __index metamethod is called there as well. The wrapped tables are unique per call to mw.loadData; that is, if you save the output of mw.loadData and index it twice, it returns the same value; but if you load a module with mw.loadData and index it twice, it generates two different values. (See this demonstration.) So the more subtables accessed, the more memory used. The maximum memory would probably be used if the table were iterated over recursively.

There are 34 invocations of {{place}} in Washington so that's at least 34 * 3 calls to dataWrapper: for the top-level table of Module:place/shared-data/tables, the US states table, and the "Washington" subtable, in each invocation. I don't know if that fully explains the approximately 4 MB jump (dividing 4 MB by 34 * 3 equals almost 40 KB per call to dataWrapper, which seems like a lot), but it makes it somewhat plausible. mw.loadData avoids re-executing a module, but it only saves memory if the memory cost of wrapping each table accessed through dataWrapper is less than the memory cost of re-generating the whole module table – apparently not in this case.

Oh, there's also the overhead of validating data each time mw.loadData is called on a new module, probably only for the first invocation of a module on a page. — Eru·tuon 22:18, 19 January 2020 (UTC)[reply]
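As a loose analogy only (Python, not Scribunto), the following sketch mimics the behaviour described above: each call to the loader returns a fresh read-only view, and each view lazily builds and caches wrappers for the subtables it is asked for. The class and function names and the sample data are invented, and the real `dataWrapper` differs in detail.

```python
# Invented sample data standing in for a data module's table.
_MODULE_DATA = {"us_states": {"Washington": {"capital": "Olympia"}}}

class ReadOnlyView:
    """Read-only wrapper around a dict, created lazily per subtable."""

    def __init__(self, data):
        self._data = data
        self._children = {}  # wrappers already built by *this* view

    def __getitem__(self, key):
        value = self._data[key]
        if not isinstance(value, dict):
            return value
        if key not in self._children:
            self._children[key] = ReadOnlyView(value)  # new allocation
        return self._children[key]

def load_data():
    """Return a fresh view over the shared data, loosely like mw.loadData."""
    return ReadOnlyView(_MODULE_DATA)

saved = load_data()
print(saved["us_states"] is saved["us_states"])              # same wrapper reused
print(load_data()["us_states"] is load_data()["us_states"])  # two separate wrappers
```

The underlying data is shared, but every new view pays again for the wrappers of whatever subtables it touches, which is the cost being discussed.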

Thank you for your detailed analysis! I will have to go over it carefully to understand exactly what you're saying, but I see that accessing a subtable does indeed create garbage. It looks like I'll have to ditch the loadData() approach. Instead I might want to put the city data in another module if it starts to cause problems; it will only need to be loaded if someone references a holonym of type "city", which mostly only occurs with neighborhoods, suburbs and the like. OTOH this whole concern may be overblown; Mount Pleasant has 60 invocations of {{place}} and only uses 10MB. Benwing2 (talk) 23:12, 19 January 2020 (UTC)[reply]

Esperanto ordinal numbers

Instead of hardcoding some ordinal numbers in Module:eo-headword/exceptions, can somebody edit the function "getPOS" in Module:eo-headword, so that it recognizes all ordinal numbers as adjectives? Here are two regular expressions to recognize an Esperanto ordinal number: ^[0-9]{1,}-?a$ (return "adjectives") and ^[0-9]{1,}-?aj?n?$ (return "adjective forms"). Robin van der Vliet (talk) (contribs) 18:24, 22 January 2020 (UTC)[reply]

The following piece of code should fix it, it should be added on line 50 of Module:eo-headword. Can someone with the proper rights add it and test it?

	-- deal with ordinals
	if mw.ustring.match(word,"^[0-9]{1,}-?a$") then
		return "adjectives"
	elseif mw.ustring.match(word,"^[0-9]{1,}-?aj?n?$") then
		return "adjective forms"
	end

Robin van der Vliet (talk) (contribs) 12:30, 23 January 2020 (UTC)[reply]

@Robin van der Vliet: Done. I've converted the regex to a valid Lua pattern. — Eru·tuon 19:25, 24 January 2020 (UTC)
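For reference, the two regular expressions behave as follows when checked in Python (a sketch for testing the logic, not the module's code; the function name `ordinal_pos` is invented). Lua patterns have no `{1,}` quantifier and treat an unescaped `-` as a quantifier, so an equivalent Lua pattern would presumably look something like `^[0-9]+%-?a$`; the exact pattern used in the module is not quoted in this thread.

```python
import re

ADJECTIVE = re.compile(r"^[0-9]+-?a$")
ADJECTIVE_FORM = re.compile(r"^[0-9]+-?aj?n?$")

def ordinal_pos(word):
    """Return the category for an ordinal written with digits, else None."""
    # Order matters: the second pattern also matches the bare "-a" form.
    if ADJECTIVE.match(word):
        return "adjectives"
    if ADJECTIVE_FORM.match(word):
        return "adjective forms"
    return None

for w in ["1a", "2-a", "3-ajn", "10an", "1b"]:
    print(w, ordinal_pos(w))
```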

Module:etymology languages/data and Chinese dialects

Would adding a detailed Sinitic family tree to Module:etymology languages/data be allowed and or useful?

For example:

Mandarin (cmn)
- Beijing Mandarin (?) (family, 北京官話 (Běijīng guānhuà))
  - Jing-shi 片 (京師片)
    - Beijing Mandarin (?) (of the city of Beijing, 北京話 (Běijīnghuà))
  - Standard Chinese (?)
    - Putonghua (?)
    - Guoyu (?)
      (cf. the infobox at 中華民國國語 (“ROC Guoyu”))

or:

Teochew (zhx-teo)
- Shantou Teochew (?) (of the city of Shantou, 汕頭話 (Shàntóuhuà))

Similar data is already at Module:zh/data/dial, which belongs to {{zh-dial}}.

I previously did something similar specifically for Category:Philippine Hokkien; see the family tree at Category:Min Nan language.

—Suzukaze-c ◇◇ 10:04, 25 January 2020 (UTC)[reply]

Can we get "Who Wrote That" to work here?

See MW page on new tool in beta. It is a FF and Chrome browser extension for now and works for some pedias. I have tried it and it looks handy for patrolling and entry review. But it seems to require the WhoColor API extension. I don't know whether WhoColor API is specific to WP vs. other wikis. DCDuring (talk) 17:55, 25 January 2020 (UTC)[reply]

Lua overflows

When browsing through the 一 lemma - a very common character in East Asian languages with many possible languages - I noticed that the bottom was full of messages saying: "Lua error: not enough memory." Since the error appears consistently at the end of the page, and only there, it was clear to my layman's mind that there are simply more modules than some part of the system (Lua itself? The Wikimedia software? My browser?) can cope with; something I verified successfully by previewing only a section containing otherwise broken modules. I see the issue has already been raised above. Clearly this problem must be fixed one way or another, so I will be more than happy to report this problem here.

Should the problem be too fundamental to solve, then we must seriously consider simplifying some templates and using plain wiki syntax rather than Lua. Steinbach (talk) 19:28, 28 January 2020 (UTC)[reply]

This is not new, and has been much discussed for the past two to three years at least. See also [1]. Canonicalization (talk) 21:09, 28 January 2020 (UTC)[reply]

Bad categories like Category:Chinese data modulescmn-hom

@Erutuon Recently, a bunch of bad categories like Category:Chinese data modulescmn-hom, Category:Chinese data modulesglosses, Category:Chinese data moduleshak-pron/00, etc. have appeared in Special:WantedCategories. Each one is populated by a single module, e.g. Module:zh/data/cmn-hom for Category:Chinese data modulescmn-hom. I don't see any code in these modules or module documentation pages that adds these categories, so I'm baffled as to how these categories are getting added. Is this somehow a bug in the MediaWiki software itself? Benwing2 (talk) 06:26, 29 January 2020 (UTC)[reply]

@Benwing2: Oh, sorry, this was caused by my recent edits to Module:documentation. It is invoked by {{documentation}} on MediaWiki:Scribunto-doc-page-show, and automatically adds categories or documentation or both to certain modules using the module_regex table. I'd messed up the regex and cat fields for some of the Chinese data modules. I've fixed the problem and added documentation about the cause. Thanks for catching this. — Eru·tuon 08:19, 29 January 2020 (UTC)[reply]

@Erutuon Thanks for taking care of it. Benwing2 (talk) 01:34, 30 January 2020 (UTC)

Search engine for Template:IPA on Toolforge

I've made a search engine for {{IPA}} at Templatehoard/IPA, written in Rust. This has been my goal for a while. Searching is very fast and User:Surjection and I have already used the tool to find some entries that needed cleanup. For instance, I searched for instances of y in English transcriptions and corrected the ones in which j was intended.

The search engine uses CBOR stream template dumps generated from the latest dump, which comes out after the 1st and 20th of each month. The template dumps might be useful for bot owners; searching them is many times faster than using Pywikibot or another tool to download pages from the MediaWiki API and parse them, or even than using the XML dumps. If anybody wants it, I can generate dump files in JSONL format as well, because that is more widely supported.
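To give bot owners an idea of how a JSONL version of such a dump might be searched, here is a minimal Python sketch. The record layout (a `title` plus a list of templates with `name` and `parameters`) and the parameter positions for {{IPA}} are assumptions made for illustration; the actual dump format is not described in this thread.

```python
import json

def find_ipa(lines, lang, needle):
    """Yield (title, transcription) for {{IPA}} calls in `lang` containing `needle`."""
    for line in lines:
        page = json.loads(line)
        for tpl in page["templates"]:
            params = tpl["parameters"]
            if tpl["name"] != "IPA" or params.get("1") != lang:
                continue
            # Assumed layout: parameter 1 is the language code, and the
            # remaining numbered parameters are transcriptions.
            for key, value in params.items():
                if key.isdigit() and key != "1" and needle in value:
                    yield page["title"], value

# Two invented records in the assumed format, one JSON object per line.
dump = [
    json.dumps({"title": "yes", "templates": [
        {"name": "IPA", "parameters": {"1": "en", "2": "/yɛs/"}}]}),
    json.dumps({"title": "no", "templates": [
        {"name": "IPA", "parameters": {"1": "en", "2": "/nəʊ/"}}]}),
]
print(list(find_ipa(dump, "en", "y")))
```

Reading one record per line keeps memory use flat, which is part of the appeal of a line-oriented format.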

User:Jberkel's wanted entry lists are currently generated from these template dumps.

I'm thinking of making a link template search engine, and perhaps a general template search engine. Link templates are fairly straightforward because there are definite "slots" that are used by many templates ("term", "alt", "id", "tr", "g"), but I'm not really sure how to make a search engine that would work for all templates because our templates have so many different configurations of parameters. — Eru·tuon 09:16, 29 January 2020 (UTC)[reply]

Neat! Only downside is that the autogenerated transcriptions ({{it-IPA}} etc.) are not indexed. But it's very helpful as a starting point to find manual transcriptions which can be converted into automatic ones. – Jberkel 09:54, 29 January 2020 (UTC)[reply]

Template:quote-wikipedia

Haven't checked, but I received this message: "there seems to be something wrong with the above-mentioned template. The first parameter for non-English quotations is not working. (see Vorabdruck) Could you please take a look at it and maybe fix it or, if not, ping someone else who can fix it. — Thanks in advance, Caligari ƆɐƀïиϠႵ 18:53, 30 January 2020 (UTC)" Equinox ◑ 21:16, 30 January 2020 (UTC)[reply]

Template: SI-unit

The template does not work correctly for liters/litres. Here's what it returns for deciliter/decilitre:

(metrology) An SI unit of volume equal to 10⁻¹ liters. Symbol: d
(metrology) An SI unit of volume equal to 10⁻¹ litres. Symbol: d

Should return "dl" if I'm not completely ignorant. Seems to work fine for other units, meter for example:

(metrology) An SI unit of length equal to 10⁻¹ meters. Symbol: dm

--Hekaheka (talk) 08:53, 31 January 2020 (UTC)[reply]
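The expected behaviour can be stated as a tiny Python sketch: the symbol is the prefix symbol followed by the unit symbol. The lookup tables and the function name `si_symbol` are illustrative only and say nothing about how the template is actually implemented or why the unit symbol goes missing.

```python
# Prefix name -> (symbol, power of ten); a small illustrative subset.
PREFIXES = {"deci": ("d", -1), "centi": ("c", -2), "milli": ("m", -3), "kilo": ("k", 3)}

# Unit name -> symbol, covering both spellings of each unit.
UNIT_SYMBOLS = {"meter": "m", "metre": "m", "liter": "l", "litre": "l"}

def si_symbol(prefix, unit):
    """Concatenate the prefix symbol and the unit symbol."""
    prefix_symbol, _exponent = PREFIXES[prefix]
    return prefix_symbol + UNIT_SYMBOLS[unit]

print(si_symbol("deci", "liter"), si_symbol("deci", "litre"), si_symbol("deci", "meter"))
```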

Commas in mnc-IPA

I feel like the backend module for {{mnc-IPA}} has to be fixed because of what happens if the template were added to the Manchu entry ᠵᠠᡵᡭᡡ (jarh'ū). --Apisite (talk) 10:40, 31 January 2020 (UTC)[reply]

It's happening at ᡬᠠᠨ (g'an) as well. For some reason, a stray apostrophe is being added to the IPA transcription. It doesn't seem to be representing either stress or an ejective consonant, so I don't know what it's doing, or how to fix Module:mnc-IPA. The module was created by Wyang, who is unfortunately no longer around. —Mahāgaja · talk 08:00, 1 February 2020 (UTC)[reply]

{{mnc-IPA}} ultimately (via the toIPA function in Module:mnc-IPA) works off the transliteration generated by Module:mnc-translit. I notice that some of the Manchu letters transliterated with a letter and apostrophe are said to be used in foreign words; maybe there's some phonetic difference between the apostrophed and un-apostrophed letters or maybe the apostrophe should just be removed by Module:mnc-IPA. — Eru·tuon 08:14, 1 February 2020 (UTC)[reply]

@Erutuon: I suppose it makes sense for different letters to be transliterated differently, but as RcAlex36 says below, they're pronounced the same. In this edit, Wyang made the module ignore the characters U+180B, U+180C, and U+180D; can something similar be done to make it ignore U+0027? —Mahāgaja · talk 08:38, 1 February 2020 (UTC)

@Mahagaja: Yep; done. — Eru·tuon 08:46, 1 February 2020 (UTC)
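In Python terms, ignoring those code points before the conversion to IPA amounts to something like the sketch below. It is not the module's Lua code, and the function name `strip_ignored` is invented.

```python
# Mongolian free variation selectors U+180B to U+180D, plus the
# apostrophe U+0027 that the transliteration adds to some letters.
IGNORED = dict.fromkeys([0x180B, 0x180C, 0x180D, 0x0027])

def strip_ignored(text):
    """Delete the ignored code points from a transliteration."""
    return text.translate(IGNORED)

print(strip_ignored("g'an"))  # the apostrophe no longer reaches the IPA step
```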

@RcAlex36, do you have any thoughts? It started happening at ᡬᠠᠨ (g'an) when you moved it to that spelling from the previous ᡤ᠋ᠠᠨ (gan). —Mahāgaja · talk 08:03, 1 February 2020 (UTC)[reply]

@Mahagaja The consonant is indeed pronounced as [k], at least according to Manchu alphabet. I don't know how to fix the error though. RcAlex36 (talk) 08:21, 1 February 2020 (UTC)[reply]

Streamline

I'm trying to test a new script and a potential future gadget that converts some pre-definition (according to EL) headings into collapsible boxes to make the UI less cluttered and give the main focus to definitions. The script is available at User:Surjection/streamline.js, and also supports converting post-EL headings if explicitly enabled. Any feedback is welcome, particularly considerations on whether this should be converted into a fully functional gadget that would be available under the preferences. — sur jec tion ⟨?⟩ 16:14, 31 January 2020 (UTC)[reply]

Module Errors and Module:sandbox

Is there a reason we have this categorize module errors in Category:Pages with module errors instead of Category:Pages with module errors/hidden? Chuck Entz (talk) 23:44, 31 January 2020 (UTC)[reply]

@Chuck Entz: I've assigned Module:sandbox to Category:Pages with module errors/hidden because it's a sandbox and user sandbox modules (Module:User:...) with errors are already put there. But I think "production" modules should show up in Category:Pages with module errors so that people notice as quickly as possible, because errors in them often indicate that there will be errors in entries and category pages. — Eru·tuon 21:35, 5 February 2020 (UTC)[reply]