Talk:CJK Unified Ideographs
This article is rated C-class on Wikipedia's content assessment scale.
CJK Extension F
Recently, there have been removals and insertions regarding Extension F that caused discussion. Below, I copy the content of the discussion from my talk page (edited to clarify who wrote each post). Edit history: removal by BabelStone for CRYSTALBALL, new text by Johnkn63, removal by me, DePiep.
This is the text from my talk page, quoted:[1]
CJK Extension F
J63: I was wondering why you undid my edit regarding Ext F on CJK characters. The statement made was non-speculative; it is not crystal ball, it is a fact that the IRG has called for preliminary proposals for their next meeting. http://appsrv.cse.cuhk.edu.hk/~irg/irg/irg37/IRGN1810Resolutions.doc . The call itself is a past event. Furthermore, this is significant because the last time a call was made for any significant number of glyphs was a decade ago. Johnkn63 (talk) 08:41, 8 February 2012 (UTC)
- DP: What you wrote is that there will be a meeting in the future, for which IRG members are asked for proposals for possible extensions. So what do we have now? Nothing substantial.
- J63: This is substantial: it means that work on Ext F is starting. Check the IRG Principles and Procedures document. Under item 3.1, the step is for the IRG to call for submissions to an extension. Johnkn63 (talk) 05:46, 10 February 2012 (UTC)
- DP: What substance (say Ext F related nouns or facts for starters) do you get from an agendized meeting? And really what of that should be in WP?
- DP: Your addition was not sourced. Actually, before deleting I searched for any IRG#38 paper to see what would be in there, but could not find one. The link you provide here (which is #37) mentions #38 as a scheduled meeting, when amongst other things "IRG sets the CJK_E submission (to WG2) date". That is E, not F. So this is all in process at best, but really F is just mentioned to get in process later on. Not a single fact about Extension F is produced yet. F is not near the beginning of the pipeline.
- J63: I stated clearly the resolution number 'IRG resolution M37.12'; the very first Google result for this is the correct online Word document, IRGN1810Resolutions.doc. Doing a find for M37.12 immediately takes one to the correct place. I could, I suppose, have added in IRGN1810Resolutions.doc, but from what you say you found the document but did not read as far as resolution 12. Johnkn63 (talk) 05:46, 10 February 2012 (UTC)
- DP: Sourcing does not mean "you go look for yourself with some vague hints". That I did so was a gesture from me; you cannot claim I did wrong. And then, even today there is no source except for the agenda you point at. Were there usable content for Ext F, I could have found it. I did not find it, because it does not exist.
- DP: Over at Unicode [2], Extension F is not mentioned, not even with the biggest reservation.
- DP: If the process (for E and F) itself is notable in the encyclopedia, it may be described on the Ideographic Rapporteur Group page. After all it is their workings. -DePiep (talk) 09:17, 8 February 2012 (UTC)
- J63: There is a large section on the page about Ext E; you have left that untouched.
- DP: WP:OTHERSTUFFEXISTS. Is it sourced? Has it substance? Are there facts? Is it in the Unicode pipeline?
- J63: An earlier comment about Ext F which was speculative was removed; I then put in a non-speculative comment on Ext F. The primary task of the IRG is to prepare the CJK extensions. It is the IRG that does the unification of [CJK Unified Ideographs], and processing an extension usually takes them about five years, so reporting when these stages are reached is reasonable. Since after being processed by the IRG it goes to WG2 for a year or two and then on to SC2, documenting this process under CJK Unified Ideographs makes sense. It is only once things get to SC2 that they can not be changed. Johnkn63 (talk) 05:46, 10 February 2012 (UTC)
- DP: Less speculative, maybe. For now, all we have is an agenda, no substance and no Unicode mentioning.
- DP: As I wrote, this process could be described in IRG. Or maybe on Han unification. Or in page Unicode pipeline. And I repeat that, to me, the pipeline processes can very well be a page or section in WP, especially the Han unifying process pipeline. The process is encyclopedic and interesting, but these current results are not in Unicode. -DePiep (talk) 08:39, 10 February 2012 (UTC)
- BabelStone: I agree that Wikipedia should not be describing stuff that has not happened yet. As an encyclopedia we should describe what Unicode is now, not what it may include in the future. Personally I would even remove the CJK-E stuff from the CJK Unified Ideographs page, although I would support adding brief, sourced information about the current state of CJK-E and CJK-F to the Ideographic Rapporteur Group page. And really this discussion should be taking place on the article talk page, not on a user talk page. BabelStone (talk) 13:50, 10 February 2012 (UTC)
- My only strong feeling is that as they stand both CJK-E and CJK-F are in the pipeline, albeit near rather different ends; therefore any comments should be in the same place, and these comments should be brief, factual and stable. Brief is relative to an article's length; in a long article brief may be a sentence or two. Being factual often requires dates, and these should be phrased in such a way that they do not become inaccurate over time. 'At IRG #nn in yyyy submissions for Ext X with a total of ... characters were accepted' is something that is certainly factual and stable, whereas 'The project Ext X currently has ..... characters' is something which, whilst 'factual' at the time of writing, is not stable and once it changes becomes inaccurate. What is brief really depends on the length and scope of an article. As it stands the Ideographic Rapporteur Group page is a stub; a lot more work would be required there. In the past the inclusion of the current Ext E has required a lot of editing because comments were not in a stable form: it was earlier called Ext D, but then a new small block called Ext E was added and the page required changing. Before that the present Ext E characters were part of Ext C, though what this page was like back then I do not know. Johnkn63 (talk) 23:31, 10 February 2012 (UTC)
- As this discussion shows there is an issue to be addressed. And there has been a lot of progress on that, which is good. I hope we all continue to edit and do so increasingly well. Johnkn63 (talk) 23:31, 10 February 2012 (UTC)
- Regarding Ext E: whilst the content is somewhat better than the Ext F material removed by BabelStone, this is accidental; it is not written in a way that ensures that the content is now correct, or even easy to verify. For example, the list of sources comes from IRGN1266, which is from 2006 and lists the sources used then; a large number of characters have been removed since then. In some cases all of the characters for a source may well have been removed, which, from memory, I think is the case with several of the Korean sources. The Ext E material therefore has the same problem as the Ext F material removed by BabelStone, namely that much of it is speculative. The non-speculative parts belong in an article that talks about the background and history of such CJK blocks. I could live with a brief one- or two-sentence statement about Ext E on this page, or elsewhere, and similarly a sentence on Ext F. If not on this page, then where such information can be found should be stated on this page. In short, the Ext E sources must go. Ext E and Ext F: brief, stable comments only. Johnkn63 (talk) 00:25, 11 February 2012 (UTC)
Proper name?
Shouldn't it be just "CJK unified ideographs"? How is it a proper name? -DePiep (talk) 23:52, 6 December 2012 (UTC)
- The Unicode Standard capitalises the name. This is further seen in programs such as Microsoft Word (Insert -> Symbol -> (Asian text) -> CJK Unified Ideographs). -- 李博杰 | —Talk contribs email 00:57, 7 December 2012 (UTC)
Unicode 15.1 numbers
I analyzed the 24 January 2024 edit by @BabelStone: and it appears it mistakenly includes characters from the CJK Compatibility Ideographs Supplement block. I think these aren't "Unified" characters and therefore shouldn't be included in these numbers. I also think the total verbiage should read "The total number of characters (224,286) far exceeds the number of encoded CJK unified ideographs (97,680) as many characters have more than one source." I wanted to double-check before I do a revert in case I'm misunderstanding something. Thanks. DRMcCreedy (talk) 00:30, 6 August 2024 (UTC)
- Thanks for checking, yes I inadvertently included characters from the CJK Compatibility Ideographs Supplement block. I confirm that 224,286 is the correct figure. BabelStone (talk) 08:38, 6 August 2024 (UTC)
- No worries. Fixed. DRMcCreedy (talk) 15:29, 6 August 2024 (UTC)
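For anyone wanting to re-check the 97,680 figure quoted above, here is a minimal sketch that sums the unified ideograph ranges while leaving out the CJK Compatibility Ideographs Supplement block. The code point ranges and the list of twelve unified ideographs inside the CJK Compatibility Ideographs block are not given in this thread; they are assumptions taken from the Unicode 15.1 block assignments and should be verified against the Unicode Character Database. The names `UNIFIED_RANGES`, `UNIFIED_IN_COMPAT_BLOCK`, `is_unified` and `count_unified` are hypothetical. It does not attempt to reproduce the 224,286 source-reference total, which depends on the Unihan source data.

```python
# Assumed Unicode 15.1 ranges (first, last assigned code point) for
# CJK unified ideographs. Not stated in the thread; verify against the UCD.
UNIFIED_RANGES = {
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),
    "Extension A": (0x3400, 0x4DBF),
    "Extension B": (0x20000, 0x2A6DF),
    "Extension C": (0x2A700, 0x2B739),
    "Extension D": (0x2B740, 0x2B81D),
    "Extension E": (0x2B820, 0x2CEA1),
    "Extension F": (0x2CEB0, 0x2EBE0),
    "Extension I": (0x2EBF0, 0x2EE5D),
    "Extension G": (0x30000, 0x3134A),
    "Extension H": (0x31350, 0x323AF),
}

# Assumption: the twelve code points in the CJK Compatibility Ideographs
# block (U+F900..U+FAFF) that are treated as unified ideographs.
UNIFIED_IN_COMPAT_BLOCK = frozenset({
    0xFA0E, 0xFA0F, 0xFA11, 0xFA13, 0xFA14, 0xFA1F,
    0xFA21, 0xFA23, 0xFA24, 0xFA27, 0xFA28, 0xFA29,
})


def is_unified(cp: int) -> bool:
    """Return True if cp falls in one of the assumed unified ranges."""
    if cp in UNIFIED_IN_COMPAT_BLOCK:
        return True
    return any(first <= cp <= last for first, last in UNIFIED_RANGES.values())


def count_unified() -> int:
    """Sum the sizes of the assumed ranges plus the twelve special cases."""
    in_ranges = sum(last - first + 1 for first, last in UNIFIED_RANGES.values())
    return in_ranges + len(UNIFIED_IN_COMPAT_BLOCK)


if __name__ == "__main__":
    print(f"Encoded CJK unified ideographs: {count_unified():,}")
    # The Compatibility Ideographs Supplement (from U+2F800) is excluded.
    print("U+2F800 counted as unified:", is_unified(0x2F800))
```

Under these assumptions the sum should agree with the 97,680 figure; a mismatch would suggest that either a range above or the edit being checked includes characters from a compatibility block.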