TODO suggestion: Unicode codepoints above U+FFFF

A.J.Mechelynck
I suggest adding a "todo" item to make gvim display Unicode codepoints above
U+FFFF as other than a question mark. Probably not with high priority (I
guess 3 to 5 would be adequate, unless CJK users prefer something higher).

Rationale: There are already some printable characters assigned to these
codepoints (including for instance some rare CJK characters). The way I see
it, sooner or later someone is going to make fonts for them (if it's not
already done), and sooner or later someone is going to use gvim to edit
files containing them. It can already be done, but it's not practical: gvim
displays only a question mark, and "ga" is required to ascertain which
character is actually there.
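The file data is not the problem here; only the glyph is missing. For example, U+20000, the first ideograph of CJK Unified Ideographs Extension B, is stored in UTF-8 as four bytes. The sketch below encodes a code point by hand and checks the result against Python's built-in codec. It is written in Python for brevity, and `utf8_encode` is a hypothetical helper for illustration, not part of Vim.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value (U+0000..U+10FFFF, no surrogates) as UTF-8."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:
        return bytes([cp])  # ASCII: one byte
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    # Above U+FFFF: four bytes carrying 3 + 6 + 6 + 6 = 21 payload bits.
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


if __name__ == "__main__":
    cp = 0x20000  # first character of CJK Extension B
    data = utf8_encode(cp)
    assert data == chr(cp).encode("utf-8")  # agrees with Python's own codec
    print(f"U+{cp:X}: {cp} decimal, bytes {data.hex(' ')}")
    # U+20000: 131072 decimal, bytes f0 a0 80 80
```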

(I searched todo.txt dated 2005 Jul 25 for 7.00aa using the command
"/unicode\|utf" with ":set ignorecase smartcase" and got no relevant hits.)


Best regards,
Tony



Re: TODO suggestion: Unicode codepoints above U+FFFF

Bram Moolenaar

Tony Mechelynck wrote:

> I suggest adding a "todo" item to make gvim display Unicode codepoints
> above U+FFFF as other than a question mark. Probably not with high
> priority (I guess 3 to 5 would be adequate, unless CJK users prefer
> something higher).
>
> Rationale: There are already some printable characters assigned to
> these codepoints (including for instance some rare CJK characters).
> The way I see it, sooner or later someone is going to make fonts for
> them (if it's not already done), and sooner or later someone is going
> to use gvim to edit files containing them. It can already be done, but
> it's not practical: gvim displays only a question mark, and "ga" is
> required to ascertain which character is actually there.
>
> (I searched todo.txt dated 2005 Jul 25 for 7.00aa using the command
> "/unicode\|utf" with ":set ignorecase smartcase" and got no relevant
> hits.)

Most of Vim can handle Unicode characters above 0xffff.  The code
already recognizes characters 0x20000 to 0x2fffd as double-width.  It's
the displaying code that has some trouble.  Esp. for Win32, since it
uses UTF-16, which is very clumsy.
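Both of Bram's points can be shown in a few lines. The sketch below (Python, illustrative only) treats just the range he mentions, 0x20000 to 0x2fffd, as double-width. Vim's real width table covers many more ranges, so `is_wide_supplementary` is a hypothetical simplification, not Vim's function. The sketch also shows why UTF-16 is clumsy: any code point above U+FFFF has to be split into a surrogate pair, so one character becomes two 16-bit units.

```python
from typing import List


def is_wide_supplementary(cp: int) -> bool:
    """Hypothetical simplification: only 0x20000-0x2FFFD counts as double-width."""
    return 0x20000 <= cp <= 0x2FFFD


def to_utf16_units(cp: int) -> List[int]:
    """Return the UTF-16 code units for one code point."""
    if cp < 0x10000:
        return [cp]  # BMP: a single 16-bit unit
    v = cp - 0x10000             # 20 bits remain
    high = 0xD800 | (v >> 10)    # high (lead) surrogate: top 10 bits
    low = 0xDC00 | (v & 0x3FF)   # low (trail) surrogate: bottom 10 bits
    return [high, low]


def from_utf16_units(units: List[int]) -> int:
    """Rebuild a code point from one unit or a valid surrogate pair (no validation)."""
    if len(units) == 1:
        return units[0]
    high, low = units
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    cp = 0x20000
    units = to_utf16_units(cp)
    print(is_wide_supplementary(cp), [hex(u) for u in units])
    # True ['0xd840', '0xdc00']
    assert from_utf16_units(units) == cp
```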

The display code can be adjusted as soon as there is a font to try out
if it actually works.  It should already work for GTK 2 without any
changes.

I would rather see Microsoft support UTF-8, but that probably won't
happen...

--
SOLDIER: Where did you get the coconuts?
ARTHUR:  Through ... We found them.
SOLDIER: Found them?  In Mercea.  The coconut's tropical!
                 "Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD

 /// Bram Moolenaar -- [hidden email] -- http://www.Moolenaar.net   \\\
///        Sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\              Project leader for A-A-P -- http://www.A-A-P.org        ///
 \\\     Buy LOTR 3 and help AIDS victims -- http://ICCF.nl/lotr.html   ///

Re: TODO suggestion: Unicode codepoints above U+FFFF

Mike Williams
Bram Moolenaar did utter on 28/07/2005 10:53:

> Tony Mechelynck wrote:
> [...]
>
> Most of Vim can handle Unicode characters above 0xffff.  The code
> already recognizes characters 0x20000 to 0x2fffd as double-width.  It's
> the displaying code that has some trouble.  Esp. for Win32, since it
> uses UTF-16, which is very clumsy.
>
> The display code can be adjusted as soon as there is a font to try out
> if it actually works.  It should already work for GTK 2 without any
> changes.
>
> I would rather see Microsoft support UTF-8, but that probably won't
> happen...

Has anyone tested with a UTF-16 surrogate pair?  If so, and it failed,
you will most likely have to start rootling around in the font file to
find alternate encoding maps, extract the glyph ID and render using that
rather than the encoded character.  TrueType and/or OpenType fonts can
support encodings beyond the BMP.
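Mike's suggestion amounts to reading the font's character map directly. In TrueType/OpenType fonts, a 'cmap' subtable in format 12 (segmented coverage) maps full 32-bit code points to glyph IDs, so it can cover characters beyond the BMP. Here is a minimal Python sketch. It assumes the format 12 subtable bytes have already been located in the font file (walking the 'cmap' header to find them is omitted). It parses the groups and looks up a glyph ID. `build_cmap_format12` is a hypothetical test helper, and the groups and glyph numbers are made up.

```python
import struct
from typing import List, Optional, Tuple

Group = Tuple[int, int, int]  # (startCharCode, endCharCode, startGlyphID)


def parse_cmap_format12(data: bytes) -> List[Group]:
    """Parse a big-endian 'cmap' format 12 subtable into its sequential map groups."""
    # Header: format (u16), reserved (u16), length (u32), language (u32), numGroups (u32)
    fmt, _reserved, length, _language, num_groups = struct.unpack_from(">HHIII", data, 0)
    if fmt != 12:
        raise ValueError(f"expected format 12, got {fmt}")
    if length > len(data) or 16 + 12 * num_groups > length:
        raise ValueError("truncated subtable")
    # Each group is three u32 values, starting right after the 16-byte header.
    return [struct.unpack_from(">III", data, 16 + 12 * i) for i in range(num_groups)]


def glyph_id(groups: List[Group], cp: int) -> Optional[int]:
    """Return the glyph ID for cp, or None if no group covers it."""
    # Groups are sorted by startCharCode, so a binary search would also work.
    for start, end, start_gid in groups:
        if start <= cp <= end:
            return start_gid + (cp - start)
    return None


def build_cmap_format12(groups: List[Group]) -> bytes:
    """Hypothetical helper: build a synthetic format 12 subtable for testing."""
    body = b"".join(struct.pack(">III", *g) for g in groups)
    return struct.pack(">HHIII", 12, 0, 16 + len(body), 0, len(groups)) + body


if __name__ == "__main__":
    table = build_cmap_format12([(0x41, 0x5A, 36), (0x20000, 0x2A6DF, 5000)])
    groups = parse_cmap_format12(table)
    print(glyph_id(groups, 0x20005))  # 5005
    print(glyph_id(groups, 0x1F600))  # None
```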

TTFN

Mike
--
No matter how far you've gone down the wrong road, turn back.

Re: TODO suggestion: Unicode codepoints above U+FFFF

A.J.Mechelynck
----- Original Message -----
From: "Bram Moolenaar" <[hidden email]>
To: "Tony Mechelynck" <[hidden email]>
Cc: <[hidden email]>
Sent: Thursday, July 28, 2005 11:53 AM
Subject: Re: TODO suggestion: Unicode codepoints above U+FFFF
[...]

> Most of Vim can handle Unicode characters above 0xffff.  The code
> already recognizes characters 0x20000 to 0x2fffd as double-width.  It's
> the displaying code that has some trouble.  Esp. for Win32, since it
> uses UTF-16, which is very clumsy.
>
> The display code can be adjusted as soon as there is a font to try out
> if it actually works.  It should already work for GTK 2 without any
> changes.

In the absence of a font, how hard would it be to display (without the
quotes) "<123456>" rather than the present "?" -- or maybe an option similar
to 'isprint' would be in order to force <hex> display of user-defined
character ranges (for which the user knows that he hasn't got proper
glyphs)? 'isnoprint' maybe, which would apply only to multibyte character
ranges above what 'isprint' handles?
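Tony's proposal could work roughly as follows. This is a minimal Python sketch for illustration only. 'isnoprint' is only proposed here and does not exist, and `render_line` is a hypothetical name. Every character in a user-listed range is shown as its code point in hex between angle brackets, instead of "?". Lowercase hex is an arbitrary choice in this sketch.

```python
from typing import Iterable, List, Tuple

Range = Tuple[int, int]  # inclusive code point range


def render_line(text: str, noprint: Iterable[Range]) -> str:
    """Replace characters in the given ranges with a <hex> placeholder.

    'noprint' stands in for the hypothetical 'isnoprint' option: ranges the
    user knows have no glyph in the current font.
    """
    ranges: List[Range] = list(noprint)
    out: List[str] = []
    for ch in text:
        cp = ord(ch)
        if any(lo <= cp <= hi for lo, hi in ranges):
            out.append(f"<{cp:x}>")  # e.g. "<20000>" instead of "?"
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical setting: CJK Extension B has no glyphs in the current font.
    ext_b = [(0x20000, 0x2A6DF)]
    print(render_line("a\U00020000b", ext_b))  # a<20000>b
```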

Anyway, if _anyone_ on this list knows of a CJK font for Windows which
covers all CJK characters, including the "extensions" above and below the basic
range, please speak up. I have a UTF-8 file -- based on the Unicode
Consortium's Unihan.txt, which means I can use it privately but not
distribute it :-( -- with which to test it. At the moment I mainly use
MingLiU (a "Traditional Chinese" font) for Chinese, but it hasn't got the
CJK extensions.
>
> I would rather see Microsoft support UTF-8, but that probably won't
> happen...

Some programs, like WordPad or NT-series Notepad, can read UTF-8 files if
they have a BOM. Notepad (but not the 9x version) can even write UTF-8 (with
BOM). (I don't know how they represent the data internally though. I suspect
UTF-16LE.) Microsoft is the tortoise to Vim's hare, or rather jet
rocket, to be sure, but I think we shouldn't despair.
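The BOM Tony mentions is the UTF-8 encoding of U+FEFF, which is the three bytes EF BB BF at the very start of the file. Python has a 'utf-8-sig' codec that writes the BOM when encoding and strips it when decoding. The minimal sketch below uses it to show how a reader can detect the BOM. It only illustrates the file format and makes no claim about how Notepad or WordPad work internally. The function names are hypothetical.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the bytes start with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def decode_text(data: bytes) -> str:
    """Decode UTF-8, dropping a leading BOM if one is present."""
    return data.decode("utf-8-sig")


def encode_with_bom(text: str) -> bytes:
    """Encode as UTF-8 with a leading BOM, like the files Tony says Notepad writes."""
    return text.encode("utf-8-sig")


if __name__ == "__main__":
    raw = encode_with_bom("\U00020000")
    print(raw.hex(" "))       # ef bb bf f0 a0 80 80
    print(has_utf8_bom(raw))  # True
    assert decode_text(raw) == "\U00020000"
```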


Best regards,
Tony.