Suggestion: Redefine \Uxxxxx in double-quoted strings

Suggestion: Redefine \Uxxxxx in double-quoted strings

Tony Mechelynck

Vim is now capable of displaying any Unicode codepoint for which the
installed 'guifont' has a glyph, even outside the BMP (i.e., even above
U+FFFF), but there's no easy way to represent those "high" codepoints by
Unicode value in strings: I mean, "\uxxxx" and "\Uxxxx" still accept no
more than four hex digits.

I propose to keep "\uxxxx" at its present meaning, but extend
"\Uxxxxxxxx" to allow additional hex digits (either up to a total of 8
hex digits, in line with ^VUxxxxxxxx as opposed to ^Vuxxxx in Insert
mode, or at least up to the value \U10FFFF, above which the Unicode
Consortium has decided that "there never shall be a valid Unicode
codepoint at any future time").

I'm aware that this is an "incompatible" change, but I believe the risk
is low compared with the advantages (as a sidenote, many rare CJK
characters lie in plane 2, in the "CJK Unified Extension B" range
U+20000-U+2A6DF).

The notation "\<Char-0x20000>" or "\<Char-131072>" doesn't work: here
(in my GTK2/Gnome2 gvim with 'encoding' set to UTF-8), ":echo"ing such a
string displays <f0><a0><80><fe>X<80><fe>X instead of just the one CJK
character 𠀀 (and, yes, I've set my mailer to send this post as UTF-8 so
if yours is "well-behaved" it should display that character properly).
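
For comparison, here is a minimal sketch of what the correct UTF-8 byte sequence for U+20000 looks like. It is written in Python 3 rather than Vim script, only because the result is easy to check there; nothing in it is Vim's own code.

```python
# U+20000 is the first code point of CJK Unified Ideographs Extension B.
cp = 0x20000
assert cp == 131072  # the decimal form used in "\<Char-131072>"

# Encode the single code point as UTF-8 and show the bytes in hex.
utf8 = chr(cp).encode("utf-8")
utf8_hex = " ".join(f"{b:02x}" for b in utf8)
print(utf8_hex)  # f0 a0 80 80
```

Set against the echoed output above, the first two bytes match and each 0x80 byte appears to come out as <80><fe>X. That reading is only an inference from the output, not something I have confirmed in the source.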


Best regards,
Tony.
--
Although the moon is smaller than the earth, it is farther away.

--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_multibyte" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---


Re: Suggestion: Redefine \Uxxxxx in double-quoted strings

Bram Moolenaar


Tony Mechelynck wrote:

> Vim is now capable of displaying any Unicode codepoint for which the
> installed 'guifont' has a glyph, even outside the BMP (i.e., even above
> U+FFFF), but there's no easy way to represent those "high" codepoints by
> Unicode value in strings: I mean, "\uxxxx" and "\Uxxxx" still accept no
> more than four hex digits.
>
> I propose to keep "\uxxxx" at its present meaning, but extend
> "\Uxxxxxxxx" to allow additional hex digits (either up to a total of 8
> hex digits, in line with ^VUxxxxxxxx as opposed to ^Vuxxxx in Insert
> mode, or at least up to the value \U10FFFF, above which the Unicode
> Consortium has decided that "there never shall be a valid Unicode
> codepoint at any future time").
>
> I'm aware that this is an "incompatible" change, but I believe the risk
> is low compared with the advantages (as a sidenote, many rare CJK
> characters lie in plane 2, in the "CJK Unified Extension B" range
> U+20000-U+2A6DF).
>
> The notation "\<Char-0x20000>" or "\<Char-131072>" doesn't work: here
> (in my GTK2/Gnome2 gvim with 'encoding' set to UTF-8), ":echo"ing such a
> string displays <f0><a0><80><fe>X<80><fe>X instead of just the one CJK
> character 𠀀 (and, yes, I've set my mailer to send this post as UTF-8 so
> if yours is "well-behaved" it should display that character properly).

It does cause problems for something like "\U12345", which is currently
the character 0x1234 followed by the character 5.  After the change it
would become one character 0x12345.
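
To make the incompatibility concrete, here is a rough sketch in Python 3 (not Vim's actual parser). The helper expand_u and its max_digits parameter are hypothetical names made up for this illustration; the helper reads greedily up to a maximum number of hex digits after \U.

```python
import re

def expand_u(s: str, max_digits: int) -> str:
    """Expand backslash-U escapes, reading at most max_digits hex digits.

    Hypothetical helper for illustration only; values above 0x10FFFF
    would make chr() raise ValueError and are not handled here.
    """
    pattern = re.compile(r"\\U([0-9A-Fa-f]{1,%d})" % max_digits)
    return pattern.sub(lambda m: chr(int(m.group(1), 16)), s)

source = r"\U12345"          # raw string: backslash, U, five hex digits
old = expand_u(source, 4)    # current rule: at most four hex digits
new = expand_u(source, 8)    # proposed rule: up to eight hex digits

print([hex(ord(c)) for c in old])  # ['0x1234', '0x35']
print([hex(ord(c)) for c in new])  # ['0x12345']
```

The same source text yields two characters under the four-digit rule (0x35 is the digit "5") and one character under the eight-digit rule, which is exactly the case an existing script could trip over.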

I don't see a convenient alternative though.  Anyone?

--
Even got a Datapoint 3600(?) with a DD50 connector instead of the
usual DB25...  what a nightmare trying to figure out the pinout
for *that* with no spex...

 /// Bram Moolenaar -- [hidden email] -- http://www.Moolenaar.net   \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\        download, build and distribute -- http://www.A-A-P.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///


Re: Suggestion: Redefine \Uxxxxx in double-quoted strings

krbeesley
In reply to this post by Tony Mechelynck


On 6 Apr 2009, at 12:22, Tony Mechelynck wrote:

>
> Vim is now capable of displaying any Unicode codepoint for which the
> installed 'guifont' has a glyph, even outside the BMP (i.e., even above
> U+FFFF),

Tony,

Good news.

Many may not know that MacVim has been doing this rather well for quite
a while. I routinely edit texts in the Deseret Alphabet and the Shaw
(Shavian) Alphabet, which lie in the supplementary area.


> but there's no easy way to represent those "high" codepoints by
> Unicode value in strings: I mean, "\uxxxx" and "\Uxxxx" still accept no
> more than four hex digits.
>
> I propose to keep "\uxxxx" at its present meaning, but extend
> "\Uxxxxxxxx" to allow additional hex digits (either up to a total of 8
> hex digits, in line with ^VUxxxxxxxx as opposed to ^Vuxxxx in Insert
> mode, or at least up to the value \U10FFFF,

Sounds good.

\Uxxxxxxxx is also the Python convention for representing supplementary
characters in strings. I think it requires exactly 8 hex digits, just as
\uxxxx requires exactly four, but I'm willing to be corrected.
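
A quick check of that, as a small sketch assuming Python 3: string literals there take exactly eight hex digits after \U and exactly four after \u. The short literal is assembled with chr(92) (a backslash) so that the example file itself stays valid Python.

```python
# "\U" takes exactly eight hex digits, "\u" exactly four.
ideograph = "\U00020000"
assert len(ideograph) == 1 and ord(ideograph) == 0x20000

# "\u" stops after four digits, so a fifth digit is an ordinary character.
mixed = "\u12345"
assert mixed == chr(0x1234) + "5"

# With only five digits after \U the literal is rejected at compile time.
short_literal = '"' + chr(92) + 'U20000"'
try:
    compile(short_literal, "<example>", "eval")
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)  # True
```

So the fixed-width rule avoids any ambiguity, at the price of writing leading zeros for every supplementary character.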

The other reasonable convention is the Perl-like \x{x...} (the prefix \x
is literally backslash, small x), which, being delimited with curly
braces, can contain any number of hex digits without confusing the
tokenization. But your proposal is more in line with what Vim has
already.

>
>
> I'm aware that this is an "incompatible" change, but I believe the risk
> is low compared with the advantages

For what it's worth, I agree.

> The notation "\<Char-0x20000>" or "\<Char-131072>" doesn't work: here
> (in my GTK2/Gnome2 gvim with 'encoding' set to UTF-8), ":echo"ing such a
> string displays <f0><a0><80><fe>X<80><fe>X instead of just the one CJK
> character 𠀀 (and, yes, I've set my mailer to send this post as UTF-8 so
> if yours is "well-behaved" it should display that character properly).

In MacVim, at least, supplementary code point values can appear usefully
in <Char- > in keymap files. Entries like the following appear in my
deseret-sampa_utf-8.vim keymap file. It all works great.

"in  out              comment
i    <Char-0x10428>   DESERET SMALL LETTER LONG I   (e.g. i in machine)
e    <Char-0x10429>   DESERET SMALL LETTER LONG E   (e.g. a in make)
A    <Char-0x1042A>   DESERET SMALL LETTER LONG A   (e.g. a in father)
O    <Char-0x1042B>   DESERET SMALL LETTER LONG AH  (e.g. a in call, au in caught, British/USEastCoastCity pronunciation)
o    <Char-0x1042C>   DESERET SMALL LETTER LONG O   (e.g. oa in boat)
u    <Char-0x1042D>   DESERET SMALL LETTER LONG OO  (e.g. oo in boot)
Thanks to all those developers who have toiled to handle Unicode in Vim.

Ken

******************************
Kenneth R. Beesley, D.Phil.
P.O. Box 540475
North Salt Lake, UT
84054  USA







Re: Suggestion: Redefine \Uxxxxx in double-quoted strings

Tony Mechelynck

On 06/04/09 22:18, Kenneth Reid Beesley wrote:

>
>
> On 6 Apr 2009, at 12:22, Tony Mechelynck wrote:
>
>>
>> Vim is now capable of displaying any Unicode codepoint for which the
>> installed 'guifont' has a glyph, even outside the BMP (i.e., even above
>> U+FFFF),
>
> Tony,
>
> Good news.
>
> Many may not know that MacVim has been doing this rather well for quite
> a while. I routinely edit texts in Deseret Alphabet and Shaw (Shavian)
> Alphabet, which lie in the supplementary area.
[...]

The change I had in mind is actually patch 7.1.116 (30-Nov-2007), so it
is no news-breaking scoop anymore. But as long as Vim's support of
Unicode outside the BMP was less than optimal, the problem I'm raising
in this thread might have made itself felt less acutely.


Best regards,
Tony.
--
Joe's sister puts spaghetti in her shoes!


Re: Suggestion: Redefine \Uxxxxx in double-quoted strings

Tony Mechelynck
In reply to this post by krbeesley

On 06/04/09 22:18, Kenneth Reid Beesley wrote:
[...]
> In MacVim, at least, supplementary code point values can appear
> usefully in <Char- > in keymap files.
> Entries like the following appear in my deseret-sampa_utf-8.vim keymap
> file.  It all works great.
[...]

In keymap files, it seems to work on Linux too (I use it in my own
hand-coded "phonetic" keymaps for Arabic and Russian); but I was talking
about double-quoted strings.

These Arabic and Russian keymaps don't go above U+FFFF, but anywhere
above 0x7F the <Char- > notation gives me problems inside double-quoted
strings. I believe this is related to the documented fact that "\xnn"
doesn't give valid UTF-8 values above 0x7F -- use "\u00nn" instead.
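
The difference between the two forms can be shown with a small Python 3 sketch. It assumes that "\xnn" yields the single byte nn (which is how I read the documented behaviour), and it uses 0xE9 purely as an example value.

```python
# A lone byte 0xE9, as a byte-oriented "\xE9" would produce it.
raw_byte = bytes([0xE9])
try:
    raw_byte.decode("utf-8")
    valid = True
except UnicodeDecodeError:
    valid = False

# The code point U+00E9, properly encoded as UTF-8, takes two bytes.
encoded = "\u00e9".encode("utf-8")
print(valid, encoded.hex())  # False c3a9
```

A single byte above 0x7F is never a complete UTF-8 sequence on its own, whereas the code point form produces the two-byte sequence C3 A9.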


Best regards,
Tony.
--
If God is perfect, why did He create discontinuous functions?


Re: Suggestion: Redefine \Uxxxxx in double-quoted strings

John (Eljay) Love-Jensen
In reply to this post by Bram Moolenaar
Hi Tony,

> I don't see a convenient alternative though.  Anyone?

\Uxxxx
\uxxxx
\U{x}
\U{xx}
\U{xxx}
\U{xxxx}
\U{xxxxx}
\U{xxxxxx}
\U{xxxxxxx}
\U{xxxxxxxx}
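
A rough sketch of how the brace-delimited form could be tokenized, in Python 3 rather than Vim script. The names BRACED and expand_braced are hypothetical, and leaving out-of-range values untouched is just one possible choice, not something proposed in this thread.

```python
import re

# Backslash, U, then one to eight hex digits inside curly braces.
BRACED = re.compile(r"\\U\{([0-9A-Fa-f]{1,8})\}")

def expand_braced(s: str) -> str:
    """Expand brace-delimited escapes (hypothetical helper)."""
    def repl(m):
        value = int(m.group(1), 16)
        if value > 0x10FFFF:
            return m.group(0)  # beyond the Unicode range: leave as written
        return chr(value)
    return BRACED.sub(repl, s)

result = expand_braced(r"\U{12345}5 and \U{1234}5")
print([hex(ord(c)) for c in result if ord(c) > 0x7F])  # ['0x12345', '0x1234']
```

Because the closing brace ends the escape, a digit that follows is never absorbed into the code point, and the existing four-digit forms can keep their current meaning.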

--Eljay
