Combining characters U+035x are not supported?

Yong-Jhen Hong-4
Happy new year, everyone!

I have been using Vim to edit an input method table these days, and found
that it doesn't work well with one combining character, Unicode U+0358.
Using g8 on "capital O with dot above right" shows "4f" only,
and the dot is another character ("cd 98").
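
For readers unfamiliar with the byte values above, here is a minimal sketch of plain UTF-8 encoding, showing why U+0358 comes out as the two bytes "cd 98". The helper name utf8_encode is made up for this illustration and is not taken from Vim's source.

```c
#include <stddef.h>

/*
 * Encode code point "c" as UTF-8 into "buf" (room for 4 bytes).
 * Returns the number of bytes written, or 0 when "c" is above U+10FFFF.
 * Hypothetical helper for illustration only; surrogates are not
 * rejected, to keep the sketch short.
 */
size_t utf8_encode(unsigned long c, unsigned char *buf)
{
    if (c < 0x80UL) {
        buf[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800UL) {
        /* 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xc0 | (c >> 6));
        buf[1] = (unsigned char)(0x80 | (c & 0x3f));
        return 2;
    }
    if (c < 0x10000UL) {
        /* 1110xxxx 10xxxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xe0 | (c >> 12));
        buf[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3f));
        buf[2] = (unsigned char)(0x80 | (c & 0x3f));
        return 3;
    }
    if (c < 0x110000UL) {
        /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xf0 | (c >> 18));
        buf[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3f));
        buf[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3f));
        buf[3] = (unsigned char)(0x80 | (c & 0x3f));
        return 4;
    }
    return 0;
}
```

Calling utf8_encode(0x4f, buf) writes the single byte 4f, and utf8_encode(0x0358, buf) writes cd 98, which matches the two separate "characters" that g8 reports.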

I looked at the source to see how Vim checks whether a character is a
combining character, and found the following code:

<code>
/*
 * Return TRUE if "c" is a composing UTF-8 character.  This means it will be
 * drawn on top of the preceding character.
 * Based on code from Markus Kuhn.
 */
    int
utf_iscomposing(c)
    int         c;
{
    /* sorted list of non-overlapping intervals */
    static struct interval combining[] =
    {
        {0x0300, 0x034f}, {0x0360, 0x036f}, {0x0483, 0x0486}, {0x0488, 0x0489},
</code>

Is there a reason why the characters from 0x0350 to 0x035f are skipped?
It looks to me as though all characters in the Unicode block
'Combining Diacritical Marks', which ranges from 0x0300 to 0x036f,
should be combining characters.

A small patch like this works well for me;
using g8 on "capital O with dot above right" now shows "4f + cd 98":

<patch>
--- work/vim72/src/mbyte.c~     2010-01-02 10:18:01.000000000 +0800
+++ work/vim72/src/mbyte.c      2010-01-02 11:19:24.000000000 +0800
@@ -1976,7 +1976,7 @@
     /* sorted list of non-overlapping intervals */
     static struct interval combining[] =
     {
-       {0x0300, 0x034f}, {0x0360, 0x036f}, {0x0483, 0x0486}, {0x0488, 0x0489},
+       {0x0300, 0x036f}, {0x0483, 0x0486}, {0x0488, 0x0489},
        {0x0591, 0x05a1}, {0x05a3, 0x05b9}, {0x05bb, 0x05bd}, {0x05bf, 0x05bf},
        {0x05c1, 0x05c2}, {0x05c4, 0x05c4}, {0x0610, 0x0615}, {0x064b, 0x0658},
        {0x0670, 0x0670}, {0x06d6, 0x06dc}, {0x06de, 0x06e4}, {0x06e7, 0x06e8},
</patch>
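
To illustrate the effect of the one-line change, here is a small sketch of a lookup in a sorted interval table, before and after the patch. The names in_table, is_composing_old and is_composing_new are hypothetical, the binary search is only one plausible way to do the lookup (it is not copied from Vim's intable()), the struct layout is the one quoted later in this thread, and only the first few intervals are included.

```c
#include <stddef.h>

/* Interval layout as quoted later in this thread. */
struct interval {
    unsigned short first;
    unsigned short last;
};

/* Binary search in a sorted table of non-overlapping intervals.
 * Hypothetical helper; returns 1 if "c" lies inside any interval. */
int in_table(const struct interval *table, size_t n, int c)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (c > table[mid].last)
            lo = mid + 1;       /* continue in the upper half */
        else if (c < table[mid].first)
            hi = mid;           /* continue in the lower half */
        else
            return 1;           /* first <= c <= last */
    }
    return 0;
}

/* First intervals of the table before the patch (excerpt only). */
static const struct interval old_table[] = {
    {0x0300, 0x034f}, {0x0360, 0x036f}, {0x0483, 0x0486}, {0x0488, 0x0489},
};

/* The same excerpt after the patch. */
static const struct interval new_table[] = {
    {0x0300, 0x036f}, {0x0483, 0x0486}, {0x0488, 0x0489},
};

int is_composing_old(int c)
{
    return in_table(old_table, sizeof(old_table) / sizeof(old_table[0]), c);
}

int is_composing_new(int c)
{
    return in_table(new_table, sizeof(new_table) / sizeof(new_table[0]), c);
}
```

With these excerpts, U+0358 falls into the gap between 0x034f and 0x0360 in the old table, so is_composing_old(0x0358) is 0, while is_composing_new(0x0358) is 1.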

Regards,
Iông Chun

--
You received this message from the "vim_multibyte" maillist.
For more information, visit http://www.vim.org/maillist.php

Re: Combining characters U+035x are not supported?

Tony Mechelynck
According to the current version (5.2) of the Unicode Standard, all
codepoints U+0300 to U+036F are indeed combining characters AFAICT, see
http://www.unicode.org/charts/PDF/U0300.pdf

However, those in the range U+0350 to U+035F are particularly
"esoteric", and I believe that it is possible that they were added in a
relatively recent version of the Standard; previous versions (including,
maybe, the one which was current when that module was written) would
then have these codepoints "undefined".

BTW, I notice in a comment at lines 29-30 of that same module:

>  *    To make things complicated, up to two composing characters
>  *    are allowed.  These are drawn on top of the first char.

This is now only true with the default settings. The 'maxcombine' option
was added (relatively recently) to allow displaying (if the user sets a
non-default value) up to 6 combining characters on top of each spacing
character; even more than that can be "edited but not displayed".
Shouldn't that comment be updated?


Best regards,
Tony.
--
Children are unpredictable.  You never know what inconsistency they're
going to catch you in next.
                -- Franklin P. Jones

Re: Combining characters U+035x are not supported?

Yong-Jhen Hong-4
Hi Tony and list,

I understand now.
The characters from U+0350 to U+0357 and from U+035D to U+035F were
added in Unicode 4.0. The characters from U+0358 (the one I used) to
U+035C were added even later, in Unicode 4.1. This can be checked in
http://www.unicode.org/Public/UNIDATA/DerivedAge.txt

I use Vim version 7.2.323; its combining character table seems to
contain most of Unicode 4.0, but without U+0350 to U+0357 and U+035D
to U+035F.

I will make a patch to add the missing combining characters,
according to Unicode 5.2.

Iông Chun

Re: Combining characters U+035x are not supported?

Yong-Jhen Hong-4
Hi all,

Here is a patch to update the combining character table to Unicode 5.2.
I added the ranges by eye and by hand, so there may be errors ;)

<patch>
--- work/vim72/src/mbyte.c.bak 2010-01-02 13:33:40.000000000 +0800
+++ work/vim72/src/mbyte.c 2010-01-02 18:33:49.000000000 +0800
@@ -1976,35 +1976,64 @@
     /* sorted list of non-overlapping intervals */
     static struct interval combining[] =
     {
- {0x0300, 0x034f}, {0x0360, 0x036f}, {0x0483, 0x0486}, {0x0488, 0x0489},
- {0x0591, 0x05a1}, {0x05a3, 0x05b9}, {0x05bb, 0x05bd}, {0x05bf, 0x05bf},
- {0x05c1, 0x05c2}, {0x05c4, 0x05c4}, {0x0610, 0x0615}, {0x064b, 0x0658},
- {0x0670, 0x0670}, {0x06d6, 0x06dc}, {0x06de, 0x06e4}, {0x06e7, 0x06e8},
- {0x06ea, 0x06ed}, {0x0711, 0x0711}, {0x0730, 0x074a}, {0x07a6, 0x07b0},
- {0x0901, 0x0903}, {0x093c, 0x093c}, {0x093e, 0x094d}, {0x0951, 0x0954},
+ {0x0300, 0x036f},
+ {0x0483, 0x0487}, {0x0488, 0x0489},
+ {0x0591, 0x05bd}, {0x05bf, 0x05bf}, {0x05c1, 0x05c2}, {0x05c4, 0x05c5},
+ {0x05c7, 0x05c7},
+ {0x0610, 0x061a}, {0x064b, 0x065e}, {0x0670, 0x0670}, {0x06d6, 0x06dc},
+ {0x06de, 0x06e4}, {0x06e7, 0x06e8}, {0x06ea, 0x06ed},
+ {0x0711, 0x0711}, {0x0730, 0x074a}, {0x07a6, 0x07b0}, {0x07eb, 0x07f3},
+ {0x0816, 0x0819}, {0x081b, 0x0823}, {0x0825, 0x0827}, {0x0829, 0x082d},
+ {0x0900, 0x0903}, {0x093c, 0x093c}, {0x093e, 0x094e}, {0x0951, 0x0955},
  {0x0962, 0x0963}, {0x0981, 0x0983}, {0x09bc, 0x09bc}, {0x09be, 0x09c4},
  {0x09c7, 0x09c8}, {0x09cb, 0x09cd}, {0x09d7, 0x09d7}, {0x09e2, 0x09e3},
  {0x0a01, 0x0a03}, {0x0a3c, 0x0a3c}, {0x0a3e, 0x0a42}, {0x0a47, 0x0a48},
- {0x0a4b, 0x0a4d}, {0x0a70, 0x0a71}, {0x0a81, 0x0a83}, {0x0abc, 0x0abc},
- {0x0abe, 0x0ac5}, {0x0ac7, 0x0ac9}, {0x0acb, 0x0acd}, {0x0ae2, 0x0ae3},
- {0x0b01, 0x0b03}, {0x0b3c, 0x0b3c}, {0x0b3e, 0x0b43}, {0x0b47, 0x0b48},
- {0x0b4b, 0x0b4d}, {0x0b56, 0x0b57}, {0x0b82, 0x0b82}, {0x0bbe, 0x0bc2},
- {0x0bc6, 0x0bc8}, {0x0bca, 0x0bcd}, {0x0bd7, 0x0bd7}, {0x0c01, 0x0c03},
- {0x0c3e, 0x0c44}, {0x0c46, 0x0c48}, {0x0c4a, 0x0c4d}, {0x0c55, 0x0c56},
- {0x0c82, 0x0c83}, {0x0cbc, 0x0cbc}, {0x0cbe, 0x0cc4}, {0x0cc6, 0x0cc8},
- {0x0cca, 0x0ccd}, {0x0cd5, 0x0cd6}, {0x0d02, 0x0d03}, {0x0d3e, 0x0d43},
- {0x0d46, 0x0d48}, {0x0d4a, 0x0d4d}, {0x0d57, 0x0d57}, {0x0d82, 0x0d83},
- {0x0dca, 0x0dca}, {0x0dcf, 0x0dd4}, {0x0dd6, 0x0dd6}, {0x0dd8, 0x0ddf},
- {0x0df2, 0x0df3}, {0x0e31, 0x0e31}, {0x0e34, 0x0e3a}, {0x0e47, 0x0e4e},
- {0x0eb1, 0x0eb1}, {0x0eb4, 0x0eb9}, {0x0ebb, 0x0ebc}, {0x0ec8, 0x0ecd},
+ {0x0a4b, 0x0a4d}, {0x0a51, 0x0a51}, {0x0a70, 0x0a71}, {0x0a75, 0x0a75},
+ {0x0a81, 0x0a83}, {0x0abc, 0x0abc}, {0x0abe, 0x0ac5}, {0x0ac7, 0x0ac9},
+ {0x0acb, 0x0acd}, {0x0ae2, 0x0ae3},
+ {0x0b01, 0x0b03}, {0x0b3c, 0x0b3c}, {0x0b3e, 0x0b44}, {0x0b47, 0x0b48},
+ {0x0b4b, 0x0b4d}, {0x0b56, 0x0b57}, {0x0b62, 0x0b63}, {0x0b82, 0x0b82},
+ {0x0bbe, 0x0bc2}, {0x0bc6, 0x0bc8}, {0x0bca, 0x0bcd}, {0x0bd7, 0x0bd7},
+ {0x0c01, 0x0c03}, {0x0c3e, 0x0c44}, {0x0c46, 0x0c48}, {0x0c4a, 0x0c4d},
+ {0x0c55, 0x0c56}, {0x0c62, 0x0c63}, {0x0c82, 0x0c83}, {0x0cbc, 0x0cbc},
+ {0x0cbe, 0x0cc4}, {0x0cc6, 0x0cc8}, {0x0cca, 0x0ccd}, {0x0cd5, 0x0cd6},
+ {0x0ce2, 0x0ce3},
+ {0x0d02, 0x0d03}, {0x0d3e, 0x0d43}, {0x0d46, 0x0d48}, {0x0d4a, 0x0d4d},
+ {0x0d57, 0x0d57}, {0x0d82, 0x0d83}, {0x0dca, 0x0dca}, {0x0dcf, 0x0dd4},
+ {0x0dd6, 0x0dd6}, {0x0dd8, 0x0ddf}, {0x0df2, 0x0df3},
+ {0x0e31, 0x0e31}, {0x0e34, 0x0e3a}, {0x0e47, 0x0e4e}, {0x0eb1, 0x0eb1},
+ {0x0eb4, 0x0eb9}, {0x0ebb, 0x0ebc}, {0x0ec8, 0x0ecd},
  {0x0f18, 0x0f19}, {0x0f35, 0x0f35}, {0x0f37, 0x0f37}, {0x0f39, 0x0f39},
  {0x0f3e, 0x0f3f}, {0x0f71, 0x0f84}, {0x0f86, 0x0f87}, {0x0f90, 0x0f97},
- {0x0f99, 0x0fbc}, {0x0fc6, 0x0fc6}, {0x102c, 0x1032}, {0x1036, 0x1039},
- {0x1056, 0x1059}, {0x1712, 0x1714}, {0x1732, 0x1734}, {0x1752, 0x1753},
- {0x1772, 0x1773}, {0x17b6, 0x17d3}, {0x17dd, 0x17dd}, {0x180b, 0x180d},
- {0x18a9, 0x18a9}, {0x1920, 0x192b}, {0x1930, 0x193b}, {0x20d0, 0x20ea},
- {0x302a, 0x302f}, {0x3099, 0x309a}, {0xfb1e, 0xfb1e}, {0xfe00, 0xfe0f},
- {0xfe20, 0xfe23},
+ {0x0f99, 0x0fbc}, {0x0fc6, 0x0fc6},
+ {0x102b, 0x103e}, {0x1056, 0x1059}, {0x105e, 0x1060}, {0x1062, 0x1064},
+ {0x1067, 0x106d}, {0x1071, 0x1074}, {0x1082, 0x108d}, {0x108f, 0x108f},
+ {0x109a, 0x109d},
+ {0x135f, 0x135f},
+ {0x1712, 0x1714}, {0x1732, 0x1734}, {0x1752, 0x1753}, {0x1772, 0x1773},
+ {0x17b6, 0x17d3}, {0x17dd, 0x17dd},
+ {0x180b, 0x180d}, {0x18a9, 0x18a9},
+ {0x1920, 0x192b}, {0x1930, 0x193b}, {0x19b0, 0x19c0}, {0x19c8, 0x19c9},
+ {0x1a17, 0x1a1b}, {0x1a55, 0x1a5e}, {0x1a60, 0x1a7c}, {0x1a7f, 0x1a7f},
+ {0x1b00, 0x1b04}, {0x1b34, 0x1b44}, {0x1b6b, 0x1b73}, {0x1b80, 0x1b82},
+ {0x1ba1, 0x1baa},
+ {0x1c24, 0x1c37}, {0x1cd0, 0x1cd2}, {0x1cd4, 0x1ce8}, {0x1ced, 0x1ced},
+ {0x1cf2, 0x1cf2},
+ {0x1dc0, 0x1de6}, {0x1dfd, 0x1dff},
+ {0x20d0, 0x20f0},
+ {0x2cef, 0x2cf1},
+ {0x2de0, 0x2dff},
+ {0x302a, 0x302f}, {0x3099, 0x309a},
+ {0xa66f, 0xa672}, {0xa67c, 0xa67d}, {0xa6f0, 0xa6f1},
+ {0xa802, 0xa802}, {0xa806, 0xa806}, {0xa80b, 0xa80b}, {0xa823, 0xa827},
+ {0xa880, 0xa881}, {0xa8b4, 0xa8c4}, {0xa8e0, 0xa8f1},
+ {0xa926, 0xa92d}, {0xa947, 0xa953}, {0xa980, 0xa983}, {0xa9b3, 0xa9c0},
+ {0xaa29, 0xaa36}, {0xaa43, 0xaa43}, {0xaa4c, 0xaa4d}, {0xaa7b, 0xaa7b},
+ {0xaab0, 0xaab0}, {0xaab2, 0xaab4}, {0xaab7, 0xaab8}, {0xaabe, 0xaabf},
+ {0xaac1, 0xaac1},
+ {0xabe3, 0xabea}, {0xabec, 0xabed},
+ {0xfb1e, 0xfb1e},
+ {0xfe00, 0xfe0f}, {0xfe20, 0xfe26},
     };

     return intable(combining, sizeof(combining), c);
</patch>


Regards,
Iông Chun

Re: Combining characters U+035x are not supported?

Tony Mechelynck
In reply to this post by Yong-Jhen Hong-4
I'm attaching an extract from the current UnicodeData.txt file where
I've extracted all codepoints with a nonzero Canonical_Combining_Class
(field 3, counting the first field [codepoint number] as field 0). I'm
*not* sure that this property coincides with the "combining character"
property in the Vim sense, but it's the best I've found. You can check
any discrepancies by means of
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt (where the first
two fields are the codepoint number and name).

This was obtained by applying :redir to the output of

        silent %g/^\%([^;]*;\)\{3}\%(0;\)\@!/p

meaning: print all lines containing, at the start of a line, three times
(zero or more non-semicolons plus one semicolon) not followed by (a zero
then a semicolon).
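
For anyone who prefers to see the filter outside of Vim, the same test can be sketched in C: skip three semicolon-terminated fields and check whether field 3 is a nonzero number. The function names field_start and has_nonzero_ccc are made up for this illustration, and the sketch assumes each line follows the UnicodeData.txt layout described above.

```c
#include <stdlib.h>
#include <string.h>

/* Return a pointer to the start of field "n" (counting from 0) in a
 * semicolon-separated line, or NULL if the line has too few fields.
 * Hypothetical helper for illustration. */
const char *field_start(const char *line, int n)
{
    while (n > 0) {
        line = strchr(line, ';');
        if (line == NULL)
            return NULL;
        ++line;                 /* step past the semicolon */
        --n;
    }
    return line;
}

/* Return 1 if the Canonical_Combining_Class (field 3) of a
 * UnicodeData.txt line is nonzero, 0 otherwise. */
int has_nonzero_ccc(const char *line)
{
    const char *ccc = field_start(line, 3);

    if (ccc == NULL)
        return 0;
    /* strtol stops at the ';' that ends the field */
    return strtol(ccc, NULL, 10) != 0;
}
```

One small difference from the regexp: an empty field 3 is treated as zero here, whereas the pattern above would let such a line through.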


Best regards,
Tony.
--
"I do not know myself, and God forbid that I should."
                -- Johann Wolfgang von Goethe

Attachment: combining.txt (48K)

Re: Combining characters U+035x are not supported?

Yong-Jhen Hong-4
Hi Tony,

On 2010-01-02 07:01 PM, Tony Mechelynck wrote:

> I'm attaching an extract from the current UnicodeData.txt file where
> I've extracted all codepoints with a nonzero Canonical_Combining_Class
> (field 3, counting the first field [codepoint number] as field 0). I'm
> *not* sure that this property coincides with the "combining character"
> property in the Vim sense, but it's the best I've found. You can check
> any discrepancies by means of
> http://www.unicode.org/Public/UNIDATA/UnicodeData.txt (where the first
> two fields are the codepoint number and name).
>
> This was obtained by applying :redir to the output of
>
>     silent %g/^\%([^;]*;\)\{3}\%(0;\)\@!/p
>
> meaning: print all lines containing, at the start of a line, three
> times (zero or more non-semicolons plus one semicolon) not followed by
> (a zero then a semicolon).
>
>
> Best regards,
> Tony.

I should have used UnicodeData.txt too, instead of looking up every
added code point in the code charts ;)

About Canonical_Combining_Class, in version 5.2 of the Standard, D52,
item #2, I read:
<quote>
All characters with non-zero canonical combining class are combining
characters, but the reverse is not the case: there are combining
characters with a zero canonical combining class.
</quote>

and item #1:
<quote>
Combining characters consist of all characters with the General Category
values of Spacing Combining Mark (Mc), Nonspacing Mark (Mn), and
Enclosing Mark (Me).
</quote>

and D53:
<quote>
Nonspacing mark: A combining character with the General Category of
Nonspacing Mark (Mn) or Enclosing Mark (Me).
</quote>

I don't know whether Vim uses different rules for display and for
semantics when checking for combining characters. If not, I think the
table could simply contain the nonspacing ones for now.

I attach the list of the Mn and Me characters, excluding code points
above U+FFFF.
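
As an illustration of the selection rule, here is a minimal sketch that checks the General Category (field 2 of a UnicodeData.txt line) for Mn or Me. The function name is_nonspacing_mark is hypothetical, and this is not the method used to produce the attached list.

```c
#include <string.h>

/* Return 1 if the General Category (field 2, counting from 0) of a
 * UnicodeData.txt line is Mn or Me, 0 otherwise.
 * Hypothetical helper for illustration. */
int is_nonspacing_mark(const char *line)
{
    const char *p = line;
    int i;

    /* skip the code point and name fields */
    for (i = 0; i < 2; ++i) {
        p = strchr(p, ';');
        if (p == NULL)
            return 0;
        ++p;
    }
    /* the category must be exactly "Mn" or "Me", ended by ';' */
    return strncmp(p, "Mn;", 3) == 0 || strncmp(p, "Me;", 3) == 0;
}
```

Unlike a test on the canonical combining class, this keeps Me characters (whose class is zero) and leaves out spacing marks (Mc).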

Regards,
Iông Chun

Attachment: combining3.txt (84K)

Re: Combining characters U+035x are not supported?

Tony Mechelynck
Why without codepoint values higher than U+FFFF? Nowadays gvim can
display them (which wasn't the case when I started studying Unicode with
gvim 6.x).


Best regards,
Tony.
--
hundred-and-one symptoms of being an internet addict:
236. You start saving URL's in your digital watch.

Re: Combining characters U+035x are not supported?

Yong-Jhen Hong-4
On 2010/01/03 00:24, Tony Mechelynck wrote:
> Why without codepoint values higher than U+FFFF? Nowadays gvim can
> display them (which wasn't the case when I started studying Unicode
> with gvim 6.x).
>
>
> Best regards,
> Tony.

Because:
<code>
struct interval
{
     unsigned short first;
     unsigned short last;
};
</code>
;)

I guess the type can be "int" instead of "unsigned short" now.
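
To make the limitation concrete, here is a small sketch of what happens when a code point above U+FFFF is stored in a 16-bit field. It uses uint16_t and int32_t from <stdint.h> so the widths are explicit; Vim's struct uses "unsigned short", which is 16 bits wide on common platforms. The struct and function names are made up for the illustration.

```c
#include <stdint.h>

/* 16-bit fields, standing in for the "unsigned short" version. */
struct interval16 {
    uint16_t first;
    uint16_t last;
};

/* Wider fields, standing in for a possible "int" version. */
struct interval32 {
    int32_t first;
    int32_t last;
};

/* Store "c" in a 16-bit field and read it back: the value is reduced
 * modulo 65536, so code points above U+FFFF are silently changed. */
long store16(long c)
{
    struct interval16 iv;

    iv.first = (uint16_t)c;
    return (long)iv.first;
}

/* The same round trip with a 32-bit field keeps the value intact. */
long store32(long c)
{
    struct interval32 iv;

    iv.first = (int32_t)c;
    return (long)iv.first;
}
```

For example, store16(0x1D167) returns 0xD167 while store32(0x1D167) returns 0x1D167, so a table with 16-bit fields cannot hold ranges outside the BMP.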
The patch with all Mn and Me character ranges is attached.

Regards,
Iông Chun

Attachment: vim-patch-combchars (8K)

Re: Combining characters U+035x are not supported?

Tony Mechelynck
I see. I suspect other size changes may have to be done then, not only
where the structure is defined but possibly where it is used. I hope
Bram is following this whole thread.

Best regards,
Tony.
--
"A Mormon is a man that has the bad taste and the religion to do what a
good many other people are restrained from doing by conscientious
scruples and the police."
                -- Mr. Dooley

Re: Combining characters U+035x are not supported?

Bram Moolenaar

Tony Mechelynck wrote:

> I see. I suspect other size changes may have to be done then, not only
> where the structure is defined but possibly where it is used. I hope
> Bram is following this whole thread.

There is a script to generate these tables from the Unicode table.
I think Markus Kuhn had this.  But it should be easy to reproduce with
Vim script.

Changing all these tables from short to int makes the memory use higher.
But adding code to handle two tables won't be much smaller.
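
A rough way to see the memory cost is to compare the size of one table entry in both layouts. This is only a sketch: the struct and function names are hypothetical, fixed-width types stand in for "short" and "int", and the exact sizes depend on the compiler and platform.

```c
#include <stddef.h>
#include <stdint.h>

/* Entry with 16-bit fields, standing in for "unsigned short". */
struct interval16 {
    uint16_t first;
    uint16_t last;
};

/* Entry with 32-bit fields, standing in for "int". */
struct interval32 {
    int32_t first;
    int32_t last;
};

/* Bytes needed for a table of "n" entries in each layout. */
size_t table_bytes16(size_t n)
{
    return n * sizeof(struct interval16);
}

size_t table_bytes32(size_t n)
{
    return n * sizeof(struct interval32);
}
```

On typical platforms, where neither struct is padded, an entry grows from 4 to 8 bytes, so each table would take twice the space.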

--
hundred-and-one symptoms of being an internet addict:
77. The phone company asks you to test drive their new PBX system

 /// Bram Moolenaar -- [hidden email] -- http://www.Moolenaar.net   \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\        download, build and distribute -- http://www.A-A-P.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///

Re: Combining characters U+035x are not supported?

Tony Mechelynck
On 04/01/10 20:17, Bram Moolenaar wrote:
[...]
>
> There is a script to generate these tables from the Unicode table.
> I think Markus Kuhn had this.  But it should be easy to reproduce with
> Vim script.
>
[...]

Yes indeed: the UnicodeData.txt file is meant to be machine-readable, and
with the power of Vim regexps at our disposal, extracting the needed
data should be a breeze.
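
Once the matching code points have been extracted, turning them into table entries is a matter of merging consecutive values into ranges. Here is a minimal sketch of that step in C rather than Vim script; the struct and the function build_intervals are hypothetical, not the generator script mentioned above, and the input is assumed to be sorted in ascending order.

```c
#include <stddef.h>

/* One table entry; "long" is used so that code points above U+FFFF fit. */
struct interval {
    long first;
    long last;
};

/* Merge a sorted list of "n" code points into intervals of consecutive
 * values.  "out" must have room for "n" entries.  Returns the number of
 * intervals written.  Hypothetical helper for illustration. */
size_t build_intervals(const long *cp, size_t n, struct interval *out)
{
    size_t count = 0;
    size_t i;

    for (i = 0; i < n; ++i) {
        if (count > 0 && cp[i] <= out[count - 1].last + 1) {
            /* adjacent to (or already inside) the current interval */
            if (cp[i] > out[count - 1].last)
                out[count - 1].last = cp[i];
        } else {
            /* gap found: start a new interval */
            out[count].first = cp[i];
            out[count].last = cp[i];
            ++count;
        }
    }
    return count;
}
```

Given 0x0300, 0x0301, 0x0302, 0x0483, 0x0591, 0x0592, this yields the three intervals {0x0300, 0x0302}, {0x0483, 0x0483} and {0x0591, 0x0592}. Note that adjacent ranges are merged, so two neighbouring ranges kept separate in a hand-made table would come out as one.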


Best regards,
Tony.
--
Her locks an ancient lady gave
Her loving husband's life to save;
And men -- they honored so the dame --
Upon some stars bestowed her name.

But to our modern married fair,
Who'd give their lords to save their hair,
No stellar recognition's given.
There are not stars enough in heaven.
