Combining diacritical marks display as separate character


Sven Siegmund

Hello, I have just installed gVim 7.2 on Windows XP SP3, set UTF-8 as
the default encoding and chose a good Unicode monospace font (DejaVu
Sans Mono) as the 'guifont'.

gVim 7.2 has problems rendering combining diacritical marks on
characters for which Unicode has no precomposed code point that already
includes the diacritic. I can imagine why that is.

When I type "n" followed by U+0302 COMBINING CIRCUMFLEX ACCENT, I get
"n^" displayed instead of "n̂" (n with a circumflex on it). I can
imagine why this happens: "n" with a combining "^" is technically two
characters, two Unicode code points. It is only OpenType features and
the font renderer of the OS (on Windows, Uniscribe) that overlap them
instead of displaying them side by side.
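To make the "two code points" point concrete, here is a small Python sketch. Python and its standard `unicodedata` module serve only as an inspection tool here (they have nothing to do with how gVim renders text), and the helper name `describe` is hypothetical. It lists each code point of "n" + U+0302 with its name and canonical combining class, and shows that NFC normalization still leaves two code points, because Unicode has no precomposed n with circumflex.

```python
import unicodedata


def describe(text: str) -> list[tuple[str, str, int]]:
    """List (code point, Unicode name, canonical combining class) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.combining(ch))
        for ch in text
    ]


n_circumflex = "n\u0302"  # "n" followed by U+0302 COMBINING CIRCUMFLEX ACCENT
for row in describe(n_circumflex):
    print(row)
# ('U+006E', 'LATIN SMALL LETTER N', 0)
# ('U+0302', 'COMBINING CIRCUMFLEX ACCENT', 230)

# Unicode has no precomposed "n with circumflex", so NFC cannot merge the pair.
print(len(unicodedata.normalize("NFC", n_circumflex)))  # 2
```

A nonzero combining class (230 means "attached above") is what marks U+0302 as a character that should be drawn over its base rather than in a cell of its own.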

gVim does not use Uniscribe to render text; it works at a lower level.
It has very rigid rules for displaying a given number of
characters/code points per line and sticks to them. Hence it is forced
to display "n" and the combining "^" as two separate characters.

But then I wonder how one can use gVim to write in scripts where code
points combine, letters are reordered (as in Devanagari) or the text
switches between LTR and RTL. Is there a solution?

Thanks for your answers.
--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_multibyte" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---


Re: Combining diacritical marks display as separate character

Tony Mechelynck

On 12/03/09 09:09, Sven Siegmund wrote:

> Hello, I have just installed gVim 7.2 on Windows XP SP3 and have set
> utf-8 as the default encoding and a good unicode monospace font
> (DejaVu Sans Mono) as the guifont.
>
> gVim 7.2 has problems rendering combining diacritical marks on
> characters for which there is no dedicated unicode codepoint
> containing them with that diacritics. I can imagine why that is.
>
> When I try to type "n" and then the U+0302 combing circumflex "^" I
> get "n^" displayed instead of "n̂" (n with a circumflex on it). I can
> imagine why this happens: "n" with a combining "^" are technically two
> characters, two unicode codepoints. Its just OpenType features and the
> font renderer of the OS (in Windows it is Uniscribe) which don't let
> them display adjacently but overlap them.
>
> gVim does not use Uniscribe for rendering the font displayed. It is
> more low-level. It has very rigid rules to display a given number of
> characters/code-points per line and sticks to it. Hence it is forced
> to display "n" with combined "^" as two separate characters.
>
> But then I wonder how can you use gVim to write scripts where such
> combining of unicode-codepoints or reordering of letters (like in the
> devanagari script) or LRT-RTL changes happen. Is there a solution?
>
> Thanks for your answers.

I don't have any problems with recent gvim versions (currently 7.2.141
but it already worked last week) and GTK2 2.14.4-8.6.2 on openSUSE 11.1.
-- Well, of course I can't reproduce your case exactly since I'm on
Linux. I'm currently typing a Russian dictionary with lots of combining
acute accents (U+0301), which Vim correctly displays over the preceding
spacing Cyrillic vowel. However, IIRC, even when I was on W98 with gvim
6.1 it could display combining characters correctly in Unicode, using a
"Courier New" font -- that's when I started my front page
http://users.skynet.be/antoine.mechelynck/ where you can see several
scripts on a single page, one of them vocalized Arabic. Since then,
Unicode rendering has become progressively better, not worse, over the
years.

Let me try n + U+0302 ... yep, I get the correct overprint, in my
default font, which happens to be "Bitstream Vera Sans Mono", very
similar to DejaVu IIUC.

Current versions of gvim can display (by default) two combining
characters on any spacing character, which is usually enough for Arabic,
even IIUC Quranic Arabic, but not always for fully cantillated Hebrew.
With a nondefault 'maxcombine' setting they can display up to 6
combining characters over a single spacing character, which is usually
more than you'd need. But (IIUC) this works only if 'encoding' is set to
UTF-8. You can set this even if you don't tell Windows to use Unicode
everywhere, provided that you set it near the top of your vimrc. See
http://vim.wikia.com/wiki/Working_with_Unicode for details.
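Whether a given text stays within the default limit of two combining characters per spacing character can be checked outside Vim. The Python sketch below is only an approximation: the function names are hypothetical, and it simply attaches every character of Unicode general category M (Mn, Mc, Me) to the preceding character, which is much cruder than real grapheme segmentation and is not how Vim itself counts.

```python
import unicodedata


def clusters(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it.

    Simplified model: any character whose general category starts with "M"
    (Mn, Mc, Me) is attached to the preceding cluster.
    """
    result: list[str] = []
    for ch in text:
        if result and unicodedata.category(ch).startswith("M"):
            result[-1] += ch
        else:
            result.append(ch)
    return result


def max_marks(text: str) -> int:
    """Largest number of combining marks stacked on one base character."""
    return max((len(cluster) - 1 for cluster in clusters(text)), default=0)


sample = "n\u0302 e\u0301\u0328"  # n + circumflex; e + acute + ogonek
print(len(clusters(sample)))  # 3 clusters: "n^", " ", "e" with two marks
print(max_marks(sample))      # 2
```

If `max_marks()` reports more than 2 for your text, that is the situation in which the 'maxcombine' setting mentioned above would need raising (to at most 6).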

I'm not sure Vim does Devanagari.

It can do Hebrew or Arabic, but not with true bidi: what Vim does is give
you the option of displaying any window either all RTL or all LTR.
You can even have the same file in split windows, one of them LTR (with
English OK but Arabic or Hebrew wrong) and the other RTL (with Hebrew
and/or Arabic OK, including Arabic joining forms if 'arabicshape' is on,
which is the default, but English wrong).
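Why one direction per window cannot suit mixed text shows up in the Unicode bidirectional class that each character carries. The Python sketch below uses hypothetical function names and looks only at the strong classes, ignoring the full Unicode Bidirectional Algorithm (UAX #9). It reports which strong directions occur in a line; a line that contains both is one that an all-LTR or all-RTL window will show partly in the wrong order.

```python
import unicodedata


def strong_direction(ch: str) -> str:
    """Classify one character by its Unicode bidi class."""
    bidi = unicodedata.bidirectional(ch)
    if bidi == "L":
        return "LTR"
    if bidi in ("R", "AL"):  # R: Hebrew and other RTL letters; AL: Arabic letters
        return "RTL"
    return "neutral"  # weak and neutral classes (digits, spaces, punctuation)


def directions(line: str) -> set[str]:
    """Set of strong directions present in a line of text."""
    return {d for d in map(strong_direction, line) if d != "neutral"}


print(sorted(directions("Shalom \u05E9\u05DC\u05D5\u05DD")))  # ['LTR', 'RTL']
```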


Which exact version and patchlevel of gvim are you using? You might want
to copy the first handful of lines from the output of ":version" (until
the line with "Features included (+) or not (-)") -- see ":help :redir"
about how to capture that kind of output. Also, when you type

        :echo has('multi_byte')

what answer do you get? If it's zero, you're in trouble.

Also, what is your _full_ 'guifont' setting? If it ends in cANSI, I
think you're in trouble -- cDEFAULT is usually better IMHO.


Best regards,
Tony.
--
        "Seven years and six months!"  Humpty Dumpty repeated
thoughtfully.  "An uncomfortable sort of age.  Now if you'd asked MY
advice, I'd have said `Leave off at seven' -- but it's too late now."
        "I never ask advice about growing,"  Alice said indignantly.
        "Too proud?" the other enquired.
        Alice felt even more indignant at this suggestion.  "I mean,"
she said, "that one can't help growing older."
        "ONE can't, perhaps," said Humpty Dumpty; "but TWO can.  With
proper assistance, you might have left off at seven."
                -- Lewis Carroll


Re: Combining diacritical marks display as separate character

Ron Aaron-2

On Mar 12, 11:53 am, Tony Mechelynck <[hidden email]>
wrote:
> I don't have any problems with recent gvim versions (currently 7.2.141
> but it already worked last week) and GTK2 2.14.4-8.6.2 on openSUSE 11.1.

I use it on Windows and Linux, and it works well on both.

> It can do Hebrew or Arabic but not with true bidi: what Vim does is give
> you the option of displaying any window in either all RTL or all LTR.
> You can even have the same file in split-windows, one of them LTR (with
> English OK but Arabic or Hebrew wrong) and the other RTL (with Hebrew
> and/or Arabic OK, including Arabic joining forms if 'arabicshape' is on
> which is the default, but English wrong).

That is, in fact, what I regularly do.  I open a bilingual (English
and Hebrew) file, split the window, and have one be LTR and the other
RTL.  Then I use XeLaTeX to produce really nice output :)


Re: Combining diacritical marks display as separate character

Sven Siegmund
In reply to this post by Tony Mechelynck

Hello, thanks for the details,

On Thu, Mar 12, 2009 at 10:53 AM, Tony Mechelynck
<[hidden email]> wrote:
> Current versions of gvim can display (by default) two combining
> characters on any spacing character, which is usually enough for Arabic,

Yep, two combining marks are enough for me.

> Which exact version and patchlevel of gvim are you using? You might want
> to copy the first handful of lines from the output of ":version" (until
> the line with "Features included (+) or not (-)") -- see ":help :redir"
> about how to capture that kind of output. Also, when you type

VIM - Vi IMproved 7.2 (2008 Aug 9, compiled Aug  9 2008 18:46:22)
MS-Windows 32-bit GUI version with OLE support
Compiled by Bram@KIBAALE
Big version with GUI.

>        :echo has('multi_byte')
1

> Also, what is your _full_ 'guifont' setting? If it ends in cANSI, I
> think you're in trouble -- cDEFAULT is usually better IMHO.

"unicode encoding:
set enc=utf-8

"set gui font
set guifont=DejaVu_Sans_Mono:h11:cDEFAULT

set nocompatible
source $VIMRUNTIME/vimrc_example.vim
...
...
...

I explored the problem further. There is something wrong with how gvim
interprets the dead keys of the Windows keyboard layout. I could not type
"n" with a combining circumflex because I had tried to map the combining
circumflex to a dead key of my Windows keyboard layout. When I map the
combining circumflex to another key, it works and gets displayed
well in gvim.

I will explore the problems of remapping the dead keys of the Windows
keyboard layout later. So far I have not been able to find anything
about this issue with gvim on Windows by googling.

S.


Re: Combining diacritical marks display as separate character

krbeesley


On 12 Mar 2009, at 07:51, Sven Siegmund wrote:

[...]
> I explored the problem further. There is something wrong with gvim
> interpreting deadkeys of the Windows-Keyboard layout.
[...]


I'm using MacVim Snapshot 43, with DejaVu Sans Mono, and the handling  
of Unicode, including the rendering of letters with combining  
diacritical marks, is surprisingly good.

n + U+0302

displays perfectly for me, with a circumflex placed nicely above the  
'n'.   I sometimes work with orthographies for Native American  
languages, which sometimes require two combining diacritics on the  
same letter, and MacVim again does well.  This is one of the (several)  
reasons that I made the painful move from emacs to vim.

Ken

******************************
Kenneth R. Beesley, D.Phil.
P.O. Box 540475
North Salt Lake, UT
84054  USA


Re: Combining diacritical marks display as separate character

Tony Mechelynck
In reply to this post by Sven Siegmund

On 12/03/09 14:51, Sven Siegmund wrote:
> Hello, thanks for the details,

My pleasure.

Beware: I'm going to send this email in UTF-8 because of the text I'll
be typing into it.

>
> On Thu, Mar 12, 2009 at 10:53 AM, Tony Mechelynck
> <[hidden email]>  wrote:
[...]
>> Which exact version and patchlevel of gvim are you using? You might want
>> to copy the first handful of lines from the output of ":version" (until
>> the line with "Features included (+) or not (-)") -- see ":help :redir"
>> about how to capture that kind of output. Also, when you type
> VIM - Vi IMproved 7.2 (2008 Aug 9, compiled Aug  9 2008 18:46:22)
> MS-Windows 32-bit GUI version with OLE support
> Compiled by Bram@KIBAALE
> Big version with GUI.

This means 7.2.0. I would recommend that you install a more recent
bugfixed version, for instance (for Windows) one of Steve Hall's
distributions at
https://sourceforge.net/project/showfiles.php?group_id=43866&package_id=39721
-- click the clipboard-like icon next to a download link to see when
that build was compiled and what features are included.

I'm not saying that a more recent build will necessarily cure _this_
problem, but upgrading is always worth doing, since it might cure _other_
problems which you might be having. At
http://ftp.vim.org/pub/vim/patches/7.2/README you can see a text file
with a one-line description of every bugfix published so far for Vim 7.2
-- and whenever a new bugfix gets published, that README file is updated
at the same time.

>
>>         :echo has('multi_byte')
> 1

Good. Nonzero means "feature is present".

>
>> Also, what is your _full_ 'guifont' setting? If it ends in cANSI, I
>> think you're in trouble -- cDEFAULT is usually better IMHO.
> "unicode encoding:
> set enc=utf-8
>
> "set gui font
> set guifont=DejaVu_Sans_Mono:h11:cDEFAULT

This ought to be all right.

>
> set nocompatible
> source $VIMRUNTIME/vimrc_example.vim
> ...
> ...
> ...
>
> I explored the problem further. There is something wrong with gvim
> interpreting deadkeys of the Windows-Keyboard layout. I could not type
> "n" with combined circumflex because I tried to map the combining
> circumflex on a dead key of my windows keyboard layout. When I map the
> combining circumflex to another key it works and it gets displayed
> well in gvim.

Aha! To enter any Unicode codepoint by its Unicode codepoint number in
Vim, use the method described at |i_CTRL-V_digit|. Or if you frequently
use some particular codepoints, you might want to use a keymap -- either
a preexisting one if you find one that suits you, or else you can build
your own: it isn't very hard once you get the hang of it. The
"accents.vim" and "esperanto.vim" keymaps (in $VIMRUNTIME/keymap/) are
small examples showing how keymaps are built. The relevant help is at
|keymap-file-format|.
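The |i_CTRL-V_digit| notation mentioned above can be modelled in a few lines, which is handy for checking which code point a sequence such as u0302 produces. This is a hypothetical Python illustration based on the table in Vim's help (prefix letter, number base, maximum number of digits), not Vim's own implementation; `ctrl_v_char` and `CTRL_V_FORMS` are made-up names.

```python
# Forms of CTRL-V digit entry, modelled on the table in Vim's
# ":help i_CTRL-V_digit": prefix -> (number base, maximum digit count).
# An empty prefix means decimal.
CTRL_V_FORMS = {
    "": (10, 3),
    "o": (8, 3),
    "x": (16, 2),
    "u": (16, 4),
    "U": (16, 8),
}


def ctrl_v_char(typed: str) -> str:
    """Return the character that CTRL-V followed by `typed` would insert.

    Simplified model: everything after the prefix letter is taken as the
    digits, and Vim's 255 ceiling for the decimal, octal and "x" forms
    is not enforced.
    """
    first = typed[:1]
    if first in ("o", "O", "x", "X"):
        prefix = first.lower()  # these two prefixes are case-insensitive
    elif first in ("u", "U"):
        prefix = first  # "u" and "U" differ in their digit limit
    else:
        prefix = ""
    base, max_digits = CTRL_V_FORMS[prefix]
    digits = typed[len(prefix):]
    if not 1 <= len(digits) <= max_digits:
        raise ValueError(f"expected 1 to {max_digits} digits in {typed!r}")
    return chr(int(digits, base))  # int() rejects non-digits with ValueError


print(f"U+{ord(ctrl_v_char('u0302')):04X}")  # U+0302
```

In Vim itself, typing c and then CTRL-V u0302 therefore appends U+0302 after the c, which is the order Unicode requires for a combining character.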

-- Note that if you build your own keymap it should NOT go into
$VIMRUNTIME/keymap/ (where any upgrade may silently destroy it) but into
either $VIM/vimfiles/keymap/ (if you want to be able to access it from
any Windows login name) or $HOME/vimfiles/keymap/ (to restrict it to one
login name, since every "user" has a different $HOME directory). Create
the needed directory, and maybe its parent too, if they don't yet exist.

Of course Vim must see the keypress in order to act on it, and I suspect
that Windows dead keys are retained by Windows (and not given to Vim)
until you press something else (with which Windows, not Vim, will
combine the "dead key"). And since Unicode combining characters must
go _after_ the spacing character to which they apply, they are not
really "dead keys" in the usual typewriter meaning of the expression. On
my Belgian keyboard I hit "dead-circumflex" followed by c to get the
_precombined_ Esperanto consonant ĉ (U+0109 LATIN SMALL LETTER C WITH
CIRCUMFLEX), but in Vim I type c first and ^Vu0302 afterwards to get the
_composite_ sequence ĉ [i.e. c (U+0063 LATIN SMALL LETTER C) followed
by U+0302 COMBINING CIRCUMFLEX ACCENT]. SeaMonkey 2.0b1pre erroneously
does not overprint that sequence in the mail composition window -- I
don't know about your mailer.
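The difference between the precombined and composite forms is easy to inspect with Python's standard `unicodedata` module, used here purely as a checking tool (it plays no part in Vim's or Windows' input handling). The two strings look alike but compare unequal until one of them is normalized: NFC composes c + U+0302 into U+0109, and NFD splits U+0109 back into the pair.

```python
import unicodedata

precomposed = "\u0109"  # LATIN SMALL LETTER C WITH CIRCUMFLEX (one code point)
composite = "c\u0302"   # c + COMBINING CIRCUMFLEX ACCENT (two code points)

print(len(precomposed), len(composite))                        # 1 2
print(precomposed == composite)                                # False: different code points
print(unicodedata.normalize("NFC", composite) == precomposed)  # True: NFC composes the pair
print(unicodedata.normalize("NFD", precomposed) == composite)  # True: NFD decomposes it
```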

>
> I will explore the problems of remapping the dead keys of the windows
> keyboard layout later. So far I could not google anything about this
> issue in gvim in Windows.
>
> S.

As far as I know, everything, but  _everything_ about Vim behaviour is
in the help. (Obviously, the fine points of _Windows_ behaviour are not
in the _Vim_ help.) To find your precious needle (any needle) in the Vim
help^H^H^H^Hhaystack (which is admittedly a huge one), use the following
starting points (magnets, if you will ;-) since sewing needles are
usually made of steel):

        :help
        :help :help
        :help {subject}
                where {subject} means exactly open-brace, small-ess,
                small-you, small-bee, small-jay, small-eeh, small-cee,
                small-tee, close-brace. No fancy replacing (yet).
        :help :helpgrep

which will explain progressively more complex methods of finding your
way about the help.



Best regards,
Tony.
--
Mustgo, n.:
        Any item of food that has been sitting in the refrigerator so
long it has become a science project.
                -- Sniglets, "Rich Hall & Friends"


Re: Combining diacritical marks display as separate character

Tony Mechelynck
In reply to this post by Ron Aaron-2

On 12/03/09 11:56, Ron Aaron wrote:

> On Mar 12, 11:53 am, Tony Mechelynck<[hidden email]>
> wrote:
>> I don't have any problems with recent gvim versions (currently 7.2.141
>> but it already worked last week) and GTK2 2.14.4-8.6.2 on openSUSE 11.1.
> I use it on Windows and Linux, and it works well on both.
>
>> It can do Hebrew or Arabic but not with true bidi: what Vim does is give
>> you the option of displaying any window in either all RTL or all LTR.
>> You can even have the same file in split-windows, one of them LTR (with
>> English OK but Arabic or Hebrew wrong) and the other RTL (with Hebrew
>> and/or Arabic OK, including Arabic joining forms if 'arabicshape' is on
>> which is the default, but English wrong).
> That is, in fact, what I regularly do.  I open a bilingual (English
> and Hebrew) file, split the window, and have one be LTR and the other
> RTL.  Then I use XeLaTex to produce really nice output :)

What I use to produce really nice true-bidi output is my browser --
SeaMonkey 2.0b1pre, though Firefox 3 (3.0 or 3.1, I'm not sure) uses
exactly the same rendering engine. Any "good" browser ought to do well
for the kind of files I use, namely HTML and plain text, which is not to
say that all of them actually do.


Best regards,
Tony.
--
There was a plane crash over mid-ocean, and only three survivors were
left in the life-raft: the Pope, the President, and Mayor Daley.
Unfortunately, it was a one-man life-raft, and quickly sinking, so they
started debating who should be allowed to stay.

The Pope pointed out that he was the spiritual leader of millions all
over the world, the President explained that if he died then America
would be stuck with the Vice-President, and so forth.  Then Mayor Daley
said, "Look!  We're not solving anything like this!  The only fair
thing to do is to vote on it."  So they did, and Mayor Daley won by 97
votes.
