Real displayed width of a character

Jehan Pagès
Hi all,

I have a question about the "displayed width" (not the encoding length!) of a character. How does vim "decide" the width of a character, in terms of number of columns? Does it use a function like "wcwidth" (POSIX function)? Or some home-made equivalent?

The reason I ask is that some characters are single- or double-column depending on the font used. Moreover, as far as I could read, Unicode does not explicitly give a preferred width for characters, with the exception of some characters (mostly East Asian) which mostly live in a dedicated Unicode block (Halfwidth and Fullwidth Forms). This is explained in this Technical Report, for instance (the only paper from the Unicode Consortium I found that deals with character width as its main topic; elsewhere I could only find allusions or short notes, as though it were implicit):
http://unicode.org/reports/tr11/

An extract from this:
"
Except for a few characters, which are explicitly called out as fullwidth or halfwidth in the Unicode Standard, characters are not duplicated based on distinction in width. Some characters, such as the ideographs, are always wide; others are always narrow; and some can be narrow or wide, depending on the context. The Unicode character property East_Asian_Width provides a default classification of characters, which an implementation can use to decide at runtime whether to treat a character as narrow or wide.
"

Even though it is focused on East Asian characters, I found some other characters which have very different widths in different fonts. For instance, I found a few fonts in which '@' is twice as wide as "typical" Western characters (A-Z, 0-9, etc.). This is also true of the euro sign (€), and even of the Latin characters œ and æ (used in French, among other languages). I would even say this seems logical, since these characters are formed by combining two characters into one... So being double width seems normal to me, doesn't it?
Unfortunately, a function like wcwidth considers them to be "one column wide", and apparently so does the function vim uses (whether it is the same one or another). So I have to find a font that contains these characters at the same width as the rest (i.e. monospaced, or close to it). If I don't, vim "cuts" the characters.
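For reference, the default classification from UAX #11 quoted above can be queried directly. A minimal sketch using Python's standard unicodedata module; the function name uax11_default is hypothetical, this is not Vim's code, and combining marks and control characters (which wcwidth-style functions treat separately) are ignored:

```python
import unicodedata


def uax11_default(ch: str) -> str:
    """Return 'narrow', 'wide' or 'ambiguous' for one character.

    Based on the East_Asian_Width property described in UAX #11:
      F (fullwidth), W (wide)                  -> wide
      H (halfwidth), Na (narrow), N (neutral)  -> narrow
      A (ambiguous)                            -> decided by context
    """
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("F", "W"):
        return "wide"
    if eaw == "A":
        return "ambiguous"
    return "narrow"


if __name__ == "__main__":
    # Show the raw property and the derived class for the characters
    # discussed in this post, plus a few East Asian examples.
    for ch in "A@€œæ中\uff21\uff71\u201c":
        print(f"U+{ord(ch):04X} {ch!r}: "
              f"{unicodedata.east_asian_width(ch)} -> {uax11_default(ch)}")
```

Whatever this prints reflects only Unicode's default classification; it cannot tell how wide a particular font actually draws '@' or '€'.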

Do you have any idea about this? Couldn't vim be improved so that it takes the font actually in use into account? This seems complicated, as the font is defined in the terminal emulator, not in vim itself. And I have not yet found whether any terminal protocol (VT100 or other) makes it possible to advertise the font in use. But what if there were an option in vim where the user could explicitly say "I am using this font"? Then, when vim displays characters and asks the terminal to "jump" to this or that column, it could calculate the right place to go without cutting text.
Thanks.

Jehan

--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_multibyte" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---

Re: Real displayed width of a character

Tony Mechelynck

On 24/10/08 16:22, Jehan Pagès wrote:

> Hi all,
>
> I have a question about "displayed width" (and not encoding length!) of
> a character. How does vim "decide" the width of a character, in term of
> number of columns? . . .
>
> Jehan

Fullwidth characters always occupy two screen columns. When a fullwidth
character would otherwise start in the last screen column, an empty
column is sometimes added there instead.

Halfwidth characters always occupy one screen column, except the hard
tab (U+0009 HORIZONTAL TAB), which occupies one or more columns depending
on 'tabstop', 'list' and 'listchars'. Strictly speaking, the tab is a
"control character" anyway.

Ambiguous-width characters are treated as fullwidth or halfwidth
depending on the setting of the global 'ambiwidth' option ("double" or
"single" respectively).
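A sketch of a string-width function that honours a setting like 'ambiwidth'. The name display_width is hypothetical; it accepts the two values of Vim's option, "single" and "double", and it ignores tabs and combining marks:

```python
import unicodedata


def display_width(text: str, ambiwidth: str = "single") -> int:
    """Total screen columns for `text`, ignoring tabs and combining marks."""
    if ambiwidth not in ("single", "double"):
        raise ValueError("ambiwidth must be 'single' or 'double'")
    total = 0
    for ch in text:
        eaw = unicodedata.east_asian_width(ch)
        if eaw in ("W", "F"):
            total += 2  # always wide
        elif eaw == "A":
            # Only ambiguous-width characters depend on the setting.
            total += 2 if ambiwidth == "double" else 1
        else:
            total += 1  # H, Na, N
    return total
```

Only characters whose East_Asian_Width is A change with the setting; wide and fullwidth characters take two columns either way.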

See:
        :help 'ambiwidth'
        :help 'tabstop'
        :help 'list'
        :help 'listchars'
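The tab rule can be sketched like this, assuming 0-based screen columns and one uniform 'tabstop'. tab_cells and expand_line are hypothetical names; the ^I case reflects how Vim shows a tab when 'list' is set and 'listchars' has no "tab:" item:

```python
def tab_cells(col: int, tabstop: int = 8, list_mode: bool = False,
              listchars_has_tab: bool = False) -> int:
    """Screen columns taken by a hard tab starting at 0-based column `col`.

    Normally the tab pads out to the next multiple of `tabstop`. With
    'list' on and no "tab:" item in 'listchars', it is shown as ^I,
    which takes two columns.
    """
    if tabstop < 1:
        raise ValueError("tabstop must be at least 1")
    if list_mode and not listchars_has_tab:
        return 2  # the two characters "^I"
    return tabstop - (col % tabstop)


def expand_line(line: str, tabstop: int = 8) -> str:
    """Replace each tab with spaces up to the next tab stop ('list' off)."""
    out = []
    col = 0
    for ch in line:
        if ch == "\t":
            n = tab_cells(col, tabstop)
            out.append(" " * n)
            col += n
        else:
            out.append(ch)
            col += 1  # assumes one-column characters only
    return "".join(out)
```

Without 'list', expand_line() gives the same result as Python's str.expandtabs(tabstop) for single-line ASCII text.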


Note also that proportional fonts (fonts where m is much wider than i or
l, not to mention Arabic final sad vs. isolated alif) look ugly in GTK2
versions of gvim, and cannot be used at all in other gvim versions or in
console Vim.


Best regards,
Tony.
--
Although we modern persons tend to take our electric lights, radios,
mixers, etc., for granted, hundreds of years ago people did not have
any of these things, which is just as well because there was no place
to plug them in.  Then along came the first Electrical Pioneer,
Benjamin Franklin, who flew a kite in a lightning storm and received a
serious electrical shock.  This proved that lightning was powered by the
same force as carpets, but it also damaged Franklin's brain so severely
that he started speaking only in incomprehensible maxims, such as "A
penny saved is a penny earned."  Eventually he had to be given a job
running the post office.
                -- Dave Barry, "What is Electricity?"

Re: Real displayed width of a character

Mansing
Wow!  For ages, I knew not to ask this question.  Now, with

    :set ambiwidth=double

my Chinese /open/ quotation mark (“ U+201C) is displayed correctly, without colliding with the next character. Strangely, the /close/ quotation mark (” U+201D) has always been displayed well regardless of the ambiwidth setting. Why is that?
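Both quotation marks have the same East_Asian_Width value, A (ambiguous), in Unicode's data, so 'ambiwidth' should treat them identically. A sketch that lists the characters in a line whose width depends on that setting (ambiguous_chars is a hypothetical helper, not a Vim function):

```python
import unicodedata
from typing import List, Tuple


def ambiguous_chars(text: str) -> List[Tuple[int, str, str]]:
    """(index, character, code point) for every East_Asian_Width 'A' char."""
    return [
        (i, ch, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.east_asian_width(ch) == "A"
    ]


if __name__ == "__main__":
    print(ambiguous_chars("\u201cquoted\u201d"))
```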

mt 081025


Tony Mechelynck wrote:
> On 24/10/08 16:22, Jehan Pagès wrote:
>> Hi all,
>>
>> I have a question about "displayed width" (and not encoding length!) of
>> a character. How does vim "decide" the width of a character, in term of
>> number of columns? . . .
>>
>> Jehan
> . . .
>
> Ambiguous-width characters are treated as fullwidth or halfwidth
> depending on the setting of the global 'ambiwidth' option.
>
> . . .
> Tony.

Re: Real displayed width of a character

Tony Mechelynck

On 25/10/08 01:40, Mansing wrote:

> Wow!  For ages, I knew not to ask this question.  Now with
>
>     :set ambiwidth=double
>
> my Chinese /open/ quotation mark ( “ code=0x201c ) is displayed
> correctly --without colliding with the next character.  Strange that,
> the /close/ quotation mark ( ” code=0x201d ) has always been displayed
> well regardless of the ambiwidth setting?!
>
> mt 081025

Hm. Here, in Bitstream Vera Sans Mono, these characters are displayed
with the same (narrow) glyph as a plain double quote. In FZFangSong,
however, U+201C is a "66" quote occupying the right half of its wide
glyph, while U+201D is a "99" quote in the left half of _its_ wide glyph
(well, maybe I should say the top-right and top-left quarters). So with
ambiwidth=single, U+201C is overprinted on the next character, while for
U+201D it is only the blank right half that is overprinted on _its_
follower.

Best regards,
Tony.
--
Meader's Law:
        Whatever happens to you, it will previously have happened to
everyone you know, only more so.
