col() in lines with apostrophe characters

col() in lines with apostrophe characters

Gregor Uhlenheuer
Hello,
I think the col() function does not work properly for lines with
apostrophe characters (`´).

:echo col([line('.'), '$']) returns 6 on the line below:

´foo

It should return 5 I think.

Regards,
Gregor Uhlenheuer

--
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
Re: col() in lines with apostrophe characters

Ingo Karkat
On 14-Mar-2010 17:49, Gregor Uhlenheuer wrote:
> Hello,
> I think the col() function does not work properly for lines with
> apostrophe characters (`´).
>
> :echo col([line('.'), '$']) returns 6 on the line below:
>
> ´foo
>
> It should return 5 I think.

It depends on the encoding of the apostrophe character, as col() returns byte
indices, not logical characters. You assume that the apostrophe uses only one
byte, whereas it seems that it actually uses two bytes.

To investigate, check the 'encoding' and 'fileencoding' settings, and use the g8
command on the apostrophe character; it prints the (UTF-8) byte sequence used
to encode it. (This is a somewhat simplified explanation; it's quite a complex
matter.)

For many scripting uses, Vim's byte-orientation makes it difficult to properly
deal with multi-byte strings, but there are ways to cope. For example, to count
the number of characters, use
     let len = strlen(substitute(str, ".", "x", "g"))
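
As a rough illustration of the difference between bytes and characters, the sketch below measures the thread's example line both ways. It assumes 'encoding' is set to utf-8, so that the acute accent (U+00B4) is stored as two bytes; the variable name "sample" is made up for this example.

```vim
" Assumption: 'encoding' is utf-8, so U+00B4 (the acute accent) takes two bytes.
let sample = "\u00b4foo"

" Byte count: strlen() works on bytes, like col().
echo strlen(sample)

" Character count: replace every character with a one-byte "x" first.
echo strlen(substitute(sample, ".", "x", "g"))
```

Under that assumption the first :echo should print 5 and the second 4. col('$') on such a line is one more than the byte count, which is where the 6 in the original report comes from.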

-- regards, ingo

Re: col() in lines with apostrophe characters

Nikolay Aleksandrovich Pavlov
In reply to the message «Re: col() in lines with apostrophe characters»,
sent at 20:15:39 on Sunday, 14 March 2010,
by Ingo Karkat:

>      let len = strlen(substitute(str, ".", "x", "g"))
Never use this. It is three orders of magnitude slower than
    let len=len(split(str, '\zs'))

You can test this by launching
    vim -u NONE -c 'source test.vim'
My results are in the attached «strprofile» file.

Note that both methods count a base character together with its combining
diacritics (if any) as a single character.
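
A small sketch of the combining-character note, again assuming 'encoding' is utf-8. The string is an "e" followed by U+0301 (combining acute accent); the variable name "accented" is hypothetical.

```vim
" Assumption: 'encoding' is utf-8.
" One base character plus one combining diacritic: two code points.
let accented = "e\u0301"

" Split between characters and count the pieces.
echo len(split(accented, '\zs'))

" Substitution-based count from earlier in the thread.
echo strlen(substitute(accented, ".", "x", "g"))

" Byte count, for comparison.
echo strlen(accented)
```

Per the note above, both character counts should treat the pair as a single character, while strlen() reports the underlying bytes. Later Vim releases (7.3 and up) also provide strchars(), which was not available in the Vim 7.2 discussed in this thread.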

strprofile (670 bytes) Download Attachment
test.vim (274 bytes) Download Attachment
signature.asc (205 bytes) Download Attachment
Re: col() in lines with apostrophe characters

Gregor Uhlenheuer
In reply to this post by Ingo Karkat
On 14.03.2010 18:15, Ingo Karkat wrote:

> On 14-Mar-2010 17:49, Gregor Uhlenheuer wrote:
>> Hello,
>> I think the col() function does not work properly for lines with
>> apostrophe characters (`´).
>>
>> :echo col([line('.'), '$']) returns 6 on the line below:
>>
>> ´foo
>>
>> It should return 5 I think.
>
> It depends on the encoding of the apostrophe character, as col() returns
> byte indices, not logical characters. You assume that the apostrophe
> uses only one byte, whereas it seems that it actually uses two bytes.
>
> To investigate, check the 'encoding' and 'fileencoding' settings, and
> use the g8 command on the apostrophe character; it'll return the (UTF-8)
> byte sequence used to encode it. (This is a somewhat simplified
> explanation, it's a quite complex matter.)

Thanks for the input - I didn't know that.

> For many scripting uses, Vim's byte-orientation makes it difficult to
> properly deal with multi-byte strings, but there are ways to cope. For
> example, to count the number of characters, use
>     let len = strlen(substitute(str, ".", "x", "g"))

Since you nearly guessed my use of col() - I want to get the length of
the current line - thank you for the tip with the substitution. I
figured that using virtcol() is probably more appropriate for that
purpose.

Regards,
Gregor

Re: col() in lines with apostrophe characters

Ingo Karkat
On 15-Mar-2010 11:21, Gregor Uhlenheuer wrote:
> Since you nearly guessed my use of col() - I want to get the length of
> the current line - thank you for the tip with the substitution. I
> figured that using virtcol() is probably more appropriate for that
> purpose.

Well, call it psychic debugging power ;-)

To prevent anyone reading this from falling into another trap, I'd like to
stress that virtcol() and the substitution I suggested are *not equivalent*!

The substitution counts the number of characters (as in "ABC" = 3), whereas
virtcol() is concerned with the screen width that characters occupy (as in "A" =
one column, "<Tab>" = 1..8 columns, and "^V" (or other unprintable characters, or
any of the double-width Asian Kanji characters) = 2 columns).

In short: if you're concerned with indenting or fitting text into a given width,
use virtcol(); if you're interested in the number of characters, use the
substitution count; and use col() for the physical number of bytes used to
represent the string.
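
To make the three measures concrete, here is a minimal sketch that reports all of them for the line under the cursor. The function name LineMeasures() and the dictionary keys are hypothetical; the subtraction reflects that col('$') and virtcol('$') both point one position past the end of the line.

```vim
" Hypothetical helper: report three measures of the current line.
function! LineMeasures()
  let result = {}
  " Bytes used to store the line.
  let result.bytes = col('$') - 1
  " Characters, using the substitution count from this thread.
  let result.chars = strlen(substitute(getline('.'), '.', 'x', 'g'))
  " Screen columns occupied (tabs and double-width characters expand).
  let result.columns = virtcol('$') - 1
  return result
endfunction

echo LineMeasures()
```

On a plain ASCII line without tabs the three values should agree; they diverge as soon as the line contains multi-byte characters, tabs, or double-width characters.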

-- regards, ingo

Re: col() in lines with apostrophe characters

Ingo Karkat
In reply to this post by Nikolay Aleksandrovich Pavlov
On 15-Mar-2010 18:57, ZyX wrote:

> In reply to the message <<Re: col() in lines with apostrophe characters>>,
> sent at 20:15:39 on Sunday, 14 March 2010,
> by Ingo Karkat:
>
>>       let len = strlen(substitute(str, ".", "x", "g"))
> Never use this. It is three orders of magnitude slower than
>      let len=len(split(str, '\zs'))
>
> You can test this by launching
>      vim -u NONE -c 'source test.vim'
> My results are located in <<strprofile>> file.
>
> Note that both methods assume that one symbol is symbol+combining diacritics (if
> any).
Thank you for this valuable insight. I had taken my substitution straight from
Vim's :help strlen().

When reproducing your profiling on my (Linux x86, Vim 7.2.368) system, the
difference is one order of magnitude (0.84 vs 0.08), which is still substantial.

Your test processes a very long string once. When I changed this to processing a
short string many times over (test2.vim, attached; arguably a more realistic
scenario), the substitution eventually becomes faster than the split.
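
The attached test scripts are not reproduced in this thread, so the following is only a sketch of how such a comparison could be set up, not a reconstruction of test.vim or test2.vim. The helper name TimeExpr(), the sample string, and the iteration count are all assumptions, and reltime() needs a Vim built with the +reltime feature.

```vim
" Hypothetical timing helper; requires a Vim built with the +reltime feature.
function! TimeExpr(expr, times)
  let start = reltime()
  let i = 0
  while i < a:times
    call eval(a:expr)
    let i += 1
  endwhile
  " Elapsed time as a string, in seconds.
  return reltimestr(reltime(start))
endfunction

" Assumption: a short sample string, evaluated many times.
let g:sample = repeat('foo ', 5)

echo 'substitute: ' . TimeExpr('strlen(substitute(g:sample, ".", "x", "g"))', 10000)
echo 'split:      ' . TimeExpr('len(split(g:sample, ''\zs''))', 10000)
```

Swapping g:sample for one very long string and reducing the iteration count to 1 gives the other scenario discussed above; the timings will depend on the Vim version and machine.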

I had intended to ask Bram to update the documentation with your improved
algorithm. Given my inconclusive results, I'm no longer so sure.

-- regards, ingo


strprofile (882 bytes) Download Attachment
strprofile2 (1K) Download Attachment
test2.vim (822 bytes) Download Attachment