New bug with Searchpair() and utf-8 in Vim6 and Vim7

New bug with Searchpair() and utf-8 in Vim6 and Vim7

John Wellesz
Hello,

There is a new bug with searchpair() and utf-8. It was reported to me by
Tim S., who is using my PHP indent script.
The bug seems to happen only under Windows, but with all Vim versions,
including Vim 7.0aa.

The simplest way to reproduce the bug:

- Use my indent script at http://www.2072productions.com/vim/indent/php.vim
- Open gvim
- Set encoding to utf-8 ( :set enc=utf-8 )
- Open a new PHP file ( :e foo.php )
- Paste the following lines into it:

<?php
function xyz() {
        /*
         * Hello
     */
}
?>

Then, in normal mode, put your cursor on the line containing the "*/" and
indent the line ( == ). You'll notice that the line is correctly indented.

Now, in insert mode, put your cursor just after the "/" of "*/" and press
Return: the "*/" will be wrongly moved to the left margin, because the
searchpair() call at line 703 of my php.vim returns 0 instead of the line
number of the "/*".

You can easily check this by adding the following lines just after line
703:

                echo 'Searchpair returned: ' . lnum
                call getchar()

(Line 703 should be:

let lnum = searchpair('/\*', '', '\*/\zs', s:searchpairflags) " find the most outside /*

)

When we press ( == ) in normal mode, it works.
When the indent function is called by a key defined in 'indentkeys' while we
are in insert mode, the bug occurs.


John


PS: I will be away from the net from next Monday till September the 1st.


Re: New bug with Searchpair() and utf-8 in Vim6 and Vim7

Bram Moolenaar

John Wellesz wrote:

> There is a new bug with Searchpair() and utf-8, this was reported to me by
> Tim S. who is using my PHP indent script.
> The bug seems only to happen under Windows but with all Vim versions
> including Vim 7.0aa:
>
> The simplest way to reproduce the bug is to:
>
> - Use my indent script at http://www.2072productions.com/vim/indent/php.vim
> - Open a Gvim
> - To set encoding to utf-8 ( :set enc=utf-8 )
> - To open a new php file ( :e foo.php )
> - To paste the following lines into it:
>
> <?php
> function xyz() {
> /*
> * Hello
>      */
> }
> ?>
>
> Then in command mode put your cursor on the line containing the "*/" then
> indent the line ( == ), you'll notice that the line is correctly indented.
>
> Now in insert mode put your cursor just after the "/" of "*/" and press
> return --> the "*/" will be wrongly moved to the left margin because the
> function searchpair() at line 703 of my php.vim will return 0 instead of the
> "/*" line number.
>
> You can easily notice that by adding the following lines just after the line
> 703:
>
> echo 'Searchpair returned: ' . lnum
> call getchar()
>
> (The line 703 should be:
>
> let lnum = searchpair('/\*', '', '\*/\zs', s:searchpairflags) " find the most outside /*
>
> )
>
> When we press ( == ) in command mode it works.
> When the indent function is called by a key defined in indentkeys while we
> are in insert mode the bug occurs.

It sounds like the cursor is after the "*/" and thus skips over it with
searchpair().  To find the matching "/*" the cursor must be on the "*/".
I didn't actually try it, too complicated...

--
A fine is a tax for doing wrong.  A tax is a fine for doing well.

 /// Bram Moolenaar -- [hidden email] -- http://www.Moolenaar.net   \\\
///        Sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\              Project leader for A-A-P -- http://www.A-A-P.org        ///
 \\\     Buy LOTR 3 and help AIDS victims -- http://ICCF.nl/lotr.html   ///

RE: New bug with Searchpair() and utf-8 in Vim6 and Vim7

John Wellesz
> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]]
> Sent: Sunday, 14 August 2005 14:24
> To: John Wellesz
> Cc: [hidden email]
> Subject: Re: New bug with Searchpair() and utf-8 in Vim6 and Vim7
>
> It sounds like the cursor is after the "*/" and thus skips over it with
> searchpair().  To find the matching "/*" the cursor must be on the "*/".

Hmmm... if 'encoding' is not set to utf-8 it works correctly, whereas it
shouldn't: as you said, the cursor is positioned after the '*/' by a call
to search('\*/\zs', 'W') just before the call to searchpair(). (I don't
remember why I did that...)

Now I make sure the cursor is on the "*" of "*/" and it works with utf-8.

Sorry for this false alarm.

By the way, there is a part of the help on searchpair() I've never clearly
understood:

                When searching backwards and {end} is more than one character,
                it may be useful to put "\zs" at the end of the pattern, so
                that when the cursor is inside a match with the end it finds
                the matching start.


I don't understand in which case the "\zs" is needed. I use it in my script
in calls like:

searchpair('/\*', '', '\*/\zs', 'bWr')

since the {end} is more than one character...

Before calling searchpair() I now always make sure the cursor is on the '*'
of the '*/' (and not after it, as it was until today).

I've removed the '\zs' from all those searchpair() calls and everything works
fine; even the other bug with searchpair() doesn't happen anymore!
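
A minimal sketch of that approach, assuming the current line contains the
closing "*/". The function name CommentStartLnum() is hypothetical and is
not taken from php.vim. The flags 'bnW' are also an assumption, chosen here
so that the search leaves the cursor alone. getpos() and setpos() need
Vim 7.

```vim
" Hypothetical helper (not from php.vim): return the line number of the
" "/*" that matches the "*/" on the current line, or 0 if none is found.
function! CommentStartLnum() abort
  " Byte index (0-based) of the first "*/" on the current line, -1 if absent.
  let l:idx = match(getline('.'), '\*/')
  if l:idx < 0
    return 0
  endif
  let l:save = getpos('.')
  " Put the cursor on the '*' of "*/"; cursor() takes a 1-based column.
  call cursor(line('.'), l:idx + 1)
  " b: search backwards, n: do not move the cursor, W: do not wrap around.
  let l:lnum = searchpair('/\*', '', '\*/', 'bnW')
  call setpos('.', l:save)
  return l:lnum
endfunction
```

With the seven-line PHP sample from this thread in the buffer and the cursor
anywhere on the "*/" line, :echo CommentStartLnum() should report the line
holding the "/*".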

John






RE: New bug with Searchpair() and utf-8 in Vim6 and Vim7

Bram Moolenaar

John Wellesz wrote:

> > It sounds like the cursor is after the "*/" and thus skips over it with
> > searchpair().  To find the matching "/*" the cursor must be on the "*/".
>
> Hmmm... if "encoding" is not set to utf-8 it works correctly whereas it
> shouldn't since as you said the cursor is positioned after the '*/' by a
> call to search('\*/\zs', 'W') just before the call to searchpair(). (I don't
> remember why I did that...)

When using the "\zs" it should work when the cursor is just after the
"*/", which is where the "\zs" would leave the cursor when searching for
that pattern.

I can't reproduce a difference between 'encoding' at "latin1" and
"utf-8".  I also can't imagine there would be a difference for
MS-Windows.  Perhaps something else gets in the way?


RE: New bug with Searchpair() and utf-8 in Vim6 and Vim7

John Wellesz
> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]]
> Sent: Sunday, 14 August 2005 22:57
> To: John Wellesz
> Cc: [hidden email]
> Subject: RE: New bug with Searchpair() and utf-8 in Vim6 and Vim7
>
 
> When using the "\zs" it should work when the cursor is just after the
> "*/", which is where the "\zs" would leave the cursor when searching for
> that pattern.

So it should have worked. Is "\zs" really necessary in that case? I still
don't understand why it's needed when the {end} pattern is more than one
character.

> I can't reproduce a difference between 'encoding' at "latin1" and
> "utf-8".  I also can't imagine there would be a difference for
> MS-Windows.  Perhaps something else gets in the way?

I cannot test on another OS. That it only happens under Windows is what
Tim S. told me when he reported the problem, and I only have Windows (I
know, don't hit me).

I just know that when 'encoding' is not set to utf-8, searchpair() works as
expected, and that the problem only occurs in insert mode: if the indent
function is called on the very same line in normal mode, it indents
correctly.

My script does nothing special when using utf-8 (except in vim <= 603, where
it changes the flags used by searchpair() to avoid the first bug), but the
problem we are talking about also occurs in Vim 7.0aa...

So the only difference is the use of utf-8. Maybe it's the result of an
overflow caused by utf-8 that corrupts something else in memory :-( (I
hate this kind of bug).

If you are using Vim 7 it should be easier to reproduce the problem, since
you don't have to install my script:

Just set 'encoding' to utf-8, open a new empty PHP file and type the
following lines:

<?php
function xyz() {
        /*
         * Hello
      */
}
?>

Then, in insert mode, place the cursor after the "*/" and press Enter (you
can also put the cursor anywhere on the line and type CTRL-F, which is
defined in 'indentkeys'). The indent function will be called and the line
will be moved to the left margin...
(The job is done at line 703 of my original script, the one with the
changelog; in the version I gave you it's at line 516.)


Anyway, since I changed the '\*/\zs' pattern to '\*/' the problem is gone, so
it's not a big issue anymore. I'll send you the new version of my script
in September (I want to make some other improvements).

John



RE: New bug with Searchpair() and utf-8 in Vim6 and Vim7

Bram Moolenaar

John Wellesz wrote:

> > When using the "\zs" it should work when the cursor is just after the
> > "*/", which is where the "\zs" would leave the cursor when searching for
> > that pattern.
>
> So it should have worked. Is "\zs" really necessary in that case? I still
> don't understand why it's needed when the {end} pattern is more than one
> character.

The basic idea is that Vim starts at the cursor position, searching
backwards for matches with the start and end patterns.  There is a
counter that starts at one.  When matching an end pattern the counter is
incremented, when a start pattern is matched it is decremented.  When
the counter is zero the matching start item has been located.
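
The counter can be modelled in a few lines of Vim script. This is only a
sketch of the idea as described here, not Vim's actual implementation. The
function name MatchingStart() and the list-of-strings input are
assumptions made for illustration. Lists and :for need Vim 7.

```vim
" Hypothetical model: {items} lists the matches met while scanning
" backwards from the cursor, nearest first, each being 'start' or 'end'.
" Returns the index of the matching start item, or -1 if there is none.
function! MatchingStart(items) abort
  let l:depth = 1
  let l:i = 0
  for l:item in a:items
    if l:item ==# 'end'
      let l:depth += 1
    elseif l:item ==# 'start'
      let l:depth -= 1
    endif
    if l:depth == 0
      return l:i
    endif
    let l:i += 1
  endfor
  return -1
endfunction

" A start seen first is the match: echoes 0.
echo MatchingStart(['start'])
" An end seen first must be closed by its own start: echoes -1.
echo MatchingStart(['end', 'start'])
" One nested pair, then the real start: echoes 2.
echo MatchingStart(['end', 'start', 'start'])
```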

Consider this text, with three possible start positions:

         /* comment */
               A     B   C

If you start after the end pattern at C, the */ is found first and the
counter is incremented to two.  When encountering the /* the counter is
decremented to one, so searching continues without finding a match.

If you start at A the /* is found right away, counter decremented to
zero, it's a match.

When you start at B it depends on whether you find the */ or not.
That's where it matters what exactly is the match position of the
pattern.  Normally it's the start of the pattern, thus the * of the */.
If you use the \zs at the end of the pattern, the match position becomes
the character just after the */.  Thus when starting at B the */ match
will be after the cursor and won't be counted.
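
The start positions can be tried in a scratch buffer with a sketch like the
one below. The helper name TryFrom() and the trailing " x" (added so that
the cursor can sit at C in normal mode) are assumptions for illustration.
The flags 'bnW' search backwards without moving the cursor or wrapping.

```vim
" Scratch buffer holding the one-line example plus trailing text for C.
new
setlocal buftype=nofile
call setline(1, '/* comment */ x')

" Hypothetical helper: put the cursor at column {col} of line 1 and return
" what a backward searchpair() reports for the given {end} pattern.
function! TryFrom(col, endpat) abort
  call cursor(1, a:col)
  return searchpair('/\*', '', a:endpat, 'bnW')
endfunction

" A: inside the comment, on the 'm' of "comment".
echo 'A, no \zs:          ' . TryFrom(6, '\*/')
" B: on the '*' of "*/", then on the '/' of "*/".
echo 'B on *, no \zs:     ' . TryFrom(12, '\*/')
echo 'B on /, no \zs:     ' . TryFrom(13, '\*/')
echo 'B on /, with \zs:   ' . TryFrom(13, '\*/\zs')
" C: after the "*/", on the 'x'.
echo 'C, no \zs:          ' . TryFrom(15, '\*/')
echo 'C, with \zs:        ' . TryFrom(15, '\*/\zs')
```

If the explanation holds, the calls should report 1, 1, 0, 1, 0 and 0 in
that order: only the start at B on the "/" changes when "\zs" is added.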

> If you are using VIM 7 it should be easier to reproduce the problem since
> you don't have to install my script:
>
> Just set encoding to utf-8, open a new empty php file and type the following
> lines:
>
> <?php
> function xyz() {
> /*
> * Hello
>       */
> }
> ?>
>
> Then, in insert mode, place the cursor after the "*/" and press enter (you
> can also put the cursor anywhere on the line and type CTRL-F, it's defined
> in 'indentkeys'), the indent function will be called and the line will be
> indented to the left margin...
>  (the job is done at line 703 of my original script, the one with the
> changelog, in the version I gave you it's at line 516)

OK, I can see the problem now.  Adding a few lines to see what is
happening around line 516 does indicate that 'encoding' changes what
happens.  I'll investigate further.
