regex @vim, negating a group

regex @vim, negating a group

o1792
Hi vimmers,

When searching through text files using regex, I am trying, without
success, to negate a complicated pattern.

What doesn't help is the double usage of the
circumflex ^ character
(may also be called caret, not sure), because it also
means start of a
line.

However, when used as the first character within square brackets, it
negates the set that follows it, i.e. [^p] finds anything that is not p,
and [^aeiou] finds anything that is not a lower-case vowel. That's all
well and good, but when you have a sequence (usually grouped) that you're
trying to negate, the simple, obvious approaches have not worked.

If you want to find anything that is not a word ending in "ion", the
regex group you're looking at is \(\<.\+ion\>\), but how do you negate
that? Put it all in square brackets with a caret ^ at the beginning?
Nope. In fact, a group within square brackets doesn't work as might be
expected. The negation pretty much seems to be built for single-character
negation only, not sequences.

I'm only referring to searching here; when it comes to substituting or
deleting, :v/etc/d seems tailor-made to help with negations of tricky
regex.

Has anybody else had this type of problem? I have read the help files
and was surprised to see nothing on this. The caret ^ is mostly used in
its second sense, that of marking the start of a line.

Any help appreciated.


               

Re: regex @vim, negating a group

Charles E Campbell Jr
o1792 wrote:

>Hi vimmers,
>
>When searching through text files using regex, I am trying, without
>success, to negate a complicated pattern.
>  
>
...snip...

May I suggest that you look into LogiPat,

  http://vim.sourceforge.net/scripts/script.php?script_id=1290

which is useful for searching with boolean-logic patterns.  In particular,
with it you can try

   :echo LogiPat('"!ion"')

The result is a regex pattern for matching any lines that don't contain
"ion".
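For readers who want to try the "line does not contain X" idea outside
Vim, here is a minimal sketch in Python's re module. It is an
illustration only: the syntax is Python's, not Vim's, it is not the
literal output of LogiPat, and the names NO_ION_LINE and
lines_without_ion are made up for this example. In Vim syntax the same
idea would be written roughly as ^\%(\%(ion\)\@!.\)*$ .

```python
import re

# Each repetition consumes one character, but only at a position where
# the substring "ion" does NOT start. If "ion" occurs anywhere in the
# line, the repetition gets stuck before it and never reaches $.
NO_ION_LINE = re.compile(r'^(?:(?!ion).)*$')

def lines_without_ion(text):
    """Return the lines of text that do not contain 'ion' anywhere."""
    return [line for line in text.splitlines() if NO_ION_LINE.match(line)]

sample = "a nation\nplain text\nions here\nnothing"
print(lines_without_ion(sample))
```

With the sample above, only "plain text" and "nothing" are kept.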

Regards,
Chip Campbell


Re: regex @vim, negating a group

Gerald Lai-2
In reply to this post by o1792
On Tue, 2 May 2006, o1792 wrote:

[snip]

> If you want to find anything that is not a word ending in "ion", the
> regex group you're looking at is \(\<.\+ion\>\), but how do you negate
> that? Put it all in square brackets with a caret ^ at the beginning?
> Nope. In fact, a group within square brackets doesn't work as might be
> expected. The negation pretty much seems to be built for single-character
> negation only, not sequences.
>
> I'm only referring to searching here; when it comes to substituting or
> deleting, :v/etc/d seems tailor-made to help with negations of tricky
> regex.

The regex format for a negative search is

   /\%(<search>\zs\)*

where the $ anchor, if needed, is placed after "\zs".

For example, if your search is

   /^start.*foo.*bar.*end$

to negate that, do

   /\%(^start.*foo.*bar.*end\zs$\)*

In your case of anything that is not a word ending with "ion", you'd want

   /\%(\<\w\+ion\>\zs\)*

Can't use "."; use "\w" instead. See ":help /\w" and also ":help /\zs".

HTH :)
--
Gerald

Re: regex @vim, negating a group

iler.ml
In reply to this post by o1792
On 5/2/06, o1792 <[hidden email]> wrote:

> Hi vimmers,
>
> When searching through text files using regex, I am trying, without
> success, to negate a complicated pattern.
>
> What doesn't help is the double usage of the
> circumflex ^ character
> (may also be called caret, not sure), because it also
> means start of a
> line.
>
> However, when used as the first character within square brackets, it
> negates the set that follows it, i.e. [^p] finds anything that is not p,
> and [^aeiou] finds anything that is not a lower-case vowel. That's all
> well and good, but when you have a sequence (usually grouped) that you're
> trying to negate, the simple, obvious approaches have not worked.
>
> If you want to find anything that is not a word ending in "ion", the
> regex group you're looking at is \(\<.\+ion\>\), but how do you negate
> that? Put it all in square brackets with a caret ^ at the beginning?
> Nope. In fact, a group within square brackets doesn't work as might be
> expected. The negation pretty much seems to be built for single-character
> negation only, not sequences.
>
> I'm only referring to searching here; when it comes to substituting or
> deleting, :v/etc/d seems tailor-made to help with negations of tricky
> regex.

Pattern
   /\i\+\(ion\)\@<!\>/
matches words that do not end with 'ion'

BTW, can anyone explain why this pattern does *not*
work, does not match words that do not end with 'ion' :
    /\i\+\(ion\)\@!/
I thought this pattern would match words not ending with
'ion'. But it matches all words, including words ending
with 'ion'. Why ?

Yakov

Re: regex @vim, negating a group

James Vega-3
On Tue, May 02, 2006 at 08:27:49PM +0300, Yakov Lerner wrote:
> On 5/2/06, o1792 <[hidden email]> wrote:
> BTW, can anyone explain why this pattern does *not*
> work, does not match words that do not end with 'ion' :
>    /\i\+\(ion\)\@!/
> I thought this pattern would match words not ending with
> 'ion'. But it matches all words, including words ending
> with 'ion'. Why ?

That pattern will match as long as you don't force it to leave 3
characters after the negation.  Given the word description:

   descript.ion <-- won't match because 'ion' matches AT that point, so \(ion\)\@! fails
   description. <-- works just fine because there's no 'ion' at the
                    current position

   /\i\+\(ion\)\@!\i\{3}\>

The above pattern will do what you wanted since it forces there to be 3
more characters and the end of word when you try to match 'ion'.
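A quick way to check this behaviour outside Vim is a Python rendering of
the same pattern. This is only a sketch under stated assumptions: \w
stands in for Vim's \i, \b for \>, and the names PATTERN and matches are
invented for the example.

```python
import re

# Python rendering of /\i\+\(ion\)\@!\i\{3}\>
# \w+      at least one word character
# (?!ion)  the next three characters must not be "ion"
# \w{3}\b  exactly three more word characters, then the end of the word
PATTERN = re.compile(r'\w+(?!ion)\w{3}\b')

def matches(text):
    """Return every substring of text matched by PATTERN."""
    return [m.group(0) for m in PATTERN.finditer(text)]

print(matches('description descriptions'))
```

Here "description" is rejected, because the only place where three
characters remain before the end of the word is exactly where "ion"
starts, while "descriptions" is accepted.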

James
--
GPG Key: 1024D/61326D40 2003-09-02 James Vega <[hidden email]>


Re: regex @vim, negating a group

Matthew Winn
On Tue, May 02, 2006 at 02:03:47PM -0400, James Vega wrote:

> On Tue, May 02, 2006 at 08:27:49PM +0300, Yakov Lerner wrote:
> > On 5/2/06, o1792 <[hidden email]> wrote:
> > BTW, can anyone explain why this pattern does *not*
> > work, does not match words that do not end with 'ion' :
> >    /\i\+\(ion\)\@!/
> > I thought this pattern would match words not ending with
> > 'ion'. But it matches all words, including words ending
> > with 'ion'. Why ?
>
> That pattern will match as long as you don't force it to leave 3
> characters after the negation.  Given the word description:
>
>    descript.ion <-- won't match because 'ion' matches AT that point, so \(ion\)\@! fails
>    description. <-- works just fine because there's no 'ion' at the
>                     current position
>
>    /\i\+\(ion\)\@!\i\{3}\>
>
> The above pattern will do what you wanted since it forces there to be 3
> more characters and the end of word when you try to match 'ion'.

That won't match words of fewer than four characters.  To match all
words that don't end in "ion" it's better to do:

     /\<\(\i*\(ion\)\@!\i\{3}\|\i\i\=\)\>
      ^^^^  ^                ^^^^^^^^^^

(The leading \< is required to prevent the pattern matching the "on" at
the end of words like "negation".  The alternate part at the end catches
all words of one or two letters.)
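The effect of the extra alternative can be checked with a Python
rendering of this pattern. Again a sketch, not Vim syntax: \w replaces
\i, \b replaces both \< and \>, and the names WORDS_NOT_ION and
find_words are hypothetical.

```python
import re

# Python rendering of /\<\(\i*\(ion\)\@!\i\{3}\|\i\i\=\)\>
# First alternative: any word of three or more characters whose last
# three characters are not "ion".
# Second alternative: any word of one or two characters.
WORDS_NOT_ION = re.compile(r'\b(?:\w*(?!ion)\w{3}|\w\w?)\b')

def find_words(text):
    """Return the words of text that do not end in 'ion'."""
    return WORDS_NOT_ION.findall(text)

print(find_words('a negation is an odd cat ion'))
```

For this input the short words "a", "is" and "an" are found along with
"odd" and "cat", while "negation" and "ion" are skipped.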

--
Matthew Winn ([hidden email])

Re: regex @vim, negating a group

Daniel Einspanjer
On Tue, May 02, 2006 at 08:27:49PM +0300, Yakov Lerner wrote:
> BTW, can anyone explain why this pattern does *not*
> work, does not match words that do not end with 'ion' :
>    /\i\+\(ion\)\@!/
> I thought this pattern would match words not ending with
> 'ion'. But it matches all words, including words ending
> with 'ion'. Why ?

The docs in :help /\@! answer this query pretty well.  The issue is that
there are many places where a pattern doesn't match.
Take the word zion.
\i\+ can match z, zi, zio or zion.  It is greedy, so it will first
attempt to match zion.
Now the \(ion\)\@! is applied.  The current match position is just before
the EOL, and EOL != ion, so the entire pattern matches.
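The zion walk-through can be reproduced with Python's re module, whose
negative lookahead (?!...) plays the role of Vim's \@! here. This is a
sketch with \w standing in for \i; NAIVE and first_match are names made
up for the illustration.

```python
import re

# Python rendering of /\i\+\(ion\)\@!/
# The greedy \w+ swallows the whole word first; the lookahead is then
# tested AFTER the word, where nothing resembling "ion" follows.
NAIVE = re.compile(r'\w+(?!ion)')

def first_match(text):
    """Return (matched text, end offset) of the first match, or None."""
    m = NAIVE.search(text)
    return (m.group(0), m.end()) if m else None

print(first_match('zion'))
```

The match is the full word "zion" ending at offset 4, so the lookahead
never gets a chance to see the "ion" inside the word.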

On Tue, May 02, 2006 at 02:03:47PM -0400, James Vega wrote:
>    /\i\+\(ion\)\@!\i\{3}\>
This one is trying to get past the match position limitation in the wrong
way.  Any time you try to describe what should be in the place of a
zero-width match, you must be extremely careful that you are only dealing
with terms that are important to you.  In this case, the \i\{3\} isn't
important to you and it causes problems.  This isn't a hard-and-fast rule,
but it can be a useful guideline.


"Matthew Winn" <[hidden email]> wrote in message
news:[hidden email]...
> That won't match words of fewer than four characters.  To match all
> words that don't end in "ion" it's better to do:
>
>     /\<\(\i*\(ion\)\@!\i\{3}\|\i\i\=\)\>
>      ^^^^  ^                ^^^^^^^^^^
This one attempts to correct the problems of the \i\{3\} by adding even more
inspection of tokens that aren't important to you. This is an example that
proves why it is bad to try to dictate what exists in the place of a zero
width match. :)

On Tue, May 02, 2006 at 08:27:49PM +0300, Yakov Lerner wrote:
> Pattern
>    /\i\+\(ion\)\@<!\>/
> matches words that do not end with 'ion'

This one is the best way of solving the presented problem (words not ending
in 'ion'). To break it down:
    1. Match as many identifier chars as possible ( \i\+ )
    2. Make sure the last three characters behind the current match point
       are not 'ion' ( \(ion\)\@<! )
    3. Make sure the current match point is a word boundary ( \> )
This regex will consume the whole word, then back up on any word that
does end in ion, and fail to match because of the \> requirement.
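The three steps above can be tried in Python, whose negative lookbehind
(?<!...) corresponds to Vim's \@<! for a fixed-width group like this one.
A sketch only: \w stands in for \i, \b for \>, and NOT_ION and
words_not_ion are hypothetical names.

```python
import re

# Python rendering of /\i\+\(ion\)\@<!\>/
# \w+       step 1: consume as many word characters as possible
# (?<!ion)  step 2: the three characters just consumed must not be "ion"
# \b        step 3: the match must stop at the end of the word
NOT_ION = re.compile(r'\w+(?<!ion)\b')

def words_not_ion(text):
    """Return the words of text that do not end in 'ion'."""
    return NOT_ION.findall(text)

print(words_not_ion('the nation has a description and an onion'))
```

For this input the result is the, has, a, and, an. Words of any length
are handled without an extra alternative, because the lookbehind only
inspects text that has already been matched.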



Re: regex @vim, negating a group

iler.ml
Daniel Einspanjer <[hidden email]> wrote:
> The docs in :help /\@! answer this query pretty well.

Reminds me.  A professor gives a lecture in mathematics.
At some point he says "From this it obviously follows .." and writes
something long that does not resemble anything he's written before.
Then he falls silent, looks at this formula and runs out of the room.
Half an hour passes. He returns and says "All right. I checked it.
It's really obvious".

Daniel Einspanjer <[hidden email]> wrote:
> Yakov Lerner wrote:
> > Pattern
> >    /\i\+\(ion\)\@<!\>/
> > matches words that do not end with 'ion'
>
> This one is the best way of solving the presented problem (words not ending
> in 'ion').

Mmmm, good. Where do I claim the prize ?

Yakov

Re: regex @vim, negating a group

iler.ml
In reply to this post by iler.ml
On 5/2/06, Yakov Lerner <[hidden email]> wrote:
> Pattern
>    /\i\+\(ion\)\@<!\>/
> matches words that do not end with 'ion'

Two more ways to match words not ending with 'ion':

2)  This pattern also matches words not ending with 'ion':

    \<\(\w*ion\>\)\@!\w\+

3) The 'old' way. That's what I'd use before the \@ stuff to match
words not ending with 'ion':

    /\<\(\w*[^n]\|\w*[^o]n\|\w*[^i]on\)\>

Tricky
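Both variants can be compared side by side in Python. This is a sketch
under assumptions, not Vim syntax: \b replaces \< and \>, and in the
'old' way each [^x] is written [^\Wx] (a word character other than x),
because Python's \b is symmetric and a plain [^n] followed by \b could
otherwise match the space after a word. LOOKAHEAD, OLD_WAY and compare
are names invented for the example.

```python
import re

# 2) Python rendering of \<\(\w*ion\>\)\@!\w\+
# At the start of a word, refuse to go on if the word ends in "ion".
LOOKAHEAD = re.compile(r'\b(?!\w*ion\b)\w+')

# 3) Python rendering of /\<\(\w*[^n]\|\w*[^o]n\|\w*[^i]on\)\>
# Spell out the three ways the tail of a word can differ from "ion".
OLD_WAY = re.compile(r'\b(?:\w*[^\Wn]|\w*[^\Wo]n|\w*[^\Wi]on)\b')

def compare(text):
    """Return the matches of both patterns as a (lookahead, old_way) pair."""
    return LOOKAHEAD.findall(text), OLD_WAY.findall(text)

new, old = compare('the nation has an onion on a train')
print(new)
print(old)
```

Both reject "nation" and "onion". In this rendering the 'old' way also
skips the word "on", since its second alternative needs one more
character in front of the "n"; the Vim original appears to behave the
same way for the words "n" and "on".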

Yakov

Re: regex @vim, negating a group

Matthew Winn
On Thu, May 04, 2006 at 08:56:41AM +0300, Yakov Lerner wrote:
> 3) The 'old' way. That's what I'd use before the \@ stuff to match
> words not ending with 'ion':
>
>    /\<\(\w*[^n]\|\w*[^o]n\|\w*[^i]on\)\>

Not much fun if you want to match all words not ending in "ification". :-)
This is where negative lookahead and negative lookbehind really come into
their own.

--
Matthew Winn ([hidden email])