find/replace problem and a good tutorial

find/replace problem and a good tutorial

Juan Pablo Aqueveque
Hi All:

My big problem with Vim has always been regular expressions. They are
very difficult to understand, and even more difficult to write. For
instance, I want to do the following find and replace:
 
_word_ to <em>word</em>  
*word* to <strong>word</strong>  
??word?? to <cite>word</cite>  
 
Besides helping me with the problem above, how can I learn regular
expressions in a simple way? A good URL on the subject would be
much appreciated.
 
Thank you very much in advance for your help; the members of this
list are always willing to help.


--
juan pablo aqueveque
www.juque.cl

RE: find/replace problem and a good tutorial

ehannes
try: http://www.oreilly.com/catalog/regex/

:%s/_\([^_]\+\)_/<em>\1<\/em>/g
:%s/\*\([^\*]\+\)\*/<strong>\1<\/strong>/g
:%s/??\([^?]\+\)??/<cite>\1<\/cite>/g

Hans Scholte, <DPC/>



-----Original Message-----
From: Juan Pablo Aqueveque [mailto:[hidden email]]
Sent: Monday, 30 May 2005 16:22
To: [hidden email]
Subject: find/replace problem and a good tutorial


Hi All:

My great problem with vim has always been the regular expressions. It
is very difficult to understand them and much more to develop one. For
instances, I want to do the next find and replace:
 
_word_ to <em>word</em>  
*word* to <strong>word</strong>  
??word?? to <cite>word</cite>  
 
Besides helping me with the previous problem, how can I learn regular
expressions in a simple way?, some remarkable URL in this respect it
would be very valued.
 
In advance, thank you very much for your help, the members of this
list are always willing to help.


--
juan pablo aqueveque
www.juque.cl

Re: find/replace problem and a good tutorial

Tim Chase-2
In reply to this post by Juan Pablo Aqueveque
 > _word_ to <em>word</em>
 > *word* to <strong>word</strong>
 > ??word?? to <cite>word</cite>

There are a variety of ways these can be defined :)

This would likely be something like

:%s/\<_\(\S*\)_\>/\="<em>".substitute(submatch(1), '_', ' ', 'g')."<\/em>"/g

:%s/\*\(\S*\)\*/\="<strong>".substitute(submatch(1), '*', ' ', 'g')."<\/strong>"/g

These should take care of _cases_like_this_ though I haven't
figured out a clean way for it to handle _cases like this_ in
terms of distinguishing them from _a first case_ and _a second_
where there are two on the same line.  If it's just a single
word, they can be simplified to

:%s/\<_\(\S*\)_\>/<em>\1<\/em>/g
:%s/\*\(\S*\)\*/<strong>\1<\/strong>/g

or if only a single "_" marks both the beginning and the end, as
in the above _this is an example_, then it can be done similarly
with

:%s/\<_\([^_]*\)_\>/<em>\1<\/em>/g
:%s/\*\([^*]*\)\*/<strong>\1<\/strong>/g

Also, note that the "*...*" ones don't make use of the \<...\>
because an asterisk is not a keyword character, so there is no
word boundary for \< and \> to match at that point.  This assumes
your 'iskeyword' option doesn't include an asterisk, but does
include the underscore.  YMMV if you've changed this option :)

The third one is a bit trickier, as you've got to find two
adjacent characters to work with.  If *no* question marks can
occur in the text, it's not so bad.  Something like

%s/??\([^?]\+\)??/<cite>\1<\/cite>/g

However, if you're allowed to have something like

        ??What time is it? she said??

then something, perhaps, like the following (mostly untested)


%s/??\([^?]\%(.\%(??\)\@<!\)*\)??/<cite>\1<\/cite>/g

 > Besides helping me with the previous problem, how can I learn
 > regular expressions in a simple way?, some remarkable URL in
 > this respect it would be very valued.

There are a number of ways to come at it.  Having taken a course
in programming languages really helps, or if you've written your
own finite-state-machines (FSMs). :)  As for links and texts,
there's an O'Reilly book with "regular expressions" in the title
which is supposed to be quite good.  Regexps really are a
programming language of sorts, so if you code, it's just a matter
of breaking down the problem (the target matches you want) into
regexp atoms.  Additionally, you'll often see us break down
complex regexps to help folks understand the magic they're doing.

So, in that same spirit... :)

In the first example (the one with the substitute() and
submatch() functions) it breaks down like this:

%s/foo/bar/g    you're surely familiar with this.

where "foo" is defined as "\<_\(\S*\)_\>" and "bar" is the
evaluation of an expression.  For more help on replacing with the
results of expressions, see ":help sub-replace-special".

Now, that first expression is

\<       ensure the pattern match begins at the start of a word
_        that begins with an underscore
\(...\)  mark some stuff that we'll reference later
\S*      everything that's not considered whitespace (WS)
_        the closing/ending underscore
\>       make sure a word ends here (followed by a non-keyword
         character or the end of the line)

So basically, it's "when an underscore starts something, is
followed by a bunch of non-whitespace stuff, and then ends with
an underscore, remember the stuff that wasn't whitespace".

This then gets massaged via the "sub-replace-special" evaluation
of the "\=".  First, we start by creating the tags...we know it
will look something like

        "<em>".stuff."<\/em>"

(note that we have to escape the forward slash or else the
":s/foo/bar/g" gets confused by it, and thinks that it's reached
the end of the replacement text)

Now, the "stuff" is simply the originally captured stuff from
above, only we want to replace any underscores in it as well so
we don't end up with something like

        _this_is_a_test_

becoming

        <em>this_is_a_test</em>

The substitute() function takes care of this replacement.
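
A minimal sketch of the same idea, using the substitute() function on a string rather than :s on a buffer. The sample text and variable names are made up, and the default 'iskeyword' (which includes the underscore) is assumed. Since substitute() has no delimiter, the slash in the closing tag needs no escaping here:

```vim
" Sample text, invented for illustration.
let text = 'these _cases_like_this_ get handled'

" Replacement expression (the part after \=): wrap the captured text in
" <em> tags after turning the remaining underscores into spaces.
let sub = '\="<em>" . substitute(submatch(1), "_", " ", "g") . "</em>"'

" Outer call: find _..._ with no whitespace inside and capture the middle.
let result = substitute(text, '\<_\(\S*\)_\>', sub, 'g')

echo result
" these <em>cases like this</em> get handled
```

The inner substitute() is a plain pattern replacement; only the outer call uses the "\=" form.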

==================

For the second bout of them, it's the same thing as before, only
we don't have to worry about stripping out extraneous
underscores.  This simplifies matters.  No need for the
"sub-replace-special" expressions, substitute() calls, etc.  We
can just use the back-references, making the

        <em>\1<\/em>

where the "\1" is replaced with the text we previously tagged as
"interesting".  Again, escaping the forward-slash to keep it from
terminating the replacement expression prematurely.

==================

The third theme & variation on this is to change what constitutes
the search target.  Previously, we wanted non-whitespace.  This
one, we simply want anything and everything that's not an
underscore.  So we swap the "\S*" for "[^_]*" which is how one
denotes "anything that isn't an underscore".
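
As a small sketch of that variation (sample text made up, default 'iskeyword' assumed), "[^_]*" lets the match contain spaces and still keeps two marked-up phrases on one line apart:

```vim
" Two underscore-delimited phrases on the same line.
let text = '_a first case_ and _a second_'

" [^_]* cannot run past the closing underscore of the first phrase,
" so each phrase is matched and tagged separately.
let result = substitute(text, '\<_\([^_]*\)_\>', '<em>\1</em>', 'g')

echo result
" <em>a first case</em> and <em>a second</em>
```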

==================

Again, the fourth is the same as the previous, only the
prohibited characters are question-marks, rather than
underscores.  This is something akin to

:%s/??\(.\{-}\)??/<cite>\1<\/cite>/g

which stops the match at the first "??" it sees after the opening
"??"

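A quick sketch of the difference between the greedy ".*" and the non-greedy ".\{-}", run on a made-up string with substitute() so it can be tried without touching a buffer:

```vim
let text = '??one?? and ??two??'

" Greedy: .* runs to the LAST "??" on the line, swallowing the middle.
let greedy = substitute(text, '??\(.*\)??', '<cite>\1</cite>', 'g')

" Non-greedy: .\{-} stops at the FIRST "??" after the opening one.
let lazy = substitute(text, '??\(.\{-}\)??', '<cite>\1</cite>', 'g')

echo greedy
" <cite>one?? and ??two</cite>
echo lazy
" <cite>one</cite> and <cite>two</cite>
```
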
==================

Lastly, that "??...??" one is trickier.  Previously, we could look
for a single starting atom, some stuff, and a single ending atom.
This time, we have to look for the starting marker, some stuff
that doesn't include the ending marker, followed by the ending
marker.  From that, you should be able to discern that we've got

        :%s/??\(stuff\)??/<cite>\1<\/cite>/g

which is about the same as above.  However, things get messy in
that "stuff" portion.  The initial "[^?]" is a single character
that isn't a question mark.  This prevents troubles that may crop
up with things like

        ?????

We then group (but don't bother to tag, using the \%(...\)
syntax) any characters that aren't immediately followed/preceded
(depending on where you start counting) by a pair of question marks.


[^?]          a character that isn't a question mark
\%(...\)*     a bunch of valid things we group, but don't need to track
.             any character
\%(??\)\@<!   ensure that "??" doesn't match before this point.
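
A sketch of that pattern applied to the example sentence from earlier, again via substitute() on a string. This is only a quick check on one made-up input, not a claim that the pattern covers every case:

```vim
" A citation that contains a single question mark of its own.
let text = '??What time is it? she said??'

" [^?]              first character of the citation, not a question mark
" \%(.\%(??\)\@<!\)*  further characters, none of which may complete a "??"
let pat = '??\([^?]\%(.\%(??\)\@<!\)*\)??'
let result = substitute(text, pat, '<cite>\1</cite>', 'g')

echo result
" <cite>What time is it? she said</cite>
```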



They can be complex & hairy, but they've gotta make sense to the
regexp interpreter at some point, so they can be dissected :)
It's just a matter of breaking down the problem into bits that
you know have solutions, then stringing them all together.

Further help within vim can be found at topics such as

        :help :s
        :he sub-replace-special
        :he 'iskeyword'
        :he substitute()
        :he submatch()
        :he /\(
        :he /\%(
        :he /[]
        :he /\@<!
        :he /\1


Hope this gets you well on the road to regexp mastery...

-tim







Re: find/replace problem and a good tutorial

Marian Csontos
In reply to this post by ehannes
On Mon, 30 May 2005 17:02:57 +0200, Scholte, J.C.M. <[hidden email]>  
wrote:

> try: http://www.oreilly.com/catalog/regex/
>
> :%s/_\([^_]\+\)_/<em>\1<\/em>/g
> :%s/\*\([^\*]\+\)\*/<strong>\1<\/strong>/g
> :%s/??\([^?]\+\)??/<cite>\1<\/cite>/g

In this case I'd prefer
:%s/??\([^?]\{-}\)??/<cite>\1<\/cite>/g

Marian




Re: find/replace problem and a good tutorial

Paul-433
In reply to this post by Juan Pablo Aqueveque
On Mon, 30 May 2005, Juan Pablo Aqueveque wrote:

> Besides helping me with the previous problem, how can I learn regular
> expressions in a simple way?, some remarkable URL in this respect it
> would be very valued.

http://www.geocities.com/volontir/

--

.

Re: find/replace problem and a good tutorial

A.J.Mechelynck
Vigil wrote:
> On Mon, 30 May 2005, Juan Pablo Aqueveque wrote:
>
>> Besides helping me with the previous problem, how can I learn regular
>> expressions in a simple way?, some remarkable URL in this respect it
>> would be very valued.
>
>
> http://www.geocities.com/volontir/
>

You might start with

        :help 03.9
        :help usr_27.txt
        :help pattern.txt

in ascending order of difficulty. I believe that everything is there,
but like all Vim help, it is best to read it attentively; often
"hands-on" experimenting (trying some searches on a "real" file and
seeing if they work) is the best way to learn.

There may be books at your bookshop; but keep in mind that Perl regular
expressions, Vim regular expressions, and "grep" regular expressions,
are all similar but not identical.

There are also two references and one URL at the very end of the "tutor"
file:

        :view $VIMRUNTIME/tutor/tutor
        G

I don't know if the books are still in print, or the URL still operational.


Best regards,
Tony.


finding null fields in bar-separated records

jkilbour
I would like to identify the null fields in a set of files (which have
different numbers of fields), i.e. to find not just how many fields are
null but also which fields are null. Is this possible using vim
regular expressions?



Re: finding null fields in bar-separated records

John (Eljay) Love-Jensen
Hi jkilbour,

>I would like to identify the null fields in a set of files (which have different numbers of fields; i.e. to find not just the number of fields that are null but also which fields are null. Is this possible using vim regular expressions?

Given...
|one|two|three||five||seven|

You want to search for:
/||

And you want the search to tell you, somehow, whether you are sitting on empty field 4 or empty field 6?

Is that what you are asking?

I don't think that's possible.  (Which means Tony will show how to do it in a minute or two.)

I do believe it is possible to attack the problem vertically, in that you can find all the empty 1st fields.  Then, with a separate search, all the empty 2nd fields.  Then with another separate search, all the empty 3rd fields.  4th, 5th, 6th, and finally empty 7th fields.

Would that suffice?

Here's an example of finding the empty fourth field:
/^\(|[^|]*\)\{3}\zs||

NOTE:  I presumed in my example that the 1st data field is delimited with an initial vertical bar, and likewise the last data field is delimited with a terminating vertical bar.  If they aren't, you'll have to adjust the search pattern accordingly.
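
A rough sketch of the "one search per field" idea, with the same assumption about leading and trailing bars. EmptyFieldPattern() is a hypothetical helper, not something Vim provides, and the loop needs Vim 7 or later for lists:

```vim
" Hypothetical helper: build the search pattern for an empty n-th field
" by repeating the "bar plus field" group n-1 times before the "||".
function! EmptyFieldPattern(n)
  return '^\(|[^|]*\)\{' . (a:n - 1) . '}\zs||'
endfunction

let record = '|one|two|three||five||seven|'
let empty_fields = []

" Try each of the seven field positions in turn.
for n in range(1, 7)
  if record =~ EmptyFieldPattern(n)
    call add(empty_fields, n)
  endif
endfor

echo empty_fields
" [4, 6]
```

The string returned for a given n can equally be used as a "/" search pattern in a buffer.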

HTH,
--Eljay


Re: finding null fields in bar-separated records

A.J.Mechelynck
Eljay Love-Jensen wrote:

> Hi jkilbour,
>
>
>>I would like to identify the null fields in a set of files (which have different numbers of fields; i.e. to find not just the number of fields that are null but also which fields are null. Is this possible using vim regular expressions?
>
>
> Given...
> |one|two|three||five||seven|
>
> You want to search for:
> /||
>
> And you want the search to discriminate to you, somehow, whether your are sitting on empty field 4, or empty field 6?
>
> Is that what you are asking?
>
> I don't think that's possible.  (Which means Tony will show how to do it in a minute or two.)

Thanks for your high opinion of my capacities. Regexes were however
never my forte. I don't know if it is possible, but if it is, it would
require a more complicated regex than what I feel up to generating at
the moment. The first step would be to define what to replace the above
line by. Maybe generate a quickfix "error file" which would reference
each matching || (overlappings allowed!) so that :cn would find them all
in turn (using :vimgrep on Vim 7 if possible)? Or else, generate a file with

25|one|two|three|4|five|6|seven

or

25|4|6

if the line you showed was line 25?

Or something else? Let jkilbour answer.

>
> I do believe it is possible to attack the problem vertically, in that you can find all the empty 1st fields.  Then, with a separate search, all the empty 2nd fields.  Then with another separate search, all the empty 3rd fields.  4th, 5th, 6th, and finally empty 7th fields.
>
> Would that suffice?
>
> Here's an example of finding the empty fourth field:
> /^\(|[^|]*\)\{3}\zs||
>
> NOTE:  I presumed in my example that the 1st data data field is delimited with an initial vertical bar, and likewise the last data field is delimited with a terminating vertical bar.  If it doesn't, you'll have to adjust the search pattern accordingly.
>
> HTH,
> --Eljay
>
>
>
>

Best regards,
Tony.


Re: finding null fields in bar-separated records

Hari Krishna Dara
In reply to this post by jkilbour

On Tue, 31 May 2005 at 6:46am, [hidden email] wrote:

> I would like to identify the null fields in a set of files (which have
> different numbers of fields; i.e. to find not just the number of fields
> that are null but also which fields are null. Is this possible using vim
> regular expressions?
>

Depending on what exactly you want to do with them, you might be able to
create multiple solutions. Maybe you can first number all of the fields
and then search for those that are empty to look them up.

function! Submatch()
  let g:idx = g:idx + 1
  let match = submatch(1)
  return (match == '' ? g:idx.':'.'<null>' : match)
endfunction

let g:idx = 0 | s/|\([^|]*|\@=\)/\='|'.Submatch()/g

The above will transform

|one|two|three||five||seven|

into

|one|two|three|4:<null>|five|6:<null>|seven|

All that you need to do then is to search for nulls using a pattern such
as "\d\+:<null>". The actual regex to use to do the above substitution
will depend on exact specifications, such as can you have a "|"
character inside a field, and if so how you escape them. I am not a
regex guru myself, but if you need further help, you can describe your
needs in more details, for me or others on the list to come up with the
right pattern.

There are many regex gurus on this list, so I won't be surprised to see
a much simpler/easy to use solution. However, if you want to deal with a
programmatic approach, you can take a look at my multvals.vim plugin to
iterate over the fields and do something with them.

call MvIterCreate('|one|two|three||five||seven|', '|', 'Iter')
let n = 0
while MvIterHasNext('Iter')
  let ele = MvIterNext('Iter')
  if ele == ''
    echo 'Found null at: ' . n
  endif
  let n = n + 1
endwhile
call MvIterDestroy('Iter')

PS: Multvals treats the first "|" in the string also as a separator
resulting in one extra field, but you should be able to work around that.
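
With Vim 7 or later, the built-in split() function can do a similar walk over the fields without a plugin. A rough sketch, assuming fields never contain an escaped bar (the variable names are made up):

```vim
let record = '|one|two|three||five||seven|'

" The third argument (keepempty = 1) keeps empty items, including the
" ones produced by the leading and trailing bar.
let fields = split(record, '|', 1)

let null_fields = []
" Skip index 0 and the last index: those are the empty strings outside
" the outer bars, not real fields.
for i in range(1, len(fields) - 2)
  if empty(fields[i])
    call add(null_fields, i)
  endif
endfor

echo null_fields
" [4, 6]
```

Because of the leading bar, the list index of each item happens to equal its 1-based field number.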

--
HTH,
Hari


               

Re: finding null fields in bar-separated records

Antony Scriven
Hello

On May 31, Hari Krishna Dara wrote:

 > On Tue, 31 May 2005 at 6:46am, [hidden email] wrote:
 >
 > > I would like to identify the null fields in a set of
 > > files (which have different numbers of fields; i.e. to
 > > find not just the number of fields that are null but
 > > also which fields are null. Is this possible using vim
 > > regular expressions?
 >
 > Depending on what exactly you want to do with them, you
 > might be able to create multiple solutions. May be you
 > can first number all of the fields and then search for
 > those that are empty to lookup them up.
 >
 > function! Submatch()
 >   let g:idx = g:idx + 1
 >   let match = submatch(1)
 >   return (match == '' ? g:idx.':'.'<null>' : match)
 > endfunction
 >
 > let g:idx = 0 | s/|\([^|]*|\@=\)/\='|'.Submatch()/g
 >
 > The above will transform
 >
 > |one|two|three||five||seven|
 >
 > into
 >
 > |one|two|three|4:<null>|five|6:<null>|seven|
 >
 > [...]

This is somewhat shorter and transforms the whole buffer (if
I understand the problem correctly; I missed the original
mail):

%s/\(^.*|\)\@<=|/\=strlen(substitute(submatch(1),'[^|]','','g')).':<null>|'/g

But I offer this mostly for interesting ways to use \@<=. It
is, IMO, unmaintainable. I think \= plus a function, as Hari
has done, is normally the best approach for this sort of thing.

Antony