Substitute pattern over multiple lines


John Cordes
 I'm seeking help with editing a GEDCOM (genealogy) file. For
this I'm using Vim 8.2 in Windows. Here is a segment of text from
the file (the language doesn't make sense since I've deleted
some internal lines in the NOTEs which aren't relevant to the
question):

=======================
1 EVEN
2 TYPE tngnote
2 NOTE I have included the children William, Charles, Alice, and
with his parents in 1881, and with his widowed mother in 1
3 CONC 891 (e.g. see my online transcription of the 1891 Smiths
with James Moser, son of Henry Moser and Mary Henneberry, and his
wife Margaret Woodin; however
3 CONC , I have not yet taken this step.
1 BIRT
=======================
 
 The 2 lines beginning with ^3 CONC  are Continuation (CONC=Concatenation) lines.

 I want to surround the text of the NOTE with a 'div' tag, so that
the final result should look like this:

=======================
1 EVEN
2 TYPE tngnote
2 NOTE <div class="xxx">I have included the children William,
Charles, Alice, and with his parents in 1881, and with his widowed
mother in 1891 (e.g. see my online transcription of the 1891
Smiths with James Moser, son of Henry Moser and Mary Henneberry,
and his wife Margaret Woodin; however, I have not yet taken this
step.</div>
1 BIRT
=======================

 The complete GEDCOM file (which may have 850,000 or so lines) may
have NOTE tags with 0, 1, 2, or 3 CONC tags (probably no more than
that) following.
 
 It is this variable number of continuation lines which I find
most difficult to deal with.

 For the NOTE tags where there are no continuation lines I believe
this is working:

:g/^2 TYPE tngnote/+1s/^2 NOTE\(.*\)/2 NOTE <div class="xxx">\1 <\/div>/

 but when there are 1 or more CONC tags following the NOTE I get stuck.

 I tried:
:g/^2 TYPE tngnote/+1s/^2 NOTE\(.*\n\(3 CONC \(.*\)\)*\)/2 NOTE <div class="xxx">\1\3<\/div> /

 which 'almost' works if there is just 1 CONC tag (though it
leaves "3 CONC" in place which I don't want). So it's pretty bad!


 I realize this is pretty messy looking but I'm hoping one of the
experts who so generously contribute to this group may be able to
give me a pointer for how to deal with this.

 Thanks,
 John Cordes


--
--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php

---
You received this message because you are subscribed to the Google Groups "vim_use" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/vim_use/20201223214854.GA8272%40dal.ca.

Re: Substitute pattern over multiple lines

Tim Chase
On 2020-12-23 17:48, John Cordes wrote:

>  I'm seeking help with editing a GEDCOM (genealogy) file. For
> this I'm using Vim 8.2 in Windows. Here is a segment of text from
> the file (the language doesn't make sense since I've deleted
> some internal lines in the NOTEs which aren't relevant to the
> question):
>
> =======================
> 1 EVEN
> 2 TYPE tngnote
> 2 NOTE I have included the children William, Charles, Alice, and
> with his parents in 1881, and with his widowed mother in 1
> 3 CONC 891 (e.g. see my online transcription of the 1891 Smiths
> with James Moser, son of Henry Moser and Mary Henneberry, and his
> wife Margaret Woodin; however
> 3 CONC , I have not yet taken this step.
> 1 BIRT
> =======================
>  
>  The 2 lines beginning with ^3 CONC  are Continuation
> (CONC=Concatenation) lines.
>
>  I want to surround the text of the NOTE with a 'div' tag, so that
> the final result should look like this:
>
> =======================
> 1 EVEN
> 2 TYPE tngnote
> 2 NOTE <div class="xxx">I have included the children William,
> Charles, Alice, and with his parents in 1881, and with his widowed
> mother in 1891 (e.g. see my online transcription of the 1891
> Smiths with James Moser, son of Henry Moser and Mary Henneberry,
> and his wife Margaret Woodin; however, I have not yet taken this
> step.</div>
> 1 BIRT
> =======================

I'd start with this ugly monstrosity:

:%s/^2 \u\{3,} \zs\(.*\n\(\%(\D\|3 CONC \).*\n\)\+\)/\='<div class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '', 'g'), '\n', '', 'g')."<\/div>\n"

(all one line in case it breaks in the mail)

If you only want it to do "2 NOTE" lines, you can change that initial

  2 \u\{3,} \zs

(which does any item that has continuations) to

  2 NOTE \zs

This does join *all* the lines and doesn't re-wrap them, so you'd
then want a second pass to do the wrapping

  :set tw=70
  :g/<div [^>]*>.*<\/div>$/norm gqq
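
A minimal way to try the first pass without touching the real file, assuming a scratch window and a tiny invented sample (the note wording below is made up for illustration). The substitute is the "2 NOTE" variant mentioned above, written on one line:

```vim
" Open an empty scratch window and fill it with a small invented sample.
:new
:call setline(1, ['1 EVEN', '2 TYPE tngnote', '2 NOTE first part of the no', '3 CONC te, second part', '3 CONC , third part', '1 BIRT'])

" Join the NOTE with its CONC lines and wrap the result in a div.
:%s/^2 NOTE \zs\(.*\n\(\%(\D\|3 CONC \).*\n\)\+\)/\='<div class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '', 'g'), '\n', '', 'g')."<\/div>\n"

" Line 3 should now read:
" 2 NOTE <div class="xxx">first part of the note, second part, third part</div>
```

Once the result looks right on the sample, the same command can be run on a copy of the real GEDCOM file.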

Hope this gives you some ideas to work with.

-tim




Re: Substitute pattern over multiple lines

John Cordes
 
On Wed, Dec 23, 2020 at 05:08:32PM -0600, Tim Chase wrote:

> On 2020-12-23 17:48, John Cordes wrote:
> >  I'm seeking help with editing a GEDCOM (genealogy) file. For
> > this I'm using Vim 8.2 in Windows. Here is a segment of text from
> > the file (the language doesn't make sense since I've deleted
> > some internal lines in the NOTEs which aren't relevant to the
> > question):
> >
> > =======================
> > 1 EVEN
> > 2 TYPE tngnote
> > 2 NOTE I have included the children William, Charles, Alice, and
> > with his parents in 1881, and with his widowed mother in 1
> > 3 CONC 891 (e.g. see my online transcription of the 1891 Smiths
> > with James Moser, son of Henry Moser and Mary Henneberry, and his
> > wife Margaret Woodin; however
> > 3 CONC , I have not yet taken this step.
> > 1 BIRT
> > =======================
> >
> >  The 2 lines beginning with ^3 CONC  are Continuation
> > (CONC=Concatenation) lines.
> >
> >  I want to surround the text of the NOTE with a 'div' tag, so that
> > the final result should look like this:
> >
> > =======================
> > 1 EVEN
> > 2 TYPE tngnote
> > 2 NOTE <div class="xxx">I have included the children William,
> > Charles, Alice, and with his parents in 1881, and with his widowed
> > mother in 1891 (e.g. see my online transcription of the 1891
> > Smiths with James Moser, son of Henry Moser and Mary Henneberry,
> > and his wife Margaret Woodin; however, I have not yet taken this
> > step.</div>
> > 1 BIRT
> > =======================
>
> I'd start with this ugly monstrosity:
>
> :%s/^2 \u\{3,} \zs\(.*\n\(\%(\D\|3 CONC \).*\n\)\+\)/\='<div
> class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '',
> 'g'), '\n', '', 'g')."<\/div>\n"
>
> (all one line in case it breaks in the mail)
>
> If you only want it to do "2 NOTE" lines, you can change that initial
>
>   2 \u\{3,} \zs
>
> (which does any item that has continuations) to
>
>   2 NOTE \zs
>
> This does join *all* the lines and doesn't re-wrap them, so you'd
> then want a second pass to do the wrapping
>
>   :set tw=70
>   :g/<div [^>]*>.*<\/div>$/norm gqq
>
> Hope this gives you some ideas to work with.

 Yes indeed Tim -- an excellent idea. Thanks very much.
 I will attempt to deconstruct your 'monstrosity' somewhat later,
but I've been trying to get things to work with my situation.

 It's a bit more complicated than I first explained. Two aspects:
a) I *do* need to search on the "2 NOTE" lines, since there are
various other chunks of lines with the CONC lines; and
b) Sometimes the line "2 TYPE tngnote" has a line between it and
the "2 NOTE". The intervening line can look like this

2 DATE 18 AUG 1776
 or this
2 _SDATE 1802

 So the lines to change could look like this:

===================
1 EVEN
2 TYPE tngnote
2 _SDATE 1802
2 NOTE The surname of John's wife is not positively established.
However, it is certain that her given name is Elizabeth; evidence
for this comes first from the baptismal records for Rebecca and
Eliza Catherine; these children were born while th
3 CONC e family was in London so the records are available in the
London Metropolitan Archives (the other two children were born in
Sheffield). Henry's baptismal record in Sheffield also has his
parents being John (a skinner) and Elizabeth. The id
3 CONC entification of John's wife specifically with  Elizabeth
Coxsey is somewhat tentative, however.
1 EVEN
===================

 This search pattern
/^2 TYPE tngnote.*\n*\(\_^2 .*DATE.*\)*\n\_^2 NOTE

 works to find all 3 possibilities: no DATE line, an _SDATE line
or a DATE line.

 I thought I would be able to combine that with your pattern like so:

:%s/^2 TYPE tngnote.*\n*\(\_^2 .*DATE.*\)*\n\_^2 NOTE \zs\(.*\n\(\%(\D\|3 CONC \).*\n\)\+\)/\='<div class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '', 'g'), '\n', '', 'g')."<\/div>\n"

 but that is not working. Here's an example of one small chunk of
lines which were transformed by that command:

1 EVEN
2 TYPE tngnote
2 DATE 18 AUG 1776
2 NOTE <div class="xxx">2 DATE 18 AUG 1776</div>
1 EVEN

 The command is eliminating the content which had been in the NOTE tags altogether.

 I will keep trying, but more help would be terrific!

 Thanks,
 John


Re: Substitute pattern over multiple lines

George Dinwiddie
In reply to this post by Tim Chase
Why not use

:%s/\n3 CONC //

to concatenate all the continuations and then use

:%s/\(2 NOTE \)\(.*\)/\1<div class="xxx">\2<\/div>/

to turn all the NOTE lines into <div> blocks? Or am I misunderstanding
something about the transformation you need?

  - George

On 12/23/20 6:08 PM, Tim Chase wrote:

> On 2020-12-23 17:48, John Cordes wrote:
>>   I'm seeking help with editing a GEDCOM (genealogy) file. For
>> this I'm using Vim 8.2 in Windows. Here is a segment of text from
>> the file (the language doesn't make sense since I've deleted
>> some internal lines in the NOTEs which aren't relevant to the
>> question):
>>
>> =======================
>> 1 EVEN
>> 2 TYPE tngnote
>> 2 NOTE I have included the children William, Charles, Alice, and
>> with his parents in 1881, and with his widowed mother in 1
>> 3 CONC 891 (e.g. see my online transcription of the 1891 Smiths
>> with James Moser, son of Henry Moser and Mary Henneberry, and his
>> wife Margaret Woodin; however
>> 3 CONC , I have not yet taken this step.
>> 1 BIRT
>> =======================
>>  
>>   The 2 lines beginning with ^3 CONC  are Continuation
>> (CONC=Concatenation) lines.
>>
>>   I want to surround the text of the NOTE with a 'div' tag, so that
>> the final result should look like this:
>>
>> =======================
>> 1 EVEN
>> 2 TYPE tngnote
>> 2 NOTE <div class="xxx">I have included the children William,
>> Charles, Alice, and with his parents in 1881, and with his widowed
>> mother in 1891 (e.g. see my online transcription of the 1891
>> Smiths with James Moser, son of Henry Moser and Mary Henneberry,
>> and his wife Margaret Woodin; however, I have not yet taken this
>> step.</div>
>> 1 BIRT
>> =======================
>
> I'd start with this ugly monstrosity:
>
> :%s/^2 \u\{3,} \zs\(.*\n\(\%(\D\|3 CONC \).*\n\)\+\)/\='<div
> class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '',
> 'g'), '\n', '', 'g')."<\/div>\n"
>
> (all one line in case it breaks in the mail)
>
> If you only want it to do "2 NOTE" lines, you can change that initial
>
>    2 \u\{3,} \zs
>
> (which does any item that has continuations) to
>
>    2 NOTE \zs
>
> This does join *all* the lines and doesn't re-wrap them, so you'd
> then want a second pass to do the wrapping
>
>    :set tw=70
>    :g/<div [^>]*>.*<\/div>$/norm gqq
>
> Hope this gives you some ideas to work with.
>
> -tim
>
>
>

--
  ----------------------------------------------------------------------
   * George Dinwiddie *                      http://blog.gdinwiddie.com
   Software Development                    http://www.idiacomputing.com
   Consultant and Coach         https://pragprog.com/titles/gdestimate/
  ----------------------------------------------------------------------


Re: Substitute pattern over multiple lines

John Cordes

On Wed, Dec 23, 2020 at 9:04 PM George Dinwiddie <[hidden email]> wrote:
Why not use

:%s/\n3 CONC //

to concatenate all the continuations and then use

:%s/\(2 NOTE \)\(.*\)/\1<div class="xxx">\2<\/div>/

to turn all the NOTE lines into <div> blocks? Or am I misunderstanding
something about the transformation you need?

  - George

 One big problem with the first part is that I *only* want to concatenate the continuation lines when they appear immediately following a "2 NOTE..." tag, AND that "2 NOTE" tag must be either the next or next but one line after "2 TYPE tngnote". 
 
 I neglected to make it clear earlier that I need to first search on  "2 TYPE tngnote" since there are other "2 TYPE" tags where I don't want to change anything.

 John 


Re: Substitute pattern over multiple lines

Tim Chase
In reply to this post by John Cordes
On 2020-12-23 20:39, John Cordes wrote:
>> I'd start with this ugly monstrosity:
>>
>> :%s/^2 \u\{3,} \zs\(.*\n\(\%(\D\|3 CONC \).*\n\)\+\)/\='<div  
>> class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '',
>> 'g'), '\n', '', 'g')."<\/div>\n"
>
>  I will attempt to deconstruct your 'monstrosity' somewhat later,

Tweaking it so that it only does NOTE items, not generic
continuations:

:%s/^2 NOTE \zs\(.*\n\%(\%(\D\|3 CONC \).*\n\)\+\)/\='<div class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '', 'g'), '\n', '', 'g')."<\/div>\n"

Breaking it down so hopefully you can swap parts as you see fit:

:%s/^2 NOTE \zs     On every line starting with "2 NOTE "
                    start our replacement here (\zs)
\(                  start capturing the note
                    this will be submatch(1) later
.*                  everything else on that line
\n                  and the newline
\%(                 a non-capturing group for another line that
\%(\D               starts with either a non-digit
\|                  or
3 CONC              a literal "3 CONC "
\)                  (end of this OR of things marking a continuation)
.*\n                followed by the rest of the line
\)                  (end of this continuation-line)
\+                  we can have 1 or more continuation lines
\)                  end the capturing
/                   replace it with
\=                  the result of evaluating this expression
'<div class="xxx">' the literal opening tag
.                   and then the results of
substitute(         remove all the newlines from the results of
 substitute(        removing from
  submatch(1),      the whole set of continuation stuff
  '\n3 CONC ',      the literal newline-followed-by-"3 CONC "
  '',               and replace them with nothing
  'g'               everywhere
  ),                and in that "\n3 CONC "-less text, replace
 '\n',              newlines with
 '',                nothing
 'g')               everywhere
.                   and then tack on
"<\/div>\n"         the literal closing </div> followed by a newline

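For anyone who finds the one-liner hard to maintain, the same idea can be sketched as a small function. This is an alternative, not what was used in this thread: the name WrapNote is invented here, and it assumes every NOTE and CONC record sits on its own physical line in the file (no mail-wrapped lines), so it does not replicate the \D branch of the pattern above.

```vim
" Hypothetical helper (name invented here): wrap the "2 NOTE" record on the
" cursor line, plus any "3 CONC " lines directly below it, in a div.
function! WrapNote() abort
  let lnum = line('.')
  " Text of the note without its "2 NOTE " tag.
  let text = substitute(getline(lnum), '^2 NOTE ', '', '')
  " Append each following continuation, minus its "3 CONC " tag.
  let last = lnum
  while getline(last + 1) =~# '^3 CONC '
    let last += 1
    let text .= substitute(getline(last), '^3 CONC ', '', '')
  endwhile
  " Drop the continuation lines, then rewrite the NOTE line itself.
  if last > lnum
    execute (lnum + 1) . ',' . last . 'delete _'
  endif
  call setline(lnum, '2 NOTE <div class="xxx">' . text . '</div>')
endfunction
```

It would be called once per note, for example with `:g/^2 NOTE /call WrapNote()`. Because the loop stops at the first line that is not a "3 CONC " line, notes with no continuation lines are wrapped as well.
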
>  It's a bit more complicated than I first explained. Two aspects:
> a) I *do* need to search on the "2 NOTE" lines, since there are
> various other chunks of lines with the CONC lines; and
> b) Sometimes the line "2 TYPE tngnote" has a line between it and
> the "2 NOTE". The intervening line can look like this
>
> 2 DATE 18 AUG 1776
>  or this
> 2 _SDATE 1802

Given the substitution command above, it should only touch "2 NOTE"
lines that have subsequent "3 CONC" lines.  It does that for *every*
such "2 NOTE", so if you need to limit it to just those that follow a
"2 TYPE tngnote" (assuming there aren't any "2 TYPE tngnote" that
*don't* have a NOTE following them), you can tweak that command,
changing that initial "%" to

:g/^2 TYPE tngnote//2 NOTE /s/^2 NOTE \zs…

This looks for all the "2 TYPE tngnote" lines, searches forward
(skipping over any DATE/_SDATE lines or other intervening stuff) for
the "2 NOTE " line following each one, and then only performs the
substitution on those particular lines.
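
Before running the substitution, the same address can be combined with a harmless command to see which lines would be affected. A minimal sketch; `#` only prints the addressed line together with its line number:

```vim
" For each "2 TYPE tngnote" line, show the next "2 NOTE " line found below it.
" Nothing in the buffer is changed.
:g/^2 TYPE tngnote//^2 NOTE /#
```

If a "2 TYPE tngnote" record has no NOTE of its own, the forward search lands on the NOTE of some later record (or wraps around the end of the file when 'wrapscan' is set), so the printed list is worth a quick look on a large file.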

>  So the lines to change could look like this:
>
> ===================
> 1 EVEN
> 2 TYPE tngnote
> 2 _SDATE 1802
> 2 NOTE The surname of John's wife is not positively established.
> However, it is certain that her given name is Elizabeth; evidence
> for this comes first from the baptismal records for Rebecca and
> Eliza Catherine; these children were born while th
> 3 CONC e family was in London so the records are available in the
> London Metropolitan Archives (the other two children were born in
> Sheffield). Henry's baptismal record in Sheffield also has his
> parents being John (a skinner) and Elizabeth. The id
> 3 CONC entification of John's wife specifically with  Elizabeth
> Coxsey is somewhat tentative, however.
> 1 EVEN
> ===================
>
>  This search pattern
> /^2 TYPE tngnote.*\n*\(\_^2 .*DATE.*\)*\n\_^2 NOTE
>
>  works to find all 3 possibilities: no DATE line, an _SDATE line
> or a DATE line.
>
>  I thought I would be able to combine that with your pattern like
> so:
>
> :%s/^2 TYPE tngnote.*\n*\(\_^2 .*DATE.*\)*\n\_^2 NOTE
> \zs\(.*\n\(\%(\D\|3 CONC \).*\n\)\+\)/\='<div
> class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '',
> 'g'), '\n', '', 'g')."<\/div>\n"
>
>  but that is not working.

I suspect that the problem snuck in by using \(…\) in your added
conditions which captured that as submatch(1).  So you can either
make it non-capturing by adding that "%" before the open-paren:

  \%(\_^2 .*DATE.*\)

or change the "submatch(1)" to "submatch(2)"
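
The effect of a capturing versus a non-capturing group on the numbering can be checked on a plain string. A minimal sketch with an invented test string (the "|" is just a literal separator here):

```vim
" With a capturing \(...\) first, group 1 is the DATE part.
:echo substitute('DATE 1802|NOTE hello', '\(DATE \d\+\)|NOTE \(.*\)', '\1', '')
" prints: DATE 1802

" With \%(...\) the first group no longer counts, so group 1 is the note text.
:echo substitute('DATE 1802|NOTE hello', '\%(DATE \d\+\)|NOTE \(.*\)', '\1', '')
" prints: hello
```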

> Here's an example of one small chunk of
> lines which were transformed by that command:
>
> 1 EVEN
> 2 TYPE tngnote
> 2 DATE 18 AUG 1776
> 2 NOTE <div class="xxx">2 DATE 18 AUG 1776</div>
> 1 EVEN

Note that the content here is what you captured in the first group.
:-)

Hope this helps get you on the right path,

-tim





Re: Substitute pattern over multiple lines

John Cordes

 
 This is amazing looking, Tim -- thanks so much! There is a lot for a nearly 80-year-old to unpack here -- it's going to take me a while. :)
  It looks as though you have covered all the bases I want to deal with. 

 Thank you again,
 John
    


Re: Substitute pattern over multiple lines

John Cordes
On Wed, Dec 23, 2020 at 10:07:26PM -0400, John Cordes wrote:
>
> On Wed, Dec 23, 2020 at 9:31 PM Tim Chase <[hidden email]> wrote:
>
>     On 2020-12-23 20:39, John Cordes wrote:
>     >> I'd start with this ugly monstrosity:
>     >>
>     >> :%s/^2 \u\{3,} \zs\(.*\n\(\%(\D\|3 CONC \).*\n\)\+\)/\='<div
>     >> class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '',
>     >> 'g'), '\n', '', 'g')."<\/div>\n"

>     :g/^2 TYPE tngnote//2 NOTE /s/^2 NOTE \zs…
>
>     Hope this helps get you on the right path,
>
>     -tim
>  
>  This is amazing looking, Tim -- thanks so much! There is a lot for a nearly
> 80-year old to unpack here -- it's going to take me a while. :)
>   It looks as though you have covered all the bases I want to deal with.
>
>  Thank you again,
>  John

 Just a quick report to say that following your suggestion above leads to:

:g/^2 TYPE tngnote//2 NOTE /s/^2 NOTE \zs\(.*\n\(\%(\D\|3 CONC \).*\n\)\+\)/\='<div class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '', 'g'), '\n', '', 'g')."<\/div>\n"

 which as far as I can tell at the moment is working perfectly,
handling all situations the way I wanted. I will check further and
also test on another GEDCOM file when I'm fresher.

 Thanks again Tim; I have learned a lot. Now if it would only stick...

 John


Re: Substitute pattern over multiple lines

John Cordes
On Wed, Dec 23, 2020 at 10:34:38PM -0400, John Cordes wrote:

> On Wed, Dec 23, 2020 at 10:07:26PM -0400, John Cordes wrote:
> >
> > On Wed, Dec 23, 2020 at 9:31 PM Tim Chase <[hidden email]> wrote:
> >
> >     On 2020-12-23 20:39, John Cordes wrote:
> >     >> I'd start with this ugly monstrosity:
> >     >>
> >     >> :%s/^2 \u\{3,} \zs\(.*\n\(\%(\D\|3 CONC \).*\n\)\+\)/\='<div
> >     >> class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '',
> >     >> 'g'), '\n', '', 'g')."<\/div>\n"
>
> >     :g/^2 TYPE tngnote//2 NOTE /s/^2 NOTE \zs…
> >
> >     Hope this helps get you on the right path,
> >
> >     -tim
> >  
> >  This is amazing looking, Tim -- thanks so much! There is a lot for a nearly
> > 80-year old to unpack here -- it's going to take me a while. :)
> >   It looks as though you have covered all the bases I want to deal with.
> >
> >  Thank you again,
> >  John
>
>  Just a quick report to say that following your suggestion above leads to:
>
> :g/^2 TYPE tngnote//2 NOTE /s/^2 NOTE \zs\(.*\n\(\%(\D\|3 CONC \).*\n\)\+\)/\='<div class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '', 'g'), '\n', '', 'g')."<\/div>\n"
>
>  which as far as I can tell at the moment is working perfectly,
> handling all situations the way I wanted. I will check further and
> also test on another GEDCOM file when I'm fresher.
>
>  Thanks again Tim; I have learned a lot. Now if it would only stick...
>
>  John

 Tim,

 I hate to trouble you further about this, but possibly while it
is still reasonably fresh in your mind...

 The last ":g..." command I listed above is working correctly
when there are continuation lines (i.e. at least one "3 CONC" tag
following the "2 NOTE" tag), but it seems to be skipping
the "2 NOTE" tags which do *not* have a CONC / Continuation tag.
 I thought the pattern would allow for no CONC tags, but I'm
not seeing what is wrong.
 At least I *think* that's what I am seeing.

 John

 


Re: Substitute pattern over multiple lines

Tim Chase
On 2020-12-23 23:18, John Cordes wrote:
>> :g/^2 TYPE tngnote//2 NOTE /s/^2 NOTE \zs\(.*\n\(\%(\D\|3 CONC
>> \).*\n\)\+\)/\='<div
>> class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '',
>> 'g'), '\n', '', 'g')."<\/div>\n"
>
>  The last ":g..." command I listed above is working correctly
> when there are continuation lines (i.e. at least one "3 CONC" tag
> following the "2 NOTE" tag, but I think it seems to be skipping by
> the "2 NOTE" tags which do *not* have a CONC / Continuation tag.

Ah, while I'm not positive (so shooting from the hip here) I think you
want to change the

  \+

(one or more continuation lines) to just

  *

(zero or more continuation lines) to produce

:g/^2 TYPE tngnote//2 NOTE /s/^2 NOTE \zs\(.*\n\%(\%(\D\|3 CONC \).*\n\)*\)/\='<div class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '', 'g'), '\n', '', 'g')."<\/div>\n"

(I also snuck in an extra "%" in the inner \(…\) which I missed when
transcribing it earlier, but shouldn't impact the results)
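
The effect of the quantifier change can be checked outside Vim. What follows is a minimal sketch that uses Python's `re` module as a stand-in for Vim's regex engine (an assumption made purely for illustration; the thread itself uses Vim). The translation is approximate: Vim's `\+` becomes `+`, `\%(...\)` becomes `(?:...)`, and `\D` is written as `[^\d\n]` because Python's `\D` would also match a newline. The names `ONE_OR_MORE`, `ZERO_OR_MORE` and `wrap` are hypothetical, and unlike the `:g` command this sketch does not restrict itself to NOTEs that follow "2 TYPE tngnote".

```python
import re

# Approximate Python translations of the two Vim patterns (assumption:
# [^\d\n] stands in for Vim's \D, which never matches a line break).
ONE_OR_MORE = re.compile(r"^2 NOTE (.*\n(?:(?:[^\d\n]|3 CONC ).*\n)+)", re.M)
ZERO_OR_MORE = re.compile(r"^2 NOTE (.*\n(?:(?:[^\d\n]|3 CONC ).*\n)*)", re.M)


def wrap(match):
    """Join the NOTE text with its continuation lines and wrap it in a div."""
    body = match.group(1).replace("\n3 CONC ", "").replace("\n", "")
    return f'2 NOTE <div class="xxx">{body}</div>\n'


no_conc = "2 TYPE tngnote\n2 NOTE A short note.\n1 BIRT\n"
with_conc = "2 TYPE tngnote\n2 NOTE mother in 1\n3 CONC 891 census\n1 BIRT\n"

# With "+" a NOTE that has no continuation line is not matched at all.
print(ONE_OR_MORE.search(no_conc))
# With "*" the same NOTE is matched and wrapped.
print(ZERO_OR_MORE.sub(wrap, no_conc))
# Both variants handle a NOTE that does have a continuation line.
print(ZERO_OR_MORE.sub(wrap, with_conc))
```

The first `print` shows that the one-or-more form finds nothing for a NOTE without continuation lines, which matches the "skipping" behaviour described above.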

-tim




Re: Substitute pattern over multiple lines

John Cordes


On Wed, Dec 23, 2020 at 11:57 PM Tim Chase <[hidden email]> wrote:
> On 2020-12-23 23:18, John Cordes wrote:
> >> :g/^2 TYPE tngnote//2 NOTE /s/^2 NOTE \zs\(.*\n\(\%(\D\|3 CONC
> >> \).*\n\)\+\)/\='<div
> >> class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '',
> >> 'g'), '\n', '', 'g')."<\/div>\n"
> >
> >  The last ":g..." command I listed above is working correctly
> > when there are continuation lines (i.e. at least one "3 CONC" tag
> > following the "2 NOTE" tag, but I think it seems to be skipping by
> > the "2 NOTE" tags which do *not* have a CONC / Continuation tag.
>
> Ah, while I'm not positive (so shooting from the hip here) I think you
> want to change the
>
>   \+
>
> (one or more continuation lines) to just
>
>   *
>
> (zero or more continuation lines) to produce
>
> :g/^2 TYPE tngnote//2 NOTE /s/^2 NOTE \zs\(.*\n\%(\%(\D\|3 CONC \).*\n\)*\)/\='<div class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '', 'g'), '\n', '', 'g')."<\/div>\n"
>
> (I also snuck in an extra "%" in the inner \(…\) which I missed when
> transcribing it earlier, but shouldn't impact the results)
>
> -tim

 

 I think that did it - on a quick check.
I had tried changing that "\+" to "\=" thinking that would allow for 0 or 1 but something went wrong - can't remember exactly what right now. I should have just tried * - can't think why I didn't.

 Thanks again!
 John


  


Re: Substitute pattern over multiple lines

Tim Chase
In reply to this post by John Cordes
On 2020-12-23 22:34, John Cordes wrote:
> :g/^2 TYPE tngnote//2 NOTE /s/^2 NOTE \zs\(.*\n\(\%(\D\|3 CONC
> \).*\n\)\+\)/\='<div
> class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '',
> 'g'), '\n', '', 'g')."<\/div>\n"
>
>  which as far as I can tell at the moment is working perfectly,
> handling all situations the way I wanted.

The only glaring edge-case is a situation in which a "2 TYPE tngnote"
section is followed by *no* NOTE, followed by a section that *isn't*
a "2 TYPE tngnote" that *does* have a NOTE that shouldn't be touched
such as

  2 TYPE tngnote
  9 TIM FAKE ANNOTATION this tngnote has no NOTE
  2 TYPE granola
  2 NOTE Don't touch granola-type notes or
  3 CONC rewrap their content or add <div>s!

In such a case, it will wrap that NOTE because it's the first NOTE
after a "2 TYPE tngnote", even though it belongs to a different TYPE
section that shouldn't be touched.

So that's where I'd focus my checking :-)

-tim




Re: Substitute pattern over multiple lines

John Cordes
On Wed, Dec 23, 2020 at 10:52:43PM -0600, Tim Chase wrote:

> On 2020-12-23 22:34, John Cordes wrote:
> > :g/^2 TYPE tngnote//2 NOTE /s/^2 NOTE \zs\(.*\n\(\%(\D\|3 CONC
> > \).*\n\)\+\)/\='<div
> > class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '',
> > 'g'), '\n', '', 'g')."<\/div>\n"
> >
> >  which as far as I can tell at the moment is working perfectly,
> > handling all situations the way I wanted.
>
> The only glaring edge-case is a situation in which a "2 TYPE tngnote"
> section is followed by *no* NOTE, followed by a section that *isn't*
> a "2 TYPE tngnote" that *does* have a NOTE that shouldn't be touched
> such as
>
>   2 TYPE tngnote
>   9 TIM FAKE ANNOTATION this tngnote has no NOTE
>   2 TYPE granola
>   2 NOTE Don't touch granola-type notes or
>   3 CONC rewrap their content or add <div>s!
>
> In such a case, it will wrap the NOTE even though it's in a different
> TYPE that shouldn't be touched because it's the first NOTE after a "2
> TYPE tngnote", even though it's in a different section.
>
> So that's where I'd focus my checking :-)
>
> -tim

 Thanks Tim. The GEDCOM file, exported from my desktop genealogy
program TMG, *shouldn't* have a case like that (a tngnote tag
which isn't followed by its own Note), but... Hey, it's produced
by a large, complex, software program which is no longer supported
and does have a few known bugs. So obviously one can never
guarantee what will actually happen in practice.

 I will certainly keep a lookout for this edge case -- it would
indeed lead to very undesirable results. Presumably I should be
able to do a search for two successive "2 TYPE tngnote" entries
which don't have an intervening "2 NOTE " tag. Not sure how, but
I'll give it a try. :-)
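
One way to look for this edge case outside Vim is a short Python check. This is only a sketch under stated assumptions: the function name `tngnotes_without_note` is hypothetical, and it assumes that a line at level 0 or 1, or another "2 TYPE " line, ends the section that a "2 TYPE tngnote" line belongs to. It reports the 1-based line number of every "2 TYPE tngnote" line that reaches such a boundary (or the end of the file) before any "2 NOTE " line.

```python
def tngnotes_without_note(lines):
    """Return 1-based line numbers of '2 TYPE tngnote' lines lacking their own NOTE."""
    missing = []
    for i, line in enumerate(lines):
        if not line.startswith("2 TYPE tngnote"):
            continue
        for j in range(i + 1, len(lines)):
            later = lines[j]
            if later.startswith("2 NOTE "):
                break  # this tngnote has its own NOTE
            # Assumed section boundaries: a level 0/1 line or another TYPE tag.
            if later.startswith(("0 ", "1 ", "2 TYPE ")):
                missing.append(i + 1)
                break
        else:
            missing.append(i + 1)  # reached end of input without finding a NOTE
    return missing


# The edge case from Tim's message, preceded by an event line.
sample = [
    "1 EVEN",
    "2 TYPE tngnote",
    "9 TIM FAKE ANNOTATION this tngnote has no NOTE",
    "2 TYPE granola",
    "2 NOTE Don't touch granola-type notes or",
    "3 CONC rewrap their content or add <div>s!",
]
print(tngnotes_without_note(sample))
```

An empty result would suggest the export contains no such case. For a real file, the list of lines could come from reading the GEDCOM file line by line; the file's encoding is not stated in this thread, so that part is left out.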

 Thanks for the heads-up on this,
 John


Re: Substitute pattern over multiple lines

Steve Litt
In reply to this post by Tim Chase
On Wed, 23 Dec 2020 17:08:32 -0600
Tim Chase <[hidden email]> wrote:


>   2 NOTE \zs
>
> This does join *all* the lines and doesn't re-wrap them, so you'd
> then want a second pass to do the wrapping
>
>   :set tw=70
>   :g/<div [^>]*>.*<\/div>$/norm gqq

His destination is HTML so he doesn't need to wrap them: The browser
will wrap them for him.
 
SteveT

Steve Litt
Autumn 2020 featured book: Thriving in Tough Times
http://www.troubleshooters.com/thrive


Re: Substitute pattern over multiple lines

Steve Litt
In reply to this post by John Cordes
On Wed, 23 Dec 2020 21:16:20 -0400
John Cordes <[hidden email]> wrote:


>  One big problem with the first part is that I *only* want to
> concatenate the continuation lines when they appear immediately
> following a "2 NOTE..." tag, AND that "2 NOTE" tag must be either the
> next or next but one line after "2 TYPE tngnote".
>
>  I neglected to make it clear earlier that I need to first search on
> "2 TYPE tngnote" since there are other "2 TYPE" tags where I don't
> want to change anything.

Personally I'd do this as an AWK program (not an AWK one-liner). Have a
variable that gets incremented once when you hit "2 TYPE tngnote", gets
incremented again when you hit a "2 NOTE" 1 or 2 lines below, and
incremented again when you hit "3 CONC". If you increment twice like
this, you remove the "3 CONC" from the beginning of each "3 CONC"
line and output it. At the end of the continuations, you put a </div>.
This requires that you put the corresponding <div> just before you
output the "2 NOTE" line.

If, at any time, you hit a line that forecloses the possibility of such
line-grafting, you drop the variable back to its original value.

It would also be very easy in Python. Python's advantage is that it can
easily store lines and "look back" before printing them. AWK can do
that, but it's more difficult.

I know this is offtopic on this list, but I think any Vim or ex
solution that can be made will be fragile and difficult to understand.
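
The state-tracking idea described above might look roughly like this in Python. It is a sketch, not a tested converter: it uses named states instead of an incremented counter, the function name `wrap_tngnotes` and the `css_class` parameter are hypothetical, and it only joins lines that literally start with "3 CONC " (the Vim pattern in this thread also joins lines that start with a non-digit). It follows the rule stated earlier that the "2 NOTE" must be the next or next-but-one line after "2 TYPE tngnote".

```python
TYPE_TAG = "2 TYPE tngnote"
NOTE_TAG = "2 NOTE "
CONC_TAG = "3 CONC "


def wrap_tngnotes(lines, css_class="xxx"):
    """Wrap the NOTE that follows a tngnote TYPE in a div, joining CONC lines."""
    out = []
    state = "idle"  # "idle" -> "typed" (saw tngnote) -> "note" (inside its NOTE)
    gap = 0         # lines seen since the "2 TYPE tngnote" line
    for raw in lines:
        line = raw.rstrip("\r\n")
        if state == "note":
            if line.startswith(CONC_TAG):
                out[-1] += line[len(CONC_TAG):]  # graft continuation text
                continue
            out[-1] += "</div>"                  # continuations are over
            state = "idle"
        elif state == "typed":
            gap += 1
            if line.startswith(NOTE_TAG):
                text = line[len(NOTE_TAG):]
                out.append(f'{NOTE_TAG}<div class="{css_class}">{text}')
                state = "note"
                continue
            # Too far from the tngnote, or a different TYPE: give up on it.
            if gap == 2 or line.startswith("2 TYPE "):
                state = "idle"
        if line.startswith(TYPE_TAG):
            state, gap = "typed", 0
        out.append(line)
    if state == "note":
        out[-1] += "</div>"  # input ended while still inside a NOTE
    return out


sample = [
    "1 EVEN",
    "2 TYPE tngnote",
    "2 NOTE with his widowed mother in 1",
    "3 CONC 891; however",
    "3 CONC , I have not yet taken this step.",
    "1 BIRT",
]
print("\n".join(wrap_tngnotes(sample)))
```

Because the state drops back to "idle" on any other "2 TYPE " line, a NOTE belonging to a different TYPE is left untouched.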

SteveT

Steve Litt
Autumn 2020 featured book: Thriving in Tough Times
http://www.troubleshooters.com/thrive


Re: Substitute pattern over multiple lines

John Cordes
In reply to this post by Steve Litt


On Thu, Dec 24, 2020 at 3:43 PM Steve Litt <[hidden email]> wrote:
> On Wed, 23 Dec 2020 17:08:32 -0600
> Tim Chase <[hidden email]> wrote:
>
> >   2 NOTE \zs
> >
> > This does join *all* the lines and doesn't re-wrap them, so you'd
> > then want a second pass to do the wrapping
> >
> >   :set tw=70
> >   :g/<div [^>]*>.*<\/div>$/norm gqq
>
> His destination is HTML so he doesn't need to wrap them: The browser
> will wrap them for him.

  Correct.  Clearly Tim's initial response was intended to deal with the precise format I said I wanted the output to be in.

 Things are working very well now thanks to the excellent help from Tim.

 John Cordes 


Re: Substitute pattern over multiple lines

Tim Chase
In reply to this post by Steve Litt
On 2020-12-24 14:43, Steve Litt wrote:

> On Wed, 23 Dec 2020 17:08:32 -0600
> Tim Chase <[hidden email]> wrote:
>>   2 NOTE \zs
>>
>> This does join *all* the lines and doesn't re-wrap them, so you'd
>> then want a second pass to do the wrapping
>>
>>   :set tw=70
>>   :g/<div [^>]*>.*<\/div>$/norm gqq  
>
> His destination is HTML so he doesn't need to wrap them: The browser
> will wrap them for him.

However he also wrote

"""
I want to surround the text of the NOTE with a 'div' tag, so that
the final result should look like this:

=======================
1 EVEN
2 TYPE tngnote
2 NOTE <div class="xxx">I have included the children William,
Charles, Alice, and with his parents in 1881, and with his widowed
mother in 1891 (e.g. see my online transcription of the 1891
Smiths with James Moser, son of Henry Moser and Mary Henneberry,
and his wife Margaret Woodin; however, I have not yet taken this
step.</div>
1 BIRT
=======================
"""

which included the wrapping (even if the HTML rendering engine would
do that for him) in the example desired output, so I included the
fairly straight-forward means by which one could do that if needed.

-tim



Re: Substitute pattern over multiple lines

jcordes
In reply to this post by Steve Litt


On Thursday, 24 December 2020 at 16:01:35 UTC-4 stevelitt wrote:
> On Wed, 23 Dec 2020 21:16:20 -0400
> John Cordes <john....@...> wrote:
>
> > One big problem with the first part is that I *only* want to
> > concatenate the continuation lines when they appear immediately
> > following a "2 NOTE..." tag, AND that "2 NOTE" tag must be either the
> > next or next but one line after "2 TYPE tngnote".
> >
> > I neglected to make it clear earlier that I need to first search on
> > "2 TYPE tngnote" since there are other "2 TYPE" tags where I don't
> > want to change anything.
>
> Personally I'd do this as an AWK program (not an AWK one-liner). Have a
> variable that gets incremented once when you hit "2 TYPE tngnote", gets
> incremented again when you hit a "2 NOTE" 1 or 2 lines below, and
> incremented again when you hit "3 CONC". If you increment twice like
> this, you remove the "3 CONC" from the beginning of each "3 CONC"
> line and output it. At the end of the continuations, you put a </div>.
> This requires that you put the corresponding <div> just before you
> output the "2 NOTE" line.
>
> If, at any time, you hit a line that forecloses the possibility of such
> line-grafting, you drop the variable back to its original value.
>
> It would also be very easy in Python. Python's advantage is that it can
> easily store lines and "look back" before printing them. AWK can do
> that, but it's more difficult.
>
> I know this is offtopic on this list, but I think any Vim or ex
> solution that can be made will be fragile and difficult to understand.

 Steve,

  I do understand. I am quite sure that if I had asked my son for help with this we would have ended up with an AWK script. That has happened before for at least one vaguely similar sort of job (in the sense of storing lines and checking back). I just really like using Vim, even though my skills for the more advanced techniques are sadly lacking.
 I have intended for ages to learn Python (I know that it is generally said to be very easy to learn) but it hasn't happened - not sure it ever will.
 
 John
