using vim to add <a href= ...> links to an epub index file


Chris Jones-44
I am currently in the final stages of putting together an epub version of
Auguste Escoffier's _Le Guide Culinaire_.

Since this is a "cookbook" of sorts, the last step before proofreading
pretty much requires building a working index with html style links to
the text relative to each entry.

In an epub context this can be achieved by wrapping the text of each
entry in something of the form:

            <a href="../Text/file.xhtml#p0001">index_entry</a>

where "file.xhtml" is one of the files making up the text of the e-book
and "p0001" has been defined via an id attribute (< ... id="p0001">)
within that file.

There are over 6000 entries in this index, which (loudly) suggests that
in this instance it might be worth spending a few hours concocting some
form of automated solution to add all the <a href=...> links to the file
in one fell swoop rather than doing it manually.

The index is a repetition of lines with the following structure:


            <div class="ind-01"></div>
              <div class="ind-02">Abatis</div>
              <div class="ind-03">621</div>

            <div class="ind-01"></div>
              <div class="ind-02">    —     à la Bourguignonne</div>
              <div class="ind-03">621</div>

              ...


After loading the index file in a vim buffer I have found that:

1. I can match all page entries in a non-ambiguous manner by a search
   with the following pattern: "/\d\+<"

   The match as highlighted via ":set hlsearch" includes the page number
   and nothing else and the cursor sits on the first digit of the page
   number.

2. I can invoke the following one-liner from vim with the page number as
   an argument and it returns the generated link:


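       #!/bin/bash
       # Usage: My_script PAGE    (e.g. My_script 0621)

       # List every pNNNN anchor as "file:pNNNN", turn each hit into an
       # opening <a href=...> tag, then keep the tag for the requested page.
       grep -o 'p0[0-9][0-9][0-9]' *.htm |
         awk 'BEGIN { FS=":" } { print "<a href=\"../Text/" $1 "#" $2 "\">" }' |
         grep "$1"

       exit 0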


  ... like so:

       :r ! My_script 0621

  generates the link and writes it to the vim buffer:

       <a href="../Text/gc0306.htm#p0621">
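
For comparison, the same lookup can be done by scanning the text files once
and keeping the results in a table, instead of running the grep pipeline for
every page. This is only a minimal Python sketch: the function names
build_anchor_map and opening_tag are made up for illustration, and it assumes
the anchors appear literally as id="pNNNN" in *.htm files sitting in one
directory.

```python
import re
from pathlib import Path

# Assumption: anchors look like id="p0621" (four digits after the "p").
ANCHOR_RE = re.compile(r'id="(p\d{4})"')


def build_anchor_map(text_dir, pattern="*.htm"):
    """Return {anchor_id: href} for every id="pNNNN" found under text_dir."""
    anchors = {}
    for path in sorted(Path(text_dir).glob(pattern)):
        content = path.read_text(encoding="utf-8")
        for anchor in ANCHOR_RE.findall(content):
            # Keep the first file that defines a given anchor.
            anchors.setdefault(anchor, f"../Text/{path.name}#{anchor}")
    return anchors


def opening_tag(anchors, page):
    """Return the opening <a> tag for a page number such as '621' or '0621'."""
    anchor = f"p{int(page):04d}"
    return f'<a href="{anchors[anchor]}">'
```

With a map built once, opening_tag(anchors, "621") plays the role of
"My_script 0621"; a page with no matching anchor raises KeyError rather than
returning an empty string.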

What I am missing at this point:

1. I need to retrieve the matched string of the current "/\d\+<" search
   and place it in some kind of vim variable (?) that I can use to
   invoke the script so that it can be done iteratively without having
   to type the page number manually:

       :r ! My_script $vim_variable

2. I need to find a way to remove any new-line character(s) so that the
   output of "My_script $vim_variable" is placed at the right spot in
   the buffer: after I invoke the script using ":r ! My_script"... the
   output is inserted in column 0 on a new line immediately after the
   matching string:


       <div class="ind-01"></div>
         <div class="ind-02">Abatis</div>
         <div class="ind-03">621</div>
   <a href="../Text/gc0306.htm#p0621">


3. A third issue is adding the closing "</a>" tag after the targeted
   text, thus completing the wrapping of the entry so that the end
   result of one iteration looks exactly like this:


       <div class="ind-01"></div>
         <div class="ind-02"><a href="../Text/gc0306.htm#p0621"> Abatis</a></div>
         <div class="ind-03">621</div>


In other words, I need to put together some kind of front-end...
presumably in vimscript (so that I have the ability to navigate the
lines in the buffer)... that does the three things described above:

1. grab the current matched string/page number, pass it to the bash
   one-liner to generate the corresponding <a href="..."> and return
   the result to vim.

2. move the cursor to the first character of the corresponding index
   entry (the text and the page number are vertically aligned so that
   hitting "k" on the keyboard does exactly that...) and insert the
   generated text before the cursor (iow, what a Shift-P would do)

3. jump to the opening "<" of the closing </div> tag and insert "</a>"
   before the cursor.
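
Outside vim, the three steps above can be collapsed into a single regex
substitution over each ind-02/ind-03 pair. What follows is a minimal Python
sketch under stated assumptions, not a tested solution for the real file: the
name link_entries is hypothetical, the page-to-href table is passed in as a
plain dict, and every entry is assumed to have exactly the two-line
ind-02/ind-03 layout shown earlier.

```python
import re

# One index entry: the ind-02 text followed by its ind-03 page number.
ENTRY_RE = re.compile(
    r'(<div class="ind-02">)(.*?)(</div>\s*<div class="ind-03">)(\d+)(</div>)'
)


def link_entries(index_html, anchors):
    """Wrap each ind-02 entry in <a href=...>...</a> using its page number.

    anchors maps ids such as "p0621" to hrefs such as
    "../Text/gc0306.htm#p0621". Entries whose page is missing from the
    map are left untouched.
    """
    def wrap(match):
        open_div, entry, middle, page, close_div = match.groups()
        href = anchors.get(f"p{int(page):04d}")
        if href is None:
            return match.group(0)
        return f'{open_div}<a href="{href}">{entry}</a>{middle}{page}{close_div}'

    return ENTRY_RE.sub(wrap, index_html)
```

Leaving unmatched pages untouched makes it easy to search afterwards for
ind-02 lines that still lack a link and fix those by hand.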

Another approach I considered might consist of recording a vim macro
that would reproduce manual actions at the keyboard and run it
iteratively against the buffer. But I doubt line-mode commands such as
":r ! ..." would be recorded.

Please let me know if this is at all feasible in vim (and vim might
offer better means of achieving what I am trying to do) or whether
I should look at other options.

Thanks,

CJ

--
--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php

---
You received this message because you are subscribed to the Google Groups "vim_use" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.

Re: using vim to add <a href= ...> links to an epub index file

porphyry5
On Thursday, November 23, 2017 at 3:53:58 PM UTC-8, Chris Jones wrote:

> [..]

Substitute (:h :s) will do all you need. For links and anchors, I adapt the following model to the specific situation in each case:

:%s/ \(_\(\w\+\)\)/ <a href="#\1">\2<\/a>/g|:%s/^_\w\+$/<a name="&"><\/a>/

Being simple minded, I just ensure that anchors always occur at the start of lines, and that links never do.
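
For readers more at home outside vim, the two substitutions above can be read as follows. This is only a rough Python rendering of the same patterns; the function name add_links_and_anchors is made up for illustration, and re.ASCII is used so that \w behaves roughly like vim's \w.

```python
import re


def add_links_and_anchors(text):
    """Apply the two substitutions, in the same order as the vim command."""
    # " _word" (preceded by a space) becomes a link to the anchor "#_word".
    text = re.sub(r' (_(\w+))', r' <a href="#\1">\2</a>', text, flags=re.ASCII)
    # A line consisting only of "_word" becomes the anchor itself;
    # \g<0> is the whole match, like & in vim.
    text = re.sub(r'^_\w+$', r'<a name="\g<0>"></a>', text,
                  flags=re.ASCII | re.MULTILINE)
    return text
```

The leading space in the first pattern and the ^ in the second are what keep links and anchors apart.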


Re: using vim to add <a href= ...> links to an epub index file

Chris Jones-44
On Fri, Nov 24, 2017 at 12:43:46PM EST, porphyry5 wrote:
> On Thursday, November 23, 2017 at 3:53:58 PM UTC-8, Chris Jones wrote:

[..]
>
> Substitute (:h :s) will do all you need. In the case of links and
> anchors, I modify this model to the specific situation in each case:
>
> :%s/ \(_\(\w\+\)\)/ <a href="#\1">\2<\/a>/g|:%s/^_\w\+$/<a name="&"><\/a>/

Do you mean using submatch(0) to retrieve what /\d\+<  actually matched
in the current iteration?

So far this seems to be the only way to retrieve the string that a regex
actually matches... alas, according to ":help submatch", submatch() can
only be used in the replacement part of a :substitute command, which is
not what I had in mind.

Just curious. I gave up on the idea of using vim in this instance and
wrote a ~10-line Python script that rewrites the file, adding the
links where relevant.

> Being simple minded, I just ensure that anchors always occur at the
> start of lines, and that links never do.

Always try to eat off of a clean plate when you can. The index file as
tidied up by yours truly was nice and clean to start with... My little
script only created ~10 faulty <a href= > links out of the 6,000+...
which took c. 10 minutes to edit.

All the same & just for the hell of it... doing it in vim would have
been more satisfying.

So if you could afford the time... could you explain the vim solution
you had in mind? I'm still interested.

Thanks,

CJ


Re: using vim to add <a href= ...> links to an epub index file

porphyry5
On Saturday, November 25, 2017 at 10:25:32 AM UTC-8, Chris Jones wrote:

> [..]

I was referring to the :substitute command, which can use submatch() if need be, though that is usually not necessary.
Entering :h :s<Enter> at the command line brings up the help for :substitute. :s is what is usually typed, being the shortest abbreviation of :substitute that vim recognizes.
There is also an associated function, substitute(), which works almost identically to :substitute.

You really need to read the help chapters usr_27.txt and pattern.txt (:h usr_27 and :h pattern); I cannot possibly give a brief overview of vim's pattern matching and text manipulation abilities.

I mostly correct OCR-ed texts and convert them to .txt, .html and .epub. The pair of :s commands I supplied is literally all I ever need to produce the page-number anchors and the links within the index (occasionally I need a minor modification of the pattern). But I do this early in the conversion process, when it is simple to tell links from anchors. You have left yours until much later, so your pattern will be more complex, but the general principles still apply.
