"breakat" non-English chars when set linebreak and wrap

詹光耀
Hello!

I have quite a few text files that are mixed with English and non-English
chars such as Chinese. They are usually documents with very long lines,
where every line is a paragraph in itself. So I use "set wrap". For
English text, I prefer "set linebreak" so that a word is not broken at
the end of a screen line. But VIM doesn't work as I expected when breaking
lines at the chars specified in "breakat", especially with Chinese text,
where each character is a word on its own. For example:

    set linebreak
    set wrap

Now I have this text on one long line (I'll use X to represent a single
Chinese char in case you can't display it):

    English begins. English ends. Chinese begins.XXXXXXXXX.

Then I make the window a bit narrower. The line should wrap like this:

    English begins. English ends. Chinese begins.XXXXX
    XXXX.

This is because each Chinese char is a word on its own. I expect VIM to
break between Chinese chars as well as at the chars in "breakat". But VIM
actually wraps it like this:

    English begins. English ends. Chinese begins.
    XXXXXXXXX.

This happens even though there is still enough space to display some
Chinese chars after the period "." on the first line.

Is there any way to make VIM work as I expect?

Thank you!
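
A small Python model (not Vim's implementation) makes the difference between the actual and the expected wrapping concrete. It assumes Vim's documented default 'breakat' value " ^I!@*-+;:,./?", lets a screen line end after any of those characters, and adds a hypothetical `break_cjk` switch that also allows a break before or after any East Asian Wide character. The names `wrap`, `cells`, `can_break` and `break_cjk` are made up for illustration, and the model ignores details such as 'showbreak' and tab expansion.

```python
import unicodedata

# Vim's documented default for 'breakat' (" ^I!@*-+;:,./?"), with ^I written as "\t".
DEFAULT_BREAKAT = " \t!@*-+;:,./?"


def cells(ch: str) -> int:
    """Screen cells for one character: 2 for East Asian Wide/Fullwidth, else 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def can_break(text: str, i: int, breakat: str, break_cjk: bool) -> bool:
    """May a screen line end between text[i - 1] and text[i]?"""
    if text[i - 1] in breakat:
        return True
    # Hypothetical extension: every wide character counts as a word of its own.
    return break_cjk and (cells(text[i - 1]) == 2 or cells(text[i]) == 2)


def wrap(text: str, width: int, breakat: str = DEFAULT_BREAKAT,
         break_cjk: bool = False) -> list[str]:
    """Split one long line into screen lines of at most `width` cells."""
    lines = []
    start = 0
    while start < len(text):
        # Take as many characters as fit on this screen line.
        used, end = 0, start
        while end < len(text) and used + cells(text[end]) <= width:
            used += cells(text[end])
            end += 1
        if end == start:  # window narrower than a single character
            end = start + 1
        if end == len(text):
            lines.append(text[start:])
            break
        # Back up to the last allowed break point; if there is none, cut at the edge.
        cut = next((i for i in range(end, start, -1)
                    if can_break(text, i, breakat, break_cjk)), end)
        lines.append(text[start:cut])
        start = cut
    return lines


if __name__ == "__main__":
    line = "English begins. English ends. Chinese begins." + "字" * 9 + "."
    for screen_line in wrap(line, 55):                  # like Vim today
        print(screen_line)
    for screen_line in wrap(line, 55, break_cjk=True):  # what the poster expects
        print(screen_line)
```

With a 55-cell window, `wrap(line, 55)` breaks right after "begins." and moves all nine wide characters to the second screen line, while `wrap(line, 55, break_cjk=True)` keeps five of them on the first line, matching the layout described above.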

Re: "breakat" non-English chars when set linebreak and wrap

Bram Moolenaar

Yao G. Zhan wrote:

> I have quite a few text files that are mixed with English and non-English
> chars such as Chinese. They are usually documents with very long lines,
> where every line is a paragraph in itself. So I use "set wrap". For
> English text, I prefer "set linebreak" so that a word is not broken at
> the end of a screen line. But VIM doesn't work as I expected when breaking
> lines at the chars specified in "breakat", especially with Chinese text,
> where each character is a word on its own. For example:
>
>     set linebreak
>     set wrap
>
> Now I have this text on one long line (I'll use X to represent a single
> Chinese char in case you can't display it):
>
>     English begins. English ends. Chinese begins.XXXXXXXXX.
>
> Then I make the window a bit narrower. The line should wrap like this:
>
>     English begins. English ends. Chinese begins.XXXXX
>     XXXX.
>
> This is because each Chinese char is a word on its own. I expect VIM to
> break between Chinese chars as well as at the chars in "breakat". But VIM
> actually wraps it like this:
>
>     English begins. English ends. Chinese begins.
>     XXXXXXXXX.
>
> This happens even though there is still enough space to display some
> Chinese chars after the period "." on the first line.
>
> Is there any way to make VIM work as I expect?

I understand the problem.  'breakat' is a list of characters, thus it
doesn't allow a regexp or character range.  Adding all Chinese
characters to it would make it much too long.

Perhaps we could allow character ranges.  But something like "[a-z]"
currently means the five characters "[", "a", "-", "z" and "]".  Perhaps
doubling the square brackets isn't too bad: "[[a-z]]"?  Otherwise a
separate option could be used.
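
A rough Python sketch of how the doubled-bracket idea could be parsed. The "[[x-y]]" syntax is only the suggestion above, not an existing Vim feature, and `parse_breakat` / `is_break_char` are hypothetical names. The point of the design is that a plain "[a-z]" keeps its old meaning as five literal characters, so existing settings are unaffected.

```python
def parse_breakat(value: str) -> tuple[set[str], list[tuple[str, str]]]:
    """Split a 'breakat'-style value into single characters and ranges.

    Hypothetical syntax from the proposal: "[[x-y]]" is the range x..y.
    Anything else is literal, so "[a-z]" still means "[", "a", "-", "z", "]".
    """
    chars: set[str] = set()
    ranges: list[tuple[str, str]] = []
    i = 0
    while i < len(value):
        # A range item is exactly seven characters: "[[", lo, "-", hi, "]]".
        if (value.startswith("[[", i) and value[i + 3:i + 4] == "-"
                and value.startswith("]]", i + 5)):
            lo, hi = value[i + 2], value[i + 4]
            if lo > hi:
                raise ValueError(f"empty range [[{lo}-{hi}]] in breakat")
            ranges.append((lo, hi))
            i += 7
        else:
            chars.add(value[i])
            i += 1
    return chars, ranges


def is_break_char(ch: str, chars: set[str], ranges: list[tuple[str, str]]) -> bool:
    """True if ch is listed literally or falls inside one of the ranges."""
    return ch in chars or any(lo <= ch <= hi for lo, hi in ranges)
```

A value such as `" [[\u4e00-\u9fff]]"` would then cover the whole CJK Unified Ideographs block (U+4E00 to U+9FFF, 20,992 code points) with a single entry. Note that the range check is a linear scan per character, which is exactly the kind of extra cost raised below.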

Anyway, using a regexp here will certainly slow down processing.
Currently a 256-entry lookup table is used to speed up processing.  That
won't work for multi-byte characters...
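
The following Python sketch, with made-up names, illustrates why a byte-indexed table breaks down, assuming the table is filled from the raw UTF-8 bytes of 'breakat' (this is an illustration, not Vim's source). Adding 字 (UTF-8 E5 AD 97) also marks the lead byte E5, which 好 (E5 A5 BD) shares, so 好 would be flagged although it was never listed. `make_checker` shows one possible workaround, an assumption rather than anything Vim does: keep the 256-entry table for code points below 256 and fall back to a set lookup for everything else.

```python
def byte_table(breakat: str) -> list[bool]:
    """256-entry table indexed by byte value, filled from the UTF-8 bytes of breakat."""
    table = [False] * 256
    for b in breakat.encode("utf-8"):
        table[b] = True
    return table


def flagged_offsets(text: str, table: list[bool]) -> list[int]:
    """Byte offsets in text's UTF-8 encoding that the byte table marks as breaks."""
    return [i for i, b in enumerate(text.encode("utf-8")) if table[b]]


def make_checker(breakat: str):
    """Per-character check: a 256-entry table for code points below 256,
    plus a set for all other characters (an assumed fix, not Vim's code)."""
    table = [False] * 256
    others = set()
    for ch in breakat:
        if ord(ch) < 256:
            table[ord(ch)] = True
        else:
            others.add(ch)

    def is_break(ch: str) -> bool:
        cp = ord(ch)
        return table[cp] if cp < 256 else ch in others

    return is_break
```

The fallback keeps latin1 text on the fast table path, but it still requires every CJK character to be listed one by one, so on its own it does not solve the "much too long" problem.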

--
Nobody will ever need more than 640 kB RAM.
                -- Bill Gates, 1983
Windows 98 requires 16 MB RAM.
                -- Bill Gates, 1999
Logical conclusion: Nobody will ever need Windows 98.

 /// Bram Moolenaar -- [hidden email] -- http://www.Moolenaar.net   \\\
///        Sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\              Project leader for A-A-P -- http://www.A-A-P.org        ///
 \\\     Buy LOTR 3 and help AIDS victims -- http://ICCF.nl/lotr.html   ///

Re: "breakat" non-English chars when set linebreak and wrap

Camillo Särs
Hi Bram,

Bram Moolenaar wrote:
> Anyway, using a regexp here will certainly slow down processing.
> Currently a 256-entry lookup table is used to speed up processing.  That
> won't work for multi-byte characters...

Do you keep the Unicode character properties in memory somewhere?  In
that case you might want to consider doing a lookup in that table
instead.  Actually, I believe that's the only "right" solution that
would work reasonably correctly for any language.
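
Camillo's idea amounts to looking up a per-character property instead of the byte value. A minimal Python sketch under stated assumptions: the ranges below are a small hand-picked subset, not generated from the Unicode data files, and `breaks_anywhere` is a hypothetical name. Keeping the ranges sorted lets a binary search answer each query in O(log n) time for n ranges, which is likely much cheaper than running a regexp per character.

```python
import bisect

# Hypothetical, deliberately tiny table of code-point ranges whose characters
# may be wrapped before or after freely. A real table would be generated from
# Unicode's LineBreak.txt (UAX #14), which has many more classes and exceptions.
BREAK_ANYWHERE = [
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
]
_STARTS = [lo for lo, _ in BREAK_ANYWHERE]


def breaks_anywhere(ch: str) -> bool:
    """Binary-search the sorted ranges for the one that could contain ch."""
    cp = ord(ch)
    i = bisect.bisect_right(_STARTS, cp) - 1
    return i >= 0 and cp <= BREAK_ANYWHERE[i][1]
```

The ideographic full stop 。 (U+3002) is deliberately outside these ranges, since a line should not start with it; handling such punctuation properly is where the full Unicode rules come in.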

Regards,
Camillo
--
Camillo Särs <[hidden email]>             Aim for the impossible and you
http://camillo.särs.net                 will achieve the improbable

Re: "breakat" non-English chars when set linebreak and wrap

Bram Moolenaar

Camillo Särs wrote:

> Bram Moolenaar wrote:
> > Anyway, using a regexp here will certainly slow down processing.
> > Currently a 256-entry lookup table is used to speed up processing.  That
> > won't work for multi-byte characters...
>
> Do you keep the Unicode character properties in memory somewhere?  In
> that case you might want to consider doing a lookup in that table
> instead.  Actually, I believe that's the only "right" solution that
> would work reasonably correctly for any language.

There are a few properties of Unicode characters that Vim knows, such as
the cell width and upper/lower case.  But whether a sequence of characters
can be wrapped at any point isn't among them.  The rough separation into
latin1 and non-latin1 characters is sufficient when mixing Asian text
with English.  Perhaps that's good enough for most people.
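
The rough separation described above can be written as a single comparison per character pair. This Python sketch (hypothetical names, not Vim's code) allows a break wherever either neighbor is above U+00FF. Accented latin1 letters such as "é" stay together, while a break becomes possible at every boundary touching a CJK character, including before punctuation that follows one, which is where the rule is only rough.

```python
def rough_break_ok(prev: str, nxt: str) -> bool:
    """Rough rule: allow a wrap between two characters if either is outside latin1."""
    return ord(prev) > 0xFF or ord(nxt) > 0xFF


def break_positions(text: str) -> list[int]:
    """Indexes i where a screen line could end between text[i - 1] and text[i]."""
    return [i for i in range(1, len(text)) if rough_break_ok(text[i - 1], text[i])]
```

For example, `break_positions("ab字字.")` allows breaks before each 字 and before the final ".", while `break_positions("café")` allows none.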

--
hundred-and-one symptoms of being an internet addict:
120. You ask a friend, "What's that big shiny thing?" He says, "It's the sun."
