Hello!
I have quite a few text files that mix English with non-English characters such as Chinese. They are usually documents with very long lines, where every line is a paragraph in itself, so I use "set wrap". For English text I prefer "set linebreak", so that a word is not split at the end of a screen line. But VIM doesn't behave as I expected when it breaks the line at the characters specified in "breakat", especially with Chinese text, where each character is a word on its own.

For example:

    set linebreak
    set wrap

Now I have this text in one long line (I'll use X to represent a single Chinese character in case you can't display it):

    English begins. English ends. Chinese begins.XXXXXXXXX.

Then I resize the window to be a bit narrower. This line should wrap like:

    English begins. English ends. Chinese begins.XXXXX
    XXXX.

This is because each Chinese character is a word on its own. I expect VIM to break at Chinese characters as well as at the characters in "breakat". But VIM actually wraps it like:

    English begins. English ends. Chinese begins.
    XXXXXXXXX.

even though there is still enough space to display some Chinese characters after the period "." in the first line.

Is there any way to make VIM work as I expect?

Thank you!
Yao G. Zhan wrote:

> I have quite a few text files that is mixed with English and non-English
> chars such as Chinese. Usually they are documents that have very long
> lines that every line is a paragraph per se. So I use "set wrap". For
> English text, I prefer "set linebreak" so that a word would not break at
> the end of the screen line end. But VIM doesn't work as I expected by
> breaking the line at chars specified in "breakat", especially when with
> Chinese text where a character is a word on its own. For example:
>
> set linebreak
> set wrap
>
> now I have this text in a long line (I'll use X to represent a single
> Chinese char in case you can't display it.)
>
> English begins. English ends. Chinese begins.XXXXXXXXX.
>
> Then I resize the window a bit narrower. This line should wrap like:
>
> English begins. English ends. Chinese begins.XXXXX
> XXXX.
>
> This is because each Chinese char is a word on its own. I expect VIM to
> break at Chinese chars as well as "breakat". But actually VIM wraps it
> like:
>
> English begins. English ends. Chinese begins.
> XXXXXXXXX.
>
> Although there are still enough space to display some Chinese chars
> after the period sign "." in the first line.
>
> Is there any way that I can make VIM work as I expect?

I understand the problem. 'breakat' is a list of characters, thus it doesn't allow a regexp or character range. Adding all Chinese characters to it would make it much too long.

Perhaps we could allow character ranges. But previously something like "[a-z]" would mean the characters "][az-". Perhaps doubling the square brackets isn't too bad: "[[a-z]]"? Otherwise a separate option could be used.

Anyway, using a regexp here will certainly slow down processing. Currently a 256-entry lookup table is used to speed up processing. That won't work for multi-byte characters...

--
Nobody will ever need more than 640 kB RAM.
        -- Bill Gates, 1983
Windows 98 requires 16 MB RAM.
        -- Bill Gates, 1999
Logical conclusion: Nobody will ever need Windows 98.

 /// Bram Moolenaar -- [hidden email] -- http://www.Moolenaar.net \\\
/// Sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\ Project leader for A-A-P -- http://www.A-A-P.org ///
 \\\ Buy LOTR 3 and help AIDS victims -- http://ICCF.nl/lotr.html ///
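
The table limitation described in this reply can be illustrated with a minimal sketch. It is not Vim's actual C code. It is a Python approximation that assumes the documented default value of 'breakat', and the helper names `build_breakat_table` and `is_break_char` are hypothetical. It shows why a table with one entry per byte value has no slot for a Chinese character.

```python
# Assumed: the documented default of Vim's 'breakat' option
# (space, tab, and a handful of punctuation characters).
BREAKAT = " \t!@*-+;:,./?"


def build_breakat_table(breakat: str) -> list:
    """Build a 256-entry table: table[c] is True if a line may break after c."""
    table = [False] * 256
    for ch in breakat:
        code = ord(ch)
        if code < 256:  # only single-byte values have a slot
            table[code] = True
    return table


def is_break_char(table: list, ch: str) -> bool:
    """Constant-time lookup; anything outside 0..255 can never match."""
    code = ord(ch)
    return code < 256 and table[code]


table = build_breakat_table(BREAKAT)

# A Chinese character has a code point far above 255, and its UTF-8
# encoding is several bytes, none of which identifies it on its own.
zhong = "\u4e2d"
utf8_bytes = zhong.encode("utf-8")
```

With this layout, `is_break_char(table, ".")` is true while `is_break_char(table, zhong)` is always false, which matches the wrapping the original poster observed.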
Hi Bram,
Bram Moolenaar wrote:

> Anyway, using a regexp here will certainly slow down processing.
> Currently a 256-entry lookup table is used to speedup processing. That
> won't work for multi-byte characters...

Do you keep the Unicode character properties in memory somewhere? In that case you might want to consider doing a lookup in that table instead. Actually, I believe that's the only "right" solution that would work reasonably correctly under any language.

Regards,
Camillo

--
Camillo Särs <[hidden email]>    Aim for the impossible and you
http://camillo.särs.net          will achieve the improbable
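
A property lookup of the kind suggested here might look roughly like the sketch below. It is only an illustration in Python, not anything Vim provides: it assumes the East Asian Width property (via the standard `unicodedata` module) is an acceptable stand-in for "a line may break next to this character", which is cruder than a real line-breaking property. The names `is_wide` and `break_points` are hypothetical.

```python
import unicodedata


def is_wide(ch: str) -> bool:
    """True for characters whose East Asian Width is Wide or Fullwidth."""
    return unicodedata.east_asian_width(ch) in ("W", "F")


def break_points(text: str) -> list:
    """Indices i such that a screen line may end just before text[i].

    Assumption: a break is allowed on either side of a wide character.
    """
    return [
        i
        for i in range(1, len(text))
        if is_wide(text[i - 1]) or is_wide(text[i])
    ]


# "ab" followed by two Chinese characters and then "c"
sample = "ab\u4e2d\u6587c"
points = break_points(sample)  # [2, 3, 4]
```

The property is looked up per character, so no list of all Chinese characters has to be stored in an option value.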
Camillo Särs wrote:

> Bram Moolenaar wrote:
> > Anyway, using a regexp here will certainly slow down processing.
> > Currently a 256-entry lookup table is used to speedup processing. That
> > won't work for multi-byte characters...
>
> Do you keep the unicode charater properties in memory somewhere? In
> that case you might want to consider doing a lookup in that table
> instead. Actually, I believe that that's the only "right" solution that
> would work reasonably correctly under any language.

There are a few properties of Unicode characters that Vim knows, such as the cell width and upper/lower case. But whether a sequence of characters can be wrapped at any point isn't one of them.

A rough separation into latin1 and non-latin1 characters is enough when mixing Asian text with English. Perhaps that's sufficient for most people.

--
hundred-and-one symptoms of being an internet addict:
120. You ask a friend, "What's that big shiny thing?" He says, "It's the sun."

 /// Bram Moolenaar -- [hidden email] -- http://www.Moolenaar.net \\\
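
The rough latin1 / non-latin1 separation mentioned above can be tried out on the example from the first post. The following Python sketch is an assumption-laden model, not Vim's wrapping code: it treats any code point above 255 as a character that may be broken around, uses the documented default of 'breakat' for everything else, and counts wide characters as two screen cells. The names `wrap`, `can_break`, `cell_width` and the `cjk_breaks` flag are hypothetical.

```python
import unicodedata

BREAKAT = " \t!@*-+;:,./?"  # assumed default of 'breakat'


def cell_width(ch: str) -> int:
    """Screen cells used by ch: 2 for wide/fullwidth characters, else 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def can_break(left: str, right: str, cjk_breaks: bool) -> bool:
    """May a screen line end after `left` and continue with `right`?"""
    if cjk_breaks and (ord(left) > 255 or ord(right) > 255):
        return True  # rough rule: non-latin1 characters break anywhere
    return left in BREAKAT


def wrap(text: str, width: int, cjk_breaks: bool = True) -> list:
    """Greedy wrap of one long line into screen lines of `width` cells."""
    lines = []
    start = 0          # index where the current screen line starts
    used = 0           # cells used so far on the current screen line
    last_break = None  # most recent index the line may be cut at
    i = 0
    while i < len(text):
        w = cell_width(text[i])
        if used + w > width and i > start:
            # Cut at the last allowed break, or mid-word if there is none.
            cut = last_break if last_break is not None else i
            lines.append(text[start:cut])
            start = cut
            used = sum(cell_width(c) for c in text[start:i])
            last_break = None
            continue
        used += w
        if i + 1 < len(text) and can_break(text[i], text[i + 1], cjk_breaks):
            last_break = i + 1
        i += 1
    lines.append(text[start:])
    return lines


# The example line, with a real Chinese character in place of each X.
line = "English begins. English ends. Chinese begins." + "\u4e2d" * 9 + "."
expected_by_poster = wrap(line, 56, cjk_breaks=True)
breakat_only = wrap(line, 56, cjk_breaks=False)
```

In a 56-cell window, `breakat_only` puts all nine Chinese characters on the second line, while `expected_by_poster` keeps five of them on the first line, as the original poster wanted. The rule is deliberately crude: it would also allow a break right before the final period.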