UTF bidi support


J.A.J. Pater

Dear vimmers,

When I open a UTF text file containing both right-to-left text (Hebrew in
this case) and left-to-right text (English in this case) in gedit, it is
rendered correctly (RTL is displayed as RTL, LTR is displayed as LTR).

But when I open the same file in gvim, the right-to-left text (Hebrew) is
shown left-to-right, just like the rest of the file (the English text).

Is there a way to get the same behaviour as in gedit?

I have already searched :help and found things like mlterm, termbidi, set
bomb etc., but I just can't get gvim to show the text the way gedit does.

Thanks in advance,

Adriaan


--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_use" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---


Re: UTF bidi support

Moshe Kamensky-2
No, there is no way to do it, I miss this as well... If you only want to
view text, I have a plugin for urxvt that does this.

Moshe

* J.A.J. Pater <[hidden email]> [02/07/09 08:09]:

>
> Dear vimmers,
>
> When I open a UTF text file containing both right-to-left text (Hebrew in
> this case) and left-to-right text (English in this case) in gedit, it is
> rendered correctly (RTL is displayed as RTL, LTR is displayed as LTR).
>
> But when I open the same file in gvim, the right-to-left text (Hebrew) is
> shown left-to-right, just like the rest of the file (the English text).
>
> Is there a way to get the same behaviour as in gedit?
>
> I have already searched :help and found things like mlterm, termbidi, set
> bomb etc., but I just can't get gvim to show the text the way gedit does.
>
> Thanks in advance,
>
> Adriaan
>

Re: UTF bidi support

Bram Moolenaar
In reply to this post by J.A.J. Pater


Adriaan Pater wrote:

> When I open a UTF text file containing both right-to-left text (Hebrew in
> this case) and left-to-right text (English in this case) in gedit, it is
> rendered correctly (RTL is displayed as RTL, LTR is displayed as LTR).
>
> But when I open the same file in gvim, the right-to-left text (Hebrew) is
> shown left-to-right, just like the rest of the file (the English text).
>
> Is there a way to get the same behaviour as in gedit?
>
> I have already searched :help and found things like mlterm, termbidi, set
> bomb etc., but I just can't get gvim to show the text the way gedit does.

No, the whole text is either LTR or RTL.

I have never understood why people put the text in the wrong order in
the file and then change the order when displaying it.  The characters
should be in the file in the order they are displayed.

Perhaps there is a filter that changes the order for this kind of file.

--
hundred-and-one symptoms of being an internet addict:
49. You never have to deal with busy signals when calling your ISP...because
    you never log off.

 /// Bram Moolenaar -- [hidden email] -- http://www.Moolenaar.net   \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\        download, build and distribute -- http://www.A-A-P.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///


Re: UTF bidi support

Tony Mechelynck

On 02/07/09 20:03, Bram Moolenaar wrote:

>
>
> Adriaan Pater wrote:
>
>> When I open a UTF text file containing both right-to-left text (Hebrew in
>> this case) and left-to-right text (English in this case) in gedit, it is
>> rendered correctly (RTL is displayed as RTL, LTR is displayed as LTR).
>>
>> But when I open the same file in gvim, the right-to-left text (Hebrew) is
>> shown left-to-right, just like the rest of the file (the English text).
>>
>> Is there a way to get the same behaviour as in gedit?
>>
>> I have already searched :help and found things like mlterm, termbidi, set
>> bomb etc., but I just can't get gvim to show the text the way gedit does.
>
> No, the whole text is either LTR or RTL.
>
> I have never understood why people put the text in the wrong order in
> the file and then change the order when displaying it.  The characters
> should be in the file in the order they are displayed.
>
> Perhaps there is a filter that changes the order for this kind of file.
>

If you put the characters into the file in the order they are displayed
regardless of the order they are pronounced, you'll get no end of
trouble when trying to reformat (to wider or narrower width) paragraphs
containing sentences (or book titles etc.) in both directions, or even
text containing separate paragraphs (quotations...) in the opposite
direction. To reformat text with mixed LTR and RTL paragraphs you'll
need to (1) reverse the order of characters in every "wrong-direction"
line; (2) reformat; (3) reverse again. If you have paragraphs in one
direction with at least two consecutive words in the opposite direction,
you'll have to take care of all the possibilities of line breaks coming
and going in the middle of the "wrong-direction" text.
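
The reformatting problem can be sketched in a few lines of Python. This is
only a toy under stated assumptions: uppercase ASCII words stand in for RTL
text, the paragraph is purely RTL, and "display" is a plain string reversal
rather than the real Unicode bidi algorithm. The names display_rtl, good and
bad are made up for the illustration.

```python
import textwrap


def display_rtl(line: str) -> str:
    """Toy display step for a pure-RTL line: show the characters right to left."""
    return line[::-1]


# Logical order: the words are stored in the order they are read.
logical = "FIRST SECOND THIRD FOURTH"

# Wrap the stored text first, then reorder each line for display.
good = [display_rtl(line) for line in textwrap.wrap(logical, width=12)]

# Presentation order: the file already holds the reversed (displayed) line,
# and the same wrapping is applied to it directly.
stored_visual = display_rtl(logical)
bad = textwrap.wrap(stored_visual, width=12)

print(good)  # ['DNOCES TSRIF', 'HTRUOF DRIHT']
print(bad)   # ['HTRUOF DRIHT', 'DNOCES TSRIF']
```

Read right to left, the first line of "good" is FIRST SECOND, as it should
be. In "bad" the paragraph now opens with THIRD FOURTH, which is why the
reverse, reformat, reverse dance becomes necessary with presentation order.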

The "right" sequence of letters in a file consists of putting the start
of every word before its end, and the words of every sentence in the
order they are pronounced. Then the reordering happens when displaying,
_after_ deciding where line breaks (if any) have to come. IIUC the worst
puzzle in that respect lies in the scripts specific to the Indian
subcontinent (not yet supported by Vim), which are LTR on the whole, but
with some vowels written to the left of the consonant which comes before
them.

Vim can display _each window_ as either LTR or RTL but not both; use
":setlocal invrightleft" to toggle (see |'rightleft'|), unless you are
running Console Vim in a true-bidi terminal, in which case (IIUC) setting
'termbidi' tells Vim that the terminal, not Vim, is in charge of bidi
display, Arabic shaping, etc.

If I have an English or French paragraph with one word in Hebrew or
Arabic, I'll keep it in 'norightleft' and know that gvim displays the
RTL word "the wrong way". Conversely for Arabic (RTL) text maybe
including some numbers (LTR even with Arabic-Indic digits), where I'll
use 'rightleft' and know that the numbers are displayed in gvim with the
digits reversed. OTOH if I have a file with long sentences in both LTR
and RTL I'll maybe display it in two split-windows, one of them
'rightleft' and the other 'norightleft'. Or else, I'll be busy with only
one language at a time and orient the window accordingly.

What I do to view (or print) a text (or HTML) file with mixed LTR and
RTL text is save it to disk, then display it in my favourite browser.
Thus Hebrew and Arabic appear right-to-left, English etc. appear
left-to-right and mixed-direction paragraphs are handled properly.


Best regards,
Tony.
--
Naeser's Law:
        You can make it foolproof, but you can't make it
damnfoolproof.


Re: UTF bidi support

bill lam
In reply to this post by Bram Moolenaar

On Thu, 02 Jul 2009, Bram Moolenaar wrote:
> I have never understood why people put the text in the wrong order in
> the file and then change the order when displaying it.  The characters
> should be in the file in the order they are displayed.

IIUC that is what bidi means. The text is in 'correct order' but
displayed as bidi.  E.g. numbers are LTR in Arabic, so in order to write
the sentence 'year2009' in Arabic on a piece of paper,
  (pretending letters in Arabic)
            y
           ey
          aey
         raey
  (jump some space before writing number)
     2   raey
     20  raey
     200 raey
     2009raey

--
regards,
====================================================
GPG key 1024D/4434BAB3 2008-08-24
gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3


Re: UTF bidi support

Tony Mechelynck

On 03/07/09 03:34, bill lam wrote:

>
> On Thu, 02 Jul 2009, Bram Moolenaar wrote:
>> I have never understood why people put the text in the wrong order in
>> the file and then change the order when displaying it.  The characters
>> should be in the file in the order they are displayed.
>
> IIUC that is what bidi means. The text is in 'correct order' but
> displayed as bidi.  E.g. numbers are LTR in Arabic, so in order to write
> the sentence 'year2009' in Arabic on a piece of paper,
>    (pretending letters in Arabic)
>              y
>             ey
>            aey
>           raey
>    (jump some space before writing number)
>       2   raey
>       20  raey
>       200 raey
>       2009raey
>

Well, I thought Bram meant that the characters should be in the file in
the order

2
0
0
9
<space>
r
a
e
y

if it's a "LTR file", and in the order

y
e
a
r
<space>
9
0
0
2

if it's a "RTL file" (more probable for Arabic).


The Unicode standard (with which I agree for reasons outlined in my
previous post, and I think you do too), is that they MUST be in the order

y
e
a
r
<space>
2
0
0
9

at least in a Unicode file. (IIUC, some legacy encodings may require one
of the other two).
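
To make the three orderings concrete, here is a minimal Python sketch. It
assumes the same pretend-Arabic convention as the example above (Latin
letters standing in for RTL letters) and a line whose only LTR runs are
numbers. The function name visual_order_rtl is made up for the illustration,
and this is a toy, not the Unicode bidi algorithm.

```python
import re


def visual_order_rtl(logical: str) -> str:
    """Toy display reordering for an RTL line whose only LTR runs are numbers."""
    # Lay the whole line out right to left ...
    reversed_line = logical[::-1]
    # ... then flip each run of digits back, since numbers read left to right.
    return re.sub(r"[0-9]+", lambda m: m.group(0)[::-1], reversed_line)


stored = "year 2009"          # logical order, as kept in the file
shown = visual_order_rtl(stored)
print(shown)                  # 2009 raey
```

The file keeps "year 2009"; only the display step produces "2009 raey".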


Best regards,
Tony.
--
"There is no reason for any individual to have a computer in their
home."
                -- Ken Olson, President of DEC, World Future Society
                   Convention, 1977


Re: UTF bidi support

Ken Bloom-2
In reply to this post by Bram Moolenaar

On Thu, 02 Jul 2009 20:03:02 +0200, Bram Moolenaar wrote:

> Adriaan Pater wrote:
>
>> When I open a UTF text file containing both right-to-left text (Hebrew in
>> this case) and left-to-right text (English in this case) in gedit, it is
>> rendered correctly (RTL is displayed as RTL, LTR is displayed as LTR).
>>
>> But when I open the same file in gvim, the right-to-left text (Hebrew) is
>> shown left-to-right, just like the rest of the file (the English text).
>>
>> Is there a way to get the same behaviour as in gedit?
>>
>> I have already searched :help and found things like mlterm, termbidi, set
>> bomb etc., but I just can't get gvim to show the text the way gedit does.
>
> No, the whole text is either LTR or RTL.
>
> I have never understood why people put the text in the wrong order in
> the file and then change the order when displaying it.  The characters
> should be in the file in the order they are displayed.
>
> Perhaps there is a filter that changes the order for this kind of file.

They're stored in the file in "logical order", which is the order that
the reader processes them when reading. That means, if he has an English
document with some embedded Hebrew, then when he encounters the first
Hebrew letter, his eyes will skip to the end of the Hebrew phrase (or the
end of the same line if it's a multi-line Hebrew phrase), and start
working backwards until he hits the English, at which point his eyes will
skip again across the Hebrew to the English text that follows the Hebrew.
This is "logical order", and it's the order he reads in.

It's also the order that a computer would use if it were:
* lexicographically comparing mixed-language strings
* performing text-to-speech conversion
* rewrapping paragraphs
* accepting text as it is typed at the keyboard

See pages 19-20 of the Unicode Standard 5.0 (available online at
http://unicode.org/versions/Unicode5.0.0/ch02.pdf).

Since display is the only part of the system that doesn't operate in
logical order, it's logical to put the conversion into the display
routines, rather than putting it into the file itself where it screws up
every other operation the computer has to perform on it.
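
As a rough illustration of the comparison point, the Python sketch below
sorts the same three words once in logical order and once as they would be
stored in presentation order. The assumptions are loud ones: uppercase ASCII
stands in for RTL letters, "presentation order" is a plain reversal, and
sorting is by code point rather than by any real collation. All names are
made up for the example.

```python
def to_visual(word: str) -> str:
    """Toy presentation order for a pure-RTL word: the characters reversed."""
    return word[::-1]


logical_words = ["BA", "AZ", "AB"]

# Sorting the logical strings compares first letters first, as a reader would.
sorted_logical = sorted(logical_words)

# Sorting the stored-as-displayed strings compares *last* letters first.
sorted_visual = sorted(to_visual(w) for w in logical_words)
read_back = [to_visual(w) for w in sorted_visual]

print(sorted_logical)  # ['AB', 'AZ', 'BA']
print(read_back)       # ['BA', 'AB', 'AZ']
```

The second list is what a plain sort yields if the file holds presentation
order: the words come out ordered by their final letters.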

--Ken

--
Chanoch (Ken) Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/



Re: UTF bidi support

Ali Gholami Rudi-3
In reply to this post by J.A.J. Pater

Hi,

"J.A.J. Pater" <[hidden email]> wrote:
> When I open a UTF text file containing both right-to-left text (Hebrew in
> this case) and left-to-right text (English in this case) in gedit, it is
> rendered correctly (RTL is displayed as RTL, LTR is displayed as LTR).
>
> But when I open the same file in gvim, the right-to-left text (Hebrew) is
> shown left-to-right, just like the rest of the file (the English text).
>
> Is there a way to get the same behaviour as in gedit?

As a workaround, I use this function to make editing files with mixed
rtl and ltr words easier:

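" make right-to-left editing easier
" replace Mylang and mykeymap with your own language and keymap names
function! Mylang()
        " set the keymap and right-to-left display locally; 'delcombine'
        " lets x delete combining characters one at a time
        setlocal keymap=mykeymap rl delcombine
        " use s-tab to switch the direction and keymap
        nnoremap <buffer> <silent> <s-tab> :let &imi=1-&imi<cr>:setlocal invrl<cr>
        " the same in insert mode (recursive on purpose, to reuse the mapping above)
        imap <buffer> <s-tab> <esc><s-tab>a
endfunction
command! Mylang call Mylang()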

        Ali


Re: UTF bidi support

J.A.J. Pater
In reply to this post by Tony Mechelynck

Tony Mechelynck wrote:
> Unless you are running Console Vim in a true-bidi terminal, in which
> case (IIUC) setting 'termbidi' tells Vim that the terminal, not Vim, is
> in charge of bidi display, Arabic shaping, etc.

OK, thanks!
This was the part I didn't understand.
Tested in mlterm and it works fine.

Although it would be nice to keep h and l going the same direction as in
LTR mode...

> OTOH if I have a file with long sentences in both LTR and RTL I'll
> maybe display it in two split-windows, one of them 'rightleft' and the
> other 'norightleft'. Or else, I'll be busy with only one language at a
> time and orient the window accordingly.

Good idea.



Re: UTF bidi support

J.A.J. Pater
In reply to this post by Ali Gholami Rudi-3

Ali Gholami Rudi wrote:

> As a workaround, I use this function to make editing files with mixed
> rtl and ltr words easier:
>
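> " make right-to-left editing easier
> " replace Mylang and mykeymap with your own language and keymap names
> function! Mylang()
>         " set the keymap and right-to-left display locally; 'delcombine'
>         " lets x delete combining characters one at a time
>         setlocal keymap=mykeymap rl delcombine
>         " use s-tab to switch the direction and keymap
>         nnoremap <buffer> <silent> <s-tab> :let &imi=1-&imi<cr>:setlocal invrl<cr>
>         " the same in insert mode (recursive on purpose, to reuse the mapping above)
>         imap <buffer> <s-tab> <esc><s-tab>a
> endfunction
> command! Mylang call Mylang()
>
> Ali

Thanks for the tip.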



Re: UTF bidi support

J.A.J. Pater
In reply to this post by Moshe Kamensky-2

Moshe Kamensky wrote:
> No, there is no way to do it, I miss this as well... If you only want to
> view text, I have a plugin for urxvt that does this.
>
> Moshe
Would this plugin not work while editing text?
I'd like to have it.



Re: UTF bidi support

Moshe Kamensky-2

* J.A.J. Pater <[hidden email]> [03/07/09 07:27]:
>
> Moshe Kamensky wrote:
> > No, there is no way to do it, I miss this as well... If you only want to
> > view text, I have a plugin for urxvt that does this.
> >
> > Moshe
> Would this plugin not work while editing text?
> I'd like to have it.
>

It won't work for editing, because it only works when the bidi algorithm
is applied once to each line. I attach the plugin. You should put it in
the same directory as the other urxvt perl plugins (possibly
/usr/lib/urxvt/perl), and add bidi to URxvt.perl-ext-common, or use the
-pe switch (see the man pages for urxvt and urxvtperl for details).
You will need Text::Bidi installed (from CPAN), which in turn needs
libfribidi.

There is a resource URxvt.bidiFieldSeparator that can be set to a
sequence of characters, each of which serves as a separator for the bidi
algorithm. This is useful for example to preserve the columns when using
a mail client within urxvt.

Best,
Moshe



[Attachment: bidi (3K)]

Re: UTF bidi support

Bram Moolenaar
In reply to this post by Tony Mechelynck


Tony Mechelynck wrote:

> On 03/07/09 03:34, bill lam wrote:
> >
> > On Thu, 02 Jul 2009, Bram Moolenaar wrote:
> >> I have never understood why people put the text in the wrong order in
> >> the file and then change the order when displaying it.  The characters
> >> should be in the file in the order they are displayed.
> >
> > IIUC that is what bidi means. The text is in 'correct order' but
> > displayed as bidi.  E.g. numbers are LTR in Arabic, so in order to write
> > the sentence 'year2009' in Arabic on a piece of paper,
> >    (pretending letters in Arabic)
> >              y
> >             ey
> >            aey
> >           raey
> >    (jump some space before writing number)
> >       2   raey
> >       20  raey
> >       200 raey
> >       2009raey
> >
>
> Well, I thought Bram meant that the characters should be in the file in
> the order
>
> 2
> 0
> 0
> 9
> <space>
> r
> a
> e
> y
>
> if it's a "LTR file", and in the order
>
> y
> e
> a
> r
> <space>
> 9
> 0
> 0
> 2
>
> if it's a "RTL file" (more probable for Arabic).
>
>
> The Unicode standard (with which I agree for reasons outlined in my
> previous post, and I think you do too), is that they MUST be in the order

Can you give a reference where Unicode specifies this?

> y
> e
> a
> r
> <space>
> 2
> 0
> 0
> 9
>
> at least in a Unicode file. (IIUC, some legacy encodings may require one
> of the other two).

--
hundred-and-one symptoms of being an internet addict:
54. You start tilting your head sideways to smile. :-)

 /// Bram Moolenaar -- [hidden email] -- http://www.Moolenaar.net   \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\        download, build and distribute -- http://www.A-A-P.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///


Re: UTF bidi support

Bram Moolenaar
In reply to this post by Ken Bloom-2


Ken Bloom wrote:

> On Thu, 02 Jul 2009 20:03:02 +0200, Bram Moolenaar wrote:
>
> > Adriaan Pater wrote:
> >
> >> When I open a UTF text file containing both right-to-left text (Hebrew
> >> in this case) and left-to-right text (English in this case) in gedit,
> >> it is rendered correctly (RTL is displayed as RTL, LTR is displayed as
> >> LTR).
> >>
> >> But when I open the same file in gvim, the right-to-left text (Hebrew)
> >> is shown left-to-right, just like the rest of the file (the English
> >> text).
> >>
> >> Is there a way to get the same behaviour as in gedit?
> >>
> >> I have already searched :help and found things like mlterm, termbidi,
> >> set bomb etc., but I just can't get gvim to show the text the way gedit
> >> does.
> >
> > No, the whole text is either LTR or RTL.
> >
> > I have never understood why people put the text in the wrong order in
> > the file and then change the order when displaying it.  The characters
> > should be in the file in the order they are displayed.
> >
> > Perhaps there is a filter that changes the order for this kind of file.
>
> They're stored in the file in "logical order", which is the order that
> the reader processes them when reading. That means, if he has an English
> document with some embedded Hebrew, then when he encounters the first
> Hebrew letter, his eyes will skip to the end of the Hebrew phrase (or the
> end of the same line if it's a multi-line Hebrew phrase), and start
> working backwards until he hits the English, at which point his eyes will
> skip again across the Hebrew to the English text that follows the Hebrew.
> This is "logical order", and it's the order he reads in.
>
> It's also the order that a computer would use if it were:
> * lexicographically comparing mixed-language strings
> * performing text-to-speech conversion
> * rewrapping paragraphs
> * accepting text as it is typed at the keyboard
>
> See pages 19-20 of the Unicode Standard 5.0 (available online at
> http://unicode.org/versions/Unicode5.0.0/ch02.pdf).
>
> Since display is the only part of the system that doesn't operate in
> logical order, it's logical to put the conversion into the display
> routines, rather than putting it into the file itself where it screws up
> every other operation the computer has to perform on it.

The display is not the only part.  Suppose you move your cursor to the
start of a word and type "dw".  You expect the word to be deleted.
Since "start of the word" depends on what direction the word is to be
read, the editor needs to understand the meaning of the word to be able
to decide what to do.  And it gets worse: What if some of the characters
in the word are LTR and some are RTL?  This quickly gets very
complicated.

So Vim uses a simple and reliable method: Display the text either as LTR
or RTL and do the editing assuming all text is to be read that way.

You can open two windows on the same text, one in LTR and one in RTL if
you want to edit mixed text.

It would be really messy to display the text with mixed directions and
then have all edits work one way or perhaps fail with an error.  Or
worse: delete the wrong text.

There are actually many more places where it matters: When
concatenating two files with text, "echo -n" in the shell, etc.
That's why i18n is so difficult.

I'm glad Australians don't write upside-down!


--
hundred-and-one symptoms of being an internet addict:
55. You ask your doctor to implant a gig in your brain.

 /// Bram Moolenaar -- [hidden email] -- http://www.Moolenaar.net   \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\        download, build and distribute -- http://www.A-A-P.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///


Re: UTF bidi support

Ali Gholami Rudi-3
In reply to this post by Bram Moolenaar

Bram Moolenaar <[hidden email]> wrote:

> > if it's a "LTR file", and in the order
> >
> > y
> > e
> > a
> > r
> > <space>
> > 9
> > 0
> > 0
> > 2
> >
> > if it's a "RTL file" (more probable for Arabic).
> >
> >
> > The Unicode standard (with which I agree for reasons outlined in my
> > previous post, and I think you do too), is that they MUST be in the order
>
> Can you give a reference where Unicode specifies this?

http://unicode.org/reports/tr9/ ?

Anyway, I don't think storing chars in presentation order is a good
idea.  Apart from problems when using the file (other tools expect the
logical order), this does not make the editor's task any easier; people
write text in the logical order (not the presentation order).  For
instance in your example, although the word looks like RAEY, people
write it as YEAR.  So for this to work, the editor would have to reverse
the order of the characters on insertion in every part of the text that
uses an RTL script (this algorithm is explained in the URL above).  So in
practice it might be even harder (or at least as hard).

        Ali


Re: UTF bidi support

J.A.J. Pater
In reply to this post by Bram Moolenaar
Bram Moolenaar wrote:
> The display is not the only part.  Suppose you move your cursor to the
> start of a word and type "dw".  You expect the word to be deleted.
> Since "start of the word" depends on what direction the word is to be
> read, the editor needs to understand the meaning of the word to be able
> to decide what to do.  And it gets worse: What if some of the characters
> in the word are LTR and some are RTL?  This quickly gets very
> complicated.
>
> So Vim uses a simple and reliable method: Display the text either as LTR
> or RTL and do the editing assuming all text is to be read that way.


These are good reasons for gvim to work the way it works!
So this editing 'mode' should be the default.

But it would still be nice if there were an optional 'real-bidi mode'.

> It would be really messy to display the text with mixed directions and
> then have all edits work one way or perhaps fail with an error.


Using the ViGedit plugin for gedit I'm also able to do some real-bidi typing.  :-)
Works almost the same as "mlterm+vim+set termbidi"
With one exception: in command mode h always goes left and l always goes right, just as I want it.
(It just feels weird to press h and see the cursor go right.)
Indeed, as far as I'm concerned a command like "dw" in RTL mode should delete from R to L.

Since gedit seems to be real-bidi I guess GTK+ has the algorithm mentioned by Ali sort of implemented.
Guess this could be used in gvim.

Indeed it will be quite hard to figure out how vim commands should work.
So maybe a 'real-bidi mode' could use only a subset of vim commands?

Well just my 2 cents.

Adriaan






Re: UTF bidi support

Tony Mechelynck
In reply to this post by Ali Gholami Rudi-3

On 03/07/09 20:52, Ali Gholami Rudi wrote:
> Bram Moolenaar<[hidden email]>  wrote:
[Tony wrote]
>>> The Unicode standard (with which I agree for reasons outlined in my
>>> previous post, and I think you do too), is that they MUST be in the order
>>>
>>> y
>>> e
>>> a
>>> r
>>> <space>
>>> 2
>>> 0
>>> 0
>>> 9
>>
>> Can you give a reference where Unicode specifies this?
>
> http://unicode.org/reports/tr9/ ?

Yeah, that's one place, though like most of the "normative" Unicode
texts it is very much "technical" -- the kind which will put non-techies
to sleep before they have a chance to get an idea of what is being
talked about.

This text is about determining how to display Unicode text given the
memory (or disk) representation, but it says near the top that the
representation is "logical" which means that the characters are stored
in memory or on disk in the order they would be pronounced or
handwritten (by someone who knows the language).
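
For the curious, the per-character property that the bidi algorithm starts
from can be inspected with Python's standard unicodedata module. The sketch
below only looks up bidi classes for a few sample characters; it does not
run the reordering itself, and the sample labels are just made up for the
printout.

```python
import unicodedata

# A few sample characters, written as escapes so the source stays plain ASCII.
samples = {
    "Latin a": "a",
    "Hebrew alef": "\u05d0",
    "Arabic alef": "\u0627",
    "European digit 2": "2",
    "Arabic-Indic digit 2": "\u0662",
    "space": " ",
}

# unicodedata.bidirectional() returns the character's bidi class as a string.
bidi_classes = {name: unicodedata.bidirectional(ch) for name, ch in samples.items()}

for name, bidi_class in bidi_classes.items():
    print(f"{name:22} {bidi_class}")
```

Letters come out as L (left-to-right), R or AL (right-to-left), while digits
have their own classes (EN, AN), which is how numbers inside Arabic text end
up displayed left to right even though the file stays in logical order.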


Best regards,
Tony.
--
"I don't think they could put him in a mental hospital.  On the other
hand, if he were already in, I don't think they'd let him out."


Re: UTF bidi support

Tony Mechelynck
In reply to this post by Bram Moolenaar

On 03/07/09 20:11, Bram Moolenaar wrote:
> Ken Bloom wrote:
[...]

>> They're stored in the file in "logical order", which is the order that
>> the reader processes them when reading. That means, if he has an English
>> document with some embedded Hebrew, then when he encounters the first
>> Hebrew letter, his eyes will skip to the end of the Hebrew phrase (or the
>> end of the same line if it's a multi-line Hebrew phrase), and start
>> working backwards until he hits the English, at which point his eyes will
>> skip again across the Hebrew to the English text that follows the Hebrew.
>> This is "logical order", and it's the order he reads in.
>>
>> It's also the order that a computer would use if it were:
>> * lexicographically comparing mixed-language strings
>> * performing text-to-speech conversion
>> * rewrapping paragraphs
>> * accepting text as it is typed at the keyboard
>>
>> See pages 19-20 of the Unicode Standard 5.0 (available online at
>> http://unicode.org/versions/Unicode5.0.0/ch02.pdf).
>>
>> Since display is the only part of the system that doesn't operate in
>> logical order, it's logical to put the conversion into the display
>> routines, rather than putting it into the file itself where it screws up
>> every other operation the computer has to perform on it.
>
> The display is not the only part.  Suppose you move your cursor to the
> start of a word and type "dw".  You expect the word to be deleted.
> Since "start of the word" depends on what direction the word is to be
> read, the editor needs to understand the meaning of the word to be able
> to decide what to do.  And it gets worse: What if some of the characters
> in the word are LTR and some are RTL?  This quickly gets very
> complicated.

With logical order, the start of a word is the letter which stands
earliest in memory. If you move your cursor to the leading alif of
Allah, stored in memory logically as ALLH (the second alif is usually
not written, or only as a diacritical mark above the second lam), then
do "dw", the word should be deleted until the heh, even though the alif
is displayed rightmost and the heh leftmost. Memory order is what
matters, and with logical storage the first letter is still the first
(though maybe not the leftmost one), not as if you stored Allah as HLLA
in memory.
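
A small Python sketch of that point, under loud assumptions: uppercase ASCII
stands in for the Arabic letters, delete_word is only a rough stand-in for
"dw" (it ignores Vim's real word rules such as 'iskeyword'), and display is
a toy that reverses each uppercase run instead of running the bidi
algorithm. None of these names come from Vim.

```python
import re

WORD_THEN_SPACE = re.compile(r"\S*\s*")


def delete_word(buffer: str, cursor: int) -> str:
    """Delete from the cursor up to the start of the next word, in memory order."""
    match = WORD_THEN_SPACE.match(buffer, cursor)
    return buffer[:cursor] + buffer[match.end():]


def display(buffer: str) -> str:
    """Toy mixed-direction display: reverse each run of uppercase 'RTL' letters."""
    return re.sub(r"[A-Z]+", lambda m: m.group(0)[::-1], buffer)


text = "say ALLH now"             # logical order: alif comes first in memory
cursor = text.index("A")          # cursor on the leading alif
print(display(text))              # say HLLA now
print(delete_word(text, cursor))  # say now
```

The deletion never looks at the display: it works on memory order, so the
whole word goes even though its first letter is shown rightmost.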

As for characters needing reordering within a single word, I suppose
that's one of the reasons why Vim doesn't yet support devanagari,
gujarati, and the other Indian-subcontinent scripts of that family.

>
> So Vim uses a simple and reliable method: Display the text either as LTR
> or RTL and do the editing assuming all text is to be read that way.
>
> You can open two windows on the same text, one in LTR and one in RTL if
> you want to edit mixed text.
>
> It would be really messy to display the text with mixed directions and
> then have all edits work one way or perhaps fail with an error.  Or
> worse: delete the wrong text.

IIUC it works correctly in Console mode with mlterm (a true-bidi
terminal) though in that case h and l will move the cursor in the
opposite direction when the underlying text is RTL: with my Allah
example, repeatedly hitting l moves from A to L to L to H which is
right-to-left but still logically first-to-last.

>
> There are actually many more places where it matters: When
> concatenating two files with text, "echo -n" in the shell, etc.
> That's why i18n is so difficult.

When concatenating files, assuming there is a paragraph break between
them, logical order gives flawless concatenation in all cases. With
"presentation order", even with a paragraph break you might have to
reverse each line of one of the files if they didn't have the same
direction, and then you would have to somehow know the LTR or RTL
direction of all three files (both inputs and the output) to begin with.

>
> I'm glad Australians don't write upside-down!
>
>

oh, they do, only they aren't conscious of it. ;-) Happily the mailboat
(or plane, or even the email transport) reverses it on the way when
they're writing to us, or we to them.


Best regards,
Tony.


Re: UTF bidi support

Tony Mechelynck
In reply to this post by J.A.J. Pater

On 03/07/09 21:05, J.A.J. Pater wrote:

> Bram Moolenaar wrote:
>  > The display is not the only part.  Suppose you move your cursor to the
>  > start of a word and type "dw".  You expect the word to be deleted.
>  > Since "start of the word" depends on what direction the word is to be
>  > read, the editor needs to understand the meaning of the word to be able
>  > to decide what to do.  And it gets worse: What if some of the characters
>  > in the word are LTR and some are RTL?  This quickly gets very
>  > complicated.
>  >
>  > So Vim uses a simple and reliable method: Display the text either as LTR
>  > or RTL and do the editing assuming all text is to be read that way.
>
> These are good reasons for gvim to work the way it works!
> So this editing 'mode' should be the default.
>
> But it would still be nice if there were an optional 'real-bidi mode'.
>
>  > It would be really messy to display the text with mixed directions and
>  > then have all edits work one way or perhaps fail with an error.
>
> Using the ViGedit plugin for gedit I'm also able to do some real-bidi
> typing. :-)
> Works almost the same as "mlterm+vim+set termbidi"
> With an exception: in command mode the h always goes left, the l always
> goes right, just as I want it.
> (It just feels weird to press h and have the cursor go right.)

Hm, I think it's one of those things one could get used to in time, no
harder than deleting with d rather than Ctrl-X, pasting with P rather
than Ctrl-V, and copying with y rather than Ctrl-C. I know there is
mswin.vim for the latter three, but IMO it is the result of a misguided
attempt to make Vim more like Notepad. Indeed, in true-bidi terminals
with 'termbidi' on (which should be the Vim default for mlterm) Vim has
no knowledge of character direction, so lllll goes uniformly
first-to-last, and hhhhh last-to-first, even if the movement is a little
jerky when meeting a direction change within a line of text. (Or did I
misunderstand? AFAICT I haven't got mlterm installed)

> Indeed as far as I'm concerned a command like "dw" in RTL mode should
> delete from R to L.
>
> Since gedit seems to be real-bidi I guess GTK+ has the algorithm
> mentioned by Ali sort of implemented.
> Guess this could be used in gvim.

My notion would be that a true-bidi gvim should work exactly like
vim+mlterm with 'termbidi'.
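
For the terminal case, a vimrc fragment along these lines should turn the
option on. It is only a sketch: it assumes a Vim built with +arabic
(without which 'termbidi' is not available) and that mlterm announces
itself with a $TERM value containing "mlterm", which may not hold on
every system:

```vim
" Sketch for a vimrc: tell Vim that the terminal does the bidi work itself.
" Assumptions: +arabic is compiled in, and $TERM contains "mlterm";
" adjust the test to whatever your own terminal reports.
if has('arabic') && !has('gui_running') && &term =~# 'mlterm'
  set termbidi
endif
```

":set termbidi?" shows whether the option ended up on.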

>
> Indeed it will be quite hard to figure out how vim commands should work.
> So maybe a 'real-bidi mode' could use only a subset of vim commands?
>
> Well just my 2 cents.
>
> Adriaan


Best regards,
Tony.
--
Meskimen's Law:
        There's never time to do it right, but there's always time to
do it over.


Re: UTF bidi support

Tony Mechelynck
In reply to this post by Ali Gholami Rudi-3

On 03/07/09 07:13, Ali Gholami Rudi wrote:

>
> Hi,
>
> "J.A.J. Pater"<[hidden email]>  wrote:
>> When I open a UTF-text file with right-to-left-text (hebrew in this
>> case) and left-to-right-text (english in this case) in gedit it is
>> rendered OK (rtl is displayed as rtl, ltr is displayed as ltr).
>>
>> But when I open the same file in gvim the right-to-left text (hebrew) is
>> shown as left-to-right text (just as the rest of the file, cq. english).
>>
>> Is there a way to get the same behaviour as in gedit?
>
> As a workaround, I use this function to make editing files with mixed
> rtl and ltr words easier:
>
> " make right to left editing easier
> " replace Mylang and mykeymap
> function! Mylang()
>          setlocal keymap=mykeymap rl delcombine
> " use s-tab to switch the direction and keymap
> map <buffer> <s-tab> :let &imi=1-&imi<cr>:setlocal invrl<cr>

.....................:let &l:imi = !&l:imi..........
no need to clobber the global setting
(for 'rl' you properly used :setlocal)

> " the same in insert mode
> imap<buffer>  <s-tab>  <esc><s-tab>a
> endfunction
> command Mylang call Mylang()
>
> Ali
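
Put together, the function with that change applied might look like the
sketch below. "mykeymap" is still a placeholder for a real keymap name,
and I have spelled the option names out in full; nothing else is meant to
differ from Ali's version:

```vim
" Sketch: Ali's workaround with a buffer-local 'iminsert' toggle.
" 'mykeymap' is a placeholder; substitute the keymap you actually use.
function! Mylang()
  " 'delcombine' is a global option, so :setlocal still sets it globally
  setlocal keymap=mykeymap rightleft delcombine
  " <S-Tab> toggles the keymap and flips the window direction together,
  " touching only the buffer-local value of 'iminsert'
  map <buffer> <S-Tab> :let &l:iminsert = !&l:iminsert<CR>:setlocal invrightleft<CR>
  " the same in Insert mode: leave it, run the mapping above, re-enter
  imap <buffer> <S-Tab> <Esc><S-Tab>a
endfunction
command! Mylang call Mylang()
```

After sourcing this, ":Mylang" sets up the current buffer, and <S-Tab>
then switches direction and keymap in one keystroke.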

Best regards,
Tony.
--
        The seven eyes of Ningauble the Wizard floated back to his hood
as he reported to Fafhrd: "I have seen much, yet cannot explain all.
The Gray Mouser is exactly twenty-five feet below the deepest cellar in
the palace of Gilpkerio Kistomerces.  Even though twenty-four parts in
twenty-five of him are dead, he is alive.

        "Now about Lankhmar.  She's been invaded, her walls breached
everywhere and desperate fighting is going on in the streets, by a
fierce host which out-numbers Lankhmar's inhabitants by fifty to one --
and equipped with all modern weapons.  Yet you can save the city."

        "How?" demanded Fafhrd.

        Ningauble shrugged.  "You're a hero.  You should know."
                -- Fritz Leiber, from "The Swords of Lankhmar"
