vim + win + utf-8 => I'm lost

vim + win + utf-8 => I'm lost

Mojca Miklavec
Hello,

1. I've been using vim for quite some time as a basic user, but I cannot
figure out how to type unicode under Windows. When I installed vim on a
friend's computer, Windows + unicode was no problem at all; it's just my
computer that's causing problems.

Someone said that probably windows doesn't pass the proper characters
from keyboard to vim and I figured out that he was probably right. I'm
using cp1250 by default and it works perfectly. :set encoding=utf-8 also
works, but I can't type in anything but plain ASCII. Copy-paste from and
to other programs works OK.

In Control panel -> Regional options -> Advanced, there's an option
"Select the language version for the programs which don't support
Unicode". I selected Slovenian (cp1250) and cp1250 actually works with
vim. Unicode in Mozilla also works without any problems. If I connect to
a remote computer with putty (ssh) and use vim there, typing unicode is
no problem at all.

I also worked with a computer under linux, where the encoding in locale
was latin1. I also couldn't type our characters (ccaron,
scaron) from keyboard there, although kwrite, Mozilla, OpenOffice and
many other graphical programs had no problems dealing with unicode and a
foreign keyboard.

2. If I have a file in ISO-8859-2 encoding, I can't open it properly.
:set encoding=latin2 doesn't have any influence on the way I see
accented characters. The only remaining option is recode or other text
editor.

3. I don't need to read Swahili and I don't need to have all the
10^\infinity Chinese figures, but with the font that vim uses by default
(fixedsys) I can't see cyrillic, greek, euro symbol and some of the very
common characters from European languages (with ogonek, cedilla, stroke,
...). Which fonts can be recommended?

Thank you very much for any hints,
     Mojca

Re: vim + win + utf-8 => I'm lost

Mojca Miklavec
> Someone said that probably windows doesn't pass the proper characters
> from keyboard to vim and I figured out that he was probably right. I'm
> using cp1250 by default and it works perfect. :set encoding=utf-8 also
> works, but I can't type in anything but plain ASCII. Copy-paste from and
> to other programs works OK.

I'm sorry. It seems that I had to flood the mailing list before
discovering the :set termencoding=cp1250 command by myself (and I've
been looking for it for at least two years). However, the other two
questions are still relevant and I would still be interested in the
answer about how to convince Windows to send proper unicode to the
editor.

Thank you,
    Mojca

Re: vim + win + utf-8 => I'm lost

A.J.Mechelynck
In reply to this post by Mojca Miklavec
----- Original Message -----
From: "Mojca Miklavec" <[hidden email]>
To: <[hidden email]>
Sent: Friday, August 05, 2005 2:23 AM
Subject: vim + win + utf-8 => I'm lost



I've written a few tips and scripts about Vim and Unicode; and I am on
Windows myself -- currently XP, but before that I was on 98 which didn't go
as smoothly.

Here are the links:

http://vim.sourceforge.net/tips/tip.php?tip_id=246 (tip) "Working with
Unicode"

http://vim.sourceforge.net/scripts/script.php?script_id=789 (script)
"Switching to Unicode in an orderly manner"

http://vim.sourceforge.net/tips/tip.php?tip_id=632 (tip) "Setting the font
in the GUI"

    Notes about the latter:
        * This is not specifically Unicode-related, but many good-looking
          fonts don't have a wide variety of glyphs in different scripts.
          Myself, I use Lucida_Console for Latin, Courier_New for
          non-East-Asian Unicode, MingLiU for Traditional Chinese. YMMV.
        * Not for you, but maybe for others: The way to do it in kvim is
          explained in the "user comments".
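
As a rough illustration of the font notes above, a vimrc fragment along these lines selects a font in the Windows GUI. The font name is one of those mentioned above; the point size is an arbitrary assumption, and the font must be installed for the setting to take effect.

```vim
" Sketch only: pick a GUI font with broader glyph coverage than Fixedsys.
" The height (h10) is an arbitrary choice, not a recommendation.
if has("gui_running") && has("gui_win32")
  " Underscores stand in for spaces in Win32 font names
  set guifont=Courier_New:h10:cDEFAULT
endif
```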

In particular, when you switch over from your Windows-default encoding to
UTF-8, your 'termencoding' should not remain empty. It should always jibe
with what your keyboard is actually inputting, and that hasn't changed. See
how the script above does it.
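
A minimal vimrc sketch of that idea, assuming it runs early in the vimrc, before any file is loaded and while 'encoding' still holds the system default (cp1250 in this case). It is an illustration, not a copy of the script linked above:

```vim
" Sketch: remember the keyboard encoding before switching Vim to UTF-8.
if has("multi_byte")
  if &termencoding == ""
    " 'encoding' still holds the system default here (e.g. cp1250),
    " which is what the keyboard keeps sending.
    let &termencoding = &encoding
  endif
  set encoding=utf-8
endif
```

The order matters: once 'encoding' has been changed, the original value is no longer available to copy into 'termencoding'.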

To type "special" characters not on your keyboard, see ":help digraph.txt".
Here are a few examples (where ^K means "hit Ctrl-K"):

    ^Kc<    gives    č    LATIN SMALL LETTER C WITH CARON
    ^KS<    gives    Š    LATIN CAPITAL LETTER S WITH CARON

etc. (see ":help digraphs-default" for some widely used "second characters"
in digraphs).
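
For exploring digraphs, a few commands along these lines may help. The "eu" pair is a hypothetical choice made up for this example; 8364 is the decimal value of U+20AC, the euro sign.

```vim
" List every digraph currently defined, with its two-character code
digraphs

" Define an additional digraph (hypothetical pair "eu"):
" afterwards CTRL-K e u inserts the euro sign, U+20AC (decimal 8364)
digraphs eu 8364

" In Normal mode, typing  ga  on a character shows its code point
" in decimal, hex and octal.
```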

If, after reading all this, you have more questions, feel free to come back
to the list.


HTH,
Tony.



Re: vim + win + utf-8 => I'm lost

A.J.Mechelynck
In reply to this post by Mojca Miklavec
----- Original Message -----
From: "Mojca Miklavec" <[hidden email]>
To: <[hidden email]>
Sent: Friday, August 05, 2005 3:28 AM
Subject: Re: vim + win + utf-8 => I'm lost


>> Someone said that probably windows doesn't pass the proper characters
>> from keyboard to vim and I figured out that he was probably right. I'm
>> using cp1250 by default and it works perfect. :set encoding=utf-8 also
>> works, but I can't type in anything but plain ASCII. Copy-paste from and
>> to other programs works OK.
>
> I'm sorry. It seems that I had to flood the mailing list before
> discovering the :set termencoding=cp1250 command by myself (and I've
> been looking for it for at least two years). However, the other two
> questions are still relevant and I would still be interested in the
> answer about how to convince Windows to send proper unicode to the
> editor.
>
> Thank you,
>    Mojca

See my other reply.

See also, in addition to my tips and script listed over there

    :help digraph.txt
    :help i_CTRL-V_digit
    :help mbyte-keymap
    :help 'langmap'

* Digraphs are a great method to type "simple" Unicode characters like
c-caron, s-cedilla, o-slash, oe-ligature, one-half, etc.
* ^Vuxxxx and ^VUxxxxxxxx (where each x is a hex digit) are invaluable when
you know the codepoint number but don't have a handy digraph.
* Keymaps are very useful to define an "alternate keyboard" for a given
language and to switch "on the fly" between that and English.
* The 'langmap' option is useful to type Vim commands in Latin alphabet when
your "native" encoding is something else, for instance Cyrillic or Greek.
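
The methods listed above might be tried out as follows. This is only a sketch: it assumes 'encoding' is utf-8, that the russian-jcuken keymap file shipped with Vim is present in the runtime directory, and that Vim was built with the langmap feature. The 'langmap' fragment is a shortened adaptation of the Greek example in ":help 'langmap'".

```vim
" In Insert mode, CTRL-V u 0 1 0 d inserts U+010D (c with caron).
" The same character can be produced from a script:
echo nr2char(0x10d)

" Load a keymap; in Insert mode CTRL-^ toggles it on and off.
set keymap=russian-jcuken

" Let a few Greek letters act as Latin keys for Normal-mode commands.
set langmap=αa,βb,δd,εe
```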

HTH,
Tony.



Re: vim + win + utf-8 => I'm lost

Mojca Miklavec
Tony, thank you very much for all the hints. Digraphs, termencoding
and langmap (once I write some definitions) now solve 85% of my
problems. I would be glad if windows could communicate with vim in
unicode directly, but I can live with the intermediate step in cp1250 for
now.

1. Now another question: I have plenty of material in cp1250. Can I
write something like that in vimrc:

if (file seems to be in utf-8 or if this is a new window)
    set encoding=utf-8
else
    set encoding=cp1250
?

2. Does anyone have any idea why I can't set the latin2 encoding? (I
can set it, but the files are not displayed any different than if
cp1250 encoding is set. The worst thing is that probably 10 characters
are at some other place, but exactly the ones I need are displayed
wrong.)

Thanks,
    Mojca



Re: vim + win + utf-8 => I'm lost

A.J.Mechelynck
----- Original Message -----
From: "Mojca Miklavec" <[hidden email]>
To: <[hidden email]>
Sent: Sunday, August 07, 2005 2:37 AM
Subject: Re: vim + win + utf-8 => I'm lost


> Tony, thank you very much for all the hints. Digraphs, termencoding
> and langmap (once I write some definitions) now solve 85% of my
> problems. I would be glad if windows could communicate with vim in
> unicode directly, but I can live with intermediate step in cp1250 for
> now.
>
> 1. Now another question: I have plenty of material in cp1250. Can I
> write something like that in vimrc:
>
> if (file seems to be in utf-8 or if this is a new window)
>    set encoding=utf-8
> else
>    set encoding=cp1250
> ?

    set encoding=utf-8 termencoding=cp1250
    set fileencodings=ucs-bom,utf-8,cp1250

This will set 'fileencoding' and 'bomb' buffer-locally to:

    1. bomb fileencoding=<the proper Unicode encoding (of the 5 possible)>
        for any file with a BOM
    2. nobomb fileencoding=utf-8
        - for an empty (or new) file
        - for a file which doesn't contain invalid byte sequences for UTF-8
    3. nobomb fileencoding=cp1250
        otherwise

Note that ucs-bom should always be first, that there should be at most one
8-bit encoding, and that it should be last.

These 3 steps are run in that order. Step 1 is what Windows does. It will
recognise UTF-8 files with BOM there, i.e., files whose first 3 bytes are EF
BB BF in hex (codepoint U+FEFF). To add a BOM to any Unicode file of yours,
use ":setlocal bomb".
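
To see what the detection chose for a given buffer, and to convert a legacy file, something like the following could be used. It is a sketch that assumes the file is already loaded in the current buffer; the conversion rewrites the bytes on disk, so a backup is advisable.

```vim
" After opening a file, show what the detection chose:
setlocal fileencoding? bomb?

" Convert the current buffer to UTF-8 on the next write:
setlocal fileencoding=utf-8
write
```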

Below is my 'statusline' setting, you may or may not find it useful. It
displays the 'fileencoding' (or 'encoding' if 'fileencoding' is empty), the
'bomb' status and (if any) the current keymap. Disregard any linebreaks
added by my mail client; it should be all on one line, and spurious line
breaks should be replaced by spaces.

     set statusline=%<%f\ %h%m%r%=%k[%{(&fenc==\"\")?&enc:&fenc}%{(&bomb?\",BOM\":\"\")}]\ %-14.(%l,%c%V%)\ %P


>
> 2. Does anyone have any idea why I can't set the latin2 encoding? (I
> can set it, but the files are not displayed any different than if
> cp1250 encoding is set. The worst thing is that probably 10 characters
> are at some other place, but exactly the ones I need are displayed
> wrong.)

The Vim name is iso-8859-2 and you may need a working iconv.dll in your
PATH. I got my iconv.exe and iconv.dll from the GnuWin32 project on
sourceforge.net.

To read a latin2 file, use

    :e ++enc=iso-8859-2 filename.ext

(see ":help ++opt") after installing iconv and making sure that it is in
your PATH.
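
A short sketch of the related commands, assuming the latin2 file is already open in the current buffer. The output file name is hypothetical.

```vim
" Check whether this Vim can convert via iconv (1 = yes, 0 = no)
echo has('iconv')

" Reopen the current file, forcing ISO-8859-2 instead of autodetection
edit ++enc=iso-8859-2

" Save a UTF-8 copy under a different (hypothetical) name
write ++enc=utf-8 filename-utf8.ext
```

Unlike ":set encoding=...", which changes how Vim represents text internally, "++enc" only tells Vim how to interpret the bytes of this one file.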

>
> Thanks,
>    Mojca

My pleasure,
Tony.



Re: vim + win + utf-8 => I'm lost

Mojca Miklavec
Tony Mechelynck wrote:

> > 1. Now another question: I have plenty of material in cp1250. Can I
> > write something like that in vimrc:
> >
> > if (file seems to be in utf-8 or if this is a new window)
> >    set encoding=utf-8
> > else
> >    set encoding=cp1250
> > ?
>
>     set encoding=utf-8 termencoding=cp1250
>     set fileencodings=ucs-bom,utf-8,cp1250

Thank you. The last command is exactly what I was looking for (but I
would never figure it out alone)!

> Below is my 'statusline' setting, you may or may not find it useful. It
> displays the 'fileencoding' (or 'encoding' if 'fileencoding' is empty), the
> 'bomb' status and (if any) the current keymap. Disregard any linebreaks
> added by my mail client; it should be all on one line, and spurious line
> breaks should be replaced by spaces.
>
>      set statusline=%<%f\ %h%m%r%=%k[%{(&fenc==\"\")?&enc:&fenc}%{(&bomb?\",BOM\":\"\")}]\ %-14.(%l,%c%V%)\ %P

Thank you. Something very useful indeed.

> > 2. Does anyone have any idea why I can't set the latin2 encoding? (I
> > can set it, but the files are not displayed any different than if
> > cp1250 encoding is set. The worst thing is that probably 10 characters
> > are at some other place, but exactly the ones I need are displayed
> > wrong.)
>
> The Vim name is iso-8859-2 and you may need a working iconv.dll in your
> PATH. I got my iconv.exe and iconv.dll from the GnuWin32 project on
> sourceforge.net.
>
> To read a latin2 file, use
>
>     :e ++enc=iso-8859-2 filename.ext

It seems that I already installed iconv once (or that it was installed
by some other program). However, ++enc was the magic missing word :)
Thank you a lot!

And GnuWin32 is another fantastic set of tools. Thank you for telling
me about them. I have some other gnu tools installed, but some tools
are only present in GnuWin32, not in the one I have installed.

Thank you for those short, life-saving pieces of code once again, Tony,
    Mojca