UTF-8 input with Terminal.app


Charles Collicutt
Hi,

I'm using Vim 6.4.1 (checked out from CVS yesterday) on OS X
10.3.9, compiled with --with-features=big --enable-multibyte.

$LANG is set to "en_GB.UTF-8" in both .bashrc and
environment.plist and Vim correctly notices this and sets
encoding to "utf-8".

When I run Vim with its GUI - by double-clicking the Vim.app
icon or using "open -a Vim" - it automatically sets
termencoding to "macroman". I have "set termencoding=utf-8"
in my .gvimrc but it seems to be overridden. Vim will then
accept UTF-8 input (generated using the option key or the
character palette.) Inputting with i_CTRL-V_digit or
digraphs also works.

That's great, but I usually use Vim in a terminal. With
Terminal.app set to use UTF-8, vim will accept input using
i_CTRL-V_digit or digraphs perfectly but won't accept UTF-8
input directly (such as that generated by the character
palette or by using the option key.) It seems to interpret
two-byte UTF-8 sequences (such as those for Latin-1 characters)
as two separate 8-bit characters. If I attempt to enter
lowercase-a-with-an-umlaut, I get uppercase-a-with-a-tilde
followed by the international currency symbol. If I'm right,
the UTF-8 representation of ä consists of two bytes, the
first of which happens to correspond to à in ISO8859-1 and
the second to ¤. Why is it assuming that the input is
ISO8859-1 when encoding is set to UTF-8 and termencoding is
empty?
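
The byte-level reading described above can be checked with a short Python sketch. It is only an illustration of the encoding arithmetic, not of anything Vim or Terminal.app does internally, and the variable names are made up for this example.

```python
# Encode "ä" (U+00E4) as UTF-8, then decode the same bytes as if they
# were ISO8859-1, which maps every byte to exactly one character.
a_umlaut = "\u00e4"                       # ä
utf8_bytes = a_umlaut.encode("utf-8")     # two bytes: 0xC3 0xA4

misread = utf8_bytes.decode("iso-8859-1")

print(utf8_bytes.hex())                   # c3a4
print(misread)                            # Ã¤ (A-tilde, then currency sign)
print([hex(ord(ch)) for ch in misread])   # ['0xc3', '0xa4']
```

The two characters that come out are exactly the uppercase-a-with-a-tilde and the currency symbol seen in the terminal, which is consistent with the input bytes being read one at a time as ISO8859-1.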

If I set termencoding to "macroman" in my .vimrc then an
attempt to input "ä" results in "ä " (i.e. an extra space is
inserted after the character) which is no good but at least
it has sort of recognised the character properly.
Unfortunately, now I cannot input unicode characters with
digraphs because vim thinks I am limited to macroman.

I prefer to use digraphs anyway - so it is the same wherever
I am using vim - so this isn't that bad but I'd like to know
what is going on.

As I understand it, Unicode-aware applications in OS X (i.e.
all Cocoa apps and most recent Carbon apps) should receive
UTF-8 input. Non-aware apps receive whatever Script is
specified in International in System Preferences (in my
case, MacRoman.) Terminal.app is a Unicode-aware app, so it
should be receiving UTF-8 input. This fits with the fact
that the garbage I get when I try to enter non-ascii
characters into vim does seem to be the result of
interpreting two-byte UTF-8 sequences as two separate 8-bit
characters. What I don't understand is why setting
termencoding to macroman results in the correct character
followed by a space? I also don't understand why it works in
gvim and not vim. When running with a GUI, termencoding only
specifies the input and not the display, whereas in a
terminal it specifies both, right?  Does that have anything
to do with it? Is there any way to decouple the input and
display encodings when running in a terminal? And why does
macroman work anyway, if it's receiving UTF-8 input?

Any help would be much appreciated.

--
Charles

Re: UTF-8 input with Terminal.app

Charles Collicutt
I've done some more investigation and can clarify the
problem slightly now:

> When I run Vim with its GUI - by double-clicking the
> Vim.app icon or using "open -a Vim" - it automatically
> sets termencoding to "macroman". Vim will then accept
> UTF-8 input.

It doesn't actually accept UTF-8 input at all. OS X treats
Vim.app as a non-Unicode-aware application so sends it input
according to the Script setting in the International panel
of System Preferences. So it is actually receiving MacRoman
input (which fits termencoding) and converting it to UTF-8
internally as encoding is set to utf-8. Therefore you can't
actually input anything that isn't in the MacRoman character
repertoire without using digraphs or i_CTRL-V_digit.
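
As a rough illustration of that conversion step (MacRoman bytes in, UTF-8 out), here is a Python sketch. It assumes Python's built-in "mac_roman" codec as a stand-in for whatever conversion Vim performs; the function names are hypothetical and exist only for this example.

```python
def macroman_to_utf8(raw: bytes) -> bytes:
    """Reinterpret MacRoman-encoded bytes as text and re-encode as UTF-8."""
    return raw.decode("mac_roman").encode("utf-8")


def in_macroman(ch: str) -> bool:
    """Return True if the character exists in the MacRoman repertoire."""
    try:
        ch.encode("mac_roman")
        return True
    except UnicodeEncodeError:
        return False


# ä is the single byte 0x8A in MacRoman and the two bytes C3 A4 in UTF-8.
print(macroman_to_utf8(b"\x8a").hex())   # c3a4

# A character outside MacRoman (here Cyrillic "я", U+044F) cannot arrive
# through a MacRoman input path at all.
print(in_macroman("\u00e4"))             # True
print(in_macroman("\u044f"))             # False
```

This matches the observation that only characters in the MacRoman repertoire can be typed directly in the GUI, while digraphs and i_CTRL-V_digit bypass the input encoding.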

Terminal.app is treated as being Unicode-aware, so it
receives UTF-8 input. However, vim seemed to be treating
this as ISO8859-1 input. So, for example, ä would appear as
Ã¤ (because ä is C3A4 in UTF-8 while à is C3 and ¤ is A4 in
ISO8859-1.) This turned out to be the fault of a default
setting in Terminal.app - in the Emulation pane of Window
Settings there is an option to "Escape non-ASCII characters"
which is ticked by default. If unticked, UTF-8 input to vim
works properly.

So, all that remains is to get OS X to treat Vim.app as a
Unicode-aware application so UTF-8 input works with the GUI.

--
Charles

Re: UTF-8 input with Terminal.app

Benji Fisher
On Tue, Nov 01, 2005 at 06:54:49PM +0000, Charles Collicutt wrote:

> I've done some more investigation and can clarify the
> problem slightly now:
>
> > When I run Vim with its GUI - by double-clicking the
> > Vim.app icon or using "open -a Vim" - it automatically
> > sets termencoding to "macroman". Vim will then accept
> > UTF-8 input.
>
> It doesn't actually accept UTF-8 input at all. OS X treats
> Vim.app as a non-Unicode-aware application so sends it input
> according to the Script setting in the International panel
> of System Preferences. So it is actually receiving MacRoman
> input (which fits termencoding) and converting it to UTF-8
> internally as encoding is set to utf-8. Therefore you can't
> actually input anything that isn't in the MacRoman character
> repertoire without using digraphs or i_CTRL-V_digit.
>
> Terminal.app is treated as being Unicode-aware, so it
> receives UTF-8 input. However, vim seemed to be treating
> this as ISO8859-1 input. So, for example, ä would appear as
> Ã¤ (because ä is C3A4 in UTF-8 while à is C3 and ¤ is A4 in
> ISO8859-1.) This turned out to be the fault of a default
> setting in Terminal.app - in the Emulation pane of Window
> Settings there is an option to "Escape non-ASCII characters"
> which is ticked by default. If unticked, UTF-8 input to vim
> works properly.
>
> So, all that remains is to get OS X to treat Vim.app as a
> Unicode-aware application so UTF-8 input works with the GUI.
>
> --
> Charles

     Have you tried vim 7.0?  I think that some Mac-specific code was
added that changes how it deals with Unicode.  If you do not feel like
compiling it yourself, you can get a binary at

http://macvim.org/OSX/index.php#Downloading

HTH --Benji Fisher