Keeping the original encoding.


kroiz
Hi
Some text files that I open are unreadable and show weird characters.
The BOM for these files is FF FE (UTF-16 BE according to Wikipedia).
Vim has 'fencs' set to ucs-bom.
This is Vim 7.2 on Windows XP.
I can load the files fine if, before loading them, I do
:set encoding=utf-8
but I don't want the file's encoding to change when I save it.
Is there a way to do that?

thanks
Guy Kroizman

--
You received this message from the "vim_multibyte" maillist.
For more information, visit http://www.vim.org/maillist.php

Re: Keeping the original encoding.

Tony Mechelynck
On 20/02/10 09:16, kroiz wrote:

> Hi
> Some text files that I open are not readable and have weird signs,
> The bom for this files is ff fe (UTF-16 BE according to wikipedia)
> vim has fencs set to ucs-bom
> it is vim version 7.2 on windows XP
> I can load the files fine if before I load them I do
> set encoding = utf-8
> but I don't want to change the encoding of the file when saving.
> Is there a way to that?
>
> thanks
> Guy Kroizman
>

Normally, Vim should detect the encoding and remember how to translate
to disk what it has in memory.

'encoding' is the representation of the characters in Vim's internal
memory. UTF-16 cannot be used because it has too many null bytes, which
would terminate the C strings used by Vim (and BTW, FF FE is UTF-16le,
not -be), but UTF-8 is capable of representing the characters of all
charsets used on any computer, so if you set that, you're safe.
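To make the null-byte argument concrete, here is a small Python illustration (Python standing in for Vim's C internals; `c_string_view` is a hypothetical helper, not anything in Vim). It encodes plain ASCII text as UTF-16LE with its FF FE BOM and as UTF-8, then shows what a NUL-terminated C string would "see" in each.

```python
import codecs

text = "Hi"

# UTF-16LE with BOM: every ASCII character gets a trailing 0x00 byte.
utf16le = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
# UTF-8: ASCII characters stay single bytes, no NULs.
utf8 = text.encode("utf-8")

print(utf16le.hex(" "))  # ff fe 48 00 69 00
print(utf8.hex(" "))     # 48 69


def c_string_view(data: bytes) -> bytes:
    """Return the bytes a NUL-terminated C string would see in data."""
    end = data.find(b"\x00")
    return data if end == -1 else data[:end]


print(c_string_view(utf16le))  # b'\xff\xfeH'  (cut off after the first char)
print(c_string_view(utf8))     # b'Hi'         (intact)
```

The first two bytes printed, `ff fe`, are exactly the BOM from the original question, and the 0x00 bytes that follow each ASCII letter are what would truncate a C string.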

When loading an already existing file, Vim uses a heuristic defined by
the option 'fileencodings' (with s at the end): this is a
comma-separated list of possible charsets, as follows:

- ucs-bom, if used (and it is recommended that it _be_ used) should come
first
- There should be no more than one 8-bit charset, and it should come last
- Charsets are tried from left to right, and the first one which doesn't
give an error signal is used to read the file. (That's why any 8-bit
charset used should be last: such charsets cannot give an error signal).
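The left-to-right rule can be modelled roughly in Python. This is a simplified sketch, not Vim's actual code: `detect` is a hypothetical name, Python's codecs stand in for Vim's conversion routines, and only the UTF-8 and UTF-16 BOMs are checked (real Vim also recognizes UCS-4 BOMs). It returns the values Vim would put in 'fileencoding' and 'bomb', plus the decoded text.

```python
import codecs

# BOMs recognized by the "ucs-bom" entry in this sketch (UCS-4 omitted).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
]


def detect(data: bytes, fencs: list[str]) -> tuple[str, bool, str]:
    """Return (fileencoding, bomb, text) for the first charset that works."""
    for name in fencs:
        if name == "ucs-bom":
            for bom, enc in BOMS:
                if data.startswith(bom):
                    return enc, True, data[len(bom):].decode(enc)
            continue  # no BOM found: try the next entry
        try:
            # An 8-bit charset such as latin1 accepts any byte sequence,
            # so it never fails; that is why it has to come last.
            return name, False, data.decode(name)
        except UnicodeDecodeError:
            continue  # not valid in this charset: try the next entry
    raise ValueError("no charset in the list could read the data")


FENCS = ["ucs-bom", "utf-8", "latin1"]
print(detect(b"\xff\xfe" + "h\u00e9llo".encode("utf-16-le"), FENCS)[:2])  # ('utf-16le', True)
print(detect("h\u00e9llo".encode("utf-8"), FENCS)[:2])                     # ('utf-8', False)
print(detect(b"h\xe9llo", FENCS)[:2])                                      # ('latin1', False)
```

The last call shows the fallback: the byte 0xE9 followed by `l` is invalid UTF-8, so the UTF-8 attempt fails and latin1, which cannot fail, takes over.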

A typical value is:

        :set fencs=ucs-bom,utf-8,latin1

This will correctly detect any Unicode file which has a BOM, or failing
that Vim will try UTF-8, and if the file is not valid UTF-8 the file
will then be shown to you under the assumption that it is Latin1. Vim
stores the disk charset of the file in the local string option
'fileencoding' for that file, and the presence or absence of a BOM in
the local Boolean option 'bomb'. IOW, if the file you mentioned has been
read correctly,

        :setlocal fileencoding? bomb?
or, if (like me) you're lazy,
        :setl fenc? bomb?

should reply

   fileencoding=utf-16le
   bomb

That means everything is OK, and that you don't need to do anything to
record the file in the correct encoding -- :w or :wq will know how to do
the required translation from the UTF-8 representation in memory.

If you see something else, you can save the file in UTF-16le with BOM
(which, then, will *not* be the original charset of the file) by doing

        :setlocal fenc=utf-16le bomb

before you save the file. (Use :setlocal, not just :set, because the
latter alters what will happen to _other_ files, especially the new ones
you create thereafter.)
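The write side can be sketched the same way. In this Python model (an assumption for illustration: a Python `str` stands in for Vim's UTF-8 buffer, and `write_bytes` is a hypothetical helper), 'fileencoding' selects the on-disk charset and 'bomb' prepends a BOM, which Vim only writes for Unicode encodings.

```python
import codecs

# BOM bytes for the Unicode charsets handled in this sketch.
BOM_FOR = {
    "utf-8": codecs.BOM_UTF8,
    "utf-16le": codecs.BOM_UTF16_LE,
    "utf-16be": codecs.BOM_UTF16_BE,
}


def write_bytes(text: str, fenc: str, bomb: bool) -> bytes:
    """Translate in-memory text to the on-disk bytes given fenc and bomb."""
    data = text.encode(fenc)
    if bomb and fenc in BOM_FOR:
        data = BOM_FOR[fenc] + data  # a BOM is only added for Unicode charsets
    return data


print(write_bytes("Hi", "utf-16le", True).hex(" "))   # ff fe 48 00 69 00
print(write_bytes("Hi", "utf-16le", False).hex(" "))  # 48 00 69 00
print(write_bytes("Hi", "latin1", True).hex(" "))     # 48 69
```

With fenc=utf-16le and bomb set, the output starts with the same FF FE the file had when it was read, which is why :w preserves the original form.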

For more info, and pointers to the relevant information in the help, see
http://vim.wikia.com/wiki/Working_with_Unicode


Best regards,
Tony.
--
What good is having someone who can walk on water if you don't follow
in his footsteps?