Changing encoding of an already loaded buffer


A. Wik
Hi all,

I sometimes need to change the encoding used for a file.  I have the
default set to latin1 except for files with a ucs-bom.  However, when
I load a file encoded in UTF-8 or CP-437 the default is wrong.  What I
normally do then is ":set fencs=utf8" and ":vi" to reload the file.

However, what can I do about a file that cannot be reloaded?  Eg:

$ man llseek | gvim -f -

To work around it, I have to do this:

$ man llseek > llseek.man
$ gvim llseek.man

Is there another way?

Regards,
Albert.

--
--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php

---
You received this message because you are subscribed to the Google Groups "vim_use" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/vim_use/CALPW7mSGPPmWZnyfjTiLQetcS416ZgHmYW6aXXsF%3DSHx-bZfUw%40mail.gmail.com.

Re: Changing encoding of an already loaded buffer

Gabriele Fava
The actual "correct" way to "change" the encoding of a buffer is, I
believe, with the "++enc" option, added either to :e (e.g. `:e
++enc=utf8`) or several similar commands such as indeed :vi (`:vi
++enc=utf8`).

However I couldn't find a way to make it work with a file-less buffer,
such as your pipe example:

If I use `:e! ++enc=utf8` I'm given an «E32: No file name» error.

I thought of passing "%" or "#n" as the filename for :e (`:e ++enc=utf8
%`), but it doesn't work, I'm given a «E499: Empty file name for '%' or
'#', only works with ":p:h"» error (and indeed the `:h _%` stuff is
described as standing for "file names", not for the actual buffers).

Then I tried adding a filename, with `:file whatever`, but once that's
done :e! loads a new empty buffer named "whatever"...

So there doesn't seem to be a way to really reload (possibly with
different encoding options) the current buffer, only to reload the file
from which the current buffer was loaded, and so for file-less buffers
no way at all.

However under Linux and other systems there may well be a way to access
the buffer's file's descriptor (/dev/fd/0 ?), so it might work by
passing that as the filename.

And there's probably some other way by copying the text around.

By the way, apparently this also means that you can't even set the
encoding of a pipe that you haven't yet created, from the shell, since
to the best of my knowledge the only way to set the encoding of a file
from the shell, before opening it, is `vim +":e ++enc=<encoding>
<filename>"` (which actually means to open it from inside vim). But
maybe you can with some more intricate command.


I'm far from being a Vim expert, however, so I might well be missing
something (or a lot).


And encoding stuff is in general quite a mess in Vim, I'll grumble about
it one time or another... :/


Cheers


Re: Changing encoding of an already loaded buffer

Gabriele Fava
Ah yes, I had also tried passing "-" as a filename for the reload
attempts, nope, it was interpreted as an actual "-" file name...


Re: Changing encoding of an already loaded buffer

Tony Mechelynck
In reply to this post by A. Wik
On Mon, Dec 7, 2020 at 5:40 PM A. Wik <[hidden email]> wrote:

>
> Hi all,
>
> I sometimes need to change the encoding used for a file.  I have the
> default set to latin1 except for files with a ucs-bom.  However, when
> I load a file encoded in UTF-8 or CP-437 the default is wrong.  What I
> normally do then is ":set fencs=utf8" and ":vi" to reload the file.
>
> However, what can I do about a file that cannot be reloaded?  Eg:
>
> $ man llseek | gvim -f -
>
> To work around it, I have to do this:
>
> $ man llseek > llseek.man
> $ gvim llseek.man
>
> Is there another way?
>
> Regards,
> Albert.

If you find out after loading the stdin that it was opened in the
wrong encoding, then it's too late; but if you know the file's
encoding in advance, there should be a way, especially if your
'encoding' (the charset used internally by Vim) is UTF-8 and if your
Vim is compiled with +iconv.

To be able to detect Latin1 and UTF-8 (and UTF-16 with BOM) automagically, add
        set fileencodings=ucs-bom,utf-8,latin1
somewhere in your vimrc (the s at the end of fileencodings is
important); but this isn't enough for files in cp437, especially if
Vim gets them on stdin. For those, load them with (untested)
  someprogram | view ++enc=cp437 -
(the minus sign at the end is important) which means that you have to
know the file's encoding before starting Vim if it is other than UTF-8
or Latin1. Using "view" instead of "vim" on the command-line avoids
problems with the 'modified' flag; for ++enc see ":help ++enc".

The above will detect files in 7-bit us-ascii encoding as utf-8 rather
than Latin1. This is not a bug, because the 128 characters which are
valid in us-ascii are represented identically in all three encodings:
us-ascii, Latin1 and UTF-8.
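
The point about 7-bit text can be checked outside Vim. The following is only a small sketch in Python (not something posted in the thread); the sample string is arbitrary:

```python
# Pure 7-bit text produces the same bytes in all three encodings,
# so a detector has nothing to distinguish them by.
text = "llseek - reposition read/write file offset"

as_ascii = text.encode("ascii")
as_latin1 = text.encode("latin-1")
as_utf8 = text.encode("utf-8")

assert as_ascii == as_latin1 == as_utf8

# The encodings only diverge once a non-ASCII character appears.
accented = "ò"
print(accented.encode("latin-1").hex(" "))  # f2
print(accented.encode("utf-8").hex(" "))    # c3 b2
```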

Best regards,
Tony.


Re: Changing encoding of an already loaded buffer

A. Wik
Hi all,

I tried a few things:

(1) gvim -f ++enc=utf8 -
result: "E492: Not an editor command: +enc=utf8"
(2) gvim -f +enc=utf8 -
result: see (1)
(3) gvim -f +"set fenc=utf8" -
result: no error message; sets fenc to "utf-8", but file is loaded as
if with latin1.
(4) gvim -f -c "set fenc=utf8" -
result: see (3)
(5) gvim -f --cmd "set fenc=utf8" -
result: no error message; fenc remains "latin1"

A different approach:
(6) (man llseek ; echo 'vim:fenc=utf8:') | gvim -f -
result: no error message; fenc gets set to "utf-8"; file is loaded as
if with latin1

See also below:

On Tue, 8 Dec 2020 at 01:45, Tony Mechelynck
<[hidden email]> wrote:
>
> If you find out after loading the stdin that it was opened in the
> wrong encoding, then it's too late; but if you know the file's
> encoding in advance, there should be a way, especially if your
> 'encoding' (the charset used internally by Vim) is UTF-8 and if your
> Vim is compiled with +iconv.

Both conditions hold true.

> To be able to detect Latin1 and UTF-8 (and UTF-16 with BOM) automagically, add
>         set fileencodings=ucs-bom,utf-8,latin1

I tried that months ago.  The result was that new files were assumed
to have fenc=utf-8, for reasons you mention below.  This is not
acceptable, so I use "fileencodings=ucs-bom,latin1,cp437" (yes, I know
the trailing ",cp437" is pointless).

> somewhere in your vimrc (the s at the end of fileencodings is
> important); but this isn't enough for files in cp437, especially if
> Vim gets them on stdin. For those, load them with (untested)
>   someprogram | view ++enc=cp437 -

I tested it; see top of message.

> The above will detect files in 7-bit us-ascii encoding as utf-8 rather
> than Latin1. This is not a bug, because the 128 characters which are
> valid in us-ascii are represented identically in all three encodings:
> us-ascii, Latin1 and UTF-8.

Right!

Cheers,
Albert.


Re: Changing encoding of an already loaded buffer

A. Wik
In reply to this post by Gabriele Fava
On Mon, 7 Dec 2020 at 20:49, Gabriele F <[hidden email]> wrote:
>
> The actual "correct" way to "change" the encoding of a buffer is, I
> believe, with the "++enc" option, added either to :e (e.g. `:e
> ++enc=utf8`) or several similar commands such as indeed :vi (`:vi
> ++enc=utf8`).

Thanks, I didn't know about that.  It's more convenient than changing
the "fileencodings".

> However I couldn't find a way to make it work with a file-less buffer,
> such as your pipe example:

Right.  The only way I've found is to use a temporary file.
Incidentally, the zsh shell makes that easy:
% gvim -f =(man llseek)

Regards,
Albert.


Re: Changing encoding of an already loaded buffer

Bram Moolenaar

Albert Wik wrote:

> On Mon, 7 Dec 2020 at 20:49, Gabriele F <[hidden email]> wrote:
> >
> > The actual "correct" way to "change" the encoding of a buffer is, I
> > believe, with the "++enc" option, added either to :e (e.g. `:e
> > ++enc=utf8`) or several similar commands such as indeed :vi (`:vi
> > ++enc=utf8`).
>
> Thanks, I didn't know about that.  It's more convenient than changing
> the "fileencodings".
>
> > However I couldn't find a way to make it work with a file-less buffer,
> > such as your pipe example:
>
> Right.  The only way I've found is to use a temporary file.
> Incidentally, the zsh shell makes that easy:
> % gvim -f =(man llseek)

Assuming that loading the text as latin1 didn't mess it up (since it's
an 8 bit encoding it should be OK), then you can convert it to utf-8
with:
        :set fencs=utf-8,latin1
        :%!iconv -f latin1 -t utf-8

Vim might recognize the utf-8 encoding; if not, set 'fenc':
        :set fenc=utf8

Hopefully that works.

--
You can be stopped by the police for biking over 65 miles per hour.
You are not allowed to walk across a street on your hands.
                [real standing laws in Connecticut, United States of America]

 /// Bram Moolenaar -- [hidden email] -- http://www.Moolenaar.net   \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\  an exciting new programming language -- http://www.Zimbu.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///


Re: Changing encoding of an already loaded buffer

A. Wik
On Tue, 8 Dec 2020 at 12:55, Bram Moolenaar <[hidden email]> wrote:

>
>
> Albert Wik wrote:
> >
> > Right.  The only way I've found is to use a temporary file.
> > Incidentally, the zsh shell makes that easy:
> > % gvim -f =(man llseek)
>
> Assuming that loading the text as latin1 didn't mess it up (since it's
> an 8 bit encoding it should be OK), then you can convert it to utf-8
> with:
>         :set fencs=utf-8,latin1
>         :%!iconv -f latin1 -t utf-8
>
> Vim might recognize the utf-8 encoding; if not, set 'fenc':
>         :set fenc=utf8
>
> Hopefully that works.

Thanks a lot for the "%!"-idea!  That's what I needed.

This works:
:set fencs=utf8
:%!cat
although "fenc" remains "latin1".

It is not appropriate to use "iconv -f latin1 -t utf8" (that does in
fact corrupt the data!) because the data is already in UTF-8, and that
is why it is not displayed properly in Vim (because Vim thinks it is
in Latin-1); in particular, the short dash character is shown as
"â<80><90>".  When it is displayed properly, a "‐" is shown; putting
the cursor at it and doing "ga" reports that this is character number
0x2010.
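
The byte-level picture behind this can be sketched in Python (an illustration only, not part of the original exchange). It assumes the hyphen in question is U+2010, as reported by "ga" above:

```python
# The man page output is already UTF-8: U+2010 HYPHEN is the bytes E2 80 90.
raw = "\u2010".encode("utf-8")
assert raw == b"\xe2\x80\x90"

# Read as Latin-1, each byte becomes its own character: "â" plus two
# C1 control characters, which Vim displays as <80> and <90>.
misread = raw.decode("latin-1")
assert misread == "\u00e2\u0080\u0090"

# Converting "from latin1 to utf-8" encodes the misreading instead of
# undoing it, so the data gets longer and stays wrong.
double_encoded = misread.encode("utf-8")
print(double_encoded.hex(" "))  # c3 a2 c2 80 c2 90

# Reinterpreting the untouched bytes as UTF-8 is what actually fixes it.
assert raw.decode("utf-8") == "\u2010"
```

This is why passing the bytes through unchanged ("cat") and having them re-read as UTF-8 works, while a latin1-to-utf-8 conversion does not.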

Why does "set fencs=utf8" matter for the "%!cat" operation if Vim is
not going to change the "fenc" accordingly?

Cheers,
Albert.


Re: Changing encoding of an already loaded buffer

Bram Moolenaar

Albert Wik wrote:

> > > Right.  The only way I've found is to use a temporary file.
> > > Incidentally, the zsh shell makes that easy:
> > > % gvim -f =(man llseek)
> >
> > Assuming that loading the text as latin1 didn't mess it up (since it's
> > an 8 bit encoding it should be OK), then you can convert it to utf-8
> > with:
> >         :set fencs=utf-8,latin1
> >         :%!iconv -f latin1 -t utf-8
> >
> > Vim might recognize the utf-8 encoding; if not, set 'fenc':
> >         :set fenc=utf8
> >
> > Hopefully that works.
>
> Thanks a lot for the "%!"-idea!  That's what I needed.
>
> This works:
> :set fencs=utf8
> :%!cat
> although "fenc" remains "latin1".

Yeah, for an existing buffer and filtering the first entry in 'fencs' is
used to read the filter output, but 'fenc' isn't set.  That's a bit
strange, but I'm not sure what would break if we change this.  It might
actually be good to fix this, since if you write that file it might get
messed up.
 

> It is not appropriate to use "iconv -f latin1 -t utf8" (that does in
> fact corrupt the data!) because the data is already in UTF-8, and that
> is why it is not displayed properly in Vim (because Vim thinks it is
> in Latin-1); in particular, the short dash character is shown as
> "â<80><90>".  When it is displayed properly, a "‐" is shown; putting
> the cursor at it and doing "ga" reports that this is character number
> 0x2010.
>
> Why does "set fencs=utf8" matter for the "%!cat" operation if Vim is
> not going to change the "fenc" accordingly?

When reading a file (or filter output) the values in 'fencs' are tried
one by one.  Normally when something fails then the next one is tried,
but since reading filter output from a pipe doesn't allow for a retry,
it will always use the first one.
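
As a rough model of the "try one by one" behaviour described here, consider the Python sketch below. It is not Vim's actual code: the function name pick_encoding is made up for illustration, and BOM handling and iconv conversion are left out.

```python
def pick_encoding(data, candidates):
    """Return the first candidate that decodes data without error.

    A simplified stand-in for walking 'fileencodings' in order.
    """
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding fits")


utf8_bytes = "\u2010".encode("utf-8")   # valid UTF-8
latin1_bytes = "ò".encode("latin-1")    # lone F2 byte, invalid as UTF-8

# With utf-8 first, invalid input falls through to latin1.
print(pick_encoding(utf8_bytes, ["utf-8", "latin-1"]))    # utf-8
print(pick_encoding(latin1_bytes, ["utf-8", "latin-1"]))  # latin-1

# With latin1 first nothing ever fails, so later entries are never reached.
print(pick_encoding(utf8_bytes, ["latin-1", "utf-8"]))    # latin-1
```

The last call shows why an 8-bit encoding placed first in the list effectively disables the rest of it: every byte sequence is valid in it.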

The real problem is that 'fencs' was set to "latin1" at first, thus Vim
didn't even try to use another encoding.  Perhaps it also works if you
do that on the command line:
        somecommand | vim - -c 'set fencs=utf8,latin1'

Didn't try it.  Should at least work if you set 'fencs' in your .vimrc.


--
If an elephant is left tied to a parking meter, the parking fee has to be paid
just as it would for a vehicle.
                [real standing law in Florida, United States of America]

 /// Bram Moolenaar -- [hidden email] -- http://www.Moolenaar.net   \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\  an exciting new programming language -- http://www.Zimbu.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///


Re: Changing encoding of an already loaded buffer

A. Wik
On Tue, 8 Dec 2020 at 16:47, Bram Moolenaar <[hidden email]> wrote:

>
>
> Albert Wik wrote:
> >
> > Why does "set fencs=utf8" matter for the "%!cat" operation if Vim is
> > not going to change the "fenc" accordingly?
>
> When reading a file (or filter output) the values in 'fencs' are tried
> one by one.  Normally when something fails then the next one is tried,
> but since reading filter output from a pipe doesn't allow for a retry,
> it will always use the first one.

Thanks, that is useful to know.

> The real problem is that 'fencs' was set to "latin1" at first, thus Vim
> didn't even try to use another encoding.  Perhaps it also works if you
> do that on the command line:
>         somecommand | vim - -c 'set fencs=utf8,latin1'

No, because (according to --help) the command is run after loading the
first file.  Meanwhile, "--cmd <command>" does not work because it
runs the command before sourcing any vimrc file, and so, the new fencs
setting gets overwritten by the vimrc.  It would be useful to have an
option to run a command just *before* loading the first file but after
any rc-files.

I don't include utf8 in my default fencs setting because that has the
side effect of using utf8 for any newly created files.

-aw


Re: Changing encoding of an already loaded buffer

Gabriele Fava
In reply to this post by A. Wik
On 08/12/2020 10.47, A. Wik wrote:

> Hi all,
>
> I tried a few things:
>
> (1) gvim -f ++enc=utf8 -
> result: "E492: Not an editor command: +enc=utf8"
> (2) gvim -f +enc=utf8 -
> result: see (1)
> (3) gvim -f +"set fenc=utf8" -
> result: no error message; sets fenc to "utf-8", but file is loaded as
> if with latin1.
> (4) gvim -f -c "set fenc=utf8" -
> result: see (3)
> (5) gvim -f --cmd "set fenc=utf8" -
> result: no error message; fenc remains "latin1"

Yes, I tried stuff like that while perusing the manual a hundred times,
it can't work and that's also kind of declared in some points of the
documentation; :h fenc is a jungle, and I seem to remember that it's
also not completely correct. Basically 'fenc' is only looked at when
writing a file, and who knows what the output of that write will be.

So essentially, besides 'fencs', the ++enc "opt" (which **has nothing to
do with the 'enc' option!!!**) is the only thing that can have an effect
when reading a file, and after it's read you better forget about fixing
its encoding.

The only way forward in my opinion would be to deprecate 'enc', 'fenc',
++enc and probably 'fencs', giving warnings when they do get used, and
introduce completely different options and commands.


Re: Changing encoding of an already loaded buffer

Gabriele Fava
In reply to this post by A. Wik
On 08/12/2020 14.58, A. Wik wrote:
> Thanks a lot for the "%!"-idea!  That's what I needed.
>
> This works:
> :set fencs=utf8
> :%!cat

That :%!cat is indeed a neat (if hacky) idea!


Re: Changing encoding of an already loaded buffer

Gabriele Fava
In reply to this post by Bram Moolenaar
On 08/12/2020 17.47, Bram Moolenaar wrote:
>> This works:
>> :set fencs=utf8
>> :%!cat
>> although "fenc" remains "latin1".
> Yeah, for an existing buffer and filtering the first entry in 'fencs' is
> used to read the filter output, but 'fenc' isn't set.  That's a bit
> strange, but I'm not sure what would break if we change this.  It might
> actually be good to fix this, since if you write that file it might get
> messed up.

I performed a couple of tests trying to write the result to a file after
doing the above (using a correct UTF-8 file as source):
- if you leave fenc to latin1 the new file will be in latin1 (with all
the characters correctly encoded)
- if you set fenc to utf8 *after* the %!cat (but of course before
writing the file) the new file will be in UTF-8 with all the characters
correctly encoded
- if you set fenc to utf8 *before* the %!cat (and of course before
writing the file) the new file will be... a mess: by all appearances Vim
treats each byte of the UTF-8 file as an individual latin1 character
and then converts those characters to UTF-8, so you get a UTF-8
encoded file with the wrong characters. E.g. a "C3 B2" sequence in the
original file, which is the UTF-8 encoding of "ò" (Unicode code point
F2), becomes a "C3 83 C2 B2" sequence in the written file: "C3" is an
"Ã" in latin1 (and yes, in Unicode too), and "Ã" is encoded as "C3 83"
in UTF-8; "B2" is a "²" in latin1 (and Unicode), and "²" is encoded as
"C2 B2" in UTF-8. (In case someone noticed it, don't let yourself get
confused by the fact that C3 and B2 occur both in the source and the
translated sequence; that's largely just an unfortunate coincidence of
my example.)
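
The third case can be reproduced outside Vim. This is a minimal Python sketch of the same byte arithmetic (an editorial illustration, not the test procedure used above):

```python
original = bytes.fromhex("c3 b2")          # UTF-8 for "ò" (U+00F2)
assert original.decode("utf-8") == "ò"

# Each byte is taken as one Latin-1 character: C3 -> "Ã", B2 -> "²".
misread = original.decode("latin-1")
assert misread == "Ã²"

# Writing that text out as UTF-8 yields four bytes instead of two.
written = misread.encode("utf-8")
print(written.hex(" "))  # c3 83 c2 b2
```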

Given that Unicode is identical to latin1 in the first 256 characters,
to better confirm what happened I also tried using another charset
(cp850) instead of latin1 in the above tests (fencs=cp850 in my vimrc
and setting fenc=cp850 in the second and third tests), still using a
correct UTF-8 file as a source. The results are analogous: a correct
cp850 file in the first test, a correct UTF-8 one in the second, and in
the third a UTF-8 one with the original file's bytes interpreted as
cp850 and then converted to UTF-8 (the original "ò", "C3 B2", becomes
an "E2 94 9C E2 96 93" sequence, given that "C3" is a "├" symbol in
cp850, Unicode code point 251C -> "E2 94 9C" in UTF-8, and "B2" is a
"▓", Unicode code point 2593 -> "E2 96 93" in UTF-8).
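
The cp850 variant follows the same pattern. Again a small Python sketch for illustration, relying on Python's built-in cp850 codec:

```python
original = bytes.fromhex("c3 b2")          # UTF-8 for "ò"

# In cp850, C3 is "├" (U+251C) and B2 is "▓" (U+2593).
misread = original.decode("cp850")
assert misread == "\u251c\u2593"

# Re-encoding the misread text as UTF-8 gives three bytes per character.
written = misread.encode("utf-8")
print(written.hex(" "))  # e2 94 9c e2 96 93
```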

Yes, I... ahem, had a lot of fun this afternoon :D


Cheers


Re: Changing encoding of an already loaded buffer

Gabriele Fava
In reply to this post by A. Wik
On 09/12/2020 18.47, A. Wik wrote:
> I don't include utf8 in my default fencs setting because that has the
> side effect of using utf8 for any newly created files.

Completely off-topic, if you don't have particular needs I'd advise you
to use UTF-8 with BOMs for all your new files ('set bomb', 'set
encoding=utf-8' and 'fenc' left to the default in your vimrc), it will
prevent any future encoding problem for at least them.

I've been doing so for more than a decade and pretty much never had
problems, and sigh a relief every time I see I'm working with one of them.

I heard many protest the BOMs in UTF-8, but they are the first thing
ever to allow a reliable encoding detection and they solve a lot more
problems than they can cause (if they cause problems they usually do so
immediately and noticeably, much better than discovering years later
that you irremediably botched the encoding of some file). So I find it
absurd to disparage them, and delusive to think that we'll ever get to a
point when non-utf8 files will be rare enough that we won't need to
handle them.
I imagine most of the critics are from countries that never needed more
than ASCII


Re: Changing encoding of an already loaded buffer

Tony Mechelynck
On Wed, Dec 9, 2020 at 9:20 PM Gabriele F <[hidden email]> wrote:

>
> On 09/12/2020 18.47, A. Wik wrote:
> > I don't include utf8 in my default fencs setting because that has the
> > side effect of using utf8 for any newly created files.
>
> Completely off-topic, if you don't have particular needs I'd advise you
> to use UTF-8 with BOMs for all your new files ('set bomb', 'set
> encoding=utf-8' and 'fenc' left to the default in your vimrc), it will
> prevent any future encoding problem for at least them.
>
> I've been doing so for more than a decade and pretty much never had
> problems, and sigh a relief every time I see I'm working with one of them.
>
> I heard many protest the BOMs in UTF-8, but they are the first thing
> ever to allow a reliable encoding detection and they solve a lot more
> problems than they can cause (if they cause problems they usually do so
> immediately and noticeably, much better than discovering years later
> that you irremediably botched the encoding of some file). So I find it
> absurd to disparage them, and delusive to think that we'll ever get to a
> point when non-utf8 files will be rare enough that we won't need to
> handle them.
> I imagine most of the critics are from countries that never needed more
> than ASCII

IIUC the criticism comes from people who do a lot of programming, either
in C (where sources are supposed to be in Latin1; they may be in UTF-8
if characters above U+007F are used only in alphanumeric literals, but
they cannot start with a BOM) or in Perl, Python, Unix shell script
language, etc. (where the first two bytes of a source file must be #!
in that order):

The problem with ":setg fenc=utf8 bomb" is that *every* new text file
will start with 0xEF 0xBB 0xBF unless you explicitly turn it off for
that file by means of ":setl nobomb" or ":setl fenc=latin1" or similar
before writing it. For C sources this will confuse the compiler
(generating an error and preventing successful compilation) and for
anything starting with a shebang (shell scripts, perl sources, etc.)
it will prevent the #! shebang leader from being recognized. OTOH for
"well-behaved" filetypes like Vim scripts (if not run by means of a
shebang), HTML pages, CSS style sheets, etc., there is no problem. So
whether or not to set it should depend on what types of files you
write most often. I use it because most of the files I write are HTML
or CSS, followed by Vim scripts; but then when I write a shell script
I have to remember to turn the 'bomb' setting off for that file.
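
The shebang problem is easy to see at the byte level. Here is a minimal Python sketch (an illustration only); the helper has_shebang is hypothetical and merely mimics a check on the first two bytes of a file:

```python
import codecs

script = "#!/bin/sh\necho hello\n"

plain = script.encode("utf-8")
with_bom = script.encode("utf-8-sig")   # prepends the UTF-8 BOM

assert with_bom.startswith(codecs.BOM_UTF8)
print(with_bom[:5].hex(" "))  # ef bb bf 23 21


def has_shebang(data):
    """Check whether the first two bytes of the file are "#!"."""
    return data[:2] == b"#!"


print(has_shebang(plain))     # True
print(has_shebang(with_bom))  # False
```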

Best regards,
Tony.


Re: Changing encoding of an already loaded buffer

A. Wik
In reply to this post by Gabriele Fava
On Wed, 9 Dec 2020 at 20:20, Gabriele F <[hidden email]> wrote:
>
> On 09/12/2020 18.47, A. Wik wrote:
> > I don't include utf8 in my default fencs setting because that has the
> > side effect of using utf8 for any newly created files.
>
> Completely off-topic, if you don't have particular needs ...

I just like to keep things "8-bit clean".  As long as all tools used
to process the files are also 8-bit clean, nothing gets corrupted.
Alas, it does mean files are sometimes displayed incorrectly.  But in
my experience, it gets messy when I introduce UTF-8.

> I imagine most of the critics are from countries that never needed more
> than ASCII

There is something to it.  People who use only ASCII seem to like
UTF-8 better than those who frequently use non-English characters.
I've seen claims that UTF-8 is "compact" but compared to strictly
8-bit character sets like Latin-1 it is not.
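
The size difference is simple to measure. In this Python sketch (for illustration; the function name sizes and the sample strings are made up), every non-ASCII Latin-1 character costs one extra byte in UTF-8:

```python
def sizes(text):
    """Return (latin1_bytes, utf8_bytes) for text that fits in Latin-1."""
    return len(text.encode("latin-1")), len(text.encode("utf-8"))


print(sizes("plain ASCII text"))       # (16, 16)
print(sizes("déjà vu, señor Müller"))  # (21, 25)
```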

-aw


Re: Changing encoding of an already loaded buffer

Tony Mechelynck
On Thu, Dec 10, 2020 at 2:04 PM A. Wik <[hidden email]> wrote:

>
> On Wed, 9 Dec 2020 at 20:20, Gabriele F <[hidden email]> wrote:
> >
> > On 09/12/2020 18.47, A. Wik wrote:
> > > I don't include utf8 in my default fencs setting because that has the
> > > side effect of using utf8 for any newly created files.
> >
> > Completely off-topic, if you don't have particular needs ...
>
> I just like to keep things "8-bit clean".  As long as all tools used
> to process the files are also 8-bit clean, nothing gets corrupted.
> Alas, it does mean files are sometimes displayed incorrectly.  But in
> my experience, it gets messy when I introduce UTF-8.
>
> > I imagine most of the critics are from countries that never needed more
> > than ASCII
>
> There is something to it.  People who use only ASCII seem to like
> UTF-8 better than those who frequently use non-English characters.
> I've seen claims that UTF-8 is "compact" but compared to strictly
> 8-bit character sets like Latin-1 it is not.
>
> -aw

- For pure 7-bit ASCII, all three of us-ascii, Latin1 and UTF-8 are
equivalent: they represent the data identically.
- For "Western Latin" (French, Spanish, etc.) Latin1 is slightly more
economical than UTF-8. How much more depends on the proportion of
accented letters not found in ASCII.
- When mixing several scripts (at least two of Latin, Greek, Cyrillic,
Hebrew, Arabic, CJK ideographic, etc.) within a single document, I
know of no better encoding than UTF-8. In an 8-bit charset like Latin1
you have only (at most) 256 different valid character values, and that
is much too few as soon as you start mixing scripts: be it for a
juxtalinear edition of the Bible (with the original Hebrew, Aramaic or
Greek text next to a translation and/or commentary) or for a
Greek-Russian or Russian-Finnish dictionary. And of course even for a
single CJK script, no 8-bit charset can do the job.
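
The three cases above can be checked with a short Python sketch. The sample strings and the helper name `encoded_size` are made up for the illustration; the helper returns the byte length of a string in a given encoding, or None when the encoding cannot represent it.

```python
def encoded_size(text, encoding):
    """Return the byte length of text in encoding, or None if it cannot be encoded."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None

# Hypothetical samples: ASCII only, Western Latin, and two mixed scripts.
samples = {
    "ASCII only": "plain text",
    "French": "déjà vu",
    "Greek + Cyrillic": "αβγ / абв",
}

for label, text in samples.items():
    sizes = {enc: encoded_size(text, enc) for enc in ("ascii", "latin-1", "utf-8")}
    print(f"{label}: {sizes}")
```

In the French sample each accented letter costs one byte in Latin1 and two in UTF-8, so the overhead grows with the share of such letters. The mixed sample cannot be encoded in Latin1 at all.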

Best regards,
Tony.


Re: Changing encoding of an already loaded buffer

Gabriele Fava
In reply to this post by Gabriele Fava
I should add that those tests were all made with 'encoding' set to
utf-8 in my vimrc; I haven't tried with the default latin1 or other
values, and I don't know whether this affects the results.

That's the setting that A. Wik said he has as well, anyway.


Re: Changing encoding of an already loaded buffer

Gabriele Fava
In reply to this post by Gabriele Fava
On 09/12/2020 20.35, Gabriele F wrote:
> That :%!cat is indeed a neat (if hacky) idea!

It should be noted, though, that it works only as long as the
'shelltemp' option is on, which is the default.

'shelltemp' makes Vim use a temporary file for the filtering instead of
a pipe, which is evidently the (probably accidental) cause of the
effects on the encoding.


Re: Changing encoding of an already loaded buffer

Boyko Bantchev
In reply to this post by A. Wik
On Thu, 10 Dec 2020 at 15:04, A. Wik <[hidden email]> wrote:

>
> On Wed, 9 Dec 2020 at 20:20, Gabriele F <[hidden email]> wrote:
> ..............
> > I imagine most of the critics are from countries that never needed more
> > than ASCII
>
> There is something to it.  People who use only ASCII seem to like
> UTF-8 better than those who frequently use non-English characters.
> I've seen claims that UTF-8 is "compact" but compared to strictly
> 8-bit character sets like Latin-1 it is not.

To people who use only ASCII the distinction between ASCII and
UTF-8 is totally irrelevant, because in their case UTF-8 is precisely ASCII
by definition.

But people like me, who regularly use scripts other than Latin, and who
also like to indulge themselves with mathematical and other ‘special’
characters in plain text – they are those who really appreciate and
praise the advent of Unicode and UTF-8.
