Trouble getting started with vim and utf-8 file


Trouble getting started with vim and utf-8 file

DanKegel
The file http://winetricks.org/winetricks is, I hope, a UTF-8 file,
but it is not recognized as such by the Vim that comes
with Ubuntu 11.04 (even with a German locale).
It's mostly ASCII, with just a few non-ASCII lines, e.g.

#   If you do not see an o with two dots over it here [ö], stop!
...
        mymenu="$HOME/.local/share/applications/wine/Programs/Electronic Arts/The Sims Medieval/The Sims™ Medieval.desktop"

That first line contains an o umlaut, and the second line contains the
trademark symbol.

Opening the file with vi winetricks shows

#   If you do not see an o with two dots over it here [ö], stop!
...
         mymenu="$HOME/.local/share/applications/wine/Programs/Electronic Arts/The Sims Medieval/The Simsâ<84>¢ Medieval.desktop"

which isn't right.  Just opening up vi with no arguments, and doing
  !!cat winetricks
brings the file in great, and the utf-8 chars look good, but then
saving it complains
"winetricks"  CONVERSION ERROR in line 12328; 14640 lines, 496509
characters written
and yields a very corrupt file.

So what's going on?  It seems that vim has decided the file Is Not
UTF-8.  :se shows
  fileencoding=latin1
  fileencodings=ucs-bom,utf-8,default,latin1
even if I put
  set encoding=utf8 fileencoding=utf8
in ~/.vimrc.
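
The garbled `â<84>¢` shown earlier is consistent with the three UTF-8 bytes of ™ being read as Latin-1, one character per byte. A minimal Python sketch of that effect (not part of the original thread; the variable names are illustrative):

```python
# Illustrative sketch: the UTF-8 bytes of the trademark sign, misread by a
# reader that assumes Latin-1 (one byte = one character).
tm_bytes = "\u2122".encode("utf-8")      # U+2122 TRADE MARK SIGN
as_latin1 = tm_bytes.decode("latin-1")   # wrong guess: three separate characters

print(tm_bytes.hex(" "))                          # e2 84 a2
print([f"U+{ord(c):04X}" for c in as_latin1])     # ['U+00E2', 'U+0084', 'U+00A2']
```

U+00E2 is â, U+0084 is an unprintable control character (which Vim displays as `<84>`), and U+00A2 is ¢.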

Help...

Thanks,
Dan

--
You received this message from the "vim_multibyte" maillist.
For more information, visit http://www.vim.org/maillist.php

Re: Trouble getting started with vim and utf-8 file

BAA
Here's what I've found.

Opening this file in gVim doesn't display it correctly. The detected
encoding is cp1251 (on my config).

Issuing this command

:e ++enc=utf-8

worked and displayed the TM symbol, but it also gave a warning about an
illegal byte at line 7388, which looked like this:
   title="?Torrent 3.0" \
The previous section had µ, so I just replaced the illegal character
with it.

After that, saving and reopening from the command line works fine, and
the encoding is detected.

This doesn't answer your question; it's just a workaround.

On Apr 8, 9:33 am, DanKegel <[hidden email]> wrote:

> The file http://winetricks.org/winetricks is, I hope, a utf-8 file,
> but is not recognized as such in the vim that comes
> with ubuntu 11.04 (with german locale, even).
> [...]


Re: Trouble getting started with vim and utf-8 file

Tony Mechelynck
In reply to this post by DanKegel
On 08/04/11 07:33, DanKegel wrote:

> The file http://winetricks.org/winetricks is, I hope, a utf-8 file,
> but is not recognized as such in the vim that comes
> with ubuntu 11.04 (with german locale, even).
> [...]

I've downloaded that file in my browser, then tried to open it in Vim,
which does not see it as UTF-8 even though I have 'enc' set to utf-8 and
'fencs' set to ucs-bom,utf-8,latin1

Intrigued, I hit 8g8 which brings me to line 7388 column 11 where the
character µ ("micro" prefix, similar to Greek mu, 0xB5) cannot be UTF-8
(bytes in the range 0x80 to 0xBF can only exist in UTF-8 as "trailing
bytes" in a multibyte sequence whose first byte is 0xC2 to 0xF4).
Moving the cursor one position right and repeating gives me only a beep,
so this is AFAICT the only illegal character in the file -- but one
illegal byte in the whole file is enough to reject UTF-8 as the file's
'fileencoding'.
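
The all-or-nothing detection described above can be modelled in a few lines of Python. This is a simplified sketch, not Vim's actual implementation: `detect_encoding` and the sample data are hypothetical, and the `ucs-bom` step is left out.

```python
def detect_encoding(data: bytes, candidates=("utf-8", "latin-1")) -> str:
    """Return the first candidate that decodes `data` without any error."""
    for name in candidates:
        try:
            data.decode(name)          # strict decoding: one bad byte raises
        except UnicodeDecodeError:
            continue                   # reject this candidate, try the next
        return name
    raise ValueError("no candidate encoding fits")


# Hypothetical sample data modelled on the lines quoted in this thread.
good = "[\u00f6] The Sims\u2122\n".encode("utf-8")
bad = good + b'title="\xb5Torrent 3.0" \\\n'   # one stray Latin-1 byte

print(detect_encoding(good))   # utf-8
print(detect_encoding(bad))    # latin-1
```

Latin-1 assigns a character to every byte value, so it never fails and acts as the catch-all at the end of the list.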

Rereading the file with

        :view ++enc=utf-8

reads it as UTF-8 at the cost of an error message about line 7388, where
the µ is now replaced by a question mark (but the o-umlaut at line 71
appears as ö).
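
Forcing the encoding and accepting a replacement character has a rough Python analogue in `errors="replace"`. The sketch below is an illustration only: `lines_with_bad_bytes` is a hypothetical helper, and Python substitutes U+FFFD where Vim shows a question mark.

```python
def lines_with_bad_bytes(data: bytes):
    """Yield (line_number, text) for every line that is not valid UTF-8."""
    # Splitting on 0x0A is safe: that byte never occurs inside a
    # UTF-8 multibyte sequence.
    for number, raw in enumerate(data.split(b"\n"), start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            # Decode anyway, substituting U+FFFD for each illegal byte.
            yield number, raw.decode("utf-8", errors="replace")


# Hypothetical two-line sample: line 1 is valid UTF-8, line 2 is not.
sample = "# [\u00f6]\n".encode("utf-8") + b'title="\xb5Torrent 3.0" \\\n'
report = list(lines_with_bad_bytes(sample))
print(report)
```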

It seems that your file is in UTF-8 at line 71 but in Latin1 at line
7388, which means that it is the file's fault, not Vim's fault, that
such a file cannot be displayed correctly.
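
One possible way to repair such a mixed file is to decode it line by line, falling back to Latin-1 only for lines that are not valid UTF-8, and then write everything back as UTF-8. This is a sketch under the assumption that the stray bytes really are Latin-1; `repair_mixed` is a hypothetical helper, not something proposed in the thread.

```python
def repair_mixed(data: bytes, fallback: str = "latin-1") -> bytes:
    """Re-encode as UTF-8, treating lines that fail UTF-8 as `fallback`."""
    fixed = []
    for raw in data.split(b"\n"):
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            text = raw.decode(fallback)   # assumption: the line is Latin-1
        fixed.append(text.encode("utf-8"))
    return b"\n".join(fixed)


# Hypothetical sample: a UTF-8 line followed by a Latin-1 line.
sample = "# [\u00f6]\n".encode("utf-8") + b'title="\xb5Torrent 3.0" \\\n'
repaired = repair_mixed(sample)
print(repaired.decode("utf-8"))
```

A line that mixes both encodings would still come out wrong, because the whole line is decoded with the fallback, so the result is worth checking by eye.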

See
        :help 8g8
        :help ++opt


Best regards,
Tony.
--
Never hit a man with glasses.  Hit him with a baseball bat.


Re: Trouble getting started with vim and utf-8 file

Dan Kegel
Thanks very much, guys!


RE: Trouble getting started with vim and utf-8 file

JohnBeckett
In reply to this post by DanKegel
DanKegel wrote:
> The file http://winetricks.org/winetricks is, I hope, a utf-8
> file, but is not recognized as such in the vim that comes
> with ubuntu 11.04 (with german locale, even).

It looks like you created that file, so you need to fix it
because it is not UTF-8.

Downloading the file with wget and dumping the bytes shows that
the character which I have shown as "?" in the following is not
valid UTF-8:
   title="?Torrent 3.0" \

That single byte is hex B5 or binary 10110101. That starts with
"10" which is never valid as the first byte of a character in
UTF-8.

BTW you can find that in Vim by opening the file and typing 8g8
which jumps to the next illegal byte sequence, then typing ga to
show the value.
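
Outside Vim, the same check can be approximated in Python: a strict decode fails at the first illegal byte, and the exception records its offset. A minimal sketch, with `first_illegal_byte` as a hypothetical helper name:

```python
def first_illegal_byte(data: bytes):
    """Return (offset, byte value) of the first byte that breaks UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, data[err.start]
    return None


sample = b'title="\xb5Torrent 3.0" \\'
offset, value = first_illegal_byte(sample)
print(offset, f"0x{value:02X}", f"{value:08b}")   # 7 0xB5 10110101
```

The leading `10` bits mark a continuation byte, which cannot start a character.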

John


Re: Trouble getting started with vim and utf-8 file

Dan Kegel
On Sat, Apr 9, 2011 at 12:13 AM, John Beckett <[hidden email]> wrote:

> It looks like you created that file, so you need to fix it
> because it is not UTF-8.
>
> Downloading the file with wget and dumping the bytes shows that
> the character which I have shown as "?" in the following is not
> valid UTF-8:
>   title="?Torrent 3.0" \
>
> That single byte is hex B5 or binary 10110101. That starts with
> "10" which is never valid as the first byte of a character in
> UTF-8.
>
> BTW you can find that in Vim by opening the file and typing 8g8
> which jumps to the next illegal byte sequence, then typing ga to
> show the value.

Yeah, that's what I gathered from the other replies.
Thanks!
- Dan
