Unicode conversion bug?

Mansing

I am playing with CSV files with Chinese content.  Windows (Excel)
outputs a text file in UCS-2LE encoding.  VIM correctly opens this file
and displays the Chinese characters.  I then convert the file to UTF-8
encoding... but all the Chinese characters were morphed into monsters.
I did the same conversion using Windows Notepad and it worked
perfectly.

Comparing the two output (utf-8) files:

VIM converted utf-8:

00040b0: 7169 3320 796f 6e67 3120 28c3 a5c2 91c2  qi3 yong1 (.....
00040c0: a8c3 a5c2 95c2 93c3 a5c2 bac2 b829 2c20  .............),

Notepad converted utf-8:

0004090: 7169 3320 796f 6e67 3120 28e5 91a8 e595  qi3 yong1 (.....
00040a0: 93e5 bab8 292c 200d 0a43 6f73 6d6f 7320  ....), ..Cosmos

As can be seen, the English characters are fine, but the Chinese characters
(the "..." inside the parentheses) are different.  Does this suggest a bug
in the VIM conversion?  Note that I did not set my Windows locale to
Chinese (because I have non-Unicode multibyte files in other
languages).
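The two dumps can be compared byte by byte. Below is a small Python sketch that uses only the bytes shown above; the variable names are illustrative. Notepad's bytes are ordinary UTF-8 for three CJK characters. Vim's bytes match exactly what you get if those UTF-8 bytes are reinterpreted as Latin-1 and then encoded to UTF-8 a second time. That pattern points to a double conversion rather than random corruption, but it does not by itself show which setting caused it.

```python
# Bytes between the parentheses, copied from the two hex dumps above.
notepad = bytes.fromhex("e591a8e59593e5bab8")
vim = bytes.fromhex("c3a5c291c2a8c3a5c295c293c3a5c2bac2b8")

# Notepad's output is ordinary UTF-8 for three CJK code points.
text = notepad.decode("utf-8")
print([hex(ord(c)) for c in text])       # ['0x5468', '0x5553', '0x5eb8']

# Reading those UTF-8 bytes as Latin-1 and encoding the result as UTF-8
# a second time ("double encoding") reproduces Vim's output exactly.
double_encoded = notepad.decode("latin-1").encode("utf-8")
print(double_encoded == vim)             # True

# The damage is reversible: undo the extra step to get the text back.
recovered = vim.decode("utf-8").encode("latin-1").decode("utf-8")
print(recovered == text)                 # True
```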

mt 2008-03-12

PS: This is my first post here... my first post on Google, in fact.

--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_multibyte" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---


Re: Unicode conversion bug?

Tony Mechelynck

[hidden email] wrote:

[...]

You don't need to change your Windows locale (I can display Chinese
characters in Unicode perfectly in an fr_BE locale), but you may need to
use gvim and/or change some of its options.

- Are you using gvim or console Vim? If console Vim, does your console
support Unicode?
- In any case, what is the reply to

        :set encoding?

If it isn't a Unicode encoding (typically utf-8, but Vim also uses UTF-8
internally if 'encoding' is set to any flavour of UCS-2, UCS-4, UTF-16
or UTF-32), then Vim cannot represent all Unicode codepoints -- it
probably won't be able to convert your file then.
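The following is a short Python analogy, not Vim's code, showing why the internal encoding matters. A Unicode encoding can hold every character from the file. A single-byte encoding such as Latin-1 cannot, so any conversion that passes through it has to fail or substitute. The helper name `can_represent` is hypothetical.

```python
def can_represent(text: str, encoding: str) -> bool:
    """Return True if every character of text can be encoded in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

sample = "qi3 yong1 (\u5468\u5553\u5eb8)"   # the text from the hex dumps
for enc in ("utf-8", "utf-16-le", "latin1"):
    print(enc, can_represent(sample, enc))
# utf-8 True / utf-16-le True / latin1 False
```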

See also http://vim.wikia.com/wiki/Working_with_Unicode


Best regards,
Tony.
--
GUARD #2:  Wait a minute -- supposing two swallows carried it together?
GUARD #1:  No, they'd have to have it on a line.
GUARD #2:  Well, simple!  They'd just use a standard creeper!
GUARD #1:  What, held under the dorsal guiding feathers?
GUARD #2:  Well, why not?
                                   The Quest for the Holy Grail (Monty Python)


Re: Unicode conversion bug?

Mansing
Thanks, Tony, for the advice.  More details on my problem:

I am using gVim on Windows Vista Business, which has full (normal) Chinese support.  I set "enc=ucs-2le" and "fenc=utf-8" for the conversion (saving to UTF-8 format); in Notepad I use the "Save As" menu option to do the same.  The (UCS-2LE) input file was displayed correctly in gVim with guifont "MingLiU".

To examine the output files, I set "enc=utf-8" and the guifont to "MingLiU" in both cases.  The UTF-8 file converted by Notepad was displayed correctly, while the one converted by gVim wasn't.  The hex dumps of both output files were copied from the gVim window after ":%!xxd".

mt 2008-03-12


Tony Mechelynck wrote:
[...]


Re: Unicode conversion bug?

Tony Mechelynck

Mansing wrote:

[...]

Try the following after starting gvim afresh (lines starting with a
double-quote are comments; you don't need to type them)

        :if &tenc == "" | let &tenc = &enc | endif
        :set enc=utf-8 fencs=ucs-bom,utf-8,utf-16le,latin1
        :set gfn=MingLiU:h16:cDEFAULT
        :e inputfile.txt
        " (the input UCS-2le file). Is it displayed correctly?

        " only if it isn't:
        :e ++enc=utf-16le

        :setlocal fenc=utf-8
        " don't change 'encoding'
        :saveas! outfile.txt
        " the output (UTF-8) file

        :enew
        " to clear the current window
        :e outfile.txt
        " is it displayed correctly now?
        :setlocal fenc?
        " gvim should reply: fileencoding=utf-8

        " if it isn't displayed correctly
        :e ++enc=utf-16le
        :e ++enc=utf-8
        "...etc., until you get it to display correctly

If it doesn't work at all, retry the above after invoking the editor as

        gvim -N -u NONE

in a cmd.exe window. You may or may not need to cd to the directory
containing gvim.exe beforehand.
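The same conversion outside Vim can be sketched in Python. The sketch assumes the input is UTF-16 little-endian, as Excel produced here, and that it may start with a byte-order mark. The function name `ucs2le_to_utf8` is hypothetical. The key point is that each byte sequence is decoded exactly once, from its real encoding, and encoded exactly once, to the target.

```python
import codecs

def ucs2le_to_utf8(data: bytes) -> bytes:
    """Convert UTF-16LE (UCS-2LE) bytes to UTF-8, dropping a leading BOM."""
    if data.startswith(codecs.BOM_UTF16_LE):
        data = data[len(codecs.BOM_UTF16_LE):]
    text = data.decode("utf-16-le")   # decode once, from the real encoding
    return text.encode("utf-8")       # encode once, to the target encoding

# Example input: a BOM followed by a CSV fragment in UTF-16LE.
source = codecs.BOM_UTF16_LE + "qi3 yong1 (\u5468\u5553\u5eb8),\r\n".encode("utf-16-le")
converted = ucs2le_to_utf8(source)
print(converted.hex())
```

For a real file, read `inputfile.txt` in binary mode (`"rb"`), pass the bytes to the function, and write the result to `outfile.txt` in binary mode (`"wb"`). The file names are the ones used in the recipe above.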


Best regards,
Tony.
--
When I was younger, I could remember anything, whether it had happened
or not; but my faculties are decaying now and soon I shall be so I
cannot remember any but the things that never happened.  It is sad to
go to pieces like this but we all have to do it.
                -- Mark Twain


Re: Unicode conversion bug?

Mansing

The instructions work! It seems I must set both "enc" and "fenc" correctly
~before~ loading an input file.

I used to set only "enc" to match the input file format before loading,
and then "fenc" to match the desired output file format before
saving; everything looked fine until I reopened the output file!

Thank you!
mt 2008-03-13


Tony Mechelynck wrote:
> . . .
> Try the following after starting gvim afresh (lines starting with a
> double-quote are comments; you don't need to type them)
>
> . . .
>  


Re: Unicode conversion bug?

Tony Mechelynck

Mansing wrote:
[...]

'enc' means how Vim represents the data in memory. 'fenc' means how the
data is represented on disk; it will usually be set automagically at
load time depending on 'fencs' (plural), which defines the heuristics
used by Vim to determine which encoding the file is in. If you want to
edit one particular file whose 'fileencoding' (singular) cannot be
properly detected by the 'fileencodings' (plural), then you will have to
use ++enc=<something> in the ":edit" command itself, as shown under
":help ++opt".

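The idea behind 'fileencodings' can be modelled in a few lines of Python. This is a simplified sketch, not Vim's actual algorithm. Each entry is tried in order. The `ucs-bom` entry succeeds only when the data starts with a byte-order mark, and any other entry succeeds if the bytes decode without error. Entry names are Python codec names, which differ from Vim's in places, and UTF-32 BOMs are left out for brevity.

```python
import codecs
from typing import List, Optional

# BOMs recognised by the "ucs-bom" pseudo-entry in this model.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def detect(data: bytes, fencs: List[str]) -> Optional[str]:
    """Return the first entry of fencs that accepts data, or None."""
    for name in fencs:
        if name == "ucs-bom":
            for bom, enc in BOMS:
                if data.startswith(bom):
                    return enc
            continue
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue          # this entry "fails"; try the next one
        return name
    return None               # no entry matched

fencs = ["ucs-bom", "utf-8", "latin1"]
print(detect("\u5468".encode("utf-8"), fencs))                        # utf-8
print(detect(codecs.BOM_UTF16_LE + "a".encode("utf-16-le"), fencs))   # utf-16-le
print(detect(b"caf\xe9", fencs))                                       # latin1
```

A file that none of the entries accepts is exactly the case where `++enc=` on the `:edit` command is needed.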
You don't have to change 'enc' as long as it can represent all the
characters in the file. For instance, gvim with GTK2 GUI normally
represents its data in UTF-8 internally, and since that Unicode encoding
can represent anything, if your gvim (like mine) is for GTK2 (which
usually means X11, which usually means Unix-like) you never need to
change the 'enc' setting. Similarly, in any +multi_byte version of gvim,
if you've set 'enc' to UTF-8 at the start of your vimrc, you can leave
it so forever and never change it, because this way, any 'fileencoding'
will be "representable" in memory.

You may change the 'fenc' of a loaded file, if you want to _change_ its
disk representation. That's why whenever you do it, the file acquires
'modified' status. You will then usually want to save your file under a
different name (using ":saveas") so you'll have both versions (in both
encodings) on your disk under different names and/or in different folders.
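The in-memory versus on-disk distinction can be pictured with a short Python analogy. This is an illustration, not how Vim stores its buffers. The decoded string plays the role of the buffer, and 'fileencoding' only decides which bytes are written out. Changing it changes the file, not the text.

```python
buffer_text = "qi3 yong1 (\u5468\u5553\u5eb8)"   # stands in for the in-memory buffer

# The same text written out with two different 'fileencoding' values.
on_disk = {fenc: buffer_text.encode(fenc) for fenc in ("utf-8", "utf-16-le")}
for fenc, data in on_disk.items():
    print(fenc, len(data), "bytes")        # utf-8 21 bytes / utf-16-le 30 bytes

# Reading each file back with its own encoding yields identical text.
print(all(data.decode(fenc) == buffer_text for fenc, data in on_disk.items()))
```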


Best regards,
Tony.
--
When more and more people are thrown out of work, unemployment
results.
                -- Calvin Coolidge


Re: Unicode conversion bug?

François Pinard

[Tony Mechelynck]

>[...] and since that Unicode encoding can represent anything [...]

This is a common misconception.  Unicode can represent many things, not
everything.  On one side, the Unicode Consortium has provisions against
assigning, in the future, single code points where combining
characters would do, while keeping in Unicode what has already been
lobbied for by richer countries.  That is, Unicode is meant to be easier to
use for some than for others.  Unicode is also set up to support "main"
scripts, not necessarily all of them.  It means that poorer nations have
less chance of getting their script well represented in Unicode, if at all.
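Some readers may be unfamiliar with the precomposed versus combining distinction mentioned here. Unicode encodes some letters both as a single code point and as a base letter followed by a combining mark, and normalization converts between the two forms. A letter that has no precomposed code point can only be written as a sequence. The sketch below uses Python's standard `unicodedata` module; the letter choices are illustrative.

```python
import unicodedata

precomposed = "\u00e9"        # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"        # "e" followed by COMBINING ACUTE ACCENT

print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.name(decomposed[1]))                          # COMBINING ACUTE ACCENT

# "q" with an acute accent has no precomposed code point, so NFC
# leaves it as a two-code-point sequence.
print(len(unicodedata.normalize("NFC", "q\u0301")))             # 2
```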

--
François Pinard   http://pinard.progiciels-bpi.ca


How to keep gvim from losing syntax highlighting when editing multiple buffers/tabs?

za-2

Have you come across this kind of trouble?

My gvim often loses syntax highlighting,
especially when I switch to another tab or buffer.
I checked the filetype and global plugins I use: there is nothing extra besides
the stock ones, unless I once put something in and then forgot?

By the way, I always edit .html, .php, .c and .java files at the same time, which
seems to make the problem worse.  It is really tedious to type ":syntax on"
(or even its abbreviation) every minute. :(


 




Re: How to keep gvim from losing syntax highlighting when editing multiple buffers/tabs?

Nico Weber-3

[...]

Which platform are you using? Does it also happen if you start Vim
with `gvim -u NONE -U NONE`? That prevents your .vimrc and .gvimrc files
from being loaded.

You can use `:verbose set syntax` to see where 'syntax' was last set.
Perhaps this gives you a hint.

HTH,
Nico


Re: How to keep gvim from losing syntax highlighting when editing multiple buffers/tabs?

za-2

Thanks for the comments. I think finding the answer will take a while.
The platform is an Ubuntu 7.10 (Linux) desktop with the zh_CN.UTF-8 locale.
I start gvim from a console without arguments.

The output of :verbose set syntax is:
  syntax=xhtml
        Last set from /usr/share/vim/vim71/syntax/syntax.vim

When I lose syntax highlighting, I am editing an HTML file containing JavaScript.

On Fri, 2008-03-14 at 16:10 +0100, Nico Weber wrote:
[...]



Re: Unicode conversion bug?

yaemon
In reply to this post by François Pinard

This is Nakagawa, a Japanese user on Unix and Windows XP SP2.

(The native character set of Japanese Windows is cp932.)


I ran into the same problem on 30 April, in a different way.
(I don't usually read this mailing list.)


Here is what I found.

Windows Notepad can read a UTF-8 file without a BOM, but a file it edits or
creates is saved with a BOM.

According to the Unicode Consortium standard, a UTF-8 file may have a BOM,
but UTF-8 does not need one.


Vim uses the FSF's libiconv.
I checked `man iconv_open` on a Unix platform: libiconv doesn't know about
UTF-8 with a BOM, so it can't detect UTF-8 with a BOM.
Strictly speaking, this is a problem in libiconv, not in Vim.


For this reason, although my vimrc lists UTF-8 earlier than cp932 in 'fileencodings',
a file saved by Notepad is detected as cp932.

The proper fix would be to patch libiconv to support UTF-8 with a BOM.
But can't this weak point be worked around on the Vim side?


--
   Nakagawa Tsuneo         mailto:[hidden email]
   Web site (Japanese only)  http://www.kikansha.jp/~yaemon/



Re: Unicode conversion bug?

Tony Mechelynck

On 02/05/08 03:29, T.P.S.Nakagawa wrote:

[...]

The BOM is a valid codepoint, ZERO WIDTH NO-BREAK SPACE; however, its use
in that capacity is now deprecated (WORD JOINER, U+2060, is preferred
instead). Still, if any software accepts UTF-8 files "only" if there is no
BOM in them, that software cannot be called UTF-8-compliant.

I don't know which libiconv you are using, but my version of Vim (which
has +iconv and a GTK2/Gnome2 GUI) accepts the BOM with no problem, not
only in UTF-16/UTF-32 but also in UTF-8.
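The short Python sketch below shows what the BOM looks like at the start of a UTF-8 file and why such a file is still valid UTF-8. Python's `utf-8-sig` codec is used only to illustrate BOM-aware decoding; it is not what Vim or libiconv uses.

```python
import codecs

# A UTF-8 file as Notepad saves it: the BOM (EF BB BF) followed by the text.
data = codecs.BOM_UTF8 + "\u5468\u5553\u5eb8".encode("utf-8")
print(data[:3].hex())                                 # efbbbf

# The BOM is simply U+FEFF encoded in UTF-8 ...
print("\ufeff".encode("utf-8") == codecs.BOM_UTF8)   # True

# ... so the file is still valid UTF-8: a plain decoder keeps U+FEFF.
plain = data.decode("utf-8")
print(hex(ord(plain[0])))                             # 0xfeff

# A BOM-aware decoder drops the mark and returns only the text.
print(data.decode("utf-8-sig") == "\u5468\u5553\u5eb8")   # True
```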


Best regards,
Tony.
--
The church is near but the road is icy; the bar is far away but I will
walk carefully.
                -- Russian Proverb


Re: Unicode conversion bug?

yaemon

Thanks for reading my broken English, and for retrying and replying.


         On 2008-05-02 11:36 (JST), Tony Mechelynck wrote:
 >> [...]
 > I don't know which libiconv you are using, but my version of Vim (which
 > has +iconv and a GTK2/Gnome2 GUI) accepts the BOM with no problem, not
 > only in UTF-16/UTF-32 but also in UTF-8.
 >

Sorry, my test was with gvim on Windows (the downloaded version 7.1 of 2007 May 12,
compiled by Bram) plus the related old libiconv for Win32 from SourceForge,
linked from http://www.vim.org/download.php#pc:
http://sourceforge.net/project/showfiles.php?group_id=25167&package_id=51458
(Release 1.9.1, January 14 2004)

On Unix (FreeBSD 5-stable), with libiconv 1.11 (+ OS patch 1) and the
same (.|_)vimrc file, console Vim can detect Notepad's UTF-8 + BOM files.
(This Vim is self-compiled from CVS source of 2007 July 31; it reports version 7.1.147.)


Has anyone compiled libiconv 1.11 or later for the Win32 platform?
(I tried six months ago, but failed, and then got busy.)

--
    Nakagawa Tsuneo            mailto:[hidden email]
    Web site (Japanese only)   http://www.kikansha.jp/~yaemon/



Re: Unicode conversion bug?

Tony Mechelynck

On 02/05/08 10:17, T.P.S.Nakagawa wrote:
[...]
> Sorry, my test was with gvim on Windows (the downloaded version 7.1 of 2007 May 12,
> compiled by Bram) plus the related old libiconv for Win32 from SourceForge,
> linked from http://www.vim.org/download.php#pc:
> http://sourceforge.net/project/showfiles.php?group_id=25167&package_id=51458
> (Release 1.9.1, January 14 2004)

This is 7.1.000. I recommend the updated Vim and gvim compiled by Steve
Hall (currently 7.1.293), obtainable from
https://sourceforge.net/project/showfiles.php?group_id=43866&package_id=39721 
. It has +iconv/dyn, its full ":version" text can be seen by clicking on
the word "Notes" or the clipboard-like icon next to the version number
on that same page.

>
> On Unix (FreeBSD 5-stable), with libiconv 1.11 (+ OS patch 1) and the
> same (.|_)vimrc file, console Vim can detect Notepad's UTF-8 + BOM files.
> (This Vim is self-compiled from CVS source of 2007 July 31; it reports version 7.1.147.)
>
>
> Has anyone compiled libiconv 1.11 or later for the Win32 platform?
> (I tried six months ago, but failed, and then got busy.)
>

Libiconv 1.9.2 is available precompiled from GnuWin32 from the page
http://gnuwin32.sourceforge.net/packages/libiconv.htm . This is one
sub-sub-level above what you already have -- you might try using this
until or unless you succeed in getting (or making) a later version. Since
Steve Hall's builds include +iconv/dyn and the "Compilation" and
"Linking" parts of its ":version" text mention no iconv version, I
suppose upgrading iconv (if and when) means simply dropping the DLL over
the former version (in the PATH or in $VIMRUNTIME). I expect, though,
that upgrading Vim (see above) will be more important than upgrading iconv.

Libiconv 1.12 is distributed in source form at
http://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.12.tar.gz but I suppose
these sources are meant primarily for Unix -- I don't know what
adaptations (if any) might be necessary to build a Windows DLL from them.


Best regards,
Tony.
--
The human mind ordinarily operates at only ten percent of its capacity
-- the rest is overhead for the operating system.


Re: Unicode conversion bug?

yaemon

Thank you, Tony, and excuse my late reply.


         On 2008-05-02 18:46 (JST), Tony Mechelynck wrote:

 >> Has anyone compiled libiconv 1.11 or later for the Win32 platform?
 >> (I tried six months ago, but failed, and then got busy.)
 >>
 >
 > Libiconv 1.9.2 is available precompiled from GnuWin32 from the page

OK. I installed all of GnuWin32 and added its bin directory to the PATH.
I removed the old iconv.dll; the results of all my test cases are the same as with the old iconv.dll.

# And thanks to the webmaster of http://www.vim.org/ (is it Bram?) for changing the link.


 > This is 7.1.000. I recommend the updated Vim and gvim compiled by Steve
 > Hall (currently 7.1.293), obtainable from
 > https://sourceforge.net/project/showfiles.php?group_id=43866&package_id=39721

Next, I upgraded Vim to 7.1.293.
The results of all test cases are the same :-(


So I think that, at this point, I'm back to needing a newer iconv.dll for
Windows.

My Windows box is too underpowered.
If I try, I will need to cross-compile on a FreeBSD box, but I haven't tried cross-compiling yet either.


Best regards,
Nakagawa, a.k.a. yaemon

--
    Nakagawa Tsuneo            mailto:[hidden email]
    Web site (Japanese only)   http://www.kikansha.jp/~yaemon/


Re: Unicode conversion bug?

Tony Mechelynck

On 03/05/08 20:30, T.P.S.Nakagawa wrote:
> Thank you Tony, and excuse me too late reply.

No problem.

[...]
>   >  Libiconv 1.9.2 is available precompiled from GnuWin32 from the page
>
> OK. I installed all of GnuWin32 and added its bin directory to the PATH.
> I removed the old iconv.dll; the results of all my test cases are the same as with the old iconv.dll.

Make sure to have iconv.dll (and possibly all GnuWin32 executables) in
the PATH (which, on Windows, is a semicolon-separated list). How to set
that depends on your Windows version. IIRC, in XP it is at "Control
Panel -> System -> Advanced -> Environment Variables" or something similar.
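Here is a Python sketch of a quick check that the DLL can actually be found on the PATH. The helper `find_on_path` is hypothetical. It splits a PATH string on the separator (`os.pathsep`, which is `;` on Windows) and returns the first directory that contains the file. It only mimics a PATH lookup; Windows also searches other locations when it loads a DLL.

```python
import os
import tempfile

def find_on_path(filename, path_value, sep=os.pathsep):
    """Return the first directory in path_value that contains filename, or None."""
    for directory in path_value.split(sep):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            return directory
    return None

# Self-contained demo: a fake PATH with an empty directory and a "bin"
# directory that holds a placeholder iconv.dll.
with tempfile.TemporaryDirectory() as empty, tempfile.TemporaryDirectory() as bindir:
    open(os.path.join(bindir, "iconv.dll"), "wb").close()
    fake_path = os.pathsep.join([empty, bindir])
    found = find_on_path("iconv.dll", fake_path)
    expected = bindir
    print(found == expected)   # True
```

On a real machine, `find_on_path("iconv.dll", os.environ.get("PATH", ""))` would report where, if anywhere, the DLL is found on the PATH.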

>
> # And thanks to the webmaster of http://www.vim.org/ (is it Bram?) for changing the link.
[...]

I think it is Bram, yes.

Best regards,
Tony.
--
"I am ready to meet my Maker.  Whether my Maker is prepared for the
great ordeal of meeting me is another matter."
                -- Winston Churchill


Re: Unicode conversion bug?

yaemon

Thank you, Tony.


         On 2008-05-04 8:08 (JST), Tony Mechelynck wrote:
 >>   >  Libiconv 1.9.2 is available precompiled from GnuWin32 from the page
 >>
 >> OK. I installed all of GnuWin32 and added its bin directory to the PATH.
 >> I removed the old iconv.dll; the results of all my test cases are the same as with the old iconv.dll.
 >
 > Make sure to have iconv.dll (and possibly all GnuWin32 executables) in
 > the PATH (which, on Windows, is a semicolon-separated list). How to set
 > that depends on your Windows version. IIRC, in XP it is at "Control
 > Panel -> System -> Advanced -> Environment Variables" or something similar.

Oh, yes. (My English is poor, but I'm not a PC beginner.)

I checked 'fileencodings' in gvim; my _vimrc sets:

 > if has('iconv')
 >         set fileencodings=ascii,iso-2022-jp,utf-8,euc-jp,utf-16,cp932,java,ucs-2-internal,euc-jis0213,utf-16,ISO-8859-1
 > endif

If the PATH setting didn't succeed, `:fileencodings?` would return a different value, wouldn't it?


Best regards,
Nakagawa, a.k.a. yaemon


--
   NAKAGAWA Tsuneo                  mailto:[hidden email]
      Web site (Japanese only)      http://www.kikansha.jp/~yaemon/


Re: Unicode conversion bug?

Tony Mechelynck

On 04/05/08 03:06, T.P.S.Nakagawa wrote:

[...]

Yes, a Vim build compiled with +iconv/dyn will act as -iconv
(has("iconv") == 0) if it cannot establish contact with _any_ iconv.dll
so the other ":if" branch will be followed, and ":set fileencodings?"
(not ":fileencodings" which returns an error) will display a different
value.

That 'fileencodings' setting makes me wonder:
- Shouldn't it start with "ucs-bom" to detect those Unicode files which
have a BOM?
- Can "ascii" give a fail signal (doesn't Vim treat it as an alias for
"latin1")? -- If it can, then it's OK there, but otherwise not.
- What is "ucs-2-internal"? Won't it be detected as "utf-16" (which is a
superset of UCS-2) by the "utf-16" entry three steps earlier?
- Why is "utf-16" mentioned twice? (The second entry does no harm, but
will never be used.)
- Won't you ever receive UCS-2/UTF-16 files in little-endian ordering
(which is standard on Intel ix86 and therefore on Windows)?
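Some of these questions can be checked mechanically. The Python sketch below uses a hypothetical helper, not part of Vim, to flag 'fileencodings' entries that can never be chosen. These are repeated names, and anything listed after an entry that accepts every byte sequence. Latin-1 is such an entry, because every byte value maps to a character. Whether Vim treats "ascii" the same way is the open question above, so the set of never-failing names is a parameter.

```python
def unreachable(fencs, never_fail=("latin1", "iso-8859-1")):
    """Return the entries of fencs that can never be selected."""
    never_fail = {name.lower() for name in never_fail}
    seen, dead, catch_all_seen = set(), [], False
    for name in fencs:
        key = name.lower()
        if catch_all_seen or key in seen:
            dead.append(name)          # shadowed by a duplicate or a catch-all
        seen.add(key)
        if key in never_fail:
            catch_all_seen = True      # nothing after this entry is ever tried
    return dead

# Latin-1 really does accept every possible byte value.
print(len(bytes(range(256)).decode("latin-1")))             # 256

fencs = ("ascii,iso-2022-jp,utf-8,euc-jp,utf-16,cp932,java,"
         "ucs-2-internal,euc-jis0213,utf-16,ISO-8859-1").split(",")
print(unreachable(fencs))                                    # ['utf-16']
# If "ascii" never failed either, only the first entry would ever be used:
print(len(unreachable(fencs, ("latin1", "iso-8859-1", "ascii"))))   # 10
```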


Best regards,
Tony.
--
If everybody minded their own business, the world would go
around a deal faster.
                -- The Duchess, "Through the Looking Glass"


Re: Unicode conversion bug?

yaemon

Sorry for all the typos, and for sending my messy settings (they are that way for historical reasons).


         On 2008-05-04 11:24 (JST), Tony Mechelynck wrote:
 > [...]

Thank you. I corrected the order:

 > set fileencodings=ascii,iso-2022-jp,utf-8,euc-jp,java,utf-16,ucs-bom,cp932,utf-16LE,euc-jis0213,ISO-8859-1


Putting "ascii" first is intentional.
(Is it really an alias of latin1? latin1 has other codes besides ASCII.)

Without it in first place, every pure-ASCII file (e.g. program source) is detected
as iso-2022-jp, and then I can't add multibyte comments in UTF-8.
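This happens because ISO-2022-JP is a 7-bit encoding that starts in ASCII mode, so any pure-ASCII file is also a valid ISO-2022-JP file. The small Python check below uses Python's `iso2022_jp` codec as a stand-in for the converter Vim actually uses.

```python
source = b"int main(void) { return 0; }  /* comment */\n"

# Pure ASCII bytes decode to the same text as ASCII and as ISO-2022-JP,
# because ISO-2022-JP starts in ASCII mode and these bytes contain no
# escape sequence that would switch it to another character set.
as_ascii = source.decode("ascii")
as_jis = source.decode("iso2022_jp")
print(as_ascii == as_jis)   # True
```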

I have a trick for this:


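----- end of vimrc -------
if has( 'autocmd' )
     source $HOME/.vim/mine/filetype.vim        " about filetype
     source $HOME/.vim/mine/encode.vim          " about encoding to save
endif

------ cat encode.vim -----
"  $Id: encode.vim,v 1.6 2007/11/08 05:03:18 yaemon Exp $
"       set the 'fileencoding' used when saving
"

" Put the autocommands in a group and clear it first, so that sourcing
" this file again replaces them instead of adding duplicates.
augroup encode_vim
  autocmd!
  " Fixed save encodings by file name ('fileencoding' is buffer-local)
  autocmd BufNewFile,BufRead *.jis        setlocal fileencoding=iso-2022-jp
  autocmd BufNewFile,BufRead *.sjis       setlocal fileencoding=shift-jis
  autocmd BufNewFile,BufRead *.euc-jp     setlocal fileencoding=euc-jp
  autocmd BufNewFile,BufRead *.mozex      setlocal fileencoding=utf-8
  autocmd BufNewFile,BufRead *.elm        setlocal fileencoding=utf-8
  autocmd BufNewFile,BufRead cddbread.*   setlocal fileencoding=utf-8

  " Then, for every buffer, fall back to UTF-8 if nothing useful is set
  autocmd BufNewFile,BufRead * call s:DefaultSaveCode()
augroup END

" Save new or pure-ASCII buffers as UTF-8.
" function! lets this file be sourced again without error E122.
function! s:DefaultSaveCode()
        if &fileencoding ==# "" || &fileencoding ==# "ascii"
                let &fileencoding = "utf-8"
        endif
endfunction
----------------------------------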


Isn't that an elegant way to get ASCII files saved as UTF-8?



Best regards,
yaemon

--
    NAKAGAWA Tsuneo (a.k.a. yaemon )   mailto:[hidden email]
    Web site (Japanese only)           http://www.kikansha.jp/~yaemon/


Re: Unicode conversion bug?

yaemon
In reply to this post by yaemon

Good morning (it's 8:25 in Japan).


         On 2008-05-04 3:30 (JST), I wrote:
 > [...]

Today I succeeded in compiling libiconv 1.12 for Windows,
by cross-compiling with mingw32 on a Unix box.

If you want to try it, you can get it here:
http://www.kikansha.jp/~yaemon/mingw/libiconv-1.12-mingw.zip


...but I still can't correctly open UTF-8 files edited and saved by Notepad :(


Best regards

--
    NAKAGAWA Tsuneo (a.k.a. yaemon )   mailto:[hidden email]
    Web site (Japanese only)           http://www.kikansha.jp/~yaemon/
