Hi,

The current version of vim doesn't handle non-UTF-8 multibyte encodings
such as EUC or GBK on FreeBSD. The cursor lands in odd places inside a
character, and the last character on a line sometimes disappears.

This problem is caused by vim's dependence on undefined behavior of
mblen(3). Looking at vim's source code at mbyte.c:653, the routine
assumes that mblen(3) isn't stateful. On glibc or Solaris libc, mblen(3)
does not change its internal state when EILSEQ or EINVAL occurs. But
FreeBSD libc changes the internal state even when it hits an error.
The behavior of mblen(3) in this case is undefined in POSIX [1], so none
of these libc implementations is wrong. I therefore think the multibyte
state needs to be reset before the mblen(3) call, so that the routine
works independently of the implementation.

My patch is attached.

[1] http://www.opengroup.org/onlinepubs/009695399/functions/mblen.html

Hye-Shik

Hye-Shik Chang wrote:

> The current version of vim doesn't handle non-utf8 multibyte encodings
> such as EUC and/or GBK in FreeBSD. Cursor moves around weird places
> inside a character and the last character on each lines disappears
> sometimes.
>
> This problem is due to vim's dependency to undefined behavior of
> mblen(3). Looking vim's source code mbyte.c:653, the routine assumes
> that mblen(3) isn't stateful. On glibc or Solaris libc, mblen(3)
> does not change the internal state when EILSEQ or EINVAL is occurred.
> But FreeBSD libc changes the internal state even when it meets an
> error. The mblen(3) behavior is undefined in POSIX [1] and none
> of each libc implementations are wrong. So I think it's required
> to reset multibyte states before a mblen(3) call to work the routine
> free from implementation.
>
> My patch is attached.
>
> [1] http://www.opengroup.org/onlinepubs/009695399/functions/mblen.html

> --- mbyte.c.orig	Fri Apr 23 17:44:36 2004
> +++ mbyte.c	Thu May 12 08:48:35 2005
> @@ -650,6 +650,7 @@
>  		 * where mblen() returns 0 for invalid character.
>  		 * Therefore, following condition includes 0.
>  		 */
> +		(void)mblen(NULL, 0);
>  		if (mblen(buf, (size_t)1) <= 0)
>  		    n = 2;
>  		else

The behavior of mblen() on various systems has always been a bit unclear
to me. Your remark makes a lot of sense, but I wonder why nobody had
this problem before.

I'll include this now in Vim 7 and await further comments. Hopefully
there is no mblen() implementation that crashes when invoked with a NULL
pointer.

--
CUSTOMER:     Well, can you hang around a couple of minutes? He won't be
              long.
MORTICIAN:    Naaah, I got to go on to Robinson's -- they've lost nine today.
CUSTOMER:     Well, when is your next round?
MORTICIAN:    Thursday.
DEAD PERSON:  I think I'll go for a walk.
                The Quest for the Holy Grail (Monty Python)

 /// Bram Moolenaar -- [hidden email] -- http://www.Moolenaar.net   \\\
///        Sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\        Project leader for A-A-P -- http://www.A-A-P.org        ///
 \\\     Buy LOTR 3 and help AIDS victims -- http://ICCF.nl/lotr.html ///

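
For readers following the thread, the idea behind the one-line patch can be sketched as a standalone function. This is only an illustration under assumptions: `probe_lead_byte_width` is a hypothetical name, not a function in Vim's mbyte.c, and the surrounding Vim code is not reproduced here. It shows the reset call placed before the one-byte probe, with the same `<= 0` condition as the patched code.

```c
#include <stdlib.h>   /* mblen */

/*
 * Hypothetical helper (not part of Vim): guess whether the byte at
 * buf[0] is a complete one-byte character or the lead byte of a
 * two-byte character in the current locale.
 */
static int probe_lead_byte_width(const char *buf)
{
    /* Calling mblen() with a null pointer puts its internal shift
     * state back to the initial state, so an earlier failed call
     * cannot influence the probe below. The return value is ignored. */
    (void)mblen(NULL, 0);

    /* Inspect exactly one byte. A result <= 0 (an error, or 0 on
     * systems that return 0 for an invalid byte) is taken to mean
     * "this byte starts a two-byte character". */
    if (mblen(buf, (size_t)1) <= 0)
        return 2;
    return 1;
}
```

In the default "C" locale, probing an ASCII byte such as "a" takes the one-byte branch; a real caller would first select the user's locale with setlocale(LC_CTYPE, "") so that EUC or GBK lead bytes are recognized.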
On Sat, Jul 16, 2005 at 12:44:34PM +0200, Bram Moolenaar wrote:
>
> Hye-Shik Chang wrote:
>
> > The current version of vim doesn't handle non-utf8 multibyte encodings
> > such as EUC and/or GBK in FreeBSD. Cursor moves around weird places
> > inside a character and the last character on each lines disappears
> > sometimes.
> [snip]
>
> The behavior of mblen() on various systems has always been a bit unclear
> to me. Your remark makes a lot of sense, but I wonder why nobody had
> this problem before.
>

In fact, many Japanese FreeBSD users seem to have suffered from this
problem:

  http://www.queen.ne.jp/iMA/showmdir.pl?ports-jp=Current&num=14694&link=20040430015955%2eGA52106%25st%40be%2eto

(Even if you can't read Japanese, you can still pick out some text in
Latin letters on the page. :)

I wasn't aware of the problem myself because I use a UTF-8 locale, but
a few friends of mine asked me for help.

> I'll include this now in Vim 7 and await further comments. Hopefully
> there is no mblen() implementation that crashes when invoked with a NULL
> pointer.

Thanks for applying the fix! I don't think the fix will harm any
platform. mblen(NULL, 0); is clearly defined in POSIX as a way to reset
the state.

Thanks,
Hye-Shik

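
As a further illustration of using the null-pointer call as a reset, here is a minimal sketch of a loop that walks a whole string with mblen(). The name `count_mb_chars` is hypothetical and the function is not taken from Vim; it only shows one defensive pattern, resetting the state before the scan and again after an error, so that the result does not depend on how a particular libc treats its state on failure.

```c
#include <stdlib.h>   /* mblen */
#include <string.h>   /* strlen */

/*
 * Hypothetical helper: count the characters in s according to the
 * current locale. Returns -1 if an invalid or incomplete multibyte
 * sequence is found.
 */
static long count_mb_chars(const char *s)
{
    size_t left = strlen(s);
    long count = 0;

    (void)mblen(NULL, 0);           /* start from the initial state */
    while (left > 0) {
        int n = mblen(s, left);
        if (n <= 0) {
            /* The state after an error is not portable, so reset it
             * before returning to avoid confusing later callers. */
            (void)mblen(NULL, 0);
            return -1;
        }
        s += n;
        left -= (size_t)n;
        count++;
    }
    return count;
}
```

In the default "C" locale every ASCII byte counts as one character; with a multibyte locale selected through setlocale(), the same loop would step over each multibyte character as a unit.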