Bram wrote:
> > > This in fact changed between C standards.  Thus it's impossible to
> > > write code that works with every compiler/library.  Same applies
> > > to the backslash, which is an even bigger problem.  We ran into
> > > this problem with message translation.  Fortunately, there it
> > > could be solved by telling the message programs to use the old
> > > style strings.
> >
> > Or change the program appropriately.  Some time ago I changed
> > something like this
> >
> >     if (str[len - 1] == '\\')
> >
> > to
> >
> >     if (_tcsrchr(str, '\\') == str + len - 1)
> >
> > But _tcsrchr is not a cross-platform solution, I suppose.  Also it
> > is less efficient. :-(
>
> No, you cannot solve it that way.  The C89 compiler will recognize the
> backslash in a trail byte as a backslash; the C99 compiler doesn't, it
> sees one character (depending on the locale).  Thus for C89 you need
> to double the backslash, for C99 you don't.  This is a clear
> incompatibility that is impossible to solve without #ifdefs.
>
> I would say this is a bug in the C99 standard, since you can break a
> program by compiling it in another locale.  The only way out seems to
> be using utf-8, which is locale-neutral.

If I understand you correctly, you mean that an MBCS character can be
wrongly recognized by C89 compilers and/or in another locale.  I have not
read the C89 standard, but MSVC, as a C89 compiler, obviously behaves
more like what you attribute to C99 compilers.  I never needed to "double
the backslash" in MSVC when writing native strings.  In fact, it is very
difficult to do that (because I never know the second byte of a
character without using a hex editor or writing a program).  However, I
see that GCC can have problems with this.

I do not see it as a bug in the standard that a program can break when
compiled in another locale.  Javac has an -encoding option exactly to
specify the source encoding.  Of course, one *should* not directly
specify localized strings in C source files in non-ASCII-clear ways.  I
am afraid UTF-8 strings (if not encoded in an ASCII-clear way) can have
problems too, because they are not valid Chinese strings as seen by MSVC
when compiled on Chinese Windows.  I once had to reboot into an English
locale in order to build GSView32 for Windows.

> > So setlocale(LC_CTYPE, "C") is safe?  Glad to know that.
>
> I didn't say it was safe, just that we won't have problems with those
> functions.  It's hard to know for sure it's safe for all functions.  I
> could introduce this and wait for things to go wrong...
>
> Actually, I would expect X libraries to break when the locale isn't
> set properly.  Perhaps we should do it for Win32 only.

What impact could it have?  I would love to see a non-Win32-only
solution (but I would love even more to see it in the mainstream :-)).

Best regards,

Yongwei
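The trail-byte problem discussed in this message can be sketched in portable C, without _tcsrchr. The sketch below assumes a CP936/GBK-style double-byte encoding in which lead bytes fall in the range 0x81..0xFE; that range, and the helper names is_lead_byte() and ends_with_backslash(), are assumptions made for illustration and are not taken from Vim or from any library. The string is walked from the start so that a 0x5C byte counts as a backslash only when it is not the second byte of a two-byte character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: CP936/GBK-style encoding, lead bytes are 0x81..0xFE. */
static int is_lead_byte(unsigned char c)
{
    return c >= 0x81 && c <= 0xFE;
}

/* Hypothetical helper: returns 1 if str ends in a real single-byte
 * backslash, 0 if it is empty, ends in another character, or ends in a
 * double-byte character whose trail byte merely happens to be 0x5C. */
static int ends_with_backslash(const char *str)
{
    size_t len;
    size_t i = 0;
    int last_is_backslash = 0;

    assert(str != NULL);
    len = strlen(str);

    while (i < len) {
        unsigned char c = (unsigned char)str[i];

        if (is_lead_byte(c) && i + 1 < len) {
            /* Two-byte character: skip lead and trail byte together. */
            last_is_backslash = 0;
            i += 2;
        } else {
            last_is_backslash = (c == '\\');
            i += 1;
        }
    }
    return last_is_backslash;
}
```

For the byte string "abc\x81\x5c" the naive test str[len - 1] == '\\' is true, while ends_with_backslash() returns 0 because 0x5C is consumed as a trail byte. The test string is written with hex escapes, which also keeps the source file itself pure ASCII. The cost is a full scan of the string, the same inefficiency noted above for _tcsrchr.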
Yongwei wrote:

> > No, you cannot solve it that way.  The C89 compiler will recognize the
> > backslash in a trail byte as a backslash; the C99 compiler doesn't, it
> > sees one character (depending on the locale).  Thus for C89 you need
> > to double the backslash, for C99 you don't.  This is a clear
> > incompatibility that is impossible to solve without #ifdefs.
> >
> > I would say this is a bug in the C99 standard, since you can break a
> > program by compiling it in another locale.  The only way out seems to
> > be using utf-8, which is locale-neutral.
>
> If I understand you correctly, you mean that an MBCS character can be
> wrongly recognized by C89 compilers and/or in another locale.  I have not
> read the C89 standard, but MSVC, as a C89 compiler, obviously behaves
> more like what you attribute to C99 compilers.  I never needed to "double
> the backslash" in MSVC when writing native strings.  In fact, it
> is very difficult to do that (because I never know the second byte of a
> character without using a hex editor or writing a program).  However, I
> see that GCC can have problems with this.
>
> I do not see it as a bug in the standard that a program can break when
> compiled in another locale.  Javac has an -encoding option exactly to
> specify the source encoding.  Of course, one *should* not directly
> specify localized strings in C source files in non-ASCII-clear ways.  I
> am afraid UTF-8 strings (if not encoded in an ASCII-clear way) can have
> problems too, because they are not valid Chinese strings as seen by
> MSVC when compiled on Chinese Windows.  I once had to reboot into
> an English locale in order to build GSView32 for Windows.

In my opinion the right solution would have been to introduce a new
string type that depends on the locale, instead of changing the meaning
of the string that everybody uses and breaking old programs.  Something
like L"Chinese-chars".  Perhaps Microsoft pushed the standard for the
solution that they were already using (commercial interest often
overrules good choices).

> > > So setlocale(LC_CTYPE, "C") is safe?  Glad to know that.
> >
> > I didn't say it was safe, just that we won't have problems with those
> > functions.  It's hard to know for sure it's safe for all functions.  I
> > could introduce this and wait for things to go wrong...
> >
> > Actually, I would expect X libraries to break when the locale isn't
> > set properly.  Perhaps we should do it for Win32 only.
>
> What impact could it have?  I would love to see a non-Win32-only
> solution (but I would love even more to see it in the mainstream :-)).

The problem is that libraries are different on every system.  It's
nearly impossible to predict what breaks if you change something global
to the whole program, which is what the locale is.  We might end up
putting lots of library functions inside Vim (such as the snprintf()
that I have now added).

--
Never under any circumstances take a sleeping pill and a laxative on the
same night.

 /// Bram Moolenaar -- [hidden email] -- http://www.Moolenaar.net   \\\
///        Sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\        Project leader for A-A-P -- http://www.A-A-P.org        ///
 \\\  Buy LOTR 3 and help AIDS victims -- http://ICCF.nl/lotr.html   ///
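Because the locale is global to the whole program, one way to limit the risk described above is to switch LC_CTYPE to "C" only around the calls that need it and restore the previous value afterwards. The following is a minimal sketch of that idea, not something proposed in this thread or taken from Vim; with_c_ctype() and ctype_is_c() are hypothetical names. It assumes single-threaded code, and it does nothing for libraries that read the locale once at start-up and cache it.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: run fn() with LC_CTYPE temporarily set to "C",
 * then restore the previous setting.  Returns fn()'s result, or -1 if
 * the current locale could not be saved. */
static int with_c_ctype(int (*fn)(void))
{
    const char *current;
    char *saved;
    int result;

    assert(fn != NULL);

    /* Query only; a NULL second argument does not change the locale. */
    current = setlocale(LC_CTYPE, NULL);
    if (current == NULL)
        return -1;

    /* Copy the name: the returned string may be overwritten by later
     * setlocale() calls. */
    saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return -1;
    strcpy(saved, current);

    setlocale(LC_CTYPE, "C");
    result = fn();
    setlocale(LC_CTYPE, saved);

    free(saved);
    return result;
}

/* Example callback: reports whether LC_CTYPE is currently "C". */
static int ctype_is_c(void)
{
    return strcmp(setlocale(LC_CTYPE, NULL), "C") == 0;
}
```

Calling with_c_ctype(ctype_is_c) returns 1 regardless of the locale in effect beforehand, and that locale is back in place when the call returns.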
Bram wrote:
> Yongwei wrote:
>
> > If I understand you correctly, you mean that an MBCS character can
> > be wrongly recognized by C89 compilers and/or in another locale.  I
> > have not read the C89 standard, but MSVC, as a C89 compiler,
> > obviously behaves more like what you attribute to C99 compilers.  I
> > never needed to "double the backslash" in MSVC when writing
> > native strings.  In fact, it is very difficult to do that (because I
> > never know the second byte of a character without using a hex editor
> > or writing a program).  However, I see that GCC can have problems
> > with this.
> >
> > I do not see it as a bug in the standard that a program can break
> > when compiled in another locale.  Javac has an -encoding option
> > exactly to specify the source encoding.  Of course, one *should* not
> > directly specify localized strings in C source files in
> > non-ASCII-clear ways.  I am afraid UTF-8 strings (if not encoded in
> > an ASCII-clear way) can have problems too, because they are not
> > valid Chinese strings as seen by MSVC when compiled on Chinese
> > Windows.  I once had to reboot into an English locale in order to
> > build GSView32 for Windows.
>
> In my opinion the right solution would have been to introduce a new
> string type that depends on the locale, instead of changing the
> meaning of the string that everybody uses and breaking old programs.
> Something like L"Chinese-chars".  Perhaps Microsoft pushed the
> standard for the solution that they were already using (commercial
> interest often overrules good choices).

This is a little off-topic, but I do not agree with you on this.  Few
existing programs, if any, will suffer from the "C99 change".  I really
cannot imagine people inserting backslashes among localized strings.  On
the other hand, there are already many existing internationalized
programs using Microsoft's solution, possibly developed on a Far East
version of Windows.  It is not only in Microsoft's interest to make them
legal.  I think the only thing missing is a Java-like -encoding option
on the compiler side.

Best regards,

Yongwei
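The "ASCII-clear" style recommended earlier in this thread can be combined with the wide-string idea quoted above: with C99 universal character names the source file stays pure ASCII, so the literal does not depend on the locale or code page the compiler runs in. The sketch below is illustrative only; zh_label and zh_label_at() are hypothetical names, and it assumes a C99 compiler whose wchar_t holds Unicode code points for these characters (true for the common 16-bit and 32-bit wchar_t implementations).

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Two Chinese characters, U+4E2D and U+6587, written with universal
 * character names so that no non-ASCII byte appears in the source. */
static const wchar_t zh_label[] = L"\u4e2d\u6587";

/* Hypothetical accessor: returns the i-th wide character of the label. */
static wchar_t zh_label_at(size_t i)
{
    assert(i < wcslen(zh_label));
    return zh_label[i];
}
```

Under the assumption stated above, wcslen(zh_label) is 2 however the file is compiled. A narrow string in a legacy encoding can be kept ASCII-clear in the same spirit with hex escapes such as "\x81\x5c", at the cost of readability.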
Yongwei wrote:

> > > If I understand you correctly, you mean that an MBCS character can
> > > be wrongly recognized by C89 compilers and/or in another locale.  I
> > > have not read the C89 standard, but MSVC, as a C89 compiler,
> > > obviously behaves more like what you attribute to C99 compilers.  I
> > > never needed to "double the backslash" in MSVC when writing
> > > native strings.  In fact, it is very difficult to do that (because I
> > > never know the second byte of a character without using a hex editor
> > > or writing a program).  However, I see that GCC can have problems
> > > with this.
> > >
> > > I do not see it as a bug in the standard that a program can break
> > > when compiled in another locale.  Javac has an -encoding option
> > > exactly to specify the source encoding.  Of course, one *should* not
> > > directly specify localized strings in C source files in
> > > non-ASCII-clear ways.  I am afraid UTF-8 strings (if not encoded in
> > > an ASCII-clear way) can have problems too, because they are not
> > > valid Chinese strings as seen by MSVC when compiled on Chinese
> > > Windows.  I once had to reboot into an English locale in order to
> > > build GSView32 for Windows.
> >
> > In my opinion the right solution would have been to introduce a new
> > string type that depends on the locale, instead of changing the
> > meaning of the string that everybody uses and breaking old programs.
> > Something like L"Chinese-chars".  Perhaps Microsoft pushed the
> > standard for the solution that they were already using (commercial
> > interest often overrules good choices).
>
> This is a little off-topic, but I do not agree with you on this.  Few
> existing programs, if any, will suffer from the "C99 change".  I really
> cannot imagine people inserting backslashes among localized strings.

It did break the message translations for Vim.  Fortunately the people
making the GNU tools were willing to provide a fix, otherwise we would
have had to make a clumsy solution.

> On the other hand, there are already many existing internationalized
> programs using Microsoft's solution, possibly developed on a Far East
> version of Windows.  It is not only in Microsoft's interest to make
> them legal.  I think the only thing missing is a Java-like -encoding
> option on the compiler side.

There is always a good reason to make a bad choice.

--
Back off man, I'm a scientist.
              -- Peter, Ghostbusters

 /// Bram Moolenaar -- [hidden email] -- http://www.Moolenaar.net   \\\
///        Sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\        Project leader for A-A-P -- http://www.A-A-P.org        ///
 \\\  Buy LOTR 3 and help AIDS victims -- http://ICCF.nl/lotr.html   ///