Hi,
I developed a website with Vim, working on both Linux and Windows, and never had any problems. The other day someone else needed to edit some files and used Mac and Windows. Apparently the files he edited now contain a Byte-Order Mark. I discovered this only via the W3C validator, which gave me this warning:

"Byte-Order Mark found in UTF-8 File. The Unicode Byte-Order Mark (BOM) in UTF-8 encoded files is known to cause problems for some text editors and older browsers. You may want to consider avoiding its use until it is better supported."

The only way I could solve the problem was using Notepad++, which has an option to explicitly save the file without the BOM. Is there a way to do the same thing in Vim? Maybe even to display this BOM?

Thanks,
Carlo

--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
Around about 09/08/11 12:37, Carlo Trimarchi typed ...
> The only way I could solve the problem was using notepad++ which has
> an option to explicitly save the file without the BOM. Is there a way
> to do the same thing in Vim? Maybe even to display this BOM?

:set bomb?

Do ':set nobomb' before saving to remove a BOM.

--
[neil@fnx ~]# rm -f .signature
[neil@fnx ~]# ls -l .signature
ls: .signature: No such file or directory
[neil@fnx ~]# exit
In reply to this post by Carlo Trimarchi
On 09/08/11 13:37, Carlo Trimarchi wrote:
> Hi,
> I developed a website with Vim, working both on linux and windows and
> never had any problems. The other day someone else needed to edit some
> files and tried to use Mac and Windows. Apparently in the files he
> edited there is this Byte-Order Mark. I discovered this only via the
> w3c validator that gave me this warning:
>
> "Byte-Order Mark found in UTF-8 File. The Unicode Byte-Order Mark
> (BOM) in UTF-8 encoded files is known to cause problems for some text
> editors and older browsers. You may want to consider avoiding its use
> until it is better supported."

That message is outdated. The BOM is supported in all Unicode encodings including UTF-8 by all "reasonably recent" browsers. It is also part of the HTML standard. Some text editors (such as Notepad, I think) choke on it, but the answer to that is to use a better editor, such as Vim or even WordPad, which know about the BOM and handle it correctly, even in UTF-8.

For some other kinds of text files (most source files and shell scripts, for instance), it is better to save the file without a BOM, but for most "web" formats including HTML, CSS, and, I think, XML, XHTML, etc., a BOM is no problem and can even be a help (e.g. in case the web server sets the charset incorrectly or not at all in its Content-Type header).

> The only way I could solve the problem was using notepad++ which has
> an option to explicitly save the file without the BOM. Is there a way
> to do the same thing in Vim? Maybe even to display this BOM?
>
> Thanks,
> Carlo

To save the file without a BOM:

    :setlocal nobomb
    :w

To ask Vim if there is a BOM:

    :setlocal bomb?

The answer is bomb for "BOM present" or nobomb for "BOM absent".

Note that regardless of the state of the 'bomb' option, a BOM can only exist if the 'fileencoding' is one of UTF-8, UTF-16 (or its UCS-2 subset) or UTF-32 (aka UCS-4), any of them (other than UTF-8, for which endianness is not relevant) in any endianness. For other 'fileencoding' values the 'bomb' option is irrelevant.

To display the presence or absence of the BOM on the status line, see
http://vim.wikia.com/wiki/Show_fileencoding_and_bomb_in_the_status_line

Best regards,
Tony.
--
George Orwell was an optimist.
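For checking files outside Vim, the BOM test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `detect_bom` and `strip_utf8_bom` are hypothetical and do not come from the thread; the BOM byte sequences themselves are the constants from Python's standard `codecs` module.

```python
import codecs

# Check the 4-byte UTF-32 BOMs first: the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a known BOM, else (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM (EF BB BF); return other input unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

Note that a two-byte-BOM match is ambiguous in principle (a UTF-16 LE file whose first character is U+0000 looks like UTF-32 LE), so the result is a heuristic rather than proof of the encoding.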
On Tue, August 9, 2011 5:13 pm, Tony Mechelynck wrote:
> To save the file without a BOM:
>
> :setlocal nobomb
> :w

:w ++bin

should also work IIRC.

regards,
Christian
In reply to this post by Tony Mechelynck
On 9 August 2011 17:13, Tony Mechelynck <[hidden email]> wrote:
> That message is outdated. The BOM is supported in all Unicode encodings
> including UTF-8 by all "reasonably recent" browsers. It is also part of the
> HTML standard.

Well, with the BOM the whole layout of the website appeared broken in Internet Explorer 7. No problem with Firefox. Still, it seems this is not an issue to underestimate.

> For some other kinds of text files (most source files and shell scripts, for
> instance), it is better to save the file without a BOM, but for most "web"
> formats including HTML, CSS, and, I think, XML, XHTML, etc., a BOM is no
> problem and can even be a help (e.g. in case the web server sets the charset
> incorrectly or not at all in its Content-Type header).

It was a PHP file, so maybe that's the problem.

> To save the file without a BOM:
>
> :setlocal nobomb
> :w
>
> To ask Vim if there is a BOM:
>
> :setlocal bomb?
>
> The answer is bomb for "BOM present" or nobomb for "BOM absent".
>
> To display the presence or absence of the BOM on the status line:
>
> see
> http://vim.wikia.com/wiki/Show_fileencoding_and_bomb_in_the_status_line

Thanks for all the info and the commands. Very useful.
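Since the affected file was a PHP file edited by someone else, it may help to find every file in the project that picked up a BOM. PHP passes any bytes that precede the opening `<?php` tag straight through as output, so a UTF-8 BOM there ends up at the very top of the generated page. The following Python sketch is an assumption-laden illustration, not something from the thread: the function name `files_with_utf8_bom` and the default `*.php` pattern are hypothetical choices.

```python
import codecs
from pathlib import Path


def files_with_utf8_bom(root, pattern="*.php"):
    """Return the sorted paths under root matching pattern that start with EF BB BF."""
    hits = []
    for path in Path(root).rglob(pattern):
        if not path.is_file():
            continue
        # Only the first three bytes matter, so avoid reading whole files.
        with path.open("rb") as fh:
            if fh.read(3) == codecs.BOM_UTF8:
                hits.append(path)
    return sorted(hits)
```

Each reported file can then be opened in Vim and rewritten with `:setlocal nobomb` followed by `:w`, as described in the quoted reply.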
In reply to this post by Tony Mechelynck
On Aug 9, 10:13 am, Tony Mechelynck <[hidden email]> wrote:
> On 09/08/11 13:37, Carlo Trimarchi wrote:
> > "Byte-Order Mark found in UTF-8 File. The Unicode Byte-Order Mark
> > (BOM) in UTF-8 encoded files is known to cause problems for some text
> > editors and older browsers. You may want to consider avoiding its use
> > until it is better supported."
>
> That message is outdated. The BOM is supported in all Unicode encodings
> including UTF-8 by all "reasonably recent" browers. It is also part of
> the HTML standard. Some text editors (such as Notepad, I think) choke on
> it, but the answer to that is to use a better editor, such as Vim or
> even WordPad, which know about the BOM and handle it correctly, even in
> UTF-8.

Not true. W3C still explicitly recommends against using a BOM for UTF-8 (but I don't remember the link off-hand, sorry, I think it was either in the HTML4.01 or HTML5 spec somewhere). Even modern browsers like Firefox and Opera choke on a BOM in UTF-8 files for XHTML served as XML.

Using a BOM for UTF-8 on the internet is a bad idea. A BOM is however recommended and useful on UTF-16 or UTF-32 and the like.
In reply to this post by Tony Mechelynck
On Tue, Aug 9, 2011 at 11:13 PM, Tony Mechelynck <[hidden email]> wrote:
>
> That message is outdated. The BOM is supported in all Unicode encodings
> including UTF-8 by all "reasonably recent" browers. It is also part of the
> HTML standard.

BOM is a standard for UCS2 or UTF-16, not for UTF-8.

A BOM in UTF-8 will cause problems for most programs that expect text streams. gcc is a good example; most GNU CLI utilities will reject UTF-8 with a BOM.

And the W3C validator will of course complain about it...
On 10/08/11 02:18, pansz wrote:
> On Tue, Aug 9, 2011 at 11:13 PM, Tony Mechelynck
> <[hidden email]> wrote:
>>
>> That message is outdated. The BOM is supported in all Unicode encodings
>> including UTF-8 by all "reasonably recent" browers. It is also part of the
>> HTML standard.
>
> BOM is a standard for UCS2 or UTF-16, not for UTF-8.

According to the Unicode FAQ, http://www.unicode.org/faq//utf_bom.html#bom4 (two successive FAQ questions), a BOM can be used in UTF-8 as well as in UTF-16 or UTF-32; but since UTF-8 doesn't have endianness variants, with UTF-8 it specifies encoding only, not endianness. BTW, "good" editors (including at least Vim and WordPad, possibly others) handle the BOM correctly, even in UTF-8. In fact, in my experience WordPad won't read UTF-8 text correctly _unless_ there is a BOM.

However (about your next paragraph), when UTF-8 is fed "transparently" to a program which expects ASCII, and in particular to any program which expects #! at the start of a file, the BOM should not be used (see the 2nd FAQ question linked above, and also http://www.unicode.org/faq//utf_bom.html#bom10 "How I should deal with BOMs?", point 3).

> BOM for utf-8 will cause problem for most programs which expect text
> streams. gcc is a good example, most GNU CLI utilities will reject
> utf-8 with BOM.

I explicitly mentioned in the part you snipped that for some other kinds of text than HTML or CSS (such as, I said, source files and shell scripts) it is better to save the file without a BOM.

> And, W3C validator will of course complain about it...

...with a warning, not an error; and Tidy won't.

Best regards,
Tony.
--
"My weight is perfect for my height -- which varies"
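The distinction drawn above, between a reader that knows about the UTF-8 BOM and one that treats the bytes "transparently", can be seen with Python's two standard codecs, `utf-8` and `utf-8-sig`. This is a small sketch for illustration only; the sample bytes are made up for the example.

```python
# A UTF-8 BOM (EF BB BF) followed by a shell shebang line.
raw = b"\xef\xbb\xbf#!/bin/sh\n"

# The plain "utf-8" codec keeps the BOM as the character U+FEFF,
# so the text no longer starts with "#!".
as_utf8 = raw.decode("utf-8")

# The "utf-8-sig" codec consumes a leading BOM, if there is one.
as_sig = raw.decode("utf-8-sig")

print(as_utf8.startswith("#!"))
print(as_sig.startswith("#!"))
```

A BOM-aware tool behaves like the second decode; a tool that only expects ASCII-compatible bytes behaves like the first and sees three unexpected bytes before the `#!`.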
On Aug 10, 6:19 am, Tony Mechelynck <[hidden email]> wrote:
> On 10/08/11 02:18, pansz wrote:
> [...]
> > And, W3C validator will of course complain about it...
>
> ...with a warning, not an error; and Tidy won't.

W3C specifically recommends you do NOT use a BOM for UTF-8 on HTML/XHTML/CSS documents. See
http://www.w3.org/International/questions/qa-byte-order-mark#bomhow

While developing TOhtml, I ran into problems in some browsers when using UTF-8 with BOM. If I remember correctly, browsers which actually handle XHTML correctly, like Opera and Firefox, were interpreting the BOM as characters appearing before the XML prolog <?xml..., which makes the XML not well-formed, and therefore (somewhat correctly) the browser bailed without rendering anything.

Re-parsing the document as HTML of course may allow these browsers to render the document correctly, but according to the W3C link above, some user agents will still have problems and attempt to render characters instead of treating it as an invisible BOM.

For this reason, syntax/2html.vim contains (after opening the buffer for the generated file):

    " According to http://www.w3.org/TR/html4/charset.html#doc-char-set, the byte
    " order mark is highly recommend on the web when using multibyte encodings. But,
    " it is not a good idea to include it on UTF-8 files. Otherwise, let Vim
    " determine when it is actually inserted.
    if s:settings.vim_encoding == 'utf-8'
      setlocal nobomb
    else
      setlocal bomb
    endif
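The same policy (no BOM for UTF-8, a BOM for the multibyte encodings where it helps) can be sketched outside Vim. The Python below is a loose analogue, not a port of the Vim script: the function name `encode_for_web` is hypothetical, and it relies on the documented behaviour that Python's generic `utf-16` and `utf-32` codecs write a native-order BOM themselves while the explicit-endianness variants do not.

```python
import codecs

# Explicit-endianness codecs write no BOM on their own in Python,
# so this sketch prepends the matching one.
_EXPLICIT_BOMS = {
    "utf-16-le": codecs.BOM_UTF16_LE,
    "utf-16-be": codecs.BOM_UTF16_BE,
    "utf-32-le": codecs.BOM_UTF32_LE,
    "utf-32-be": codecs.BOM_UTF32_BE,
}


def encode_for_web(text: str, encoding: str) -> bytes:
    """Encode text, adding a BOM for UTF-16/UTF-32 but never for UTF-8."""
    name = codecs.lookup(encoding).name  # normalise aliases such as "UTF8"
    if name == "utf-8-sig":
        name = "utf-8"  # policy: never emit a UTF-8 BOM
    data = text.encode(name)
    # Generic "utf-16"/"utf-32" already include a BOM; other encodings get none.
    return _EXPLICIT_BOMS.get(name, b"") + data
```

For example, `encode_for_web("<p>", "utf-8")` yields the three plain ASCII bytes, while the `utf-16-le` variant is prefixed with FF FE.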
May I add some observations to this discussion?

The best way to deal with the BOM is to know your target. I work on a MacBook, which has UTF-8 as its default. When I'm working with Objective-C that will be compiled using LLVM, there is no problem using a BOM (which is a good thing, since the encoding can be easily recognized). But when I'm working with Java, doing something for the Android platform, I use ISO-8859-1, because the Google guys defined the 'encoding' argument of the 'javac' compiler as 'ASCII' in an ANT XML somewhere.

I also know that PHP doesn't handle the BOM well, so I decided to work with PHP in ISO-8859-1 as well. But my e-mails are all HTML-formatted, using UTF-8 with BOM (edited in Vim), and always display in Firefox, Safari or Chrome with no problems.

I believe that the problem with the major browsers comes down to user configuration. You can let the browser detect the character set of a page, or configure it to use one based on the assumption that you are in a Western country (or another part of the world). This causes no problems if you don't open pages from other countries. Nowadays, it is preferable to let the browser handle the encoding itself.

Regards.
On 11/08/11 00:03, Alessandro Antonello wrote:
> The better way to use BOM is when you know your target.
> [...]

Yeah, the idea is to know what your file will be used with.

Recently I discovered that when feeding a local *.txt file to SeaMonkey (or, I suppose, Firefox), it will try to read it as Latin1 unless there is a BOM. I'm not sure if that depends on my Appearance preferences. Of course, for a *.txt on my local disk there is no metadata (no HTTP headers etc.) to tell the MIME type and the encoding to the browser. For the MIME type, *.txt means text/plain but it could be any charset. This means that when I want to display (and possibly print) multilingual text (let's say, who knows? maybe a *.txt file in French with some Russian and some Hebrew in it), something Gecko (the display engine used by Firefox, Thunderbird and SeaMonkey) does better than gvim, I'll have to record it with a BOM.

OTOH any file starting with #! MUST, as has already been said, be recorded with no BOM because the shebang is only looked for in the first two bytes of the file (which would be part of the BOM if there were one).

Best regards,
Tony.
--
hundred-and-one symptoms of being an internet addict:
156. You forget your friend's name but not her e-mail address.
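The shebang point above can be checked mechanically: the `#!` has to be the first two bytes, so a three-byte UTF-8 BOM in front of it hides it. The following Python sketch is only an illustration; the function name `shebang_status` and its three return strings are hypothetical.

```python
import codecs


def shebang_status(data: bytes) -> str:
    """Classify the start of a script: 'ok', 'hidden-by-bom', or 'none'."""
    if data[:2] == b"#!":
        return "ok"  # the first two bytes are the shebang
    if data.startswith(codecs.BOM_UTF8 + b"#!"):
        return "hidden-by-bom"  # a shebang exists, but EF BB BF precedes it
    return "none"
```

A file reported as `hidden-by-bom` is a candidate for `:setlocal nobomb` followed by `:w` in Vim.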