Printing with utf-8 characters on Windows

Re: Printing with utf-8 characters on Windows

Mike Williams
On 06/01/2010 18:38, Chris Jones wrote:

> On Wed, Jan 06, 2010 at 06:20:42AM EST, Mike Williams wrote:
>> Hi,
>>
>> I wrote the original PS driver for VIM, several years ago now.  This is
>> somewhat OT from the OP as it is not Windows related.
>
> I think he wrote somewhere that he prints from Windows because his
> printer is better supported.
>
>> If you are not interested, stop reading now.
>
> I'm not sure who wouldn't be. As far as I'm concerned, you are salvaging
> the thread from guesswork and speculations, thank goodness for that.
>
>> The PS driver relies on fonts being present in the printer.  The only
>> ones guaranteed to be there are the base 35 western fonts (Courier,
>> Times, etc).  However, far east printers will have a few multi-byte
>> fonts to support CJK printing, for which the printmbcharset et al
>> options and handling of multi-byte encodings was added.  It is
>> possible to install additional multi-byte fonts on the printer which
>> could also be used.
>
> I have an old HP LaserJet 2100 that's still running on the original
> cartridge. Do you mean that if I wanted to be able to use :hardcopy to
> successfully print any character from the Unicode BMP, I would be able
> to do so after installing a universal font such as GNU/Unifont on the
> printer?

Sorry, that will have to be a "that depends".  The font has to be in a
format that PS of that era understands.  AFAICR Level 2 PS did not
support Unicode encoding with PS fonts.  They can support multi-byte
encoded text, which means that both the text to be printed and your
Unicode font would need translating to a form that the printer can use.
As I said, it is complicated.

>> Technically PostScript is text encoding agnostic - it just deals with
>> sequences of byte values.  The selected font defines how to interpret
>> the byte sequence, as single bytes or a multi-byte encoding of some
>> kind.
>
> So, in a UTF-8 context and with multi-byte characters, I'm still unclear
> as to why I can use paps to create a .ps file that will print correctly
> on my printer, and unable to use Vim's :hardcopy command to do the same
> thing.

I have had a quick look at paps.  It is based on top of Pango, which is
a large piece of software to handle layout and rendering of Unicode
text.  AFAICS paps interprets the pango output to draw each character as
a filled path.  Not the quickest and most efficient method, and the
output will be poor at smaller font sizes - but it does work.  This
removes the need for PS fonts altogether.

> Why can't the :hardcopy command perform the same magic?

Writing a Unicode layout print engine is not trivial.  paps leverages a
lot of the work done by the Pango developers that would need to be
written from scratch.  Plus the normal aim of VIM has been to be
platform independent - using Pango for Unicode printing would prevent
multi-byte printing in environments that don't support Pango.

In general this level of complexity is usually supported by some level
of host OS service.  This means that multi-byte printing becomes
platform and OS dependent - for example, on a box without X11/gtk2
multi-byte printing would not be supported.  With sufficient work
implementing what is needed in VIM it could be, but I don't know if that
is what Bram wants.

>> A lot depends on the characters being used.  If you are using UTF-8
>> encoding for text that exists in a single ISO-8859 character set then
>> you can just set printencoding and VIM should translate the UTF-8
>> encoded text to single bytes for printing.  If you are using
>> characters from multiple ISO-8859 character sets then things start to
>> get complicated.
>
>> If you are just using ISO-8859 characters then it would be possible
>> (but not currently implemented) to support many such character sets
>> when printing with a single font.
>
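
A minimal sketch of the single-character-set case described above, written in Python rather than Vim script. It tries each ISO-8859 part in turn and reports which ones can encode the whole text. The name `iso8859_candidates` is made up for this illustration, and the check says nothing about which 'printencoding' values a particular Vim build accepts.

```python
# ISO-8859 parts that exist as Python codecs (part 12 was never published).
ISO_8859_PARTS = [n for n in range(1, 17) if n != 12]


def iso8859_candidates(text):
    """Return the ISO-8859 codec names that can encode every character of text."""
    fits = []
    for part in ISO_8859_PARTS:
        codec = f"iso8859_{part}"
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue  # at least one character is missing from this part
        fits.append(codec)
    return fits


if __name__ == "__main__":
    for sample in ["caf\u00e9", "caf\u00e9 \u0436", "b\u1ed9t"]:
        print(repr(sample), iso8859_candidates(sample))
```

An empty result means the text mixes character sets, or uses characters outside ISO-8859 altogether (as "bột" does), which is where things start to get complicated.
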
>> If you are using true multiple-byte characters (i.e. ones not present
>> in any of the ISO-8859 or cp character sets) then you will need to
>> use a multi-byte font and the big issue is with handling them - their
>> discovery on the host system, metrics calculation for text layout,
>> selection of a sub-set of the contents (multi-byte fonts tend to be
>> large - do you want to generate a 12MB PS file to print <1K of text?),
>> and embedding in the generated PS.
>
> Yes, GNU/unifont, at least the file on my HDD, is 16MB and it would
> hardly make sense to download it to the printer with each and every print
> job. But that would not be necessary if the font resided on the printer.

Assuming there was space and it could be used as a PS font, then yes.
Things can get tricky if anyone wants to use commercial fonts since you
cannot copy them around all over the place.  This also makes sharing PS
files hard - embedding fonts (or a subset containing just the characters
used) in the generated file is usually the best way to do things.
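
To give a feel for why a subset is so much smaller than the whole font, here is a rough Python sketch that only collects the distinct characters a piece of text uses. It is not a font subsetter: cutting those glyphs out of a real font and emitting them as PS needs font tooling that is out of scope here, and `characters_needed` is a hypothetical helper name.

```python
def characters_needed(text):
    """Return the distinct non-whitespace characters in text, sorted by code point."""
    return sorted({ch for ch in text if not ch.isspace()})


if __name__ == "__main__":
    # "bột bột", written with escapes so the source stays ASCII.
    sample = "b\u1ed9t b\u1ed9t"
    needed = characters_needed(sample)
    print(len(needed), [f"U+{ord(ch):04X}" for ch in needed])
```

For a short buffer the list is a handful of characters, which is all an embedded subset would have to cover.
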

> In any event, the size of the .ps file created by paps from a one-line
> Vim buffer containing 'Bột bột' and nothing else is only 7.2. I looked
> at a 16K UTF8-encoded text file containing multi-byte  characters and
> the resulting .ps file that paps created was 329K.
>
> So, I'm definitely missing something [some things] :-)

As noted above I believe paps generates a lot of PS commands to draw the
outline of each character which is then filled.  This can result in very
large PS files for large amounts of text, slower printing since it
doesn't take advantage of the PS font cache, and the output can be poor
at smaller font sizes.  It may even have memory issues on larger paper
sizes.
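
Working the quoted figures through (16K of UTF-8 text in, 329K of PS out) as a small Python calculation. `expansion_factor` is just an illustrative name, and the sizes are taken as reported, both in the same unit.

```python
def expansion_factor(source_size, output_size):
    """Return how many times larger the output is than the source."""
    if source_size <= 0:
        raise ValueError("source_size must be positive")
    return output_size / source_size


if __name__ == "__main__":
    # Figures from the quoted message: 16K of text, 329K of generated PS.
    print(f"{expansion_factor(16, 329):.1f}x")
```

That is a little over 20 units of PS for every unit of text, which fits with each character being drawn as its own path rather than referenced from a font.
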

>> Not a trivial problem to solve at the time.  When discussed with Bram
>> it was decided this was not wanted.  Dunno if time has changed the
>> argument at all.
>
> Maybe these aspects should be clarified under :h postscript-printing
> under limitations:multi-byte support.
>
> Sorry if I'm asking the wrong questions, I don't know Postscript and I
> have no experience with printers.

No problem, it is a bit of a specialist issue, and it is my day job. ;-)

>> TTFN
>>
>> Mike
>> --
>> yip yip yip yip yap yap yip *BANG* - NO TERRIER
>
> That can't have been a *BULL*Terrier, then.. ;-)
>
> CJ
>
Mike
--
Education is what you get from reading the small print; experience is
what you get from not reading it.


--
You received this message from the "vim_use" maillist.
For more information, visit http://www.vim.org/maillist.php

Re: Printing with utf-8 characters on Windows

Mike Williams
In reply to this post by bill lam
On 07/01/2010 10:24, bill lam wrote:

> Wed, 06 Jan 2010, Mike Williams wrote:
>> If you are using true multiple-byte characters (i.e. ones not present
>> in any of the ISO-8859 or cp character sets) then you will need to
>> use a multi-byte font and the big issue is with handling them - their
>> discovery on the host system, metrics calculation for text layout,
>> selection of a sub-set of the contents (multi-byte fonts tend to be
>> large - do you want to generate a 12MB PS file to print <1K of
>> text?), and embedding in the generated PS.
>>
>> Not a trivial problem to solve at the time.  When discussed with Bram
>> it was decided this was not wanted.  Dunno if time has changed the
>> argument at all.
>
> I don't know how to print in PS or GTK; however, from my experience
> using the GDI API to print Unicode CJK on Windows, I don't think it
> is all that difficult to print CJK characters.

Assuming you mean GDI as in Windows, then no, it isn't.  I believe
it just needs an appropriate call to re-encode the character for the
encoding being used for printing.  It just hasn't been a big enough itch
for any VIM developer.

Mike
--
Education is what you get from reading the small print; experience is
what you get from not reading it.



Vim's :hardcopy command - utf-8 printing limitations.

Chris Jones-44
In reply to this post by Mike Williams
On Thu, Jan 07, 2010 at 09:11:43AM EST, Mike Williams wrote:

[..]

Thank you very much for taking the time to explain what the problem is
and why this is not a simple issue.

Since afaict this has nothing to do with Windows & to make the thread
searchable, I changed the title to something more relevant.

Thanks,

CJ


Re: Printing with utf-8 characters on Windows

Tony Mechelynck
In reply to this post by Đức Minh Thái
On 20/12/09 17:36, Đức Minh Thái wrote:

> Hello,
> I cannot get utf-8 characters printed correctly. For example:
> bột
> becomes
> bá»™t
> My printing options are:
> set printfont=LMMono10:h10 " This is the LMMono from LaTeX Latin Modern
> set printoptions=number:y
> set printencoding=ucs-2le bomb
> Please help. Thank you!
>
> --
> You received this message from the "vim_use" maillist.
> For more information, visit http://www.vim.org/maillist.php
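
The garbled output in the report is what you get when the UTF-8 bytes of the text are each treated as a separate single-byte character. A small Python sketch, assuming the single-byte interpretation was Windows code page 1252 (an assumption; the thread does not say which code page was in effect), with `misread_utf8_as_single_bytes` as a made-up name:

```python
def misread_utf8_as_single_bytes(text, codepage="cp1252"):
    """Encode text as UTF-8, then decode those bytes one at a time as `codepage`."""
    return text.encode("utf-8").decode(codepage)


if __name__ == "__main__":
    original = "b\u1ed9t"  # "bột", written with an escape so the source stays ASCII
    garbled = misread_utf8_as_single_bytes(original)
    # Character count, UTF-8 byte count, and character count after the misreading.
    print(len(original), len(original.encode("utf-8")), len(garbled))
    print(garbled)
```

The three characters occupy five bytes in UTF-8, and read one byte at a time those five bytes come out as the five characters "bá»™t" shown in the report.
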

After reading about half of this thread, I have the following remarks:

- I haven't succeeded in printing "full-Unicode" text with :hardcopy. When
I have a file with some exotic characters in it (Hebrew, maybe, or
Chinese, embedded in French text), I write it to disk as a *.txt file
(in UTF-8 with BOM), then print it in my browser.

- IIUC, valid 'printencoding' values are those for which there is a
PostScript conversion file in $VIMRUNTIME/print/ -- anything else is
treated as Latin1, including UTF-8, UTF-16, UTF-16le, UTF-32 and UTF-32le.

- Most gvim versions for Windows are built with +printer but
-postscript. In that case, according to its help, the 'printencoding'
option is not supported.
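
One way to check the last point for a given build is to look at the feature flags that `vim --version` prints. The Python sketch below only parses text handed to it; the helper names are hypothetical, the sample string in the usage section is invented for illustration and is not real output, and the rule that 'printencoding' needs both +printer and +postscript is taken from the remarks above.

```python
import re

# A "+name" or "-name" token that starts at the beginning or after whitespace.
_FEATURE = re.compile(r"(?<!\S)([+-])([A-Za-z_][\w/]*)")


def parse_features(version_text):
    """Map each feature name to True (+name) or False (-name)."""
    return {name: sign == "+" for sign, name in _FEATURE.findall(version_text)}


def printencoding_available(features):
    """True only when both the printer and postscript features are compiled in."""
    return bool(features.get("printer")) and bool(features.get("postscript"))


if __name__ == "__main__":
    # Invented sample; paste the real output of `vim --version` instead.
    sample = "+multi_byte +printer -postscript"
    feats = parse_features(sample)
    print(feats, printencoding_available(feats))
```

Compiler flags in the same output (anything that looks like "-name") would also be picked up, so it is safest to look up only the two feature names of interest.
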


Best regards,
Tony.
--
GALAHAD:   Camelot ...
LAUNCELOT: Camelot ...
GAWAIN:    It's only a model.
                  "Monty Python and the Holy Grail" PYTHON (MONTY)
PICTURES LTD
