This is nice, I learned a couple of new tricks. The attr one is great.
For many years now, and alongside my "regular" CV, I've had a Markdown to PDF script that uses @page and @media in a small HTML template and essentially creates custom 1-pagers in US format. Instant recruiter turnaround.
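For anyone who hasn't tried this, a minimal sketch of what such a template's stylesheet could look like, assuming a US Letter one-pager. The margin, font size, and the `.no-print` class are illustrative assumptions, not the commenter's actual template.

```css
/* Page box: US Letter with a modest margin (values are assumptions). */
@page {
  size: letter;
  margin: 0.5in;
}

/* Rules that apply only when printing or saving as PDF. */
@media print {
  /* Hypothetical class for screen-only elements such as nav bars. */
  .no-print {
    display: none;
  }

  /* Compact type so a CV has a chance of fitting on one page. */
  body {
    font-size: 10pt;
    line-height: 1.3;
  }
}
```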
In between this and the OP's table splitting trick, this was a good 15m spent on HN.
While they're not valid, I believe the browser will still respect non standard headings. The browser will treat them like non semantic elements (similar to a div), but you can kind of make it legit with a couple different aria attributes. I would recommend staying within the standard heading sizes though.
This is useful, but note that with `table {break-inside: avoid;}`, if a table is longer than the page, the browser will add a break before it, which may not be desirable.
- Similar to headings, it is usually not desirable to produce a break after the summary element (for the details element). If the details elements are never longer than the page, `details {break-inside: avoid;}` can be useful.
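Put together, the two suggestions might look like the sketch below. The `table.long` opt-out class is a hypothetical name for tables you already know won't fit on one page, and how faithfully `break-after: avoid` is honoured depends on the renderer.

```css
@media print {
  /* Keep short tables and <details> blocks on one page where possible. */
  table,
  details {
    break-inside: avoid;
  }

  /* Ask the renderer not to break right after a <summary>. */
  summary {
    break-after: avoid;
  }

  /* Hypothetical opt-out: let known-long tables break normally, so the
     renderer doesn't force a page break in front of them. */
  table.long {
    break-inside: auto;
  }
}
```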
I just laid out a book using Pagedjs.org. Although there were a few bugs in the preview experience, the final output was perfect, and probably took me a quarter the time it would have in InDesign (which I’m fairly competent in).
The real power is in being able to hook up all the templating, CMS APIs, and whatever else you want into your content pipeline. It’s Just HyperText™.
I’d go so far as to say that if you’re equally comfortable in HTML/CSS and InDesign, Pagedjs is a superior choice for long form layout.
I've been test-driving the web pdf build tool for Asciidoc, asciidoctor-web-pdf[1], for a few years, which uses Paged.js as the template engine before CSS PMM has its go. I like it - I like it a LOT[2] - but Puppeteer-Chrome bugs break the build on the regular, or require a rework of templates. So the web-pdf team started just releasing docker images that include a tested Chromium version (among other things), so as to keep that from being such a PITA. Which is fine. Howaaaayyyyyyyver . . that shines a spotlight on a problem with this workflow[3]: the dependency on browser rendering kit.
[2] Asciidoc has four major PDF pipelines in some form of maintenance. The vanilla, asciidoctor-pdf, is a Ruby-based Prawn PDF generator, and it's good but limited in terms of layout - you end up having to extend the core Asciidoctor processor for a lot of tricks like LoT/LoF. Asciidoctor-pdf, and I really want to emphasize this, is the official PDF pipeline for adoc files. The older pipeline, FOPUB, is based on DocBook, which Asciidoc has equivalency with. But DocBook means XSL, and XSL means cheating on Russian roulette because just one out of six chance of ENDING THE PAIN is not enough. The odd man out is DBLATEX, which goes Asciidoc->DocBook->LaTeX-> PDF, and . . whew, ok man, you're jumping three markup languages, dude. Other than that, LaTeX is pretty much the gold standard for layout markup. Finally, we got this thing, asciidoctor-web-pdf, which honestly is more or less dead in terms of activity, which sucks. Web-pdf gives me the flexibility of DocBook-XSL (and then some!) but with CSS and, if I need it, JS. Unfortunately, LoT/LoF still needs Asciidoctor extensions . . but you get a lot of toys that aren't possible with asciidoctor-pdf/prawn.
I wonder how the finer issues of typesetting are handled with print CSS compared to dedicated typesetting software nowadays.
CSS is very compelling, but then based on my personal experience as print media designer the likes of InDesign are full of various special tricks (and language-specific, too) for alleviating white space corridors, handling hanging punctuation, hyphenation, etc. that the enterprise-level software tends to accumulate (due to the business model if nothing else).
It is limited and depends on the browser used to print the document. The most glaring missing feature is CMYK colour.
I’ve just set up our publications to use PagedJS, and with a fair amount of hacking I was able to set up a baseline grid. Hyphenation depends on the browser implementation, which is OK for English in recent Chromium. The new text-wrap property has also been helpful. Hanging punctuation is only supported in Safari if I remember correctly.
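A rough sketch of the properties mentioned here, for anyone who wants to try them. The `--baseline` value is an assumption, and each property is only as good as the rendering engine's support for it.

```css
/* Assumed baseline unit: every line height and vertical margin is a
   multiple of it, which is the crude way to approximate a baseline grid. */
:root {
  --baseline: 14pt;
}

p {
  line-height: var(--baseline);
  margin: 0 0 var(--baseline);

  /* Needs a language on the document, e.g. <html lang="en">,
     so the browser can pick a hyphenation dictionary. */
  hyphens: auto;

  /* Asks for better line breaking where the engine implements it. */
  text-wrap: pretty;

  /* Per the comment above, possibly honoured by Safari only;
     engines that don't know it simply ignore it. */
  hanging-punctuation: first last;
}
```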
But being able to produce nice looking PDFs from a markdown document, with automated endnotes and table of contents, is a much nicer process than going through InDesign, and I also say this as someone equally comfortable with both. And the result is good enough for our purposes.
I agree that InDesign generally is clunky (the same reason it has a lot of typesetting black magic: business model where you make money by satisfying requests of perfectionist designers and their managers). However, in the past, while I was still doing relevant work, I had mild success abstracting some of it out by defining a template and populating it with data imported from JSON.
Now I’m not using ID much, and struggling with generating PDFs from HTML, so I really appreciated the example you gave. Great work!
> but then based on my personal experience as print media designer the likes of InDesign are full of various special tricks (and language-specific, too)
That seems to concur with an informative article by AtaDistance from 2019 that I read the other day [0]:
> ...
> The result was InDesign 1.0 J which shipped in early 2001. InDesign J was the first, and only, major software application developed outside of Japan that followed the Japanese Industrial Standard (JIS) X4051 typesetting and composition specification (the kumihan “bible”) and traditional Japanese print production methods.
> I have covered some basics of Japanese layout before, but a review is helpful for first time readers. I’ll use a mix of my material and McCully’s presentation to explain.
> ...
> Western created DTP layout is graphics-driven and calculated by margins and font baselines. The western baseline typography model and font metrics is how PostScript and OpenType fonts, and all layout engines evolved. Adobe was well acquainted with the shortcomings of their own font technology and InDesign J got around the problems by adding proprietary Kanji virtual body font metrics and Japanese line break algorithms. None of this exists as an open standard that benefits everybody.
> That is fine for InDesign and print production, but web layout and typography via CSS is an entirely different world. There are 3 huge obstacles for good vertical Japanese typography on the web:
> * No font metrics for virtual body/em-box glyph space placement: everything has to be accomplished with baseline metrics
This brings me back to when I did a gig typesetting a Japanese language book (short, maybe a couple dozen pages) once. I don’t even speak Japanese all that well; the guy basically looked for someone with IDJ (which I had coincidentally) and the basic skill to operate it. The software handles most of the magic. As a designer, I won’t pretend I didn’t feel a little used.
The lack of manual kerning to avoid rivers of white space is something that I wouldn't even begin to know how to tackle in CSS. Maybe I'm just too muscle memory familiar with the tools in DTP vs CSS for Web, that I'm just not familiar with the proper CSS for it. The wysiwyg of DTP apps is like pure training wheels or a warm fuzzy blanket in getting work done
A large amount of and spans with letter-spacing or font-kerning can help, but no one’s going to do that (unless you go for an extremely fixed layout or engage JavaScript, which I would consider in poor taste for these purposes).
I concur on the warm fuzzy blanket of DTP software, personally.
exactly. that sounds like a pretty good description of hell to me. i much prefer highlighting the text, alt/option-left/right to adjust the kerning in real time.
this type of description of using CSS for DTP reminds me of using tables for layout before CSS. it was a bullshit solution waiting for something better. only, in this case, DTP software is already there and better. so why would someone do this to themselves?
I just worry that the growing generations don’t degrade typesetting quality too much… Not that it’s particularly stellar in the mainstream, but I sure hope it wouldn’t get even worse!
Luckily for the younger generations, there's no text to typeset in a tiktok video. It's not like they are reading text anyways. Everything now is a TL;DR or some other summary in short tweet like blurbs which may or may not even be a complete sentence. Any post with extra text providing the background or finer details is lambasted as being too wordy.
Cool! This was one of the tools I considered to help get the open-access textbook to its print version (to be published later in 2024). Since I was using Hugo I went with some custom CSS that gets enabled during the PDF build phase and website2pdf to handle the generation: https://github.com/jgazeau/website2pdf
Browser support for printing CSS is spotty. Worse: some features, like footnotes on every page, don't have any equivalent in CSS I know of.
Is there any easy to use/hack HTML layouting engine where I could experiment with custom CSS attributes and bridge that gap? Would anything from Servo be suitable?
Modifying an entire browser with its bloat is too much effort. There is no JS or cookies on paper (they can be in paper if you wrap them).
> Is there any easy to use/hack HTML layouting engine where I could experiment with custom CSS attributes and bridge that gap? Would anything from Servo be suitable?
Servo could be used for this. You'd want to add support for parsing the CSS properties themselves to the style crate in https://github.com/servo/stylo and then the layout implementation to the layout2020 crate in https://github.com/servo/servo. You do effectively get a whole browser though.
PrinceXML is expensive but I feel it's worth the money. I've used it to lay out several RPGs with Markdown source files and HTML and CSS for templates and styling. RPG layouts can be quite complex with stat blocks and the like and it's handled everything I've thrown at it.
Nothing all that exciting, I'm afraid. The new director of InfoSec must have watched a Cable News Show about supply chain attacks, or something, so suddenly anything with package management - pip, npm, gem, etc - was banned from the official Windows policy. Since his flunkies didn't want to get nailed, they just went ahead and flushed any associated environments/runtimes too. It wasn't super consistent. It was, however, generally a surprise - you'd log in one Monday and whoops! Where'd my Python tooling go?
Now, ok, funny thing. Engineering could just get bare metal laptops, whenever they wanted, then blow the thrice-blessed CentOS image on it, and then do whatever the hell. So what happened - and this probably sounds real predictable - they used the CentOS machines to make all sorts of nutty crap, boxed it up, and then sent it back to their "official" Windows machines, now as a locked-in-amber config that never updates, even if five years later it had like fifty zero days in it and none of the libraries were good anymore.
I understand it took a new director and a LOT of meetings to explain whitelist mirrors for package managers, but I was long gone by then, even if I had a tiny hand in rolling out the demo whitelist mirror on-prem. Man, I had no idea what I was doing . . it still makes me shudder when I think of the things they were asking me to do.
I have been creating print labels with plot/cut lines using CSS and I used browsers to convert it to PDF. The experience was terrible. While all was perfect on my 1-page proof print, both large browsers messed up the final document (with a few hundred labels on several pages).
Firefox forgot to render images after a few pages. So on some labels the barcodes were not printed.
Chrome looked good at first glance. But it turned out that the plot/cut lines (which I created via CSS borders) had been shifted by 1-2mm on _some_ pages. Result was garbage.
I finally switched to https://github.com/flyingsaucerproject/flyingsaucer which is a high quality HTML/CSS to PDF library. Only drawback is that it only supports CSS 2.1, so some fancy features are not supported like rotating text.
Huh. I just did that a few months ago without issues in Chrome and Firefox after a little tweaking. Perhaps it was related to how you centred it? The only issue I had was that in order to print it properly I needed the margins set as small as possible, and doing that operation in the chrome print preview was horribly slow (firefox was fine). And yeah, I put in cut lines too, and was printing ~50 pages.
Another issue is that ideally, before printing, you'd (I'd) like a chance to re-render canvas based illustrations at resolutions and/or colors that are appropriate for printing. But AFAICT, there's no event "onprint" or something to give a chance to do that. You can offer a print button in HTML but if the user chooses print directly from the browser you have no chance to respond.
I have a library that uses a headless browser to generate pdfs, and supports "manual" (js-based) triggering of event for pdf generation, enabling the use of any client-side composition you like. The implementation is somewhat specific to my template library, but the templates themselves are just a zip file with the html/css/image/js/etc files: https://zipreport.github.io/zipreport/
Can you give some more information? Is it supported by Firefox or Safari? I found https://www.w3.org/TR/css-gcpm-3/#footnotes but it seems to be completely unsupported at this time.
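For reference, this is roughly what the syntax in that draft looks like. Treat it as a sketch: as far as I know no mainstream browser implements it, and it is mainly dedicated print renderers or polyfills that handle some subset. The `.footnote` class name is an assumption.

```css
/* Move the element's content into the footnote area of the current page. */
.footnote {
  float: footnote;
}

/* Style the footnote area itself; it lives inside the page context. */
@page {
  @footnote {
    border-top: 0.5pt solid black;
    padding-top: 4pt;
  }
}

/* The reference mark left behind in the running text. */
.footnote::footnote-call {
  font-size: 0.7em;
  vertical-align: super;
}

/* The number shown next to the note at the bottom of the page. */
.footnote::footnote-marker {
  font-weight: bold;
}
```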
Browser support for printing is so bad that I'm going to have to create a native app for https://www.thingybase.com to have a streamlined workflow of printing labels to a thermal label printer.
It works now, but first you have to download a PDF and print that. Google Docs does this too when you hit "Print".
I've tried hacks like load the PDF in an iframe and use some JavaScript to print that, but then I get footers with the URL of the webpage, which doesn't work for my purposes.
If you work on Chrome/Blink, Safari/Webkit, or Firefox/Mozilla please please please at least get the hacks working!
A CSS standard would be great, but really I'd be happy if I could call a function like `window.print("/document.pdf")` and have it print the PDF without all the footer/headers stuff.
I realize you're working on the software side rather than the hardware side, and you mentioned you're trying to support existing printers, but your comment reminded me of something I'd like to throw out there:
I would like to have a thermal printer that plugs into the computer via USB and appears as a mass storage device. When you drop an image file (PNG, BMP) into it, it prints it out. A config file would tell the printer what paper stock you have loaded. This way the computer does not need any special drivers, and it would be so stupidly easy to write programs that generate image files to be printed. I think more hardware should take advantage of filesystem operations as a control method.
If someday you ever think about producing a first-party thingybase printer...
I did this a long time ago. Made a small qt app and hooked it to the `print://` scheme in the Windows registry, so in the web server I just had to change a link to `print://example.com/whatever` and the app would load the url and send it to the printer, for the website's domain it would skip any confirmation.
I’ve worked with thermal printers and they are actually quite capable. They have drawing primitives that allow you to lay out basic graphics and the one I was working with even had basic font support. A native driver is definitely the way to go with these.
> A native driver is definitely the way to go with these.
Not if you need to support a bunch of different printers.
For me PDF is the way to go, but browsers can't print a PDF via a "Print" button within the HTML, so instead I have to build apps that download the PDF and print it directly through operating system APIs.
I have this in my SaaS, as well. I built an Electron app that ushers the jobs from the cloud to the thermal printer. It all works until there is a failed job and the queue backs up. I haven't quite figured out how to monitor the queue remotely yet.
If so, then you have a full browser to generate your pdfs on. An approach I have used (more or less):
- have a print.html template page
- use iframe to render and load the page
- either on the print page, or from top frame, call window.print
You can use the aforementioned js libraries to generate footers and such, and use any print css. You can test the print layout in your dev tools, no need to wait for browser preview.
Note that browsers will try to render thead and tfoot elements on every page their parent table spans, which can be very useful.
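A small sketch of the related CSS. The first two rules only spell out what browsers already do by default for `thead` and `tfoot`; the row rule is an optional extra, and results still vary between engines.

```css
@media print {
  /* Default display values, written out: these are the groups that
     browsers try to repeat on each page a table spans. */
  thead {
    display: table-header-group;
  }
  tfoot {
    display: table-footer-group;
  }

  /* Optional: try not to split a single row across two pages. */
  tr {
    break-inside: avoid;
  }
}
```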
At work we have a playwright container with express js waiting for incoming requests to load the print.html page and save it as pdf. We're also using volumes so that all data exchange happens in disk and not being passed around in http requests.
PDFs that took maybe 30 seconds with WeasyPrint now take 5 seconds.
We use HTML & CSS to create invoices and some eBooks, one resource I love is print-css.rocks [0] which has a comparison of tools available (for us Weasyprint [1] is good enough) and also several tutorials of how to accomplish tasks.
I've always (as in, for over a decade) known CSS as by far the easiest way to print from any app I build. Just output some concise HTML+CSS and a reasonably pretty printable document is ready. Perhaps learning PostScript or TeX could give me more, but I found these prohibitively hard as long as I want to print something custom and not the template everyone uses. Meanwhile, writing printer-optimized CSS typesetting just took me some minutes, although I never had to dedicate any time to study it seriously (some w3schools was enough).
I've worked with this extensively on my site golfcourse.wiki, because I've built a tool that generates printable course books with CSS.
I originally was going to generate PDFs, but my backend is in python, and all the PDF generators have some issues with SVG images, especially with gradients. The site is a wiki, so SVG seemed like the best way to map courses while allowing them to easily be edited down the road. Since CSS and SVG work hand in hand, I decided to start working on that, and it's actually turned out okay.
I almost never do real printing (to paper), but this looks really easy to get something working. Also the cheatsheet at the end of the article is a super useful summary of the css rules used. Great article to save in case I face having to build something that prints in the future.
I was just coming here to say the same thing about. It has been pretty good for document generation when combined with AWS Lambdas + headless Chrome or wkhtmltopdf (if you don't need modern CSS functionality).
Back in 2015 i was the solo developer for a "social network" to offer jobs and hire people. I used this CSS technique so the profile of the job seeker could be printed by the companies as a curriculum (our niche was factory workers, so everyone involved still used paper rather than digital formats).
Before that i made an online editor to create certificates for workshops and courses, again using the same technique.
Both projects are kinda dead now, but it was nice getting everything done using only vanilla JS and CSS. Very powerful stuff.
I recently set up a label printer where the labels get printed from a full page app to a Brother label printer that takes rolls 62mm wide and 30 meters long. The printer cuts after each page break. It's a regular usb printer but with a funny page size. Also, Brother's website for finding the mac drivers is a bit of a maze of misdirection that tries to get you to install their extremely shitty application. In the end I found a 3rd party website with a working printer driver for this thing.
After a bit of tinkering, I was able to print 6.2x12.4cm labels with a QR code. Really fun to watch it spit out these stickers. Each sticker has a unique QR code with the printed code in text form below it.
Some gotchas:
- you have to use margin 0 to hide the ugly browser headers that get added in the margin otherwise.
- Firefox has no print preview. This makes testing a bit hard. I used Chrome for testing this in the end.
- To print, you replace the dom tree with your print page, invoke print, and then copy back the app's root node from javascript. This is ugly as it shows the print page below your print dialog while that is open. I've not found a better way to do this. I guess I can do some non page layout that shows while that happens. Hacky but it works.
- I've yet to make this work in landscape mode where the page is vertical 62x31 layout to print small stickers.
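For readers trying something similar, a sketch of the page setup these gotchas imply, assuming the 62mm x 124mm label size mentioned above. The `.label` class and the padding are assumptions, not the commenter's actual stylesheet.

```css
/* One label per page; zero margin leaves no room for the
   browser-generated headers and footers. */
@page {
  size: 62mm 124mm;
  margin: 0;
}

@media print {
  html,
  body {
    margin: 0;
    padding: 0;
  }

  /* Hypothetical wrapper for one label. The printer cuts at each page
     break, so every label ends its page. */
  .label {
    box-sizing: border-box;
    width: 62mm;
    padding: 3mm; /* keeps content away from the cut edge */
    break-after: page;
  }

  /* Avoid a trailing blank label after the last one. */
  .label:last-child {
    break-after: auto;
  }
}
```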
The rest is just usual html layout. I used a simple flex row with two elements. Print quality is surprisingly good. Cost per sticker is around 0.3 cents per sticker. Amazing value.
> Brother's website for finding the mac drivers is a bit of a maze of misdirection that tries to get you to install their extremely shitty application.
That seems surprising, considering how no-muss no-fuss their hardware & cartridges are. (As compared to, say, H-P.)
They should just publish technical information and let the Linux community make the software for them. All hardware companies should work like this. The results will be much better than their shitty "cost center" approach to software.
Having been doing this for a few years, the answer currently requires JS. A certain amount of CSS is needed, but JS is required for the following realistic features.
Table of contents, reflowing content across pages, templated footer, randomly selected background elements on pages.
This lets the Ctrl-P version locally match up with a headless Chromium render, allowing for much easier testing.
If that's all you know, it can be a reasonable alternative. But for most professional print use-cases it's woefully inadequate. Starting from the awkward box model for layout, down to the very bare-bones typography support.
You can add some code into CSS that will tell you if someone printed your webpage (or at least print-previewed). Make a tracking background image that only loads in the print version of the css.
Just typeset a webnovel for print so I could read it physically using Pandoc, CSS, and Prince and honestly the experience was really nice. Way better than any other option I'm aware of.
By reading this sentence, you hereby agree to all terms and conditions
presented by voussoir.net, including but not limited to terms which entitle
voussoir.net to a fraction not less than forty (40) percent (%) of your
monthly income from now until an as-yet undetermined time not less than
thirty six (36) months from the present date.
One might be interested in looking at the Gutenberg CSS Print Framework. It hasn't failed me in all the years that I have used it, recommended it, and included it wherever I needed print styles.
> You should use Letter or A4 as appropriate for your relationship with the metric system.
That's pretty clearly stated near the top of the article. As a European, if I wrote a similar post I would do it all in metric and trust that my US readers were smart enough to substitute the units of their choice without whining about it. The concepts would not change, and a quick google would tell you that an inch is approx 25mm, which is enough for you to understand what's going on.
I have been making these generators for my workplace, so the project scope is very narrow. You can follow the idea with any dimensions you need.
If you want to use these techniques on a global site, I think you could use multiple @page styles and switch between them with a dropdown like I did with portrait/landscape. Although A4 and letter are very close in size.
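One way to sketch the multiple-@page idea is with named pages: the dropdown would only need to swap a class on a wrapper element. The class names are hypothetical, and support for the `page` property varies by browser and version, so test before relying on it.

```css
/* Two named page definitions. */
@page letter {
  size: letter;
  margin: 0.5in;
}
@page a4 {
  size: A4;
  margin: 12mm;
}

/* Hypothetical wrapper classes: a script behind the dropdown would put
   exactly one of these on the element that holds the printable content. */
.sheet-letter {
  page: letter;
}
.sheet-a4 {
  page: a4;
}
```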
I appreciated the read, but in practice I just link to the css file in Paper-CSS.
It may not be a lot of CSS, but it is much easier for me to just include the page and have it deal with my page size than to review articles like this when I want to print.
Edit: when I linked to the repo, my comment disappeared. You can find Paper-CSS by searching Github or Node for cognitom/paper-css.
Would love a more experienced HN'er's tip on how to use links in comments w/o causing the comments to disappear.
I recently had to figure out a good chunk of this (on my own, I wish I had this resource then) for a one-off project. The biggest thing I learned was to give up all hope of cross-browser printing: just use Chrome. Every browser treats almost every property differently or flat out ignores them, but I found that Chrome had the highest score in doing what I wanted.
There are two sites that share a common history that provide a tool for TTRPG players to create homebrew documents, GM Binder and Homebrewery, that both only target Chrome for this reason. It’s great if we can support all browsers and don’t want the site to break in any browser, but we advise users that Chrome is the target almost just so we know authors and readers are on the same page.
The area where Chrome has consistently failed horribly for me is tables spanning multiple pages. Firefox and the old IE engine were far more reliable at this. Chrome has improved a little in this regard lately, but will still routinely spill thead on top of tbody content when there is a page split.
It's interesting that he skips over any fancy css tricks to get headers and footers and goes straight to generating from js. One advantage of this is that you can add page numbers and such to each header, which is totally impossible with css hacks. But if you're fine going without that, it seems like there are methods that produce mostly acceptable results with less effort than the javascript solution he proposes.
What browser does it work in natively? I tried this a couple of years ago and a few people claimed that it worked in kiosk mode for Chrome but I couldn't get it to work in any browser.
@page probably works for some things but when I tried to do page numbers it definitely didn't.
I ended up hacking something together (absolute positioning a number every ~700px or whatever so it sat on the same place each page).
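For comparison, the paged-media way to ask for page numbers is a margin box inside `@page`, sketched below. Dedicated print renderers generally understand it; browser support has historically been missing or very recent, so don't assume it works under Ctrl-P. Margin and font sizes are assumptions.

```css
@page {
  margin: 20mm 15mm;

  /* Margin box at the bottom centre of every page. */
  @bottom-center {
    /* counter(page) is the current page; counter(pages) is the total. */
    content: "Page " counter(page) " of " counter(pages);
    font-size: 9pt;
  }
}
```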
I'm sure there are smarter ways to do some of the things I'm doing. Could you let me know what fancy tricks you're referring to, perhaps ::before and ::after with content properties?
These generator pages already rely on javascript to put our database data onto the page, whether by URL parameters or API calls, so once I'm in that mode, yeah I'm generating everything with article.innerHTML = '...' to get a bulk template on the page, and a series of createElement/append to make smaller elements like table rows.
I just did a quick search, so I don't claim to be an expert. The two main methods seem to be thead/tfoot and position:fixed. I didn't mean to imply that these are smarter/better, but they might be easier in certain conditions.
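The position:fixed variant, roughly, in browsers that repeat fixed-position elements on every printed page. The `.print-footer` class and the sizes are assumptions; note the caveat in the comments.

```css
/* Hidden on screen, shown only in print. */
.print-footer {
  display: none;
}

@media print {
  .print-footer {
    display: block;
    position: fixed;
    bottom: 0;
    left: 0;
    right: 0;
    height: 12mm;
    font-size: 9pt;
  }

  /* Caveat: nothing here reserves space for the footer, so body content
     can run underneath it. A common workaround is to wrap the content in
     a table whose tfoot holds an empty spacer of the same height. */
}
```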
Oh, I see, thanks. I did figure out that thead automatically repeats across pages, but I also had to repeat other stuff outside the table, like letterhead.
I also made this sample file that shows thead repeating, but since it's a single continuous table it doesn't preview very well and spills out of the 8.5x11 article element. I'm not sure how I would take advantage of the browser's automatic thead/tfoot repeat while also accurately previewing the result in the browser, especially with more elements below the table.
My company is working on a product simplifying the creation of printable PDFs filled with tables and charts, leveraging a visual designer and API. It helps with the usual headaches like page breaking, page numbering, table of contents, etc.
Having gone down the rabbit hole of creating printable media/pages from html/css, I also discovered paged media. Generating printable pdfs using only the standard is very cumbersome but thankfully pagedjs[1] exists, which makes this a breeze.
It's great really, we switched all our doc generation from questpdf to puppeteersharp. You can simply feed it static HTML and CSS and it prints perfectly. It takes so much less time than messing with the old setup.
1. Dividing some amount of text into pages on a computer screen is unnecessary and annoying.
2. Adobe has a stranglehold on the format and is constantly dicking around with it. Lately I encountered a fillable PDF that Acrobat Reader refused to fill. I could fill it with Firefox but after saving it was no longer a fillable form for any Acrobat user. What's the point of a fillable form that disallows filling it out?!
3. Adobe increasingly supports JavaScript for form validation, etc. I can only imagine what a nightmare mess that is. If we're going to shoehorn in a browser, might as well just use a browser.
PDF is really a family of formats. There is the "good" (=sane) PDF, PDF/A-2, and then a bunch of less good stuff (all the rest). PDF/A-2 doesn't have any interactive stuff, like forms or JS, and is an ISO standard that is not going to change under your feet. Because it is such a mature and limited format, most readers should have no problem with PDF/A-2 files.
- Avoid printing section headers at the bottom of one page with the section content left headerless at the top of the next page.
- Prefer printing graphics and figures on whole pages instead of split across pages.
- Print out the URL of every hyperlink instead of having links only as useless underlined text.
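All three wishes map onto print CSS fairly directly. A sketch, with the usual caveat that renderers differ in how strictly they honour break hints; the selector for external links is an assumption about which URLs are worth printing.

```css
@media print {
  /* Don't strand a heading at the bottom of a page. */
  h1, h2, h3, h4, h5, h6 {
    break-after: avoid;
  }

  /* Keep figures and graphics in one piece. */
  figure,
  img,
  svg {
    break-inside: avoid;
  }

  /* Print the target after each absolute link, e.g. "text (https://...)". */
  a[href^="http"]::after {
    content: " (" attr(href) ")";
    font-size: 0.9em;
    word-break: break-all; /* long URLs may need to wrap */
  }
}
```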