Send a bunch of URLs as PDF to a Kindle
5 minutes read | 1008 words by Ruben Berenguel
Adventures with AppleScript, AWK and Things
I found an alternative way that can convert directly to ePub (more or less) with pandoc. You can find it at the end.
A few weeks, maybe a month ago I was talking with Josep Martínez about posts, reading and the Kindle, and he told me he reads everything as a PDF on Kindle.
As an avid reader (and happy Kindle owner) I thought, why don’t I do that? One answer is that I add a lot of articles to my reading queue. Another would be that if my queue is very long, I need to cross-reference the Kindle and my reading list in Things. And yet another could be that converting webpages to PDF and sending them to the Kindle doesn’t sound awesome. Did I mention I use Safari? No fancy extensions to do this for me.
Of course, I always have some crazy, scripty solution for these things. This time it involves:
- A shell command to get the URLs from my pending articles list in Things,
- AppleScript to load each page in Reader mode and export it as a PDF.
Shell section
The first part is extracting the URLs from the articles I have on my reading list in Things, with a bit of SQL and AWK:
sqlite3 /Users/ruben/Library/Group\ Containers/JLMPQHK86H.com.culturedcode.ThingsMac/Things\ Database.thingsdatabase/main.sqlite \
"SELECT t.notes from TMTask t JOIN TMTask p on p.uuid = t.project WHERE p.title = '📄 Articles' and t.trashed != 1 and t.status = 0;" \
| rg -v ".*.pdf$" \
| rg "http.*" \
| gawk 'BEGIN {acum=""} NR==1{acum="\""$1"\""}NR>1{acum = acum ", \"" $1 "\""} END {print acum}'
This is a set of chained commands, with the following breakdown.
sqlite3 …
This opens the Things database locally on my Mac and executes SQL that:
- Gets the notes section of,
- Tasks that are on project 📄 Articles (projects are actually tasks with a special flag), hence the self-join,
- With status 0 (due) and not trashed
This returns a list of notes for all due tasks.
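For illustration, the raw notes output might look something like this (made-up examples; mine are a mix of plain URLs, PDFs I already have and the occasional comment):

https://example.com/some-long-read
https://example.com/a-paper-i-already-have.pdf
Remember to re-read the section on AWK
https://example.com/another-article

which is why the next two filters exist.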
rg -v ".*.pdf$"*
A ripgrep filter: it matches all lines ending in .pdf and keeps those not matching (-v).
rg "http.*"
Another ripgrep filter, keeping only lines starting with http, to remove any notes I may have added about the page.
gawk …
An inlined AWK script that:
- Initialises an accumulator to an empty string
- Adds the first line to the accumulator, wrapped in quotes
- Adds all the other lines to the accumulator, wrapped in quotes and separated by commas,
- Prints the accumulator
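To see just the AWK step in action, here is a toy run with two made-up URLs (the real ones come from the sqlite3 query above):

printf 'https://example.com/one\nhttps://example.com/two\n' \
 | gawk 'BEGIN {acum=""} NR==1{acum="\""$1"\""}NR>1{acum = acum ", \"" $1 "\""} END {print acum}'

which prints "https://example.com/one", "https://example.com/two", exactly the shape the AppleScript below expects for its URL list.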
AppleScript
I hate writing AppleScript (AS), but when it works it is surprisingly effective. Like any other AS script ever (I suspect), it is a mash-up of many Stack Overflow answers.
set urlList to {…}
repeat with theURL in urlList
    tell application "Safari"
        activate
        try
            tell window 1 to set current tab to make new tab with properties {URL:theURL}
        on error
            open location theURL
        end try
    end tell
    delay 15
    tell application "System Events"
        tell application process "Safari"
            set frontmost to true
            tell menu bar 1
                click menu item "Show Reader" of menu "View" of menu bar item "View"
                click menu item "Export as PDF…" of menu "File" of menu bar item "File"
            end tell
            tell window 1
                repeat until sheet 1 exists
                end repeat
                tell sheet 1
                    click pop up button "Where:"
                    repeat until menu 1 of pop up button "Where:" exists
                    end repeat
                    click menu item "Downloads" of menu 1 of pop up button "Where:"
                    click button "Save"
                end tell
            end tell
        end tell
    end tell
end repeat
AppleScript is not pretty, but it is pretty readable. Just in case, here is the breakdown:
- The list of comma-separated, quoted URLs is pasted into urlList
- For each URL:
- Open the URL in Safari
- Wait 15 seconds for it to load (there are more elegant ways with AS, but why bother)
- Switch to Safari
- Turn on Reader mode
- Export as PDF
- Save to Downloads
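To run the whole thing, I save it to a file and call it from the shell (the file name here is just an example); Script Editor works just as well. Note that the System Events part needs Accessibility/Automation permissions for whatever launches it, the first time it runs.

osascript send_to_kindle.applescript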
Finally, write an email to your Kindle email address, attach all those PDFs and send it. Voilà!
Possible improvements
These are left as an exercise for the eager reader.
- Clean up after yourself: Closing the newly opened tab after saving the PDF
- Avoid human intervention: Sending an email directly instead of saving to Downloads
- Poor man’s parallelization: Opening all the URLs first, without waiting (to warm Safari’s cache), and then opening them “again” (see the sketch below).
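For the last one, a minimal sketch, assuming you dump the URLs one per line into a urls.txt (a made-up file name; essentially the pipeline output before the final gawk step): open every page once, which just asks Safari to start loading it, and run the AppleScript afterwards.

# Warm Safari's cache: open each URL once, without waiting
while read -r url; do
  open -a Safari "$url"
done < urls.txt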
Alternative using pandoc
The advantage of this approach is that once you have an ePub, you can convert it to AZW3 via Calibre and then use whatever font you want in your Kindle.
sqlite3 /Users/ruben/Library/Group\ Containers/JLMPQHK86H.com.culturedcode.ThingsMac/Things\ Database.thingsdatabase/main.sqlite \
"SELECT t.notes from TMTask t JOIN TMTask p on p.uuid = t.project WHERE p.title = '📄 Articles' and t.trashed != 1 and t.status = 0;" \
| rg -v ".*.pdf$" \
| rg "http.*" \
| gawk '{\
gsub("https", "http", $1)
url = $1
ret_val = system("pandoc -s -r html " url " -t html --self-contained --quiet --log pandoc.log -o " NR ".html > /dev/null 2>&1");\
printf(NR ".html ");\
if(ret_val!=0){\
print("\033[31m "url" \033[0m (fail)") \
} else {\
print("\033[32m "url" \033[0m (success)")} \
}';\
pandoc -s -r html --self-contained --quiet -t epub --file-scope -o readings.epub *.html
This kind-of-almost-works. Problems:
- Although pandoc can fetch URLs and convert them into different formats, it doesn’t like ePub as output. Thus, we need to force it through an intermediate format that can take standalone images (otherwise the ePub won’t have images)
- I found some HTTPS issues, so I replaced all https with http.
- Some relative paths and inlined images make pandoc fail hard (or at least hard enough that it logs it) without creating any output
- Medium posts don’t embed the real image, only a blurry preview, so that’s what you get in the output. You should never use Medium for your blog.
- The output will look just so-so
Even as “bad” as it is, for text-heavy pages this is awesome: you can convert the page to Markdown and then to ePub, and the result will be perfectly readable on your Kindle.
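A minimal sketch of that last idea, plus the Calibre step mentioned above (the URL and file names are placeholders; ebook-convert is the command-line tool that ships with Calibre):

pandoc -s -r html https://example.com/some-article -t markdown -o page.md
pandoc -s page.md -o readings.epub
ebook-convert readings.epub readings.azw3

The same image caveats from the list above still apply, but for mostly-text pages the result reads fine.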