Send a bunch of URLs as PDF to a Kindle
5 minutes read | 1008 words by Ruben Berenguel
Adventures with AppleScript, AWK and Things
I found an alternative way that can convert directly to ePub (more or less) with pandoc. You can find it at the end.
A few weeks, maybe a month ago I was talking with Josep Martínez about posts, reading and the Kindle, and he told me he reads everything as a PDF on Kindle.
As an avid reader (and happy Kindle owner) I thought, why don’t I do that? One answer is that I add a lot of articles to my reading queue. Another would be that if my queue is very long, I need to cross-reference the Kindle and my reading list in Things. And yet another could be that converting webpages to PDF and sending them to the Kindle doesn’t sound awesome. Did I mention I use Safari? No fancy extensions to do this for me.
Of course, I always have some crazy, scripty solution for these things. This time it involves:
- A shell command to get the URLs from my pending articles list in Things,
- AppleScript to load each page in Reader mode and export it as a PDF.
Shell section
The first part is extracting the URLs from the articles I have on my reading list in Things, with a bit of SQL and AWK:
sqlite3 /Users/ruben/Library/Group\ Containers/JLMPQHK86H.com.culturedcode.ThingsMac/Things\ Database.thingsdatabase/main.sqlite \
"SELECT t.notes from TMTask t JOIN TMTask p on p.uuid = t.project WHERE p.title = '📄 Articles' and t.trashed != 1 and t.status = 0;" \
| rg -v ".*.pdf$" \
| rg "http.*" \
| gawk 'BEGIN {acum=""} NR==1{acum="\""$1"\""}NR>1{acum = acum ", \"" $1 "\""} END {print acum}'
This is a set of chained commands, with the following breakdown.
sqlite3 …
This opens the Things database locally on my Mac and executes SQL that:
- Gets the notes section of,
- Tasks that are on project 📄 Articles (projects are actually tasks with a special flag), hence the self-join,
- With status 0 (due) and not trashed
This returns a list of notes for all due tasks.
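For illustration, the raw notes output might look something like this (made-up examples; mine are a mix of plain URLs, PDFs I already have and the occasional comment):

https://example.com/some-long-read
https://example.com/a-paper-i-already-have.pdf
Remember to re-read the section on AWK
https://example.com/another-article

which is why the next two filters exist.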
rg -v ".*.pdf$"*
A ripgrep filter: it matches all lines ending in .pdf and keeps those not matching (-v).
rg "http.*"
Another ripgrep filter, keeping only lines starting with http, to remove any notes I may have added about the page.
gawk …
An inlined AWK script that:
- Initialises an accumulator to an empty string
- Adds the first line to the accumulator, wrapped in quotes
- Adds all the other lines to the accumulator, wrapped in quotes and separated by commas,
- Prints the accumulator
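To see just the AWK step in action, here is a toy run with two made-up URLs (the real ones come from the sqlite3 query above):

printf 'https://example.com/one\nhttps://example.com/two\n' \
 | gawk 'BEGIN {acum=""} NR==1{acum="\""$1"\""}NR>1{acum = acum ", \"" $1 "\""} END {print acum}'

which prints "https://example.com/one", "https://example.com/two", exactly the shape the AppleScript below expects for its URL list.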
AppleScript
I hate writing AppleScript (AS), but when it works it is surprisingly effective. Like any other AS script ever (I suspect), it is a mash-up of many Stack Overflow answers.
set urlList to {…}
repeat with theURL in urlList
    tell application "Safari"
        activate
        try
            tell window 1 to set current tab to make new tab with properties {URL:theURL}
        on error
            open location theURL
        end try
    end tell
    delay 15
    tell application "System Events"
        tell application process "Safari"
            set frontmost to true
            tell menu bar 1
                click menu item "Show Reader" of menu "View" of menu bar item "View"
                click menu item "Export as PDF…" of menu "File" of menu bar item "File"
            end tell
            tell window 1
                repeat until sheet 1 exists
                end repeat
                tell sheet 1
                    click pop up button "Where:"
                    repeat until menu 1 of pop up button "Where:" exists
                    end repeat
                    click menu item "Downloads" of menu 1 of pop up button "Where:"
                    click button "Save"
                end tell
            end tell
        end tell
    end tell
end repeat
AppleScript is not pretty, but it is pretty readable. Just in case, here is the breakdown:
- The list of comma-separated, quoted URLs is pasted into urlList
- For each URL:
- Open the URL in Safari
- Wait 15 seconds for it to load (there are more elegant ways with AS, but why bother)
- Switch to Safari
- Turn on Reader mode
- Export as PDF
- Save to Downloads
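To run the whole thing, I save it to a file and call it from the shell (the file name here is just an example); Script Editor works just as well. Note that the System Events part needs Accessibility/Automation permissions for whatever launches it, the first time it runs.

osascript send_to_kindle.applescript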
Finally, write an email to your Kindle email address, attach all those PDFs and send it. Voilà!
Possible improvements
These are left as an exercise for the eager reader.
- Clean up after yourself: Closing the newly opened tab after saving the PDF
- Avoid human intervention: Sending an email directly instead of saving to Downloads
- Poor man’s parallelization: Opening all the URLs first, without waiting (to warm Safari’s cache), and then opening them “again” (see the sketch below).
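For the last one, a minimal sketch, assuming you dump the URLs one per line into a urls.txt (a made-up file name; essentially the pipeline output before the final gawk step): open every page once, which just asks Safari to start loading it, and run the AppleScript afterwards.

# Warm Safari's cache: open each URL once, without waiting
while read -r url; do
  open -a Safari "$url"
done < urls.txt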
Alternative using pandoc
The advantage of this approach is that once you have an ePub, you can convert it to AZW3 via Calibre and then use whatever font you want in your Kindle.
sqlite3 /Users/ruben/Library/Group\ Containers/JLMPQHK86H.com.culturedcode.ThingsMac/Things\ Database.thingsdatabase/main.sqlite \
"SELECT t.notes from TMTask t JOIN TMTask p on p.uuid = t.project WHERE p.title = '📄 Articles' and t.trashed != 1 and t.status = 0;" \
| rg -v ".*.pdf$" \
| rg "http.*" \
| gawk '{\
gsub("https", "http", $1)
url = $1
ret_val = system("pandoc -s -r html " url " -t html --self-contained --quiet --log pandoc.log -o " NR ".html > /dev/null 2>&1");\
printf(NR ".html ");\
if(ret_val!=0){\
print("\033[31m "url" \033[0m (fail)") \
} else {\
print("\033[32m "url" \033[0m (success)")} \
}';\
pandoc -s -r html --self-contained --quiet -t epub --file-scope -o readings.epub *.html
This kind-of-almost-works. Problems:
- Although pandoc can fetch URLs and convert them into different formats, it doesn’t like ePub as output. Thus, we need to force it through an intermediate format that can take standalone images (otherwise the ePub won’t have images)
- I found some HTTPS issues, so I replaced all https with http.
- Some relative paths and inlined images make pandoc fail hard (or at least hard enough that it logs it) without creating any output
- Medium posts don’t embed the real image, only a blurry preview, so that’s what you get in the output. You should never use Medium for your blog.
- The output will look just so-so
Even as “bad” as it is, for text-heavy pages this is awesome: you can convert the page to Markdown and then to ePub, and the result will be perfectly readable on your Kindle.
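A minimal sketch of that last idea, plus the Calibre step mentioned above (the URL and file names are placeholders; ebook-convert is the command-line tool that ships with Calibre):

pandoc -s -r html https://example.com/some-article -t markdown -o page.md
pandoc -s page.md -o readings.epub
ebook-convert readings.epub readings.azw3

The same image caveats from the list above still apply, but for mostly-text pages the result reads fine.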