Topic: https://brettterpstra.com/2015/01/06/reading-list-catcher/

unverified 3y, 262d ago

I took Zach Fine's updates and made several of my own to bring this up to date with changes in plistlib and the Pinboard lib. My version also works with the 'all' command and behaves better from the command line, as a cron job, or in a shell script, with better output and error handling.

My version is in a Gist here: https://gist.github.com/samuelkordik/6124c6d4d0d5c090594ae17531e733e8


Brett 3y, 262d ago

This is great, props to both you and Zach Fine.


unverified 4y, 261d ago [edited]

The script wouldn't run for me, likely due to problems with newer libraries and Python 3.7. I played whack-a-mole, fixing it item by item with no knowledge of Python other than what I can google. So my changes are messy, and I've removed a test related to dates. But I'm offering it below in case it's of use to anyone.

I also added one feature: the items in Pinboard will now have the same "date added" value as the original Reading List item, rather than all inheriting the date they were synced to Pinboard.

To run the script, pass 'pb', 'md', or 'all' as an argument to export to Pinboard, to a Markdown file, or to both.

Thanks Brett for posting the original script!

#!/usr/bin/env python3
# ReadingListCatcher
# - A script for exporting Safari Reading List items to Markdown and Pinboard
#   Brett Terpstra 2015
# Modified (clumsily) by Zach Fine 2020
# Uses code from <https://gist.github.com/robmathers/5995026>
# Requires the Python pinboard lib for Pinboard.in import, plus pytz (imported below):
#     `easy_install pinboard` or `pip install pinboard pytz`
import plistlib
from shutil import copy
import subprocess
import os
from tempfile import gettempdir
import sys
import atexit
import re
import time
from time import mktime
from datetime import date, datetime, timedelta
from os import path
import pytz

DEFAULT_EXPORT_TYPE = 'md' # pb, md or all
PINBOARD_API_KEY = 'username:API_KEY' # https://pinboard.in/settings/password
BOOKMARKS_MARKDOWN_FILE = '~/Dropbox/Reading List Bookmarks.markdown' # Markdown file if using md export
BOOKMARKS_PLIST = '~/Library/Safari/Bookmarks.plist' # Shouldn't need to modify

bookmarksFile = os.path.expanduser(BOOKMARKS_PLIST)
markdownFile = os.path.expanduser(BOOKMARKS_MARKDOWN_FILE)

# Make a copy of the bookmarks and convert it from a binary plist to text
tempDirectory = gettempdir()
sys.stdout.write('tempDirectory is ' + tempDirectory + '\n')
copy(bookmarksFile, tempDirectory)
bookmarksFileCopy = os.path.join(tempDirectory, os.path.basename(bookmarksFile))

def removeTempFile():
    os.remove(bookmarksFileCopy)

#atexit.register(removeTempFile) # Delete the temp file when the script finishes

class _readingList():
    def __init__(self, exportType):

        sys.stdout.write('running readinglist \n')

        self.postedCount = 0
        self.exportType = exportType

        if self.exportType == 'pb':
            sys.stdout.write('self.exportType=' + self.exportType + '\n')
            import pinboard
            self.pb = pinboard.Pinboard(PINBOARD_API_KEY)

        converted = subprocess.call(['plutil', '-convert', 'xml1', bookmarksFileCopy])

        if converted != 0:
            print("Couldn't convert bookmarks plist to XML format")
            sys.exit(converted)

        with open(bookmarksFileCopy,'rb') as fp:
            plist=plistlib.load(fp)

        # The old way of opening the plist no longer works, so plistlib.load is used (see above):
        #     plist = plistlib.readPlist(bookmarksFileCopy)

        # There should only be one Reading List item, so take the first one
        readingList = [item for item in plist['Children'] if 'Title' in item and item['Title'] == 'com.apple.ReadingList'][0]

        if self.exportType == 'pb':
            lastRLBookmark = self.pb.posts.recent(tag='.readinglist', count=1)
#            last = lastRLBookmark['date']
# this test seems to make no items get synced, so I'm bypassing it as I plan to clear my reading list completely after sending to pinboard:
            last = time.strptime("2013-01-01 00:00:00 UTC", '%Y-%m-%d %H:%M:%S UTC')

        else:
            self.content = ''
            self.newcontent = ''
            # last = time.strptime((datetime.now() - timedelta(days = 1)).strftime('%c'))
            last = time.strptime("2013-01-01 00:00:00 UTC", '%Y-%m-%d %H:%M:%S UTC')

            if not os.path.exists(markdownFile):
                open(markdownFile, 'a').close()
            else:
                with open (markdownFile, 'r') as mdInput:
                    self.content = mdInput.read()
                    # Raw string so that \d reaches the regex engine instead of being treated as a string escape
                    matchLast = re.search(r'(?m)^Updated: (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} UTC)', self.content)
                    if matchLast is not None:
                        last = time.strptime(matchLast.group(1), '%Y-%m-%d %H:%M:%S UTC')

            # Keep 'last' as a time.struct_time here; the loop below converts it with mktime()

            rx = re.compile(r"(?m)^Updated: (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC")
            self.content = re.sub(rx,'',self.content).strip()

        if 'Children' in readingList:
            cleanRx = re.compile(r"[|`:_*\n]")
            for item in readingList['Children']:
                last_dt = datetime.fromtimestamp(mktime(last))
                if item['ReadingList']['DateAdded'] > last_dt:
                    addtime = pytz.utc.localize(item['ReadingList']['DateAdded']).strftime('%c')
                    titletemp = item['URIDictionary']['title']
#                    title = re.sub(cleanRx, ' ', item['URIDictionary']['title'].encode('utf8'))
                    title = re.sub(cleanRx, ' ', titletemp)
                    title = re.sub(' +', ' ', title)
                    title = title.encode('utf8') # moved encode to the end of processing
                    url = item['URLString']
                    description = ''

                    if 'PreviewText' in item['ReadingList']:
                        description = item['ReadingList']['PreviewText']
#                        description = item['ReadingList']['PreviewText'].encode('utf8')
                        description = re.sub(cleanRx, ' ', description)
                        description = re.sub(' +', ' ', description)
                        description = description.encode('utf8') #moved the encode to the end of processing


                    if self.exportType == 'md':
                        self.itemToMarkdown(addtime, title.decode(), url, description.decode())
                    else:
                        if not title.strip(): #Z need to handle the case of no title as pinboard requires one
                            title='no title'
                            title=title.encode('utf8')
                        post_time=pytz.utc.localize(item['ReadingList']['DateAdded'])
                        self.itemToPinboard(post_time, title.decode(), url, description.decode())
                else:
                    break

        pluralized = 'bookmarks' if self.postedCount > 1 else 'bookmark'
        if self.exportType == 'pb':
            if self.postedCount > 0:
                sys.stdout.write('Added ' + str(self.postedCount) + ' new ' + pluralized + ' to Pinboard')
            else:
                sys.stdout.write('No new bookmarks found in Reading List')
        else:
            mdHandle = open(markdownFile, 'w')
            mdHandle.write('Updated: ' + datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S') + " UTC\n\n")
            mdHandle.write(self.newcontent + self.content)
            mdHandle.close()
            if self.postedCount > 0:
                sys.stdout.write('Added ' + str(self.postedCount) + ' new ' + pluralized + ' to ' + markdownFile)
            else:
                sys.stdout.write('No new bookmarks found in Reading List')

        sys.stdout.write("\n")

    def itemToMarkdown(self, addtime, title, url, description):
        sys.stdout.write('running itemToMarkdown \n')
        self.newcontent += '- [' + title + '](' + url + ' "Added on ' + addtime + '")'
        if not description == '':
            self.newcontent += "\n\n    > " + description
        self.newcontent += "\n\n"
        self.postedCount += 1

    def itemToPinboard(self, post_time, title, url, description):
        sys.stdout.write('running itemToPinboard \n')
        suggestions = self.pb.posts.suggest(url=url)
        tags = suggestions[0]['popular']
        tags.append('.readinglist')

        # sys.stdout.write('post_time = ' + post_time + '\n')

        self.pb.posts.add(url=url, dt=post_time, description=title, \
                extended=description, tags=tags, shared=False, \
                toread=True)
        print(title)
        print('\n')
        self.postedCount += 1

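if __name__ == "__main__":
    exportTypes = []
    # sys.argv[0] is the script name, so only look at the real arguments
    for arg in sys.argv[1:]:
        # 'all' expands to both exporters; each one needs its own setup in __init__
        for eType in (['md', 'pb'] if arg == 'all' else [arg]):
            if eType in ('md', 'pb') and eType not in exportTypes:
                exportTypes.append(eType)
    if not exportTypes:
        # No recognized argument: fall back to the configured default
        exportTypes = ['md', 'pb'] if DEFAULT_EXPORT_TYPE == 'all' else [DEFAULT_EXPORT_TYPE]

    for eType in exportTypes:
        _readingList(eType)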


sys.stdout.write('running\n')
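
The date cutoff in the script can also be tried on its own. This is only a minimal sketch and not part of the script above: `parse_updated_stamp` and `is_newer` are hypothetical helper names, and it assumes plistlib hands back `DateAdded` as a naive datetime in UTC, so the "Updated:" stamp is parsed into a naive datetime too.

```python
import re
from datetime import datetime

STAMP_FORMAT = '%Y-%m-%d %H:%M:%S UTC'
STAMP_RX = re.compile(r'(?m)^Updated: (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} UTC)')


def parse_updated_stamp(markdown_text, default=datetime(2013, 1, 1)):
    """Return the 'Updated:' time from the Markdown text as a naive UTC datetime."""
    match = STAMP_RX.search(markdown_text)
    if match is None:
        # No stamp yet (first run): fall back to an old date so everything counts as new
        return default
    return datetime.strptime(match.group(1), STAMP_FORMAT)


def is_newer(date_added, last_sync):
    """Compare two naive datetimes that are both assumed to be in UTC."""
    return date_added > last_sync


last = parse_updated_stamp('Updated: 2020-05-01 12:00:00 UTC\n\n- [Example](https://example.com)')
print(is_newer(datetime(2020, 5, 2, 8, 30), last))   # added after the last sync
print(is_newer(datetime(2020, 4, 30, 8, 30), last))  # added before the last sync
```

Parsing straight into a datetime avoids the round trip through `time.struct_time` and `mktime()`, which interprets its argument as local time.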

Rakesh 10y, 29d ago

Managed to get this to work. There seems to be an issue with the lines below:


lastRLBookmark = self.pb.posts.recent(tag='.readinglist', count=1)
last = lastRLBookmark['date']

Pinboard returns the current datetime if the '.readinglist' tag doesn't exist yet (true in my case). So I had to pick an old Pinboard bookmark and tag it with '.readinglist'.

I also had to change the first line to:


#!/usr/bin/env python

Dimitris 10y, 29d ago

Tried both. Workflow throws error: Traceback (most recent call last).


ttscoff 10y, 29d ago

D'oh. That should test for null/empty response and have a fallback plan. Limit to 30 most recent if it's the first check?
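
One way to read that idea, as a rough sketch only: `select_items_to_sync` is a hypothetical helper, `last_synced` stands in for whatever date the Pinboard lookup yields (None when nothing tagged '.readinglist' exists yet), and the items are assumed to be plain dicts with a `DateAdded` datetime, which is a flattened stand-in for the real plist entries.

```python
from datetime import datetime

FIRST_RUN_LIMIT = 30  # assumption, taken from the "30 most recent" idea


def select_items_to_sync(items, last_synced, first_run_limit=FIRST_RUN_LIMIT):
    """Pick the Reading List items to send to Pinboard.

    items: list of dicts with a 'DateAdded' datetime (hypothetical, flattened shape)
    last_synced: datetime of the newest bookmark already synced, or None
    """
    newest_first = sorted(items, key=lambda item: item['DateAdded'], reverse=True)
    if last_synced is None:
        # First check: nothing to compare against, so cap the batch instead
        return newest_first[:first_run_limit]
    return [item for item in newest_first if item['DateAdded'] > last_synced]


items = [{'URLString': 'https://example.com/%d' % day, 'DateAdded': datetime(2015, 1, day)}
         for day in range(1, 32)]
print(len(select_items_to_sync(items, None)))                   # first run
print(len(select_items_to_sync(items, datetime(2015, 1, 29))))  # later run
```

Treating "no previous bookmark" as None, rather than trusting whatever date comes back, keeps the first run from silently syncing nothing.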


Dimitris 10y, 29d ago

Maybe it has to do with datetime format? Which one do you use?


Dimitris 10y, 32d ago

Script also throws error: 'Syntax Error Expected end of line, etc. but found identifier.'


ttscoff 10y, 32d ago

That's an AppleScript error, I believe, which doesn't make any sense. Running as a workflow or from the command line?


Dimitris 10y, 32d ago

I ran it as a workflow from Automator (generic error message) and as a script from Script Editor (where I got this error message). The error refers to the line 'import plistlib'.


ttscoff 10y, 32d ago

I'm going to assume it has to do with your Python version. plistlib is available in Python 2.6+. What do you get when you run:

python --version


Dimitris 10y, 32d ago

Thank you for following this up. Python version 2.7.6. I installed it yesterday, following your instructions, specially to try your workflow.


ttscoff 10y, 32d ago

It's probably not running the script properly with python, or your PYTHON_PATH isn't picking up the correct version. I honestly don't know how to debug that, assuming your workflow is still set to use python to run the shell script.
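
A small diagnostic along these lines might narrow it down. It is a sketch only: `describe_interpreter` is a hypothetical name, and it uses nothing beyond the standard library to report which interpreter is actually running the code and whether plistlib imports.

```python
import sys


def describe_interpreter():
    """Collect facts about the interpreter that is actually running this code."""
    info = {
        'executable': sys.executable,
        'version': '.'.join(str(part) for part in sys.version_info[:3]),
    }
    try:
        import plistlib
        info['plistlib'] = True
        # plistlib.load() only exists on Python 3.4 and later
        info['has_plistlib_load'] = hasattr(plistlib, 'load')
    except ImportError:
        info['plistlib'] = False
        info['has_plistlib_load'] = False
    return info


for key, value in sorted(describe_interpreter().items()):
    print('%s: %s' % (key, value))
```

Running this from the workflow's script step and again from Terminal, then comparing the two outputs, would show whether the workflow is picking up a different interpreter.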


Dimitris 10y, 33d ago

Workflow results in a generic error message as soon as it starts to run, just after it switches focus to Safari. Sadly, I'm not a programmer, so at a loss as to what can be wrong. All I changed was inserting my Pinboard token. I'd love for this to work!


Guest 10y, 72d ago

The script didn't work for me. I guess the section of the code which, according to the comments, should convert the binary plist to a regular one, doesn't. I edited a copy of the script to use Python's `biplist` library—hopefully that'll work without a hitch for me.
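
For anyone on Python 3.4 or later, the standard plistlib can read binary plists without a conversion step or a third-party library. A minimal sketch, using a made-up in-memory plist rather than a real Bookmarks.plist; `find_reading_list` is a hypothetical helper name:

```python
import plistlib


def find_reading_list(plist):
    """Return the folder titled com.apple.ReadingList, or None if it is absent."""
    for child in plist.get('Children', []):
        if child.get('Title') == 'com.apple.ReadingList':
            return child
    return None


# Made-up stand-in for Bookmarks.plist, serialized in the binary plist format
sample = {
    'Children': [
        {'Title': 'BookmarksBar', 'Children': []},
        {'Title': 'com.apple.ReadingList',
         'Children': [{'URLString': 'https://example.com/'}]},
    ]
}
binary_data = plistlib.dumps(sample, fmt=plistlib.FMT_BINARY)

# plistlib.loads() detects the binary format on its own
parsed = plistlib.loads(binary_data)
reading_list = find_reading_list(parsed)
print(reading_list['Children'][0]['URLString'])
```

Looking the folder up by its title, as the original script does, avoids relying on its position in the list.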


ttscoff 10y, 32d ago

Given my unfamiliarity with Python, could you throw me a gist of your modifications?


Alex 10y, 72d ago

Have you considered talking to iCloud directly instead of parsing the plist?

(Really Good safari+pinboard integration is something I've wanted for ages, because Reading List on iOS is so convenient, but I've never had the time+employer permission to finish a project. Glad to see you release this!)


ttscoff 10y, 72d ago

Is iCloud parsing of Reading List possible from a script? If so, I'd be very interested in resources...


Alex 10y, 72d ago

Yep! I started, but never had the chance to finish, and employer permission is...difficult to get. It's also not entirely clear if this is OK with Apple (but then again, directly reading the PLIST almost certainly isn't either.) I figured most of this out by using Charles and watching what Safari did when I added and removed reading list entries.

https://github.com/abl/iclo... (based off of another project which appears to have been deleted; I just added bookmark API support.)

You'll need to pip install httplib2 and pydes, at which point (and I just verified this) you can run:

python -i test_bookmark_list.py

Take a look at the 'b' object, which represents all of your Safari bookmarks. Reading List is implemented as a specially named folder (com.apple.ReadingList, probably at b[0].) Unfortunately it looks like Apple changed something, as trying to actually read an individual bookmark yields an assertion error.

https://github.com/picklepe... appears to be an attempt to make a solid Python library for iCloud - they don't support bookmarks, but adding support there might be easier than working with my hack of a hack. :)
