Installing Django and Pulling URLs and Titles from Safari Tabs - Parts 1 and 2

October - 2020

youtube: https://www.youtube.com/watch?v=WoAD9nUEQJA

youtube: https://www.youtube.com/watch?v=pDlonZSL6q4

[Time: 00:00:00] TextExpander ISO 8601 Snippet Fix

Started off the stream with a quick fix for an old post. I wrote this TextExpander snippet back in 2015. It outputs an ISO 8601 timestamp (e.g. 2020-10-04T16:09:34-0400). It's been super handy over the years. The one in my TextExpander works fine, but the one in the post had a bug. I didn't know about it until I got this tweet from @pulamusic.
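For reference, here's a minimal Python sketch that produces a timestamp in the same shape. It isn't the TextExpander snippet itself, just an illustration of the format. The function name iso_8601_stamp is made up for this example.

```python
from datetime import datetime, timedelta, timezone


def iso_8601_stamp(moment=None):
    """Format a timezone-aware datetime like 2020-10-04T16:09:34-0400."""
    if moment is None:
        # astimezone() with no argument attaches the system's local zone.
        moment = datetime.now().astimezone()
    # %z renders the UTC offset as +HHMM or -HHMM.
    return moment.strftime("%Y-%m-%dT%H:%M:%S%z")


# Fixed example using the timestamp from the post (UTC-4).
eastern = timezone(timedelta(hours=-4))
print(iso_8601_stamp(datetime(2020, 10, 4, 16, 9, 34, tzinfo=eastern)))
# prints 2020-10-04T16:09:34-0400
```

Calling iso_8601_stamp() with no argument uses the current local time instead.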

I took a look at the source code for the page and it looked fine (though I realized when reviewing the stream that it wasn't). So, I sshed into my server and fixed it that way. I had to do that because the link was actually to the old instance of my site. It's still live at http://alanwsmith.com, as compared to the new production version of my site, which is https://www.alanwsmith.com. (It's on my list to get everything set up so the non-secure, non-www version redirects. Just gotta get to it.)

[Time: 00:19:00] Installing Drupal I mean Django

I've decided to use Django to replace my local launchpad website that's just a bunch of PHP files. (Django, not Drupal. I mix up the two names every time.) Someone asked why I was going with Django instead of sticking with PHP. My thinking:

  • When I write code these days, it's mostly Python and that's what Django uses
  • I want to use a framework instead of just writing a bunch of code myself. (I've made my own frameworks. I'm happy to not have to do that anymore.)
  • I'm not religious about any framework or language stuff so it just works for me

Once I figured out that I was after Django instead of Drupal, I did the quick install then spent about an hour going through the tutorial. Half the time was me getting frustrated with it. I'd looked at Django a few years back and remember the same frustrations. It's not broken. The code works, but there is so much room for improvement. I added "Make a Better Django Tutorial" to my list.

If it wasn't so frustrating, I would have spent more time on it. But, I was beat, so I bailed. I'll get back to it on another stream.

[Time: 01:26:00] Getting Browser Tab URLs

This is a new one that occurred to me when writing up earlier stream notes. I spent a lot of time going back and forth between my text editor and my browser copying URLs and then typing in the titles and notes for the various links I used. My goal was to create a script to automate that process as much as possible.

What's awesome is that most of the work was already done for me. I found this AppleScript that grabs titles and URLs from all the tabs in all the windows of Safari and copies them to the clipboard. One quick edit to put the output in Markdown format and I could have stopped there, but I wanted a little more.
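To make the Markdown step concrete, here's a small Python sketch of the formatting. It assumes the tab data arrives as one tab per line with the title and URL separated by a tab character. That layout is an assumption for illustration, not necessarily what the AppleScript emits, and tabs_to_markdown is a made-up name.

```python
def tabs_to_markdown(raw_output):
    """Turn 'title<TAB>url' lines into a Markdown bullet list of links."""
    lines = []
    for row in raw_output.splitlines():
        # Skip blank rows and anything without a tab separator.
        if "\t" not in row:
            continue
        # Split on the last tab so a title containing tabs stays intact.
        title, url = row.rsplit("\t", 1)
        # Escape square brackets so they don't break the link syntax.
        title = title.strip().replace("[", "\\[").replace("]", "\\]")
        lines.append(f"- [{title}]({url.strip()})")
    return "\n".join(lines)


sample = (
    "Alan W. Smith\thttps://www.alanwsmith.com/\n"
    "Django\thttps://www.djangoproject.com/\n"
)
print(tabs_to_markdown(sample))
```

Keeping the formatting in its own function means it can be tested without Safari running.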

The first thing I was after was a way to fire off the AppleScript from a PHP page on my local launchpad tools site. I'd done some osascript calls to fire AppleScript over the past week so I was optimistic I could get it to work.

I couldn't.

I spent a decent amount of time on it and kept seeing behavior that looked like it should have worked, but didn't. I have some more ideas to try, but given that I'm moving to Django, it wasn't worth spending more time on it. Instead, I moved over to a plain old Python script. I went that way instead of just using the working AppleScript because I wanted to capture the meta tag descriptions of the pages as well.

It might be possible to pull down a web page and parse it with AppleScript, but I have no desire to try. Hence, Python. I started using Selenium to do the parsing but ran into some aggravating issues with getting the title of the page (which I didn't need; it was just what I was using as a test).

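It seems that for nearly every element on a page, you get the text in Selenium like this:

    # Selenium 4.3+ removed this helper; there it's driver.find_element("tag name", "title")
    element = driver.find_element_by_tag_name("title")
    print(element.text)

Every element except for the title, that is. As far as I can tell, .text only returns text that's actually rendered on the page, and the title isn't, so it comes back empty. You have to get the title with driver.title. Here's an example: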

#!/usr/bin/env python3

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

def get_details(url):
    options = Options()
    # Run Firefox without opening a window
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        # The title lives on the driver, not on an element
        return driver.title
    finally:
        # Always shut the browser down, even if the page load fails
        driver.quit()

print(get_details("https://www.alanwsmith.com/"))

Once I got that figured out, I moved on to getting the description. This is around the time I ran out of steam in the first stream and picked up in the second one. In that second stream, the code got complicated enough that I moved over from just a little script to a more structured one that included tests. I made a lot of progress on that, but haven't finished it up yet. I'll do that in the next stream.
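As a rough idea of what the description step can look like, here's a minimal sketch that pulls the meta description out of an HTML string with Python's built-in html.parser. This is not the code from the stream (which used Selenium and isn't finished). The MetaDescriptionParser and get_description names are made up for this example.

```python
from html.parser import HTMLParser


class MetaDescriptionParser(HTMLParser):
    """Remember the content of the first <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        # Only look at meta tags, and stop after the first match.
        if tag != "meta" or self.description is not None:
            return
        attributes = dict(attrs)
        # Attribute values can be None, so fall back to empty strings.
        if (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content") or ""


def get_description(html_doc):
    """Return the meta description, or None if the page doesn't have one."""
    parser = MetaDescriptionParser()
    parser.feed(html_doc)
    return parser.description


page = '<html><head><meta name="description" content="Stream notes"></head></html>'
print(get_description(page))
```

Returning None for a missing tag keeps "no description" separate from "empty description".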

Miscellaneous

Random stuff from the stream.


Python snippet that calls an external command and captures its STDOUT in a variable:

import subprocess

command_response = subprocess.run(['osascript', 'tab-parser.scpt'], stdout=subprocess.PIPE).stdout.decode('utf-8')
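A slightly longer sketch of the same idea also captures STDERR and checks the exit code, which helps when a command fails silently. The run_command name and the 30 second timeout are assumptions for this example. The Python interpreter stands in for osascript so it runs on any machine.

```python
import subprocess
import sys


def run_command(args, timeout=30):
    """Run a command and return its stdout as text, or raise with stderr attached."""
    result = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        # Surface the exit code and error output instead of returning nothing.
        raise RuntimeError(
            f"{args[0]} exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


# Stand-in command so the sketch runs anywhere. On a Mac the call would be
# run_command(["osascript", "tab-parser.scpt"]).
print(run_command([sys.executable, "-c", "print('hello from a child process')"]))
```

With text=True the output comes back as a string, so there's no separate decode step.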

These PHP calls to osascript fired and got Safari to activate and bring itself to the front.

shell_exec("osascript -e 'tell application \"Safari\" to activate'");

shell_exec("osascript -l JavaScript -e 'var Safari = new Application(\"/Applications/Safari.app\"); Safari.activate();'");

But, when I tried to run the AppleScript file, I couldn't get it to work.

// no go
echo(shell_exec('osascript tab-parser.scpt'));

I tried this and about a thousand variations. I expect there's a security thing involved. There's probably a way to do it, but it wasn't worth more effort for me.


I discovered that when you use Python's urllib.request.urlopen(url), you need to wrap it in a try. Otherwise, your script will explode if it hits something like a 403 error. (And since we're talking about it, .read() returns bytes, so you also need to decode the result as UTF-8.) Here's a sample:

#!/usr/bin/env python3

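import urllib.error
import urllib.request

def get_web_page(url):
    try:
        with urllib.request.urlopen(url) as response:
            # .read() returns bytes, so decode them into a string
            return response.read().decode("utf-8")
    except urllib.error.URLError:
        # Covers HTTP errors like 403 (HTTPError is a subclass of URLError)
        return ""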

if __name__ == "__main__":
    html_doc = get_web_page("https://www.alanwsmith.com/")
    print(html_doc)
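If you want to know why a request failed instead of just getting an empty string, here's a hedged variation. It returns the text plus an error message and uses the charset from the response headers when there is one. The fetch_page name, the 10 second timeout, and the tuple return shape are choices made for this sketch.

```python
import urllib.error
import urllib.request


def fetch_page(url, timeout=10):
    """Return (text, error). Exactly one of the two is None."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Use the charset the server declares, falling back to UTF-8.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace"), None
    except urllib.error.HTTPError as error:
        # The server answered, but with a status like 403 or 404.
        return None, f"HTTP {error.code} {error.reason}"
    except urllib.error.URLError as error:
        # No usable response at all (DNS failure, refused connection, etc.).
        return None, f"request failed: {error.reason}"


# Example call (needs network access):
# text, error = fetch_page("https://www.alanwsmith.com/")
```

HTTPError has to be caught before URLError because it's a subclass of it. Other exceptions, like a timeout while reading the body, still propagate.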

Links From The Stream

Here are some of the links from the stream.