Test The Small Stuff

November - 2020

Preface: While I generally prefer shorter examples, these are functional pieces that actually do something. I'm using them instead of contrived examples because they are small enough to digest quickly and provide a good code reading exercise.



I'm working on a tool to pull videos from NASA's Images site via their API. The initial step is to grab search query results to identify collections of videos. Each result page is a JSON object limited to a certain number of items. Pagination URLs point to follow-up pages if necessary.

Going through the pagination and getting the results is straightforward. You pull one page and check whether a 'next' URL exists. If it does, you use it to grab the next page. Otherwise, you're done.
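
The "is there a next page?" check can be sketched on its own, without touching the network. This is only a minimal sketch: it assumes the page has already been parsed into a dict with a 'collection' object holding a 'links' list (the shape the scripts in this post rely on), and find_next_url is a hypothetical helper name, not part of NASA's API or of the scripts below.

```python
def find_next_url(page):
    """Return the 'next' page URL from a parsed result page, or None."""
    # Assumption: pagination links live under page['collection']['links'].
    # .get() keeps the lookup from raising if a page carries no links.
    links = page.get('collection', {}).get('links', [])
    for link in links:
        if link.get('rel') == 'next':
            return link.get('href')
    return None


# Hypothetical sample pages using example.com URLs
page_with_next = {
    'collection': {
        'links': [{'rel': 'next', 'href': 'https://www.example.com/next_page'}]
    }
}
last_page = {
    'collection': {
        'links': [{'rel': 'prev', 'href': 'https://www.example.com/prev_page'}]
    }
}

print(find_next_url(page_with_next))  # https://www.example.com/next_page
print(find_next_url(last_page))       # None
```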

In the past, I would have just thrown together a quick script like this one to do the pulls.

#!/usr/bin/env python3

import json
import requests


def get_pages():
    
    keep_going = True
    counter = 1
    url = 'https://images-api.nasa.gov/search?media_type=video&description=apollo'
    while keep_going:
        keep_going = False
        response = requests.get(url)
        json_data = json.loads(response.content)
        print(f'Save File #{counter} stub with: {url}')
        for link in json_data['collection']['links']:
            if link['rel'] == 'next':
                keep_going = True
                url = link['href']
                counter += 1


get_pages()

It works, but it's ugly.

Also, it doesn't have tests.

While it's easy enough to get away without tests on something like this, I've decided against that. I'm testing everything.

Projects like this one are perfect candidates for testing. First of all, I couldn't figure out how to test the script. That's a code smell. Further, it breaks the second of Sandi Metz's rules for developers: a method can be no longer than five lines of code. Both factors point to an opportunity to practice testing.

I won't go through all the details of how I got there (you can watch me struggle through it if you want), but here's the first thing I ended up with via TDD:

#!/usr/bin/env python3

import json
import requests


class PageGetter:

    def __init__(self):
        self.counter = 1
        self.current_url = 'https://images-api.nasa.gov/search?media_type=video&description=apollo'
        self.get_next_json = True
        self.latest_json = {}

    def get_json(self):
        response = requests.get(self.current_url)
        self.latest_json = json.loads(response.content)
        print(f'Save File #{self.counter} stub with: {self.current_url}')

    def get_pages(self):
        while self.get_next_json:
            self.get_json()
            self.process_json()

    def process_json(self):
        self.get_next_json = False
        for link in self.latest_json['collection']['links']:
            if link['rel'] == 'next':
                self.counter += 1
                self.current_url = link['href']
                self.get_next_json = True


if __name__ == '__main__':
    pg = PageGetter()
    pg.get_pages()

The key was to hoist the code that was doing the work to a method outside the while loop. Once it was outside, I could point tests at it. By moving everything into a class, I could use instance variables to pass the state around.

Moving everything outside the while loop seems obvious in retrospect, but I'd never done it before. That's why testing everything is valuable. I learned how to do this on a small, straightforward project. If it had been a big, hairy one, I would have skipped the testing. All my mental energy would have gone to other concerns.

Seeing everything split out gave me another way to think about the while loop too. Instead of riding on the self.get_next_json boolean, I set up a killer variable for the get_pages() while loop. As long as process_json() sees more data to pull, self.counter stays one step ahead of killer. As soon as there's no more data, self.counter doesn't get updated, so killer catches up and ends the loop.

Here's what it looks like:

#!/usr/bin/env python3

import json
import requests


class PageGetterViaCounter:

    def __init__(self):
        self.counter = 1
        self.latest_json = {}
        self.current_url = 'https://images-api.nasa.gov/search?media_type=video&description=apollo'

    def get_json(self):
        response = requests.get(self.current_url)
        self.latest_json = json.loads(response.content)
        print(f'Save File #{self.counter} stub with: {self.current_url}')

    def get_pages(self):
        killer = 0
        while killer < self.counter:
            self.get_json()
            self.process_json()
            killer += 1

    def process_json(self):
        for link in self.latest_json['collection']['links']:
            if link['rel'] == 'next':
                self.current_url = link['href']
                self.counter += 1


if __name__ == '__main__':
    pg = PageGetterViaCounter()
    pg.get_pages()
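
To see why the loop ends, it can help to trace the two numbers by hand. The sketch below is not the class itself. It replaces the network calls with a hypothetical list of three canned pages (True meaning "this page has a 'next' link") and records killer and counter after each pass.

```python
# Canned stand-ins for three result pages: True means the page has a
# 'next' link, False means it is the last page. (Hypothetical data.)
pages_have_next = [True, True, False]

counter = 1   # plays the role of self.counter
killer = 0    # the loop's own tally of pages fetched
trace = []

while killer < counter:
    has_next = pages_have_next[killer]  # stands in for get_json()
    if has_next:                        # stands in for process_json()
        counter += 1
    killer += 1
    trace.append((killer, counter))

print(trace)  # [(1, 2), (2, 3), (3, 3)]
```

On the last pass counter stays at 3 while killer reaches 3, so the loop condition fails and the loop exits.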

I'm not sure how I feel about this approach yet. I'm inclined to like it because process_json() only has to worry about its own stuff, there's no need to flip self.get_next_json off and on, and it's a little shorter. While get_pages() grows, everything belongs to it directly.

But that's all beside the point. The tests are the key:

#!/usr/bin/env python3

import json
import unittest

from page_getter_via_counter import PageGetterViaCounter


class PageGetterViaCounterTest(unittest.TestCase):

    def setUp(self):
        # Instance attributes give every test a fresh object without globals
        self.pgvc = PageGetterViaCounter()

        self.sample_json_with_next = '''{
            "collection": { "links": [ { "rel": "next", "href": "https://www.example.com/next_page" } ] }
        }'''

        self.sample_json_without_next = '''{
            "collection": { "links": [ { "rel": "prev", "href": "https://www.example.com/prev_page" } ] }
        }'''

    def test_counter_is_updated_by_next(self):
        # Given
        self.pgvc.latest_json = json.loads(self.sample_json_with_next)
        self.pgvc.counter = 4
        # When
        self.pgvc.process_json()
        # Then
        expected = 5
        actual = self.pgvc.counter
        self.assertEqual(expected, actual)

    def test_counter_is_not_updated_when_no_next(self):
        # Given
        self.pgvc.latest_json = json.loads(self.sample_json_without_next)
        self.pgvc.counter = 4
        # When
        self.pgvc.process_json()
        # Then
        expected = 4
        actual = self.pgvc.counter
        self.assertEqual(expected, actual)

    def test_url_is_updated_by_next(self):
        # Given
        self.pgvc.latest_json = json.loads(self.sample_json_with_next)
        # When
        self.pgvc.process_json()
        # Then
        expected = 'https://www.example.com/next_page'
        actual = self.pgvc.current_url
        self.assertEqual(expected, actual)


if __name__ == '__main__':
    unittest.main()
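
The tests above exercise process_json() only. The get_pages() loop could be covered the same way by swapping get_json() for a stub that hands back canned pages. The following is a rough sketch under assumptions, not the project's actual test: CounterPager is a hypothetical stand-in that copies the loop shape of PageGetterViaCounter so the example runs without the requests library or a network connection, and make_page is a hypothetical helper for building sample data.

```python
import unittest


class CounterPager:
    """Hypothetical stand-in with the same loop shape as PageGetterViaCounter."""

    def __init__(self):
        self.counter = 1
        self.latest_json = {}
        self.current_url = 'https://www.example.com/first_page'

    def get_json(self):
        # Left unimplemented on purpose: the test replaces it with a stub.
        raise NotImplementedError('stubbed out in the test')

    def get_pages(self):
        killer = 0
        while killer < self.counter:
            self.get_json()
            self.process_json()
            killer += 1

    def process_json(self):
        for link in self.latest_json['collection']['links']:
            if link['rel'] == 'next':
                self.current_url = link['href']
                self.counter += 1


def make_page(rel, href):
    """Build a minimal parsed page holding a single pagination link."""
    return {'collection': {'links': [{'rel': rel, 'href': href}]}}


class GetPagesTest(unittest.TestCase):

    def test_get_pages_stops_after_last_page(self):
        # Given: two canned pages, the second without a 'next' link
        pager = CounterPager()
        canned = iter([
            make_page('next', 'https://www.example.com/next_page'),
            make_page('prev', 'https://www.example.com/prev_page'),
        ])
        fetched_urls = []

        def fake_get_json():
            # Record which URL would have been requested, then serve a page
            fetched_urls.append(pager.current_url)
            pager.latest_json = next(canned)

        pager.get_json = fake_get_json
        # When
        pager.get_pages()
        # Then
        expected = [
            'https://www.example.com/first_page',
            'https://www.example.com/next_page',
        ]
        self.assertEqual(expected, fetched_urls)
        self.assertEqual(2, pager.counter)


# Run the suite without exiting the interpreter
suite = unittest.defaultTestLoader.loadTestsFromTestCase(GetPagesTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Assigning the stub directly to the instance keeps the sketch short. unittest.mock.patch.object would be another reasonable way to do the same swap.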

It was easy enough to put together the first version of the code without the tests, but that would have been a lost opportunity. Doing tests on small projects is like a musician doing scales, a baseball player going to the batting cage, or any of a thousand other examples of people practicing their craft. It's honing the skill. Getting the muscle memory and the timing. It all comes down to one thing.

The way we practice is the way we play.