Recursion Example — Walking a file tree

Many developers use Python as a platform independent scripting language to perform file system operations. Sometimes it’s necessary to walk through a file system. Here is one way to navigate a file system recusively. (Of course, Python has libaries that do this!)

import os

def walk_fs(start_dir):
    # Get a list of everything in start_dir
    contents = os.listdir(start_dir)

    # This stores the output
    output = []

    # Loop through every item in contents
    for f in contents:
        # Use os.path.join to reassmble the path
        f_path = os.path.join(start_dir, f)

    # check if f_path is directory (or folder)
    if os.path.isdir(f_path):
        # Make recusive call to walk_fs
        output = output + walk_fs(f_path)
    else:
        # Add the file to output
        output.append(f_path)

    # Return a list of files in the directory
    return output

if __name__ == '__main__':
    try:
        result = walk_fs(input('Enter starting folder => '))
        for r in result:
            print(r)
    except FileNotFoundError:
    print('Not a valid folder! Try again!')

The key to this is to begin by using os.listdir, which returns a list of every item in a directory. Then we can loop through each item in contents. As we loop through contents, we need to reassemble the full path because f is only the name of the file or directory. We use os.path.join because it will insert either / (unix-like systems) or \ (windows) between each part of the path.

The next statement checks if f_path is a file or directory. The os.path.isdir function is True if the item is a directory, false otherwise. If f_path is a folder, we can make a recursive call to walk_fs starting with f_path. It will return a list of files that we can concat to output.

If f_path is a file, we just add it to output. When we have finished iterating through contents, we can return output. The output file will hold all of the files in start_dir and it’s subdirectorys.

Doubly Linked List—Python

We saw in Singly Linked List the benefits of using a linked list over an array. Linked lists are data structures that grow and expand as needed. Not only is it easy to add an element at the beginning or end of a linked list, but it is also easy to insert elements at arbitrary positions within the list. We can also easily remove elements from the linked list when they are no longer needed.

The doubly linked list is a varient of the singly linked list. The main complaint about a singly linked list is that it can only traverse the list in one direction starting at the head and working until it reaches the end of the list. Clients may not notice a performance hit when operating on small lists, but large lists will almost certainly have a performance hit.

Doubly linked lists work by tracking both the next node and previous nodes in the list. Tracking both nodes creates more overhead when inserting or removing from the list, but it allows the list to work in a bi-directional fashion. That can allow for a major performance boost when operating on large lists.

Here is the entire code followed by an explanation as to how it works. Note that the topic is the Doubly Linked List so I won’t be covering the unit testing code in this module, but I did include for those who are interested.

class Node:
    def __init__(self, element=None, next_node=None, prev_node=None):
        self.element = element
        self.next_node = next_node
        self.prev_node = prev_node

    def __str__(self):
        if self.element:
            return self.element.__str__()
        else:
            return 'Empty Node'

    def __repr__(self):
        return self.__str__()


class DoublyLinkedList:
    def __init__(self):
        self.head = Node(element='Head')
        self.tail = Node(element='Tail')

        self.head.next_node = self.tail
        self.tail.prev_node = self.head

    def size(self):
        count = 0
        current = self.head.next_node

        while current is not None and current != self.tail:
            count += 1
            current = current.next_node

        return count

    def insert_front(self, data):
        node = Node(element=data, next_node=self.head.next_node, prev_node=self.head)
        self.head.next_node.prev_node = node
        self.head.next_node = node

    def insert_last(self, data):
        node = Node(element=data, next_node=self.tail, prev_node=self.tail.prev_node)
        self.tail.prev_node.next_node = node
        self.tail.prev_node = node

    def insert(self, data, position):
        if position == 0:
            self.insert_front(data)
        elif position == self.size():
            self.insert_last(data)
        else:
            if 0 < position < self.size():
                current_node = self.head.next_node
                count = 0
                while count < (position - 1):
                    current_node = current_node.next_node
                    count += 1

                node = Node(element=data, next_node=current_node.next_node, prev_node=current_node)
                current_node.next_node.prev_node = node
                current_node.next_node = node
            else:
                raise IndexError

    def remove_first(self):
        self.head = self.head.next_node
        self.head.prev_node = None

    def remove_last(self):
        self.tail = self.tail.prev_node
        self.tail.next_node = None

    def remove(self, position):
        if position == 0:
            self.remove_first()
        elif position == self.size():
            self.remove_last()
        else:
            if 0 < position < self.size():
                current_node = self.head.next_node
                current_pos = 0

                while current_pos < position:
                    current_node = current_node.next_node
                    current_pos += 1

                next_node = current_node.next_node
                prev_node = current_node.prev_node

                next_node.prev_node = prev_node
                prev_node.next_node = next_node
            else:
                raise IndexError

    def fetch(self, position):
        if 0 <= position < self.size():
            current_node = self.head.next_node
            current_pos = 0

            while current_pos < position:
                current_node = current_node.next_node
                current_pos += 1

            return current_node.element
        else:
            raise IndexError


import unittest
from random import randint


class TestDoublyLinkedList(unittest.TestCase):
    names = ['Bob Belcher',
             'Linda Belcher',
             'Tina Belcher',
             'Gene Belcher',
             'Louise Belcher']

    def test_init(self):
        dll = DoublyLinkedList()
        self.assertIsNotNone(dll.head)
        self.assertIsNotNone(dll.tail)
        self.assertEqual(dll.size(), 0)

    def test_insert_front(self):
        dll = DoublyLinkedList()
        for name in TestDoublyLinkedList.names:
            dll.insert_front(name)

        self.assertEqual(dll.fetch(0), TestDoublyLinkedList.names[4])
        self.assertEqual(dll.fetch(1), TestDoublyLinkedList.names[3])
        self.assertEqual(dll.fetch(2), TestDoublyLinkedList.names[2])
        self.assertEqual(dll.fetch(3), TestDoublyLinkedList.names[1])
        self.assertEqual(dll.fetch(4), TestDoublyLinkedList.names[0])

    def test_insert_last(self):
        dll = DoublyLinkedList()
        for name in TestDoublyLinkedList.names:
            dll.insert_last(name)

        for i in range(len(TestDoublyLinkedList.names) - 1):
            self.assertEqual(dll.fetch(i), TestDoublyLinkedList.names[i])

    def test_insert(self):
        dll = DoublyLinkedList()
        for name in TestDoublyLinkedList.names:
            dll.insert_last(name)

        pos = randint(0, len(TestDoublyLinkedList.names) - 1)

        dll.insert('Teddy', pos)
        self.assertEqual(dll.fetch(pos), 'Teddy')

    def test_remove_first(self):
        dll = DoublyLinkedList()
        for name in TestDoublyLinkedList.names:
            dll.insert_last(name)

        for i in range(dll.size(), 0, -1):
            self.assertEqual(dll.size(), i)
            dll.remove_first()

    def test_remove_last(self):
        dll = DoublyLinkedList()
        for name in TestDoublyLinkedList.names:
            dll.insert_last(name)

        for i in range(dll.size(), 0, -1):
            self.assertEqual(dll.size(), i)
            dll.remove_last()

    def test_remove(self):
        dll = DoublyLinkedList()
        for name in TestDoublyLinkedList.names:
            dll.insert_last(name)

        dll.remove(1)

        self.assertEqual(dll.fetch(0), 'Bob Belcher')
        self.assertEqual(dll.fetch(1), 'Tina Belcher')
        self.assertEqual(dll.fetch(2), 'Gene Belcher')
        self.assertEqual(dll.fetch(3), 'Louise Belcher')


if __name__ == '__main__':
    unittest.main()

Node

Like the singly linked list, the doubly linked list begins with a Node class that holds the data contained in the list and references to the previous and next nodes. Here is the code for the Node.

class Node:
    def __init__(self, element=None, next_node=None, prev_node=None):
        self.element = element
        self.next_node = next_node
        self.prev_node = prev_node

    def __str__(self):
        if self.element:
            return self.element.__str__()
        else:
            return 'Empty Node'

    def __repr__(self):
        return self.__str__()

Here is a break down of each methods in the Node class.

__init__

The __init__ method creates the node. All three of its parameters are optional, but basically we are setting the data contained in the node, and building refrences to next_node and previous_node.

def __init__(self, element=None, next_node=None, prev_node=None):
    self.element = element
    self.next_node = next_node
    self.prev_node = prev_node

__str__ and __repr__

I found that there was more work invloved with debugging doubly linked list, so in this version of the Node class, I chose to implement __str__ and __repr__ so that I could easily identify each Node in my debugger.

def __str__(self):
    # Check if self.element is null
    if self.element:
        # Just return the string representation
        # of self.element
        return self.element.__str__()
    else:
        # Otherwise return Empty Node
        # to indicate the node does not
        # hold data
        return 'Empty Node'

# We are just going to reuse the
# code in __str__ here
def __repr__(self):
    return self.__str__()

Doubly Linked List

This is an example of a doubly linked list. It’s not opitmized for the simply fact that we are learning. If you need an optimized list structure, you Python’s list object.

class DoublyLinkedList:
    def __init__(self):
        self.head = Node(element='Head')
        self.tail = Node(element='Tail')

        self.head.next_node = self.tail
        self.tail.prev_node = self.head

    def size(self):
        count = 0
        current = self.head.next_node

        while current is not None and current != self.tail:
            count += 1
            current = current.next_node

        return count

    def insert_front(self, data):
        node = Node(element=data, next_node=self.head.next_node, prev_node=self.head)
        self.head.next_node.prev_node = node
        self.head.next_node = node

    def insert_last(self, data):
        node = Node(element=data, next_node=self.tail, prev_node=self.tail.prev_node)
        self.tail.prev_node.next_node = node
        self.tail.prev_node = node

    def insert(self, data, position):
        if position == 0:
            self.insert_front(data)
        elif position == self.size():
            self.insert_last(data)
        else:
            if 0 < position < self.size():
                current_node = self.head.next_node
                count = 0
                while count < (position - 1):
                    current_node = current_node.next_node
                    count += 1

                node = Node(element=data, next_node=current_node.next_node, prev_node=current_node)
                current_node.next_node.prev_node = node
                current_node.next_node = node
            else:
                raise IndexError

    def remove_first(self):
        self.head = self.head.next_node
        self.head.prev_node = None

    def remove_last(self):
        self.tail = self.tail.prev_node
        self.tail.next_node = None

    def remove(self, position):
        if position == 0:
            self.remove_first()
        elif position == self.size():
            self.remove_last()
        else:
            if 0 < position < self.size():
                current_node = self.head.next_node
                current_pos = 0

                while current_pos < position:
                    current_node = current_node.next_node
                    current_pos += 1

                next_node = current_node.next_node
                prev_node = current_node.prev_node

                next_node.prev_node = prev_node
                prev_node.next_node = next_node
            else:
                raise IndexError

    def fetch(self, position):
        if 0 <= position < self.size():
            current_node = self.head.next_node
            current_pos = 0

            while current_pos < position:
                current_node = current_node.next_node
                current_pos += 1

            return current_node.element
        else:
            raise IndexError

__init__

This doubly linked list implementation makes use of a head and a tail node. We traversing the list, the code will look for either head or tail as a means to detect if we are at the end of the list.

def __init__(self):
    self.head = Node(element='Head')
    self.tail = Node(element='Tail')

    self.head.next_node = self.tail
    self.tail.prev_node = self.head

We pass the strings “Head” and “Tail” as the element data for each of these nodes for debugging purpose. Since Node implements __str__ and __repr__, we will see Heads and Tail in the debugger (at least when using PyCharm).

The next two lines do the work of pointing head and tail at each other. When the list is created, head.next_node is tail. Conversly, tail.prev_node is head. This indicates an empty list.

size

This code inefficient on purpose. A better implementation would use a length varaible and increment or decrment it as Nodes are added and removed. However, in this case, I wanted to show how to traverse a list without the clutter of adding or removing Nodes.

def size(self):
    count = 0
    current = self.head.next_node

    while current is not None and current != self.tail:
        count += 1
        current = current.next_node

    return count

We start by making a count variable and a current variable that points at self.head.next_node. It’s really important that we use self.head.next_node instead of self.head because self.head’s purpose is to mark the beginning of the list, not contain data. If we failed to make this distinction, the size of the list will be off by one.

Now, we are going to traverse the list increment count by one an each iteration of the loop. We should check for None for defensive programming purposes, but the critical part of the loop condition is if current != self.tail. The self.tail Node marks the end of the list and so we do not wish to include it in the size of our list.

insert_front

This method inserts a new Node at the front of the Linked List.

def insert_front(self, data):
    node = Node(element=data, next_node=self.head.next_node, prev_node=self.head)
    self.head.next_node.prev_node = node
    self.head.next_node = node

Nodes inserted at the front of the list need their prev_node to point at self.head and their next_node to point at the Node located at self.head.next_node.

To complete the insertion, we now need to update self.head.next_node.prev_node to point back at the new Node that we created (otherwise, it would still point at self.head). Likewise, we need to update self.head.next_node to point at our new Node.

insert_last

The code for this is almost identical to insert_head with the main difference being that we are working on the tail Node rather than the head Node.

def insert_last(self, data):
    node = Node(element=data, next_node=self.tail, prev_node=self.tail.prev_node)
    self.tail.prev_node.next_node = node
    self.tail.prev_node = node

Once again, we create a new Node. It’s next_node has to point at self.tail and it’s prev_node needs to point at self.tail.prev_node.

To complete the insertion, we need to update self.tail.prev_node.next_node to point at our new Node (otherwise it will continue to point at self.tail). We also need to point self.tail.prev_node at our new Node.

insert

This method handles the insertion of a Node into the middle of the list. Keep in mind that it has to handle four possible cases.

  • Position could be 0 (front of list)
  • Position could be equal to size() (end of list)
  • Position is in the middle of the list
  • Position is size() (out of bounds)
def insert(self, data, position):
    if position == 0:
        # First case, we insert at front
        self.insert_front(data)
    elif position == self.size():
        # Second case, we insert at end
        self.insert_last(data)
    else:
        if 0 < position < self.size():
            # Third case, insert in middle
            current_node = self.head.next_node
            count = 0
            while count < (position - 1):
                current_node = current_node.next_node
                count += 1

            node = Node(element=data, next_node=current_node.next_node, prev_node=current_node)
            current_node.next_node.prev_node = node
            current_node.next_node = node
        else:
            # Fourth case, index out of bounds
            raise IndexError

The comments point out how this code handles each of the possible cases. Let's focus on the insertion. We begin by creating a count variable and current_node variable. Now, we need to traverse the list until we arrive at the node right before the desired position. Each interation of the loop requires us to point current_node at current_node.next_node.

Once we arrive at our destination, we create a new Node. We will point it's next_node at current_node.next_node and its prev_node at current_node. Doing the insertion at this point in the list causes the new node to appear at the position index specified in the method parameters.

To complete the insertion, we point current_node.next_node.prev_node at the new Node (otherwise it would still point at current_node). Likewise, we point current_node.next_node at our new Node.

Again, I should mention that this code is inefficient again. Normally we would make use of the bidirectional capabilities of the list and decide if we want to traverse the list in forward (like we do here) or in reverse. For example, if we are trying insert into a list at size() – 2, it doesn't make sense to start at the head node and traverse forward to find our insertion point when we could start at tail and move backwards.

remove_first

It’s really simple to remove nodes from the head of a doubly linked list.

def remove_first(self):
     self.head = self.head.next_node
     self.head.prev_node = None

You will notice that all we need to do is point self.head at self.head.next_node. Then we just set self.head.prev_node to None so that it doesn’t continue to point at the old self.head. The old Node will get garbage collected by the Python runtime.

remove_last

Removing the tail from the list is just as easy as removing the head.

def remove_last(self):
    self.tail = self.tail.prev_node
    self.tail.next_node = None

In this case, we just point self.tail at self.tail.prev_node. To remove the old Node, we next poitn self.tail.next_node to None. The old Node will get garbage collected by the Python runtime.

remove

Remove needs to handle the same cases that insert has to handle. Otherwise it works by traversing the list of the position to remove and deletes the Node.

def remove(self, position):
    if position == 0:
        # First case, remove at front
        self.remove_first()
    elif position == self.size():
        # Second case, remove from end
        self.remove_last()
    else:
        if 0 < position < self.size():
            # 3rd case, remove at middle
            current_node = self.head.next_node
            current_pos = 0

            while current_pos < position:
                current_node = current_node.next_node
                current_pos += 1

            next_node = current_node.next_node
            prev_node = current_node.prev_node

            next_node.prev_node = prev_node
            prev_node.next_node = next_node
        else:
            # 4th case, invalid position
            raise IndexError

Again, we will focus on the removal part of this code. We begin by traversing the list to find our desired position. This time, we stop at the actual position rather than the position before it. Once we arrive at our desired position, we store current_node.next_node and current_node.prev_node into their respective variables. To remove current_node, we simply need to point next_node.prev_node at prev_node and likewise point prev_node.next_node to next_node. Since there are no longer any references to current_node, it is garbage collected by the Python runtime.

fetch

Our final method is the fetch method, which let’s use retreive items from the Linked List.

def fetch(self, position):
    if 0 <= position < self.size():
        current_node = self.head.next_node
        current_pos = 0

        while current_pos < position:
             current_node = current_node.next_node
             current_pos += 1

        return current_node.element
    else:
        raise IndexError

By now readers should be familiar with how the list is traversed. We simply traverse the list to the desired position and return the Node.element to the caller.

Conclusion

Doubly linked list are a more advance form of linked list that supports bi-directional traversals. Bi-directional navigation is possible because each Node in the linked list stores a refrence to both the next and previous Node in the list.

Our linked list implementation only utilizes forward direction traversal to help keep the implementation simple. However, it would only require a small amount of work to make use of reverse traversals.

Recursive Binary Search—Python

It’s possible to implement a binary search using recursion. Just like the loop version, we slice our sorted collection into halfs and check one half of the collection and discard the other half.

Here is the code

def binary_search(search_list, data, low, high):
    # This is the base case that
    # terminates the recursion
    if low <= high:
        # Find the mid point
        mid = int(low + (high - low) / 2)

        if search_list[mid] == data:
            # We found our item. Return the index
            return mid
        elif search_list[mid] < data:
            # Search the top half of search_list
            return binary_search(search_list, data, mid + 1, high)
        else:
            # Search the bottom half of the search_list
            return binary_search(search_list, data, low, mid - 1)
    return None


if __name__ == '__main__':
    names = ['Bob Belcher',
             'Linda Belcher',
             'Tina Belcher',
             'Gene Belcher',
             'Louise Belcher']

    print('Sorting names...')
    names.sort()

    linda_index = binary_search(names, 'Linda Belcher', 0, len(names) - 1)
    if linda_index:
        print('Linda Belcher found at ', str(linda_index))
    else:
        print('Linda Belcher was not found')

    teddy_index = binary_search(names, 'Teddy', 0, len(names) - 1)
    if teddy_index:
        print('Teddy was found at ', str(teddy_index))
    else:
        print('Teddy was not found')

When run, we get the following output

Sorting names...
Linda Belcher found at  2
Teddy was not found

Binary Search—Python

Binary searches work by splitting a collection in half and only searching on half of the collection while discarding the other half. This algorithm requires the collection to be sorted first.

Here’s the code

def binary_search(search_list, data):
    low = 0
    high = len(search_list) - 1

    while low <= high:
        mid = int(low + (high - low) / 2)

        if search_list[mid] == data:
            # Return the index because we
            # found our item
            return mid

        elif search_list[mid] < data:
            # Continue search at the top
            # half of search list
            low = mid + 1
        else:
            # Continue search at the
            # bottom of search list
            high = mid - 1

    # The item wasn't found
    return None


if __name__ == '__main__':
    names = ['Bob Belcher',
             'Linda Belcher',
             'Tina Belcher',
             'Gene Belcher',
             'Louise Belcher']

    print('Sorting names...')
    names.sort()

    linda_index = binary_search(names, 'Linda Belcher')
    if linda_index:
        print('Linda Belcher found at ', str(linda_index))
    else:
        print('Linda Belcher was not found')

    teddy_index = binary_search(names, 'Teddy')
    if teddy_index:
        print('Teddy was found at ', str(teddy_index))
    else:
        print('Teddy was not found')

When run, we will get the following output.

Sorting names...
Linda Belcher found at  2
Teddy was not found

Ordered Linear Search—Python

Ordered Linear Searche’s are a varient of Unordered Linear Search. The main difference is that this algorithm expects the search list to be sorted prior to searching. For this reason, the Ordered Linear Search can be a little more efficient than it’s Unordered varient.

Here’s the code

from random import shuffle


def ordered_linear_search(search_list, data):
    for i in range(len(search_list) - 1):
        if search_list[i] == data:
            # Return the index of the item
            return i
        elif search_list[i] > data:
            # The idea here is to fail at this
            # point because search_list isn't sorted
            return None
    # We didn't find our item, so return None
    return None


if __name__ == '__main__':
    names = ['Bob Belcher',
             'Linda Belcher',
             'Tina Belcher',
             'Gene Belcher',
             'Louise Belcher']

    print('Shuffling names')
    shuffle(names)

    linda_index = ordered_linear_search(names, 'Linda Belcher')
    if linda_index:
        print('Linda Belcher found at ', str(linda_index))
    else:
        print('Linda Belcher was not found')

    teddy_index = ordered_linear_search(names, 'Teddy')
    if teddy_index:
        print('Teddy was found at ', str(teddy_index))
    else:
        print('Teddy was not found')

    print('Now sorting...')
    names.sort()

    linda_index = ordered_linear_search(names, 'Linda Belcher')
    if linda_index:
        print('Linda Belcher found at ', str(linda_index))
    else:
        print('Linda Belcher was not found')

When run, you will notice that first we do not find the name ‘Linda Belcher’ because the list is shuffled. After sorting, the search will identifiy ‘Linda Belcher’ at index 2.

Shuffling names
Linda Belcher was not found
Teddy was not found
Now sorting...
Linda Belcher found at  2

Unordered Linear Search—Python

There are many cases in computer programming we need to search for an element in a collection of data. The Unordered Linear Search is one of the simplest searching algorithms.

def unordered_linear_search(search_list, data):
    for i in range(len(search_list) - 1):
        if search_list[i] == data:
            # We have found the item at index i
            return i
        
    # If we got here, we didn't find the item
    return None

if __name__ == '__main__':
    names = ['Bob Belcher',
             'Linda Belcher',
             'Tina Belcher',
             'Gene Belcher',
             'Louise Belcher']

    linda_index = unordered_linear_search(names, 'Linda Belcher')
    if linda_index:
        print('Linda Belcher found at ', str(linda_index))
    else:
        print('Linda Belcher was not found')

    teddy_index = unordered_linear_search(names, 'Teddy')
    if teddy_index:
        print('Teddy was found at ', str(teddy_index))
    else:
        print('Teddy was not found')

In this code, we are basically looping through the search_list object until we found a match. If we found the match, we return the index of the item’s location. Otherwise, we return Python’s None object to indicate the item wasn’t found. When run, the program will produce this output.

Linda Belcher found at  1
Teddy was not found

Insertion Sort—Python

Insertion sorts are a common sorting technique that many computer science students are asked to learn. Here is a Python implementation of an Insertion Sort.

def insertion_sort(items):
    for i in range(2, len(items) - 1):
        item = items[i]
        j = i
        while (items[j - 1] > item) and (j >= 1):
            items[j] = items[j - 1]
            j -= 1
        items[j] = item

if __name__ == '__main__':
    items = ['Bob Belcher', 'Linda Belcher', 'Tina Belcher', 'Gene Belcher', 'Louise Belcher']
    print(items, '\n')

    print('Sorting...')
    insertion_sort(items)
    print(items)

Insertion sorts use nested loops and test if an item at the end of the list is less than an item in the front of the list. If they are, the code does a swap.

When run, we get this output:

['Bob Belcher', 'Linda Belcher', 'Tina Belcher', 'Gene Belcher', 'Louise Belcher'] 

Sorting...
['Bob Belcher', 'Gene Belcher', 'Linda Belcher', 'Tina Belcher', 'Louise Belcher']