Walk a Filetree in Python

Python has a powerful os.walk function that lets a script walk through a file system efficiently. In this example, taken from Programming Python: Powerful Object-Oriented Programming, we will walk a file tree and remove any compiled bytecode (.pyc) files found in it.


Here is the code, with my comments added.

import os, sys

# Set to True to only report matching files without deleting them
findonly = False

# Use either the CWD or a directory specified by a command line argument
rootdir = os.getcwd() if len(sys.argv) == 1 else sys.argv[1]

# Keep track of the found and removed files
found = removed = 0

# Walk through the file tree
for (thisDirLevel, subsHere, filesHere) in os.walk(rootdir):

    # Go through each file in the directory
    for filename in filesHere:

        # Check if it ends with .pyc
        if filename.endswith('.pyc'):

            # Assemble the full file name
            fullname = os.path.join(thisDirLevel, filename)
            print('=>', fullname)

            # Attempt to remove the file if asked to do so
            if not findonly:
                try:
                    # Attempt to delete the file
                    os.remove(fullname)

                    # Increment the removed count
                    removed += 1
                except OSError:
                    # Handle the error
                    exc_type, inst = sys.exc_info()[:2]

                    # Report that this file can't be removed
                    print('*'*4, 'Failed:', filename, exc_type, inst)
            found += 1

# Output the total number of files found and removed
print('Found', found, 'files, removed', removed)

Detailed Explanation

This script runs in either find-only or remove mode. The first variable we create, findonly on line 4, is a flag that decides whether we only look for bytecode files or find and remove them. Next we create a rootdir variable that is either the current working directory or a directory supplied as a command line argument. On line 10 we create two variables, found and removed, which track how many files we have found and removed.

We get into the meat of the program on line 13, where we enter a loop that iterates over os.walk. The os.walk function takes a starting directory path and visits that directory and every subdirectory below it. It’s the standard way to walk a file tree in Python. On each iteration it yields a tuple of three items: the path of the directory currently being examined, a list of the subdirectory names in that directory, and a list of the file names in that directory.
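To see what os.walk hands back on each iteration, here is a minimal sketch that builds a small throwaway tree with tempfile and collects the tuples. The helper name list_tree and the file names a.py and pkg/b.pyc are made up for illustration and are not part of the original script.

```python
import os
import tempfile

def list_tree(root):
    """Return one (dirpath, dirnames, filenames) tuple per directory under root."""
    # Sort the name lists so the result does not depend on file system order
    return [(dirpath, sorted(dirnames), sorted(filenames))
            for dirpath, dirnames, filenames in os.walk(root)]

# Build a tiny temporary tree: <root>/a.py and <root>/pkg/b.pyc
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, 'pkg'))
    open(os.path.join(root, 'a.py'), 'w').close()
    open(os.path.join(root, 'pkg', 'b.pyc'), 'w').close()

    tree = list_tree(root)
    for dirpath, dirnames, filenames in tree:
        print(dirpath, dirnames, filenames)
```

The second and third items are lists of names, not counts, which is why the script can loop over filesHere directly.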

We create a nested loop on line 16 so that we can look at each file in the directory individually. On line 19, we check whether the file name ends with the .pyc extension. If it does, we use os.path.join to assemble the full file path in a platform-independent way and then print that path to the console.
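The filter-and-join step can be tried on its own without touching the disk. This is only a sketch: the function name pyc_paths and the sample directory and file names are hypothetical.

```python
import os

def pyc_paths(dirpath, filenames):
    """Return full paths for the names in filenames that end with .pyc."""
    return [os.path.join(dirpath, name)
            for name in filenames
            if name.endswith('.pyc')]

# 'notes.pyc.txt' contains .pyc but does not end with it, so it is skipped
matches = pyc_paths('build', ['app.py', 'app.pyc', 'notes.pyc.txt'])
print(matches)
```

Because os.path.join picks the separator for the current platform, the same code produces build/app.pyc on Linux and build\app.pyc on Windows.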

If we are deleting files, we use os.remove on line 29 to attempt to delete the file. It’s critical that we wrap this call in a try block because we may not have permission to delete the file. If the deletion succeeds, we increment the removed count. If it fails, execution jumps to the exception handler at line 35 and we report the error. The body of the inner loop ends on line 39, where we increment the found count, and the loop then repeats for the next file.
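The try block can be exercised safely in a temporary directory. The sketch below assumes a hypothetical helper called try_remove; it triggers the failure path by deleting the same file twice, since a missing file raises an OSError subclass just as a permission problem does.

```python
import os
import tempfile

def try_remove(path):
    """Try to delete path; return True on success, False if os.remove fails."""
    try:
        os.remove(path)
    except OSError as exc:
        # PermissionError and FileNotFoundError are both subclasses of OSError
        print('*' * 4, 'Failed:', path, type(exc).__name__, exc)
        return False
    return True

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'stale.pyc')
    open(target, 'w').close()

    first = try_remove(target)    # the file exists, so it is deleted
    second = try_remove(target)   # the file is already gone, so os.remove raises
```

Catching OSError rather than using a bare except keeps unrelated problems, such as a KeyboardInterrupt, from being silently reported as a failed delete.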

When the program is finished, we report how many files we found and removed.
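If you want to reuse or test this logic, one option is to wrap it in a function that returns the two counts instead of printing them. The following is a sketch under that assumption, not the book's version; the name clean_pyc and the sample files are invented for the example.

```python
import os
import tempfile

def clean_pyc(rootdir, findonly=False):
    """Walk rootdir and return (found, removed) counts for .pyc files."""
    found = removed = 0
    for dirpath, dirnames, filenames in os.walk(rootdir):
        for filename in filenames:
            if filename.endswith('.pyc'):
                found += 1
                if not findonly:
                    try:
                        os.remove(os.path.join(dirpath, filename))
                        removed += 1
                    except OSError as exc:
                        print('*' * 4, 'Failed:', filename, exc)
    return found, removed

# Try both modes on a throwaway tree holding two .pyc files and one .py file
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, 'pkg'))
    for name in ('a.pyc', 'a.py', os.path.join('pkg', 'b.pyc')):
        open(os.path.join(root, name), 'w').close()

    dry_run = clean_pyc(root, findonly=True)   # counts files, deletes nothing
    real_run = clean_pyc(root)                 # counts and deletes
    after = clean_pyc(root, findonly=True)     # nothing left to find

print(dry_run, real_run, after)
```

Running the find-only mode first is a cheap way to check what would be deleted before letting the script remove anything.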

