Truthfully, most users aren’t very interested in finding the largest and smallest Python source files in their home directory, but doing so is a good exercise in walking the file tree and using tools from the os module. The program in this post is a modified example taken from Programming Python: Powerful Object-Oriented Programming. It scans the user’s home directory for all Python source files, then prints the two smallest files (by size in bytes) and the two largest files to the console.
Code
import os
import pprint
from pathlib import Path

trace = False

# Get the user's home directory in a platform neutral fashion
dirname = str(Path.home())

# Store the results of all python files found
# in home directory
allsizes = []

# Walk the file tree
for (current_folder, sub_folders, files) in os.walk(dirname):
    if trace:
        print(current_folder)

    # Loop through all files in current_folder
    for filename in files:

        # Test if it's a python source file
        if filename.endswith('.py'):
            if trace:
                print('...', filename)

            # Assemble the full file path using os.path.join
            fullname = os.path.join(current_folder, filename)

            # Get the size of the file on disk
            fullsize = os.path.getsize(fullname)

            # Store the result
            allsizes.append((fullsize, fullname))

# Sort the files by size
allsizes.sort()

# Print the 2 smallest files
pprint.pprint(allsizes[:2])

# Print the 2 largest files
pprint.pprint(allsizes[-2:])
Sample Output
[(0, '/Users/stonesoup/.local/share/heroku/client/node_modules/node-gyp/gyp/pylib/gyp/generator/__init__.py'),
 (0, '/Users/stonesoup/.p2/pool/plugins/org.python.pydev.jython_5.4.0.201611281236/Lib/email/mime/__init__.py')]
[(219552, '/Users/stonesoup/.p2/pool/plugins/org.python.pydev.jython_5.4.0.201611281236/Lib/decimal.py'),
 (349239, '/Users/stonesoup/Library/Caches/PyCharmCE2017.1/python_stubs/348993582/numpy/random/mtrand.py')]
Explanation
The program starts with a trace flag that’s set to False. When set to True, the program prints each folder it visits and each Python file it finds. On line 8, we grab the user’s home directory using Path.home(). This is a platform-neutral way of finding a user’s home directory. Path.home() returns a Path object, which the program converts to a string with str(). Since Python 3.6, os.walk also accepts a Path object directly, so the conversion is optional on current versions. Finally, we create an empty allsizes list that holds our results.
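To make the conversion step concrete, here is a minimal sketch of what Path.home() returns and how it becomes a string. The variable name home is only for illustration, and os.fspath is shown as an equivalent alternative to str() that the original program does not use.

```python
import os
from pathlib import Path

# Path.home() returns a Path object for the current user's home directory
home = Path.home()

# str() gives the plain string form; os.fspath() is an equivalent conversion
dirname = str(home)

print(type(home).__name__)  # PosixPath or WindowsPath, depending on the platform
print(dirname)
print(dirname == os.fspath(home))
```

The printed path depends on the machine and the account running the script.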
Starting on line 15, we use the os.walk function and pass in the user’s home directory. It’s a common pattern to combine os.walk with a for loop so that we can traverse an entire directory tree. On each iteration, os.walk yields a three-item tuple that contains the path of the current folder, a list of its sub-folders, and a list of the files it contains. We are interested in the files.
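The shape of those tuples is easier to see on a small tree. The sketch below builds a throwaway directory with tempfile (the names pkg, a.py, and b.py are made up for the example) and records what os.walk yields for each folder, with paths shown relative to the root.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a tiny tree: root/a.py and root/pkg/b.py (hypothetical names)
    os.makedirs(os.path.join(root, 'pkg'))
    for relative in ('a.py', os.path.join('pkg', 'b.py')):
        open(os.path.join(root, relative), 'w').close()

    visited = []
    for current_folder, sub_folders, files in os.walk(root):
        # Record the folder relative to root so the output is machine independent
        visited.append((os.path.relpath(current_folder, root),
                        sorted(sub_folders),
                        sorted(files)))

for entry in visited:
    print(entry)
# ('.', ['pkg'], ['a.py'])
# ('pkg', [], ['b.py'])
```

By default os.walk works top-down, so the root folder is reported before its sub-folders.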
Starting on line 20, the program enters a nested for loop that examines each file individually. On line 23, we test if the file name ends with ‘.py’ to see if it’s a Python source file. Should the test return True, we continue by using os.path.join to assemble the full path to the file. The os.path.join function takes into account the underlying operating system’s path separator, so on Unix-like systems we get / while on Windows we get \ as the path separator. The file’s size in bytes is retrieved on line 31 using os.path.getsize. Once we have the size and the file path, we append them to allsizes as a (size, path) tuple for later use.
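The filter, join, and size steps can be sketched as a small function and tried on a temporary folder. The function name python_file_sizes and the test files are hypothetical. The try/except is an addition that is not in the original program: os.path.getsize raises OSError for entries it cannot stat, such as broken symbolic links, and this sketch assumes skipping them is acceptable.

```python
import os
import tempfile


def python_file_sizes(root):
    """Return (size_in_bytes, full_path) tuples for .py files under root."""
    results = []
    for current_folder, sub_folders, files in os.walk(root):
        for filename in files:
            if not filename.endswith('.py'):
                continue
            # os.path.join inserts the separator for the current platform
            fullname = os.path.join(current_folder, filename)
            try:
                results.append((os.path.getsize(fullname), fullname))
            except OSError:
                # Assumption: entries that cannot be read are simply skipped
                continue
    return results


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, 'hello.py'), 'w') as handle:
        handle.write('x = 1')  # 5 bytes on disk
    with open(os.path.join(root, 'notes.txt'), 'w') as handle:
        handle.write('not python')
    expected_path = os.path.join(root, 'hello.py')
    found = python_file_sizes(root)

print(found)  # one tuple: size 5 and the full path to hello.py
```

Note that endswith('.py') is case-sensitive, so a file named SCRIPT.PY would not be picked up by either this sketch or the original program.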
The program has finished scanning the user’s home folder once it reaches line 37. At this point, we can sort our results from smallest to largest by using the sort() method on allsizes. Because each entry is a (size, path) tuple, the list is ordered by size first, with ties broken by path. Line 40 prints the two smallest files (using pretty print for better formatting) and line 43 prints the two largest files.
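As a short illustration of the sorting and slicing step, the sketch below uses a handful of made-up (size, path) tuples in place of real scan results.

```python
# Hypothetical scan results: (size_in_bytes, path) tuples
allsizes = [
    (120, 'b.py'),
    (0, 'pkg/__init__.py'),
    (4096, 'big.py'),
    (0, 'a/__init__.py'),
    (35, 'c.py'),
]

# Tuples compare element by element: size first, then path as a tie-breaker
allsizes.sort()

smallest = allsizes[:2]   # first two entries after sorting
largest = allsizes[-2:]   # last two entries after sorting

print(smallest)  # [(0, 'a/__init__.py'), (0, 'pkg/__init__.py')]
print(largest)   # [(120, 'b.py'), (4096, 'big.py')]
```

If fewer than two files were found, the slices simply return whatever is available rather than raising an error.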
References
Lutz, Mark. Programming Python. Beijing, O’Reilly, 2013.