Python _thread Basic

Python 3 has the newer thread package, but the _thread package still exists for developers who are more comfortable with the 2.x API. This is a basic example derived from Programming Python: Powerful Object-Oriented Programming that demonstrates how to create threads using the _thread module.

Code

import _thread as thread, time


# This function will run in a new thread
def counter(tid, count):
    for i in range(count):
        # Simulate a delay
        time.sleep(1)
        # Print out the thread id (tid) and the current iteration
        # of our for loop
        print('[{}] => {}'.format(tid, i))


if __name__ == '__main__':
    # Enter a loop that creates 5 threads
    for i in range(5):
        # Start a new thread passing a callable and it's arguments
        # in the form of a tuple
        thread.start_new_thread(counter, (i, 5))

    time.sleep(6)
    print('Main thread exiting')

Explanation

This script creates five new threads using _thread.start_new_thread. Each time a new thread is created, the counter function is called and is passed a tuple of (i, 5). That tuple corresponds to the tid and count parameters of the counter function. Counter enters a loop that runs 5 times since 5 was passed to the second parameter of counter on line 19. It will print the thread id and current iteration of the loop.

Meanwhile, the for loop in the parent thread continues to iterate because thread.start_new_thread does not block the for loop in the main thread. By calling start_new_thread, the program’s execution runs both the for loop in the main thread and the counter function in parallel. Allowing programs to run multiple portions of code at the same time is what gives threads their power. For example, you may wish to use a thread to handle a long running database query while the user continues to interact with the program in the main thread.

One final note about threads in Python. Threads give the appearance of allowing programs to multitask and for all intents and purposes, that is what is happening in the program. Nevertheless, what is really happening is that the Python Virtual Machine is time slicing the computer instructions and allowing a few lines of code to run before switching to another set of instructions.

In other words, if a program has three threads, A, B, and C, then Thread A runs for a few moments, then Thread B, and finally Thread C. Note that there is no guarantee to the order in which threads run. It is possible that one thread may run more often than other threads or that the order of running threads is different each time.

References

Lutz, Mark. Programming Python. Beijing, OReilly, 2013.

Advertisement

Use Python to Launch Another Program

Many Python programs operate as wrappers to other programs. For example, there are a wide variety of Python programs that work as a GUI wrapper for CLI programs. This tutorial uses an example from Programming Python: Powerful Object-Oriented Programming to demonstrate how a Python program can be used to launch another program.

Code

Parent Process

This code represents a parent process that is capable of launching a child process. I added my own comments to help further explain the code.

import os

# Track the number of spawned children
spawned_children = 0

# Enter an infinite loop
while True:
    
    # Increment the number of children
    spawned_children += 1
    
    # Now launch a new process
    # If we are still in the parent process, pid is non-zero
    # otherwise, pid is zero and indicates that we are in the
    # child process
    pid = os.fork()
    
    # Test if we are in the child process
    if pid == 0:
        
        # Now, since we are in the new process, we use
        # os.execlp to launch a child script.
        os.execlp('python', 'python', 'child.py', str(spawned_children))
        assert False, 'error starting program'
    else:
        # If we are here, then we are still in the parent process
        print('Child is', pid)
        if input() == 'q': break

Child Process

This is a simple script that simply represents a sample child process.

import os, sys

# Parent passes the child process number
# as a command line argument
print('Hello from child', os.getpid(), sys.argv[1])

Explanation

We have two core concepts used in this program. The first concept is forking the parent process into a child process. The other concept is actually launching a new program.

Forking

Forking is the process of launching a copy of the running program. The program that launched a copy of itself is called the parent process, while the copy of the program is called the child process. In theory, a parent and its children can spawn as many children as the underlying operating system allows.

The example program spawns a child process on line 16 and assigns the result to the variable called pid. When os.fork() returns, pid is zero if we are running in the child process or non-zero if we are still in the parent. The program uses an if – else statement (lines 19 and 25 respectively) to perform different operations depending if we are in the parent or child process.

The parent process executes lines 27 and 28. It simply prints out the child’s process id and then asks the user if they wish to continue the program. The user can enter ‘q’ to exit the parent process.

The child process executes code on lines 23 and 24. Line 23 uses os.execlp to start the companion child script and passes the number of spawned children as a command line argument. The child script prints its pid and the number of child processes and then exits.

Launching Programs

The Python os module provides a variety of ways to launch other programs via a Python script. This example makes use of os.execlp which looks for an executable on the operating system’s path. The first argument is the name of the program followed by any number of command line arguments.

The os.execlp is not supposed to return to the caller because if successful, the program is replaced by the new program. This is why it’s necessary to call os.fork() prior to using os.execlp (unless you wish your program to simply end with the new process). When the example script calls os.execlp, the child process created by os.fork is replaced by the new program launched by os.execlp.

Line 24 handles the case when os.execlp fails. Should os.execlp fail, control is returned back to the caller. In most cases, a develop should notify the user that launching the new program has failed. The example script accomplishes this by using an assert False statement with an error message.

References

Lutz, Mark. Programming Python. Beijing, OReilly, 2013.

Python Basic Forking

Many programs need to execute tasks simultaneously and Python provides us with a few different mechanisms for concurrent programming. One of those mechanisms is called forking, where a call is made to the underlying operating system to create a working copy of a program that’s already running. The program that created the new process is called the parent process, while the processes that are created by the parent are called the child process.

This post shows the most basic form of creating processes in Python and helps serve as a foundation to understanding forking. The example is derived from Programming Python: Powerful Object-Oriented Programming, and I added my own comments to help better explain the program.

import os


# This is a function called by the child process
def child():
    # Use os.getpid() to get the pid for this process
    print('Hello from child', os.getpid())

    # force the child process to exit right away
    # or the child process will return to the infinite loop in parent()
    os._exit(0)


def parent():
    while True:
        # Attempt to fork this program into a new process. When forking is complete
        # newpid will be non-zero for the original process, but it wil be
        # 0 in the child process
        newpid = os.fork()

        # The program now goes in two different directions at the same time
        # When newpid is 0, we call child() and the child process exits
        if newpid == 0:  # Test if this is a child process
            child()
        else:
            # If are here, then we are still in the parent process
            # We print the pid of the parent and the child process (newpid)
            print('Hello from parent', os.getpid(), newpid)
        if input() == 'q':
            break


if __name__ == '__main__':
    parent()

When run on my machine, the program shows the following output

Hello from parent 87800 87802
Hello from child 87802
k
Hello from parent 87800 87803
Hello from child 87803
k
Hello from parent 87800 87804
Hello from child 87804
k
Hello from parent 87800 87805
Hello from child 87805
q

Explanation

The hardest part to grasp about this program is that when os.fork() is called on line 19, the program actually launches a copy of itself. The operating system creates the new process and that new process gets a copy of all variables in memory and execution of the new and old programs continue after line 19. (Note: The OS may not exactly copy the parent process, but functionally speaking, the child process can be considered to be a copy of the parent).

The os.fork() function returns a number called a PID (process ID). We can test the pid to see if we are running in the parent or child process. When we are in the child process, the value returned by os.fork() is zero. So on line 23, we test for 0 and if newpid is zero, we call the child() function.

The alternative case is that we are still running in the parent process (bearing in mind, that the child process is also running at this point in time as well). If we are still in the parent process os.fork() returns a non-zero value. In that case, we use the else block to print the parent and child PID.

The parent process continues to loop until the user enters q to quit. Each time the loop iterates, a new child process is created by the parent. The parent prints its own PID (using os.getpid()) and the pid of the child on line 28.

The child process also uses os.getpid() to get its own PID. It prints its own PID on line 7 and then on line 11, we use os._exit(0) to force the child process to shut down. This is a critical step for this program! If we were to omit the call to os._exit on line 11, the child process would return to the parent function and enter the same infinite loop the parent is using.

Conclusion

This is the most basic example of creating child processes using Python. Keep in mind that processes do not share memory (unlike threads). In real world programs, processes often need to sycnchronize data from one process to another process using tools such as network sockets, databases, or files. When a child process is spawned it gets a copy of the memory of the parent process, but then functions as an independent program.

Source

Lutz, Mark. Programming Python. Beijing, OReilly, 2013

Python os.walk

It’s very typical for a program to have to walk a file tree. In Recursion Example — Walking a file tree, I demonstrated how to use recursion to traverse a file system. Although it’s totally possible to walk through a file system in that fashion, it’s less than ideal because Python provides os.walk for this purpose.

The following script is a modified example borrowed from Programming Python: Powerful Object-Oriented Programming that demonstrates how to traverse a file system using os.walk.

import os
import sys


def lister(root):
    # os.walk returns a tuple with the current_folder, a list of sub_folders,
    # and a list of files in the current_folder
    for (current_folder, sub_folders, files) in os.walk(root):
        print('[' + current_folder + ']')
        for sub_folder in sub_folders:

            # Unix uses / as path separators, while Windows uses \
            # If we use os.path.join, we don't need to worry about which
            # path separator to use since os.path.join tracks that for us.
            path = os.path.join(current_folder, sub_folder)
            print('\t' + path)

        for file in files:
            path = os.path.join(current_folder, file)
            print('\t' + path)


if __name__ == '__main__':
    lister(sys.argv[1])

When run, this code prints out all of the files and directories starting at the specified root folder.

Explanation

os.walk

The os.walk function does the work of traversing a file system. The function generates a tuple with three fields. The first field is the current directory that os.walk is processing. The second field is a list of sub folders found in the current folder and the last field is a list of files found in the current folder.

Combining os.walk with a for loop is a very common technique (shown on line 8). The loop continues to iterate until os.walk finishes walking through the file system. The tuple declared in the for loop is updated on each iteration of the loop, providing developers with all of the information needed to process the contents of the directory.

os.path.join

Line 15 shows an example of using os.path.join to assemble a full path to a target folder or file. It’s import to use os.path.join to assemble file paths because Unix-like system use ‘/’ to separate file paths, while Windows systems use ‘\’. Tracking the path separator could be tedious work since it requires making a determination about which operating system is running the script. That’s not very ideal so Python provides os.path.join to take care of such work. As long as os.path.join is used, the assembled file paths will use the proper path separator for the os.

References

Lutz, Mark. Programming Python. Beijing, OReilly, 2013.

Python Line Scanner

This post borrows from a code example found in Programming Python: Powerful Object-Oriented Programming that demonstrates collecting command line arguments, opening a file, reading the file, and passing a function as a callback to another function.

Code

Here is the entire script that accepts a file as a command line argument and prints the contents of the file to the console.


def scanner(name, func):

    # Open the file (with statement ensures closure even if there is an exception)
    with open(name, 'r') as f:
        # Iterate through the file
        for line in f:
            # Call our callback function
            func(line)

if __name__ == '__main__':
    import sys
    name = sys.argv[1]

    # This is a function we are passing to scanner
    # Python has first class functions which can be
    # get passed as arguments to other functions
    def print_line(str):
        print(str, end='')

    # Call the scanner function, which in turn
    # calls the print_line function for each line
    # in the file
    scanner(name, print_line)

Command Line Arguments

The first concept covered in this script is processing command line arguments. Python requires us to import the sys module (line 12) which maintains an argv property. The argv property is a list-like object that contains all of the command line arguments used to hold all of the command line parameters. The first index [0] is the name of the script, followed by all of the other arguments supplied to the program.

On line 13, we grab the target file (stored in argv[1]) and keep it in a name variable. At this point, our program knows which file to the open later on when we use the scanner function.

First Class Functions

Python treats functions as objects. As such, we can define any function in a Python program and store it in a variable just like anything else. Lines 18-19 define a print_line function that accepts a String parameter. On line 24, print_line is the second argument to the scanner function.

Once inside of the scanner function, the print_line function is referenced by the variable func. On line 9, we call print_line with the func(line) rather than print_line(line). This works because func and print_line both refer to the same function object in memory. Passing functions in this fashion is incredibly powerful because it allows the scanner function to accept different behaviors for each line it processes.

For example, we could define a function the writes each line processed by scanner to a file rather than printing it to the console. Later on, we may choose to write another function that sends each line over the network via network sockets. The beauty of the scanner function as defined is that it works the same regardless of the callback function passed to the func argument. This programming technique is sometimes known as programming to a behavior.

Opening and Reading Files

The final topic covered is opening and reading a file. Line 5 in the script uses the with statement combined with the open function to actually open the file in read mode. The as f assigns the result of the open function to the variable f. The f variable holds a Python file object.

Since Python file objects support the iterator protocol, they can be used in for loops. On line 7, we read through each line in the file with the statement for line in f:. On each execution of the loop, the line variable is updated with the next line in the file.

When the loop is complete, the with statement calls the file’s close() method automatically, even if there is an exception. Of course, Python’s garabage collection will also ensure a file is closed, but this pattern provides an extra level of safety, especially since there are a variety of Python interpretors that may act differently than the CPython.

Conclusion

The most powerful take away from this example if the first class functions. Python treats functions like any other data type. This allows functions to be stored as passed around the program as required. Using first class functions keeps code loosely coupled and highly maintanable!

Sources

Lutz, Mark. Programming Python. Beijing, OReilly, 2013.

Python 3 Os File Tools

The Python os module has a number of useful file commands that allow developers to perform common file tasks such as changing file permissions, renaming files, or even deleting files. The following snippets are modified examples from Programming Python: Powerful Object-Oriented Programming

os.chmod

os.chmod alters a file’s permissions. The example usage below takes two arguments. The first argument is the path to the file and the second argument is a 9-bit string that composes the new file permissions.

os.chmod('belchers.txt', 0o777)

os.rename

os.rename is used to give a file a new name. The first argument is the current name of the file and the second argument is the new name of the file.

os.rename('belchers.txt', 'pestos.txt')

os.remove

The os.remove deletes a file. It takes the path of the target file to delete.

os.remove('pestos.txt')

os.path.isdir

The os.path.isdir accepts a path to a file or directory. It returns True if the path is a directory otherwise False.

os.path.isdir('/home') #True
os.path.isdir('belchers.txt') #False

os.path.isfile

os.path.isfile works like os.path.isdir but only it’s for files.

os.path.isfile('belchers.txt') #True
os.path.isfile('/home') #False

os.path.getsize

os.path.getsize returns the size of the file. It accepts the path to the file as an argument.

os.path.getsize('belchers.txt')

Sources

Lutz, Mark. Programming Python. Beijing, OReilly, 2013.

Python Pipe Operations

Python programs can link their input and output streams together using pipe (‘|’) operations. Piping isn’t a feature of Python, but rather comes from operating system. Nevertheless, a Python program can leverage such a feature to create powerful operating system scripts that are highly portable across platforms. For example, developer toolchains can be scripted together using Python. I personally have used Python to feed input into unit testing for my Java/Kotlin programs.

This post is a modified example of a demonstration found in Programming Python: Powerful Object-Oriented Programming. It uses a producer and consumer script to demonstrate one program feeding input into another Python program.

writer.py

Here is the code for writer.py

family = [
    'Bob Belcher',
    'Linda Belcher',
    'Tina Belcher',
    'Gene Belcher',
    'Louise Belcher'
]
for f in family:
    print(f)

Nothing special here. We are just building up a list that prints out the names of our favorite TV family, the Belchers.

reader.py

This code receives the output from writer.

while True:
    try:
        print('Entering {}'.format(input()))
    except EOFError:
        break

Once again, this is a simple script. Without a pipe operation, the input() statement on line 3 would normally collect the input from the keyboard. That’s not what is going to happen here.

Demonstration

We are going to execute these scripts by running the command below in the terminal.

python writer.py | python reader.py

The pipe '|' character does the job of connecting writer.py's output stream to reader.py's input stream. Thus, print statements in writer.py connect to input statements in reader.py. Here is the output of this operation.

Entering Bob Belcher
Entering Linda Belcher
Entering Tina Belcher
Entering Gene Belcher
Entering Louise Belcher

Python Console Streams

Programs on all major operating systems are connected to input and output streams (and usually an error stream). When run from the commandline without a redirect operator, a program is normally connnected to the shells standard input stream (where a user can type commands into the program) and the standard output stream (which prints output back on the console).

However, we aren’t limited to such streams. It’s perfectly possible to use the contents of a file as a program’s input stream or even use the output of one program and link it’s output to another program’s input stream. Such chaining of streams allows for powerful OS scripting.

This isn’t really a Python feature, but here is an example found in Programming Python: Powerful Object-Oriented Programming that demonstraes connecting the output of one OS stream to the input stream of a Python program. I added some comments that help explain the program.

teststreams.py

def interact():
    print('Hello stream world')
    while True:
        try:
            # Input normally reads from the keyboard because our program
            # is connected to that input stream. However, if we execute this program
            # in a way that connects the program's input to some other stream,
            # the input command reads from there instead!
            reply = input('Enter a number => ')
        except EOFError:
            # We have reached the end of our input stream (for example user entered ctrl+c at the shell)
            # So we exit the looop
            break
        else:
            num = int(reply)
            print("%d squared is %d" % (num, num ** 2))
    print('Bye')

if __name__ == '__main__':
    interact()

When this program is run on it’s own, it will collect input from the keyboard until we press ctrl+c. That’s not the part that we are demonstrating here. Let’s suppose we have a text file that has the following contents.

input.txt

1
2
3
4
5
6
7
8
9
10

Now when we run our program with a redirect operator, we get the following output.

Patricks-MacBook-Pro:Streams stonesoup$ python teststreams.py  1 squared is 1
Enter a number => 2 squared is 4
Enter a number => 3 squared is 9
Enter a number => 4 squared is 16
Enter a number => 5 squared is 25
Enter a number => 6 squared is 36
Enter a number => 7 squared is 49
Enter a number => 8 squared is 64
Enter a number => 9 squared is 81
Enter a number => 10 squared is 100
Enter a number => Bye

Notice the unix redirect operator. This program was run python teststreams.py < input.txt. That < input.txt connects the contents of input.txt to the teststreams.py script. Thus, when input is called, the function simply collects the next line in input.txt rather than waiting for the keyboard.

Python Environmental Variables

Every operating system allows system adminstrators and users to set and use environmental variables. Such variables can be highly useful when writing shell scripts that automate commands and executes tasks on the system. It’s highly feasible to imagine situations where build toolchains may need to know about the location of programs such as compilers, linkers, etc.

Python’s os module has an environ object. It acts like a dictionary and allows scripts to read and write to enviornmental variables. Here are two example scripts found in Programming Python: Powerful Object-Oriented Programming that demonstrate using environmental varaibles in Python. I added comments to help explain the programs.

setenv.py

import os

# First print the value associated with the USER
# environmental variable
print('setenv...', end=' ')
print(os.environ['USER'])

# Now overwrite the value stored in USER. It will
# propagate through the system
os.environ['USER'] = 'Brian'

# Run the echoenv script to prove the environmental
# variable has been updated
os.system('python echoenv.py')

# Repeat
os.environ['USER'] = 'Arthur'
os.system('python echoenv.py')

# This time, we ask the user for a name
os.environ['USER'] = input('?')

# Now we are going to run the echoevn script, but connect
# it's output to this process and print
print(os.popen('python echoenv.py').read())

echoenv.py

import os
print('echoenv...')

# Just print the value for USER found in os.environ
print('Hello, ', os.environ['USER'])

Detailed Explanation

Although the main purpose of this script is to demonstrate how to use environmental variables in Python. However, we are going to use the os.system and os.popopen functions to execute our helper echoenv script to help prove that changes to the environmental variables propogate throughout the system.

Line 6 is the first call to os.environ. We look up the USER value in this environ object which prints out the current user to the console. On line 10, we overwrite the value in USER with “Brian”. Now Brian is the user. Line 14 proves the change by calling os.system and executing echoenv.py. That script will print “Brain” in the console.

Lines 17 and 18 are a repeat of lines 10 and 14. Line 21 is only different in the sense that we ask the user for a value this time. Line 25 uses the os.popopen command to execute echoenv.py. This function returns an object that gives us a handle into the sister process. Rather than having echoenv print to the console directly this time, we use the read() method to print the output in the setenv process.

Example Output

Here is the output when run on my machine.

Patricks-MacBook-Pro:Environment stonesoup$ python setenv.py
setenv... stonesoup
echoenv...
('Hello, ', 'Brian')
echoenv...
('Hello, ', 'Arthur')
?Bob Belcher
echoenv...
('Hello, ', 'Bob Belcher')

Parse Command Line Arguments in Python

Most command line programs allow users to pass command line arguments into a program. Consider, for example, the unix ls:

ls -a

Python programs can utilize such techniques by using sys.argv and performing string operations on each element in the argv list. Here is an example script from Programming Python: Powerful Object-Oriented Programming with my own comments added to the script. (Of course, the Python standard library provides much more sophisticated getopt and optparse modules which should be used in production programs)

def getopts(argv):
    # Create a dictionary to store command line options
    opts = {}

    # Iterate through each command line argument
    while argv:
        # Look for the dash symbol
        if argv[0][0] == '-':

            # Create an entry in the opts dictionary
            opts[argv[0]] = argv[1]

            # Slice off the last entry and continue
            argv = argv[2:]
        else:

            # Otherwise just slice of the last entry since there
            # was no command line switch
            argv = argv[1:]

    # Return the dictionary of the command line options
    return opts

# Only run this code if this is a standalone script
if __name__ == '__main__':
    # Put the argv option into our namespace
    from sys import argv

    # call our getopts function
    myargs = getopts(argv)

    # Print our output
    if '-i' in myargs:
        print(myargs['-i'])

    print(myargs)

Detailed Explanation

Our goal in this script is to look for any command line switches that are passed to the script. To start with, we need to import the argv object from the sys module which is a list that provides the command line options passed to the script. On line 30, we call our getopts function and pass the argv object to the function. The program’s execution jumps up to line 3.

Line 3 creates an empty dictionary object that will store all command line switches with their arguments. For example, we may end up with ‘-n’:”Bob Belcher” in this dictionary. Starting on line 6, we enter into a while loop that examines each entry in argv.

Line 8 looks at the first character in the first entry of argv (remember that Python String implement the [] operator). We are just checking if the character in question is a dash (‘-‘) character. If this character is a dash, we are going to create an entry into our opts dictionary on line 11. Then we slice off the next 2 entries in argv from the beginning of the list (one for the switch, the other one for the argument).

Line 15 is the alternative case. If argv[0][0] is something other than a dash, we are going to ignore it. We just slice off the first entry in argv (only one this time since there is no switch). Once argv is empty, we can return the opts dictionary to the caller.

The program execution resumes on line 33. For demonstration purposes, it checks if the myargs dictionary has a ‘-i’ key. If it does, it will print the value associated with ‘-i’. Then line 36 prints out the myarg dictionary.

When run from the console, the script will show output such as this.

Patricks-MacBook-Pro:system stonesoup$ python testargv2.py -h "Bob Belcher" -w "Linda Belcher" -d1 "Tina Belcher" -s "Gene Belcher" -d2 "Louise Belcher" 
{'-h': 'Bob Belcher', '-w': 'Linda Belcher', '-d1': 'Tina Belcher', '-s': 'Gene Belcher', '-d2': 'Louise Belcher'}
%d bloggers like this: