Find Python Source Files in Home Directory

Truthfully, most users aren’t very interested in finding the largest and smallest Python source files in their home directory, but doing so does provide for an exercise in walking the file tree and using tools from the os module. The program in this post is a modified example taken from Programming Python: Powerful Object-Oriented Programming where the user’s home directory is scanned for all Python source files. The console outputs the two smallest files (in bytes) and the two largest files.

Code

import os
import pprint
from pathlib import Path

trace = False

# Get the user's home directory in a platform neutral fashion
dirname = str(Path.home())

# Store the results of all python files found
# in home directory
allsizes = []

# Walk the file tree
for (current_folder, sub_folders, files) in os.walk(dirname):
    if trace:
        print(current_folder)

    # Loop through all files in current_folder
    for filename in files:

        # Test if it's a python source file
        if filename.endswith('.py'):
            if trace:
                print('...', filename)

            # Assemble the full file path using os.path.join
            fullname = os.path.join(current_folder, filename)

            # Get the size of the file on disk
            fullsize = os.path.getsize(fullname)

            # Store the result
            allsizes.append((fullsize, fullname))

# Sort the files by size
allsizes.sort()

# Print the 2 smallest files
pprint.pprint(allsizes[:2])

# Print the 2 largest files
pprint.pprint(allsizes[-2:])

Sample Output

[(0,
  '/Users/stonesoup/.local/share/heroku/client/node_modules/node-gyp/gyp/pylib/gyp/generator/__init__.py'),
 (0,
  '/Users/stonesoup/.p2/pool/plugins/org.python.pydev.jython_5.4.0.201611281236/Lib/email/mime/__init__.py')]
[(219552,
  '/Users/stonesoup/.p2/pool/plugins/org.python.pydev.jython_5.4.0.201611281236/Lib/decimal.py'),
 (349239,
  '/Users/stonesoup/Library/Caches/PyCharmCE2017.1/python_stubs/348993582/numpy/random/mtrand.py')]

Explanation

The program starts with a trace flag that’s set to False. When set to True, the program prints detailed information about what it is doing. On line 8, we grab the user’s home directory using Path.home(). This is a platform-neutral way of finding a user’s home directory. Notice that we convert this value to a string for our purposes. Finally, we create an empty allsizes list that holds our results.

Starting on line 15, we use the os.walk function and pass in the user’s home directory. It’s a common pattern to combine os.walk with a for loop so that we can traverse an entire directory tree. On each iteration, os.walk yields a tuple that contains the current folder, its subfolders, and the files in the current folder. We are interested in the files.

Starting on line 20, the program enters a nested for loop that examines each file individually. On line 23, we test if the file name ends with ‘.py’ to see if it’s a Python source file. Should the test return True, we continue by using os.path.join to assemble the full path to the file. The os.path.join function takes into account the underlying operating system’s path separator, so on Unix-like systems we get / while Windows systems get \ as a path separator. The file’s size is read on line 31 using os.path.getsize. Once we have the size and the file path, we can add the result to allsizes for later use.

The program has finished scanning the user’s home folder once the program reaches line 37. At this point, we can sort our results from smallest to largest by using the sort() method on allsizes. Line 40 prints the two smallest files (using pretty print for better formatting) and line 43 prints the two largest files.
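Sorting the whole list works fine, but when we only want a handful of files from each end, the standard library's heapq module can pick them out directly. Below is a minimal sketch of that idea. The function names python_file_sizes and smallest_and_largest are my own and not from the book, and I swapped os.walk for Path.rglob. The demo scans a temporary folder rather than the real home directory so that it runs quickly.

```python
import heapq
import tempfile
from pathlib import Path


def python_file_sizes(root):
    """Collect (size, path) tuples for every .py file under root."""
    sizes = []
    for path in Path(root).rglob('*.py'):
        try:
            # Skip directories that happen to end in .py
            if path.is_file():
                sizes.append((path.stat().st_size, str(path)))
        except OSError:
            # The file vanished or could not be read, so move on
            continue
    return sizes


def smallest_and_largest(sizes, n=2):
    """Return the n smallest and n largest entries without a full sort."""
    return heapq.nsmallest(n, sizes), heapq.nlargest(n, sizes)


# Demo against a throwaway folder instead of the home directory
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'empty.py').write_bytes(b'')
    Path(tmp, 'small.py').write_bytes(b'x = 1\n')
    Path(tmp, 'notes.txt').write_bytes(b'not python')
    smallest, largest = smallest_and_largest(python_file_sizes(tmp), n=1)
    print(smallest[0][0], largest[0][0])
```

Note that heapq.nlargest returns its results from largest to smallest, which is the reverse of the order that allsizes[-2:] prints.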

References

Lutz, Mark. Programming Python. Beijing, O’Reilly, 2013.

Python Multiprocessing Producer Consumer Pattern

Python 3 has a multiprocessing module that provides an API similar to the one found in the threading module. The main selling point of multiprocessing over threading is that multiprocessing allows tasks to run in parallel across multiple CPU cores, while threads are still limited by the global interpreter lock (GIL). The Process class found in multiprocessing launches each task in a new process, and the module supplies classes that allow for data sharing between those processes.

Since multiprocessing uses processes rather than threads, child processes do not share their memory with the parent process. On our own, we would have to rely on low-level objects such as pipes to let the processes communicate with each other. Fortunately, the multiprocessing module provides high-level classes, similar to the ones found in threading, that allow for sharing data between processes. This example demonstrates the producer consumer pattern using processes and the Queue class to share data.

Code

import time
import os
import random
from multiprocessing import Process, Queue, Lock


# Producer function that places data on the Queue
def producer(queue, lock, names):
    # Synchronize access to the console
    with lock:
        print('Starting producer => {}'.format(os.getpid()))
        
    # Place our names on the Queue
    for name in names:
        time.sleep(random.randint(0, 10))
        queue.put(name)

    # Synchronize access to the console
    with lock:
        print('Producer {} exiting...'.format(os.getpid()))


# The consumer function takes data off of the Queue
def consumer(queue, lock):
    # Synchronize access to the console
    with lock:
        print('Starting consumer => {}'.format(os.getpid()))
    
    # Run indefinitely
    while True:
        time.sleep(random.randint(0, 10))
        
        # If the queue is empty, queue.get() will block until the queue has data
        name = queue.get()

        # Synchronize access to the console
        with lock:
            print('{} got {}'.format(os.getpid(), name))


if __name__ == '__main__':
    
    # Some lists with our favorite characters
    names = [['Master Shake', 'Meatwad', 'Frylock', 'Carl'],
             ['Early', 'Rusty', 'Sheriff', 'Granny', 'Lil'],
             ['Rick', 'Morty', 'Jerry', 'Summer', 'Beth']]

    # Create the Queue object
    queue = Queue()
    
    # Create a lock object to synchronize resource access
    lock = Lock()

    producers = []
    consumers = []

    for n in names:
        # Create our producer processes by passing the producer function and its arguments
        producers.append(Process(target=producer, args=(queue, lock, n)))

    # Create consumer processes
    for i in range(len(names) * 2):
        p = Process(target=consumer, args=(queue, lock))
        
        # This is critical! The consumer function has an infinite loop,
        # so it never exits on its own; daemon processes are stopped when the parent exits
        p.daemon = True
        consumers.append(p)

    # Start the producers and consumers
    # The Python VM will launch new independent processes for each Process object
    for p in producers:
        p.start()

    for c in consumers:
        c.start()

    # Like threading, we have a join() method that synchronizes our program
    for p in producers:
        p.join()

    print('Parent process exiting...')

Explanation

The program demonstrates the producer and consumer pattern. We have two functions that run in their own independent processes. The producer function places supplied names on the Queue. The consumer function monitors the Queue and removes names from it as they become available.

The producer function takes three objects: a Queue, a Lock, and a list of names. It starts by acquiring a lock on the console. The console is still a shared resource, so we need to make sure only one process writes to it at a time or the output of different processes can get interleaved. After acquiring the lock, the function prints out its process id (PID).

The producer function enters a for loop on lines 14-16. It sleeps for 0 to 10 seconds on line 15 to simulate a delay in processing and then places a name on the Queue on line 16. When the loop is complete, the function acquires the console lock again and notifies the user that it is exiting. At this point, the process ends.

The consumer function runs in its own process as well. It takes the Queue and the Lock as its parameters and then acquires a lock on the console to notify the user it is starting. The consumer prints out its PID also. Next, the consumer enters an infinite loop on lines 30-38. It simulates work by sleeping on line 31 and then makes a call to queue.get() on line 34. If the queue has data, the get() method returns that data immediately and the consumer prints the data on line 38. Otherwise, get() blocks execution until data is available.

Line 41 is the entry point to the program, using the if __name__ == ‘__main__’ test. We begin on line 44 by making a list of names. The Queue object is created on line 49 and the Lock object is made on line 52. Then on lines 57-59, we enter a for loop and create our producer Process objects. We use the target parameter to point the Process at the producer function and then pass in a tuple for the arguments that the function is called with.

Creating the consumer processes has one extra step that isn’t needed when creating the producers. Lines 62-68 create the consumer processes, and line 67 sets the daemon property to True. This is needed because the consumer function uses an infinite loop and never returns on its own. Daemon processes are terminated automatically when the parent process exits, so the program can still shut down.

Once our processes are created, we start them by calling start() on each Process object (lines 72-76). Like threads, processes also have a join() method that can be used to synchronize a program. Our consumer processes never return, so calling join() on them would cause the program to hang, but our producer processes do return, so we use join() on line 80 to cause the parent process to wait for the producer processes to exit.
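Daemon processes are one way to deal with consumers that loop forever. Another common approach is to send each consumer a sentinel value that tells it to stop, which lets us call join() on the consumers as well. Here is a rough sketch of that idea. The SENTINEL constant and the results queue are my own additions for illustration and are not part of the book's example. I also left out the sleeps and the console lock to keep it short.

```python
from multiprocessing import Process, Queue

# Hypothetical marker that means "no more work is coming"
SENTINEL = None


def producer(queue, names):
    # Place our names on the Queue
    for name in names:
        queue.put(name)


def consumer(queue, results):
    # Loop until the sentinel arrives, then return so the process can exit
    while True:
        name = queue.get()
        if name is SENTINEL:
            break
        results.put(name.upper())


if __name__ == '__main__':
    names = ['Rick', 'Morty', 'Summer']
    queue, results = Queue(), Queue()

    consumers = [Process(target=consumer, args=(queue, results))
                 for _ in range(2)]
    for c in consumers:
        c.start()

    p = Process(target=producer, args=(queue, names))
    p.start()
    p.join()

    # Send one sentinel per consumer so every consumer sees one
    for _ in consumers:
        queue.put(SENTINEL)

    # Collect the results before joining the consumers
    collected = [results.get() for _ in names]
    for c in consumers:
        c.join()

    print(sorted(collected))
```

The trade-off is a little more bookkeeping in the parent process in exchange for consumers that finish their current item and exit cleanly instead of being killed when the parent exits.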

References

Lutz, Mark. Programming Python. Beijing, O’Reilly, 2013.

Python Signals

Python has a signal module that is used to respond to signals generated by the operating system. Signals are a very low-level form of interprocess communication, but there are some cases where a program may wish to respond to a signal. For example, it may be useful to watch for program signals when writing developer toolkits.

This post demonstrates how to respond to an alarm signal. The example is borrowed from Programming Python: Powerful Object-Oriented Programming. I added my own comments to help explain the workings of the program.

Code

import sys, signal, time


# Function that returns the time
def now():
    return time.asctime()


# Function that handles the signal
def onSignal(signum, stackframe):
    print('Got alarm', signum, 'at', now())


while True:
    print('Setting at', now())
    
    # This tells the program to respond to the alarm signal
    # by calling the onSignal function
    signal.signal(signal.SIGALRM, onSignal)
    
    # Ask the OS to deliver SIGALRM in 5 seconds (other processes can send it too)
    signal.alarm(5)
    
    signal.pause()

Explanation

The code defines an onSignal function on lines 10-11 that works as a handler for operating system signals. All it does is print text to the console. On line 19, we register onSignal as the handler for the SIGALRM signal. Line 22 asks the operating system to deliver SIGALRM to our process in five seconds, which then invokes onSignal, and signal.pause() puts the process to sleep until a signal arrives. Note that our programs don’t have to raise signals themselves. We can also simply listen for signals sent by other programs (for example, the SIGTERM signal that killall sends by default in a Unix shell).
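To see a handler respond to a signal that arrives from outside, we can have a process send a signal with os.kill. In the sketch below, the process sends SIGUSR1 to itself, which stands in for another program doing the same thing with our PID. The on_signal name and the received list are my own choices for illustration, and like the alarm example this only works on Unix-like systems.

```python
import os
import signal
import time

# Names of the signals we have handled so far
received = []


# Handler that records which signal arrived
def on_signal(signum, stackframe):
    received.append(signal.Signals(signum).name)


# Register the handler for SIGUSR1
signal.signal(signal.SIGUSR1, on_signal)

# Send SIGUSR1 to this process; another program could do the same
# if it knew our PID
os.kill(os.getpid(), signal.SIGUSR1)

# Give the interpreter a moment to run the handler
time.sleep(0.1)

print(received)
```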

References

Lutz, Mark. Programming Python. Beijing, O’Reilly, 2013.

Python Sockets

Network sockets are extremely useful for interprocess communication (IPC). Not only do network sockets allow processes to communicate on the same machine, but we can also use sockets to communicate over a network. This post shows the most basic demonstration of network sockets using an example borrowed from Programming Python: Powerful Object-Oriented Programming. I added my own comments to help explain the program.

Code

from socket import socket, AF_INET, SOCK_STREAM

port = 50008
host = 'localhost'


# Function to create a server
def server():
    # Create a network socket
    sock = socket(AF_INET, SOCK_STREAM)

    # Bind the socket to our port; the empty host string means all local interfaces
    sock.bind(('', port))

    # Start listening and allow up to 5 pending connections in the backlog
    sock.listen(5)
    while True:
        # Wait for a client
        conn, addr = sock.accept()

        # Read up to 1024 bytes (one kilobyte) of data from the client
        data = conn.recv(1024)

        # Create a reply string
        reply = 'server got: [{}]'.format(data)

        # Send the reply back to the client
        conn.send(reply.encode())


# Function to create a socket client
def client(name):
    # Create a socket
    sock = socket(AF_INET, SOCK_STREAM)

    # Connect the socket to the server
    sock.connect((host, port))

    # Send a message to the server
    sock.send(name.encode())

    # Receive up to 1024 bytes (one kilobyte) of data from the server
    reply = sock.recv(1024)

    # Close our connection
    sock.close()

    # Print the output
    print('Client got: [{}]'.format(reply))


if __name__ == '__main__':
    from threading import Thread

    # Create a thread for the server
    sthread = Thread(target=server)
    sthread.daemon = True
    sthread.start()

    # Create 5 client threads
    for i in range(5):
        Thread(target=client, args=('client{}'.format(i),)).start()

Explanation

The example program creates a basic client / server program. The program uses threads to help keep the program simple. One thread calls the server function defined on lines 8-28 and the remaining five threads call the client function found on lines 32-49. The server thread creates a network server that accepts connections from the client threads one at a time.

The server function starts by creating a socket object (called sock). On line 13, the program binds our socket to the port number specified on line 3. Passing an empty string as the host means the socket listens on all of the machine’s network interfaces, including localhost. On line 16, the socket starts listening and allows up to five connections to wait in its backlog. Then the server enters a loop on line 17.

Inside of the loop, we have a call to sock.accept(). This function waits for a connection from a client and returns a connection object and the client’s address. Our program only uses the connection object. The program reads data from the client on line 22 using conn.recv. The conn.recv function takes the maximum number of bytes to read from the client. It returns the data as a bytes object and the program stores it in the data variable. Lines 25 and 28 show how to send information back to the client using conn.send. The conn.send function expects bytes, which is why we call encode() on the reply variable.

The client function mirrors the server function. The socket client is created on line 34. We use the connect function (line 37) to connect to the server and pass it a tuple containing the host and the port number. Unlike the server, which gets a dedicated connection object for each client, the client uses the socket object itself to send and receive information to and from the server. On line 40, the program calls sock.send() and passes it a bytes object to send to the server. The response from the server is collected on line 43 using sock.recv(). When the client is finished, it needs to close its connection to the server using sock.close().
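Sockets can also be used as context managers, which closes them for us even when something goes wrong. The sketch below is my own variation on the example and not code from the book. The serve_once and ask names are made up for illustration, the server handles a single client, and it binds to port 0 so the operating system picks a free port. It also decodes the received bytes before formatting them so the reply doesn't contain the b'...' wrapper.

```python
import socket
import threading


# Accept one client, reply to it, and return
def serve_once(server_sock):
    conn, addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        reply = 'server got: [{}]'.format(data.decode())
        conn.sendall(reply.encode())


# Connect, send a message, and return the decoded reply
def ask(host, port, message):
    with socket.create_connection((host, port)) as sock:
        sock.sendall(message.encode())
        return sock.recv(1024).decode()


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
    # Port 0 asks the operating system for any free port
    server_sock.bind(('127.0.0.1', 0))
    server_sock.listen(1)
    port = server_sock.getsockname()[1]

    server_thread = threading.Thread(target=serve_once, args=(server_sock,))
    server_thread.start()

    reply = ask('127.0.0.1', port, 'client0')
    server_thread.join()

print(reply)
```

A single recv call is enough here because the messages are tiny. A real protocol would need to loop until the whole message has arrived.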

References

Lutz, Mark. Programming Python. Beijing, O’Reilly, 2013.

Python Basic Pipes

Python provides two main avenues of parallel processing. One avenue is to use multithreading, where a program itself multitasks, while the other approach is to have a program launch a copy of itself in a new process. One approach is not necessarily better than the other. Instead, they should be thought of as tools for different use cases. Threads have low overhead and share a program’s memory space, which allows for easy communication between threads. Processes operate as if we launched a new copy of the program from our operating system and allow programs to spread their work out across an operating system or even a network.

However, processes do not share a global memory space, which means they need a way to communicate with one another. One approach to interprocess communication (IPC) is to use pipes. This post shows an example of IPC using pipes taken from Programming Python: Powerful Object-Oriented Programming. I have added my own comments to the code for clarity.

Code

import os, time


# Function called by child processes
def child(pipeout):
    zzz = 0
    while True:
        time.sleep(zzz)

        # We have to encode our string to binary to use
        # with pipes
        msg = ('Spam {}'.format(zzz)).encode()

        # Send the data back to the parent process
        os.write(pipeout, msg)
        zzz = (zzz + 1) % 5


def parent():
    # Creates our pipes. The pipeout gets passed to the child
    # process while parent keeps pipein
    pipein, pipeout = os.pipe()

    if os.fork() == 0:
        # We are now in the child process so call child and supply
        # it with pipeout so that it can send information back to
        # the parent.
        child(pipeout)
    else:
        # This is the parent process
        while True:
            # Read data from the child process
            # This call blocks until there is data
            line = os.read(pipein, 32)

            # Print to the console
            print('Parent {} got [{}] at {}'.format(os.getpid(), line, time.time()))


if __name__ == '__main__':
    parent()

Explanation

We have two functions in the program named child() and parent(). The child() function is intended to run in child processes while parent() contains the main program. The parent() function is defined on lines 19-37. It begins by calling os.pipe() on line 22, which returns a tuple containing the two ends of a single pipe. Pipes are unidirectional, so pipein is used by the parent to read data that comes from the child process. The child process uses pipeout to send data to the parent.

The program forks into two different processes on line 24. The program is in the child process when os.fork() returns zero. Line 28 calls the child() function and passes pipeout to it so that the child process can send data back to the parent. The child process enters an infinite loop on line 7. On line 12, a msg variable is created that contains a string. Pipes send binary data, so we have to call encode() on the string to convert it to bytes. Then on line 15, we send the msg variable back to the parent using os.write, supplying pipeout and msg to that function.

The parent process continues on line 31. It attempts to read data from the child process on line 34 using os.read. Notice that os.read requires the pipein descriptor and the maximum number of bytes to read (32 bytes in this program). If the pipe contains data, os.read returns immediately and the result is stored in the line variable. Otherwise, os.read blocks the program until the pipe has data. The parent process prints the data on line 37.
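The example runs forever, so it never has to clean up after itself. When the child only has a finite amount of data to send, each process should close the end of the pipe it doesn't use. Once the child exits and its write end closes, os.read returns an empty bytes object and the parent knows there is nothing left to read. Here is a small sketch of that pattern. The roundtrip function is a name I made up for illustration, and like the original example it relies on os.fork, so it only runs on Unix-like systems.

```python
import os


def roundtrip(payload):
    # Create the pipe before forking so both processes share it
    pipein, pipeout = os.pipe()
    pid = os.fork()

    if pid == 0:
        # Child process: it only writes, so close the read end
        os.close(pipein)
        os.write(pipeout, payload)
        # Exit right away so the child never runs the parent's code
        os._exit(0)

    # Parent process: it only reads, so close the write end
    os.close(pipeout)
    chunks = []
    while True:
        chunk = os.read(pipein, 32)
        # An empty result means every write end has been closed
        if not chunk:
            break
        chunks.append(chunk)

    os.close(pipein)
    # Wait for the child so it doesn't linger as a zombie process
    os.waitpid(pid, 0)
    return b''.join(chunks)


print(roundtrip(b'Spam 0'))
```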

References

Lutz, Mark. Programming Python. Beijing, O’Reilly, 2013.

Python Producer Consumer with Queue

The producer / consumer pattern is a common programming construct used in multithreaded applications where one thread acts as a producer of data while other threads consume the data. A web crawler application is a use case of the producer / consumer pattern. For example, the application may have a thread dedicated to crawling the web that gathers data (producer) while other threads index and store the data (consumers).

Producer and consumer threads need a way to share data. Python’s queue module provides one of many solutions. The Queue object is a FIFO object that lets the producer thread place data on the queue. Consumer threads are blocked by the Queue until the Queue has data for the consumer thread to read. When data becomes available, the consumer thread removes data from the Queue and does its work.

Below is an example program borrowed from Programming Python: Powerful Object-Oriented Programming that shows how to use a Queue to synchronize data between producer and consumer threads. I added my own comments to the code to help explain what is happening in the program.

Code

# Specify the number of consumer and producer threads
numconsumers = 2
numproducers = 4
nummessages = 4

import _thread as thread, queue, time

# Create a lock so that only one thread writes to the console at a time
safeprint = thread.allocate_lock()

# Create a queue object
dataQueue = queue.Queue()


# Function called by the producer thread
def producer(idnum):
    # Produce 4 messages to place on the queue
    for msgnum in range(nummessages):
        # Simulate a delay
        time.sleep(idnum)

        # Put a String on the queue
        dataQueue.put('[producer id={}, count={}]'.format(idnum, msgnum))


# Function called by the consumer threads
def consumer(idnum):
    # Create an infinite loop
    while True:
        # Simulate a delay
        time.sleep(0.1)
        try:
            # Attempt to get data from the queue. Note that
            # dataQueue.get() will block this thread's execution
            # until data is available
            data = dataQueue.get()
        except queue.Empty:
            pass
        else:
            # Acquire a lock on the console
            with safeprint:
                # Print the data created by the producer thread
                print('consumer ', idnum, ' got => ', data)


if __name__ == '__main__':
    # Create consumers
    for i in range(numconsumers):
        thread.start_new_thread(consumer, (i,))
        
    # Create producers
    for i in range(numproducers):
        thread.start_new_thread(producer, (i,))
        
    # Simulate a delay
    time.sleep(((numproducers - 1) * nummessages) + 1)
    
    # Exit the program
    print('Main thread exit')

Detailed Explanation

This program shows the producer / consumer pattern in action. We begin by defining variables that specify the number of consumer threads (line 2), the number of producer threads (line 3), and the number of messages each producer thread makes (line 4). The program creates a lock on line 9 so that only one thread can use the console at a time. Then on line 12, the queue is created as a global variable.

Our first function, producer, is defined on lines 16-23. There isn’t anything fancy going on in this function. The function simply enters a for loop and creates 4 strings that are placed on dataQueue (line 23). Since dataQueue is a FIFO structure, consumer threads will remove these strings from dataQueue in the order they were received.

Lines 27-43 define the consumer thread function, consumer. This code enters an infinite loop and removes data from dataQueue and prints the String to the console. Line 36 is the critical piece of code in the consumer function. The call to get() on dataQueue removes the item at the front of the queue and stores it in the variable data. If dataQueue is empty, the consumer thread is blocked until data becomes available.

Alternatively, we could pass False to the optional block parameter on get(). That would cause the thread to continue to execute even if the queue is empty. However, we would need to be prepared for situations where the queue is empty and catch the queue.Empty exception that is raised. Our program uses pass to skip over the exception should this happen (it shouldn’t, by the way, because we are using the blocking version of get()).
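Here is a short sketch of what the non-blocking version might look like, along with a middle ground that waits for a limited time using the timeout parameter. The poll and wait_briefly helpers are names I invented for this illustration.

```python
import queue


def poll(q):
    """Return the next item right away, or None if the queue is empty."""
    try:
        return q.get(block=False)
    except queue.Empty:
        return None


def wait_briefly(q, seconds=0.1):
    """Wait up to the given number of seconds for an item, then give up."""
    try:
        return q.get(timeout=seconds)
    except queue.Empty:
        return None


data_queue = queue.Queue()
data_queue.put('[producer id=0, count=0]')

first = poll(data_queue)           # the queue has one item
second = poll(data_queue)          # the queue is now empty
third = wait_briefly(data_queue)   # still empty after a short wait

print(first, second, third)
```

Returning None works here because the producers never place None on the queue. If None were a legitimate item, the helpers would need a different way to signal an empty queue.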

Lines 48-49 create our consumer threads and start them. Lines 52-53 create and start the producer threads. The producer threads call the producer function while the consumer threads call the consumer function. The dataQueue object does the job of synchronizing data between threads. The producer threads write to dataQueue and consumer threads read from it. Thus, our program has created the consumer / producer pattern.

References

Lutz, Mark. Programming Python. Beijing, O’Reilly, 2013.

Python _thread Mutex

When Python programs create threads, each thread shares the same global memory as every other thread. Usually, but not always, multiple threads can safely read from shared resources without issue. Threads writing to shared resources are a different story because one thread could potentially overwrite the work of another thread.

This post demonstrates an example program shown in Programming Python: Powerful Object-Oriented Programming where threads acquire and release locks in the program. The locking mechanism ensures that only one thread has access to a shared resource at a time.

Code

Here is an example program with my own comments added.

import _thread as thread, time

# This mutex object is created by calling
# thread.allocate_lock()
# The mutex is responsible for synchronizing threads
mutex = thread.allocate_lock()


def counter(tid, count):
    for i in range(count):
        time.sleep(1)
        
        # The standard out is a shared resource
        # Unless the program controls access to the standard out
        # multiple threads can print to standard out at the same time
        # which results in garbage output
        
        # Acquire a lock
        mutex.acquire()
        
        # Now only the current thread can print to the console
        print('[{}] => {}'.format(tid, i))
        
        # Make sure to release the lock for other threads when finished
        mutex.release()


if __name__ == '__main__':
    for i in range(5):
        thread.start_new_thread(counter, (i, 5))

    time.sleep(6)
    print('Main thread exiting...')

Explanation

The program creates five threads, each of which needs access to the standard output stream. The standard output stream is a global object that all of the threads share, which means that each thread can call print at the same time. That isn’t ideal because we can get garbage output printed to the console if two threads call the print() function at the same time.

The solution is to lock access to the standard output stream so that only one thread may use it at a time. We do this by creating a mutex object on line 6 in the program by using thread.allocate_lock(). When a thread needs a lock, it calls acquire() on the mutex. At that point, all other threads that need protected resources have to sit and wait for mutex.release().

It’s important to keep the operations between mutex.acquire() and mutex.release() as brief as possible. Only one thread can hold a lock at a time, so the longer one thread holds a lock, the longer other threads need to wait for their turn to use the lock. That naturally impacts the performance of the overall program.
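Locks can also be used in a with statement, which acquires the lock on entry and releases it on exit, even if the protected code raises an exception. The sketch below is my own and protects a shared counter rather than the console. I used threading.Lock and threading.Thread here instead of _thread so that the main thread can wait for the workers with join().

```python
import threading

# The lock guards every update to the shared total
mutex = threading.Lock()
total = 0


def add(count):
    global total
    for _ in range(count):
        # Hold the lock only for the update itself
        with mutex:
            total += 1


threads = [threading.Thread(target=add, args=(10000,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(total)
```

Because each thread holds the lock only for the single update, the other threads spend very little time waiting.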

References

Lutz, Mark. Programming Python. Beijing, O’Reilly, 2013.

Python Threading

The Python threading module provides an OOP solution to threading. The base class, threading.Thread, follows a Java-like pattern for creating and joining threads. This post provides a threading demonstration with an example program found in Programming Python: Powerful Object-Oriented Programming.

Code

This is the example program with my own comments added.

import threading

# Thread is the base class for creating OOP Style Threads
# It has a run() method that contains the code that runs in a new thread
class MyThread(threading.Thread):
    def __init__(self, myId, count, mutex):
        self.myId = myId
        self.count = count
        self.mutex = mutex
        threading.Thread.__init__(self)

    # Everything inside of run is executed in a separate thread
    def run(self):
        for i in range(self.count):
            with self.mutex:
                print('[{}] => {}'.format(self.myId, i))


if __name__ == '__main__':
    stdoutmutex = threading.Lock()
    threads = []
    for i in range(10):
        # Create the new Thread Object
        thread = MyThread(i, 100, stdoutmutex)
        
        # The thread doesn't actually start running until
        # start() is called
        thread.start()
        threads.append(thread)

    for thread in threads:
        # join() is used to synchronize threads
        # Calling join() on a thread makes the parent thread wait
        # until the child thread has finished
        thread.join()

    print('Main thread exiting...')

Explanation

The OOP approach to Python threads has developers extend the threading.Thread class. The Thread class provides high-level methods that support threading such as start() and join(), which we discuss shortly. It also has a run() method that developers override. All of the code placed in the run() method runs in a new thread.

We are still free to use locks with OOP threads. On line 20, we create a lock by calling threading.Lock(). The mutex is passed to the thread object’s constructor on line 24 and is used by the thread on line 15 to acquire a lock on the standard output stream.

It’s important to note that the new thread doesn’t actually run until we call start(). The start() method is what asks the Python runtime to create a new thread of execution, and run() is then called inside that new thread. Never call run() directly on a thread object because doing so executes the code in the calling thread and keeps the program single threaded.
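We can check the difference between run() and start() for ourselves by recording which thread actually executes run(). This is a small sketch of my own. The Recorder class and its ran_in attribute are made up for the demonstration.

```python
import threading


class Recorder(threading.Thread):
    def __init__(self):
        super().__init__()
        self.ran_in = None

    def run(self):
        # Remember the id of whichever thread executes this method
        self.ran_in = threading.get_ident()


caller_id = threading.get_ident()

# Calling run() directly executes it in the calling thread
direct = Recorder()
direct.run()

# Calling start() executes run() in a brand new thread
started = Recorder()
started.start()
started.join()

print(direct.ran_in == caller_id)
print(started.ran_in == caller_id)
```

The first comparison is True because run() executed in the calling thread, while the second is False because start() handed run() to a new thread.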

The example program also uses the join() method. The join() method is used to make a parent thread wait until a child thread completes. The example program creates 10 threads and needs to wait until all of the threads are finished. This is done by entering a for loop on line 31 and then calling join() on each of the threads. When join() is called, the parent thread waits until the child thread’s run() method is finished. When all 10 threads are finished, the program exits.

References

Lutz, Mark. Programming Python. Beijing, O’Reilly, 2013.

Python _thread Basic

Python 3 has the higher-level threading module, but the _thread module (named thread in Python 2.x) still exists for developers who are more comfortable with the 2.x API. This is a basic example derived from Programming Python: Powerful Object-Oriented Programming that demonstrates how to create threads using the _thread module.

Code

import _thread as thread, time


# This function will run in a new thread
def counter(tid, count):
    for i in range(count):
        # Simulate a delay
        time.sleep(1)
        # Print out the thread id (tid) and the current iteration
        # of our for loop
        print('[{}] => {}'.format(tid, i))


if __name__ == '__main__':
    # Enter a loop that creates 5 threads
    for i in range(5):
        # Start a new thread passing a callable and its arguments
        # in the form of a tuple
        thread.start_new_thread(counter, (i, 5))

    time.sleep(6)
    print('Main thread exiting')

Explanation

This script creates five new threads using _thread.start_new_thread. Each time a new thread is created, the counter function is called and is passed a tuple of (i, 5). That tuple corresponds to the tid and count parameters of the counter function. Counter enters a loop that runs 5 times since 5 was passed to the second parameter of counter on line 19. It will print the thread id and current iteration of the loop.

Meanwhile, the for loop in the parent thread continues to iterate because thread.start_new_thread does not block the for loop in the main thread. By calling start_new_thread, the program’s execution runs both the for loop in the main thread and the counter function in parallel. Allowing programs to run multiple portions of code at the same time is what gives threads their power. For example, you may wish to use a thread to handle a long running database query while the user continues to interact with the program in the main thread.

One final note about threads in Python: threads give the appearance of allowing programs to multitask and, for all intents and purposes, that is what is happening in the program. Nevertheless, what is really happening is that the Python Virtual Machine is time slicing the computer instructions, allowing one thread to run a few instructions before switching to another thread.

In other words, if a program has three threads, A, B, and C, then Thread A runs for a few moments, then Thread B, and finally Thread C. Note that there is no guarantee about the order in which threads run. It is possible that one thread may run more often than the others or that the order of running threads is different each time.

References

Lutz, Mark. Programming Python. Beijing, O’Reilly, 2013.

Python Basic Forking

Many programs need to execute tasks simultaneously, and Python provides us with a few different mechanisms for concurrent programming. One of those mechanisms is called forking, where a call is made to the underlying operating system to create a working copy of a program that’s already running. The program that created the new process is called the parent process, while the processes that are created by the parent are called child processes.

This post shows the most basic form of creating processes in Python and helps serve as a foundation to understanding forking. The example is derived from Programming Python: Powerful Object-Oriented Programming, and I added my own comments to help better explain the program.

import os


# This is a function called by the child process
def child():
    # Use os.getpid() to get the pid for this process
    print('Hello from child', os.getpid())

    # force the child process to exit right away
    # or the child process will return to the infinite loop in parent()
    os._exit(0)


def parent():
    while True:
        # Attempt to fork this program into a new process. When forking is complete
        # newpid will be non-zero for the original process, but it will be
        # 0 in the child process
        newpid = os.fork()

        # The program now goes in two different directions at the same time
        # When newpid is 0, we call child() and the child process exits
        if newpid == 0:  # Test if this is a child process
            child()
        else:
            # If we are here, then we are still in the parent process
            # We print the pid of the parent and the child process (newpid)
            print('Hello from parent', os.getpid(), newpid)
        if input() == 'q':
            break


if __name__ == '__main__':
    parent()

When run on my machine, the program shows the following output:

Hello from parent 87800 87802
Hello from child 87802
k
Hello from parent 87800 87803
Hello from child 87803
k
Hello from parent 87800 87804
Hello from child 87804
k
Hello from parent 87800 87805
Hello from child 87805
q

Explanation

The hardest part to grasp about this program is that when os.fork() is called on line 19, the program actually launches a copy of itself. The operating system creates the new process, the new process gets a copy of all variables in memory, and execution of both the new and old programs continues after line 19. (Note: The OS may not copy the parent process exactly, but functionally speaking, the child process can be considered a copy of the parent.)

The os.fork() function returns an integer. In the child process the return value is zero, while in the parent process it is the PID (process ID) of the newly created child. We can test this value to see whether we are running in the parent or the child process. So on line 23, we test for 0 and if newpid is zero, we call the child() function.

The alternative case is that we are still running in the parent process (bearing in mind that the child process is running at the same time). In the parent process, os.fork() returns a non-zero value, so we use the else block to print the parent and child PIDs.

The parent process continues to loop until the user enters q to quit. Each time the loop iterates, a new child process is created by the parent. The parent prints its own PID (using os.getpid()) and the pid of the child on line 28.

The child process also uses os.getpid() to get its own PID. It prints its own PID on line 7 and then on line 11, we use os._exit(0) to force the child process to shut down. This is a critical step for this program! If we were to omit the call to os._exit on line 11, the child process would return to the parent function and enter the same infinite loop the parent is using.

Conclusion

This is the most basic example of creating child processes using Python. Keep in mind that processes do not share memory (unlike threads). In real-world programs, processes often need to synchronize or exchange data with one another using tools such as network sockets, databases, or files. When a child process is spawned, it gets a copy of the memory of the parent process, but then functions as an independent program.
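
To make the point about memory concrete, here is a small sketch that is not part of the original program. The fork_and_compare function name is my own, and it uses os.waitpid, which the example above does not, to collect the child's exit code. Like the original, it assumes a Unix-like system where os.fork() is available.

```python
import os


def fork_and_compare():
    """Show that a change made in the child is not seen by the parent."""
    counter = 0
    pid = os.fork()

    if pid == 0:
        # Child process: change its private copy of counter, then
        # exit right away, reporting the new value as the exit code
        counter = 42
        os._exit(counter)

    # Parent process: wait for the child to finish and read its exit code
    _, status = os.waitpid(pid, 0)
    child_exit_code = os.WEXITSTATUS(status)

    # counter is still 0 here because the child only changed its own copy
    return counter, child_exit_code


if __name__ == '__main__':
    print(fork_and_compare())
```

The exit code is used here only as a simple way to pass one small number back to the parent. Anything larger would need one of the mechanisms mentioned above, such as a file or a socket.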

Source

Lutz, Mark. Programming Python. Beijing, O’Reilly, 2013.
