Kotlin Data Access Object

Even the most trivial of computer programs work with data. In many cases, the user will wish to save data and reuse it between one run of the program and a later session. Since different systems handle data differently, the user may need to save data in multiple formats for compatibility purposes. The program should know how to not only persist data but also read data from a variety of supported formats.

Let’s consider a word processes. When we write a document in a word processor, we may normally save documents in the word processor’s default file format. Later on, we may wish to share the document with another coworker or friend, but they have a different word processor than what we are using. The solution may be to use the word processor’s export feature to convert our document from the word processor’s default format to the file format our friend’s word processor is using.

Converting data isn’t just an issue for desktop programs either. We may have a web application deployed that may store data in an RBDMS but have to export the same data to XML or JSON. Likewise, it’s very likely a web application may need to consume JSON and export data as a CSV format. Clearly, there is plenty of need for an application to be flexible about file formats.

That means we should not tie the persistent mechanism to a data structure. It might seem to go against the OOP principle of encapsulation where each object is responsible for its own data but think about the implications of such a design. If we write the persistence logic into a class, the class is tied to a particular file format. What happens when the class needs to read and write to a different file format? We could add additional persistence mechanisms to the class, but eventually, the class becomes bloated as we add more and more persistence schemes to it.

Another approach might be to define abstract methods for saving and retrieving data, but this is equally as bad of an approach. For one thing, are we really going to create an entire class hierarchy just for reading and writing data? Furthermore, we have no means with which to enforce a certain storage mechanism since all instances of such a class would resolve their storage logic using virtual methods. Finally, we would most likely need to convert instances of one subclass to another subclass in order to switch between storage mechanisms, which could send us down the rabbit hole of deep cloning objects.

If you think about it, storing and retrieving data is a completely separate concern of the application. A data object should never end up being responsible for its own persistence. That needs to be the concern of a completely different class or set of classes. Thus we end up with the data access pattern.

The data access pattern relies on a transfer object to hold the data. The transfer object is passed between the view (which is normally the UI) and the source of the data. Kotlin has a special data class that we can use for this specific purpose. Let’s have a look at an example.

/**
 * This a model class that is persisted to disk in this program
 */
data class Cook(val fName : String,
                val lName : String,
                val position : String,
                val age : Int) : Serializable

Koltin data classes get a built-in implementation of hashCode(), equals(), and toString(). We are free to add additional methods to them if needed. We also have this class implementing Serializable for a later demonstration. The important part to note about the Cook class is that it does not handle its own persistence. That is done by another class that is specific for this purpose.

Once we have our transfer object, we can begin by defining our persistence classes. Since we plan on using different formats for reading and writing data, we should use an interface as an abstraction point.

/**
 * This interface describes the kinds of Data Access Operations available
 */
interface CookDao {
    fun save(c : Cook, outStream: OutputStream)

    fun read(inStream: InputStream) : Cook
}

Our CookDao interface defines two methods. The first method, save(), takes a Cook object and an OuputStream. The Cook object is, of course, the transfer object. As for the OutputStream, it’s best to use OutputStream rather than File because it allows this Dao class to work with non-file streams such as network sockets. The DAO class should be concerned with how the object is persisted and retrieved, but it should not be coupled to a particular source. This keeps it flexible and allows us to send and store data on File Systems, Network Sockets, Http Responses, etc.

The same principle holds true for the read() function as well. This method takes an InputStream, the reciprocal class for OutputStream, and returns a Cook object. This allows us to create a Cook from any input stream. Once again, the source of the InputStream isn’t important to the class, only that it contains the data needed to create a Cook object.

Now that we have our interface, let’s look out our implementations. At first, our application wishes to store Cooks as Serialized Java Objects. Let’s create a version of CookDao that accomplishes this task.

/**
 * Implementation of CookDao that uses JVM serialization
 */
class SerializedCookDao : CookDao{
    override fun save(c: Cook, outStream: OutputStream) {
        val o = ObjectOutputStream(outStream)
        outStream.use {
            o.writeObject(c)
        }
    }

    override fun read(inStream: InputStream) : Cook{
        val i = ObjectInputStream(inStream)
        inStream.use {
            return i.readObject() as Cook
        }
    }
}

The SerializedCookDao implements CookDao and stores Cook as a Serialized Java object. When we wish to restore the Cook, we use the readObject() method found on ObjectInputStream to restore the Cook. Either way, the application at large isn’t concerned about how cook is persisted and restored.

Later on, we decide to support command separated values format or CSV. We don’t need to change any client code to accomplish such as task. All we need to do is define another class that implements CookDao.

/**
 * Implementation of CookDao that converts Cooks to a CSV file
 */
class CsvCookDao : CookDao {

    //Private extension function on Cook for creating
    //CSV files
    private fun Cook.toCSV() : String {
        return "${this.fName},${this.lName},${this.age},${this.position}"
    }

    //Private function to convert a CSV line to a Cook
    private fun parseCsv(csv : String) : Cook{
        val parts = csv.split(",")
        return Cook(fName = parts[0],
                    lName = parts[1],
                    age = parts[2].toInt(),
                    position = parts[3])
    }

    override fun save(c: Cook, outStream: OutputStream) {
        outStream.use {
            PrintWriter(outStream, true).println(c.toCSV())
        }
    }

    override fun read(inStream: InputStream): Cook {
        inStream.use {
            val sc = Scanner(inStream)
            return parseCsv(sc.nextLine())
        }
    }
}

The CsvCookDao class does the job. There are a couple of things to note about this class that makes it more interesting. The first is the extension function Cook.toCSV(). Kotlin extension functions let us add extra functionality to a class. We could have added a toCSV() method to Cook, but let’s think about the implications of such a design. For one thing, only CsvCookDao is concerned about making a Cook into a line of CSV. If we added toCSV() to Cook, then we are saying that all Cook objects should be able to turn themselves into CSV at any time.

That’s bad because we are violating the separation of concerns principle again. We don’t want Cook to concern itself with persistence. That’s the job of our DAO classes. Thus, our CsvCookDao gets a private method to convert Cooks into CSV. Likewise, we have the parseCsv function which also turns CSV back into Cooks. We could violate the separation of concerns by adding a constructor to the Cook class, but that would be equally as bad as adding a toCSV() method.

Now that we have our DAO classes, we need a way to pass Cooks from the view to the DAO. We can define a service class for this purpose.


/**
 * CookService uses Kotlin's delegation mechanism. CookService will have
 * all of the same methods as CookDao but the implementation will depend
 * on the CookDao that was provided
 */
class CookService(private val cookDao: CookDao) : CookDao by cookDao

The CookService class is powerful but incredibly compact. Instances of this class are using Kotlin’s delegation mechanism where all of the methods of CookService are wrapped by methods found in CookDao. It’s this mechanism that lets our application switch between serialization and CSV so easily. The view will use CookService. The CookService is created by passing an instance of CookDao to the constructor of CookService. Whichever CookDao is used will determine the data format the application is currently using!

Let’s have a look at the final portion of the application to see this in action.

fun saveAndRestore(bob : Cook, cookService : CookService, inStream : InputStream, outStream : OutputStream){
    println("Bob before saving => " + bob)

    cookService.save(bob, outStream)

    var restoredBob = cookService.read(inStream)
    println("Bob after restoring => " + restoredBob)
}

fun createIfNeeded(name : String) : File{
    val f = File(name)
    if(!f.exists()){
        f.createNewFile()
    }
    return f
}

fun main(args : Array<String>){
    val bob = Cook("Bob", "Belcher", "Owner", 45)
    println("Using CSV")
    saveAndRestore(bob,
            CookService(CsvCookDao()), //Application is using CSV
            FileInputStream(createIfNeeded("bob.csv")),
            FileOutputStream(createIfNeeded("bob.csv")))
    println()
    println("Using serialization")
    saveAndRestore(bob,
            CookService(SerializedCookDao()), //Application is using Serialization
            FileInputStream(createIfNeeded("bob.ser")),
            FileOutputStream(createIfNeeded("bob.ser")))
}

The saveAndRestore function does the job of saving and restoring a Cook object. However, it has no idea of where or how a Cook is handled. The saveAndRestore function interacts soley with CookService. The CookService objects are created in the main method. When a CsvCookDao is used in the CookService constructor, the Cooks are saved in CSV format. When SerialiazedCookDao is used instead, the application will use JVM serialization to save and restore a Cook.

It’s easy to imagine how easily this program can be extended later on. Suppose we which to support XML. All we need to do is create a new CookDao class that handles the transformation of a Cook object to and from XML. Later on, we would pass this CookDao class to the constructor of CookService and the rest of the application would continue to work. In this fashion, we can easily continue to add additional file formats as needs change.

Putting it Together

Here is the program in its entirety followed by output.

package ch4.dataacesspattern

import java.io.*
import java.util.*

/**
 * This a model class that is persisted to disk in this program
 */
data class Cook(val fName : String,
                val lName : String,
                val position : String,
                val age : Int) : Serializable

/**
 * This interface describes the kinds of Data Access Operations available
 */
interface CookDao {
    fun save(c : Cook, outStream: OutputStream)

    fun read(inStream: InputStream) : Cook
}

/**
 * Implementation of CookDao that uses JVM serialization
 */
class SerializedCookDao : CookDao{
    override fun save(c: Cook, outStream: OutputStream) {
        val o = ObjectOutputStream(outStream)
        outStream.use {
            o.writeObject(c)
        }
    }

    override fun read(inStream: InputStream) : Cook{
        val i = ObjectInputStream(inStream)
        inStream.use {
            return i.readObject() as Cook
        }
    }
}

/**
 * Implementation of CookDao that converts Cooks to a CSV file
 */
class CsvCookDao : CookDao {

    //Private extension function on Cook for creating
    //CSV files
    private fun Cook.toCSV() : String {
        return "${this.fName},${this.lName},${this.age},${this.position}"
    }

    //Private function to convert a CSV line to a Cook
    private fun parseCsv(csv : String) : Cook{
        val parts = csv.split(",")
        return Cook(fName = parts[0],
                    lName = parts[1],
                    age = parts[2].toInt(),
                    position = parts[3])
    }

    override fun save(c: Cook, outStream: OutputStream) {
        outStream.use {
            PrintWriter(outStream, true).println(c.toCSV())
        }
    }

    override fun read(inStream: InputStream): Cook {
        inStream.use {
            val sc = Scanner(inStream)
            return parseCsv(sc.nextLine())
        }
    }
}

/**
 * CookService uses Kotlin's delegation mechanism. CookService will have
 * all of the same methods as CookDao but the implementation will depend
 * on the CookDao that was provided
 */
class CookService(private val cookDao: CookDao) : CookDao by cookDao

fun saveAndRestore(bob : Cook, cookService : CookService, inStream : InputStream, outStream : OutputStream){
    println("Bob before saving => " + bob)

    cookService.save(bob, outStream)

    var restoredBob = cookService.read(inStream)
    println("Bob after restoring => " + restoredBob)
}

fun createIfNeeded(name : String) : File{
    val f = File(name)
    if(!f.exists()){
        f.createNewFile()
    }
    return f
}

fun main(args : Array<String>){
    val bob = Cook("Bob", "Belcher", "Owner", 45)
    println("Using CSV")
    saveAndRestore(bob,
            CookService(CsvCookDao()), //Application is using CSV
            FileInputStream(createIfNeeded("bob.csv")),
            FileOutputStream(createIfNeeded("bob.csv")))
    println()
    println("Using serialization")
    saveAndRestore(bob,
            CookService(SerializedCookDao()), //Application is using Serialization
            FileInputStream(createIfNeeded("bob.ser")),
            FileOutputStream(createIfNeeded("bob.ser")))
}

Output

Using CSV
Bob before saving => Cook(fName=Bob, lName=Belcher, position=Owner, age=45)
Bob after restoring => Cook(fName=Bob, lName=Belcher, position=Owner, age=45)

Using serialization
Bob before saving => Cook(fName=Bob, lName=Belcher, position=Owner, age=45)
Bob after restoring => Cook(fName=Bob, lName=Belcher, position=Owner, age=45)
Advertisements

Python Sockets

Network sockets are extremely useful for interprocess communication (IPC). Not only do network sockets allow processes to communicate on the same machine, but we can also use sockets to communicate over a network. This post shows the most basic demonstration of network sockets using an example borrowed from Programming Python: Powerful Object-Oriented Programming. I added my own comments to help explain the program.

Code

from socket import socket, AF_INET, SOCK_STREAM

port = 50008
host = 'localhost'


# Function to create a server
def server():
    # Create a network socket
    sock = socket(AF_INET, SOCK_STREAM)

    # Bind the socket to localhost with our port
    sock.bind(('', port))

    # Listen for up to 5 connections
    sock.listen(5)
    while True:
        # Wait for a client
        conn, addr = sock.accept()

        # Grab a megabyte of data from the client
        data = conn.recv(1024)

        # Create a reply string
        reply = 'server got: [{}]'.format(data)

        # Send the reply back to the client
        conn.send(reply.encode())


# Function to create a socket client
def client(name):
    # Create a socket
    sock = socket(AF_INET, SOCK_STREAM)

    # Connect the socket to the server
    sock.connect((host, port))

    # Send a message to the server
    sock.send(name.encode())

    # Receive a megabyte of data from the server
    reply = sock.recv(1024)

    # Close our connection
    sock.close()

    # Print the output
    print('Client got: [{}]'.format(reply))


if __name__ == '__main__':
    from threading import Thread

    # Create a thread for the server
    sthread = Thread(target=server)
    sthread.daemon = True
    sthread.start()

    # Create 5 client threads
    for i in range(5):
        Thread(target=client, args=('client{}'.format(i),)).start()

Explanation

The example program creates a basic client / server program. The program uses threads to help keep the program simple. One thread calls the server function defined on lines 8-28 and the remaining five threads call the client function found on line 32-49. The server thread creates a network server that accepts up to five connections from the client threads.

The server function starts by creating a socket object (called sock). On line 13, the program binds our socket to the machines localhost address and the port number specified on line 3. On line 16, the socket waits for up to five connections. Then the server enters a loop on line 17.

Inside of the of the loop, we have a call to sock.accept(). This function accepts a connection from a client and returns a connection and address object. Out program only uses the connection object. The program reads data from the client on line 22 using conn.recv. The conn.recv function takes a number of bytes to read from the client. The conn.recv returns binary information and the program stores it in the data varaible. Lines 25 and 28 show how to send information back to the client using conn.send. The conn.send function expects binary information, which is why we call encode() on the reply variable.

The client function acts almost exactly like the server function. The socket client is created on line 34. We use the connect function (line 37) to connect to the server and pass it a tuple containing the host and the port number. Unlike the server, which has its own dedicated connection object, the client uses the socket object itself to send and receive information to and from the server. On line 40, the program calls sock.send() and passes it a binary string to send to the server. The response from the server is collected on line 43 using sock.recv(). When the client is finished, it needs to close its connection to the server using sock.close().

References

Lutz, Mark. Programming Python. Beijing, OReilly, 2013.

Python Basic Pipes

Python provides two main avenues of parallel processing. One avenue is to use multithreading where a program itself multitasks, while the other approach is to have a program relaunch itself as a separate program in a new process. One approach is not necessarily better than the other approach but instead, should be throught of as tools for different use cases. Threads have low overhead and share a program’s memory space, which allows for easy communication between threads. Processes operate as if we launched a new copy of the program from our operating system and allow programs to spread themselves out over an operating system or even a network.

However, processes do not share a global memory space, which means they need a way to communicate with one another. One approach to interprocess communication (IPC) is to use pipes. This post shows an example of IPC using pipes taken from Programming Python: Powerful Object-Oriented Programming. I have added my own comments to the code for clarity.

Code

import os, time


# Function called by child processes
def child(pipeout):
    zzz = 0
    while True:
        time.sleep(zzz)

        # We have to encode our string to binary to use
        # with pipes
        msg = ('Spam {}'.format(zzz)).encode()

        # Send the data back to the parent process
        os.write(pipeout, msg)
        zzz = (zzz + 1) % 5


def parent():
    # Creates our pipes. The pipeout gets passed to the child
    # process while parent keeps pipein
    pipein, pipeout = os.pipe()

    if os.fork() == 0:
        # We are now in the child process so call child and supply
        # it with pipeout so that it can send information back to
        # the parent.
        child(pipeout)
    else:
        # This is the parent process
        while True:
            # Read data from the child process
            # This call blocks until there is data
            line = os.read(pipein, 32)

            # Print to the console
            print('Parent {} got [{}] as {}'.format(os.getpid(), line, time.time()))


if __name__ == '__main__':
    parent()

Explanation

We have two functions in the program named child() and parent(). The child() function is intended to run in child processes while parent() contains the main program. Parent() is defined on lines 19-37. The function begins by calling os.pipe() on line 22 which returns a tuple containing two ends of a single pipe. Pipes are unidirectional and thus pipein is used by the parent to read data that comes from the child process. The child process uses pipeout to send data to the parent.

The program forks into two different processes on line 24. The program is in the child process when os.fork() returns zero. Line 28 calls the child() function and passes pipeout to the child function so that the child process can send data back to the parent. The child process enters an infinite loop on line 7. On line 12, a msg variable is created that contains a String variable. Pipe send binary data, so we have to call encode() on the String to convert it to a binary string. Then on line 15, we send the msg varaiable back to the parent using os.write and supplying pipeout and msg to that function.

The parent process continues on line 31. It attempts to read data from the child process on line 34 using os.read. Notice that os.read requires a pipein variable and the size of binary data to read (32 bytes in this program). If the pipe contains data, os.read returns immedialy and stores the value in the line variable. Otherwise, os.read blocks the program until the pipe has data. The parent process prints the data on line 37.

References

Lutz, Mark. Programming Python. Beijing, OReilly, 2013.

Python Producer Consumer with Queue

The producer / consumer pattern is a common programming construct used in multithreaded applications where one thread acts as a producer of data while other threads consume the data. A web crawler application is a use case of the producer / consumer pattern. For example, the application may have a thread dedicated to crawling the web that gathers data (producer) while other threads index and store the data (consumers).

Producer and consumer threads need a way to share data. Python’s queue module provides one of many solutions. The Queue object is a FIFO object that lets the produce thread place data on the queue. Consumer threads are blocked by the Queue until the Queue has data for the consumer thread to read. When data becomes available, the consumer thread removes data from the Queue and does its work.

Below is an example program borrowed from Programming Python: Powerful Object-Oriented Programming that shows how to use a Queue to synchronize data between producer and consumer threads. I added my own comments to the code to help explain what is happening in the program.

Code

# Specify the number of consumer and producer threads
numconsumers = 2
numproducers = 4
nummessages = 4

import _thread as thread, queue, time

# Create a lock so that only one thread writes to the console at a time
safeprint = thread.allocate_lock()

# Create a queue object
dataQueue = queue.Queue()


# Function called by the producer thread
def producer(idnum):
    # Produce 4 messages to place on the queue
    for msgnum in range(nummessages):
        # Simulate a delay
        time.sleep(idnum)

        # Put a String on the queue
        dataQueue.put('[producer id={}, count={}]'.format(idnum, msgnum))


# Function called by the consumer threads
def consumer(idnum):
    # Create an infinite loop
    while True:
        # Simulate a delay
        time.sleep(0.1)
        try:
            # Attempt to get data from the queue. Note that
            # dataQueue.get() will block this thread's execution
            # until data is available
            data = dataQueue.get()
        except queue.Empty:
            pass
        else:
            # Acquire a lock on the console
            with safeprint:
                # Print the data created by the producer thread
                print('consumer ', idnum, ' got => ', data)


if __name__ == '__main__':
    # Create consumers
    for i in range(numconsumers):
        thread.start_new_thread(consumer, (i,))
        
    # Create producers
    for i in range(numproducers):
        thread.start_new_thread(producer, (i,))
        
    # Simulate a delay
    time.sleep(((numproducers - 1) * nummessages) + 1)
    
    # Exit the program
    print('Main thread exit')

Detailed Explanation

This program shows the producer / consumer pattern in action. We begin by defining variables that specify the number of consumer threads (line 2), the number of produce threads (line 3), and the number of messages the producer threads make (line 4). The program creates a lock on line 9 so that only one thread can use the console at the same time. Then on line 12, the queue is created as a global variable.

Our first function, producer, is defined on lines 16-23. There isn’t anything fancy going on in this function. The function simply enters a for-each loop and creates 4 strings that are placed on dataQueue (line 23). Since dataQueue is a FIFO structure, worker threads will remove these Strings from dataQueue in the order they are recieved.

Lines 27-43 define the consumer thread function, consumer. This code enters an infinite loop and removes data from dataQueue and prints the String to the console. Line 36 is the critical piece of code in the consumer function. The call to get() on dataQueue removes the item at the front of the queue and stores it in the variable data. If dataQueue is empty, the consumer thread is blocked until data becomes available.

Alternatively, we could pass false to the optional block parameter on get(). That would cause the thread to continue to execute even if the queue is empty. However, we need to be prepared for situations where the queue is empty and catch the queue.Empty exception that is thrown. Our program calls pass to skip over the exception should this happen (it shouldn’t be the way, because we are using the blocking version of get()).

Lines 48-49 create our producer threads and start them. Lines 52-53 create and start the consumer threads. The producer threads call the produce function while the consumer threads call the consume function. The dataQueue object does the job of synchronizing data between threads. The produce threads write to dataQueue and consumer threads read from it. Thus, our program has created the consumer / producer pattern.

References

Lutz, Mark. Programming Python. Beijing, OReilly, 2013.

Python _thread Basic

Python 3 has the newer thread package, but the _thread package still exists for developers who are more comfortable with the 2.x API. This is a basic example derived from Programming Python: Powerful Object-Oriented Programming that demonstrates how to create threads using the _thread module.

Code

import _thread as thread, time


# This function will run in a new thread
def counter(tid, count):
    for i in range(count):
        # Simulate a delay
        time.sleep(1)
        # Print out the thread id (tid) and the current iteration
        # of our for loop
        print('[{}] => {}'.format(tid, i))


if __name__ == '__main__':
    # Enter a loop that creates 5 threads
    for i in range(5):
        # Start a new thread passing a callable and it's arguments
        # in the form of a tuple
        thread.start_new_thread(counter, (i, 5))

    time.sleep(6)
    print('Main thread exiting')

Explanation

This script creates five new threads using _thread.start_new_thread. Each time a new thread is created, the counter function is called and is passed a tuple of (i, 5). That tuple corresponds to the tid and count parameters of the counter function. Counter enters a loop that runs 5 times since 5 was passed to the second parameter of counter on line 19. It will print the thread id and current iteration of the loop.

Meanwhile, the for loop in the parent thread continues to iterate because thread.start_new_thread does not block the for loop in the main thread. By calling start_new_thread, the program’s execution runs both the for loop in the main thread and the counter function in parallel. Allowing programs to run multiple portions of code at the same time is what gives threads their power. For example, you may wish to use a thread to handle a long running database query while the user continues to interact with the program in the main thread.

One final note about threads in Python. Threads give the appearance of allowing programs to multitask and for all intents and purposes, that is what is happening in the program. Nevertheless, what is really happening is that the Python Virtual Machine is time slicing the computer instructions and allowing a few lines of code to run before switching to another set of instructions.

In other words, if a program has three threads, A, B, and C, then Thread A runs for a few moments, then Thread B, and finally Thread C. Note that there is no guarantee to the order in which threads run. It is possible that one thread may run more often than other threads or that the order of running threads is different each time.

References

Lutz, Mark. Programming Python. Beijing, OReilly, 2013.

Use Python to Launch Another Program

Many Python programs operate as wrappers to other programs. For example, there are a wide variety of Python programs that work as a GUI wrapper for CLI programs. This tutorial uses an example from Programming Python: Powerful Object-Oriented Programming to demonstrate how a Python program can be used to launch another program.

Code

Parent Process

This code represents a parent process that is capable of launching a child process. I added my own comments to help further explain the code.

import os

# Track the number of spawned children
spawned_children = 0

# Enter an infinite loop
while True:
    
    # Increment the number of children
    spawned_children += 1
    
    # Now launch a new process
    # If we are still in the parent process, pid is non-zero
    # otherwise, pid is zero and indicates that we are in the
    # child process
    pid = os.fork()
    
    # Test if we are in the child process
    if pid == 0:
        
        # Now, since we are in the new process, we use
        # os.execlp to launch a child script.
        os.execlp('python', 'python', 'child.py', str(spawned_children))
        assert False, 'error starting program'
    else:
        # If we are here, then we are still in the parent process
        print('Child is', pid)
        if input() == 'q': break

Child Process

This is a simple script that simply represents a sample child process.

import os, sys

# Parent passes the child process number
# as a command line argument
print('Hello from child', os.getpid(), sys.argv[1])

Explanation

We have two core concepts used in this program. The first concept is forking the parent process into a child process. The other concept is actually launching a new program.

Forking

Forking is the process of launching a copy of the running program. The program that launched a copy of itself is called the parent process, while the copy of the program is called the child process. In theory, a parent and its children can spawn as many children as the underlying operating system allows.

The example program spawns a child process on line 16 and assigns the result to the variable called pid. When os.fork() returns, pid is zero if we are running in the child process or non-zero if we are still in the parent. The program uses an if – else statement (lines 19 and 25 respectively) to perform different operations depending if we are in the parent or child process.

The parent process executes lines 27 and 28. It simply prints out the child’s process id and then asks the user if they wish to continue the program. The user can enter ‘q’ to exit the parent process.

The child process executes code on lines 23 and 24. Line 23 uses os.execlp to start the companion child script and passes the number of spawned children as a command line argument. The child script prints its pid and the number of child processes and then exits.

Launching Programs

The Python os module provides a variety of ways to launch other programs via a Python script. This example makes use of os.execlp which looks for an executable on the operating system’s path. The first argument is the name of the program followed by any number of command line arguments.

The os.execlp is not supposed to return to the caller because if successful, the program is replaced by the new program. This is why it’s necessary to call os.fork() prior to using os.execlp (unless you wish your program to simply end with the new process). When the example script calls os.execlp, the child process created by os.fork is replaced by the new program launched by os.execlp.

Line 24 handles the case when os.execlp fails. Should os.execlp fail, control is returned back to the caller. In most cases, a develop should notify the user that launching the new program has failed. The example script accomplishes this by using an assert False statement with an error message.

References

Lutz, Mark. Programming Python. Beijing, OReilly, 2013.

Python 3 Os File Tools

The Python os module has a number of useful file commands that allow developers to perform common file tasks such as changing file permissions, renaming files, or even deleting files. The following snippets are modified examples from Programming Python: Powerful Object-Oriented Programming

os.chmod

os.chmod alters a file’s permissions. The example usage below takes two arguments. The first argument is the path to the file and the second argument is a 9-bit string that composes the new file permissions.

os.chmod('belchers.txt', 0o777)

os.rename

os.rename is used to give a file a new name. The first argument is the current name of the file and the second argument is the new name of the file.

os.rename('belchers.txt', 'pestos.txt')

os.remove

The os.remove deletes a file. It takes the path of the target file to delete.

os.remove('pestos.txt')

os.path.isdir

The os.path.isdir accepts a path to a file or directory. It returns True if the path is a directory otherwise False.

os.path.isdir('/home') #True
os.path.isdir('belchers.txt') #False

os.path.isfile

os.path.isfile works like os.path.isdir but only it’s for files.

os.path.isfile('belchers.txt') #True
os.path.isfile('/home') #False

os.path.getsize

os.path.getsize returns the size of the file. It accepts the path to the file as an argument.

os.path.getsize('belchers.txt')

Sources

Lutz, Mark. Programming Python. Beijing, OReilly, 2013.