Kotlin Data Access Object

Even the most trivial of computer programs work with data. In many cases, the user will wish to save data and reuse it between one run of the program and a later session. Since different systems handle data differently, the user may need to save data in multiple formats for compatibility purposes. The program should know how to not only persist data but also read data from a variety of supported formats.

Let’s consider a word processes. When we write a document in a word processor, we may normally save documents in the word processor’s default file format. Later on, we may wish to share the document with another coworker or friend, but they have a different word processor than what we are using. The solution may be to use the word processor’s export feature to convert our document from the word processor’s default format to the file format our friend’s word processor is using.

Converting data isn’t just an issue for desktop programs either. We may have a web application deployed that may store data in an RBDMS but have to export the same data to XML or JSON. Likewise, it’s very likely a web application may need to consume JSON and export data as a CSV format. Clearly, there is plenty of need for an application to be flexible about file formats.

That means we should not tie the persistent mechanism to a data structure. It might seem to go against the OOP principle of encapsulation where each object is responsible for its own data but think about the implications of such a design. If we write the persistence logic into a class, the class is tied to a particular file format. What happens when the class needs to read and write to a different file format? We could add additional persistence mechanisms to the class, but eventually, the class becomes bloated as we add more and more persistence schemes to it.

Another approach might be to define abstract methods for saving and retrieving data, but this is equally as bad of an approach. For one thing, are we really going to create an entire class hierarchy just for reading and writing data? Furthermore, we have no means with which to enforce a certain storage mechanism since all instances of such a class would resolve their storage logic using virtual methods. Finally, we would most likely need to convert instances of one subclass to another subclass in order to switch between storage mechanisms, which could send us down the rabbit hole of deep cloning objects.

If you think about it, storing and retrieving data is a completely separate concern of the application. A data object should never end up being responsible for its own persistence. That needs to be the concern of a completely different class or set of classes. Thus we end up with the data access pattern.

The data access pattern relies on a transfer object to hold the data. The transfer object is passed between the view (which is normally the UI) and the source of the data. Kotlin has a special data class that we can use for this specific purpose. Let’s have a look at an example.

/**
 * This a model class that is persisted to disk in this program
 */
data class Cook(val fName : String,
                val lName : String,
                val position : String,
                val age : Int) : Serializable

Koltin data classes get a built-in implementation of hashCode(), equals(), and toString(). We are free to add additional methods to them if needed. We also have this class implementing Serializable for a later demonstration. The important part to note about the Cook class is that it does not handle its own persistence. That is done by another class that is specific for this purpose.

Once we have our transfer object, we can begin by defining our persistence classes. Since we plan on using different formats for reading and writing data, we should use an interface as an abstraction point.

/**
 * This interface describes the kinds of Data Access Operations available
 */
interface CookDao {
    fun save(c : Cook, outStream: OutputStream)

    fun read(inStream: InputStream) : Cook
}

Our CookDao interface defines two methods. The first method, save(), takes a Cook object and an OuputStream. The Cook object is, of course, the transfer object. As for the OutputStream, it’s best to use OutputStream rather than File because it allows this Dao class to work with non-file streams such as network sockets. The DAO class should be concerned with how the object is persisted and retrieved, but it should not be coupled to a particular source. This keeps it flexible and allows us to send and store data on File Systems, Network Sockets, Http Responses, etc.

The same principle holds true for the read() function as well. This method takes an InputStream, the reciprocal class for OutputStream, and returns a Cook object. This allows us to create a Cook from any input stream. Once again, the source of the InputStream isn’t important to the class, only that it contains the data needed to create a Cook object.

Now that we have our interface, let’s look out our implementations. At first, our application wishes to store Cooks as Serialized Java Objects. Let’s create a version of CookDao that accomplishes this task.

/**
 * Implementation of CookDao that uses JVM serialization
 */
class SerializedCookDao : CookDao{
    override fun save(c: Cook, outStream: OutputStream) {
        val o = ObjectOutputStream(outStream)
        outStream.use {
            o.writeObject(c)
        }
    }

    override fun read(inStream: InputStream) : Cook{
        val i = ObjectInputStream(inStream)
        inStream.use {
            return i.readObject() as Cook
        }
    }
}

The SerializedCookDao implements CookDao and stores Cook as a Serialized Java object. When we wish to restore the Cook, we use the readObject() method found on ObjectInputStream to restore the Cook. Either way, the application at large isn’t concerned about how cook is persisted and restored.

Later on, we decide to support command separated values format or CSV. We don’t need to change any client code to accomplish such as task. All we need to do is define another class that implements CookDao.

/**
 * Implementation of CookDao that converts Cooks to a CSV file
 */
class CsvCookDao : CookDao {

    //Private extension function on Cook for creating
    //CSV files
    private fun Cook.toCSV() : String {
        return "${this.fName},${this.lName},${this.age},${this.position}"
    }

    //Private function to convert a CSV line to a Cook
    private fun parseCsv(csv : String) : Cook{
        val parts = csv.split(",")
        return Cook(fName = parts[0],
                    lName = parts[1],
                    age = parts[2].toInt(),
                    position = parts[3])
    }

    override fun save(c: Cook, outStream: OutputStream) {
        outStream.use {
            PrintWriter(outStream, true).println(c.toCSV())
        }
    }

    override fun read(inStream: InputStream): Cook {
        inStream.use {
            val sc = Scanner(inStream)
            return parseCsv(sc.nextLine())
        }
    }
}

The CsvCookDao class does the job. There are a couple of things to note about this class that makes it more interesting. The first is the extension function Cook.toCSV(). Kotlin extension functions let us add extra functionality to a class. We could have added a toCSV() method to Cook, but let’s think about the implications of such a design. For one thing, only CsvCookDao is concerned about making a Cook into a line of CSV. If we added toCSV() to Cook, then we are saying that all Cook objects should be able to turn themselves into CSV at any time.

That’s bad because we are violating the separation of concerns principle again. We don’t want Cook to concern itself with persistence. That’s the job of our DAO classes. Thus, our CsvCookDao gets a private method to convert Cooks into CSV. Likewise, we have the parseCsv function which also turns CSV back into Cooks. We could violate the separation of concerns by adding a constructor to the Cook class, but that would be equally as bad as adding a toCSV() method.

Now that we have our DAO classes, we need a way to pass Cooks from the view to the DAO. We can define a service class for this purpose.


/**
 * CookService uses Kotlin's delegation mechanism. CookService will have
 * all of the same methods as CookDao but the implementation will depend
 * on the CookDao that was provided
 */
class CookService(private val cookDao: CookDao) : CookDao by cookDao

The CookService class is powerful but incredibly compact. Instances of this class are using Kotlin’s delegation mechanism where all of the methods of CookService are wrapped by methods found in CookDao. It’s this mechanism that lets our application switch between serialization and CSV so easily. The view will use CookService. The CookService is created by passing an instance of CookDao to the constructor of CookService. Whichever CookDao is used will determine the data format the application is currently using!

Let’s have a look at the final portion of the application to see this in action.

fun saveAndRestore(bob : Cook, cookService : CookService, inStream : InputStream, outStream : OutputStream){
    println("Bob before saving => " + bob)

    cookService.save(bob, outStream)

    var restoredBob = cookService.read(inStream)
    println("Bob after restoring => " + restoredBob)
}

fun createIfNeeded(name : String) : File{
    val f = File(name)
    if(!f.exists()){
        f.createNewFile()
    }
    return f
}

fun main(args : Array<String>){
    val bob = Cook("Bob", "Belcher", "Owner", 45)
    println("Using CSV")
    saveAndRestore(bob,
            CookService(CsvCookDao()), //Application is using CSV
            FileInputStream(createIfNeeded("bob.csv")),
            FileOutputStream(createIfNeeded("bob.csv")))
    println()
    println("Using serialization")
    saveAndRestore(bob,
            CookService(SerializedCookDao()), //Application is using Serialization
            FileInputStream(createIfNeeded("bob.ser")),
            FileOutputStream(createIfNeeded("bob.ser")))
}

The saveAndRestore function does the job of saving and restoring a Cook object. However, it has no idea of where or how a Cook is handled. The saveAndRestore function interacts soley with CookService. The CookService objects are created in the main method. When a CsvCookDao is used in the CookService constructor, the Cooks are saved in CSV format. When SerialiazedCookDao is used instead, the application will use JVM serialization to save and restore a Cook.

It’s easy to imagine how easily this program can be extended later on. Suppose we which to support XML. All we need to do is create a new CookDao class that handles the transformation of a Cook object to and from XML. Later on, we would pass this CookDao class to the constructor of CookService and the rest of the application would continue to work. In this fashion, we can easily continue to add additional file formats as needs change.

Putting it Together

Here is the program in its entirety followed by output.

package ch4.dataacesspattern

import java.io.*
import java.util.*

/**
 * This a model class that is persisted to disk in this program
 */
data class Cook(val fName : String,
                val lName : String,
                val position : String,
                val age : Int) : Serializable

/**
 * This interface describes the kinds of Data Access Operations available
 */
interface CookDao {
    fun save(c : Cook, outStream: OutputStream)

    fun read(inStream: InputStream) : Cook
}

/**
 * Implementation of CookDao that uses JVM serialization
 */
class SerializedCookDao : CookDao{
    override fun save(c: Cook, outStream: OutputStream) {
        val o = ObjectOutputStream(outStream)
        outStream.use {
            o.writeObject(c)
        }
    }

    override fun read(inStream: InputStream) : Cook{
        val i = ObjectInputStream(inStream)
        inStream.use {
            return i.readObject() as Cook
        }
    }
}

/**
 * Implementation of CookDao that converts Cooks to a CSV file
 */
class CsvCookDao : CookDao {

    //Private extension function on Cook for creating
    //CSV files
    private fun Cook.toCSV() : String {
        return "${this.fName},${this.lName},${this.age},${this.position}"
    }

    //Private function to convert a CSV line to a Cook
    private fun parseCsv(csv : String) : Cook{
        val parts = csv.split(",")
        return Cook(fName = parts[0],
                    lName = parts[1],
                    age = parts[2].toInt(),
                    position = parts[3])
    }

    override fun save(c: Cook, outStream: OutputStream) {
        outStream.use {
            PrintWriter(outStream, true).println(c.toCSV())
        }
    }

    override fun read(inStream: InputStream): Cook {
        inStream.use {
            val sc = Scanner(inStream)
            return parseCsv(sc.nextLine())
        }
    }
}

/**
 * CookService uses Kotlin's delegation mechanism. CookService will have
 * all of the same methods as CookDao but the implementation will depend
 * on the CookDao that was provided
 */
class CookService(private val cookDao: CookDao) : CookDao by cookDao

fun saveAndRestore(bob : Cook, cookService : CookService, inStream : InputStream, outStream : OutputStream){
    println("Bob before saving => " + bob)

    cookService.save(bob, outStream)

    var restoredBob = cookService.read(inStream)
    println("Bob after restoring => " + restoredBob)
}

fun createIfNeeded(name : String) : File{
    val f = File(name)
    if(!f.exists()){
        f.createNewFile()
    }
    return f
}

fun main(args : Array<String>){
    val bob = Cook("Bob", "Belcher", "Owner", 45)
    println("Using CSV")
    saveAndRestore(bob,
            CookService(CsvCookDao()), //Application is using CSV
            FileInputStream(createIfNeeded("bob.csv")),
            FileOutputStream(createIfNeeded("bob.csv")))
    println()
    println("Using serialization")
    saveAndRestore(bob,
            CookService(SerializedCookDao()), //Application is using Serialization
            FileInputStream(createIfNeeded("bob.ser")),
            FileOutputStream(createIfNeeded("bob.ser")))
}

Output

Using CSV
Bob before saving => Cook(fName=Bob, lName=Belcher, position=Owner, age=45)
Bob after restoring => Cook(fName=Bob, lName=Belcher, position=Owner, age=45)

Using serialization
Bob before saving => Cook(fName=Bob, lName=Belcher, position=Owner, age=45)
Bob after restoring => Cook(fName=Bob, lName=Belcher, position=Owner, age=45)

Python Sockets

Network sockets are extremely useful for interprocess communication (IPC). Not only do network sockets allow processes to communicate on the same machine, but we can also use sockets to communicate over a network. This post shows the most basic demonstration of network sockets using an example borrowed from Programming Python: Powerful Object-Oriented Programming. I added my own comments to help explain the program.

Code

from socket import socket, AF_INET, SOCK_STREAM

port = 50008
host = 'localhost'


# Function to create a server
def server():
    # Create a network socket
    sock = socket(AF_INET, SOCK_STREAM)

    # Bind the socket to localhost with our port
    sock.bind(('', port))

    # Listen for up to 5 connections
    sock.listen(5)
    while True:
        # Wait for a client
        conn, addr = sock.accept()

        # Grab a megabyte of data from the client
        data = conn.recv(1024)

        # Create a reply string
        reply = 'server got: [{}]'.format(data)

        # Send the reply back to the client
        conn.send(reply.encode())


# Function to create a socket client
def client(name):
    # Create a socket
    sock = socket(AF_INET, SOCK_STREAM)

    # Connect the socket to the server
    sock.connect((host, port))

    # Send a message to the server
    sock.send(name.encode())

    # Receive a megabyte of data from the server
    reply = sock.recv(1024)

    # Close our connection
    sock.close()

    # Print the output
    print('Client got: [{}]'.format(reply))


if __name__ == '__main__':
    from threading import Thread

    # Create a thread for the server
    sthread = Thread(target=server)
    sthread.daemon = True
    sthread.start()

    # Create 5 client threads
    for i in range(5):
        Thread(target=client, args=('client{}'.format(i),)).start()

Explanation

The example program creates a basic client / server program. The program uses threads to help keep the program simple. One thread calls the server function defined on lines 8-28 and the remaining five threads call the client function found on line 32-49. The server thread creates a network server that accepts up to five connections from the client threads.

The server function starts by creating a socket object (called sock). On line 13, the program binds our socket to the machines localhost address and the port number specified on line 3. On line 16, the socket waits for up to five connections. Then the server enters a loop on line 17.

Inside of the of the loop, we have a call to sock.accept(). This function accepts a connection from a client and returns a connection and address object. Out program only uses the connection object. The program reads data from the client on line 22 using conn.recv. The conn.recv function takes a number of bytes to read from the client. The conn.recv returns binary information and the program stores it in the data varaible. Lines 25 and 28 show how to send information back to the client using conn.send. The conn.send function expects binary information, which is why we call encode() on the reply variable.

The client function acts almost exactly like the server function. The socket client is created on line 34. We use the connect function (line 37) to connect to the server and pass it a tuple containing the host and the port number. Unlike the server, which has its own dedicated connection object, the client uses the socket object itself to send and receive information to and from the server. On line 40, the program calls sock.send() and passes it a binary string to send to the server. The response from the server is collected on line 43 using sock.recv(). When the client is finished, it needs to close its connection to the server using sock.close().

References

Lutz, Mark. Programming Python. Beijing, OReilly, 2013.

Python Basic Pipes

Python provides two main avenues of parallel processing. One avenue is to use multithreading where a program itself multitasks, while the other approach is to have a program relaunch itself as a separate program in a new process. One approach is not necessarily better than the other approach but instead, should be throught of as tools for different use cases. Threads have low overhead and share a program’s memory space, which allows for easy communication between threads. Processes operate as if we launched a new copy of the program from our operating system and allow programs to spread themselves out over an operating system or even a network.

However, processes do not share a global memory space, which means they need a way to communicate with one another. One approach to interprocess communication (IPC) is to use pipes. This post shows an example of IPC using pipes taken from Programming Python: Powerful Object-Oriented Programming. I have added my own comments to the code for clarity.

Code

import os, time


# Function called by child processes
def child(pipeout):
    zzz = 0
    while True:
        time.sleep(zzz)

        # We have to encode our string to binary to use
        # with pipes
        msg = ('Spam {}'.format(zzz)).encode()

        # Send the data back to the parent process
        os.write(pipeout, msg)
        zzz = (zzz + 1) % 5


def parent():
    # Creates our pipes. The pipeout gets passed to the child
    # process while parent keeps pipein
    pipein, pipeout = os.pipe()

    if os.fork() == 0:
        # We are now in the child process so call child and supply
        # it with pipeout so that it can send information back to
        # the parent.
        child(pipeout)
    else:
        # This is the parent process
        while True:
            # Read data from the child process
            # This call blocks until there is data
            line = os.read(pipein, 32)

            # Print to the console
            print('Parent {} got [{}] as {}'.format(os.getpid(), line, time.time()))


if __name__ == '__main__':
    parent()

Explanation

We have two functions in the program named child() and parent(). The child() function is intended to run in child processes while parent() contains the main program. Parent() is defined on lines 19-37. The function begins by calling os.pipe() on line 22 which returns a tuple containing two ends of a single pipe. Pipes are unidirectional and thus pipein is used by the parent to read data that comes from the child process. The child process uses pipeout to send data to the parent.

The program forks into two different processes on line 24. The program is in the child process when os.fork() returns zero. Line 28 calls the child() function and passes pipeout to the child function so that the child process can send data back to the parent. The child process enters an infinite loop on line 7. On line 12, a msg variable is created that contains a String variable. Pipe send binary data, so we have to call encode() on the String to convert it to a binary string. Then on line 15, we send the msg varaiable back to the parent using os.write and supplying pipeout and msg to that function.

The parent process continues on line 31. It attempts to read data from the child process on line 34 using os.read. Notice that os.read requires a pipein variable and the size of binary data to read (32 bytes in this program). If the pipe contains data, os.read returns immedialy and stores the value in the line variable. Otherwise, os.read blocks the program until the pipe has data. The parent process prints the data on line 37.

References

Lutz, Mark. Programming Python. Beijing, OReilly, 2013.

Python Producer Consumer with Queue

The producer / consumer pattern is a common programming construct used in multithreaded applications where one thread acts as a producer of data while other threads consume the data. A web crawler application is a use case of the producer / consumer pattern. For example, the application may have a thread dedicated to crawling the web that gathers data (producer) while other threads index and store the data (consumers).

Producer and consumer threads need a way to share data. Python’s queue module provides one of many solutions. The Queue object is a FIFO object that lets the produce thread place data on the queue. Consumer threads are blocked by the Queue until the Queue has data for the consumer thread to read. When data becomes available, the consumer thread removes data from the Queue and does its work.

Below is an example program borrowed from Programming Python: Powerful Object-Oriented Programming that shows how to use a Queue to synchronize data between producer and consumer threads. I added my own comments to the code to help explain what is happening in the program.

Code

# Specify the number of consumer and producer threads
numconsumers = 2
numproducers = 4
nummessages = 4

import _thread as thread, queue, time

# Create a lock so that only one thread writes to the console at a time
safeprint = thread.allocate_lock()

# Create a queue object
dataQueue = queue.Queue()


# Function called by the producer thread
def producer(idnum):
    # Produce 4 messages to place on the queue
    for msgnum in range(nummessages):
        # Simulate a delay
        time.sleep(idnum)

        # Put a String on the queue
        dataQueue.put('[producer id={}, count={}]'.format(idnum, msgnum))


# Function called by the consumer threads
def consumer(idnum):
    # Create an infinite loop
    while True:
        # Simulate a delay
        time.sleep(0.1)
        try:
            # Attempt to get data from the queue. Note that
            # dataQueue.get() will block this thread's execution
            # until data is available
            data = dataQueue.get()
        except queue.Empty:
            pass
        else:
            # Acquire a lock on the console
            with safeprint:
                # Print the data created by the producer thread
                print('consumer ', idnum, ' got => ', data)


if __name__ == '__main__':
    # Create consumers
    for i in range(numconsumers):
        thread.start_new_thread(consumer, (i,))
        
    # Create producers
    for i in range(numproducers):
        thread.start_new_thread(producer, (i,))
        
    # Simulate a delay
    time.sleep(((numproducers - 1) * nummessages) + 1)
    
    # Exit the program
    print('Main thread exit')

Detailed Explanation

This program shows the producer / consumer pattern in action. We begin by defining variables that specify the number of consumer threads (line 2), the number of produce threads (line 3), and the number of messages the producer threads make (line 4). The program creates a lock on line 9 so that only one thread can use the console at the same time. Then on line 12, the queue is created as a global variable.

Our first function, producer, is defined on lines 16-23. There isn’t anything fancy going on in this function. The function simply enters a for-each loop and creates 4 strings that are placed on dataQueue (line 23). Since dataQueue is a FIFO structure, worker threads will remove these Strings from dataQueue in the order they are recieved.

Lines 27-43 define the consumer thread function, consumer. This code enters an infinite loop and removes data from dataQueue and prints the String to the console. Line 36 is the critical piece of code in the consumer function. The call to get() on dataQueue removes the item at the front of the queue and stores it in the variable data. If dataQueue is empty, the consumer thread is blocked until data becomes available.

Alternatively, we could pass false to the optional block parameter on get(). That would cause the thread to continue to execute even if the queue is empty. However, we need to be prepared for situations where the queue is empty and catch the queue.Empty exception that is thrown. Our program calls pass to skip over the exception should this happen (it shouldn’t be the way, because we are using the blocking version of get()).

Lines 48-49 create our producer threads and start them. Lines 52-53 create and start the consumer threads. The producer threads call the produce function while the consumer threads call the consume function. The dataQueue object does the job of synchronizing data between threads. The produce threads write to dataQueue and consumer threads read from it. Thus, our program has created the consumer / producer pattern.

References

Lutz, Mark. Programming Python. Beijing, OReilly, 2013.

Python _thread Basic

Python 3 has the newer thread package, but the _thread package still exists for developers who are more comfortable with the 2.x API. This is a basic example derived from Programming Python: Powerful Object-Oriented Programming that demonstrates how to create threads using the _thread module.

Code

import _thread as thread, time


# This function will run in a new thread
def counter(tid, count):
    for i in range(count):
        # Simulate a delay
        time.sleep(1)
        # Print out the thread id (tid) and the current iteration
        # of our for loop
        print('[{}] => {}'.format(tid, i))


if __name__ == '__main__':
    # Enter a loop that creates 5 threads
    for i in range(5):
        # Start a new thread passing a callable and it's arguments
        # in the form of a tuple
        thread.start_new_thread(counter, (i, 5))

    time.sleep(6)
    print('Main thread exiting')

Explanation

This script creates five new threads using _thread.start_new_thread. Each time a new thread is created, the counter function is called and is passed a tuple of (i, 5). That tuple corresponds to the tid and count parameters of the counter function. Counter enters a loop that runs 5 times since 5 was passed to the second parameter of counter on line 19. It will print the thread id and current iteration of the loop.

Meanwhile, the for loop in the parent thread continues to iterate because thread.start_new_thread does not block the for loop in the main thread. By calling start_new_thread, the program’s execution runs both the for loop in the main thread and the counter function in parallel. Allowing programs to run multiple portions of code at the same time is what gives threads their power. For example, you may wish to use a thread to handle a long running database query while the user continues to interact with the program in the main thread.

One final note about threads in Python. Threads give the appearance of allowing programs to multitask and for all intents and purposes, that is what is happening in the program. Nevertheless, what is really happening is that the Python Virtual Machine is time slicing the computer instructions and allowing a few lines of code to run before switching to another set of instructions.

In other words, if a program has three threads, A, B, and C, then Thread A runs for a few moments, then Thread B, and finally Thread C. Note that there is no guarantee to the order in which threads run. It is possible that one thread may run more often than other threads or that the order of running threads is different each time.

References

Lutz, Mark. Programming Python. Beijing, OReilly, 2013.

Use Python to Launch Another Program

Many Python programs operate as wrappers to other programs. For example, there are a wide variety of Python programs that work as a GUI wrapper for CLI programs. This tutorial uses an example from Programming Python: Powerful Object-Oriented Programming to demonstrate how a Python program can be used to launch another program.

Code

Parent Process

This code represents a parent process that is capable of launching a child process. I added my own comments to help further explain the code.

import os

# Track the number of spawned children
spawned_children = 0

# Enter an infinite loop
while True:
    
    # Increment the number of children
    spawned_children += 1
    
    # Now launch a new process
    # If we are still in the parent process, pid is non-zero
    # otherwise, pid is zero and indicates that we are in the
    # child process
    pid = os.fork()
    
    # Test if we are in the child process
    if pid == 0:
        
        # Now, since we are in the new process, we use
        # os.execlp to launch a child script.
        os.execlp('python', 'python', 'child.py', str(spawned_children))
        assert False, 'error starting program'
    else:
        # If we are here, then we are still in the parent process
        print('Child is', pid)
        if input() == 'q': break

Child Process

This is a simple script that simply represents a sample child process.

import os, sys

# Parent passes the child process number
# as a command line argument
print('Hello from child', os.getpid(), sys.argv[1])

Explanation

We have two core concepts used in this program. The first concept is forking the parent process into a child process. The other concept is actually launching a new program.

Forking

Forking is the process of launching a copy of the running program. The program that launched a copy of itself is called the parent process, while the copy of the program is called the child process. In theory, a parent and its children can spawn as many children as the underlying operating system allows.

The example program spawns a child process on line 16 and assigns the result to the variable called pid. When os.fork() returns, pid is zero if we are running in the child process or non-zero if we are still in the parent. The program uses an if – else statement (lines 19 and 25 respectively) to perform different operations depending if we are in the parent or child process.

The parent process executes lines 27 and 28. It simply prints out the child’s process id and then asks the user if they wish to continue the program. The user can enter ‘q’ to exit the parent process.

The child process executes code on lines 23 and 24. Line 23 uses os.execlp to start the companion child script and passes the number of spawned children as a command line argument. The child script prints its pid and the number of child processes and then exits.

Launching Programs

The Python os module provides a variety of ways to launch other programs via a Python script. This example makes use of os.execlp which looks for an executable on the operating system’s path. The first argument is the name of the program followed by any number of command line arguments.

The os.execlp is not supposed to return to the caller because if successful, the program is replaced by the new program. This is why it’s necessary to call os.fork() prior to using os.execlp (unless you wish your program to simply end with the new process). When the example script calls os.execlp, the child process created by os.fork is replaced by the new program launched by os.execlp.

Line 24 handles the case when os.execlp fails. Should os.execlp fail, control is returned back to the caller. In most cases, a develop should notify the user that launching the new program has failed. The example script accomplishes this by using an assert False statement with an error message.

References

Lutz, Mark. Programming Python. Beijing, OReilly, 2013.

Python 3 Os File Tools

The Python os module has a number of useful file commands that allow developers to perform common file tasks such as changing file permissions, renaming files, or even deleting files. The following snippets are modified examples from Programming Python: Powerful Object-Oriented Programming

os.chmod

os.chmod alters a file’s permissions. The example usage below takes two arguments. The first argument is the path to the file and the second argument is a 9-bit string that composes the new file permissions.

os.chmod('belchers.txt', 0o777)

os.rename

os.rename is used to give a file a new name. The first argument is the current name of the file and the second argument is the new name of the file.

os.rename('belchers.txt', 'pestos.txt')

os.remove

The os.remove deletes a file. It takes the path of the target file to delete.

os.remove('pestos.txt')

os.path.isdir

The os.path.isdir accepts a path to a file or directory. It returns True if the path is a directory otherwise False.

os.path.isdir('/home') #True
os.path.isdir('belchers.txt') #False

os.path.isfile

os.path.isfile works like os.path.isdir but only it’s for files.

os.path.isfile('belchers.txt') #True
os.path.isfile('/home') #False

os.path.getsize

os.path.getsize returns the size of the file. It accepts the path to the file as an argument.

os.path.getsize('belchers.txt')

Sources

Lutz, Mark. Programming Python. Beijing, OReilly, 2013.

Spring Boot Caching with Kotlin

It’s fairly common for applications to continually ask a datastore for the same information repeatedly. Requests to datastores consume application resources and thus have a performance cost even when the requested data is small. The Spring Platform provides a solution allows applications to store information in an in memory caching system that allows applications to check the cache for the required data prior to making a call to the database. This example shows how to use Spring Boot and Kotlin to cache files that we are storing in the database.

Database Entity

We are going to define a database entity that stores files in a database. Since retrieving such data can be an expesive call to the database, we are going to cache this entity.

@Entity
data class PersistedFile(
        @field: Id @field: GeneratedValue var id : Long = 0,
        var fileName : String = "",
        var mime : String = "",
        @field : Lob var bytes : ByteArray? = null)

You will notice that this class has a ByteArray field that is stored as a LOB in the database. In theory, this could be as many bytes as the system allows so ideally we would store this in cache. Other good candidates are entity classes that have complex object graphs and may result in the ORM generated complex SQL to retreive the managed object.

Enable Caching

Spring Boot defines a CachingManager internally for the application. You are free to use your own, but you need to configure your Spring Boot environment first.

Dependencies

You need to have spring-boot-starter-cache in your pom.xml or other dependency manager.


    org.springframework.boot
    spring-boot-starter-web

Annotation

You also need to tell the environment to turn on caching by using the @EnableCaching

@SpringBootApplication
@EnableJpaRepositories
@EnableCaching  //Spring Boot provides a CacheManager our of the box
                //but it only turns on when this annotation is present
class CachingTutorialApplication

Decorate the Caching Methods

At this point, we only need to decorate the methods we want the environment cache. This is done by decorating our methods with the @Cacheable annotation and then providing the annotation with the name of a cache. We can also optionally tell the cache manager what to use for the key. Here is the code for our service class followed by an explanation.

//We are going to use this class to handle caching of our PersistedFile object
//Normally, we would encapsulate our repository, but we are leaving it public to keep the code down
@Service
class PersistedFileService(@Autowired val persistedFileRepository: PersistedFileRepository){

    //This annotation will cause the cache to store a persistedFile in memory
    //so that the program doesn't have to hit the DB each time for the file.
    //This will result in faster page load times. Since we know that managed objects
    //have unique primary keys, we can just use the primary key for the cache key
    @Cacheable(cacheNames = arrayOf("persistedFile"), key="#id")
    fun findOne(id : Long) : PersistedFile = persistedFileRepository.findOne(id)

    //This annotation will cause the cache to store persistedFile ids
    //By storing the ids, we don't need to hit the DB to know if a file exists first
    @Cacheable(cacheNames = arrayOf("persistedIds"))
    fun exists(id: Long?): Boolean = persistedFileRepository.exists(id)
}

The first method, findOne, is used to look up a persistedFile object from the database. You will notice that we pass persistedFile as an argument to cacheNames and then use the primary key as the key for this item’s cache. We can use the PK because we know it’s a unique value so we can help make the cache more performant. However, keep in mind that the key is optional.

We can also avoid another call to the database by storing if items exist in the database in the cache. The first time exists() is called, the application will fire a count sql statement to the database. On subsequent calls, the cache will simply return true or false depending on what is stored in the cache.

Putting it all together

I put together a small web application that demonstates the caching working together. I turned on the show sql property in the applications.properties file so that viewers can see when the application is making calls to the database. You will notice that the first time I retreive the persisted file, there is sql generated. However, on the second call to the same object, no sql is generated because the application isn’t making a call to the database.

You can get the complete code from my GitHub page at this link.

Here are some links to posts that are related to concepts used in Spring Boot that we used today.

Spring Boot JPA Kotlin

Spring Boot provides a ready made solution to working with Java Persistence API (JPA). The post discusses how to make a basic web application that reads and writes employees to a database. We can also count how many employees have a certain name. This will all be done using Spring’s JPA features and the Kotlin programming language.

Spring Boot provides us with a data source on its own with very little configuration. Of course, we are free to connect the application to remote databases as well. To get started, we need to fill out some properties in the application.properties file found in src/main/resources

application.properties

spring.jpa.hibernate.ddl-auto=create-drop
spring.jpa.properties.hibernate.current_session_context_class=org.springframework.orm.hibernate5.SpringSessionContext
spring.datasource.driver-class-name=org.hsqldb.jdbcDriver

The first property, spring.jpa.hibernate.ddl-auto=create-drop tells the application to scan for all classes annotated with @Entity and create database tables for us. The persistence provider does the work of generating database definition language (DDL) and creating our database schema for us.

The second property configures Hibernate to act as our persistence provider. This is required because JPA is a specification. It requires a 3rd party library (Hibernate, EclipseLink, etc) to actually implement the specification. Finally, we need to tell the application what JDBC driver to use. This should match our database. In this case, we are using HSQLDB so we load their JDBC driver.

Database Entity

We need to create one or more persistant objects that are mapped to the database. In this case, we only have 1, an Employee Class, that we are using to map to a database table.

//This class maps to a table in the database
//that will get created for us
@Entity
data class Employee(
       @field: Id @field: GeneratedValue var Id : Long = 0, //Primary Key
       var name : String = "", //Column
       var position : String = "") //Column

Kotlin provides data classes for these sort of situations. The first property is annotated with Id and serves as the Primary Key. The GeneratedValue annotation tells the persistence provider to generate primary key values for us. The other two properties, name and position, end up becoming columns in the database table. When persistence provider scans this class, it will issue the correct commands to the database and generate an employee table with an primary key columna and two VARCHAR columns. Each instance of the Employee class that we store will become a record in the table.

Automatic Repositories

Spring is capable of generating @Repository classes for use when working with JPA. These repositories come fully loaded with 18 methods that handle all of our CRUD (create, read, update, and delete) methods and provide container managed transactions. We can even create our own custom queries using a naming convention and Spring will infer what needs to be done.

However, before we can have Spring generate our repositories for us, we need to tell it to do so. That’s pretty easy because all we need is a small configuration class.

Config

@Configuration
//The next line tells Spring Generate our JPA Repositories
@EnableJpaRepositories(basePackages = arrayOf("com.stonesoupprogramming.jpa"))
class Config

The package passed to the @EnableJpaReposities tells Spring where to look for repository interfaces. In order to make a Repository for the Employee class, we only need to declare an interface that extends JpaRepository.

EmployeeRepository

//The Implementation for this class is generated
//by Spring Data!
interface EmployeeRepository : JpaRepository <Employee, Long>{

    //Define a custom query using Spring Data
    fun countByNameContainingIgnoringCase(name : String) : Long
}

At no point will we ever write an implementation for this interface. When Spring sees this interface, it will generate an implementation class that is fully loaded and ready for our application to use. Technically, this interface could be empty, but we do have one method countByNameContainingIgnoringCase(String). Let’s discuss it.

Spring JPA Repositories are capable of defining queries on our persiteted objects provided that we follow the proper naming convention. Let’s take apart countByNameContainingIgnoringCase and discuss what each part means.

  • count — We are defining a count query
  • ByName — The syntax here is By[Property]. Our Employee class has a Name property, so we write ByName. If we wanted to use Position instead, it would be ByPosition
  • ContainingIgnoringCase — This is the predicate of the query. We are looking for anything containing a string value (in this case) and we are ignoring the case.

So in the end countByNameContainingIgnoringCase defines a query that means what it says. We are going to get a count of all records where the name contains a certain name and the name is not case sensitive. Spring is able to parse this name and create the correct query for us.

Put it in Action!

I wrote an MVC application that demonstrates how to use these concepts in a web application. Here is the code for the controller.

@Controller
@RequestMapping("/")
class IndexController(@Autowired private val employeeRepository: EmployeeRepository) {

    @RequestMapping(method = arrayOf(RequestMethod.GET))
    fun doGet(model : Model) : String {
        model.apply {
            addAttribute("employee", Employee())
            addAttribute("showName", false)
            addAttribute("employees", employeeRepository.findAll().toList())
        }
        return "index"
    }

    @RequestMapping("/employee_save", method = arrayOf(RequestMethod.POST))
    fun doEmployeeSave(employee: Employee,
                       model : Model) : String {
        employeeRepository.save(employee)
        model.apply {
            addAttribute("employee", Employee())
            addAttribute("showName", false)
            addAttribute("employees", employeeRepository.findAll().toList())
        }
        return "index"
    }

    @RequestMapping("/employee_count", method = arrayOf(RequestMethod.POST))
    fun doEmployeeCount(@RequestParam("name") name : String,
                        model : Model) : String {
        val count = employeeRepository.countByNameContainingIgnoringCase(name)
        model.apply {
            addAttribute("employee", Employee())
            addAttribute("showName", true)
            addAttribute("count", "Number of employees having name $name: $count")
            addAttribute("employees", employeeRepository.findAll().toList())
        }
        return "index"
    }
}

Even though we never wrote an implementation for EmployeeRepository, we can safely inject an instance of EmployeeRepository into our controller class. From this point, we have an HTTP GET method and two POST methods. The doEmployeeSave calls employeeRepository.save() and saves the incoming Employee object to the database. It also calls employeeRepository.findAll() and sends all employee records back to the view.

The doEmployeeCount calls our custom employeeRepository.countByNameContainingIgnoringCase method and returns a count of how many employee records contain the given name. We can pass this number back to the view. Once again, we are using employeeRepository.findAll().

This is the HTML code that works with the IndexController class.
indexhtml1indexcontroller2

Conclusion

The JPA cababilities provided by Spring Boot make developing ORM applications a breeze and it’s worth while to leverage them. For one thing, we only have to write a fraction of the code that we might have to write otherwise, but we are also less likely to introduce bugs into the application because we can trust the implementation of the JPA Repository classes and the persistence provided SQL generating capabilities.

Here are some screen shots of the finished application.


You can download the code at my GitHub page here or visit the YouTube tutorial.

Kotlin Stream Image from Database

Many web applications allow users to store images for later. For example, you may want to allow users to upload a profile picture that gets displayed later on in the application. This post demonstrates how to upload an image to a web application and store the image in a database. Then we will see how to display that image in a browser.

PersistedImage

The key to storing an image in a database is to use @Lob annotation in JPA and make the datatype as a byte array. Here is an example class that stores byte array in the database.

@Entity
data class PersistedImage(@field: Id @field: GeneratedValue var id : Long = 0,
                          //The bytes field needs to be marked as @Lob for Large Object Binary
                          @field: Lob var bytes : ByteArray? = null,
                          var mime : String = ""){

    fun toStreamingURI() : String {
        //We need to encode the byte array into a base64 String for the browser
        val base64 = DatatypeConverter.printBase64Binary(bytes)

        //Now just return a data string. The Browser will know what to do with it
        return "data:$mime;base64,$base64"
    }

Kotlin has a ByteArray class. In Java you would use byte []. The effect is the same either way. When persistence provider scans this class, it will store the byte array as a Lob in the database. Nevertheless it’s not enough to simply store an image in the database. At some point in time, the user will most likely wish to see the image. That’s there the toStreamingURI() method comes in handy.

The first line uses DatatypeConverter to convert the byte array to a base64 string. Then we can append that string to “data:[mime];base64,[base 64]”. In our example, we use Kotlin’s String template feature to build such a String. We start with the data: followed by the mime (such as /img/png). Then we can add the base64 string created by DatatypeConverter. This string can get added to the src attribute of the html img tag as shown in the screen shot below.

base64string
The browser knows how to display this string as an image.

File Uploads

It’s worth while to discuss how files are upload in Spring. Spring has a MultipartFile class that can get mapped to the a file upload input tag in the form. Here is how it looks in the HTML code.
fileuploadform
There are a couple of things that are critical for this to work. First, we have to set our applications.properties file to allow large file uploads.

spring.http.multipart.max-file-size=25MB
spring.http.multipart.max-request-size=25MB

Next our form tag has to set the enctype attribute to “multipart/form-data”. Finally we have to keep track of the name attribute on our input tag so that we can map it to the server code. In our example, our input tag has it’s name attribute set to “image”.

On the server end, we use this code get an instance of MultipartFile.

@RequestMapping(method = arrayOf(RequestMethod.POST))
    fun doPost(
            //Grab the uploaded image from the form
            @RequestPart("image") multiPartFile : MultipartFile,
               model : Model) : String {
        //Save the image file
        imageService.save(multiPartFile.toPersistedImage())
        model.addAttribute("images", imageService.loadAll())
        return "index"
    }

We annotate the multipartFile parameter with @RequestPart and pass to the annotation the same name attribute that we set on our input tag. At this point, the container will inject an instance of MultipartFile that represents the file that the user uploaded to the server. The MultipartFile class has two attributes that are critical to our purposes. First it has a byte array property that represents the bytes of the uploaded file and it has the file’s MIME.

We can use Kotlin’s extension functions to add a toPersistedImage() method on MutlipartFile.

fun MultipartFile.toPersistedImage() = PersistedImage(bytes = this.bytes, mime = this.contentType)

This method simply returns an instance of PersisitedImage that can get stored in the database. At this point, we can easily store and retrieve the image from the database.

Application

The demonstration application is a regular Spring MVC application written in Kotlin. You can refer to this post on an explanation on how this works. Here is the Kotlin code followed by the HTML code.

Kotlin Code

package com.stonesoupprogramming.streamimage

import org.hibernate.SessionFactory
import org.springframework.beans.factory.annotation.Autowired
import org.springframework.boot.SpringApplication
import org.springframework.boot.autoconfigure.SpringBootApplication
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration
import org.springframework.stereotype.Controller
import org.springframework.stereotype.Repository
import org.springframework.stereotype.Service
import org.springframework.ui.Model
import org.springframework.web.bind.annotation.RequestMapping
import org.springframework.web.bind.annotation.RequestMethod
import org.springframework.web.bind.annotation.RequestPart
import org.springframework.web.multipart.MultipartFile
import javax.persistence.*
import javax.transaction.Transactional
import javax.xml.bind.DatatypeConverter

@SpringBootApplication
class StreamImageDbApplication

fun main(args: Array) {
    SpringApplication.run(StreamImageDbApplication::class.java, *args)
}

@Entity
data class PersistedImage(@field: Id @field: GeneratedValue var id : Long = 0,
                          //The bytes field needs to be marked as @Lob for Large Object Binary
                          @field: Lob var bytes : ByteArray? = null,
                          var mime : String = ""){

    fun toStreamingURI() : String {
        //We need to encode the byte array into a base64 String for the browser
        val base64 = DatatypeConverter.printBase64Binary(bytes)

        //Now just return a data string. The Browser will know what to do with it
        return "data:$mime;base64,$base64"
    }
}

//This is a Kotlin extension function that turns a MultipartFile into a PersistedImage
fun MultipartFile.toPersistedImage() = PersistedImage(bytes = this.bytes, mime = this.contentType)

@Configuration
class DataConfig {

    @Bean
    fun sessionFactory(@Autowired entityManagerFactory: EntityManagerFactory) :
             SessionFactory = entityManagerFactory.unwrap(SessionFactory::class.java)
}

@Repository
class ImageRepository(@Autowired private val sessionFactory: SessionFactory){

    fun save(persistedImage: PersistedImage) {
        sessionFactory.currentSession.saveOrUpdate(persistedImage)
    }

    fun loadAll() = sessionFactory.currentSession.createCriteria(PersistedImage::class.java).list() as List
}

@Transactional
@Service
class ImageService(@Autowired private val imageRepository: ImageRepository){

    fun save(persistedImage: PersistedImage) {
        imageRepository.save(persistedImage)
    }

    fun loadAll() = imageRepository.loadAll()
}

@Controller
@RequestMapping("/")
class IndexController(@Autowired private val imageService: ImageService){

    @RequestMapping(method = arrayOf(RequestMethod.GET))
    fun doGet(model : Model) : String {
        model.addAttribute("images", imageService.loadAll())
        return "index"
    }

    @RequestMapping(method = arrayOf(RequestMethod.POST))
    fun doPost(
            //Grab the uploaded image from the form
            @RequestPart("image") multiPartFile : MultipartFile,
               model : Model) : String {
        //Save the image file
        imageService.save(multiPartFile.toPersistedImage())
        model.addAttribute("images", imageService.loadAll())
        return "index"
    }
}

application.properties

spring.jpa.hibernate.ddl-auto=create-drop
spring.jpa.properties.hibernate.current_session_context_class=org.springframework.orm.hibernate5.SpringSessionContext
spring.datasource.driver-class-name=org.hsqldb.jdbcDriver

spring.http.multipart.max-file-size=25MB
spring.http.multipart.max-request-size=25MB

index.html

streamimage

Conclusion

Spring and Kotlin make it easy to embed images in a database and display those images in a browser. The main take away is to define a byte array property as a Lob on persisted image and then convert it to a base64 String when you wish to display it. Here are some screen shots of the working application.


You can get the source code for this project at my GitHub here or watch the video tutorial on YouTube.