Python Sockets

Network sockets are extremely useful for interprocess communication (IPC). Not only do sockets let processes on the same machine communicate, they also let processes communicate over a network. This post shows a basic demonstration of network sockets using an example borrowed from Programming Python: Powerful Object-Oriented Programming. I added my own comments to help explain the program.

Code

from socket import socket, AF_INET, SOCK_STREAM

port = 50008
host = 'localhost'


# Function to create a server
def server():
    # Create a network socket
    sock = socket(AF_INET, SOCK_STREAM)

    # Bind the socket to our port on all available interfaces
    sock.bind(('', port))

    # Allow up to 5 queued connection requests
    sock.listen(5)
    while True:
        # Wait for a client
        conn, addr = sock.accept()

        # Read up to 1024 bytes from the client
        data = conn.recv(1024)

        # Create a reply string
        reply = 'server got: [{}]'.format(data)

        # Send the reply back to the client
        conn.send(reply.encode())


# Function to create a socket client
def client(name):
    # Create a socket
    sock = socket(AF_INET, SOCK_STREAM)

    # Connect the socket to the server
    sock.connect((host, port))

    # Send a message to the server
    sock.send(name.encode())

    # Receive up to 1024 bytes from the server
    reply = sock.recv(1024)

    # Close our connection
    sock.close()

    # Print the output
    print('Client got: [{}]'.format(reply))


if __name__ == '__main__':
    from threading import Thread

    # Create a thread for the server
    sthread = Thread(target=server)
    sthread.daemon = True
    sthread.start()

    # Create 5 client threads
    for i in range(5):
        Thread(target=client, args=('client{}'.format(i),)).start()

Explanation

The example program creates a basic client / server pair and uses threads to keep things simple. One thread calls the server function while five other threads call the client function. The server thread creates a network server that accepts the connections made by the client threads.

The server function starts by creating a socket object (called sock). The program then binds the socket to the port number defined at the top of the script; passing an empty string as the host means the socket will accept connections on any of the machine's addresses, including localhost. The call to sock.listen(5) allows up to five connection requests to queue up, and the server then enters an infinite loop.

Inside the loop, we have a call to sock.accept(). This function waits for a client connection and returns a connection object along with the client's address. Our program only uses the connection object. The program reads data from the client using conn.recv, which takes the maximum number of bytes to read (1024 here) and returns binary data that the program stores in the data variable. The reply is sent back to the client with conn.send; because conn.send expects binary data, we call encode() on the reply string first.
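One detail worth noting: conn.recv(1024) returns at most 1024 bytes, and a single call may return fewer bytes than the peer sent. The helper below is a hypothetical sketch (not part of the book example) showing how a fixed number of bytes could be read in a loop.

def recv_exact(conn, nbytes):
    # Keep calling recv until nbytes have arrived or the peer closes
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = conn.recv(remaining)
        if not chunk:  # an empty bytes object means the peer closed the connection
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b''.join(chunks)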

The client function acts almost exactly like the server function. The client creates a socket and uses connect to reach the server, passing a tuple containing the host and the port number. Unlike the server, which gets a dedicated connection object from accept(), the client uses the socket object itself to send and receive data. The program calls sock.send() with an encoded string, collects the server's response with sock.recv(), and, when the client is finished, closes its connection with sock.close().
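As a side note, newer Python code often writes the client a little differently. The sketch below is my own variant (the name client_v2 and the default host/port are assumptions, not from the book): socket.create_connection builds the connected socket in one call, the with block closes it automatically, and sendall keeps writing until the whole message has been sent.

from socket import create_connection

def client_v2(name, host='localhost', port=50008):
    # The with block closes the socket automatically, even on error
    with create_connection((host, port)) as sock:
        sock.sendall(name.encode())
        reply = sock.recv(1024)
    print('Client got: [{}]'.format(reply))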

References

Lutz, Mark. Programming Python. Beijing: O'Reilly, 2013.


Python Basic Pipes

Python provides two main avenues for parallel processing. One is multithreading, where a single program multitasks internally; the other is to have a program launch a copy of itself as a separate process. Neither approach is necessarily better than the other; they should be thought of as tools for different use cases. Threads have low overhead and share a program's memory space, which allows for easy communication between threads. Processes operate as if we launched a new copy of the program from the operating system, which lets programs spread their work across the operating system or even a network.

However, processes do not share a global memory space, which means they need a way to communicate with one another. One approach to interprocess communication (IPC) is to use pipes. This post shows an example of IPC using pipes taken from Programming Python: Powerful Object-Oriented Programming. I have added my own comments to the code for clarity.

Code

import os, time


# Function called by child processes
def child(pipeout):
    zzz = 0
    while True:
        time.sleep(zzz)

        # We have to encode our string to binary to use
        # with pipes
        msg = ('Spam {}'.format(zzz)).encode()

        # Send the data back to the parent process
        os.write(pipeout, msg)
        zzz = (zzz + 1) % 5


def parent():
    # Create our pipe. The child process writes to pipeout
    # while the parent reads from pipein
    pipein, pipeout = os.pipe()

    if os.fork() == 0:
        # We are now in the child process so call child and supply
        # it with pipeout so that it can send information back to
        # the parent.
        child(pipeout)
    else:
        # This is the parent process
        while True:
            # Read data from the child process
            # This call blocks until there is data
            line = os.read(pipein, 32)

            # Print to the console
            print('Parent {} got [{}] at {}'.format(os.getpid(), line, time.time()))


if __name__ == '__main__':
    parent()

Explanation

We have two functions in the program, named child() and parent(). The child() function is intended to run in the child process, while parent() contains the main program. parent() begins by calling os.pipe(), which returns a tuple containing the two ends of a single pipe. Pipes are unidirectional: pipein is used by the parent to read data that comes from the child process, and the child process uses pipeout to send data to the parent.

The program forks into two processes when it calls os.fork(). The program is running in the child process when os.fork() returns zero, so that branch calls the child() function and passes it pipeout so the child can send data back to the parent. The child process enters an infinite loop. On each iteration it builds a msg string; pipes carry binary data, so we call encode() on the string to convert it to bytes. The child then sends msg back to the parent by calling os.write with pipeout and msg.

The parent process takes the else branch. It reads data from the child process using os.read, which requires the pipein file descriptor and the maximum number of bytes to read (32 in this program). If the pipe contains data, os.read returns immediately and the value is stored in the line variable; otherwise, os.read blocks until the pipe has data. The parent process then prints the data along with its process id and the current time.
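A small refinement, shown as a hypothetical sketch below rather than the book's code: after fork(), each process can close the pipe end it does not use, and the parent can reap the finished child with os.wait().

import os


def parent_v2():
    pipein, pipeout = os.pipe()
    if os.fork() == 0:
        os.close(pipein)                       # the child only writes
        os.write(pipeout, b'hello from child')
        os._exit(0)                            # exit the child immediately
    else:
        os.close(pipeout)                      # the parent only reads
        line = os.read(pipein, 32)
        print('Parent got', line)
        os.wait()                              # reap the finished child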

References

Lutz, Mark. Programming Python. Beijing: O'Reilly, 2013.

Python Producer Consumer with Queue

The producer / consumer pattern is a common programming construct used in multithreaded applications where one thread acts as a producer of data while other threads consume the data. A web crawler application is a use case of the producer / consumer pattern. For example, the application may have a thread dedicated to crawling the web that gathers data (producer) while other threads index and store the data (consumers).

Producer and consumer threads need a way to share data. Python's queue module provides one of many solutions. The Queue class is a first-in, first-out (FIFO) data structure that lets the producer thread place data on the queue. Consumer threads block on the Queue until it has data for them to read. When data becomes available, a consumer thread removes it from the Queue and does its work.
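Before diving into the full example, here is a minimal sketch of the Queue API on its own (my own illustration, not from the book): put() adds work, get() blocks until work is available, and task_done() together with join() lets the main thread wait until every item has been processed.

import queue, threading

q = queue.Queue()


def worker():
    while True:
        item = q.get()        # blocks until an item is available
        print('processing', item)
        q.task_done()         # mark this item as finished


threading.Thread(target=worker, daemon=True).start()

for n in range(3):
    q.put(n)

q.join()                      # blocks until every queued item is processed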

Below is an example program borrowed from Programming Python: Powerful Object-Oriented Programming that shows how to use a Queue to synchronize data between producer and consumer threads. I added my own comments to the code to help explain what is happening in the program.

Code

# Specify the number of consumer and producer threads
numconsumers = 2
numproducers = 4
nummessages = 4

import _thread as thread, queue, time

# Create a lock so that only one thread writes to the console at a time
safeprint = thread.allocate_lock()

# Create a queue object
dataQueue = queue.Queue()


# Function called by the producer thread
def producer(idnum):
    # Produce 4 messages to place on the queue
    for msgnum in range(nummessages):
        # Simulate a delay
        time.sleep(idnum)

        # Put a String on the queue
        dataQueue.put('[producer id={}, count={}]'.format(idnum, msgnum))


# Function called by the consumer threads
def consumer(idnum):
    # Create an infinite loop
    while True:
        # Simulate a delay
        time.sleep(0.1)
        try:
            # Attempt to get data from the queue. Note that
            # dataQueue.get() will block this thread's execution
            # until data is available
            data = dataQueue.get()
        except queue.Empty:
            pass
        else:
            # Acquire a lock on the console
            with safeprint:
                # Print the data created by the producer thread
                print('consumer ', idnum, ' got => ', data)


if __name__ == '__main__':
    # Create consumers
    for i in range(numconsumers):
        thread.start_new_thread(consumer, (i,))
        
    # Create producers
    for i in range(numproducers):
        thread.start_new_thread(producer, (i,))
        
    # Simulate a delay
    time.sleep(((numproducers - 1) * nummessages) + 1)
    
    # Exit the program
    print('Main thread exit')

Detailed Explanation

This program shows the producer / consumer pattern in action. We begin by defining variables that specify the number of consumer threads, the number of producer threads, and the number of messages each producer creates. The program then creates a lock so that only one thread can write to the console at a time, and finally creates the queue as a global variable.

Our first function, producer, doesn't do anything fancy. It simply enters a for loop and places nummessages (four) strings on dataQueue. Since dataQueue is a FIFO structure, consumer threads will remove these strings from dataQueue in the order they were added.

The consumer function enters an infinite loop that removes data from dataQueue and prints each string to the console. The call to get() on dataQueue is the critical piece of code in the consumer function: it removes the item at the front of the queue and stores it in the variable data. If dataQueue is empty, the consumer thread blocks until data becomes available.

Alternatively, we could pass False to the optional block parameter of get(). That would cause the call to return immediately even when the queue is empty, so we need to be prepared for that case and catch the queue.Empty exception that get() raises. Our program uses pass to skip over the exception should it occur (it shouldn't here, because we are using the blocking version of get()).
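For illustration, here is a hypothetical non-blocking version of the consumer loop (it reuses the time, queue, dataQueue, and safeprint names defined in the example above). With block=False, get() raises queue.Empty immediately when there is nothing to read, so the exception handler actually does some work.

def consumer_nonblocking(idnum):
    while True:
        time.sleep(0.1)
        try:
            # Same as dataQueue.get_nowait(): raises queue.Empty if no data
            data = dataQueue.get(block=False)
        except queue.Empty:
            continue  # nothing available yet; poll again
        with safeprint:
            print('consumer ', idnum, ' got => ', data)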

The main block creates and starts the consumer threads first and then the producer threads. The producer threads call the producer function while the consumer threads call the consumer function. The dataQueue object does the job of synchronizing data between threads: the producer threads write to dataQueue and the consumer threads read from it. Thus, our program implements the producer / consumer pattern.

References

Lutz, Mark. Programming Python. Beijing: O'Reilly, 2013.

Python _thread Basic

Python 3 provides the higher-level threading module, but the low-level _thread module still exists for developers who are more comfortable with the Python 2.x thread API. This is a basic example derived from Programming Python: Powerful Object-Oriented Programming that demonstrates how to create threads using the _thread module.

Code

import _thread as thread, time


# This function will run in a new thread
def counter(tid, count):
    for i in range(count):
        # Simulate a delay
        time.sleep(1)
        # Print out the thread id (tid) and the current iteration
        # of our for loop
        print('[{}] => {}'.format(tid, i))


if __name__ == '__main__':
    # Enter a loop that creates 5 threads
    for i in range(5):
        # Start a new thread, passing a callable and its arguments
        # in the form of a tuple
        thread.start_new_thread(counter, (i, 5))

    time.sleep(6)
    print('Main thread exiting')

Explanation

This script creates five new threads using _thread.start_new_thread. Each time a new thread is created, the counter function is called and passed the tuple (i, 5). That tuple corresponds to the tid and count parameters of counter. counter enters a loop that runs five times, since 5 was passed as the second argument, and prints the thread id and the current iteration of the loop.

Meanwhile, the for loop in the main thread continues to iterate because thread.start_new_thread does not block. By calling start_new_thread, the program runs both the for loop in the main thread and the counter function in parallel. Allowing programs to run multiple portions of code at the same time is what gives threads their power. For example, you may wish to use a thread to handle a long-running database query while the user continues to interact with the program in the main thread.
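For comparison, here is roughly the same program written with the higher-level threading module (my own sketch, not from the book). Calling join() on each thread lets the main thread wait for the counters to finish instead of guessing with time.sleep(6).

import threading, time


def counter(tid, count):
    for i in range(count):
        time.sleep(1)
        print('[{}] => {}'.format(tid, i))


if __name__ == '__main__':
    threads = [threading.Thread(target=counter, args=(i, 5)) for i in range(5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for each counter thread instead of sleeping
    print('Main thread exiting')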

One final note about threads in Python. Threads give the appearance of allowing programs to multitask and for all intents and purposes, that is what is happening in the program. Nevertheless, what is really happening is that the Python Virtual Machine is time slicing the computer instructions and allowing a few lines of code to run before switching to another set of instructions.

In other words, if a program has three threads, A, B, and C, then Thread A runs for a few moments, then Thread B, and finally Thread C. Note that there is no guarantee about the order in which threads run: one thread may run more often than the others, and the order may differ each time the program runs.

References

Lutz, Mark. Programming Python. Beijing: O'Reilly, 2013.

Use Python to Launch Another Program

Many Python programs operate as wrappers to other programs. For example, there are a wide variety of Python programs that work as a GUI wrapper for CLI programs. This tutorial uses an example from Programming Python: Powerful Object-Oriented Programming to demonstrate how a Python program can be used to launch another program.

Code

Parent Process

This code represents a parent process that is capable of launching a child process. I added my own comments to help further explain the code.

import os

# Track the number of spawned children
spawned_children = 0

# Enter an infinite loop
while True:
    
    # Increment the number of children
    spawned_children += 1
    
    # Now launch a new process
    # If we are still in the parent process, pid is non-zero
    # otherwise, pid is zero and indicates that we are in the
    # child process
    pid = os.fork()
    
    # Test if we are in the child process
    if pid == 0:
        
        # Now, since we are in the new process, we use
        # os.execlp to launch a child script.
        os.execlp('python', 'python', 'child.py', str(spawned_children))
        assert False, 'error starting program'
    else:
        # If we are here, then we are still in the parent process
        print('Child is', pid)
        if input() == 'q': break

Child Process

This short script represents a sample child process.

import os, sys

# Parent passes the child process number
# as a command line argument
print('Hello from child', os.getpid(), sys.argv[1])

Explanation

We have two core concepts used in this program. The first concept is forking the parent process into a child process. The other concept is actually launching a new program.

Forking

Forking is the process of launching a copy of the running program. The program that launched a copy of itself is called the parent process, while the copy of the program is called the child process. In theory, a parent and its children can spawn as many children as the underlying operating system allows.

The example program spawns a child process by calling os.fork() and assigns the result to the variable pid. When os.fork() returns, pid is zero if we are running in the child process and non-zero (the child's process id) if we are still in the parent. The program uses an if / else statement to perform different operations depending on whether it is in the parent or the child process.

The parent branch simply prints out the child's process id and then asks the user whether they wish to continue. Entering 'q' exits the parent process; any other input loops around and spawns another child.

The child branch uses os.execlp to start the companion child script, passing the number of spawned children as a command line argument. The child script prints its process id along with that number and then exits.

Launching Programs

The Python os module provides a variety of ways to launch other programs from a Python script. This example makes use of os.execlp, which looks for an executable on the operating system's path. The first argument is the name of the program to look up, and the remaining arguments become the new program's argument list, which by convention begins with the program name again.
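The exec family also includes list-based variants such as os.execvp, which can be more convenient when the argument list is built at runtime. The snippet below is a hypothetical illustration that assumes the companion child.py script from this example is present.

import os

# argv[0] is conventionally the program name; the rest are its arguments
args = ['python', 'child.py', '3']
os.execvp(args[0], args)  # replaces the current process if it succeeds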

os.execlp is not supposed to return to the caller: if it succeeds, the calling process is replaced by the new program. This is why it's necessary to call os.fork() before using os.execlp (unless you want your program to simply be replaced by the new one). When the example script calls os.execlp, the child process created by os.fork is replaced by the program launched by os.execlp.
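To make the relationship concrete, here is a condensed, hypothetical fork/exec sketch in which the parent blocks in os.waitpid until the launched program exits, so finished children are reaped rather than left as zombies.

import os

pid = os.fork()
if pid == 0:
    # Child: replace this process with the new program
    os.execlp('python', 'python', 'child.py', '1')
    assert False, 'error starting program'
else:
    # Parent: wait for the child to finish and collect its exit status
    finished_pid, status = os.waitpid(pid, 0)
    print('Child', finished_pid, 'exited with status', status)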

The assert statement after the os.execlp call handles the case where os.execlp fails. Should os.execlp fail, control returns to the caller, and in most cases a developer should notify the user that launching the new program has failed. The example script accomplishes this with an assert False statement carrying an error message.

References

Lutz, Mark. Programming Python. Beijing: O'Reilly, 2013.

Python 3 Os File Tools

The Python os module has a number of useful file commands that allow developers to perform common file tasks such as changing file permissions, renaming files, or even deleting files. The following snippets are modified examples from Programming Python: Powerful Object-Oriented Programming.

os.chmod

os.chmod alters a file's permissions. The example usage below takes two arguments. The first argument is the path to the file and the second argument is an integer, usually written as an octal literal such as 0o777, whose nine permission bits make up the new file mode.

os.chmod('belchers.txt', 0o777)
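If you prefer symbolic names over octal literals, the stat module provides constants that can be combined with the bitwise OR operator. The line below is an equivalent way to express mode 0o644 (owner read/write, group read, others read).

import os, stat

os.chmod('belchers.txt', stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IROTH)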

os.rename

os.rename is used to give a file a new name. The first argument is the current name of the file and the second argument is the new name of the file.

os.rename('belchers.txt', 'pestos.txt')

os.remove

os.remove deletes a file. It takes the path of the file to delete.

os.remove('pestos.txt')

os.path.isdir

os.path.isdir accepts a path to a file or directory. It returns True if the path is a directory and False otherwise.

os.path.isdir('/home') #True
os.path.isdir('belchers.txt') #False

os.path.isfile

os.path.isfile works like os.path.isdir, but for regular files.

os.path.isfile('belchers.txt') #True
os.path.isfile('/home') #False

os.path.getsize

os.path.getsize returns the size of a file in bytes. It accepts the path to the file as an argument.

os.path.getsize('belchers.txt')
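As a small illustration of how these tools fit together, the sketch below (my own example, assuming the current directory) lists each entry in a directory, labels it as a file or directory, and prints file sizes in bytes.

import os

target = '.'
for name in os.listdir(target):
    path = os.path.join(target, name)
    if os.path.isdir(path):
        print('{:<30} directory'.format(name))
    elif os.path.isfile(path):
        print('{:<30} {} bytes'.format(name, os.path.getsize(path)))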

References

Lutz, Mark. Programming Python. Beijing: O'Reilly, 2013.

Spring Boot Caching with Kotlin

It's fairly common for applications to ask a datastore for the same information repeatedly. Requests to a datastore consume application resources and thus have a performance cost, even when the requested data is small. The Spring platform provides a solution that lets applications store information in an in-memory cache and check the cache for the required data before making a call to the database. This example shows how to use Spring Boot and Kotlin to cache files that we are storing in the database.

Database Entity

We are going to define a database entity that stores files in a database. Since retrieving such data can be an expensive call to the database, we are going to cache this entity.

@Entity
data class PersistedFile(
        @field:Id @field:GeneratedValue var id: Long = 0,
        var fileName: String = "",
        var mime: String = "",
        @field:Lob var bytes: ByteArray? = null)

You will notice that this class has a ByteArray field that is stored as a LOB in the database. In theory, this could be as many bytes as the system allows, so ideally we would store it in the cache. Other good candidates are entity classes with complex object graphs, where the ORM may generate complex SQL to retrieve the managed object.

Enable Caching

Spring Boot defines a CacheManager internally for the application. You are free to use your own, but you need to configure your Spring Boot environment first.

Dependencies

You need to have spring-boot-starter-cache in your pom.xml or other dependency manager.


<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-cache</artifactId>
</dependency>

Annotation

You also need to tell the environment to turn on caching by using the @EnableCaching annotation.

@SpringBootApplication
@EnableJpaRepositories
@EnableCaching  //Spring Boot provides a CacheManager out of the box,
                //but it only turns on when this annotation is present
class CachingTutorialApplication

Decorate the Caching Methods

At this point, we only need to decorate the methods whose results we want the environment to cache. This is done by annotating our methods with @Cacheable and providing the annotation with the name of a cache. We can also optionally tell the cache manager what to use for the key. Here is the code for our service class, followed by an explanation.

//We are going to use this class to handle caching of our PersistedFile object
//Normally, we would encapsulate our repository, but we are leaving it public to keep the example short
@Service
class PersistedFileService(@Autowired val persistedFileRepository: PersistedFileRepository){

    //This annotation will cause the cache to store a persistedFile in memory
    //so that the program doesn't have to hit the DB each time for the file.
    //This will result in faster page load times. Since we know that managed objects
    //have unique primary keys, we can just use the primary key for the cache key
    @Cacheable(cacheNames = arrayOf("persistedFile"), key="#id")
    fun findOne(id : Long) : PersistedFile = persistedFileRepository.findOne(id)

    //This annotation will cause the cache to store persistedFile ids
    //By storing the ids, we don't need to hit the DB to know if a file exists first
    @Cacheable(cacheNames = arrayOf("persistedIds"))
    fun exists(id: Long?): Boolean = persistedFileRepository.exists(id)
}

The first method, findOne, looks up a PersistedFile object from the database. You will notice that we pass persistedFile as the cacheNames argument and use the primary key as the key for this item's cache entry. We can use the primary key because we know it's a unique value, which helps keep the cache efficient. Keep in mind, however, that the key is optional.

We can also avoid another call to the database by caching whether an item exists in the database. The first time exists() is called, the application issues a count SQL statement against the database. On subsequent calls, the cache simply returns true or false depending on what is stored for that key.

Putting it all together

I put together a small web application that demonstrates the caching in action. I turned on the show-sql property in the application.properties file so that viewers can see when the application is making calls to the database. You will notice that the first time I retrieve the persisted file, SQL is generated. However, on the second call for the same object, no SQL is generated because the application isn't making a call to the database.

You can get the complete code from my GitHub page at this link.

Here are some links to posts related to the Spring Boot concepts we used today.