Kotlin Compare Paths

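The Path interface declares equals() and compareTo() (Path extends Comparable<Path>), which allows us to compare paths in the file system. In Kotlin, == calls equals(), while <, >, <= and >= call compareTo().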

import java.nio.file.Paths

fun main(args : Array<String>){
    //Get two references to the home directory
    val home = Paths.get(System.getProperty("user.home"))
    val otherHome = Paths.get(System.getProperty("user.home"))
    
    //Get the current working directory
    val cwd = Paths.get(System.getProperty("user.dir"))

    println("$home == $otherHome is " + (home == otherHome))
    println("$home == $cwd is " + (home == cwd))
    println("$home < $cwd is " + (home < cwd))
    println("$cwd < $home is " + (cwd < home))
    println("$home >= $otherHome is " + (home >= otherHome))
}

Output

/Users/stonesoup == /Users/stonesoup is true
/Users/stonesoup == /Users/stonesoup/IdeaProjects/OCJAP is false
/Users/stonesoup < /Users/stonesoup/IdeaProjects/OCJAP is true
/Users/stonesoup/IdeaProjects/OCJAP < /Users/stonesoup is false
/Users/stonesoup >= /Users/stonesoup is true

compareTo(other : Path) : Int

The Javadoc states that two paths are compared lexicographically and that the ordering is defined by the file system provider. For the default provider, the ordering is platform specific. The method does not access the file system, so neither path needs to exist.

equals(other : Any?) : Boolean

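If the other object is not a Path, or is a Path associated with a different file system, equals() returns false. Otherwise, whether two paths are equal depends on the file system implementation, and case sensitivity may apply. With the default provider on Windows, “belchers.txt” and “Belchers.TXT” compare as equal, but on Linux they do not. Like compareTo(), equals() only compares the path names and never accesses the file system. As a result, it does not account for a case-insensitive disk such as the default macOS file system. Files.isSameFile() checks whether two existing paths locate the same file.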
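Here is a short sketch of both methods. It uses relative, hypothetical paths (reports/belchers.txt and a few bare names), which works because neither equals() nor compareTo() looks at the disk. The sketch shows two things. First, a "." element makes two otherwise identical paths unequal until normalize() removes it. Second, sorting a list of paths relies on compareTo().

```kotlin
import java.nio.file.Paths

fun main() {
    // Both paths name the same file, but "." is kept as a name element
    val plain = Paths.get("reports", "belchers.txt")
    val dotted = Paths.get("reports", ".", "belchers.txt")

    println(plain == dotted)               // false: equals() compares name elements
    println(plain == dotted.normalize())   // true once "." has been removed

    // Path implements Comparable<Path>, so sorted() can order paths with compareTo()
    val kids = listOf(Paths.get("tina"), Paths.get("gene"), Paths.get("louise"))
    println(kids.sorted())                 // [gene, louise, tina]
}
```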

References

https://docs.oracle.com/javase/8/docs/api/java/nio/file/Path.html

Kotlin Path Interface

The Path interface is part of the java.nio.file package, and Kotlin can use it directly through Java interop. A Path represents a location on the underlying file system and provides a number of useful utility methods for working with paths. Here is an example program followed by its output and an explanation.

import java.io.BufferedWriter
import java.io.FileWriter
import java.nio.file.Paths

val belchers = "belchers.txt"

fun makeBelcherFile(){
    val names = listOf("Bob", "Linda", "Tina", "Gene", "Louise")
    BufferedWriter(FileWriter(belchers)).use { writer ->
        names.forEach { name ->
            with(writer){
                write(name)
                newLine()
            }
        }
    }
}

fun main(args : Array<String>){
    //Just make a file on the file system for demonstration purposes
    makeBelcherFile()

    //Get a reference to our example path on the disk.
    //In this case, we are using Paths.get() to get a reference to the current working directory
    //and then using the resolve() method to add the belchers.txt file to the path
    val belcherPath = Paths.get(System.getProperty("user.dir")).resolve(belchers)

    val template = "\t%-30s => %s"

    with(belcherPath){
        println("File Information")
        //The fileName property returns the name of the file
        println(template.format("File Name", fileName))

        //The root property returns the root component of the path (null for a relative path)
        println(template.format("File Path Root", root))

        //The parent property returns the parent folder of the file
        println(template.format("File Path Parent", parent))

        //The nameCount property returns how many name elements are in the path
        println(template.format("Name Count", nameCount))

        //The subpath() method returns a portion of the path
        println(template.format("Subpath (0, 1)", subpath(0, 1)))

        //The normalize() method removes redundant elements such as . or .. from the path
        println(template.format("Normalizing", normalize()))

        //True if this is an absolute path otherwise false
        println(template.format("Is Absolute Path?", isAbsolute))

        //Convert to an absolute path if needed
        println(template.format("Absolute Path", toAbsolutePath()))

        //Check if the path starts with a path. In this example, we are using the home folder
        println(template.format("Starts with ${System.getProperty("user.home")}?", startsWith(System.getProperty("user.home"))))
        println()

        println("Elements of the Path")
        //We can print each portion of the path individually also!
        forEach { println("\tPortion => $it") }
    }
}

Here is the output when run on my machine.

File Information
	File Name                      => belchers.txt
	File Path Root                 => /
	File Path Parent               => /Users/stonesoup/IdeaProjects/OCJAP
	Name Count                     => 5
	Subpath (0, 1)                 => Users
	Normalizing                    => /Users/stonesoup/IdeaProjects/OCJAP/belchers.txt
	Is Absolute Path?              => true
	Absolute Path                  => /Users/stonesoup/IdeaProjects/OCJAP/belchers.txt
	Starts with /Users/stonesoup?  => true

Elements of the Path
	Portion => Users
	Portion => stonesoup
	Portion => IdeaProjects
	Portion => OCJAP
	Portion => belchers.txt

Explanation

The program writes out a basic text file to the file system for demonstration purposes, but we are going to focus on the main function. Our first task is to get a Path object that points to our belchers.txt file. We use the Paths.get() factory method and pass in a path on the file system. In our example, we use the current working directory, which we get from the System property “user.dir”.

We could have simply appended belchers.txt to the current working directory string. However, I wanted to demonstrate the resolve() method, which combines two paths into a single path. So we chain resolve() to the returned Path object and pass it belchers.txt. The resulting Path object points to belchers.txt in the current working directory.
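The rules of resolve() fit in a few lines. In this sketch, menus and burger-of-the-day.txt are hypothetical names. resolve() only combines path text, so none of these files needs to exist. The last line shows the one special case: when the argument is already absolute, resolve() returns it unchanged.

```kotlin
import java.nio.file.Paths

fun main() {
    val cwd = Paths.get(System.getProperty("user.dir"))

    // A relative argument is appended to the current path
    println(cwd.resolve("belchers.txt"))
    // Calls can be chained to build deeper paths
    println(cwd.resolve("menus").resolve("burger-of-the-day.txt"))

    // An absolute argument is returned unchanged
    val elsewhere = Paths.get("other.txt").toAbsolutePath()
    println(cwd.resolve(elsewhere) == elsewhere)   // true
}
```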

The next part of the program demonstrates commonly used methods found on the Path interface. Line 33 prints the name of the file by using the fileName property. Next we print out the root of the path by using the root property (line 36). When we want to know the parent of a path, we can use the parent property (line 39).

The Path interface has a nameCount property (line 42) that returns the number of name elements in a path. So if a path is /Users/stonesoup/IdeaProjects/OCJAP/belchers.txt, nameCount returns 5, one for each element separated by the slash (/) character. The root is not counted. The nameCount is useful when working with the subpath() function (line 45). That function accepts a start index (inclusive) and an end index (exclusive) and returns a Path object made of the elements between those indexes.
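A sketch of how nameCount and subpath() fit together follows. It uses a relative version of the example path so it runs on any machine. It also uses Path.getName(index), a real Path method that the program above does not call, to print each element by its index.

```kotlin
import java.nio.file.Paths

fun main() {
    // A relative version of the path from the example, so it runs anywhere
    val path = Paths.get("Users", "stonesoup", "IdeaProjects", "OCJAP", "belchers.txt")

    // nameCount is the number of name elements; getName(i) returns element i
    for (i in 0 until path.nameCount) {
        println("$i => ${path.getName(i)}")
    }

    // subpath(begin, end): begin is inclusive, end is exclusive
    println(path.subpath(1, 3))                               // stonesoup/IdeaProjects on Unix
    println(path.subpath(path.nameCount - 1, path.nameCount)) // belchers.txt
}
```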

Sometimes paths contain redundant elements such as “.” (the current directory) or “..” (the parent directory). When we want to remove them, we use the normalize() function (line 48), which returns a path with these redundant elements removed. normalize() works purely on the path text and does not check the file system. Depending on the work we are doing, we may want to test whether a Path is a relative path or an absolute path. The Path interface has an isAbsolute property (line 51) for this purpose. It returns true if the path is absolute and false otherwise.
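Here is how normalize() treats "." and ".." in a hypothetical relative path. A leading ".." has nothing before it to cancel, so it stays. The Javadoc also warns about symbolic links: because normalize() never reads the disk, removing "x/.." can change which file the path locates if x is a symbolic link.

```kotlin
import java.nio.file.Paths

fun main() {
    // "." is dropped, and "OCJAP/.." cancels out
    val messy = Paths.get("IdeaProjects", ".", "OCJAP", "..", "OCJAP", "belchers.txt")
    println(messy.normalize())                             // IdeaProjects/OCJAP/belchers.txt on Unix

    // A leading ".." in a relative path is kept
    println(Paths.get("..", "belchers.txt").normalize())   // ../belchers.txt on Unix
}
```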

Should we wish to convert a relative path into an absolute path, we only need to call the toAbsolutePath() function (line 54). We can also check whether a path starts with a certain path. In our example, on line 57, we check whether our path starts with the user's home directory (user.home). It returns true or false based on the outcome.
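A small sketch of both functions follows, using hypothetical relative paths. toAbsolutePath() resolves a relative path against the current working directory. startsWith() compares whole name elements rather than characters, so a partial folder name does not match. Use the path's toString() if a plain text prefix check is what you want.

```kotlin
import java.nio.file.Paths

fun main() {
    // A relative path is resolved against the current working directory
    val relative = Paths.get("belchers.txt")
    val absolute = relative.toAbsolutePath()
    println(relative.isAbsolute)   // false
    println(absolute.isAbsolute)   // true

    // startsWith() compares whole name elements, not characters
    val menus = Paths.get("menus", "burgers.txt")
    println(menus.startsWith("menus"))            // true
    println(menus.startsWith("men"))              // false
    println(menus.toString().startsWith("men"))   // true: plain String comparison
}
```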

Path implements Iterable<Path>, so it supports the forEach() function. Line 62 shows how we can iterate through each part of the Path. The it variable holds each portion of the path, and the program prints each one.
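Because Path is an Iterable<Path>, the other Kotlin collection functions work on it too, not just forEach(). In this sketch, elementNames() is a hypothetical helper that turns each element into a String with map(). The any() call then searches the elements.

```kotlin
import java.nio.file.Path
import java.nio.file.Paths

// Path implements Iterable<Path>, so map() visits each name element
fun elementNames(path: Path): List<String> = path.map { it.toString() }

fun main() {
    val path = Paths.get("Users", "stonesoup", "IdeaProjects", "OCJAP", "belchers.txt")
    println(elementNames(path))                           // [Users, stonesoup, IdeaProjects, OCJAP, belchers.txt]
    println(path.any { it.toString() == "IdeaProjects" }) // true
}
```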

Common Methods

We discussed each method as it relates to the program above. Here is a breakdown of each of the commonly used methods.

Paths.get(first : String, vararg more : String) : Path

The get() method converts a String (or a URI in the overloaded version) into a Path object. When we use the vararg part, the strings are joined using the file system's name separator. So Unix paths will have a forward slash, while Windows paths will have a backslash.

val home = Paths.get(System.getProperty("user.home"))
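To see the separator in action, this sketch joins three hypothetical names with Paths.get(). It then compares the result with the same names joined by File.separator, which is the platform's separator string.

```kotlin
import java.io.File
import java.nio.file.Paths

fun main() {
    // The extra arguments are joined with the platform's name separator
    val joined = Paths.get("Users", "stonesoup", "belchers.txt")
    println(joined)   // Users/stonesoup/belchers.txt on Unix, Users\stonesoup\belchers.txt on Windows

    val manual = listOf("Users", "stonesoup", "belchers.txt").joinToString(File.separator)
    println(joined.toString() == manual)   // true
}
```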

parent : Path

The parent property returns a Path object that points to the parent of the current Path object, or null if the path has no parent.

val parent = home.parent

nameCount : Int

The nameCount property returns the number of name elements in the path. The root, if any, is not counted.

val count = home.nameCount

subpath(beginIndex : Int, endIndex : Int) : Path

The subpath() method returns a portion of the path object. The beginIndex is inclusive while the endIndex is exclusive.

val part = home.subpath(0, 2)

normalize() : Path

The normalize() method returns a Path object with redundant name elements such as . and .. removed.

val norm = home.normalize()

resolve(other : Path) : Path, resolve(other : String): Path

Returns a Path object that combines the current path with the other parameter. If other is an absolute path, resolve() simply returns other.

val belchers = home.resolve("belchers.txt")

isAbsolute : Boolean

True if the Path is an absolute path otherwise it’s false.

val absolute = home.isAbsolute

startsWith(path : String) : Boolean, startsWith(path : Path) : Boolean

True if the current path starts with the supplied path argument.

val hasRoot = home.startsWith("/")

toAbsolutePath() : Path

Returns a Path object that is the absolute path of the current Path.

val abs = home.toAbsolutePath()

References

https://docs.oracle.com/javase/8/docs/api/java/nio/file/Path.html

Kotlin Object Serialization

Whenever a class implements Serializable, its objects are candidates for object serialization. The serialization mechanism converts an object into bytes and then writes those bytes to an output stream. We use the ObjectOutputStream class to serialize an object to a file and the ObjectInputStream class to restore it.

import java.io.FileInputStream
import java.io.FileOutputStream
import java.io.ObjectInputStream
import java.io.ObjectOutputStream

fun main(args : Array<String>){
    //Destination File
    val file = "belchers.burgers"

    //A map of family
    val family = mapOf(
            "Bob" to "Father",
            "Linda" to "Mother",
            "Tina" to "Oldest",
            "Gene" to "Middle",
            "Louise" to "Youngest")

    //Write the family map object to a file
    ObjectOutputStream(FileOutputStream(file)).use{ it -> it.writeObject(family)}

    println("Wrote $file")
    println()
    println("Time to read $file back")

    //Now time to read the family back into memory
    ObjectInputStream(FileInputStream(file)).use { it ->
        //Read the family back from the file
        val restedFamily = it.readObject()

        //Cast it back into a Map
        when (restedFamily) {
            //We can't use <String, String> because of type erasure
            is Map<*, *> -> println(restedFamily)
            else -> println("Deserialization failed")
        }
    }
}

The example program writes a map of strings to a file using object serialization. It begins by creating a map of test data on lines 11-16. Line 19 opens the file by creating a FileOutputStream object and passing in the file name to the constructor. The FileOutputStream object gets passed to the newly created ObjectOutputStream. We apply the use() function to make sure all resources are closed when finished.

Writing the map to the file is painless. All we need to do is use the writeObject() method found on ObjectOutputStream, shown on line 19. The class does all of the work of flattening the family Map object into bytes and writing the bytes to the file. The use() function closes the file and the serialization process is complete.

Reading the object back into memory is almost as simple. We open the file by creating a new FileInputStream object and supplying the constructor with the file name. The FileInputStream object is supplied to the constructor of the ObjectInputStream and we chain it to the use() function to make sure the file gets closed when finished.

The object is restored with the readObject() method, but there is a catch. The readObject() method returns Any (Object on the Java side), so it's our job to downcast to the proper type. On line 31, we use a when expression, and on line 33, we check that the object is a Map. Map is a generic interface and its type arguments are erased at runtime, so we cannot check for Map<String, String>. Instead, we use <*, *> for the type arguments. At this point, we can work with the restedFamily object as a Map.
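The same round trip can be done without a file by pairing ObjectOutputStream with a ByteArrayOutputStream. This is a minimal sketch, and toBytes(), fromBytes() and asStringMap() are hypothetical helper names. asStringMap() shows one way to get back a typed Map<String, String>: it checks each key and value, since the Map<*, *> check alone cannot verify the erased type arguments.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.ObjectInputStream
import java.io.ObjectOutputStream

// Serialize an object into an in-memory byte array.
// writeObject() throws NotSerializableException if obj is not Serializable.
fun toBytes(obj: Any): ByteArray {
    val buffer = ByteArrayOutputStream()
    ObjectOutputStream(buffer).use { it.writeObject(obj) }
    return buffer.toByteArray()
}

// Restore the object; the result is still typed as Any?
fun fromBytes(bytes: ByteArray): Any? =
    ObjectInputStream(ByteArrayInputStream(bytes)).use { it.readObject() }

// Check every key and value before trusting the map as Map<String, String>
fun asStringMap(obj: Any?): Map<String, String>? {
    if (obj !is Map<*, *>) return null
    val result = mutableMapOf<String, String>()
    for ((key, value) in obj) {
        if (key !is String || value !is String) return null
        result[key] = value
    }
    return result
}

fun main() {
    val family = mapOf("Bob" to "Father", "Linda" to "Mother", "Tina" to "Oldest")
    val restored = asStringMap(fromBytes(toBytes(family)))
    println(restored)            // {Bob=Father, Linda=Mother, Tina=Oldest}
    println(restored == family)  // true
}
```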

Kotlin Data Streams

Data streams are used to read and write binary data. The DataOutputStream class writes primitive types in a portable binary format, while DataInputStream reads that data back from the binary stream and converts it to primitive types. Here is an example program that writes data to a file and then reads it back into memory.

import java.io.DataInputStream
import java.io.DataOutputStream
import java.io.FileInputStream
import java.io.FileOutputStream

fun main(args : Array<String>){
    val burgers = "data.burgers"

    //Open the file in binary mode
    DataOutputStream(FileOutputStream(burgers)).use { dos ->
        with(dos){
            //Notice we have to write our data types
            writeInt("Bob is Great\n".length) //Record length of the array
            writeChars("Bob is Great\n") //Write the array
            writeBoolean(true) //Write a boolean

            writeInt("How many burgers can Bob cook?\n".length) //Record length of array
            writeBytes("How many burgers can Bob cook?\n") //Write the array
            writeInt(Int.MAX_VALUE) //Write an int

            for (i in 0..5){
                writeByte(i) //Write a byte
                writeDouble(i.toDouble()) //Write a double
                writeFloat(i.toFloat()) //Write a float
                writeInt(i) //Write an int
                writeLong(i.toLong()) //Write a long
            }
        }
    }

    //Open a binary file in read mode. It has to be read in the same order
    //in which it was written
    DataInputStream(FileInputStream(burgers)).use {dis ->
        with (dis){
            val bobSize = readInt() //Read back the size of the array
            for (i in 0 until bobSize){
                print(readChar()) //Print the array one character at a time
            }
            println(readBoolean()) //Read a boolean

            val burgerSize = readInt() //Length of the next array
            for (i in 0 until burgerSize){
                print(readByte().toChar()) //Print array one character at a time
            }
            println(readInt()) //Read an int

            for (i in 0..5){
                println(readByte()) //Read a byte
                println(readDouble()) //Read a double
                println(readFloat()) //Read a float
                println(readInt()) //Read an int
                println(readLong()) //Read a long
            }
        }

    }
}

The program creates a FileOutputStream object and passes the name of the file to its constructor. The FileOutputStream object is then passed to the constructor of DataOutputStream. We apply the use() function to ensure all resources are freed properly when we have finished. The file is now open for writing in binary mode.

When we wish to use the same object repeatedly, we can pass it to the with() function. In our case, we intend to keep using our DataOutputStream object, so on line 11, we pass it to the with() function. Inside of the with() function, all method calls will target the dos object because it was supplied to with().

Since we intend to write a string to the file, we first record the length of the string so that we know how many characters to read back later. We do this by passing the length of our string to writeInt(). Then we use writeChars(), which writes each character of the String as two bytes. Finally, we call writeBoolean() to write a true/false value to the file.

The next section repeats the first. We intend to write another string to the file, so we once again record the length of the string with writeInt(). On the next line, we use writeBytes() rather than writeChars(). writeBytes() writes each character as a single byte by discarding its high eight bits. That makes it safe only for characters in the range 0 to 255, such as plain ASCII text. Finally, we write another int value to the stream.
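To see the difference between writeChars() and writeBytes(), a small sketch can count the bytes each call produces. It writes into an in-memory ByteArrayOutputStream and asks DataOutputStream.size() for the total. bytesWritten() is a hypothetical helper. writeUTF() is included only for comparison. It is a separate DataOutputStream method that stores a two-byte length followed by the characters in modified UTF-8, so readUTF() can read the string back without a manual writeInt() length.

```kotlin
import java.io.ByteArrayOutputStream
import java.io.DataOutputStream

// Run one write call against a fresh in-memory stream and report its size in bytes
fun bytesWritten(write: (DataOutputStream) -> Unit): Int {
    val dos = DataOutputStream(ByteArrayOutputStream())
    write(dos)
    return dos.size()   // number of bytes written so far
}

fun main() {
    val text = "Bob is Great\n"                        // 13 characters
    println(bytesWritten { it.writeChars(text) })      // 26: two bytes per character
    println(bytesWritten { it.writeBytes(text) })      // 13: one byte per character
    println(bytesWritten { it.writeUTF(text) })        // 15: two-byte length + 13 ASCII bytes
    println(bytesWritten { it.writeInt(text.length) }) // 4: an Int is always four bytes
}
```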

Next, we enter a for loop on line 21. Inside of the for loop, we demonstrate writing different primitive types to the file. We can use writeByte() for a byte, writeDouble() for a double, and so on for each primitive type. The DataOutputStream class knows the size of each primitive type and writes the correct number of bytes for each primitive.

When we are done writing the file, we open it again to read it. Line 33 creates a FileInputStream object that accepts the path to the file in its constructor. The FileInputStream object is chained to DataInputStream by passing it to the constructor of DataInputStream. We apply the use() function to ensure all resources are properly closed.

The file must be read in the same order in which it was written. Our first order of business is to grab the length of the string we wrote to the file earlier. We use readInt() on line 35, followed by a for loop on line 36 that runs once for each character. Each iteration of the for loop calls readChar() and prints the character to the console. When we are finished, we read a boolean on line 39.

The next string was written as bytes. Once again, we need its length, so we call readInt() on line 41. Lines 42-44 loop through the bytes and call readByte() until the loop terminates. Each byte is converted to a character using toChar(). On line 45, we read an int using readInt().

The final portion of the program repeats the for loop found earlier. In this case, we enter a for loop (line 47) that runs six times, once for each value from 0 through 5, matching the loop that wrote the data. Inside the for loop, we call readByte(), readDouble(), readFloat(), and so on. Each value is printed to the console as it is restored.
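The length-prefix pattern used for the strings can be wrapped in a pair of extension functions, so the writing and reading code cannot drift apart. This sketch keeps the data in memory, with ByteArrayOutputStream and ByteArrayInputStream standing in for the file. writeCharString() and readCharString() are hypothetical names, not DataOutputStream or DataInputStream methods.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.DataInputStream
import java.io.DataOutputStream

// Write the length first, then each character as two bytes
fun DataOutputStream.writeCharString(s: String) {
    writeInt(s.length)
    writeChars(s)
}

// Read the length, then exactly that many characters
fun DataInputStream.readCharString(): String {
    val length = readInt()
    val sb = StringBuilder(length)
    repeat(length) { sb.append(readChar()) }
    return sb.toString()
}

fun main() {
    val buffer = ByteArrayOutputStream()
    DataOutputStream(buffer).use { dos ->
        dos.writeCharString("Bob is Great")
        dos.writeBoolean(true)
        for (i in 0..5) dos.writeDouble(i.toDouble())   // 0..5 is six values
    }

    // Read everything back in exactly the order it was written
    DataInputStream(ByteArrayInputStream(buffer.toByteArray())).use { dis ->
        println(dis.readCharString())           // Bob is Great
        println(dis.readBoolean())              // true
        println(List(6) { dis.readDouble() })   // [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    }
}
```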

Kotlin Scanner

The Scanner class is a powerful class that breaks an input stream into tokens and returns each one. It is often used on files, but it can also work with strings, network sockets, or just about any other character input source. The following program uses a Scanner object to split a file into words without punctuation. It then outputs the words from the most frequently used to the least frequently used.

import java.io.FileReader
import java.util.*

fun main(args : Array<String>){
    //Check if they supplied a file
    if(args.isEmpty()){
        println("Please provide a file")
        System.exit(-1)
    }

    //Create an empty map
    val wordMap = mutableMapOf<String, Int>()

    //Open the file and pass it to a Scanner object.
    Scanner(FileReader(args[0])).use { sc ->

        //Tell the scanner to only match entire words
        sc.useDelimiter("""\W""".toPattern())

        //Loop until we get to the end of the file
        while(sc.hasNext()){

            //Grab the next word
            val word = sc.next()

            //Test that it's not a blank string
            if(word.isNotBlank()){
                //Add it to the word map
                wordMap[word] = wordMap.getOrDefault(word, 0) + 1
            }
        }
    }

    //This prints the entries by most used words to least used words
    wordMap.entries.sortedByDescending { it.value }.forEach { println(it) }
}

The program starts by checking if the user provided command line arguments. If the args array is empty, the program exits after printing an error message. Line 12 creates an empty mutable map so that we can add words and counts to it. Individual words are used as the key while the Int is used for values to represent the number of times the word is found.

The file is opened on line 15. We create a FileReader object and pass the path of the file to its constructor. The file path is the first element in the arguments array and was supplied by the user. The FileReader object is passed to the constructor of the Scanner. We apply the use() function to ensure the Scanner and the underlying file are closed when we have finished.

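Line 18 sets the Scanner's delimiter by converting a regex string to a Pattern object. The regex “\W” matches a single non-word character (anything other than a letter, digit, or underscore). Spaces and punctuation therefore separate the tokens, and each token is a word. Two delimiters in a row, such as a comma followed by a space, can produce an empty token, which is why the program later skips blank strings. Kotlin lets us write raw strings inside triple quotes (""") so that we do not need to escape the backslash.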
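The effect of the delimiter is easy to try on a String, since Scanner also has a constructor that takes a String. In this sketch, tokens() is a hypothetical helper that returns every token the Scanner produces. It compares the program's \W with the alternative pattern \W+. The \W+ pattern treats a whole run of non-word characters as one delimiter, which avoids the empty tokens that the blank-string check filters out.

```kotlin
import java.util.Scanner

// Collect every token the Scanner finds using the given delimiter regex
fun tokens(text: String, delimiter: String): List<String> {
    val result = mutableListOf<String>()
    Scanner(text).use { sc ->
        sc.useDelimiter(delimiter)
        while (sc.hasNext()) {
            result.add(sc.next())
        }
    }
    return result
}

fun main() {
    val line = "I do not like them, Sam-I-am."
    // With \W, a comma followed by a space can yield an empty token, so filter blanks
    println(tokens(line, """\W""").filter { it.isNotBlank() })
    // With \W+, each run of punctuation and spaces is a single delimiter
    println(tokens(line, """\W+"""))   // [I, do, not, like, them, Sam, I, am]
}
```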

Line 21 enters a while loop that terminates when Scanner.hasNext() is false. That means we loop until there are no more matches in the input stream. Line 27 tests if the word is a blank string and if it isn’t a blank string, we update the word count on line 29.

Line 35 prints each word from the most used to the least used. It does this by taking the map's entries and sorting them in descending order. sortedByDescending() takes a selector lambda, { it.value }, and sorts the entries by the value it returns. In this case, it.value is the number of times a word was found. The final forEach() operation iterates through the sorted list of entries and prints each one to the console.

Here is my output when I ran this program on a brief excerpt from Green Eggs and Ham. Note that the counts are case sensitive, so “Would” and “would” are counted separately.

I=37
not=29
like=28
them=28
do=20
a=19
am=10
Sam=10
you=8
would=7
eggs=6
and=6
ham=6
Would=6
In=6
in=6
or=5
there=5
eat=5
Not=5
Here=4
house=4
mouse=4
with=4
You=4
That=3
Green=3
With=3
green=3
box=3
fox=3
car=3
Anywhere=2
here=2
anywhere=2
Eat=2
could=2
may=2
tree=2
Do=1
Could=1
they=1
are=1
will=1
see=1
let=1
me=1
be=1
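If case should not matter, one option is to lowercase each word before counting it. This sketch makes several assumptions. It requires Kotlin 1.5 or later for String.lowercase(). It reads from a String instead of a file so it is easy to try. It also uses the \W+ delimiter. countWords() is a hypothetical name.

```kotlin
import java.util.Scanner

// Count words case-insensitively and sort from most to least used
fun countWords(text: String): List<Map.Entry<String, Int>> {
    val wordMap = mutableMapOf<String, Int>()
    Scanner(text).use { sc ->
        sc.useDelimiter("""\W+""")
        while (sc.hasNext()) {
            val word = sc.next().lowercase()
            if (word.isNotBlank()) {
                wordMap[word] = wordMap.getOrDefault(word, 0) + 1
            }
        }
    }
    return wordMap.entries.sortedByDescending { it.value }
}

fun main() {
    countWords("Would you like them here or there? I would not like them here or there.")
        .forEach { println(it) }
}
```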

Kotlin Buffered Text Files

The BufferedReader and BufferedWriter classes improve the performance of reading and writing operations by adding an in-memory buffer to the streams. With a buffer, the program can reduce the number of calls to the underlying read and write streams and thus improve performance. Here is an example program that makes use of both BufferedReader and BufferedWriter.

import java.io.BufferedReader
import java.io.BufferedWriter
import java.io.File
import java.io.FileReader
import java.io.FileWriter

fun main(args : Array<String>){
    when (args.size){
        //Check for two command line arguments
        2 -> {
            //Grab source and destination files
            val src = args[0]
            val dest = args[1]

            //Check if the destination file exists. We can create it
            //if needed
            with (File(dest)){
                if(!exists()){
                    createNewFile()
                }
            }

            //Now, open the source file in read mode. The BufferedReader
            //provides buffering to improve performance
            BufferedReader(FileReader(src)).use { reader ->

                //Likewise, open the destination file in write mode
                //The BufferedWriter class provides buffering for performance
                BufferedWriter(FileWriter(dest)).use { writer ->

                    //Read through the source file one character at a time
                    var character = reader.read()
                    while(character != -1){

                        //Write the character to the destination file
                        writer.write(character)

                        //Read the next character.
                        character = reader.read()
                    }
                }
            }
        }
        else -> {
            println("Source file followed by destination file required")
            System.exit(-1)
        }
    }
}

The example program copies the source file to the destination file. We begin by using a when expression to check that we have exactly two command line arguments. If we have a source and a destination file, the program continues on line 12. Otherwise, it jumps down to line 45 and exits after printing an error message.

On lines 12 and 13, we grab the source and destination file names from the command line arguments. On line 17, we create a new File object and pass it to the with() function to see whether we need to make a new file for the destination. Line 18 calls the exists() method to check whether the file exists, and if it doesn't, line 19 creates the new file. Strictly speaking, this step is optional, because FileWriter creates the file if it does not already exist.

Starting at line 25, we open the source file and begin our copy operation. The file is opened by creating a new FileReader object and passing in the name of the source file. The FileReader object is then passed to the constructor of BufferedReader. We utilize the use() function to ensure that all resources are properly closed when we are finished with the read operation. It's also worth noting that we call the lambda parameter reader rather than it to improve code readability.

Line 29 opens the destination file for writing. We create a FileWriter (the writing counterpart of FileReader) and pass the name of the destination file to its constructor. The FileWriter object is passed to the BufferedWriter constructor to provide buffering support. Once again, we utilize the use() function to ensure that all resources are closed when finished.

The copy operation itself is fairly anticlimactic. We read the first character on line 32 and then enter a while loop that terminates when character == -1. Inside the while loop, we write the character to the destination file (line 36) and then read the next character (line 39). The use() function that was applied to both the BufferedReader and BufferedWriter objects closes the files when finished.
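Kotlin's standard library also offers helpers that handle the buffering and the copy loop for us. The sketch below relies on the kotlin.io extensions File.bufferedReader(), File.bufferedWriter() and Reader.copyTo(). Note that these extensions default to UTF-8, whereas FileReader and FileWriter in the program above use the default charset. copyText() is a hypothetical name.

```kotlin
import java.io.File

// Copy a text file through buffered streams; returns the number of characters copied
fun copyText(src: File, dest: File): Long =
    src.bufferedReader().use { reader ->
        dest.bufferedWriter().use { writer ->
            reader.copyTo(writer)   // reads and writes in blocks, not one character at a time
        }
    }

fun main(args: Array<String>) {
    if (args.size != 2) {
        println("Source file followed by destination file required")
        return
    }
    val copied = copyText(File(args[0]), File(args[1]))
    println("Copied $copied characters")
}
```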

The program can be run by using the following commands at the command line.

kotlinc BufferedCopy.kt -include-runtime -d BufferedCopy.jar
java -jar BufferedCopy.jar [source file] [dest file]

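When finished, the destination file will contain a copy of the source file's text. Because the program copies characters rather than bytes, it is intended for text files.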

Kotlin Reader Example

The abstract java.io.Reader class provides a low-level API for reading character streams. The example below uses its FileReader subclass to read text files.

import java.io.FileReader

fun main(args : Array<String>){
    if (args.isEmpty()){
        println("Please provide a list of files separated by spaces")
        System.exit(-1)
    }

    //Read each supplied file
    args.forEach { fileName ->

        //Open the file. The use() extension function sees to the details
        //of closing the file when finished
        FileReader(fileName).use {

            //Read a single character
            var character = it.read()

            //read() returns -1 at End of File
            while (character != -1){

                //Print the character (make sure to convert it to a Character)
                print(character.toChar())

                //Read the next character
                character = it.read()
            }
        }
    }
}

The example program requires the names of text files to be passed in as command line arguments, so our first task is to check whether we have any command line arguments. On line 4, we use the isEmpty() function on the args array to check for an empty array. If it is empty, we print a message to the user (line 5) and then exit the program (line 6).

Provided the program is still running, we print the contents of each file to the console. On line 10, we call forEach to process each file supplied at the command line. Rather than using the standard it variable name, we use fileName to make the code clearer.

Line 14 actually opens the file. We do this by creating a new FileReader object and passing the name of the file into its constructor. Then we chain the object to the use() extension function. The use() function sees to the details of closing the file when we are finished with it, even in the case of an exception.

The FileReader object is now referred to by the variable it. On line 17, we call it.read() to read a single character from the file and store it in the character variable. We then enter a while loop that terminates when character is -1, the value read() returns once we have reached the end of the file. Inside the while loop, we print the character (line 23). Since read() returns an Int, we have to call toChar() to print the actual character. Then on line 26, we update character to the next character in the stream.
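Reading one character per call is fine for a demonstration. However, Reader also has a read(CharArray) overload that fills a buffer and returns how many characters it read, or -1 at the end of the stream. Here is a sketch of the same loop reading in blocks. It uses a StringReader in place of the FileReader so it can be tried without a file. The name readAllInBlocks() and the block size of 8 are arbitrary, illustrative choices.

```kotlin
import java.io.Reader
import java.io.StringReader

// Read everything from a Reader, up to 8 characters per read() call
fun readAllInBlocks(reader: Reader): String {
    val out = StringBuilder()
    val block = CharArray(8)
    var count = reader.read(block)       // chars read, or -1 at end of stream
    while (count != -1) {
        out.append(String(block, 0, count))
        count = reader.read(block)
    }
    return out.toString()
}

fun main() {
    StringReader("Bob's Burgers is open for business").use { reader ->
        println(readAllInBlocks(reader))
    }
}
```

For whole text files, the kotlin.io extension File.readText() reads everything in a single call.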

Here is how I ran the program for those readers who wish to try it out.

kotlinc ReaderExample.kt -include-runtime -d readerExample.jar
java -jar readerExample.jar ReaderExample.kt

This invocation printed the example program to my console, but it works with any text file.

Python Split and Join file

The book Programming Python: Powerful Object-Oriented Programming has an example program that shows how to split and join files. Many utilities exist for such an operation but the program offers a good working example of how to read from and write to binary files in Python3. The code below is an adaptation from the book with my own comments added.

Code

import os


def split(source, dest_folder, write_size):
    # Make a destination folder if it doesn't exist yet
    if not os.path.exists(dest_folder):
        os.mkdir(dest_folder)
    else:
        # Otherwise clean out all files in the destination folder
        for file in os.listdir(dest_folder):
            os.remove(os.path.join(dest_folder, file))

    partnum = 0

    # Open the source file in binary mode
    input_file = open(source, 'rb')

    while True:
        # Read a portion of the input file
        chunk = input_file.read(write_size)

        # End the loop if we have hit EOF
        if not chunk:
            break

        # Increment partnum
        partnum += 1

        # Create a new file name
        filename = os.path.join(dest_folder, ('part%04d' % partnum))

        # Create a destination file
        dest_file = open(filename, 'wb')

        # Write to this portion of the destination file
        dest_file.write(chunk)

        # Explicitly close 
        dest_file.close()
    
    # Explicitly close
    input_file.close()
    
    # Return the number of files created by the split
    return partnum


def join(source_dir, dest_file, read_size):
    # Create a new destination file
    output_file = open(dest_file, 'wb')
    
    # Get a list of the file parts
    parts = os.listdir(source_dir)
    
    # Sort them by name (remember that the order num is part of the file name)
    parts.sort()

    # Go through each portion one by one
    for file in parts:
        
        # Assemble the full path to the file
        path = os.path.join(source_dir, file)
        
        # Open the part
        input_file = open(path, 'rb')
        
        while True:
            # Read all bytes of the part
            bytes = input_file.read(read_size)
            
            # Break out of loop if we are at end of file
            if not bytes:
                break
                
            # Write the bytes to the output file
            output_file.write(bytes)
            
        # Close the input file
        input_file.close()
        
    # Close the output file
    output_file.close()

Explanation

split

The code snippet shows two sample functions that either split a file into parts or join those parts back together into one file. The split function takes three parameters. The first parameter, source, is the file that we wish to split. The second parameter, dest_folder, is a folder that stores the output files created by the split operation. The final parameter, write_size, is the size of the file parts in bytes.

Split starts by checking if dest_folder exists or not. If the folder does not exist, we call os.mkdir to create a new folder on the file system. Otherwise, we obtain a list of all files in the folder by calling os.listdir and then remove all of them by calling os.remove. When calling os.remove, we use os.path.join to create a full path to the target file that’s getting deleted.

Once the destination folder has been prepared, the function continues by performing the actual split operation. A partnum variable is created on line 13 that tracks the number of file parts created by the split operation. The source file is opened on line 16 in binary mode. Binary mode is used because we could be dealing with audio or video files and not just text files.

The split function enters an infinite loop on line 18. On line 20, we read up to write_size bytes from the source file and store them in the chunk variable. On line 23, we test whether chunk actually received any bytes from the read operation. If it did not, we have hit end of file (EOF) and we break out of the loop. Otherwise, we increment partnum by one and begin to write the file part.

Line 30 creates the name and destination for the file part by using os.path.join, the dest_folder, and a string template that accepts the current part number. The destination file is created on line 33 with a call to open (also in binary mode), and then on line 36 we write chunk to the file. Line 39 explicitly closes the file. While we could normally rely on garbage collection to close files, this function opens a lot of files, so we should close each one ourselves to make sure we don’t exceed the number of file handles the underlying OS allows. The function ends by closing the input_file and returning the number of part files created.
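
For comparison, here is a minimal sketch of the same chunked split written with with blocks instead of explicit close() calls, so each part file is closed as soon as its block ends, even if a write fails. The function name split_with, the zero-padded naming scheme (part0001, part0002, and so on), and the assumption that dest_folder already exists and is empty are illustrative choices, not the post’s original code.

```python
import os


def split_with(source, dest_folder, write_size):
    """Split source into numbered part files of at most write_size bytes.

    Hypothetical variant: assumes dest_folder already exists and is empty.
    Returns the number of part files written.
    """
    partnum = 0
    # Binary mode so any kind of file (text, audio, video) is copied byte for byte
    with open(source, 'rb') as input_file:
        while True:
            chunk = input_file.read(write_size)
            # An empty bytes object means we have reached end of file
            if not chunk:
                break
            partnum += 1
            # Zero-padded numbers keep the part names in order when sorted as strings
            part_path = os.path.join(dest_folder, 'part{:04d}'.format(partnum))
            # The with block closes each part file right after it is written
            with open(part_path, 'wb') as part_file:
                part_file.write(chunk)
    return partnum
```

Called as split_with('belchers.mp3', 'parts', 1024 * 1024) (hypothetical names), it would write part0001, part0002, and so on into the parts folder, each at most one megabyte, with the last part holding whatever bytes remain.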

join

The join function does the reverse job of the split function. It begins by accepting a source_dir, a destination file, and the size of the part files. The output_file is created on line 50 (opened in binary mode) and then on line 53, we use os.listdir to get a list of all parts.

Since our part file names contain a number that identifies each part, we can store all of the names in a list and call sort() on it. Because sort() compares the names as strings, this reliably produces the correct order only when the numbers are zero-padded to a fixed width (part0002 rather than part2). Then it’s just a matter of looping through all of the parts and assembling them into a single file. The for loop starts on line 59. On line 62, we use os.path.join to create a full path to the part file, and then we open the part file on line 65.
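
A quick check of why the naming matters: sorting file names compares them character by character, so unpadded numbers fall out of order once there are ten or more parts. The part names below are hypothetical, and sorting on a numeric key is shown only as one possible alternative when the names cannot be padded.

```python
# Hypothetical part names without and with zero-padding
unpadded = ['part1', 'part2', 'part10']
padded = ['part0001', 'part0002', 'part0010']

# String comparison: at the fifth character '1' < '2', so 'part10' sorts before 'part2'
print(sorted(unpadded))  # ['part1', 'part10', 'part2']

# With a fixed width, string order matches numeric order
print(sorted(padded))  # ['part0001', 'part0002', 'part0010']

# If the names cannot be padded, sort on the numeric suffix instead
by_number = sorted(unpadded, key=lambda name: int(name[len('part'):]))
print(by_number)  # ['part1', 'part2', 'part10']
```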

The program enters an infinite while loop on line 67. Inside the loop, we read a block of bytes from input_file and store it in the bytes variable. If bytes is empty, we have hit end of file, so we test for this on line 72 and use break to end the while loop. Otherwise, we write the bytes to the output file on line 76.

When we have finished reading a part file, we again close it explicitly on line 79. When all parts have been read, we close the output_file. The output_file now contains the bytes of the original file that was split in the first place.
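
The same join logic can be sketched with with blocks, so both the output file and every part file are closed automatically. The name join_with is hypothetical, and the sketch assumes source_dir holds only part files whose names sort into the right order, with dest_file located outside that folder.

```python
import os


def join_with(source_dir, dest_file, read_size):
    """Reassemble the part files in source_dir into dest_file.

    Hypothetical variant: assumes source_dir contains only part files whose
    names sort into the correct order (for example, zero-padded numbers).
    Returns the number of parts joined.
    """
    parts = sorted(os.listdir(source_dir))
    with open(dest_file, 'wb') as output_file:
        for filename in parts:
            part_path = os.path.join(source_dir, filename)
            with open(part_path, 'rb') as input_file:
                while True:
                    # Read up to read_size bytes; an empty result means end of file
                    chunk = input_file.read(read_size)
                    if not chunk:
                        break
                    output_file.write(chunk)
    return len(parts)
```

Because read_size only controls how much is copied per read, it does not need to match the size used when the file was split.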

Thoughts

The code contained in this post isn’t ideal for production but is instead meant to be a learning tool. In this code, we cover reading and writing binary files and functions of the os module. There are areas where we could improve this code. For example, split destroys the contents of the destination folder; ideally, it should instead raise an exception back to the caller and let the caller decide whether to delete the files in the folder.
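
One way that improvement might look is a hypothetical helper, prepare_dest_folder, that creates the folder when it is missing but raises an exception instead of deleting anything already inside it. The choice of FileExistsError and NotADirectoryError is an assumption; any exception the caller expects would do.

```python
import os


def prepare_dest_folder(dest_folder):
    """Create dest_folder if needed, refusing to touch existing files.

    Hypothetical helper: raises instead of deleting whatever is already
    in the folder, leaving that decision to the caller.
    """
    if not os.path.exists(dest_folder):
        os.mkdir(dest_folder)
    elif not os.path.isdir(dest_folder):
        raise NotADirectoryError('destination is not a folder: ' + dest_folder)
    elif os.listdir(dest_folder):
        raise FileExistsError('destination folder is not empty: ' + dest_folder)
```

A caller that really does want to overwrite old parts can catch FileExistsError, clear the folder itself, and try again.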

We also don’t test if our input files are really files and if our folders are really folders. That is certainly an area for improvement. Another thing that could be improved upon is using an enumeration for the size of the file parts. Right now, write_size in split and read_size in join are specified in bytes, but that isn’t clear to clients of these functions.
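
Both suggestions can be sketched together. The PartSize enumeration and check_split_args function below are hypothetical names, not part of the original code; using IntEnum is an assumption that keeps the members usable anywhere an integer byte count is expected, including file.read().

```python
import os
from enum import IntEnum


class PartSize(IntEnum):
    """Hypothetical named sizes for file parts, measured in bytes."""
    KILOBYTE = 1024
    MEGABYTE = 1024 * 1024


def check_split_args(source, dest_folder, write_size):
    """Validate split arguments before doing any work (hypothetical helper)."""
    if not os.path.isfile(source):
        raise FileNotFoundError('source is not a file: ' + source)
    # A missing dest_folder is fine (split can create it), but not a file in its place
    if os.path.exists(dest_folder) and not os.path.isdir(dest_folder):
        raise NotADirectoryError('destination is not a folder: ' + dest_folder)
    if write_size <= 0:
        raise ValueError('write_size must be a positive number of bytes')
```

For example, check_split_args('belchers.mp3', 'parts', PartSize.MEGABYTE) makes the unit explicit at the call site and raises FileNotFoundError if belchers.mp3 does not exist.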

References

Lutz, Mark. Programming Python. Beijing, O’Reilly, 2013.

Python Line Scanner

This post borrows from a code example found in Programming Python: Powerful Object-Oriented Programming that demonstrates collecting command line arguments, opening a file, reading the file, and passing a function as a callback to another function.

Code

Here is the entire script that accepts a file as a command line argument and prints the contents of the file to the console.


def scanner(name, func):

    # Open the file (with statement ensures closure even if there is an exception)
    with open(name, 'r') as f:
        # Iterate through the file
        for line in f:
            # Call our callback function
            func(line)

if __name__ == '__main__':
    import sys
    name = sys.argv[1]

    # This is a function we are passing to scanner
    # Python has first class functions which can be
    # passed as arguments to other functions
    def print_line(line):
        print(line, end='')

    # Call the scanner function, which in turn
    # calls the print_line function for each line
    # in the file
    scanner(name, print_line)

Command Line Arguments

The first concept covered in this script is processing command line arguments. Python requires us to import the sys module (line 12), which provides the argv attribute. sys.argv is a list of strings holding all of the command line arguments. The first item, argv[0], is the name of the script, followed by all of the other arguments supplied to the program.

On line 13, we grab the target file name (stored in argv[1]) and keep it in a name variable. At this point, our program knows which file to open later on when we use the scanner function.
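
The script assumes a file name is always supplied; running it with no arguments makes sys.argv[1] raise IndexError. A minimal sketch of a guard, using a hypothetical get_filename helper that takes the argument list as a parameter so it can be tried without a real command line:

```python
import sys


def get_filename(argv):
    """Return the file name from a command line argument list.

    argv[0] is the script name, so the first real argument is argv[1].
    Hypothetical helper: exits with a usage message if no file was given.
    """
    if len(argv) < 2:
        sys.exit('usage: {} FILE'.format(argv[0]))
    return argv[1]


# Simulate running: python scanner.py belchers.txt
print(get_filename(['scanner.py', 'belchers.txt']))  # belchers.txt
```

In the real script, the call would be get_filename(sys.argv). sys.exit with a string prints the message to standard error and exits with status 1.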

First Class Functions

Python treats functions as objects. As such, we can define any function in a Python program and store it in a variable just like any other value. Lines 18-19 define a print_line function that accepts a string parameter. On line 24, print_line is passed as the second argument to the scanner function.

Once inside the scanner function, the print_line function is referenced by the variable func. On line 9, we call print_line through func(line) rather than print_line(line). This works because func and print_line both refer to the same function object in memory. Passing functions in this fashion is powerful because it lets callers plug a different behavior into scanner for each line it processes, without changing scanner itself.

For example, we could define a function that writes each line processed by scanner to a file rather than printing it to the console. Later on, we may choose to write another function that sends each line over the network via sockets. The beauty of the scanner function as defined is that it works the same regardless of the callback function passed to the func argument. This programming technique is sometimes known as programming to a behavior.
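
To make that concrete, here is a small sketch that reuses the same scanner shape with two different callbacks: a bound list method that collects lines, and a hypothetical count_chars function that tallies characters. The sample file and its contents are made up for the demonstration.

```python
import os
import tempfile


def scanner(name, func):
    # Same shape as the post's scanner: call func once per line
    with open(name, 'r') as f:
        for line in f:
            func(line)


# Callback 1: collect lines into a list
collected = []
# Callback 2: a function that updates a shared dict with a running total
stats = {'chars': 0}


def count_chars(line):
    stats['chars'] += len(line)


# Create a small sample file to scan
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('Bob\nLinda\nTina\n')
    path = tmp.name

scanner(path, collected.append)  # a bound method works as a callback too
scanner(path, count_chars)
os.remove(path)

print(collected)       # ['Bob\n', 'Linda\n', 'Tina\n']
print(stats['chars'])  # 15
```

scanner never changes; only the function handed to it does.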

Opening and Reading Files

The final topic covered is opening and reading a file. Line 5 in the script uses the with statement combined with the open function to open the file in read mode. The as f clause assigns the result of the open function to the variable f, which holds a Python file object.

Since Python file objects support the iterator protocol, they can be used in for loops. On line 7, we read through each line in the file with the statement for line in f:. On each execution of the loop, the line variable is updated with the next line in the file.

When the loop is complete, the with statement calls the file’s close() method automatically, even if an exception is raised. CPython’s reference counting usually closes an abandoned file object promptly as well, but other Python interpreters, such as PyPy, may not close it until a later garbage collection pass, so the with statement provides an extra level of safety.
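
The following sketch checks that claim directly: it raises an exception while iterating over a file inside a with block and then inspects the file object’s closed attribute. The temporary file and the simulated error are assumptions made for the demonstration.

```python
import os
import tempfile

# Create a small sample file to read (hypothetical contents)
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first\nsecond\n')
    path = tmp.name

message = None
try:
    with open(path, 'r') as f:
        for line in f:
            # Simulate a failure while processing the first line
            raise ValueError('bad line: ' + line.strip())
except ValueError as err:
    message = str(err)

print(message)   # bad line: first
# The with statement closed the file even though the loop raised
print(f.closed)  # True
os.remove(path)
```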

Conclusion

The most powerful takeaway from this example is first-class functions. Python treats functions like any other data type, which allows them to be stored and passed around the program as required. Using first-class functions keeps code loosely coupled and highly maintainable!

Sources

Lutz, Mark. Programming Python. Beijing, O’Reilly, 2013.

Python 3 Os File Tools

The Python os module has a number of useful file commands that allow developers to perform common file tasks such as changing file permissions, renaming files, or deleting files. The following snippets are modified examples from Programming Python: Powerful Object-Oriented Programming.

os.chmod

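os.chmod changes a file’s permissions. The example below passes two arguments: the path to the file and an integer mode. The mode is conventionally written as an octal literal such as 0o777, whose nine permission bits grant read, write, and execute access to the owner, the group, and everyone else (0o777 grants all three to everyone).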

os.chmod('belchers.txt', 0o777)
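
To see how those nine bits map to named permissions, the sketch below sets a tighter mode on a throwaway temporary file (standing in for belchers.txt) and builds the same value from constants in the stat module. The readback applies to POSIX systems such as Linux and macOS; on Windows, os.chmod can only set or clear the read-only flag.

```python
import os
import stat
import tempfile

# Work on a throwaway file instead of a real belchers.txt
fd, path = tempfile.mkstemp()
os.close(fd)

# 0o640: owner read/write, group read, others nothing
os.chmod(path, 0o640)

# The same mode built from named constants in the stat module
mode = stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP
print(mode == 0o640)  # True

# S_IMODE strips the file-type bits, leaving just the permission bits
actual = stat.S_IMODE(os.stat(path).st_mode)
print(oct(actual))  # 0o640 on POSIX systems
os.remove(path)
```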

os.rename

os.rename is used to give a file a new name. The first argument is the current name of the file and the second argument is the new name of the file.

os.rename('belchers.txt', 'pestos.txt')

os.remove

os.remove deletes a file. It takes the path of the target file to delete.

os.remove('pestos.txt')
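
The rename and remove snippets above assume belchers.txt already exists in the working directory. A self-contained sketch of the same two calls, run inside a temporary folder so no real files are touched (the file contents are made up):

```python
import os
import tempfile

# Work inside a temporary folder so no real files are touched
folder = tempfile.mkdtemp()
old_name = os.path.join(folder, 'belchers.txt')
new_name = os.path.join(folder, 'pestos.txt')

with open(old_name, 'w') as f:
    f.write('burger of the day\n')

os.rename(old_name, new_name)
after_rename = (os.path.exists(old_name), os.path.exists(new_name))
print(after_rename)  # (False, True)

os.remove(new_name)
after_remove = os.path.exists(new_name)
print(after_remove)  # False

# os.remove does not delete folders; os.rmdir removes the now-empty one
os.rmdir(folder)
```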

os.path.isdir

os.path.isdir accepts a path to a file or directory. It returns True if the path refers to an existing directory, otherwise False.

os.path.isdir('/home') #True
os.path.isdir('belchers.txt') #False

os.path.isfile

os.path.isfile works like os.path.isdir, but it returns True only for existing regular files.

os.path.isfile('belchers.txt') #True
os.path.isfile('/home') #False
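
The results above depend on /home and belchers.txt existing on the machine. This sketch builds its own folder and file in a temporary location, and also checks a path that does not exist (the file names are hypothetical).

```python
import os
import tempfile

folder = tempfile.mkdtemp()
file_path = os.path.join(folder, 'belchers.txt')
with open(file_path, 'w') as f:
    f.write('hello\n')
missing = os.path.join(folder, 'no_such_file.txt')

folder_checks = (os.path.isdir(folder), os.path.isfile(folder))
file_checks = (os.path.isdir(file_path), os.path.isfile(file_path))
# A path that does not exist is neither: both return False, no exception
missing_checks = (os.path.isdir(missing), os.path.isfile(missing))

print(folder_checks)   # (True, False)
print(file_checks)     # (False, True)
print(missing_checks)  # (False, False)

os.remove(file_path)
os.rmdir(folder)
```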

os.path.getsize

os.path.getsize returns the size of a file in bytes. It accepts the path to the file as an argument.

os.path.getsize('belchers.txt')
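
A quick way to confirm the unit is to write a known number of bytes and read the size back. The sketch writes in binary mode so that no newline translation changes the byte count, and compares the result with os.stat, which reports the same value in st_size.

```python
import os
import tempfile

payload = b'0123456789'  # 10 bytes

fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(payload)

size = os.path.getsize(path)
print(size)                           # 10
print(size == os.stat(path).st_size)  # True
os.remove(path)
```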

Sources

Lutz, Mark. Programming Python. Beijing, O’Reilly, 2013.