Kotlin Watch Service

The java.nio.file package has a WatchService interface that is used to watch a folder for changes. This Kotlin program demonstrates how to create a watch service that monitors a folder and reports the changes.

package ch9.files

import java.nio.file.Path
import java.nio.file.Paths
import java.nio.file.StandardWatchEventKinds
import java.nio.file.WatchService

private fun prompt(msg : String) : String {
    print("$msg => ")
    return readLine() ?: ""
}

private fun Path.watch() : WatchService {
    //Create a watch service
    val watchService = this.fileSystem.newWatchService()

    //Register the service, specifying which events to watch
    register(watchService, StandardWatchEventKinds.ENTRY_CREATE, StandardWatchEventKinds.ENTRY_MODIFY, StandardWatchEventKinds.OVERFLOW, StandardWatchEventKinds.ENTRY_DELETE)

    //Return the watch service
    return watchService
}

fun main(args : Array<String>){
    val folder = prompt("Enter a folder to watch")
    val path = Paths.get(folder)

    val watcher = path.watch()
    println("Press ctrl+c to exit")

    while(true){
        //The watcher blocks until an event is available
        val key = watcher.take()

        //Now go through each event on the folder
        key.pollEvents().forEach { it ->
            //Print output according to the event
            when(it.kind().name()){
                "ENTRY_CREATE" -> println("${it.context()} was created")
                "ENTRY_MODIFY" -> println("${it.context()} was modified")
                "OVERFLOW" -> println("${it.context()} overflow")
                "ENTRY_DELETE" -> println("${it.context()} was deleted")
            }
        }
        //Call reset() on the key to watch for future events
        key.reset()
    }
}

Here is what it looked like when run on my machine.

Enter a folder to watch => /users/stonesoup/downloads
Press ctrl+c to exit
bob.json was created
bob.json was deleted

While the program was running, I created a bob.json file in my Downloads folder and then deleted it.

Explanation

The first task is to register the Watch Service. The example program has a Path.watch() extension function that encapsulates creating a watch service, registering it, and then returning it to the caller. The Watch Service is obtained from the Path.fileSystem.newWatchService() method (line 15). The next step is to register the Watch Service using the Path.register() method (line 18). When registering the Watch Service, we can pass in any number of StandardWatchEventKinds to tell the Watch Service what to watch.

The main method collects a path from the user (line 25), creates a Path object from the input (line 26), and then registers the Watch Service (line 28). At this point, we enter into an infinite loop and watch the target folder for changes.

The first action in the loop is watcher.take() (line 33). The take() method blocks the thread until an event happens. When a monitored watch event takes place, the take() method returns a WatchKey. The WatchKey holds any number of watch events that have happened since the last watch cycle.

The example program calls WatchKey.pollEvents().forEach and goes through each watch event (line 36). It uses the name() method of WatchEvent.kind() (lines 38-43) to print output according to each event. Notice how the program uses a when expression to react to each kind of watch event (lines 38-43). When we are done processing all events, we call reset() on the WatchKey so that the program can wait for the next event. We can also stop watching the folder by calling cancel() on the WatchKey, which cancels its registration with the Watch Service.

References

https://docs.oracle.com/javase/8/docs/api/?java/io/File.html

Kotlin Glob

A glob is a pattern used to match file names. For example, suppose we wish to match all Kotlin files on our file system; we would use the syntax “glob:*.kt”. The following demo program walks through a user-supplied start path and matches all files according to the user-supplied glob pattern.

package ch9.files

import java.nio.file.FileSystems
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.Paths
import java.util.stream.Collectors.toList

private fun prompt(msg : String) : String {
    print("$msg => ")
    return readLine() ?: ""
}

fun main(args : Array<String>){
    val start = prompt("Enter a start path")
    val glob = prompt("Enter a glob pattern")

    //Obtain a matcher object from the supplied glob pattern
    val matcher = FileSystems.getDefault().getPathMatcher(glob)

    val path = Paths.get(start)
    //Walk the file system
    Files.walk(path)
            //Filter out anything that doesn't match the glob
            .filter { it : Path? -> it?.let { matcher.matches(it.fileName) } ?: false }
            //Collect to a list
            .collect(toList())
            //Print to the console
            .forEach({ it -> println("Found ${it.fileName}") })
}

Here is an example run of the program.

Enter a start path => /users/stonesoup
Enter a glob pattern => glob:*.kt
Found CachingTutorialApplicationTests.kt
Found CachingTutorialApplication.kt
Found ExposedTransactionManagerTest.kt
Found SpringTransactionManager.kt
Found SamplesDao.kt
Found SamplesSQL.kt
...continued

Detailed Explanation

The program asks the user for a start path (line 15) and a glob pattern (line 16). The program supports the glob patterns in the table below.

Pattern         Description
*               Matches any number of characters within a single file or directory name
**              Matches any number of characters, even across directories
?               Matches any single character
[xyz]           Matches any one character inside of [ ]. In this example, it’s x, y, or z
[0-5], [a-z]    Matches a range. In this case, it’s 0-5 or the letters a-z
{xyz,abc}       Matches one of the two patterns. In this case, either xyz or abc

Once the user has supplied a valid path and glob pattern, the program calls Files.walk to walk through the file system. Using Java 8’s Stream API, we filter out all items that do not match the pattern (line 25) using the matcher object that was returned on line 19. The results are collected into a list and printed to the console.

References

https://docs.oracle.com/javase/8/docs/api/?java/io/File.html

Kotlin Walk a File Tree

The java.nio.file.Files class has a walk method that returns a Stream used to walk a file tree. The example program lists out the 5 largest files given a starting path and demonstrates how to easily walk through a file system in Kotlin.

package ch9.files

import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.Paths
import java.util.stream.Collectors.toList

private fun Path.size() : Long {
    return try {
        Files.size(this)
    } catch (e : Exception){
        -1
    }
}

fun main(args: Array<String>){
    if(args.isNotEmpty()){
        val path = Paths.get(args[0])

        //Open a Stream object
        Files.walk(path)
                //Sort by size
                .sorted { lhs : Path?, rhs : Path? -> compareValues(lhs?.size() ?: -1, rhs?.size() ?: -1)}
                //Collect the result into a list
                .collect(toList())
                //Now reverse the list so that the largest file is first
                .reversed()
                .stream()
                //Open another stream and collect up to 5 files
                .limit(5)
                //Now print the results
                .forEach({it -> println("${it.fileName} \t ${it.size()}") })
    } else {
        println("Usage: start path")
    }
}

Detailed Explanation

The program reads the start path from the command line arguments and creates a Path object (line 18). The Path object is passed to the Files.walk() method on line 21. The walk() method returns a Stream object that opens up all of the operations found on a Java 8 Stream. In our case, we wish to sort all files by their size (using the Path.size() extension function found on lines 8-14) on line 23. The result is collected into a list on line 25.

By default, our files are sorted smallest to largest. We can either rework the comparator used on line 23 to reverse the sort or just call the reversed() method on the list object. The former is most likely more performant, but the latter is very readable. Finally, since we are interested in the five largest files, we open another Stream on the list and limit it to 5 elements. The final operation is to call forEach on the stream and print each file name and its size.

References

https://docs.oracle.com/javase/8/docs/api/?java/io/File.html

Kotlin Files.exists() and Files.isSameFile()

The Files class provides utility methods that are useful for working with the file system.

import java.io.BufferedWriter
import java.io.FileWriter
import java.nio.file.Files
import java.nio.file.Paths


private val belchers = "belchers.txt"

private fun makeBelcherFile(){
    val names = listOf("Bob", "Linda", "Tina", "Gene", "Louise")
    BufferedWriter(FileWriter(belchers)).use { writer ->
        names.forEach { name ->
            with(writer){
                write(name)
                newLine()
            }
        }
    }
}

fun main(args : Array<String>){
    //Check if the file exists first, and create it if needed
    if (!Files.exists(Paths.get(belchers))){
        makeBelcherFile()
    }
    val relativeBelchers = Paths.get(belchers)
    val absoluteBelchers = Paths.get(System.getProperty("user.dir"), belchers)

    //Check if both Paths point to the same file
    println("Using Files.isSameFile() => " + (Files.isSameFile(relativeBelchers, absoluteBelchers)))
}

Output

Using Files.isSameFile() => true

The program uses Files.exists to see if we have a belchers.txt file on the underlying OS. If the method returns false, we call makeBelcherFile() on line 24 to create the file. Lines 26 and 27 create two different Path objects to point at the belchers.txt file.

The relativeBelchers object is a Path created using a relative path to the file. The absoluteBelchers object is created with an absolute path by combining the current working directory with the name of the file. On line 30, we call Files.isSameFile and pass both of the Path objects to it. Since both of these Paths point at the same file, it returns true.

Methods

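exists(path : Path, vararg options : LinkOption) : Boolean

The exists method is used to test if a file exists or not. We can also pass optional LinkOption values that instruct the method on how to handle symbolic links in the file system. For example, if we don’t want to follow links, then pass LinkOption.NOFOLLOW_LINKS.

isSameFile(path : Path, path2 : Path) : Boolean

Tests whether both Path objects point to the same file. If the two paths are equal, it returns true without checking the file system. Otherwise, it checks whether both paths locate the same file, which is why a relative path and an absolute path to the same file also return true, even though path == path2 is false for them.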

References

https://docs.oracle.com/javase/8/docs/api/java/nio/file/Files.html#isSameFile-java.nio.file.Path-java.nio.file.Path-

Kotlin Path Interface

The Path interface is provided to Kotlin by the java.nio.file package. It represents paths on the underlying file system and provides a number of useful utility methods for working with paths. Here is an example program followed by output and an explanation.

import java.io.BufferedWriter
import java.io.FileWriter
import java.nio.file.Paths

val belchers = "belchers.txt"

fun makeBelcherFile(){
    val names = listOf("Bob", "Linda", "Tina", "Gene", "Louise")
    BufferedWriter(FileWriter(belchers)).use { writer ->
        names.forEach { name ->
            with(writer){
                write(name)
                newLine()
            }
        }
    }
}

fun main(args : Array<String>){
    //Just make a file on the file system for demonstration purposes
    makeBelcherFile()

    //Get a reference to our example path on the disk.
    //In this case, we are using Paths.get() to get a reference to the current working directory
    //and then using the resolve() method to add the belchers.txt file to the path
    val belcherPath = Paths.get(System.getProperty("user.dir")).resolve(belchers)

    val template = "\t%-30s => %s"

    with(belcherPath){
        println("File Information")
        //The fileName property returns the name of the file
        println(template.format("File Name", fileName))

        //The root property returns the root folder of the path
        println(template.format("File Path Root", root))

        //The parent property returns the parent folder of the file
        println(template.format("File Path Parent", parent))

        //The nameCount returns how many items are in the path
        println(template.format("Name Count", nameCount))

        //The subpath() method returns a portion of the path
        println(template.format("Subpath (0, 1)", subpath(0, 1)))

        //The normalize method removes redundant items such as . or .. from the path
        println(template.format("Normalizing", normalize()))

        //True if this is an absolute path otherwise false
        println(template.format("Is Absolute Path?", isAbsolute))

        //Convert to an absolute path if needed
        println(template.format("Absolute Path", toAbsolutePath()))

        //Check if the path starts with a path. In this example, we are using the home folder
        println(template.format("Starts with ${System.getProperty("user.home")}?", startsWith(System.getProperty("user.home"))))
        println()

        println("Elements of the Path")
        //We can print each portion of the path individually also!
        forEach { it -> println("\tPortion => $it") }
    }
}

Here is the output when run on my machine.

File Information
	File Name                      => belchers.txt
	File Path Root                 => /
	File Path Parent               => /Users/stonesoup/IdeaProjects/OCJAP
	Name Count                     => 5
	Subpath (0, 1)                 => Users
	Normalizing                    => /Users/stonesoup/IdeaProjects/OCJAP/belchers.txt
	Is Absolute Path?              => true
	Absolute Path                  => /Users/stonesoup/IdeaProjects/OCJAP/belchers.txt
	Starts with /Users/stonesoup?  => true

Elements of the Path
	Portion => Users
	Portion => stonesoup
	Portion => IdeaProjects
	Portion => OCJAP
	Portion => belchers.txt

Explanation

The program writes out a basic text file to the file system for demonstration purposes. We are going to focus on the main function. Our first task is to get a Path object that points to our belchers.txt file. We use the Paths.get() factory method and pass in a path on the file system. In our example, we use the current working directory by using the System property “user.dir”.

We could have also added the belchers.txt to the end of the current working directory. However, I wanted to demonstrate the resolve method that combines two paths into a single path. So we chain the resolve method to the returned Path object and add belchers.txt. The returned Path object points to the path of belchers.txt on the file system.

The next part of the program demonstrates commonly used methods found on the Path interface. Line 33 prints the name of the file by using the fileName property. Next we print out the root of the path by using the root property (line 36). When we want to know the parent of a path, we can use the parent property (line 39).

The Path interface has a nameCount property (line 42) that returns the number of items in a path. So if a path is /Users/stonesoup/IdeaProjects/OCJAP/belchers.txt, nameCount returns 5, one for each item between the slash (/) characters. The nameCount is useful when working with the subpath() function (line 45), which accepts a start index (inclusive) and an end index (exclusive) and returns a Path object based on the indexes.

Sometimes paths contain redundant elements such as “.” or “..”. When we want to remove such elements, we use the normalize() function (line 48), which strips them out of the path. Depending on the work we may be doing, we may want to test if the Path is a relative path or an absolute path. The Path interface has an isAbsolute property (line 51) for such purposes. It returns true if the path is an absolute path, otherwise false.

Should we wish to convert a relative path into an absolute path, we only need to call the toAbsolutePath() function (line 54) and we will get an absolute path. We can also check if a path starts with a certain path. In our example, on line 57, we check if our path starts with the user’s home directory (user.home). It returns true or false based on the outcome.

Path supports the forEach() function. Line 62 shows an example of how we can iterate through each part of the Path. The it variable holds each portion of the path and the program prints each part of the path.

Common Methods

We spoke about each method as it relates to the program above. Here is each of the commonly used methods broken down.

Paths.get(first : String, vararg more : String) : Path

The get() method converts a String (or a URI in the overloaded version) into a Path object. When we use the vararg part, the Path will use the OS name separator. So Unix paths will have a forward slash, while Windows ones will have a backslash.

val home = Paths.get(System.getProperty("user.home"))

parent : Path

The parent property returns a Path object that points to the parent of the current Path object.

val parent = home.parent

nameCount : Int

The nameCount returns the number of items in the path.

val count = home.nameCount

subpath(beginIndex : Int, endIndex : Int) : Path

The subpath method is used to return a portion of the path object. The beginIndex is inclusive while the endIndex is exclusive.

val part = home.subpath(0, 2)

normalize() : Path

The normalize() method returns a Path object with redundant name elements such as “.” and “..” removed.

val norm = home.normalize()

resolve(other : Path) : Path, resolve(other : String): Path

Returns a Path object that is a combined path between the current path and the other parameter.

val belchers = home.resolve("belchers.txt")

isAbsolute : Boolean

True if the Path is an absolute path otherwise it’s false.

val absolute = home.isAbsolute

startsWith(path : String) : Boolean, startsWith(path : Path) : Boolean

True if the current path starts with the supplied path argument.

val hasRoot = home.startsWith("/")

toAbsolutePath() : Path

Returns a Path object that is the absolute path of the current Path.

val abs = home.toAbsolutePath()

References

https://docs.oracle.com/javase/8/docs/api/java/nio/file/Path.html

Python Split and Join file

The book Programming Python: Powerful Object-Oriented Programming has an example program that shows how to split and join files. Many utilities exist for such an operation, but the program offers a good working example of how to read from and write to binary files in Python 3. The code below is an adaptation from the book with my own comments added.

Code

import os


def split(source, dest_folder, write_size):
    # Make a destination folder if it doesn't exist yet
    if not os.path.exists(dest_folder):
        os.mkdir(dest_folder)
    else:
        # Otherwise clean out all files in the destination folder
        for file in os.listdir(dest_folder):
            os.remove(os.path.join(dest_folder, file))

    partnum = 0

    # Open the source file in binary mode
    input_file = open(source, 'rb')

    while True:
        # Read a portion of the input file
        chunk = input_file.read(write_size)

        # End the loop if we have hit EOF
        if not chunk:
            break

        # Increment partnum
        partnum += 1

        # Create a new file name
        filename = os.path.join(dest_folder, ('part%04d' % partnum))

        # Create a destination file
        dest_file = open(filename, 'wb')

        # Write to this portion of the destination file
        dest_file.write(chunk)

        # Explicitly close 
        dest_file.close()
    
    # Explicitly close
    input_file.close()
    
    # Return the number of files created by the split
    return partnum


def join(source_dir, dest_file, read_size):
    # Create a new destination file
    output_file = open(dest_file, 'wb')
    
    # Get a list of the file parts
    parts = os.listdir(source_dir)
    
    # Sort them by name (remember that the order num is part of the file name)
    parts.sort()

    # Go through each portion one by one
    for file in parts:
        
        # Assemble the full path to the file
        path = os.path.join(source_dir, file)
        
        # Open the part
        input_file = open(path, 'rb')
        
        while True:
            # Read all bytes of the part
            bytes = input_file.read(read_size)
            
            # Break out of loop if we are at end of file
            if not bytes:
                break
                
            # Write the bytes to the output file
            output_file.write(bytes)
            
        # Close the input file
        input_file.close()
        
    # Close the output file
    output_file.close()

Explanation

split

The code snippet shows two sample functions that either split a file into parts or join those parts back together into one file. The split function begins by taking three parameters. The first parameter, source, is the file that we wish to split. The second parameter, dest_folder, is a folder that stores the output files created by the split operation. The final parameter, write_size, is the size of the file parts in bytes.

Split starts by checking if dest_folder exists or not. If the folder does not exist, we call os.mkdir to create a new folder on the file system. Otherwise, we obtain a list of all files in the folder by calling os.listdir and then remove all of them by calling os.remove. When calling os.remove, we use os.path.join to create a full path to the target file that’s getting deleted.

Once the destination folder has been prepared, the function continues by performing the actual split operation. A partnum variable is created on line 13 that tracks the number of file parts created by the split operation. The source file is opened on line 16 in binary mode. Binary mode is used in this case because we could be dealing with audio or video files and not just text files.

The split function enters an infinite loop on line 18. On line 20, we read a number of bytes, specified by write_size, from the source file and store them in the chunk variable. On line 23, we test if chunk actually received any bytes from the read operation. If chunk did not read any bytes, then we have hit end of file (EOF) and we break out of the loop. Otherwise, we increment partnum by one and begin to write the file part.

Line 30 creates the name and destination for the file part by using os.path.join, the dest_folder, and a string template that accepts the current part number. The destination file is created on line 33 with a call to open (also in binary mode) and then on line 36, we write chunk to the file. Line 39 has an explicit call to close the file. While we could normally wait for files to be closed during garbage collection, this function opens a lot of files, so we should close them in order to make sure we don’t exceed the number of file handles the underlying OS allows. The function ends by closing the input_file and returning the number of part files created.
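The explicit close() calls described above can also be handled by Python's with statement. The following is a minimal sketch of that variation, not the book's code: the name split_with_context is hypothetical, and unlike split it does not clean out an existing destination folder.

```python
import os


def split_with_context(source, dest_folder, write_size):
    """Split source into numbered part files of at most write_size bytes.

    Hypothetical variation on split(): it relies on with blocks
    instead of explicit close() calls.
    """
    # Create the destination folder if needed (existing files are left alone)
    os.makedirs(dest_folder, exist_ok=True)

    partnum = 0

    # The with statement closes input_file even if an error occurs
    with open(source, 'rb') as input_file:
        while True:
            # Read a portion of the input file
            chunk = input_file.read(write_size)

            # End the loop if we have hit EOF
            if not chunk:
                break

            partnum += 1
            filename = os.path.join(dest_folder, 'part%04d' % partnum)

            # Each part file is closed as soon as its with block ends
            with open(filename, 'wb') as dest_file:
                dest_file.write(chunk)

    # Return the number of files created by the split
    return partnum
```

Because every part file is closed as soon as its block ends, the function never holds more than two file handles open at a time.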

join

The join function does the reverse job of the split function. It begins by accepting a source_dir, a destination file, and a read_size, which is the number of bytes to read from a part file at a time. The output_file is created on line 50 (opened in binary mode) and then on line 53, we use os.listdir to get a list of all parts.

Since our part files contain a number that identifies the parts, we can store all parts in a list and call sort() on it. Then it’s just a matter of looping through all of the parts and assembling them into a single file. The for loop starts on line 59. On line 62, we use os.path.join to create a full path to the part file and then we can open the part file on line 65.

The program enters an infinite loop on line 67. Inside of the while loop, we read a portion of the input_file and store the bytes read. If bytes is empty, we have hit end of file, so we test for this on line 72 and use break to end the while loop. Otherwise, we write to the output file on line 76.

When we have finished reading our part file, we again close it explicitly on line 79. When all parts have been read, we close the output_file. The output_file contains the bytes of the original file that was split in the first place.
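To see why the sort matters, here is a small sketch that writes two part files out of order and joins them. The names join_parts and demo are hypothetical, and join_parts is a compact rewrite of the join logic using with blocks rather than the listing above.

```python
import os
import tempfile


def join_parts(source_dir, dest_file, read_size):
    """Concatenate the files in source_dir, in name order, into dest_file."""
    with open(dest_file, 'wb') as output_file:
        # Sorting by name puts part0001 before part0002, and so on
        for name in sorted(os.listdir(source_dir)):
            with open(os.path.join(source_dir, name), 'rb') as input_file:
                while True:
                    data = input_file.read(read_size)
                    if not data:
                        break
                    output_file.write(data)


def demo():
    """Write two parts out of order, join them, and return the result."""
    with tempfile.TemporaryDirectory() as work:
        parts_dir = os.path.join(work, 'parts')
        os.mkdir(parts_dir)

        # The second part is deliberately written first
        for name, payload in [('part0002', b'world'), ('part0001', b'hello ')]:
            with open(os.path.join(parts_dir, name), 'wb') as part:
                part.write(payload)

        joined = os.path.join(work, 'joined.bin')
        join_parts(parts_dir, joined, 4)

        with open(joined, 'rb') as result:
            return result.read()
```

The zero padding in the part names is what makes a plain sort work. Without it, a name such as part10 would sort before part2 and the joined file would come out scrambled.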

Thoughts

The code contained in this post isn’t ideal for production but is instead meant to be a learning tool. In this code, we cover reading and writing to binary files and functions of the os module. There are areas we could improve this code. For example, split destroys the contents of the destination folder, but ideally, it should instead throw an exception back to the caller and let the caller delete all files in a folder instead.

We also don’t test if our input files are really files and if our folders are really folders. That is certainly an area for improvement. Another thing that could be improved upon is using an enumeration for the size of the file parts. Right now, write_size in split and read_size in join are specified in bytes, but that isn’t clear to clients of these functions.
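As a rough sketch of those improvements, a helper like the one below could run before split does any work. The name check_split_args and the choice of exception types are my own assumptions, not something from the book.

```python
import os


def check_split_args(source, dest_folder, write_size):
    """Raise an exception instead of silently deleting or crashing.

    Hypothetical helper: call it before split() starts writing files.
    """
    # The source must be an existing regular file
    if not os.path.isfile(source):
        raise FileNotFoundError('source is not a file: %s' % source)

    # write_size is a number of bytes and must be positive
    if write_size <= 0:
        raise ValueError('write_size must be a positive number of bytes')

    if os.path.exists(dest_folder):
        # The destination must be a folder, not a file
        if not os.path.isdir(dest_folder):
            raise NotADirectoryError('not a folder: %s' % dest_folder)

        # Let the caller decide what to do with existing files
        if os.listdir(dest_folder):
            raise FileExistsError('folder is not empty: %s' % dest_folder)
```

With a check like this in place, the caller stays in control of whether existing files in the destination folder get deleted.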

References

Lutz, Mark. Programming Python. Beijing, O'Reilly, 2013.

Python Page Through A File

Many operating systems have command line tools that allow a user to page through a file in chunks. As a demonstration of how to read text files in Python, I used an example from Programming Python: Powerful Object-Oriented Programming.

Code

def more(text, numlines=15):
    # This splits the text into a list object based on line
    # endings
    lines = text.splitlines()

    # Now continue to loop until we are out of lines
    while lines:
        # Slice off numlines into chunk
        chunk = lines[:numlines]

        # Remove numlines from the beginning of lines
        lines = lines[numlines:]

        # Now loop through each line in chunk
        for line in chunk:
            # and then print a line
            print(line)
            
        # Now ask the user if we want to keep going
        if lines and input('More?') not in ['y', 'Y']:
            break

if __name__ == '__main__':
    # Import sys so that we can read command line arguments
    import sys
    
    # Next, we are grabbing the first argument from the
    # command line, and passing it to the open function
    # which returns a file object. Calling read on this
    # object will dump the contents of the file into a String
    # which gets passed to our more function above
    more(open(sys.argv[1]).read(), 10)

Detailed Explanation

The comments in the code above are mine and explain what is going on in the program. The program starts by testing if this script is getting called as a standalone program or if we are importing this code as a module.

Assuming this is a standalone program, we import the sys module so that we can examine the command line arguments. The first command line argument (sys.argv[1]) needs to be a text file or this program will crash. We pass the name of the file to the open function, which returns a file object. Calling read() on the file object dumps the entire contents of the file into a string.

At this point, we pass the string into our more() function. It starts out by splitting the string by lines, which returns a list object. We start to loop through this list object, which continues until the list is empty.

Inside of the while loop, we slice off numlines lines from lines and store them in chunk. Then we remove those lines from the lines list. The next step is to print out each line in chunk. Once that is complete, we test if we still have more lines to print and if we do, we ask the user if they want to keep going or exit.

Here is the program output when run on my screen.

Patricks-MacBook-Pro:System stonesoup$ python more.py more.py
def more(text, numlines=15):
    lines = text.splitlines()

    while lines:
        chunk = lines[:numlines]
        lines = lines[numlines:]

        for line in chunk:
            print(line)
        if lines and input('More?') not in ['y', 'Y']:
More?y
            break

if __name__ == '__main__':
    import sys
    more(open(sys.argv[1]).read(), 10)
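
The slicing that more() performs can also be pulled out into a generator, which separates the paging logic from the printing and prompting. This is only a sketch of that idea; the name pages is hypothetical and does not come from the book.

```python
def pages(text, numlines=15):
    """Yield the lines of text in lists of at most numlines lines."""
    lines = text.splitlines()

    # Step through the list one page at a time
    for start in range(0, len(lines), numlines):
        yield lines[start:start + numlines]
```

A caller could then loop over pages(text, 10), print each page, and prompt the user between pages, which keeps the chunking easy to test without any keyboard input.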