Kotlin Walk a File Tree

The java.nio.file.Files class has a walk method that returns a Stream for walking a file tree. The example program lists the five largest files under a given starting path and demonstrates how easy it is to walk a file system in Kotlin.

package ch9.files

import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.Paths
import java.util.stream.Collectors.toList

private fun Path.size() : Long {
    return try {
        Files.size(this)
    } catch (e : Exception){
        -1
    }
}

fun main(args: Array<String>){
    if(args.isNotEmpty()){
        val path = Paths.get(args[0])

        //Open a Stream object
        Files.walk(path)
                //Sort by size
                .sorted { lhs : Path?, rhs : Path? -> compareValues(lhs?.size() ?: -1, rhs?.size() ?: -1)}
                //Collect the result into a list
                .collect(toList())
                //Now reverse the list so that the largest file is first
                .reversed()
                //Open another stream and keep at most 5 files
                .stream()
                .limit(5)
                //Now print the results
                .forEach { println("${it.fileName} \t ${it.size()}") }
    } else {
        println("Usage: start path")
    }
}

Detailed Explanation

The program reads the starting path from the command line arguments and builds a Path object (line 18). The Path object is passed to the Files.walk() method on line 21. The walk() method returns a Stream, which gives us all of the operations available on a Java 8 Stream. In our case, we wish to sort all files by their size (using the Path.size() extension function found on lines 8-14) on line 23. The result is collected into a list on line 25.

By default, our files are sorted smallest to largest. We can either rework the comparator used on line 23 to sort in reverse or just call the reversed() method on the list object. The former is most likely more performant, but the latter is very readable. Finally, since we are interested in the five largest files, we open another Stream on the list and limit it to 5 elements. The final operation is to call forEach on that stream and print each file name and its size.

References

https://docs.oracle.com/javase/8/docs/api/?java/io/File.html

Python Pipe Operations

Python programs can link their input and output streams together using pipe (‘|’) operations. Piping isn’t a feature of Python, but rather comes from the operating system. Nevertheless, a Python program can leverage such a feature to create powerful operating system scripts that are highly portable across platforms. For example, developer toolchains can be scripted together using Python. I personally have used Python to feed input into unit testing for my Java/Kotlin programs.

This post is a modified example of a demonstration found in Programming Python: Powerful Object-Oriented Programming. It uses a producer and consumer script to demonstrate one program feeding input into another Python program.

writer.py

Here is the code for writer.py

family = [
    'Bob Belcher',
    'Linda Belcher',
    'Tina Belcher',
    'Gene Belcher',
    'Louise Belcher'
]
for f in family:
    print(f)

Nothing special here. We build a list of names and print each one. The names belong to our favorite TV family, the Belchers.

reader.py

This code receives the output from writer.

while True:
    try:
        print('Entering {}'.format(input()))
    except EOFError:
        break

Once again, this is a simple script. Without a pipe operation, the input() call on line 3 would normally collect its input from the keyboard. That’s not what is going to happen here.
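To make the point concrete without a shell, here is a minimal sketch that swaps sys.stdin for an in-memory stream before running the same loop. The helper names read_all_lines and demo are made up for this illustration and are not part of the original scripts. It only shows that input() reads from whatever sys.stdin currently is.

```python
import io
import sys

def read_all_lines():
    # Same loop as reader.py, but collecting the lines instead of printing them
    lines = []
    while True:
        try:
            lines.append('Entering {}'.format(input()))
        except EOFError:
            # input() raises EOFError once the stream has no more lines
            break
    return lines

def demo():
    # Temporarily replace sys.stdin with an in-memory stream (an assumption
    # made for this sketch; a pipe does the equivalent at the OS level)
    original = sys.stdin
    sys.stdin = io.StringIO('Tina Belcher\nGene Belcher\n')
    try:
        return read_all_lines()
    finally:
        sys.stdin = original

if __name__ == '__main__':
    for line in demo():
        print(line)
```

Here the keyboard is never consulted, because input() only knows about sys.stdin.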

Demonstration

We are going to execute these scripts by running the command below in the terminal.

python writer.py | python reader.py

The pipe '|' character does the job of connecting writer.py's output stream to reader.py's input stream. Thus, each line that writer.py prints becomes a line that input() returns in reader.py. Here is the output of this operation.

Entering Bob Belcher
Entering Linda Belcher
Entering Tina Belcher
Entering Gene Belcher
Entering Louise Belcher
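
The same connection can also be set up from inside a Python program with the standard subprocess module. The sketch below is an illustration rather than part of the original example: it assumes inline stand-ins for writer.py and reader.py (the WRITER and READER strings, with a shortened name list) so that it runs on its own, and run_pipeline is a name made up here.

```python
import subprocess
import sys

# Inline stand-ins for writer.py and reader.py so the sketch is self-contained
WRITER = "for name in ['Bob Belcher', 'Linda Belcher']:\n    print(name)\n"
READER = (
    "while True:\n"
    "    try:\n"
    "        print('Entering {}'.format(input()))\n"
    "    except EOFError:\n"
    "        break\n"
)

def run_pipeline():
    # Start the producer with its standard output attached to a pipe
    writer = subprocess.Popen(
        [sys.executable, '-c', WRITER],
        stdout=subprocess.PIPE,
    )
    # Start the consumer with its standard input attached to that same pipe
    reader = subprocess.Popen(
        [sys.executable, '-c', READER],
        stdin=writer.stdout,
        stdout=subprocess.PIPE,
        text=True,
    )
    # Close our copy of the pipe so only the two child processes hold it
    writer.stdout.close()
    output, _ = reader.communicate()
    writer.wait()
    return output

if __name__ == '__main__':
    print(run_pipeline(), end='')
```

This does by hand what the shell does for us when we type the '|' character.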

Python Console Streams

Programs on all major operating systems are connected to input and output streams (and usually an error stream). When run from the command line without a redirect operator, a program is normally connected to the shell’s standard input stream (where a user can type commands into the program) and the standard output stream (which prints output back on the console).
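
In Python these three streams are exposed as sys.stdin, sys.stdout, and sys.stderr. As a small sketch (the report function is a hypothetical helper written for this illustration), print() writes to the standard output stream unless its file argument says otherwise:

```python
import sys

def report(message, error=False):
    # Pick the stream at call time: sys.stderr for errors, sys.stdout otherwise
    target = sys.stderr if error else sys.stdout
    print(message, file=target)

if __name__ == '__main__':
    report('normal output')                      # goes to standard output
    report('something went wrong', error=True)   # goes to standard error
```

Run from a terminal, both messages appear on the console, but a redirect of standard output alone would capture only the first one.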

However, we aren’t limited to such streams. It’s perfectly possible to use the contents of a file as a program’s input stream or even link one program’s output to another program’s input stream. Such chaining of streams allows for powerful OS scripting.

This isn’t really a Python feature, but here is an example found in Programming Python: Powerful Object-Oriented Programming that demonstrates connecting a file’s contents to the input stream of a Python program. I added some comments that help explain the program.

teststreams.py

def interact():
    print('Hello stream world')
    while True:
        try:
            # Input normally reads from the keyboard because our program
            # is connected to that input stream. However, if we execute this program
            # in a way that connects the program's input to some other stream,
            # the input command reads from there instead!
            reply = input('Enter a number => ')
        except EOFError:
            # We have reached the end of our input stream (for example, the user pressed
            # Ctrl+D at a Unix shell, or a redirected file ran out of lines),
            # so we exit the loop
            break
        else:
            num = int(reply)
            print("%d squared is %d" % (num, num ** 2))
    print('Bye')

if __name__ == '__main__':
    interact()

When this program is run on its own, it will collect input from the keyboard until we signal end-of-file (Ctrl+D on Unix-like systems, Ctrl+Z followed by Enter on Windows). That’s not the part that we are demonstrating here. Let’s suppose we have a text file that has the following contents.

input.txt

1
2
3
4
5
6
7
8
9
10

Now when we run our program with a redirect operator, we get the following output.

Patricks-MacBook-Pro:Streams stonesoup$ python teststreams.py < input.txt
Hello stream world
Enter a number => 1 squared is 1
Enter a number => 2 squared is 4
Enter a number => 3 squared is 9
Enter a number => 4 squared is 16
Enter a number => 5 squared is 25
Enter a number => 6 squared is 36
Enter a number => 7 squared is 49
Enter a number => 8 squared is 64
Enter a number => 9 squared is 81
Enter a number => 10 squared is 100
Enter a number => Bye

Notice the Unix redirect operator. The program was run with the command python teststreams.py < input.txt. The < input.txt part connects the contents of input.txt to the standard input stream of the teststreams.py script. Thus, when input is called, the function simply collects the next line in input.txt rather than waiting for the keyboard.
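
A variation on the same idea, sketched here under my own assumptions rather than taken from the book: if the squaring logic takes the stream as a parameter, it no longer cares whether the lines come from the keyboard, a redirected file, or a test. The function name square_lines is made up for this illustration.

```python
import io

def square_lines(stream):
    """Yield one result line per numeric line read from the given stream."""
    for line in stream:
        text = line.strip()
        if not text:
            continue  # skip blank lines, such as a trailing empty line
        num = int(text)
        yield "%d squared is %d" % (num, num ** 2)

if __name__ == '__main__':
    # An in-memory stream stands in for input.txt here; passing sys.stdin
    # instead would read whatever the shell connected to the program.
    sample = io.StringIO("1\n2\n3\n")
    for result in square_lines(sample):
        print(result)
```

Because any file-like object works, the same function can be fed sys.stdin in a script and a StringIO in a unit test.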