Kotlin Scanner

The java.util.Scanner class breaks an input stream into tokens and returns them one at a time. It is often used on files, but it also works with strings, network sockets, or just about any other character input source. The following program uses a Scanner to pull the words out of a file while skipping punctuation, then prints the words from most frequently used to least frequently used.

import java.io.FileReader
import java.util.*

fun main(args : Array<String>){
    //Check if they supplied a file
    if(args.isEmpty()){
        println("Please provide a file")
        System.exit(-1)
    }

    //Create an empty map
    val wordMap = mutableMapOf<String, Int>()

    //Open the file and pass it to a Scanner object.
    Scanner(FileReader(args[0])).use { sc ->

        //Split on any non-word character so that each token is a whole word
        sc.useDelimiter("""\W""".toPattern())

        //Loop until we get to the end of the file
        while(sc.hasNext()){

            //Grab the next word
            val word = sc.next()

            //Test that it's not a blank string
            if(word.isNotBlank()){
                //Add it to the word map
                wordMap[word] = wordMap.getOrDefault(word, 0) + 1
            }
        }
    }

    //This prints the entries by most used words to least used words
    wordMap.entries.sortedByDescending { it.value }.forEach { println(it) }
}

The program starts by checking if the user provided command line arguments. If the args array is empty, the program exits after printing an error message. Line 12 creates an empty mutable map so that we can add words and counts to it. Each word is a key, and its Int value is the number of times that word has been found.

The file is opened on line 15. We create a FileReader object and pass the path of the file to its constructor. The file path is the first element in the arguments array and was supplied by the user. The FileReader object is passed to the constructor of the Scanner. We apply the use() function to ensure the Scanner and the underlying file are closed when we have finished.

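Line 18 tells the Scanner how to split its input by passing in a regex string and converting it to a Pattern object. The regex “\W” matches any single non-word character (anything other than a letter, digit, or underscore), so using it as the delimiter leaves runs of word characters, which are whole words, as the tokens. Kotlin allows us to use raw strings inside triple quotes (""") so that we do not need to escape the backslash.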

Line 21 enters a while loop that terminates when Scanner.hasNext() is false. That means we loop until there are no more matches in the input stream. Line 27 tests if the word is a blank string and if it isn’t a blank string, we update the word count on line 29.
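The blank-string test is needed because two delimiters in a row (a comma followed by a space, for example) leave an empty token between them. Here is a rough illustration in Python rather than Kotlin, on the assumption that Python's re.split treats adjacent delimiters the same way the Scanner does. The sample sentence is made up for the example.

```python
import re

# A made-up sample: the comma and the space after it are two
# non-word characters in a row.
sample = "Sam-I-am, I do"

# Split on every single non-word character, like the \W delimiter above.
tokens = re.split(r"\W", sample)

# Drop the empty tokens, which is the job of the isNotBlank() check.
words = [token for token in tokens if token.strip()]

print(tokens)
print(words)
```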

Line 35 prints each word from the most used to the least used. It gets the map’s entries and sorts them in descending order. sortedByDescending takes a selector function, here the lambda { it.value }, and orders the entries by whatever the selector returns. In this case, it.value is the number of times a word was found. The final forEach() call iterates through the sorted list of entries and prints each one to the console.

Here was my output when I used this program with a brief excerpt from Green Eggs and Ham.

I=37
not=29
like=28
them=28
do=20
a=19
am=10
Sam=10
you=8
would=7
eggs=6
and=6
ham=6
Would=6
In=6
in=6
or=5
there=5
eat=5
Not=5
Here=4
house=4
mouse=4
with=4
You=4
That=3
Green=3
With=3
green=3
box=3
fox=3
car=3
Anywhere=2
here=2
anywhere=2
Eat=2
could=2
may=2
tree=2
Do=1
Could=1
they=1
are=1
will=1
see=1
let=1
me=1
be=1

Python 3 Os File Tools

The Python os module has a number of useful file functions that allow developers to perform common file tasks such as changing file permissions, renaming files, or even deleting files. The following snippets are modified examples from Programming Python: Powerful Object-Oriented Programming.

os.chmod

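os.chmod alters a file’s permissions. The example below takes two arguments. The first is the path to the file and the second is an integer mode, usually written as an octal literal, whose lowest nine bits encode the read, write, and execute permissions for the owner, the group, and everyone else. For example, 0o777 grants all three permissions to all three.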

os.chmod('belchers.txt', 0o777)
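As a minimal sketch of how the mode bits fit together, the snippet below sets a more restrictive mode on a temporary file and reads it back. It assumes a POSIX system (on Windows, os.chmod only toggles the read-only flag), and the temporary file stands in for belchers.txt.

```python
import os
import stat
import tempfile

# Create a throwaway file so the example does not touch real data.
fd, path = tempfile.mkstemp()
os.close(fd)

# 0o640 = owner read/write, group read, others nothing.
os.chmod(path, 0o640)

# The same mode spelled with named constants from the stat module.
named_mode = stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP

# stat.S_IMODE strips the file-type bits and keeps only the permission bits.
current_mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(current_mode))

# Clean up the temporary file.
os.remove(path)
```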

os.rename

os.rename is used to give a file a new name. The first argument is the current name of the file and the second argument is the new name of the file.

os.rename('belchers.txt', 'pestos.txt')
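To see the effect without touching real files, here is a small sketch that performs the rename inside a temporary directory. The directory and file contents are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    old_path = os.path.join(workdir, 'belchers.txt')
    new_path = os.path.join(workdir, 'pestos.txt')

    # Create the file we are going to rename.
    with open(old_path, 'w') as f:
        f.write('burger of the day\n')

    os.rename(old_path, new_path)

    # After the rename, only the new name exists.
    old_exists = os.path.exists(old_path)
    new_exists = os.path.exists(new_path)
    print(old_exists, new_exists)
```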

os.remove

os.remove deletes a file. It takes the path of the file to delete.

os.remove('pestos.txt')
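os.remove raises FileNotFoundError when the path does not exist, so a script may want to guard the call. The helper below, remove_if_present, is a hypothetical name used only for this sketch.

```python
import os
import tempfile


def remove_if_present(path):
    """Delete path and report whether a file was actually removed."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        # Nothing to delete; treat this as a non-error.
        return False


# Try it on a throwaway file: the first call deletes, the second finds nothing.
fd, path = tempfile.mkstemp()
os.close(fd)
first_attempt = remove_if_present(path)
second_attempt = remove_if_present(path)
print(first_attempt, second_attempt)
```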

os.path.isdir

os.path.isdir accepts a path to a file or directory. It returns True if the path is a directory and False otherwise.

os.path.isdir('/home') #True
os.path.isdir('belchers.txt') #False

os.path.isfile

os.path.isfile works like os.path.isdir, except that it returns True only when the path is a regular file.

os.path.isfile('belchers.txt') #True
os.path.isfile('/home') #False
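Both functions return False for a path that does not exist, so together they can sort a path into one of three buckets. The describe function below is a hypothetical helper for this sketch, and the paths live in a temporary directory.

```python
import os
import tempfile


def describe(path):
    """Classify a path using os.path.isdir and os.path.isfile."""
    if os.path.isdir(path):
        return 'directory'
    if os.path.isfile(path):
        return 'file'
    # Neither check passed: the path is missing or is something
    # else, such as a socket or a broken symbolic link.
    return 'missing or other'


with tempfile.TemporaryDirectory() as workdir:
    file_path = os.path.join(workdir, 'belchers.txt')
    with open(file_path, 'w') as f:
        f.write('hello\n')

    results = [
        describe(workdir),
        describe(file_path),
        describe(os.path.join(workdir, 'no-such-file.txt')),
    ]
    print(results)
```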

os.path.getsize

os.path.getsize returns the size of the file in bytes. It accepts the path to the file as an argument.

os.path.getsize('belchers.txt')
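A quick way to check the byte count is to write a known number of bytes and ask for the size. This sketch writes in binary mode so that no newline translation changes the count. The temporary file is an assumption of the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, 'belchers.txt')

    # Six bytes: five letters plus a newline.
    with open(path, 'wb') as f:
        f.write(b'hello\n')

    size = os.path.getsize(path)
    print(size)  # 6
```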

Sources

Lutz, Mark. Programming Python. Beijing, O’Reilly, 2013.

Python Page Through A File

Many operating systems have command line tools that allow a user to page through a file in chunks. As a demonstration of how to read text files in Python, I used an example from Programming Python: Powerful Object-Oriented Programming.

Code

def more(text, numlines=15):
    # This splits the text into a list object based on line
    # endings
    lines = text.splitlines()

    # Now continue to loop until we are out of lines
    while lines:
        # Slice off the first numlines lines into chunk
        chunk = lines[:numlines]

        # Remove those numlines lines from the beginning of lines
        lines = lines[numlines:]

        # Now loop through each line in chunk
        for line in chunk:
            # and then print a line
            print(line)
            
        # Now ask the user if we want to keep going
        if lines and input('More?') not in ['y', 'Y']:
            break

if __name__ == '__main__':
    # Import sys so that we can read command line arguments
    import sys
    
    # Next, we grab the first argument from the
    # command line and pass it to the open function,
    # which returns a file object. Calling read on this
    # object reads the contents of the file into a string,
    # which gets passed to our more function above
    more(open(sys.argv[1]).read(), 10)

Detailed Explanation

The comments in the code above are mine and explain what is going on in the program. The program starts by testing if this script is getting called as a standalone program or if we are importing this code as a module.

Assuming this is a standalone program, we import the sys module so that we can examine the command line arguments. The first argument after the script name (sys.argv[1]) needs to be the path to a text file, or the program will crash. We pass the name of the file to the open function, which returns a file object. Calling read() on the file object reads the entire contents of the file into a string.

At this point, we pass the string into our more() function. It starts out by splitting the string by lines, which returns a list object. We start to loop through this list object, which continues until the list is empty.

Inside the while loop, we slice the first numlines lines off of lines and store them in chunk. Then we remove those lines from the lines list. The next step is to print out each line in chunk. Once that is complete, we test if we still have more lines to print, and if we do, we ask the user if they want to keep going or exit.
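The slicing step can also be written as a generator, and the file can be opened in a with block so that it is closed as soon as it has been read. This is a variation of my own rather than the book's version, and read_text and chunks are hypothetical helper names.

```python
def read_text(path):
    """Read a whole text file, closing it when the with block ends."""
    with open(path) as f:
        return f.read()


def chunks(lines, numlines=15):
    """Yield successive slices of at most numlines lines."""
    for start in range(0, len(lines), numlines):
        yield lines[start:start + numlines]


def more(text, numlines=15):
    pages = list(chunks(text.splitlines(), numlines))
    for index, page in enumerate(pages):
        for line in page:
            print(line)

        # Only prompt when there is another page left to show.
        is_last_page = index == len(pages) - 1
        if not is_last_page and input('More?') not in ['y', 'Y']:
            break
```

Calling more(read_text(sys.argv[1]), 10) from the __main__ block would page through the file the same way as the original.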

Here is the program output when run on my screen.

Patricks-MacBook-Pro:System stonesoup$ python more.py more.py
def more(text, numlines=15):
    lines = text.splitlines()

    while lines:
        chunk = lines[:numlines]
        lines = lines[numlines:]

        for line in chunk:
            print(line)
        if lines and input('More?') not in ['y', 'Y']:
More?y
            break

if __name__ == '__main__':
    import sys
    more(open(sys.argv[1]).read(), 10)