Kotlin Scanner

The Scanner class (java.util.Scanner) breaks an input source into tokens separated by a delimiter pattern and returns them one at a time. It is often used on files, but it can also read strings, network sockets, or just about any other character input source. The following program uses a Scanner to pull the words out of a file without their punctuation, and then prints the words from the most frequently used to the least frequently used.

```kotlin
import java.io.FileReader
import java.util.*

fun main(args: Array<String>) {
    //Check if they supplied a file
    if (args.isEmpty()) {
        println("Please provide a file")
        System.exit(-1)
    }

    //Create an empty map of word -> count
    val wordMap = mutableMapOf<String, Int>()

    //Open the file and pass it to a Scanner object.
    Scanner(FileReader(args[0])).use { sc ->

        //Split the input on every non-word character
        sc.useDelimiter("""\W""".toPattern())

        //Loop until we get to the end of the file
        while (sc.hasNext()) {

            //Grab the next word
            val word = sc.next()

            //Skip blank tokens, which a run of delimiters can produce
            if (word.isNotBlank()) {
                //Add it to the word map
                wordMap[word] = wordMap.getOrDefault(word, 0) + 1
            }
        }
    }

    //This prints the entries by most used words to least used words
    wordMap.entries.sortedByDescending { it.value }.forEach { println(it) }
}
```

The program starts by checking whether the user supplied a command-line argument. If the args array is empty, it prints an error message and exits. Line 12 creates an empty mutable map to hold the word counts: each word is a key, and its Int value is the number of times that word appears in the file.
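The counting step can be tried on its own. In this sketch the helper names countWithGetOrDefault and countWithGrouping are hypothetical, and the input is assumed to be already split into words. The first function uses the same getOrDefault update as the program; the second swaps in the standard library's groupingBy { }.eachCount(), which builds the same counts in a single expression.

```kotlin
// Hypothetical helper: counts words the way the program's loop does,
// starting each new word at 0 via getOrDefault.
fun countWithGetOrDefault(words: List<String>): Map<String, Int> {
    val wordMap = mutableMapOf<String, Int>()
    for (word in words) {
        wordMap[word] = wordMap.getOrDefault(word, 0) + 1
    }
    return wordMap
}

// Hypothetical helper: the same counts using the standard library's Grouping API.
fun countWithGrouping(words: List<String>): Map<String, Int> =
    words.groupingBy { it }.eachCount()

fun main() {
    val words = listOf("I", "am", "Sam", "Sam", "I", "am")
    println(countWithGetOrDefault(words)) // {I=2, am=2, Sam=2}
    println(countWithGrouping(words))     // {I=2, am=2, Sam=2}
}
```

Both functions return equal maps; eachCount() just hides the mutable map that the program manages by hand.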

The file is opened on line 15. The file path, supplied by the user, is the first element of the arguments array, and it is passed to the FileReader constructor. The FileReader is in turn passed to the Scanner constructor. Calling use() on the Scanner guarantees that the Scanner, and with it the underlying file, is closed when the block finishes, even if an exception is thrown.
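Scanner is not tied to files: java.util.Scanner also has constructors that take a String, an InputStream, or any Readable. The sketch below, assuming the text is already in memory, passes a StringReader where the program passes FileReader(args[0]). The helper tokensFrom is hypothetical, and whitespace is used as the delimiter only to keep the example short. Because Scanner.close() also closes a source that implements Closeable, use() releases the reader too.

```kotlin
import java.io.Reader
import java.io.StringReader
import java.util.Scanner

// Hypothetical helper: reads every token from any Reader through a Scanner.
// use() closes the Scanner, which in turn closes the Reader passed to it.
fun tokensFrom(reader: Reader, delimiter: String): List<String> {
    val tokens = mutableListOf<String>()
    Scanner(reader).use { sc ->
        sc.useDelimiter(delimiter)
        while (sc.hasNext()) {
            tokens.add(sc.next())
        }
    }
    return tokens
}

fun main() {
    // A StringReader stands in for FileReader(args[0]).
    println(tokensFrom(StringReader("I do not like them"), """\s+"""))
    // prints [I, do, not, like, them]
}
```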

Line 18 sets the Scanner's delimiter by converting a regex string to a Pattern object. The regex \W matches a single non-word character (anything other than a letter, digit, or underscore), so the Scanner splits the input at each such character and every token it returns is a run of word characters. Kotlin's raw strings, written between triple quotes ("""), let us write the backslash without escaping it.
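The effect of the delimiter is easy to see on a short string. This sketch uses a hypothetical allTokens helper that keeps every token the Scanner returns, including empty ones, and compares \W with \W+ (one or more non-word characters). The \W+ variant is an alternative for comparison, not what the original program uses.

```kotlin
import java.util.Scanner

// Hypothetical helper: returns every token, including empty ones.
fun allTokens(text: String, delimiter: String): List<String> {
    val tokens = mutableListOf<String>()
    Scanner(text).use { sc ->
        sc.useDelimiter(delimiter)
        while (sc.hasNext()) tokens.add(sc.next())
    }
    return tokens
}

fun main() {
    val text = "Sam!!! I am"
    // \W consumes one character at a time, so the run "!!! " leaves an empty token
    println(allTokens(text, """\W"""))
    // \W+ treats the whole run as one delimiter: prints [Sam, I, am]
    println(allTokens(text, """\W+"""))
}
```

With \W+, a whole run of punctuation and spaces counts as a single delimiter, so a blank check becomes a safety net rather than a necessity.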

Line 21 enters a while loop that runs until Scanner.hasNext() returns false, that is, until there are no more tokens in the input. Because \W consumes only one character at a time, a run of punctuation and whitespace can produce empty tokens, so line 27 skips any blank token and line 29 updates the count only for real words.
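The while loop can also be written as a Kotlin Sequence that pulls tokens until hasNext() returns false. This is a swapped-in idiom rather than the original code: countWords is a hypothetical name, it reads from a String instead of a file, and it uses generateSequence, filter, and groupingBy in place of the explicit loop and map updates.

```kotlin
import java.util.Scanner

// Hypothetical helper: the program's while loop rewritten as a Sequence.
// generateSequence stops as soon as the lambda returns null.
fun countWords(text: String): Map<String, Int> =
    Scanner(text).use { sc ->
        sc.useDelimiter("""\W""")
        generateSequence { if (sc.hasNext()) sc.next() else null }
            .filter { it.isNotBlank() } // same blank test as the program
            .groupingBy { it }
            .eachCount()                // consumes the sequence while the Scanner is open
    }

fun main() {
    println(countWords("I am Sam. Sam I am."))
}
```

The counting finishes inside use() on purpose: a Sequence is lazy, and iterating it after the Scanner is closed would throw an IllegalStateException.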

Line 35 prints each word from the most used to the least used. It takes the map's entries and sorts them in descending order with sortedByDescending. That function takes a selector lambda rather than a comparator: { it.value } picks each entry's count as the sort key, and sortedByDescending builds the descending comparator internally. Here, it.value is the number of times a word was found. The result is a List, and the final forEach() prints each entry to the console in word=count form.
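Because sortedByDescending performs a stable sort, words with the same count keep the order in which they were first added to the map. To make ties predictable, a secondary key can be added with sortedWith and a chained comparator. In the sketch below, rank is a hypothetical helper and the alphabetical tie-break is an illustrative choice, not something the original program does.

```kotlin
// Hypothetical helper: ranks entries by count (highest first),
// breaking ties alphabetically by word.
fun rank(counts: Map<String, Int>): List<Map.Entry<String, Int>> =
    counts.entries.sortedWith(
        compareByDescending<Map.Entry<String, Int>> { it.value }.thenBy { it.key }
    )

fun main() {
    val counts = mapOf("them" to 28, "like" to 28, "I" to 37, "do" to 20)
    rank(counts).forEach { println(it) }
    // I=37
    // like=28
    // them=28
    // do=20
}
```

Note that String comparison is case-sensitive, so with this tie-break capitalized words sort ahead of lowercase ones.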

Here is the output I got when I ran this program on a brief excerpt from Green Eggs and Ham.

I=37
not=29
like=28
them=28
do=20
a=19
am=10
Sam=10
you=8
would=7
eggs=6
and=6
ham=6
Would=6
In=6
in=6
or=5
there=5
eat=5
Not=5
Here=4
house=4
mouse=4
with=4
You=4
That=3
Green=3
With=3
green=3
box=3
fox=3
car=3
Anywhere=2
here=2
anywhere=2
Eat=2
could=2
may=2
tree=2
Do=1
Could=1
they=1
are=1
will=1
see=1
let=1
me=1
be=1
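The output counts Would=6 and would=7 separately (and likewise In/in, Not/not, and Green/green) because map keys are case-sensitive. A minimal sketch of one fix, assuming case should be ignored, is to lowercase each token before counting. countIgnoringCase is a hypothetical helper; reading from a String and using \W+ as the delimiter are both assumptions made to keep the example short.

```kotlin
import java.util.Scanner

// Hypothetical variant of the program's counting loop that folds case,
// so "Would" and "would" are counted as the same word.
fun countIgnoringCase(text: String): Map<String, Int> {
    val wordMap = mutableMapOf<String, Int>()
    Scanner(text).use { sc ->
        sc.useDelimiter("""\W+""")
        while (sc.hasNext()) {
            val word = sc.next().lowercase()
            if (word.isNotBlank()) {
                wordMap[word] = wordMap.getOrDefault(word, 0) + 1
            }
        }
    }
    return wordMap
}

fun main() {
    println(countIgnoringCase("Would you? I would not."))
    // prints {would=2, you=1, i=1, not=1}
}
```

One side effect is that the pronoun I is reported as i.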