Kotlin Regex Pattern Matching

Matching strings with regular expressions (regex) is a difficult topic. Regex patterns are often hard to read and debug, and they usually need extensive testing to confirm that they match exactly what they are supposed to match.

Kotlin goes out of its way to avoid making developers use regex. For example, the split() method of String accepts plain string delimiters and does not require a regex (unlike its Java counterpart, which always interprets its argument as a regex). This reduces bugs and helps keep the code more readable in general. When we do need a regex, Kotlin has an explicit Regex type.
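To make the split() point concrete, here is a minimal sketch. The helper names splitOnDot and splitOnDotRegex are made up for this illustration. The first splits on a literal dot with no regex involved; the second does the same job with an explicit Regex, where the dot has to be escaped.

```kotlin
// Hypothetical helper: split on a literal "." using a plain String delimiter.
fun splitOnDot(input: String): List<String> = input.split(".")

// Hypothetical helper: the same split with an explicit Regex.
// Here the dot must be escaped, because an unescaped dot matches any character.
fun splitOnDotRegex(input: String): List<String> = input.split("""\.""".toRegex())

fun main() {
    println(splitOnDot("a.b.c"))       // [a, b, c]
    println(splitOnDotRegex("a.b.c"))  // [a, b, c]
}
```

In Java, "a.b.c".split(".") treats the dot as a regex that matches every character, so the result is an empty array. That is the kind of bug the Kotlin overload avoids.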

One advantage of having a dedicated Regex type is that the intent of the code is immediately clear.

val regex = """\d{5}""".toRegex()

Notice a few things about this line. First, we use a triple-quoted (raw) string to define the regular expression, so backslashes do not need to be doubled. This helps us avoid bugs caused by improper escaping of the regex string. Second, String has a toRegex() method that converts the String to a Regex object.
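The difference between raw and regular strings can be checked directly. The sketch below builds the same pattern three ways and compares the resulting pattern text. The function names are hypothetical and exist only for this example.

```kotlin
// Raw string: the backslash is written once.
fun rawPattern(): Regex = """\d{5}""".toRegex()

// Regular string: the backslash must be doubled to survive string escaping.
fun escapedPattern(): Regex = "\\d{5}".toRegex()

// The Regex constructor is an alternative to toRegex().
fun constructedPattern(): Regex = Regex("""\d{5}""")

fun main() {
    println(rawPattern().pattern)                                  // \d{5}
    println(rawPattern().pattern == escapedPattern().pattern)      // true
    println(rawPattern().pattern == constructedPattern().pattern)  // true
}
```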

The Regex object comes packed with its own methods that are used for pattern matching.

regex.containsMatchIn("My string 00000")
regex.findAll("00000, 000121, 23213")
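As a sketch of what these two calls return, the example below prints the results for the same inputs. The names fiveDigits and findFiveDigitRuns are introduced here for illustration only.

```kotlin
val fiveDigits = """\d{5}""".toRegex()

// Hypothetical helper: collect the text of every match into a list.
// findAll returns a lazy Sequence<MatchResult>, so we map and materialize it.
fun findFiveDigitRuns(text: String): List<String> =
    fiveDigits.findAll(text).map { it.value }.toList()

fun main() {
    println(fiveDigits.containsMatchIn("My string 00000"))  // true
    println(findFiveDigitRuns("00000, 000121, 23213"))      // [00000, 00012, 23213]
}
```

Note that the six-digit token 000121 still produces a match: the pattern takes its first five digits. If only standalone five-digit numbers should match, a pattern with word boundaries such as \b\d{5}\b is one option.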

Of course there are many other methods found on the Regex object; see the Kotlin documentation for details: http://kotlinlang.org/api/latest/jvm/stdlib/kotlin.text/-regex/index.html
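A few of those other methods are sketched here: matches, find, matchEntire, and replace. The value name zip and the sample inputs are assumptions made for this example.

```kotlin
val zip = """\d{5}""".toRegex()

fun main() {
    // matches: true only when the whole input matches the pattern
    println(zip.matches("12345"))                          // true
    println(zip.matches("123456"))                         // false

    // find: first match anywhere in the input, or null if there is none
    println(zip.find("id 54321 ok")?.value)                // 54321

    // matchEntire: a MatchResult for a whole-input match, or null
    println(zip.matchEntire("abc") == null)                // true

    // replace: substitute every match with the replacement text
    println(zip.replace("call 11111 or 22222", "#####"))   // call ##### or #####
}
```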

Regex Tables

Below are some common regex symbols, meta symbols, and quantifiers as presented in Oracle Certified Professional Java SE 7 Programmer Exams 1Z0-804 and 1Z0-805 A Comprehensive OCPJP 7 Certification Guide by Ganesh and Sharma.

Common Symbols

Symbol    Meaning
^expr     Matches expr at beginning of the line
expr$     Matches expr at end of line
.         Matches any single character (except the newline character)
[xyz]     Matches either x, y, or z
[p-z]     Specifies a range. Matches any character from p to z
[p-z1-9]  Matches either any character from p to z or any digit from 1 to 9
[^p-z]    '^' as the first character negates the pattern. This will match anything outside of the range p-z
xy        Matches x followed by y
x|y       Matches either x or y
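The symbols in this table can be tried out with a small helper. This is a sketch: the function name hits is hypothetical, and it simply wraps containsMatchIn so each row can be checked in one line.

```kotlin
// Hypothetical helper: does the pattern match anywhere in the input?
fun hits(pattern: String, input: String): Boolean =
    pattern.toRegex().containsMatchIn(input)

fun main() {
    println(hits("^ab", "abc"))       // true:  "ab" is at the beginning
    println(hits("^ab", "cab"))       // false: "ab" is not at the beginning
    println(hits("ab$", "cab"))       // true:  "ab" is at the end
    println(hits("a.c", "abc"))       // true:  "." matches "b"
    println(hits("a.c", "a\nc"))      // false: "." does not match a newline
    println(hits("[xyz]", "y"))       // true
    println(hits("[p-z]", "a"))       // false: "a" is outside p-z
    println(hits("[p-z1-9]", "5"))    // true
    println(hits("[^p-z]", "a"))      // true:  negated range
    println(hits("xy", "axyb"))       // true
    println(hits("x|y", "why"))       // true:  contains "y"
}
```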

Common Meta Symbols

\d  Matches digits ([0-9])
\D  Matches non-digits
\w  Matches word characters ([a-zA-Z_0-9])
\W  Matches non-word characters
\s  Matches whitespace characters ([ \t\n\x0B\f\r])
\S  Matches non-whitespace characters
\b  Matches a word boundary (outside of a bracket)
\B  Matches a non-word boundary
\A  Matches the beginning of the input
\Z  Matches the end of the input (before a final line terminator, if any)
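Here is a short sketch of a few meta symbols at work on one sample sentence. The helper allMatches and the sample text are assumptions made for this example.

```kotlin
// Hypothetical helper: the text of every match of the pattern in the input.
fun allMatches(pattern: String, text: String): List<String> =
    pattern.toRegex().findAll(text).map { it.value }.toList()

fun main() {
    val text = "Room 42, floor 7"

    println(allMatches("""\d+""", text))                      // [42, 7]
    println(allMatches("""\w+""", text))                      // [Room, 42, floor, 7]
    println(text.split("""\s+""".toRegex()))                  // [Room, 42,, floor, 7]

    // \b requires a word boundary on each side of the word
    println("""\bfloor\b""".toRegex().containsMatchIn(text))  // true
    println("""\bloor\b""".toRegex().containsMatchIn(text))   // false

    // \A anchors the match to the beginning of the input
    println("""\ARoom""".toRegex().containsMatchIn(text))     // true
}
```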

Common Quantifiers

expr? Matches 0 or 1 occurrence of expr (expr{0,1})
expr* Matches 0 or more occurrences of expr (expr{0,})
expr+ Matches 1 or more occurrences of expr (expr{1,})
expr{x,y} Matches between x and y occurrences of expr (written without a space inside the braces)
expr{x,} Matches x or more occurrences of expr
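The quantifiers can be checked the same way. In this sketch, fullMatch is a hypothetical helper that tests whether the whole input matches the pattern, which makes the occurrence counts easy to see.

```kotlin
// Hypothetical helper: does the pattern match the entire input?
fun fullMatch(pattern: String, input: String): Boolean =
    pattern.toRegex().matches(input)

fun main() {
    println(fullMatch("colou?r", "color"))    // true:  zero occurrences of "u"
    println(fullMatch("colou?r", "colour"))   // true:  one occurrence of "u"
    println(fullMatch("ab*", "a"))            // true:  zero occurrences of "b"
    println(fullMatch("ab+", "a"))            // false: "+" needs at least one "b"
    println(fullMatch("ab+", "abbb"))         // true
    println(fullMatch("a{2,3}", "aaa"))       // true:  between 2 and 3
    println(fullMatch("a{2,3}", "aaaa"))      // false: 4 is more than 3
    println(fullMatch("a{2,}", "aaaa"))       // true:  2 or more
}
```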

Putting it Together

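Here is an example program that uses Regex in Kotlin. It prints the three tables above and then tests the regex ^Matches against each description in the symbols table.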

fun main(args : Array<String>){
    val symbols = mapOf(
            "^expr" to "Matches expr at beginning of the line",
            "expr$" to "Matches expr at end of line",
            "." to "Matches any single character (exception the newline character)",
            "[xyz]" to "Matches either x, y, or z",
            "[p-z]" to "Specifies a range. Matches any character from p to z",
            "[p-z1-9]" to "Matches either any character from p to z or any digit from 1 to 9",
            "[^p-z]" to "'^' as the first character negates the pattern. This will match anything outside of the range p-z",
            "xy" to "Matches x followed by y",
            "x|y" to "Matches either x or y")

    val metaSymbols = mapOf(
            "\\d" to "Matches digits ([0-9])",
            "\\D" to "Matches non-digits",
            "\\w" to "Matches word characters",
            "\\W" to "Matches non-word characters",
            "\\s" to "Matches whitespaces [\\t\\r\\f\\n]",
            "\\S" to "Matches non-whitespaces",
            "\\b" to "Matches word boundary when outside of a bracket. Matches backslash when placed in a bracket",
            "\\B" to "Matches non-word boundary",
            "\\A" to "Matches beginning of string",
            "\\Z" to "Matches end of String")


    val quantifiers = mapOf(
            "expr?" to "Matches 0 or 1 occurrence of expr (expr{0,1})",
            "expr*" to "Matches 0 or more occurrences of expr (expr{0,})",
            "expr+" to "Matches 1 or more occurrences of expr (expr{1,})",
            "expr{x, y}" to "Matches between x and y occurrences of expr",
            "expr{x, }" to "Matches x or more occurrences of expr")

    val format = "%-10s\t%s"
    val func = {entry : Map.Entry<String, String> -> println(format.format(entry.key, entry.value)) }

    println("Symbols")
    symbols.entries.forEach(func)

    println("\nMeta Symbols")
    metaSymbols.entries.forEach(func)

    println("\nQuantifiers")
    quantifiers.entries.forEach(func)

    // Create a Regex object
    println("\nTesting regex: ^Matches")
    val regex = "^Matches".toRegex()
    symbols.entries.forEach {
        // The Regex type has a number of pattern matching methods;
        // containsMatchIn returns true if the pattern matches anywhere in the input
        val matchResult = regex.containsMatchIn(it.value)
        println("$matchResult => ${it.value}")
    }
}

Output

Symbols
^expr     	Matches expr at beginning of the line
expr$     	Matches expr at end of line
.         	Matches any single character (exception the newline character)
[xyz]     	Matches either x, y, or z
[p-z]     	Specifies a range. Matches any character from p to z
[p-z1-9]  	Matches either any character from p to z or any digit from 1 to 9
[^p-z]    	'^' as the first character negates the pattern. This will match anything outside of the range p-z
xy        	Matches x followed by y
x|y       	Matches either x or y

Meta Symbols
\d        	Matches digits ([0-9])
\D        	Matches non-digits
\w        	Matches word characters
\W        	Matches non-word characters
\s        	Matches whitespaces [\t\r\f\n]
\S        	Matches non-whitespaces
\b        	Matches word boundary when outside of a bracket. Matches backslash when placed in a bracket
\B        	Matches non-word boundary
\A        	Matches beginning of string
\Z        	Matches end of String

Quantifiers
expr?     	Matches 0 or 1 occurrence of expr (expr{0,1})
expr*     	Matches 0 or more occurrences of expr (expr{0,})
expr+     	Matches 1 or more occurrences of expr (expr{1,})
expr{x, y}	Matches between x and y occurrences of expr
expr{x, } 	Matches x or more occurrences of expr

Testing regex: ^Matches
true => Matches expr at beginning of the line
true => Matches expr at end of line
true => Matches any single character (exception the newline character)
true => Matches either x, y, or z
false => Specifies a range. Matches any character from p to z
true => Matches either any character from p to z or any digit from 1 to 9
false => '^' as the first character negates the pattern. This will match anything outside of the range p-z
true => Matches x followed by y
true => Matches either x or y

References

Ganesh, S G., and Tushar Sharma. Oracle Certified Professional Java SE 7 Programmer Exams 1Z0-804 and 1Z0-805 A Comprehensive OCPJP 7 Certification Guide. Apress, 2013.
