Matching strings with regular expressions (regex) is a difficult topic. Regex patterns are often hard to understand and debug, and they usually need extensive testing to make sure they match exactly what they are supposed to match.
Kotlin goes out of its way to avoid making developers use regex. For example, the split() method of String takes plain-text delimiters rather than a regex (unlike its Java counterpart). This reduces bugs and generally keeps code more readable. When we do need a regex, Kotlin has an explicit Regex type.
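To see why this matters, here is a minimal sketch comparing literal and regex splitting. In Java, "a.b.c".split(".") returns an empty array, because "." is interpreted as a regex that matches every character. Kotlin's split(".") treats the delimiter as plain text. The function names splitOnDots and splitOnDotsWithRegex are only illustrative.

```kotlin
// Kotlin's String.split(String) treats the delimiter as plain text,
// so "." splits on literal dots.
fun splitOnDots(input: String): List<String> = input.split(".")

// When a pattern really is wanted, the Regex has to be requested explicitly.
// The raw string """\.""" escapes the dot so it matches only a literal '.'.
fun splitOnDotsWithRegex(input: String): List<String> = input.split(Regex("""\."""))

fun main() {
    println(splitOnDots("a.b.c"))          // [a, b, c]
    println(splitOnDotsWithRegex("a.b.c")) // [a, b, c]

    // An unescaped "." as a Regex matches every character, so only the
    // empty strings between (and around) the five matches are left.
    println("a.b.c".split(Regex(".")))     // [, , , , , ]
}
```

The explicit Regex argument makes it obvious at the call site that pattern matching, not plain text matching, is happening.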
One advantage of having a regex type is that code is immediately more readable.
val regex = """\d{5}""".toRegex()
Notice two things about this string. First, we use a triple-quoted (raw) string to define the regular expression. Backslashes are not escape characters inside a raw string, which helps us avoid bugs caused by improper escaping of the regex. Second, String has a toRegex() method that converts the string into a Regex object.
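A short sketch of the escaping point: the raw string and the ordinary string below describe the same pattern, but the ordinary string needs every backslash doubled. Writing "\d{5}" with a single backslash in an ordinary string does not compile, because \d is not a valid Kotlin escape sequence. The names rawPattern and escapedPattern are illustrative.

```kotlin
// Raw string: the backslash is passed to the regex engine unchanged.
val rawPattern = """\d{5}""".toRegex()

// Ordinary string: the backslash must be written as \\ to reach the regex engine.
val escapedPattern = "\\d{5}".toRegex()

fun main() {
    // Both Regex objects hold identical pattern text.
    println(rawPattern.pattern == escapedPattern.pattern) // true
    println(rawPattern.pattern)                           // \d{5}
    println(rawPattern.matches("12345"))                  // true
}
```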
The Regex object comes packed with its own methods that are used for pattern matching.
regex.containsMatchIn("My string 00000")
regex.findAll("00000, 000121, 23213")
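Here is a runnable sketch of those two calls with the results they produce. findAll returns a lazy Sequence of MatchResult objects, so the sketch maps each match to its value and collects a list. The helper allMatches and the second pattern with \b word boundaries are illustrative additions, not part of the original example.

```kotlin
val fiveDigits = """\d{5}""".toRegex()

// Illustrative variation: \b requires a word boundary on both sides,
// so only stand-alone five-digit numbers match.
val standaloneFiveDigits = """\b\d{5}\b""".toRegex()

// Collects the text of every match of `regex` in `input`.
fun allMatches(regex: Regex, input: String): List<String> =
    regex.findAll(input).map { it.value }.toList()

fun main() {
    // containsMatchIn: is there a match anywhere in the input?
    println(fiveDigits.containsMatchIn("My string 00000")) // true

    // "000121" has six digits, so \d{5} still finds "00012" inside it.
    println(allMatches(fiveDigits, "00000, 000121, 23213"))           // [00000, 00012, 23213]
    println(allMatches(standaloneFiveDigits, "00000, 000121, 23213")) // [00000, 23213]
}
```

The middle result is a typical regex surprise: the pattern matches part of a longer number unless boundaries are stated explicitly.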
The Regex class has many other methods; see the Kotlin documentation for details: http://kotlinlang.org/api/latest/jvm/stdlib/kotlin.text/-regex/index.html
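A few of those other methods, sketched on a hypothetical five-digit ZIP code pattern: matches tests the whole input, find returns the first match (or null), replace substitutes every match, and split breaks the input around the matches.

```kotlin
// Hypothetical example pattern: a five-digit US ZIP code.
val zipCode = Regex("""\d{5}""")

fun main() {
    // matches: the entire input must match, not just part of it.
    println(zipCode.matches("12345"))      // true
    println(zipCode.matches("12345-6789")) // false

    // find: the first match as a MatchResult, or null if there is none.
    println(zipCode.find("ZIP 12345-6789")?.value) // 12345
    println(zipCode.find("no digits here"))        // null

    // replace: every match is substituted with the replacement text.
    println(zipCode.replace("Call 12345 or 67890", "#####")) // Call ##### or #####

    // split: the input is broken apart wherever the pattern matches.
    println(zipCode.split("a12345b67890c")) // [a, b, c]
}
```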
Regex Tables
Below are some common regex symbols, meta symbols, and quantifiers, adapted from Oracle Certified Professional Java SE 7 Programmer Exams 1Z0-804 and 1Z0-805: A Comprehensive OCPJP 7 Certification Guide by Ganesh and Sharma. Kotlin on the JVM uses Java's regex engine, so the same syntax applies.
Common Symbols
Symbol | Meaning |
^expr | Matches expr at the beginning of the input (of each line in MULTILINE mode) |
expr$ | Matches expr at the end of the input (of each line in MULTILINE mode) |
. | Matches any single character (except a line terminator) |
[xyz] | Matches either x, y, or z |
[p-z] | Specifies a range. Matches any character from p to z |
[p-z1-9] | Matches either any character from p to z or any digit from 1 to 9 |
[^p-z] | '^' as the first character in brackets negates the class. Matches any character outside the range p-z |
xy | Matches x followed by y |
x|y | Matches either x or y |
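The sketch below exercises several rows of this table with Kotlin's Regex. It also uses two options from Kotlin's RegexOption enum, MULTILINE and DOT_MATCHES_ALL, which change how ^, $, and . treat line breaks; the inputs are made up for illustration.

```kotlin
fun main() {
    // ^ anchors to the start of the whole input by default...
    println(Regex("^b").containsMatchIn("a\nb"))                        // false
    // ...and to the start of each line with RegexOption.MULTILINE.
    println(Regex("^b", RegexOption.MULTILINE).containsMatchIn("a\nb")) // true

    // . does not match a newline unless DOT_MATCHES_ALL is set.
    println(Regex("a.c").matches("a\nc"))                              // false
    println(Regex("a.c", RegexOption.DOT_MATCHES_ALL).matches("a\nc")) // true

    // Character classes, ranges, and negation.
    println(Regex("[p-z]").matches("q"))    // true
    println(Regex("[p-z1-9]").matches("7")) // true
    println(Regex("[^p-z]").matches("q"))   // false

    // Alternation: either side of | may match.
    println(Regex("cat|dog").findAll("cat, dog, cow").map { it.value }.toList()) // [cat, dog]
}
```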
Common Meta Symbols
Symbol | Meaning |
\d | Matches digits ([0-9]) |
\D | Matches non-digits |
\w | Matches word characters ([a-zA-Z_0-9]) |
\W | Matches non-word characters |
\s | Matches whitespace characters ([ \t\n\x0B\f\r]) |
\S | Matches non-whitespace characters |
\b | Matches a word boundary |
\B | Matches a non-word boundary |
\A | Matches the beginning of the input |
\Z | Matches the end of the input (before a final line terminator, if any) |
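A quick sketch of the meta symbols on a made-up sample string. Raw strings are used for every pattern so that each backslash reaches the regex engine as written.

```kotlin
// Made-up sample input for illustration.
val sampleText = "order_42 shipped 2 days ago"

// Collects the text of every match of `pattern` in `input`.
fun found(pattern: String, input: String): List<String> =
    Regex(pattern).findAll(input).map { it.value }.toList()

fun main() {
    println(found("""\d+""", sampleText)) // [42, 2]
    // The underscore is a word character, so "order_42" is one \w+ run.
    println(found("""\w+""", sampleText)) // [order_42, shipped, 2, days, ago]

    // \s+ splits on any run of spaces, tabs, or newlines.
    println(Regex("""\s+""").split("a  b\tc\nd")) // [a, b, c, d]

    // "cat" inside "concatenate" is surrounded by word characters.
    println(Regex("""\bcat\b""").containsMatchIn("concatenate")) // false
    println(Regex("""\Bcat\B""").containsMatchIn("concatenate")) // true

    // \A anchors to the start of the input; \Z tolerates one final line terminator.
    println(Regex("""\Aorder""").containsMatchIn(sampleText))     // true
    println(Regex("""ago\Z""").containsMatchIn(sampleText + "\n")) // true
}
```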
Common Quantifiers
Symbol | Meaning |
expr? | Matches 0 or 1 occurrence of expr (expr{0,1}) |
expr* | Matches 0 or more occurrences of expr (expr{0,}) |
expr+ | Matches 1 or more occurrences of expr (expr{1,}) |
expr{x,y} | Matches between x and y occurrences of expr (no spaces inside the braces) |
expr{x,} | Matches x or more occurrences of expr |
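The quantifiers in action, as a small sketch with made-up inputs. The last line checks that Java's regex syntax, which Kotlin uses on the JVM, rejects a space inside a counted quantifier: the Regex constructor compiles the pattern immediately and throws a PatternSyntaxException.

```kotlin
// Collects the text of every match of `pattern` in `input`.
fun matchesOf(pattern: String, input: String): List<String> =
    Regex(pattern).findAll(input).map { it.value }.toList()

fun main() {
    println(matchesOf("colou?r", "color colour")) // [color, colour]
    println(matchesOf("ab*", "a ab abbb"))        // [a, ab, abbb]
    println(matchesOf("ab+", "a ab abbb"))        // [ab, abbb]

    // {2,4} is greedy: "12345" yields "1234", and the leftover "5" is too short.
    println(matchesOf("""\d{2,4}""", "1 12 123 12345")) // [12, 123, 1234]
    println(matchesOf("""\d{3,}""", "1 12 123 12345"))  // [123, 12345]

    // A space inside the braces is a syntax error, not a quantifier.
    println(runCatching { Regex("""\d{2, 4}""") }.isFailure) // true
}
```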
Putting it Together
Here is an example program that prints the tables above and then uses a Regex to check which symbol descriptions begin with "Matches".
fun main() {
    val symbols = mapOf(
        "^expr" to "Matches expr at the beginning of the input (of each line in MULTILINE mode)",
        "expr$" to "Matches expr at the end of the input (of each line in MULTILINE mode)",
        "." to "Matches any single character (except a line terminator)",
        "[xyz]" to "Matches either x, y, or z",
        "[p-z]" to "Specifies a range. Matches any character from p to z",
        "[p-z1-9]" to "Matches either any character from p to z or any digit from 1 to 9",
        "[^p-z]" to "'^' as the first character in brackets negates the class. Matches any character outside the range p-z",
        "xy" to "Matches x followed by y",
        "x|y" to "Matches either x or y")

    val metaSymbols = mapOf(
        "\\d" to "Matches digits ([0-9])",
        "\\D" to "Matches non-digits",
        "\\w" to "Matches word characters ([a-zA-Z_0-9])",
        "\\W" to "Matches non-word characters",
        "\\s" to "Matches whitespace characters ([ \\t\\n\\x0B\\f\\r])",
        "\\S" to "Matches non-whitespace characters",
        "\\b" to "Matches a word boundary",
        "\\B" to "Matches a non-word boundary",
        "\\A" to "Matches the beginning of the input",
        "\\Z" to "Matches the end of the input (before a final line terminator, if any)")

    val quantifiers = mapOf(
        "expr?" to "Matches 0 or 1 occurrence of expr (expr{0,1})",
        "expr*" to "Matches 0 or more occurrences of expr (expr{0,})",
        "expr+" to "Matches 1 or more occurrences of expr (expr{1,})",
        "expr{x,y}" to "Matches between x and y occurrences of expr",
        "expr{x,}" to "Matches x or more occurrences of expr")

    // Left-align each symbol in a 12-character column, then print its meaning
    val format = "%-12s%s"
    val printEntry = { entry: Map.Entry<String, String> -> println(format.format(entry.key, entry.value)) }

    println("Symbols")
    symbols.entries.forEach(printEntry)
    println("\nMeta Symbols")
    metaSymbols.entries.forEach(printEntry)
    println("\nQuantifiers")
    quantifiers.entries.forEach(printEntry)

    // Create a Regex object that matches "Matches" only at the start of the input
    println("\nTesting regex: ^Matches")
    val regex = "^Matches".toRegex()
    symbols.values.forEach { description ->
        // The Regex type has a number of pattern-matching methods
        val matchResult = regex.containsMatchIn(description)
        println("$matchResult => $description")
    }
}
Output
Symbols
^expr       Matches expr at the beginning of the input (of each line in MULTILINE mode)
expr$       Matches expr at the end of the input (of each line in MULTILINE mode)
.           Matches any single character (except a line terminator)
[xyz]       Matches either x, y, or z
[p-z]       Specifies a range. Matches any character from p to z
[p-z1-9]    Matches either any character from p to z or any digit from 1 to 9
[^p-z]      '^' as the first character in brackets negates the class. Matches any character outside the range p-z
xy          Matches x followed by y
x|y         Matches either x or y

Meta Symbols
\d          Matches digits ([0-9])
\D          Matches non-digits
\w          Matches word characters ([a-zA-Z_0-9])
\W          Matches non-word characters
\s          Matches whitespace characters ([ \t\n\x0B\f\r])
\S          Matches non-whitespace characters
\b          Matches a word boundary
\B          Matches a non-word boundary
\A          Matches the beginning of the input
\Z          Matches the end of the input (before a final line terminator, if any)

Quantifiers
expr?       Matches 0 or 1 occurrence of expr (expr{0,1})
expr*       Matches 0 or more occurrences of expr (expr{0,})
expr+       Matches 1 or more occurrences of expr (expr{1,})
expr{x,y}   Matches between x and y occurrences of expr
expr{x,}    Matches x or more occurrences of expr

Testing regex: ^Matches
true => Matches expr at the beginning of the input (of each line in MULTILINE mode)
true => Matches expr at the end of the input (of each line in MULTILINE mode)
true => Matches any single character (except a line terminator)
true => Matches either x, y, or z
false => Specifies a range. Matches any character from p to z
true => Matches either any character from p to z or any digit from 1 to 9
false => '^' as the first character in brackets negates the class. Matches any character outside the range p-z
true => Matches x followed by y
true => Matches either x or y
References
Ganesh, S. G., and Tushar Sharma. Oracle Certified Professional Java SE 7 Programmer Exams 1Z0-804 and 1Z0-805: A Comprehensive OCPJP 7 Certification Guide. Apress, 2013.