Week 5: Regular Expressions

Tariq Hook Blog

You know that cool feature in your browser that lets you search for a piece of text really fast? If not, try it right now: press Control+F (or Command+F on a Mac) and a box will pop up somewhere, prompting you to type the text you’re searching for. You can type full words, single characters, or numbers, and your browser will show you every instance of your query, if any. Regular expressions, or regex for short, are built for this kind of search, except that you describe a pattern instead of typing the exact text.

Regular expressions are text strings that describe a pattern to look for in a piece of text. In Java and many other programming languages, you can go a step further and use regex to validate, replace, insert, or split strings matching your pattern. It’s a simple concept, but the appearance of regex syntax can be very daunting to newcomers.

Fear not: you just have to take it bit by bit. First, let’s look at some regex symbols and example matches. Note that the tables below cover only a very small subset of the available regex symbols and possible combinations.

| Symbol | Meaning | Example | Sample Match |
|--------|---------|---------|--------------|
| + | one or more | See below | See below |
| * | zero or more | See below | See below |
| . | any character except the new line (\n) character | t.p | top, tip, tap |
| \ | escape the following symbol | 3\*3=9 | 3*3=9 |
| .* | zero or more characters not including the new line character (\n) | c.*y | calligraphy, cartography |

| Symbol | Meaning | Example | Sample Match |
|--------|---------|---------|--------------|
| \d | a single digit | I am \d years old | I am 3 years old |
| \d+ | one or more digits | I moved to NY in \d+ | I moved to NY in 1994 |
| \w | a single word character | \w-\w\w | D-oG |
| \w* | zero or more word characters | My name is \w* | My name is Johnny |
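
To see these symbols at work, here is a minimal sketch that checks each example pattern from the tables against a sample string. It uses String.matches(), which returns true only when the whole string matches the pattern. The class name SymbolDemo and the helper fullyMatches are made up for this illustration. Note that every regex backslash is typed twice inside a Java string.

```java
// Hypothetical demo class; the name and helper are illustrative only.
class SymbolDemo {

    // Returns true only if the WHOLE input matches the regex.
    static boolean fullyMatches(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(fullyMatches("t.p", "tip"));            // true
        System.out.println(fullyMatches("t.p", "trip"));           // false: '.' stands for exactly one character
        System.out.println(fullyMatches("3\\*3=9", "3*3=9"));      // true: \* is a literal asterisk
        System.out.println(fullyMatches("c.*y", "calligraphy"));   // true
        System.out.println(fullyMatches("I am \\d years old", "I am 3 years old"));         // true
        System.out.println(fullyMatches("I moved to NY in \\d+", "I moved to NY in 1994")); // true
        System.out.println(fullyMatches("\\w-\\w\\w", "D-oG"));    // true
        System.out.println(fullyMatches("My name is \\w*", "My name is Johnny"));           // true
    }
}
```

Try changing a sample string by one character and see which patterns still match.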

Now let’s create a very simple Java program to make use of our newly learned skill. This program will read a contract from an external text (.txt) file, replace any instance of “[name]” with your name, and print out the signed contract. The text file looks like this:

I, [name], agree to be a disciple of Java and practice Java everyday.

And here’s our source code. I have included a mini lesson on loading external files, written as comments.

import java.io.File;
import java.io.IOException;
import java.util.Scanner;

public class ContractRegex {

    private String contractData;

    public ContractRegex() {
        this.contractData = loadContract();
    }

    // This method loads contract.txt from our resources folder
    // and returns its contents as a String for us to work with in Java.
    private String loadContract() {

        // Every Java object has a getClass() method that
        // returns the runtime class of the object (Class type).
        // We then use the Class object to get its ClassLoader.
        ClassLoader classLoader = this.getClass().getClassLoader();

        // We can use a ClassLoader object to load a number of things,
        // but in our case we want to load a file from our resources folder.
        // This is a designated folder for any outside files we use in our application.
        // These can be images, videos, executables, etc.
        // getResource() returns a Uniform Resource Locator (URL) for our text file.
        // URLs (https://en.wikipedia.org/wiki/Uniform_Resource_Locator) are
        // unique location names of files on a system.
        // You use URLs all the time. For example, http://www.google.com
        // The URL's getFile() method returns the file's path as a String,
        // which we pass to the File constructor to get a File object.
        File file = new File(classLoader.getResource("contract.txt").getFile());

        // StringBuilders are used to create mutable versions of strings.
        // That means we can append new strings to 'result'.
        // We use this because normal String objects are immutable
        // (http://stackoverflow.com/a/8798424/1212854).
        // 'result' will hold our final scanned data.
        // Please note that StringBuilders are not Strings.
        StringBuilder result = new StringBuilder();

        // This is a try-catch block that tries to accomplish a task
        // and catches an error if the operation in the try section failed.
        // Here, we try to create a new Scanner object.
        // A Scanner is used to parse text.
        // We pass a File into the Scanner to read the text from the File.
        // A Scanner is part of the Input-Output (I/O) model.
        // I/O has to do with information put into the computer from
        // peripherals such as the keyboard, and information sent out,
        // for example, text printed to the screen.
        // A lot of issues can arise from I/O streams, so we
        // catch all I/O errors using exceptions in our try-catch block.
        // Exceptions are how Java reports errors. There are many built-in exceptions.
        // You can also create your own and "throw" them when something goes wrong.
        // 'new Scanner(file)' throws a FileNotFoundException, which is a kind
        // of IOException, if the file cannot be found. Here we catch it and
        // print the details of the error with printStackTrace().
        try (Scanner scanner = new Scanner(file)) {

            // A while loop performs the operation in the block
            // until the condition (scanner.hasNextLine()) is false.
            // Here the Scanner reads the text in line by line.
            // scanner.hasNextLine() returns a boolean (true/false)
            // depending on whether there is another line left to read.
            while (scanner.hasNextLine()) {

                // While we have a line to scan, we create a temporary
                // String object to hold our scanned line.
                // We retrieve the scanned line with scanner.nextLine().
                // Because we used a StringBuilder and not a String to hold
                // our final scanned data, we can easily add new strings to the
                // end of 'result' by calling append(stringToAppend).
                // '\n' represents a new line. A new line is created when you
                // press Enter in a text editor.
                String line = scanner.nextLine();
                result.append(line).append("\n");
            }

            // Think of closing the Scanner as turning the running water tap off.
            // We would be wasting water and risking an overflow if we did
            // not turn it off. Also, we know no one will use it in the future.
            // You wouldn't leave your tap running in anticipation of another
            // user, would you?
            // Because the Scanner was declared in the parentheses after 'try'
            // (a try-with-resources statement), Java calls close() for us
            // when this block ends, so we do not call scanner.close() ourselves.
        } catch (IOException e) {
            e.printStackTrace();
        }

        // A StringBuilder object is not a String, but
        // we can get a String object from it.
        return result.toString();
    }

    public String getContractData() {
        return contractData;
    }

    public static void main(String[] args) {
        String signedContract = new ContractRegex().getContractData().replaceAll("\\[name\\]", "Zippy Jenkins");
        System.out.println(signedContract);
    }

}
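
For comparison, here is a shorter sketch of the same load-and-sign idea. It is not the approach used above: it assumes the contract sits at an ordinary file path rather than in the resources folder, and it reads the whole file at once with Files.readAllBytes instead of a Scanner. The class name ContractSigner, its method names, and the temporary file are all hypothetical, chosen only so the example can run by itself.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;

// Hypothetical helper class; not part of the program above.
class ContractSigner {

    // Reads the whole file into one String (UTF-8 is an assumption).
    static String load(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    // Replaces every literal "[name]" with the signer's name.
    // quoteReplacement keeps characters such as '$' in the name
    // from being treated specially by replaceAll.
    static String sign(String contract, String signer) {
        return contract.replaceAll("\\[name\\]", Matcher.quoteReplacement(signer));
    }

    public static void main(String[] args) throws IOException {
        // Write a throwaway contract so the example is self-contained.
        Path temp = Files.createTempFile("contract", ".txt");
        try {
            String text = "I, [name], agree to be a disciple of Java and practice Java everyday.";
            Files.write(temp, text.getBytes(StandardCharsets.UTF_8));
            System.out.println(sign(load(temp), "Zippy Jenkins"));
        } finally {
            Files.deleteIfExists(temp);
        }
    }
}
```

The Scanner version is better for learning how I/O works line by line, while this version is handy when you simply want the text of a small file.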

The regular expression used is very simple. We want to replace not just “name” but also the square brackets that surround it. Square brackets already have a special meaning in regex, so we need to escape them.

Square brackets, on their own, define a set of characters, any one of which can match at that position. For example, [AEIOU] will match A, E, I, O, or U.
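
As a small illustration of a set like [AEIOU], the sketch below counts and masks uppercase vowels. The class and method names are made up for this example, and the pattern matches uppercase letters only because that is all the set lists.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class VowelDemo {

    // Counts the characters that match [AEIOU] (uppercase vowels only).
    static int countUppercaseVowels(String text) {
        Matcher matcher = Pattern.compile("[AEIOU]").matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    // Replaces every uppercase vowel with an asterisk.
    static String maskUppercaseVowels(String text) {
        return text.replaceAll("[AEIOU]", "*");
    }

    public static void main(String[] args) {
        System.out.println(countUppercaseVowels("JAVA IS FUN")); // 4
        System.out.println(maskUppercaseVowels("JAVA IS FUN"));  // J*V* *S F*N
    }
}
```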

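In regex, we escape each bracket by putting ‘\’ in front of it. In a Java string literal, however, ‘\’ is itself an escape character, so we have to write ‘\\[’ and ‘\\]’ for the regex engine to receive ‘\[’ and ‘\]’. Writing only ‘\]’ results in a compile error because ‘\]’ is not a valid escape sequence in a Java string.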
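
The sketch below compares three ways of writing the pattern, to show what the escaping buys us. The class and method names are hypothetical. Pattern.quote is a standard Java method that treats its whole argument as literal text, which is an alternative to escaping each bracket by hand.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class EscapeDemo {

    // Brackets escaped by hand: matches the literal text [name].
    static String signEscaped(String text, String name) {
        return text.replaceAll("\\[name\\]", name);
    }

    // Pattern.quote treats the whole argument as literal text.
    static String signQuoted(String text, String name) {
        return text.replaceAll(Pattern.quote("[name]"), name);
    }

    // No escaping: [name] is a set, so each n, a, m, or e is replaced on its own.
    static String replaceUnescaped(String text) {
        return text.replaceAll("[name]", "_");
    }

    public static void main(String[] args) {
        String contract = "I, [name], agree";
        System.out.println("\\[name\\]");                     // \[name\]  (what the regex engine receives)
        System.out.println(signEscaped(contract, "Zippy"));   // I, Zippy, agree
        System.out.println(signQuoted(contract, "Zippy"));    // I, Zippy, agree
        System.out.println(replaceUnescaped(contract));       // I, [____], _gr__
    }
}
```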

The replacement string, “Zippy Jenkins”, will take the place of [name] in our contract and the output will look like this:

I, Zippy Jenkins, agree to be a disciple of Java and practice Java everyday.

That’s it! Using replaceAll(String regex, String replacement), you can quickly replace every part of the contract data that matches your pattern with your replacement string. The replaceAll() method is one of many ways to use regex in Java. You can create much more complex patterns, insert text, validate, and split strings using the Java regex tools.
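
As a taste of those other tools, here is a minimal sketch of validating, splitting, and finding with regex. The class name, method names, and sample strings are made up for this example. String.matches, String.split, Pattern, and Matcher are all part of standard Java.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample strings are illustrative only.
class RegexToolsDemo {

    // Validate: true only if the whole string is exactly four digits.
    static boolean isFourDigits(String text) {
        return text.matches("\\d\\d\\d\\d");
    }

    // Split: break the text apart wherever one or more digits appear.
    static String[] splitOnDigits(String text) {
        return text.split("\\d+");
    }

    // Find: collect every run of digits in the text, in order.
    static List<String> findNumbers(String text) {
        List<String> numbers = new ArrayList<>();
        Matcher matcher = Pattern.compile("\\d+").matcher(text);
        while (matcher.find()) {
            numbers.add(matcher.group());
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(isFourDigits("1994")); // true
        System.out.println(isFourDigits("94"));   // false
        System.out.println(Arrays.toString(splitOnDigits("apple1banana22cherry"))); // [apple, banana, cherry]
        System.out.println(findNumbers("I moved to NY in 1994 and left in 2001")); // [1994, 2001]
    }
}
```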
