Friday, March 7, 2008

Using Regular Expressions in Java

A regular expression (regex) is a powerful construct for manipulating text. Popularized by the Perl language, regular expressions are now supported by almost every popular programming language, including Java (through the java.util.regex package, available since Java 1.4).

So, how do we use them in Java?

It's easy; just take a look at the following tips.

Matching a text pattern within a String object

To match a text pattern, we can use the matches() method of the String class. Its signature is as follows:

boolean java.lang.String.matches(String regex)

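The matches() method takes a pattern (itself a String) and returns true only if the entire string matches that pattern; a match on just part of the string is not enough. Calling str.matches(regex) gives the same result as Pattern.matches(regex, str).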
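Because matches() requires the whole string to match, it behaves differently from searching for a pattern somewhere inside a string. The sketch below contrasts it with Matcher.find(), which succeeds on a partial match. The class and method names (WholeMatchDemo, wholeMatch, partialMatch) are hypothetical, chosen only for illustration.

```java
import java.util.regex.Pattern;

public class WholeMatchDemo {

    // String.matches() succeeds only if the ENTIRE input matches the regex.
    public static boolean wholeMatch(String input, String regex) {
        return input.matches(regex);
    }

    // Matcher.find() succeeds if ANY part of the input matches the regex.
    public static boolean partialMatch(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "abc123";
        System.out.println(wholeMatch(input, "\\d+"));       // false: "abc" is not digits
        System.out.println(partialMatch(input, "\\d+"));     // true: "123" is found
        System.out.println(wholeMatch(input, "[a-z]+\\d+")); // true: letters, then digits
    }
}
```

So if you only need to know whether a string contains a pattern, matches() alone is not enough; either widen the pattern or use find().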

Example 1:

String textToBeMatched = "Hello World"; // sample input; any String works here
if (textToBeMatched.matches("[a-z A-Z]+"))
    System.out.println("Text consists of letters and spaces only");
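Note that the space inside the brackets is a literal space character, so [a-z A-Z] is a class containing lowercase letters, a space, and uppercase letters. The trailing + means at least one such character is required. Here is a quick sketch that tries the Example 1 pattern on a few sample strings; the inputs and the class name LetterSpaceDemo are made up for illustration.

```java
public class LetterSpaceDemo {

    // Letters and spaces only; '+' requires at least one such character.
    public static final String ONE_OR_MORE = "[a-z A-Z]+";
    // Same character class with '*', which also accepts the empty string.
    public static final String ZERO_OR_MORE = "[a-z A-Z]*";

    public static boolean lettersAndSpaces(String text) {
        return text.matches(ONE_OR_MORE);
    }

    public static void main(String[] args) {
        String[] samples = {"Hello World", "Hello, World", "abc123", ""};
        for (String s : samples) {
            // Only "Hello World" passes: the comma and the digits are outside
            // the class, and the empty string has no character for '+'.
            System.out.println("\"" + s + "\" -> " + lettersAndSpaces(s));
        }
        // The empty string fails with '+' but passes with '*'.
        System.out.println("".matches(ZERO_OR_MORE)); // true
    }
}
```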


Example 2:

public class RegexTest {

    public static void testPhoneNumber(String phoneNumber)
    {
        // Optional leading '+', then one or more digits, then digits or
        // hyphens, and the number must end with a digit.
        String phoneNumberPattern = "^\\+{0,1}[\\d]+[-\\d]+\\d$";

        if (phoneNumber.matches(phoneNumberPattern))
            System.out.println(phoneNumber + " is a valid phone number!");
        else
            System.out.println(phoneNumber + " is not a valid phone number!");
    }

    public static void main(String[] args)
    {
        testPhoneNumber("+6221-3011-9353"); // prints "... is a valid phone number!"
        testPhoneNumber("-6221-3011-9353"); // prints "... is not a valid phone number!"
    }
}
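Because matches() already requires the whole string to match, the ^ and $ anchors in Example 2 are redundant there. Also, {0,1} is the same as ?, and [\\d] is the same as \\d. The sketch below (hypothetical class PhonePatternDemo, made-up sample inputs) checks that a shorter form of the pattern accepts and rejects the same strings.

```java
public class PhonePatternDemo {

    // Pattern from Example 2.
    public static final String ORIGINAL = "^\\+{0,1}[\\d]+[-\\d]+\\d$";
    // Shorter equivalent for use with matches(): no anchors, '?' instead of {0,1}.
    public static final String SIMPLIFIED = "\\+?\\d+[-\\d]+\\d";

    // True if both patterns give the same verdict for this input.
    public static boolean sameResult(String input) {
        return input.matches(ORIGINAL) == input.matches(SIMPLIFIED);
    }

    public static void main(String[] args) {
        String[] samples = {"+6221-3011-9353", "-6221-3011-9353", "021-555", "12", "+1"};
        for (String s : samples) {
            System.out.println(s + " -> " + s.matches(SIMPLIFIED)
                    + " (same as original: " + sameResult(s) + ")");
        }
    }
}
```

Either form requires at least three characters after the optional plus sign, starting and ending with a digit, so a very short input such as "12" is rejected.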


Replacing text that matches a pattern

To replace text that matches a pattern, we can use two classes from the java.util.regex package: Pattern and Matcher.

First, we compile the pattern with the static Pattern.compile() method, which returns a Pattern object. We then pass the source text to that object's matcher() method, which returns a Matcher object. Finally, we call the Matcher's replaceAll() method, which returns a new String in which every match is replaced by the given replacement text; the original String is left unchanged.
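The three steps map directly onto code. In this minimal sketch the class name ThreeStepReplace, the method maskDigits, and the \\d+ pattern are all hypothetical, chosen only to keep the example small. Compiling the Pattern once into a static field is a design choice: a Pattern is immutable and safe to share between threads, while a Matcher is not, so we create a fresh Matcher for each input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThreeStepReplace {

    // Step 1: compile the pattern once and reuse it for every input.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    public static String maskDigits(String text) {
        // Step 2: create a Matcher for this particular input text.
        Matcher matcher = DIGITS.matcher(text);
        // Step 3: replace every match; the input String itself is not modified.
        return matcher.replaceAll("#");
    }

    public static void main(String[] args) {
        System.out.println(maskDigits("Room 101, floor 3")); // Room #, floor #
        System.out.println(maskDigits("no digits here"));    // no digits here
    }
}
```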



Example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexReplacementTest {

    public static String censoredPhoneNumber(String text)
    {
        // Phone-number pattern without ^ and $, so it can match anywhere
        // inside the text; only the number itself is replaced, and any
        // text before or after it is kept.
        String phonePattern = "\\+{0,1}[\\d]+[-\\d]+\\d";

        Pattern pattern = Pattern.compile(phonePattern);
        Matcher matcher = pattern.matcher(text);

        return matcher.replaceAll("*censored*");
    }

    public static void main(String[] args)
    {
        String phoneNo = "My phone number is +6221-3011-9353";

        System.out.println(censoredPhoneNumber(phoneNo)); // My phone number is *censored*
    }
}
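For one-off replacements, the String class also offers replaceAll(regex, replacement), which is documented as equivalent to Pattern.compile(regex).matcher(str).replaceAll(replacement). The sketch below compares the two on a made-up sentence containing two numbers; the class name ReplaceAllShortcut and the sample text are assumptions for illustration. Since String.replaceAll() compiles the pattern on every call, the explicit Pattern and Matcher route is preferable when the same regex is applied many times.

```java
import java.util.regex.Pattern;

public class ReplaceAllShortcut {

    // Phone-number pattern used in the examples, without anchors.
    public static final String PHONE = "\\+{0,1}[\\d]+[-\\d]+\\d";

    // Explicit Pattern/Matcher version.
    public static String viaMatcher(String text) {
        return Pattern.compile(PHONE).matcher(text).replaceAll("*censored*");
    }

    // Same result in one call; the pattern is recompiled every time.
    public static String viaString(String text) {
        return text.replaceAll(PHONE, "*censored*");
    }

    public static void main(String[] args) {
        String text = "Call +6221-3011-9353 or 021-555-0000 today";
        System.out.println(viaMatcher(text)); // Call *censored* or *censored* today
        System.out.println(viaString(text));  // Call *censored* or *censored* today
    }
}
```

Keep in mind that $ and \ have special meaning in the replacement string, so a replacement that contains them literally has to be escaped.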

Conclusion

This article showed two ways to use regular expressions in Java:
  1. to check whether a String matches a text pattern, using String's matches() method.
  2. to replace text that matches a pattern, using two helper classes, Pattern and Matcher.

I hope this article helps you solve any text manipulation problems you may have encountered.

Any comments to improve this article are greatly welcome. Post your comment here or email feris@phi-integration.com.


2 comments:

JesperBlog said...

Similar to what we covered in the automata course, right? So [a-z A-Z]+ means any uppercase or lowercase letter or a space, and at least one character has to be there because it uses +; if it used *, it could be empty.

But this one, ^\\+{0,1}[\\d]+[-\\d]+\\d$, is hard to read. What does {} mean? The d must be a number, but why is there a \ in front of it?? Oh well, I'll learn regex later, since right now I'm not working on a project that uses regex. Congrats, nice blog.

Feris Thia said...

Hi Mas Imam,

The {} braces specify a range for the number of repetitions. {0,1} means the preceding element is repeated from 0 times up to a maximum of 1 time.
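To illustrate the {min,max} repetition range, here is a small sketch; the class name QuantifierDemo, the helper methods, and the sample inputs are hypothetical.

```java
public class QuantifierDemo {

    // {0,1}: the '+' may appear zero times or once (equivalent to '?').
    public static boolean optionalPlus(String s) {
        return s.matches("\\+{0,1}\\d+");
    }

    // {2,4}: between two and four digits, inclusive.
    public static boolean twoToFourDigits(String s) {
        return s.matches("\\d{2,4}");
    }

    public static void main(String[] args) {
        System.out.println(optionalPlus("+62"));      // true: one '+'
        System.out.println(optionalPlus("62"));       // true: zero '+'
        System.out.println(optionalPlus("++62"));     // false: two '+' is too many
        System.out.println(twoToFourDigits("123"));   // true
        System.out.println(twoToFourDigits("12345")); // false: five digits
    }
}
```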

A ^ at the front means the text must start with whatever comes right after the ^ character.

Example:
^j+

matches everything that starts with a lowercase 'j':
java, jjj, jruby, ....

The $ works the other way around: the text must end with whatever comes right before the $ character.

Example:
.+9$

so everything that ends with '9' is accepted:
Halo999, 789, ....
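The "java, jjj, jruby" list holds when the pattern is searched for with Matcher.find(). With String.matches(), as used in the article, the whole string must match, so ^j+ accepts only strings made entirely of 'j' characters. A small sketch contrasting the two (the class name AnchorDemo and the helper found() are hypothetical):

```java
import java.util.regex.Pattern;

public class AnchorDemo {

    // find() searches anywhere in the input, so ^ and $ are what pin the
    // match to the start or end of the input.
    public static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("^j+", "java"));     // true: starts with 'j'
        System.out.println(found("^j+", "ajax"));     // false: 'j' is not at the start
        System.out.println(found(".+9$", "Halo999")); // true: ends with '9'
        // matches() requires the whole string to match, so "^j+" only
        // accepts strings made entirely of 'j' characters.
        System.out.println("java".matches("^j+"));    // false
        System.out.println("jjj".matches("^j+"));     // true
    }
}
```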

The \d character class has to be written as \\d inside a Java string literal, because the backslash itself must be escaped. It stands for any digit (0-9).
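A small sketch of the escaping rule, assuming a hypothetical class EscapeDemo: the Java literal "\\d" is a two-character string (a backslash followed by 'd'), which the regex engine then reads as "any digit". Writing "\d" would not even compile, because \d is not a valid escape sequence in a Java string literal.

```java
public class EscapeDemo {

    // Two characters in memory: '\' and 'd'. The regex engine reads them as "a digit".
    public static final String DIGIT = "\\d";

    public static boolean isSingleDigit(String s) {
        return s.matches(DIGIT);
    }

    public static void main(String[] args) {
        System.out.println(DIGIT.length());     // 2
        System.out.println(isSingleDigit("7")); // true
        System.out.println(isSingleDigit("x")); // false
        // Likewise "\\+" makes '+' a literal plus sign instead of a quantifier.
        System.out.println("+".matches("\\+")); // true
    }
}
```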

I hope this makes it clearer.

Oh, and thanks for the compliment on this blog ;)

Regards,

Feris