Easy to Learn Java: Programming Articles, Examples and Tips

Regex Language Intro


Regular expressions are a programming language for describing patterns in strings. At the syntax level, it's important to understand which characters are metacharacters (have a special meaning) and which are literal characters (stand for themselves). At the semantic level, several basic concepts are important: character classes, quantifiers, boundaries, grouping, and alternation. These fundamental regex elements are common to most implementations, and will solve most of your regex needs.

Metacharacters

The characters that have special meaning are called metacharacters. A preceding backslash ("\") turns a metacharacter into a literal character. The set of metacharacters inside a character class, i.e. between [ and ], is different.

Char Meaning
\ Turns metacharacters into literal characters, and literal characters into metacharacters. Because the backslash is also the Java escape character in string literals, it must be doubled there.
[ Starts character class definition.
( Starts a group.
{ Encloses repetition count: {min,max} (no space after the comma).
^ Matches boundary at beginning. Class negation when immediately after [.
$ Matches boundary at end.
. Matches any single character (line terminators only in DOTALL mode).
? Preceding element must match zero or one time.
* Preceding element must match zero or more times.
+ Preceding element must match one or more times.
| Either preceding or following element must match.
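
The doubled backslash is easiest to see in a small program. The following is a minimal sketch rather than anything from the original tip: the class and method names are made up for illustration, and Pattern.quote is shown as one convenient way to treat a whole string literally.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // "\\." in Java source is the two-character regex \. (a literal dot).
    static boolean isLiteralDot(String s) {
        return Pattern.matches("\\.", s);
    }

    // An unescaped dot is a metacharacter and matches any single character.
    static boolean isAnyChar(String s) {
        return Pattern.matches(".", s);
    }

    // Pattern.quote wraps arbitrary text so every character is taken literally.
    static boolean containsLiteral(String text, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(isLiteralDot("."));  // true
        System.out.println(isLiteralDot("x"));  // false
        System.out.println(isAnyChar("x"));     // true
        System.out.println(containsLiteral("price: $5 (approx.)", "$5 (")); // true
    }
}
```

Without Pattern.quote, the "$" and "(" in the last call would be read as metacharacters, and the unclosed "(" would make the pattern fail to compile.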

Boundaries

A boundary is a position, not a character: the position between two characters, or at the beginning or end of the input. The two most commonly used boundaries are ^ (matches at beginning) and $ (matches at end).

Code Meaning
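^ Beginning of a line in MULTILINE mode; otherwise beginning of the input.
$ End of a line in MULTILINE mode; otherwise end of the input.
\A Beginning of the input.
\z End of the input.
\Z End of the input, ignoring the final line terminator, if any.
\G End of the previous match (to indicate where the new match should start).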
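
The difference between the line boundaries (^ and $) and the input boundaries (\A, \z, \Z) only shows up on multi-line input. Here is a small sketch of that, assuming a three-line string that ends in a newline; the helper names count and found are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Counts how many times regex (compiled with the given flags) is found.
    static int count(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // True if regex is found anywhere in input (default flags).
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "one\ntwo\nthree\n";

        // Default mode: ^ matches only at the start of the input.
        System.out.println(count("^\\w+", 0, input));                    // 1
        // MULTILINE mode: ^ also matches after each line terminator.
        System.out.println(count("^\\w+", Pattern.MULTILINE, input));    // 3
        // \A means the start of the input even in MULTILINE mode.
        System.out.println(count("\\A\\w+", Pattern.MULTILINE, input));  // 1
        // \z is the very end, so the trailing newline gets in the way.
        System.out.println(found("three\\z", input));                    // false
        // \Z ignores one final line terminator.
        System.out.println(found("three\\Z", input));                    // true
    }
}
```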

Character classes

A character class defines a set of characters. It matches exactly one character unless it is followed by a quantifier specifying how many.

Predefined character classes

Notice that each uppercase class is the negation of the corresponding lowercase class.

Code Matches
. Any character.
\d A digit. Same as [0-9]
\D A non-digit. Same as [^0-9] or [^\d]
\s A whitespace character. Same as [ \t\n\x0B\f\r]
\S A non-whitespace character. Same as [^\s]
\w A "word" character. Same as [a-zA-Z0-9_]. Note that it includes the underscore, which not all regex libraries do. It does NOT include the non-ASCII Unicode characters (see below).
\p{L} Unicode letters.
\W A non-word character. Same as [^\w]
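
A few of these classes in action. This is only an illustrative sketch: the class name and the sample strings are invented, and each backslash is doubled because the patterns are written as Java string literals.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // True if the whole input matches regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\d+", "2024"));         // true
        System.out.println(matches("\\D+", "2024"));         // false
        System.out.println(matches("\\s", "\t"));            // true
        System.out.println(matches("\\w+", "snake_case"));   // true
        // By default \w is ASCII-only, so the accented letter is rejected.
        System.out.println(matches("\\w+", "caf\u00e9"));    // false
        // \p{L} matches letters from any script.
        System.out.println(matches("\\p{L}+", "caf\u00e9")); // true
    }
}
```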

Quantifiers

An element, X, which may be a literal character, a character class, or a group, may be followed by a quantifier, which indicates how often it should be matched.

Quantifiers are classified as greedy or lazy. Greedy quantifiers try to match as much as possible, and reduce the amount they match only if forced to by later failures. Lazy quantifiers match as little as possible, and only expand if required by a later failure. Unlike many regex libraries, Java also supports possessive quantifiers (written with a trailing +, as in X*+), which are not only greedy, but won't give back anything they've matched. They can provide a speed advantage in some circumstances.

Code Meaning
X? X must match zero or one time. Greedy.
X* X must match zero or more times. Greedy.
X+ X must match one or more times. Greedy.
X{n} X must match exactly n times.
X{n,} X must match at least n times. Greedy.
X{n,m} X must match at least n times, but no more than m times. Greedy.
X?? X must match zero or one time. Lazy.
X*? X must match zero or more times. Lazy.
X+? X must match one or more times. Lazy.
X{n,}? X must match at least n times. Lazy.
X{n,m}? X must match at least n times, but no more than m times. Lazy.
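
The three kinds of quantifier are easiest to tell apart by running them over the same input. The sketch below uses an invented HTML-like string and a hypothetical firstMatch helper; the possessive form is written by appending + to the quantifier.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first substring matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<b>bold</b> and <i>italic</i>";

        // Greedy: .* grabs everything, then backs off to the last '>'.
        System.out.println(firstMatch("<.*>", input));   // <b>bold</b> and <i>italic</i>
        // Lazy: .*? stops at the first '>' that lets the match succeed.
        System.out.println(firstMatch("<.*?>", input));  // <b>
        // Possessive: .*+ swallows the rest of the input and never gives
        // the final '>' back, so the overall match fails.
        System.out.println(firstMatch("<.*+>", input));  // null
        // Counted repetition is greedy too; no space is allowed in the braces.
        System.out.println(firstMatch("\\d{2,3}", "a1234")); // 123
    }
}
```

The possessive case shows why these quantifiers need care: they are fast because they never backtrack, but that also means a pattern that would succeed with a greedy quantifier can fail outright.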

Grouping

Code Meaning
(X) This matches X as usual, and it also records the beginning and end of the substring that X matches. This forms a group that can be used in one of three ways:
  • Matcher methods can be called to get the number of groups, a particular group by number, or the beginning and end character index of any group.
  • Back references can be made inside a pattern to match previous groups that were matched. These references are of the form \n, where n is the number of a previous group.
  • The Matcher appendReplacement() method may reference groups in the replacement string using $n.

Group 0 is the entire match. For other groups, the number of the group corresponds to the number of the left parenthesis in the regex when counting from the left, starting at one.

The group includes only the last repetition caused by quantifiers. Enclose the quantifier itself in the group if you want all the repetitions in one group.
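
The three uses of groups, plus the "last repetition" rule, can be sketched as follows. The date format, the sample strings, and all method names are assumptions made for the example. String.replaceAll is used for brevity; it accepts the same $n references in its replacement string. The \b in the second pattern is a word boundary.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Group numbers follow the order of the opening parentheses;
    // group 0 is the entire match.
    static String[] splitDate(String date) {
        Matcher m = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})").matcher(date);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1), m.group(2), m.group(3) };
    }

    // \1 is a back reference: it must match the same text group 1 matched.
    static String firstDoubledWord(String text) {
        Matcher m = Pattern.compile("\\b(\\w+) \\1\\b").matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // $n in a replacement string refers to group n of the match.
    static String toDayMonthYear(String text) {
        return text.replaceAll("(\\d{4})-(\\d{2})-(\\d{2})", "$3/$2/$1");
    }

    // Quantifier outside the group: only the last repetition is kept.
    static String lastRepetition(String digits) {
        Matcher m = Pattern.compile("(\\d)+").matcher(digits);
        return m.matches() ? m.group(1) : null;
    }

    // Quantifier inside the group: all repetitions are kept.
    static String allRepetitions(String digits) {
        Matcher m = Pattern.compile("(\\d+)").matcher(digits);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", splitDate("2024-03-15"))); // 2024-03-15 | 2024 | 03 | 15
        System.out.println(firstDoubledWord("it was was late"));         // was
        System.out.println(toDayMonthYear("Due 2024-03-15."));           // Due 15/03/2024.
        System.out.println(lastRepetition("123"));                       // 3
        System.out.println(allRepetitions("123"));                       // 123
    }
}
```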

Alternation

Code Meaning
X|Y Tries to match X. If that fails, it tries to match Y.
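
Because the alternatives are tried in order, the order can change the result. A short sketch with invented sample words and a hypothetical firstMatch helper:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {
    // Returns the first substring matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The left alternative is tried first and wins if it succeeds.
        System.out.println(firstMatch("cat|category", "category")); // cat
        System.out.println(firstMatch("category|cat", "category")); // category
        // | applies to everything on either side of it, so use a group
        // to limit its scope.
        System.out.println(Pattern.matches("gr(a|e)y", "grey"));    // true
        System.out.println(Pattern.matches("gra|ey", "grey"));      // false
    }
}
```

The last line fails because "gra|ey" means "gra" or "ey", and neither alternative matches the whole string "grey".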


All logos and trademarks in this site are property of their respective owner. The comments are property of their posters, all the rest 1999-2006 by Java FAQs Daily Tips.
