Navigation: TextEd > Regular expressions >

Subpatterns

 

 

 

 

Subpatterns are delimited by parentheses (round brackets), which can be nested. Turning part of a pattern into a subpattern does two things:

1. It localizes a set of alternatives. For example, the pattern

 

  cat(aract|erpillar|)

 

matches "cataract", "caterpillar", or "cat". Without the parentheses, it would match "cataract", "erpillar" or an empty string.

2. It sets up the subpattern as a capturing subpattern. This means that, when the whole pattern matches, that portion of the subject string that matched the subpattern is passed back to the caller via the ovector argument of the matching function. (This applies only to the traditional matching functions; the DFA matching functions do not support capturing.)

 

Opening parentheses are counted from left to right (starting from 1) to obtain numbers for the capturing subpatterns. For example, if the string "the red king" is matched against the pattern

 

  the ((red|white) (king|queen))

 

the captured substrings are "red king", "red", and "king", and are numbered 1, 2, and 3, respectively.

 

The fact that plain parentheses fulfil two functions is not always helpful. There are often times when a grouping subpattern is required without a capturing requirement. If an opening parenthesis is followed by a question mark and a colon, the subpattern does not do any capturing, and is not counted when computing the number of any subsequent capturing subpatterns. For example, if the string "the white queen" is matched against the pattern

 

  the ((?:red|white) (king|queen))

 

the captured substrings are "white queen" and "queen", and are numbered 1 and 2. The maximum number of capturing subpatterns is 65535.

 

As a convenient shorthand, if any option settings are required at the start of a non-capturing subpattern, the option letters may appear between the "?" and the ":". Thus the two patterns

 

  (?i:saturday|sunday)

  (?:(?i)saturday|sunday)


match exactly the same set of strings. Because alternative branches are tried from left to right, and options are not reset until the end of the subpattern is reached, an option setting in one branch does affect subsequent branches, so the above patterns match "SUNDAY" as well as "Saturday".

 


 

Philip Hazel

University Computing Service

Cambridge CB2 3QH, England.

Last updated: 12 November 2013

Copyright © 1997-2013 University of Cambridge.


 


 


 


 

 

 

 

Copyright © 2024 Rickard Johansson