.. topic:: Henry Spencer Regular Expressions :name: hsregexp This document describes the regular expressions supported by the implementation by Henry Spencer (the traditional package for LPMud). .. subtopic:: Options The following bitflag options modify the behaviour of the regular expressions - both interpretation and actual matching. .. todo:: is this truncated or incomplete? The efuns may understand additional options. RE_EXCOMPATIBLE If this bit is set, the pattern is interpreted as the UNIX ed editor would do it: () match literally, and the \( \) group expressions. .. subtopic:: Regular Expression Details A regular expression is a pattern that is matched against a subject string from left to right. Most characters stand for themselves in a pattern, and match the corresponding characters in the subject. As a trivial example, the pattern The quick brown fox matches a portion of a subject string that is identical to itself. The power of regular expressions comes from the ability to include alternatives and repetitions in the pattern. These are encoded in the pattern by the use of metacharacters, which do not stand for themselves but instead are interpreted in some special way. There are two different sets of metacharacters: those that are recognized anywhere in the pattern except within square brackets, and those that are recognized in square brackets. Outside square brackets, the metacharacters are as follows: . Match any character. ^ Match begin of line. $ Match end of line. \< Match begin of word. \> Match end of word. \B not at edge of a word (supposed to be like the emacs compatibility one in gnu egrep) x|y Match regexp x or regexp y. () Match enclosed regexp like a 'simple' one (unless RE_EXCOMPATIBLE is set). x* Match any number (0 or more) of regexp x. x+ Match any number (1 or more) of regexp x. [..] Match one of the characters enclosed. [^ ..] Match none of the characters enclosed. The .. are to replaced by single characters or character ranges: [abc] matches a, b or c. [ab0-9] matches a, b or any digit. [^a-z] does not match any lowercase character. \c match character c even if it's one of the special characters. .. note:: The \< and \> metacharacters from Henry Spencers package are not available in PCRE, but can be emulate with \b, as required, also in conjunction with \W or \w. .. note:: In LDMud, backtracks are limited by the EVAL_COST runtime limit, to avoid freezing the driver with a match like ``regexp(({"=XX==================="}), "X(.+)+X")``. .. lore:: Authors: - Mark H. Colburn, :abbr:`NAPS` International (mark@jhereg.mn.org) - Henry Spencer, University of Torronto (henry@utzoo.edu) - Joern Rennecke - Ian Phillipps .. seealso:: :concept:`regexp`, :concept:`pcre`