QRegExp Class Reference

The QRegExp class provides pattern matching using regular expressions or wildcards.

#include <qregexp.h>

Public Member Functions

 QRegExp ()
 
 QRegExp (const QCString &, bool caseSensitive=TRUE, bool wildcard=FALSE)
 
 QRegExp (const QRegExp &)
 
 ~QRegExp ()
 
QRegExp & operator= (const QRegExp &)
 
QRegExp & operator= (const QCString &pattern)
 
bool operator== (const QRegExp &) const
 
bool operator!= (const QRegExp &r) const
 
bool isEmpty () const
 
bool isValid () const
 
bool caseSensitive () const
 
void setCaseSensitive (bool)
 
bool wildcard () const
 
void setWildcard (bool)
 
QCString pattern () const
 
void setPattern (const QCString &pattern)
 
int match (const QCString &str, int index=0, int *len=0, bool indexIsStart=TRUE) const
 
int find (const QCString &str, int index)
 

Protected Member Functions

void compile ()
 
const char * matchstr (uint *, const char *, uint, const char *) const
 

Private Attributes

QCString rxstring
 
uint * rxdata
 
int error
 
bool cs
 
bool wc
 

Detailed Description

The QRegExp class provides pattern matching using regular expressions or wildcards.

QRegExp knows these regexp primitives:

In wildcard mode, it only knows four primitives:

QRegExp supports Unicode both in the pattern strings and in the strings to be matched.

When writing regular expressions in C++ code, remember that C++ processes \ characters. So in order to match e.g. a "." character, you must write "\\." in C++ source, not "\.".
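
The escaping rule above can be checked with a short sketch. The helper names literalDotPattern and literalDotPatternLength are made up for this illustration and are not part of QRegExp; the sketch only shows what the compiler stores for the source literal.

```cpp
#include <cstring>

// Hypothetical helper (not part of QRegExp): the pattern for a literal dot.
// The C++ source spells it with two backslashes; the compiler collapses the
// escape sequence, so the stored string is a backslash followed by a dot.
const char *literalDotPattern()
{
    return "\\.";
}

// Number of characters the regexp engine would actually receive.
std::size_t literalDotPatternLength()
{
    return std::strlen(literalDotPattern());
}
```

The stored pattern is two characters long, which is what a regexp engine needs in order to see an escaped dot rather than the "any character" primitive.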

A character set matches a defined set of characters. For example, [BSD] matches any of 'B', 'D' and 'S'. Within a character set, the special characters '.', '*', '?', '^', '$', '+' and '[' lose their special meanings. The following special characters apply:

Thus, [a-zA-Z0-9.] matches upper and lower case ASCII letters, digits and dot; and [^\s] matches everything except white space.

Bug:
Case insensitive matching is not supported for non-ASCII/Latin1 (non-8bit) characters. Any character with a non-zero QChar.row() is matched case sensitively even if the QRegExp is in case insensitive mode.
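
The limitation can be pictured with the following sketch, which is modelled on the 8-bit test that compile() applies to pattern characters. The function name foldForMatch is an assumption made for this illustration; it is not a QRegExp member.

```cpp
#include <cctype>

// Hypothetical helper: fold a character code for case insensitive matching.
// Only codes that fit in 8 bits are folded; anything larger is returned
// unchanged and is therefore compared case sensitively.
unsigned foldForMatch(unsigned code, bool caseSensitive)
{
    if (!caseSensitive && code <= 0xff)
        return static_cast<unsigned>(std::tolower(static_cast<int>(code)));
    return code;
}
```

With this rule, 'A' folds to 'a' in case insensitive mode, while a code such as 0x0391 (Greek capital alpha) passes through untouched.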
Note
In Qt 3.0, the language of regular expressions will contain five more special characters, namely '(', ')', '{', '|' and '}'. To ease porting, it's a good idea to escape these characters with a backslash in all the regular expressions you'll write from now on.

Definition at line 46 of file qregexp.h.

Constructor & Destructor Documentation

QRegExp::QRegExp ( )

Constructs an empty regular expression.

Definition at line 174 of file qregexp.cpp.

    {
        rxdata = 0;
        cs = TRUE;
        wc = FALSE;
        error = PatOk;
    }
QRegExp::QRegExp ( const QCString & pattern,
bool  caseSensitive = TRUE,
bool  wildcard = FALSE 
)

Constructs a regular expression.

  • pattern is the regular expression pattern string.
  • caseSensitive specifies whether or not to use case sensitive matching.
  • wildcard specifies whether the pattern string should be used for wildcard matching (also called globbing expression), normally used for matching file names.
See also
setWildcard()

Definition at line 195 of file qregexp.cpp.

    {
        rxstring = pattern;
        rxdata = 0;
        cs = caseSensitive;
        wc = wildcard;
        compile();
    }
QRegExp::QRegExp ( const QRegExp & r)

Constructs a regular expression which is a copy of r.

See also
operator=(const QRegExp&)

Definition at line 209 of file qregexp.cpp.

    {
        rxstring = r.pattern();
        rxdata = 0;
        cs = r.caseSensitive();
        wc = r.wildcard();
        compile();
    }
QRegExp::~QRegExp ( )

Destructs the regular expression and cleans up its internal data.

Definition at line 222 of file qregexp.cpp.

    {
        if ( rxdata )  // Avoid purify complaints
            delete [] rxdata;
    }

Member Function Documentation

bool QRegExp::caseSensitive ( ) const
inline

Returns TRUE if case sensitivity is enabled, otherwise FALSE. The default is TRUE.

See also
setCaseSensitive()

Definition at line 63 of file qregexp.h.

    { return cs; }
void QRegExp::compile ( )
protected

Definition at line 914 of file qregexp.cpp.

    {
        if ( rxdata ) {  // delete old data
            delete [] rxdata;
            rxdata = 0;
        }
        if ( rxstring.isEmpty() ) {  // no regexp pattern set
            error = PatNull;
            return;
        }

        error = PatOk;  // assume pattern is ok

        QCString pattern;  // working copy (declaration missing from the listing)
        if ( wc )
            pattern = wc2rx(rxstring);
        else
            pattern = rxstring;
        const char *start = pattern.data();  // start of pattern
        const char *p = start;  // pattern pointer
        uint pl = pattern.length();
        uint *d = rxarray;  // data pointer
        uint *prev_d = 0;

    #define GEN(x) *d++ = (x)

        while ( pl ) {
            char ch = (char)*p;
            switch ( ch ) {

            case '^':  // beginning of line
                prev_d = d;
                GEN( p == start ? BOL : (CHR | ch) );
                p++;
                pl--;
                break;

            case '$':  // end of line
                prev_d = d;
                GEN( pl == 1 ? EOL : (CHR | ch) );
                p++;
                pl--;
                break;

            case '.':  // any char
                prev_d = d;
                GEN( ANY );
                p++;
                pl--;
                break;

            case '[':  // character class
                {
                    prev_d = d;
                    p++;
                    pl--;
                    if ( !pl ) {
                        error = PatSyntax;
                        return;
                    }
                    bool firstIsEscaped = ( (char)*p == '\\' );
                    uint cch = char_val( &p, &pl );
                    if ( cch == '^' && !firstIsEscaped ) {  // negate!
                        GEN( CCN );
                        if ( !pl ) {
                            error = PatSyntax;
                            return;
                        }
                        cch = char_val( &p, &pl );
                    } else {
                        GEN( CCL );
                    }
                    uint numFields = 0;
                    while ( pl ) {
                        if ((pl>2) && ((char)*p == '-') && ((char)*(p+1) != ']')) {
                            // Found a range
                            char_val( &p, &pl );  // Read the '-'
                            uint cch2 = char_val( &p, &pl );  // Read the range end
                            if ( cch > cch2 ) {  // swap start and stop
                                int tmp = cch;
                                cch = cch2;
                                cch2 = tmp;
                            }
                            GEN( (cch << 16) | cch2 );  // from < to
                            numFields++;
                        }
                        else {
                            // Found a single character
                            if ( cch & MCD )  // It's a code; will not be mistaken
                                GEN( cch );  // for a range, since from > to
                            else
                                GEN( (cch << 16) | cch );  // from == to range
                            numFields++;
                        }
                        if ( d >= rxarray + maxlen ) {  // pattern too long
                            error = PatOverflow;
                            return;
                        }
                        if ( !pl ) {  // At least ']' should be left
                            error = PatSyntax;
                            return;
                        }
                        bool nextIsEscaped = ( (char)*p == '\\' );
                        cch = char_val( &p, &pl );
                        if ( cch == (uint)']' && !nextIsEscaped )
                            break;
                        if ( !pl ) {  // End, should have seen ']'
                            error = PatSyntax;
                            return;
                        }
                    }
                    *prev_d |= numFields;  // Store number of fields
                }
                break;

            case '*':  // Kleene closure, or
            case '+':  // positive closure, or
            case '?':  // optional closure
                {
                    if ( prev_d == 0 ) {  // no previous expression
                        error = PatSyntax;  // empty closure
                        return;
                    }
                    switch ( *prev_d ) {  // test if invalid closure
                    case BOL:
                    case BOW:
                    case EOW:
                    case CLO:
                    case OPT:
                        error = PatSyntax;
                        return;
                    }
                    int ddiff = (int)(d - prev_d);
                    if ( *p == '+' ) {  // convert to Kleene closure
                        if ( d + ddiff >= rxarray + maxlen ) {
                            error = PatOverflow;  // pattern too long
                            return;
                        }
                        memcpy( d, prev_d, ddiff*sizeof(uint) );
                        d += ddiff;
                        prev_d += ddiff;
                    }
                    memmove( prev_d+1, prev_d, ddiff*sizeof(uint) );
                    *prev_d = ch == '?' ? OPT : CLO;
                    d++;
                    GEN( END );
                    p++;
                    pl--;
                }
                break;

            default:
                {
                    prev_d = d;
                    uint cv = char_val( &p, &pl );
                    if ( cv & MCD ) {  // It's a code
                        GEN( cv );
                    }
                    else {
                        if ( !cs && cv <= 0xff )  // #only 8bit support
                            cv = tolower( cv );
                        GEN( CHR | cv );
                    }
                }
            }
            if ( d >= rxarray + maxlen ) {  // oops!
                error = PatOverflow;  // pattern too long
                return;
            }
        }
        GEN( END );
        int len = (int)(d - rxarray);
        rxdata = new uint[ len ];  // copy from rxarray to rxdata
        CHECK_PTR( rxdata );
        memcpy( rxdata, rxarray, len*sizeof(uint) );
    #if defined(DEBUG)
        //dump( rxdata );  // uncomment this line for debugging
    #endif
    }
int QRegExp::find ( const QCString & str,
int  index 
)
inline

Attempts to match in str, starting from position index. Returns the position of the match, or -1 if there was no match.

See also
match()

Definition at line 76 of file qregexp.h.

    { return match( str, index ); }
bool QRegExp::isEmpty ( ) const
inline

Returns TRUE if the regexp is empty.

Definition at line 60 of file qregexp.h.

    { return rxdata == 0; }
bool QRegExp::isValid ( ) const
inline

Returns TRUE if the regexp is valid, or FALSE if it is invalid.

The pattern "[a-z" is an example of an invalid pattern, since it lacks a closing bracket.
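
The same kind of validity check can be sketched with the standard library. This uses std::regex as a stand-in, not QRegExp, and the function name isValidPattern is an assumption for this illustration; the two engines differ in syntax details (for example, this QRegExp does not treat '(' as special), so only the unbalanced-bracket case is meant to carry over.

```cpp
#include <regex>
#include <string>

// Hypothetical stand-in for a validity check: a pattern is considered valid
// if std::regex can compile it. std::regex reports malformed patterns by
// throwing std::regex_error.
bool isValidPattern(const std::string &pattern)
{
    try {
        std::regex compiled(pattern);
        return true;
    } catch (const std::regex_error &) {
        return false;
    }
}
```

Here "[a-z" is rejected because the character set is never closed, while "[a-z]" is accepted.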

Definition at line 61 of file qregexp.h.

    { return error == 0; }
int QRegExp::match ( const QCString & str,
int  index = 0,
int *  len = 0,
bool  indexIsStart = TRUE 
) const

Attempts to match in str, starting from position index. Returns the position of the match, or -1 if there was no match.

If len is not a null pointer, the length of the match is stored in *len.

If indexIsStart is TRUE (the default), the position index in the string will match the start-of-input primitive (^) in the regexp, if present. Otherwise, position 0 in str will match.

Example:

QRegExp r("[0-9]*\\.[0-9]+"); // matches floating point
int len;
r.match("pi = 3.1416", 0, &len); // returns 5, len == 6
Note
In Qt 3.0, this function will be replaced by find().
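
The return convention described above (position of the match, or -1, with the match length stored through an optional pointer) can be sketched without Qt. The function findMatch below is a hypothetical stand-in built on std::regex, not the QRegExp implementation; like match(), it stores 0 in *len when nothing matches.

```cpp
#include <regex>
#include <string>

// Hypothetical stand-in for the match() calling convention, using std::regex.
// Returns the position of the first match at or after index, or -1 if there
// is none. If len is not null, the match length (0 on failure) is stored.
int findMatch(const std::string &pattern, const std::string &text,
              int index = 0, int *len = nullptr)
{
    if (len)
        *len = 0;
    if (index < 0 || static_cast<std::size_t>(index) > text.size())
        return -1;

    const std::regex rx(pattern);
    std::smatch m;
    // Searching from text.begin() + index means '^' matches at index,
    // which corresponds to indexIsStart being TRUE.
    if (!std::regex_search(text.begin() + index, text.end(), m, rx))
        return -1;

    if (len)
        *len = static_cast<int>(m.length(0));
    // m.position(0) is relative to the start of the searched range.
    return index + static_cast<int>(m.position(0));
}
```

Called with the pattern and string from the example above, findMatch("[0-9]*\\.[0-9]+", "pi = 3.1416", 0, &len) follows the same convention: it reports position 5 and a length of 6.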

Definition at line 649 of file qregexp.cpp.

    {
        if ( !isValid() || isEmpty() )
            return -1;
        if ( str.length() < (uint)index )
            return -1;
        const char *start = str.data();
        const char *p = start + index;
        uint pl = str.length() - index;
        uint *d = rxdata;
        int ep = -1;

        if ( *d == BOL ) {  // match from beginning of line
            ep = matchstring( d, p, pl, indexIsStart ? p : start, cs );
        } else {
            if ( *d & CHR ) {  // skip ahead to the first candidate character
                char c = *d;
                if ( !cs /*&& !c.row()*/ ) {  // case insensitive, # only 8bit
                    while ( pl && ( /*p->row() ||*/ tolower(*p) != c ) ) {
                        p++;
                        pl--;
                    }
                } else {  // case sensitive
                    while ( pl && *p != c ) {
                        p++;
                        pl--;
                    }
                }
            }
            while( 1 ) {  // regular match
                ep = matchstring( d, p, pl, indexIsStart ? start+index : start, cs );
                if ( ep >= 0 )
                    break;
                if ( !pl )
                    break;
                p++;
                pl--;
            }
        }
        if ( len )
            *len = ep >= 0 ? ep : 0;  // No match -> 0, for historical reasons
        return ep >= 0 ? (int)(p - start) : -1;  // return index;
    }
const char* QRegExp::matchstr ( uint * ,
const char *  ,
uint  ,
const char *   
) const
protected
bool QRegExp::operator!= ( const QRegExp & r) const
inline

Returns TRUE if this regexp is not equal to r.

See also
operator==()

Definition at line 57 of file qregexp.h.

    { return !(this->operator==(r)); }
QRegExp & QRegExp::operator= ( const QRegExp & r)

Copies the regexp r and returns a reference to this regexp. The case sensitivity and wildcard options are copied, as well.

Definition at line 233 of file qregexp.cpp.

    {
        rxstring = r.rxstring;
        cs = r.cs;
        wc = r.wc;
        compile();
        return *this;
    }
QRegExp & QRegExp::operator= ( const QCString & pattern)

Consider using setPattern() instead of this method.

Sets the pattern string to pattern and returns a reference to this regexp. The case sensitivity and wildcard options do not change.

Definition at line 250 of file qregexp.cpp.

    {
        rxstring = pattern;
        compile();
        return *this;
    }
bool QRegExp::operator== ( const QRegExp & r) const

Returns TRUE if this regexp is equal to r.

Two regexp objects are equal if they have equal pattern strings, case sensitivity options and wildcard options.

Definition at line 265 of file qregexp.cpp.

    {
        return rxstring == r.rxstring && cs == r.cs && wc == r.wc;
    }
QCString QRegExp::pattern ( ) const
inline

Returns the pattern string of the regexp.

Definition at line 69 of file qregexp.h.

    { return rxstring; }
void QRegExp::setCaseSensitive ( bool  enable)

Enables or disables case sensitive matching.

In case sensitive mode, "a.e" matches "axe" but not "Axe".

See also
caseSensitive()

Definition at line 335 of file qregexp.cpp.

    {
        if ( cs != enable ) {
            cs = enable;
            compile();
        }
    }
void QRegExp::setPattern ( const QCString & pattern)
inline

Sets the pattern string to pattern. The case sensitivity and wildcard options do not change. Unlike operator=(const QCString&), which it calls, this function returns nothing.

Definition at line 71 of file qregexp.h.

    { operator=( pattern ); }
void QRegExp::setWildcard ( bool  wildcard)

Sets the wildcard option for the regular expression. The default is FALSE.

Setting wildcard to TRUE makes it convenient to match filenames instead of plain text.

For example, "qr*.cpp" matches the string "qregexp.cpp" in wildcard mode, but not "qicpp" (which would be matched in normal mode).
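
The difference between the two modes can be sketched by translating a wildcard pattern into a regular expression. The code below is an illustration built on std::regex, not the library's own wc2rx(); the names wildcardToRegex, matchesWildcard and matchesRegex are assumptions, and the sketch matches the whole string rather than searching within it.

```cpp
#include <regex>
#include <string>

// Hypothetical translation of a wildcard (glob) pattern to a regexp string:
// '*' becomes ".*", '?' becomes ".", and characters that are special in a
// regexp but ordinary in a glob are escaped with a backslash.
std::string wildcardToRegex(const std::string &glob)
{
    std::string rx;
    for (char c : glob) {
        switch (c) {
        case '*': rx += ".*"; break;
        case '?': rx += '.'; break;
        case '.': case '^': case '$': case '+': case '\\':
        case '(': case ')': case '{': case '}': case '|':
            rx += '\\';
            rx += c;
            break;
        default:            // ordinary characters, plus '[' and ']' for sets
            rx += c;
            break;
        }
    }
    return rx;
}

// Whole-string match in wildcard mode.
bool matchesWildcard(const std::string &glob, const std::string &text)
{
    return std::regex_match(text, std::regex(wildcardToRegex(glob)));
}

// Whole-string match with the pattern taken as a plain regular expression.
bool matchesRegex(const std::string &pattern, const std::string &text)
{
    return std::regex_match(text, std::regex(pattern));
}
```

In this sketch "qr*.cpp" translates to the regexp "qr.*\.cpp", which accepts "qregexp.cpp" and rejects "qicpp". Taken as a plain regexp, the same text reads as 'q', zero or more 'r', any character, then "cpp", which is why "qicpp" matches in normal mode.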

See also
wildcard()

Definition at line 310 of file qregexp.cpp.

    {
        if ( wildcard != wc ) {
            wc = wildcard;
            compile();
        }
    }
bool QRegExp::wildcard ( ) const
inline

Returns TRUE if wildcard mode is on, otherwise FALSE.

See also
setWildcard()

Definition at line 66 of file qregexp.h.

    { return wc; }

Member Data Documentation

bool QRegExp::cs
private

Definition at line 87 of file qregexp.h.

int QRegExp::error
private

Definition at line 86 of file qregexp.h.

uint* QRegExp::rxdata
private

Definition at line 85 of file qregexp.h.

QCString QRegExp::rxstring
private

Definition at line 84 of file qregexp.h.

bool QRegExp::wc
private

Definition at line 88 of file qregexp.h.


The documentation for this class was generated from the following files: