Regular Expression
Korean: 정규식 or 정규표현
定規表現: an expression whose rules are fixed
The content below is simply pieced together from Wikipedia (https://ko.wikipedia.org/wiki/정규_표현식).
Syntax
Delimiters
A '/' is sometimes used as the pattern delimiter, as in Perl's m// and s///.
Standards
POSIX basic and extended
Metacharacter | Function | Description |
---|---|---|
^ | Start | Matches the starting position within the string. In line-based tools, it matches the starting position of any line. |
. | Any character | Matches any single character (many applications exclude newlines, and exactly which characters are considered newlines is flavor-, character-encoding-, and platform-specific, but it is safe to assume that the line feed character is included). Within POSIX bracket expressions, the dot character matches a literal dot. For example, a.c matches "abc", etc., but [a.c] matches only "a", ".", or "c". |
[ ] | Character class | A bracket expression. Matches a single character that is contained within the brackets. For example, [abc] matches "a", "b", or "c". [a-z] specifies a range which matches any lowercase letter from "a" to "z". These forms can be mixed: [abcx-z] matches "a", "b", "c", "x", "y", or "z", as does [a-cx-z]. |
[^ ] | Negation | Matches a single character that is not contained within the brackets. For example, [^abc] matches any character other than "a", "b", or "c". [^a-z] matches any single character that is not a lowercase letter from "a" to "z". Likewise, literal characters and ranges can be mixed. |
$ | End | Matches the ending position of the string or the position just before a string-ending newline. In line-based tools, it matches the ending position of any line. |
( ) | Subexpression | Defines a marked subexpression. The string matched within the parentheses can be recalled later (see the next entry, \n). A marked subexpression is also called a block or capturing group. BRE mode requires \( \). |
\n | nth matched pattern | Matches what the nth marked subexpression matched, where n is a digit from 1 to 9. This construct is vaguely defined in the POSIX.2 standard. Some tools allow referencing more than nine capturing groups. Also known as a backreference. |
* | Zero or more times | Matches the preceding element zero or more times. For example, ab*c matches "ac", "abc", "abbbc", etc. [xyz]* matches "", "x", "y", "z", "zx", "zyx", "xyzzy", and so on. (ab)* matches "", "ab", "abab", "ababab", and so on. |
{m,n} | At least m and at most n times | Matches the preceding element at least m and not more than n times. For example, a{3,5} matches only "aaa", "aaaa", and "aaaaa". This is not found in a few older instances of regexes. BRE mode requires \{m,n\}. |
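The rows above can be checked mechanically. The following is a minimal sketch in Python's re module, which is an assumption of this example and not something the table covers: re is not a POSIX engine, but it accepts the extended-style spelling (unescaped ( ) and {m,n}) and treats these particular metacharacters the same way. The helper names whole and found are hypothetical.

```python
import re

def whole(pattern, text):
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

def found(pattern, text, flags=0):
    """Return True if the pattern matches somewhere in the text."""
    return re.search(pattern, text, flags) is not None

# (pattern, text, expected) triples based on the descriptions in the table.
WHOLE_CASES = [
    (r"a.c", "abc", True),         # . matches any single character
    (r"[a.c]", ".", True),         # inside brackets the dot is literal
    (r"[a.c]", "b", False),
    (r"[abcx-z]", "y", True),      # literals and a range mixed
    (r"[^abc]", "d", True),        # negated bracket expression
    (r"[^a-z]", "q", False),
    (r"ab*c", "ac", True),         # * allows zero repetitions
    (r"ab*c", "abbbc", True),
    (r"[xyz]*", "xyzzy", True),
    (r"(ab)*", "ababab", True),
    (r"a{3,5}", "aaaa", True),     # between 3 and 5 repetitions
    (r"a{3,5}", "aa", False),
    (r"([ab])\1", "aa", True),     # \1 repeats what group 1 matched
    (r"([ab])\1", "ab", False),
]

for pattern, text, expected in WHOLE_CASES:
    assert whole(pattern, text) == expected, (pattern, text)

# Anchors match positions, not characters.
assert found(r"^He", "Hello World\n")
assert not found(r"^World", "Hello World\n")
assert found(r"rld$", "Hello World\n")                   # just before the final newline
assert found(r"^World", "Hello\nWorld\n", re.MULTILINE)  # line-based behaviour
```

In a strict BRE tool the group and interval cases would be spelled \(ab\)* and a\{3,5\}, as the table notes.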
POSIX extended
Metacharacter | Function | Description |
---|---|---|
? | Zero or one time | Matches the preceding element zero or one time. For example, ab?c matches only "ac" or "abc". |
+ | One or more times | Matches the preceding element one or more times. For example, ab+c matches "abc", "abbc", "abbbc", and so on, but not "ac". |
| | Choice | The choice (also known as alternation or set union) operator matches either the expression before or the expression after the operator. For example, abc|def matches "abc" or "def". |
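As a quick illustration of the three extended operators, here is a small sketch in Python's re module (an assumption of this example; it is a Perl-style engine, not a POSIX ERE implementation, but ?, + and | behave as described above). The helper name whole is hypothetical.

```python
import re

def whole(pattern, text):
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# Keep only the candidate strings that each pattern accepts in full.
optional = [s for s in ("ac", "abc", "abbc") if whole(r"ab?c", s)]
one_or_more = [s for s in ("ac", "abc", "abbbc") if whole(r"ab+c", s)]
either = [s for s in ("abc", "def", "abcdef") if whole(r"abc|def", s)]

print(optional)      # ['ac', 'abc']
print(one_or_more)   # ['abc', 'abbbc']
print(either)        # ['abc', 'def']
```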
Character classes
| POSIX | Non-standard | Perl/Tcl | Vim | Java | ASCII | Description |
|---|---|---|---|---|---|---|
| | [:ascii:] | | | \p{ASCII} | [\x00-\x7F] | ASCII characters |
| [:alnum:] | | | | \p{Alnum} | [A-Za-z0-9] | Alphanumeric characters |
| | [:word:] | \w | \w | \w | [A-Za-z0-9_] | Alphanumeric characters plus "_" |
| | | \W | \W | \W | [^A-Za-z0-9_] | Non-word characters |
| [:alpha:] | | | \a | \p{Alpha} | [A-Za-z] | Alphabetic characters |
| [:blank:] | | | \s | \p{Blank} | [ \t] | Space and tab |
| | | \b | \< \> | \b | (?<=\W)(?=\w)|(?<=\w)(?=\W) | Word boundaries |
| | | | | \B | (?<=\W)(?=\W)|(?<=\w)(?=\w) | Non-word boundaries |
| [:cntrl:] | | | | \p{Cntrl} | [\x00-\x1F\x7F] | Control characters |
| [:digit:] | | \d | \d | \p{Digit} or \d | [0-9] | Digits |
| | | \D | \D | \D | [^0-9] | Non-digits |
| [:graph:] | | | | \p{Graph} | [\x21-\x7E] | Visible characters |
| [:lower:] | | | \l | \p{Lower} | [a-z] | Lowercase letters |
| [:print:] | | | \p | \p{Print} | [\x20-\x7E] | Visible characters and the space character |
| [:punct:] | | | | \p{Punct} | [][!"#$%&'()*+,./:;<=>?@\^_`{|}~-] | Punctuation characters |
| [:space:] | | \s | \_s | \p{Space} or \s | [ \t\r\n\v\f] | Whitespace characters |
| | | \S | \S | \S | [^ \t\r\n\v\f] | Non-whitespace characters |
| [:upper:] | | | \u | \p{Upper} | [A-Z] | Uppercase letters |
| [:xdigit:] | | | \x | \p{XDigit} | [A-Fa-f0-9] | Hexadecimal digits |
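The ASCII column of the table is the portable fallback when an engine lacks the POSIX names. The sketch below assumes Python's re module, which does not recognise [:alpha:]-style names, and maps a few POSIX class names to the ASCII bracket expressions listed above. The dictionary ASCII_EQUIVALENTS and the helper in_class are hypothetical names introduced for illustration.

```python
import re

# POSIX class name -> ASCII bracket expression, copied from the table's ASCII column.
ASCII_EQUIVALENTS = {
    "alnum": r"[A-Za-z0-9]",
    "alpha": r"[A-Za-z]",
    "blank": r"[ \t]",
    "cntrl": r"[\x00-\x1F\x7F]",
    "digit": r"[0-9]",
    "graph": r"[\x21-\x7E]",
    "lower": r"[a-z]",
    "print": r"[\x20-\x7E]",
    "space": r"[ \t\r\n\v\f]",
    "upper": r"[A-Z]",
    "xdigit": r"[A-Fa-f0-9]",
}

def in_class(name, ch):
    """Return True if the single character ch belongs to the named class."""
    return re.fullmatch(ASCII_EQUIVALENTS[name], ch) is not None

print(in_class("xdigit", "f"))   # True
print(in_class("xdigit", "g"))   # False
print(in_class("graph", " "))    # False: space is printable but not visible
print(in_class("print", " "))    # True

# \b is a zero-width word boundary, so only the standalone word is found.
print(re.findall(r"\bcat\b", "cat concat cats"))   # ['cat']
```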
Examples
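Metacharacters: a sequence of metacharacters specifies the regular expression to be matched.
=~ m//: specifies a 'match' operation on a string in Perl.
=~ s///: specifies a 'substitute' operation on a string in Perl.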
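For readers outside Perl, the two operations have rough counterparts in most regex libraries. The sketch below uses Python's re module as an assumed stand-in: re.search plays the role of m// and re.sub the role of s///. The variable names are illustrative only.

```python
import re

string1 = "Hello World\n"

# Rough counterpart of Perl's  $string1 =~ m/World/
is_match = re.search(r"World", string1) is not None

# Rough counterpart of Perl's  s/World/Perl/  applied to a copy of the string.
# re.sub returns a new string and replaces every occurrence unless count is given,
# whereas Perl's s/// edits the variable in place and replaces once without /g.
replaced = re.sub(r"World", "Perl", string1, count=1)

print(is_match)        # True
print(repr(replaced))  # 'Hello Perl\n'
```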
Metacharacter(s) | Description | Example |
---|---|---|
. | Normally matches any character except a newline. Within square brackets the dot is literal. | $string1 = "Hello World\n"; if ($string1 =~ m/...../) { print "$string1 has length >= 5.\n"; } Output: Hello World has length >= 5. |
( ) | Groups a series of pattern elements to a single element. When you match a pattern within parentheses, you can use any of $1, $2, ... later to refer to the previously matched pattern. | $string1 = "Hello World\n"; if ($string1 =~ m/(H..).(o..)/) { print "We matched '$1' and '$2'.\n"; } Output: We matched 'Hel' and 'o W'. |
+ | Matches the preceding pattern element one or more times. | $string1 = "Hello World\n"; if ($string1 =~ m/l+/) { print "There are one or more consecutive letter \"l\"'s in $string1.\n"; } Output: There are one or more consecutive letter "l"'s in Hello World. |
? | Matches the preceding pattern element zero or one time. | $string1 = "Hello World\n"; if ($string1 =~ m/H.?e/) { print "There is an 'H' and a 'e' separated by "; print "0-1 characters (e.g., He Hue Hee).\n"; } Output: There is an 'H' and a 'e' separated by 0-1 characters (e.g., He Hue Hee). |
? | Modifies the *, +, ? or {M,N}'d regex that comes before to match as few times as possible. | $string1 = "Hello World\n"; if ($string1 =~ m/(l.+?o)/) { print "The non-greedy match with 'l' followed by one or "; print "more characters is 'llo' rather than 'llo Wo'.\n"; } Output: The non-greedy match with 'l' followed by one or more characters is 'llo' rather than 'llo Wo'. |
* | Matches the preceding pattern element zero or more times. | $string1 = "Hello World\n"; if ($string1 =~ m/el*o/) { print "There is an 'e' followed by zero to many "; print "'l' followed by 'o' (e.g., eo, elo, ello, elllo).\n"; } Output: There is an 'e' followed by zero to many 'l' followed by 'o' (e.g., eo, elo, ello, elllo). |
{M,N} | Denotes the minimum M and the maximum N match count. N can be omitted and M can be 0: {M,} matches at least M times and {0,N} matches at most N times. | $string1 = "Hello World\n"; if ($string1 =~ m/l{1,2}/) { print "There exists a substring with at least 1 "; print "and at most 2 l's in $string1\n"; } Output: There exists a substring with at least 1 and at most 2 l's in Hello World |
[…] | Denotes a set of possible character matches. | $string1 = "Hello World\n"; if ($string1 =~ m/[aeiou]+/) { print "$string1 contains one or more vowels.\n"; } Output: Hello World contains one or more vowels. |
| | Separates alternate possibilities. | $string1 = "Hello World\n"; if ($string1 =~ m/(Hello|Hi|Pogo)/) { print "$string1 contains at least one of Hello, Hi, or Pogo."; } Output: Hello World contains at least one of Hello, Hi, or Pogo. |
\b | Matches a zero-width boundary between a word-class character (see next) and either a non-word class character or an edge. | $string1 = "Hello World\n"; if ($string1 =~ m/llo\b/) { print "There is a word that ends with 'llo'.\n"; } Output: There is a word that ends with 'llo'. |
\w | Matches an alphanumeric character, including "_"; same as [A-Za-z0-9_] in ASCII. In Unicode the class is broader. | $string1 = "Hello World\n"; if ($string1 =~ m/\w/) { print "There is at least one alphanumeric "; print "character in $string1 (A-Z, a-z, 0-9, _).\n"; } Output: There is at least one alphanumeric character in Hello World (A-Z, a-z, 0-9, _). |
\W | Matches a non-alphanumeric character, excluding "_"; same as [^A-Za-z0-9_] in ASCII, and the complement of the broader word class in Unicode. | $string1 = "Hello World\n"; if ($string1 =~ m/\W/) { print "The space between Hello and "; print "World is not alphanumeric.\n"; } Output: The space between Hello and World is not alphanumeric. |
\s | Matches a whitespace character, which in ASCII are tab, line feed, form feed, carriage return, and space; in Unicode, also matches no-break spaces, next line, and the variable-width spaces (amongst others). | $string1 = "Hello World\n"; if ($string1 =~ m/\s.*\s/) { print "In $string1 there are TWO whitespace characters, which may"; print " be separated by other characters.\n"; } Output: In Hello World there are TWO whitespace characters, which may be separated by other characters. |
\S | Matches anything but a whitespace. | $string1 = "Hello World\n"; if ($string1 =~ m/\S.*\S/) { print "In $string1 there are TWO non-whitespace characters, which"; print " may be separated by other characters.\n"; } Output: In Hello World there are TWO non-whitespace characters, which may be separated by other characters. |
\d | Matches a digit; same as [0-9] in ASCII. In Unicode it also matches other decimal digits. | $string1 = "99 bottles of beer on the wall."; if ($string1 =~ m/(\d+)/) { print "$1 is the first number in '$string1'\n"; } Output: 99 is the first number in '99 bottles of beer on the wall.' |
\D | Matches a non-digit; same as [^0-9] in ASCII. | $string1 = "Hello World\n"; if ($string1 =~ m/\D/) { print "There is at least one character in $string1"; print " that is not a digit.\n"; } Output: There is at least one character in Hello World that is not a digit. |
^ | Matches the beginning of a line or string. | $string1 = "Hello World\n"; if ($string1 =~ m/^He/) { print "$string1 starts with the characters 'He'.\n"; } Output: Hello World starts with the characters 'He'. |
$ | Matches the end of a line or string. | $string1 = "Hello World\n"; if ($string1 =~ m/rld$/) { print "$string1 is a line or string "; print "that ends with 'rld'.\n"; } Output: Hello World is a line or string that ends with 'rld'. |
\A | Matches the beginning of a string (but not an internal line). | $string1 = "Hello\nWorld\n"; if ($string1 =~ m/\AH/) { print "$string1 is a string "; print "that starts with 'H'.\n"; } Output: Hello World is a string that starts with 'H'. |
\z | Matches the end of a string (but not an internal line). | $string1 = "Hello\nWorld\n"; if ($string1 =~ m/d\n\z/) { print "$string1 is a string "; print "that ends with 'd\\n'.\n"; } Output: Hello World is a string that ends with 'd\n'. |
[^…] | Matches every character except the ones inside brackets. | $string1 = "Hello World\n"; if ($string1 =~ m/[^abc]/) { print "$string1 contains a character other than "; print "a, b, and c.\n"; } Output: Hello World contains a character other than a, b, and c. |
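Several of the Perl rows above carry over almost unchanged to other Perl-style engines. The following sketch reruns a few of them in Python's re module, which is an assumption of this example and not the language used in the table. It covers capturing groups, greedy versus non-greedy matching, and the difference between ^ and \A. All variable names are illustrative.

```python
import re

string1 = "Hello World\n"

# Capturing groups: Perl's $1 and $2 correspond to group 1 and group 2 here.
m = re.search(r"(H..).(o..)", string1)
groups = m.groups()                                  # ('Hel', 'o W')

# Greedy versus non-greedy quantifiers on the same input.
greedy = re.search(r"l.+o", string1).group(0)        # 'llo Wo'
lazy = re.search(r"l.+?o", string1).group(0)         # 'llo'

# First run of digits, as in the \d row.
number = re.search(r"(\d+)", "99 bottles of beer on the wall.").group(1)  # '99'

# ^ can match at an internal line start in multi-line mode; \A never does.
two_lines = "Hello\nWorld\n"
caret_hits = re.findall(r"^\w", two_lines, re.MULTILINE)   # ['H', 'W']
start_hits = re.findall(r"\A\w", two_lines, re.MULTILINE)  # ['H']

print(groups, greedy, lazy, number, caret_hits, start_hits)
```

The \z row is left out on purpose, because support for that exact escape differs between engines and Python versions.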