iOS Regular Expressions

Regular expressions on iOS mean NSRegularExpression, so first we need to understand what NSRegularExpression is. It is a good habit to check the documentation first; here is the first paragraph of the NSRegularExpression documentation:

Original:

The NSRegularExpression class is used to represent and apply regular expressions to Unicode strings. An instance of this class is an immutable representation of a compiled regular expression pattern and various option flags. The pattern syntax currently supported is that specified by ICU. The ICU regular expressions are described at http://userguide.icu-project.org/strings/regexp.

My translation may not be perfectly accurate, but this is roughly what it means; I suggest reading the original as well:

NSRegularExpression is a class used to represent regular expressions and apply them to Unicode strings. An instance of this class is an immutable representation of a compiled regular expression pattern together with its option flags. The currently supported pattern syntax is the one specified by ICU. ICU regular expressions are described in detail at http://userguide.icu-project.org/strings/regexp

The first paragraph tells us four things:

  • 1) NSRegularExpression works on Unicode strings, so it can handle any character that can be entered.
  • 2) The pattern and option flags of an NSRegularExpression instance cannot be changed. Look directly at NSRegularExpression.h:
@interface NSRegularExpression : NSObject <NSCopying, NSSecureCoding>
...
+ (nullable NSRegularExpression *)regularExpressionWithPattern:(NSString *)pattern options:(NSRegularExpressionOptions)options error:(NSError **)error;
- (nullable instancetype)initWithPattern:(NSString *)pattern options:(NSRegularExpressionOptions)options error:(NSError **)error NS_DESIGNATED_INITIALIZER;

@property (readonly, copy) NSString *pattern;
@property (readonly) NSRegularExpressionOptions options;
...
@end

As the .h file shows, the pattern and options properties are declared readonly: they cannot be assigned after creation and can only be set during initialization.

  • 3) NSRegularExpression supports the ICU regular expression syntax.
  • 4) ICU regular expressions are documented in detail at http://userguide.icu-project.org/strings/regexp (this site is blocked in mainland China, so you may need a VPN to open it).
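Point 2 can be seen in a few lines of code. This is a minimal sketch, not the author's code, and the helper name MakeRegex is hypothetical. It shows that pattern and options are fixed when the object is created and can only be read back, and that an invalid pattern makes the factory method return nil and fill in the NSError.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of Foundation): compiles `pattern` with `options`.
// Returns nil, and logs the NSError, when the pattern is not valid ICU syntax.
static NSRegularExpression *MakeRegex(NSString *pattern, NSRegularExpressionOptions options) {
    NSError *error = nil;
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                           options:options
                                                                             error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern %@: %@", pattern, error.localizedDescription);
    }
    // regex.pattern and regex.options are readonly: they can be read here,
    // but changing either one means creating a new NSRegularExpression.
    return regex;
}
```

Because an instance never changes after creation, a common design is to compile a pattern once and reuse it instead of recompiling it on every call.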

ICU regular expressions are based on Perl regular expressions, which in turn are based on Version 8 Regular Expressions. If you are interested, see ICU Regular Expressions, Perl Regular Expressions and Version 8 Regular Expressions; I won't go into them here.

Flag Options

First, the flag options listed in the documentation:

Flag    Description
i    Case-insensitive matching.
x    Allow whitespace and #comments in the pattern.
s    "." also matches line terminators; by default it does not.
m    "^" and "$" match at the beginning and end of every line; by default they match only at the beginning and end of the whole text.
w    "\b" finds word boundaries according to Unicode UAX #29 (Text Boundaries). By default, word boundaries come from a simple classification of characters as "word" or "non-word", similar to traditional regular expressions. The two settings can give quite different results in runs of spaces and other non-word characters.
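The same flags can also be switched on inside the pattern, as in (?i) or (?s). A minimal sketch comparing inline flags with the default behaviour; MatchCount is a hypothetical helper that passes no NSRegularExpressionOptions, so only the inline flags are in effect.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts the matches of `pattern` in `text`.
// No NSRegularExpressionOptions are passed, so only flags written inside the pattern apply.
static NSUInteger MatchCount(NSString *pattern, NSString *text) {
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                           options:0
                                                                             error:NULL];
    return [regex numberOfMatchesInString:text options:0 range:NSMakeRange(0, text.length)];
}
```

For example, MatchCount(@"test", @"TEST") is 0 while MatchCount(@"(?i)test", @"TEST") is 1; likewise a.b does not match across a newline until (?s) is added, and ^b finds the b on the second line of "a\nb" only with (?m).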

Let’s take a look at NSRegularExpressionOptions:

typedef NS_OPTIONS(NSUInteger, NSRegularExpressionOptions) {
    NSRegularExpressionCaseInsensitive            = 1 << 0,  /* Match letters in the pattern independent of case. */
    NSRegularExpressionAllowCommentsAndWhitespace = 1 << 1,  /* Ignore whitespace and #-prefixed comments in the pattern. */
    NSRegularExpressionIgnoreMetacharacters       = 1 << 2,  /* Treat the entire pattern as a literal string. */
    NSRegularExpressionDotMatchesLineSeparators   = 1 << 3,  /* Allow . to match any character, including line separators. */
    NSRegularExpressionAnchorsMatchLines          = 1 << 4,  /* Allow ^ and $ to match the start and end of lines. */
    NSRegularExpressionUseUnixLineSeparators      = 1 << 5,  /* Treat only \n as a line separator (otherwise, all standard line separators are used). */
    NSRegularExpressionUseUnicodeWordBoundaries   = 1 << 6   /* Use Unicode TR#29 to specify word boundaries (otherwise, traditional regular expression word boundaries are used). */
};

The header comments are already quite clear. Here is the same declaration with my own annotations:

typedef NS_OPTIONS(NSUInteger, NSRegularExpressionOptions) {
    NSRegularExpressionCaseInsensitive            = 1 << 0,  /* Matching ignores case. */
    NSRegularExpressionAllowCommentsAndWhitespace = 1 << 1,  /* Ignore whitespace and #-comments in the pattern. */
    NSRegularExpressionIgnoreMetacharacters       = 1 << 2,  /* Treat the whole pattern as a plain string, so characters such as $ \ [ ] ( ) + * ^ . | lose their special meaning. */
    NSRegularExpressionDotMatchesLineSeparators   = 1 << 3,  /* "." matches any character, including line separators; without this option "." does not match a newline. */
    NSRegularExpressionAnchorsMatchLines          = 1 << 4,  /* ^ and $ match at the start and end of every line. */
    NSRegularExpressionUseUnixLineSeparators      = 1 << 5,  /* Only \n counts as a line separator (otherwise all standard line separators count, such as \r and \n). */
    NSRegularExpressionUseUnicodeWordBoundaries   = 1 << 6   /* Use the word boundaries defined by Unicode TR#29 (otherwise traditional regular expression word boundaries are used). */
};
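A minimal sketch of passing these options to NSRegularExpression, including combining two of them with |. AllMatches is a hypothetical helper that returns the matched substrings.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns every substring of `text` matched by `pattern` under `options`.
static NSArray<NSString *> *AllMatches(NSString *pattern, NSRegularExpressionOptions options, NSString *text) {
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                           options:options
                                                                             error:NULL];
    NSMutableArray<NSString *> *found = [NSMutableArray array];
    for (NSTextCheckingResult *result in [regex matchesInString:text
                                                        options:0
                                                          range:NSMakeRange(0, text.length)]) {
        [found addObject:[text substringWithRange:result.range]];
    }
    return found;
}
```

With NSRegularExpressionIgnoreMetacharacters the pattern a.c finds only the literal a.c in "abc a.c", while with no options it also finds abc. Options are bit flags, so NSRegularExpressionCaseInsensitive | NSRegularExpressionAnchorsMatchLines makes ^ios$ match both lines of "iOS\nIOS".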

Regular Expression Metacharacters

The metacharacters are listed in the following table:

Expression    Description
\a    Matches BELL, \u0007 (code 7 in the ASCII table).
\A    Matches at the beginning of the text. Unlike ^, it never matches at the beginning of a line inside the text: with NSRegularExpressionAnchorsMatchLines, ^ matches at the start of every line, while \A still matches only at the start of the text. The same applies to $.
\b    Matches at a word boundary, just before or just after a word. For example, in the string "this is a test code", t\b matches the t at the end of "test", while \bt matches a t at the beginning of a word. Think of \b as "boundary". The separator can be a special character or a Chinese character.
\B    The documentation says "Match if current position is not a word boundary." It is tempting to read \B as "the modified string is not at a boundary", but it is better understood simply as "not a boundary". For example, given the strings test, tes and est, to match "es" only when it is neither at the head nor at the tail of the word, use \Bes\B, which matches only in test. \Bes matches in test and tes, and es\B matches in test and est.
\cX    Matches the control character Control-X; for example, \cM matches Control-M (carriage return).
\d    Matches a decimal digit, 0 to 9 (ICU also accepts other Unicode decimal digits).
\D    Matches any character that is not a decimal digit.
\E    Ends a quoted sequence started by \Q; everything between \Q and \E is treated as plain text. For example, \Q$\E is equivalent to \$.
\e    Matches ESCAPE, \u001B.
\f    Matches FORM FEED, \u000C.
\n    Matches LINE FEED, \u000A.
\G    Matches if the current position is at the end of the previous match, so matching is continuous: each match must start exactly where the previous one ended, and matching stops at the first gap. For example, in test1test, [a-z] matches 8 characters, but \Gtest matches only the first 4 (test). In 1test1test, \Gtest matches nothing.
\N{UNICODE CHARACTER NAME}    Matches the named character.
\p{UNICODE PROPERTY NAME}    Matches a character with the given Unicode property. For example, the property name Lu means uppercase letter, so in the string Test, \p{Lu} matches T. Many more property names exist.
\P{UNICODE PROPERTY NAME}    Matches a character without the given Unicode property. For example, in the string Test, \P{Lu} matches e, s and t.
\Q    Starts a quoted sequence that ends at \E; everything in between is treated as plain text. For example, \Q$\E is equivalent to \$.
\r    Matches CARRIAGE RETURN, \u000D.
\t    Matches HORIZONTAL TAB, \u0009.
\s    Matches a white space character, [\t\n\f\r\p{Z}].
\S    Matches a non-white-space character, [^\t\n\f\r\p{Z}].
\uhhhh    Matches the character with hex value hhhh.
\Uhhhhhhhh    Matches the character with hex value hhhhhhhh (exactly eight hex digits).
\w    Matches a word character: letters (including Chinese characters), digits and the underscore.
\W    Matches a non-word character.
\x{hhhh}    Matches the character with hex value hhhh (one to six hex digits).
\xhh    Matches the character with the two-digit hex value hh.
\X    Matches a grapheme cluster; the official documentation only says "Match a Grapheme Cluster".
\Z    Matches at the end of the input, but before the final line terminator if there is one. Original text: "Match if current position is at the end of input, but before the final line terminator, if one exists."
\z    Matches at the end of the input. Original text: "Match if current position is at the end of input."
\n (n a digit)    Back reference: matches whatever the nth capture group matched, for example \1.
\0ooo    Matches the character with octal value ooo.
[pattern]    Matches any one character from the set. For example, [A] matches A.
.    Matches any character.
^    Matches at the beginning of the text (of every line with the m flag).
$    Matches at the end of the text (of every line with the m flag).
\    Escapes the following character. For example, \$ matches a literal $.

These entries reflect my own understanding of the documentation; there are a few I still don't understand, for example what \N{UNICODE CHARACTER NAME} is for and how to use it.
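A few rows of the table can be checked directly. This is a minimal sketch assuming default options (so \b uses the traditional word/non-word classification, not NSRegularExpressionUseUnicodeWordBoundaries); MatchedStrings is a hypothetical helper. For \N{...}, the name is the official Unicode character name, so \N{LATIN CAPITAL LETTER A} matches A.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: every substring of `text` matched by `pattern`, using default options.
static NSArray<NSString *> *MatchedStrings(NSString *pattern, NSString *text) {
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                           options:0
                                                                             error:NULL];
    NSMutableArray<NSString *> *found = [NSMutableArray array];
    for (NSTextCheckingResult *result in [regex matchesInString:text
                                                        options:0
                                                          range:NSMakeRange(0, text.length)]) {
        [found addObject:[text substringWithRange:result.range]];
    }
    return found;
}
```

With this helper, \p{Lu} on Test returns T, \P{Lu} returns e, s and t, \Bes\B finds one match in test and none in tes or est, \Q$\E on a$b returns $, and \Gtest on test1test returns only the first test.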

Regular Expression Operators

Operator table:

Operator    Description
|    Alternation: A|B matches either A or B.
*    Matches 0 or more times.
+    Matches 1 or more times.
?    Matches 0 or 1 times.
{n}    n is a number: matches exactly n times. For example, 3{2} matches 33.
{n,}    Matches at least n times. For example, 3{2,} matches 33, 333, 3333 and so on.
{n,m}    n and m are numbers with n <= m: matches between n and m times. For example, 3{2,5} matches 33, 333, 3333 or 33333.
*?    0 or more times, as few as possible (lazy). For example, [\d]*? against 12345 gives 6 matches, all empty.
+?    1 or more times, as few as possible. For example, [\d]+? against 12345 gives 5 matches: 1, 2, 3, 4, 5.
??    0 or 1 times, preferring 0. For example, [\d]?? against 12345 gives 6 matches, all empty.
{n}?    Same as {n}.
{n,}?    At least n times, as few as possible. For example, [\d]{1,}? against 12345 gives 5 matches: 1, 2, 3, 4, 5.
{n,m}?    Between n and m times, as few as possible. For example, [\d]{1,5}? against 12345 gives 5 matches: 1, 2, 3, 4, 5.
*+    0 or more times, as many as possible and never giving any back (possessive). For example, [\d]*+ against 12345 gives 2 matches: 12345 and an empty match at the end.
++    1 or more times, possessive. For example, [\d]++ against 12345 gives 1 match: 12345.
?+    0 or 1 times, possessive. For example, [\d]?+ against 12345 gives 6 matches: 1, 2, 3, 4, 5 and an empty match at the end.
{n}+    Same as {n}.
{n,}+    At least n times, possessive. For example, [\d]{1,}+ against 12345 gives 1 match: 12345.
{n,m}+    Between n and m times, possessive. For example, [\d]{1,5}+ against 12345 gives 1 match: 12345.
(...)    Capturing group: groups a subexpression and captures the text it matches (capturing versus non-capturing is explained below).
(?:...)    Non-capturing group: groups a subexpression without capturing the text it matches.
(?>...)    Atomic group: does not capture, and once it has matched it never gives characters back. For example, (?>[\d]{1,}) against 12345 gives 1 match: 12345.
(?#...)    Comment.
(?=...)    Zero-width positive lookahead. It matches a position (not a character, hence zero width) that is followed by the subexpression. For example, (?=\d) against abc123 matches the positions in front of 1, 2 and 3.
(?!...)    Zero-width negative lookahead. It matches a position that is not followed by the subexpression. For example, (?!\d) against abc123 matches the positions in front of a, b and c; iOS also reports one more match at the very end of the string.
(?<=...)    Zero-width positive lookbehind. It matches a position that is preceded by the subexpression. For example, (?<=\d) against abc123 matches the positions after 1, 2 and 3.
(?<!...)    Zero-width negative lookbehind. It matches a position that is not preceded by the subexpression. For example, (?<!\d) against abc123 matches the positions in front of a, b, c and 1.
(?ismwx-ismwx:...)    Turns flags on (or off, after the -) for the enclosed subexpression only. For example, against test, (?i:TEST) matches test, while (?-i:TEST) does not match. When testing this in code, do not pass NSRegularExpressionCaseInsensitive.
(?ismwx-ismwx)    Turns flags on or off for the rest of the pattern. For example, against test, (?i)TEST matches test; (?-i)TEST does not match; T(?i)EST does not match either, because the T before the flag is still case-sensitive. When testing this in code, do not pass NSRegularExpressionCaseInsensitive.
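Several rows of the operator table can be reproduced with a short sketch. MatchedStrings and MatchLocations are hypothetical helpers; the second reports where each match starts, which is the useful information for zero-width assertions because their matches are empty strings. The counts include empty matches, which is why [\d]*+ reports two matches.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: every matched substring (empty matches included), default options.
static NSArray<NSString *> *MatchedStrings(NSString *pattern, NSString *text) {
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                           options:0
                                                                             error:NULL];
    NSMutableArray<NSString *> *found = [NSMutableArray array];
    for (NSTextCheckingResult *result in [regex matchesInString:text
                                                        options:0
                                                          range:NSMakeRange(0, text.length)]) {
        [found addObject:[text substringWithRange:result.range]];
    }
    return found;
}

// Hypothetical helper: the start offset of every match, useful for zero-width assertions.
static NSArray<NSNumber *> *MatchLocations(NSString *pattern, NSString *text) {
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                           options:0
                                                                             error:NULL];
    NSMutableArray<NSNumber *> *locations = [NSMutableArray array];
    for (NSTextCheckingResult *result in [regex matchesInString:text
                                                        options:0
                                                          range:NSMakeRange(0, text.length)]) {
        [locations addObject:@(result.range.location)];
    }
    return locations;
}
```

In abc123, offset 3 is the position in front of 1 and offset 6 is the end of the string, so (?=\d) reports 3, 4, 5 and (?!\d) reports 0, 1, 2, 6.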

Tools

There are plenty of testing tools; search online for "regular expression online tester" and a whole pile of them will turn up (Figure: regular expression online tester).

You can also write your own code to check a pattern, but running it every time is tedious, so I wrote a simple testing tool myself, the iOS regular expression detection tool, which you can download if you need it (Figure: iOS regular expression detection tool).

It does not support string replacement yet.

Capturing and non-capturing

Capturing

A capturing group stores the characters it matched. Inside the regular expression, \n refers back to the nth capture (n is a decimal number), so to refer to the nth capture you write \n. In a replacement template, $n stands for the string captured by the nth group.

Non-capturing

A non-capturing group (?:...) groups part of the pattern without storing the string it matches, so it cannot be referred to with \n or $n.

Example:

// String to match: hello
// Regex: (\w)\1
// Match result: ll
NSString *string = @"hello";
NSError *error = nil;
NSString *regexString = @"(\\w)\\1";
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:regexString options:NSRegularExpressionUseUnicodeWordBoundaries error:&error];
NSUInteger integer = [regex numberOfMatchesInString:string options:NSMatchingReportProgress range:NSMakeRange(0, string.length)];
NSLog(@"%lu match(es)", (unsigned long)integer);
NSArray *matches = [regex matchesInString:string options:0 range:NSMakeRange(0, [string length])];
for (NSTextCheckingResult *result in matches) {
    NSRange matchRange = [result range];
    NSString *subString = [string substringWithRange:matchRange];
    NSLog(@"%@", subString);
}

// String to match: hello
// Regex: (he)((\w)\3)(o)
// Replacement template: $1-$2-$3-$4
// Result after replacement: he-ll-l-o
// Capture groups are numbered in the order of their opening "("
NSString *string = @"hello";
NSError *error = nil;
NSString *regexString = @"(he)((\\w)\\3)(o)";
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:regexString options:NSRegularExpressionUseUnicodeWordBoundaries error:&error];
NSLog(@"%lu", (unsigned long)regex.numberOfCaptureGroups);
NSLog(@"%@", [regex stringByReplacingMatchesInString:string options:NSMatchingReportProgress range:NSMakeRange(0, string.length) withTemplate:@"$1-$2-$3-$4"]);
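A minimal sketch of reading capture groups directly, and of what (?:...) changes; CapturedGroups is a hypothetical helper. Index 0 of an NSTextCheckingResult is always the whole match, and indexes 1 and up follow the order of the opening parentheses. Turning the first group into (?:he) removes it from the numbering, so every later group (and every \n back reference) shifts down by one.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: the text of every group in the first match; index 0 is the whole match.
static NSArray<NSString *> *CapturedGroups(NSString *pattern, NSString *text) {
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                           options:0
                                                                             error:NULL];
    NSTextCheckingResult *match = [regex firstMatchInString:text
                                                    options:0
                                                      range:NSMakeRange(0, text.length)];
    NSMutableArray<NSString *> *groups = [NSMutableArray array];
    for (NSUInteger i = 0; i < match.numberOfRanges; i++) {
        NSRange range = [match rangeAtIndex:i];
        // A group that took no part in the match reports NSNotFound; record it as empty.
        [groups addObject:(range.location == NSNotFound ? @"" : [text substringWithRange:range])];
    }
    return groups;
}
```

For hello, (he)((\w)\3)(o) yields hello, he, ll, l, o, while (?:he)((\w)\2)(o) yields hello, ll, l, o.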

Examples

1. Domestic mobile phone numbers

Domestic (mainland China) mobile numbers have 11 digits and start with one of these prefixes:
China Mobile: 134, 135, 136, 137, 138, 139, 150, 151, 152, 157 (TD), 158, 159, 182, 183, 184, 187, 178, 188, 147, 1705 (mobile virtual operator prefix)
China Unicom: 130, 131, 132, 145 (data card prefix), 155, 156, 176, 185, 186, 1709 (Unicom virtual operator prefix)
China Telecom: 133, 153, 177, 180, 181, 189, 134, 1700 (telecom virtual operator prefix)
(Source: Baidu Zhidao)

Regular expression:

((134|135|136|137|138|139|150|151|152|157|158|159|182|183|184|187|178|188|147|130|131|132|145|155|156|176|185|186|133|153|177|180|181|189)\d{8})|((1705|1709|1700)\d{7})

Analysis:

The first group lists the three-digit prefixes and "\d{8}" requires exactly 8 more digits; the second group lists the four-digit virtual-operator prefixes and "\d{7}" requires exactly 7 more digits. Either branch therefore gives an 11-digit number. (Following the four-digit prefixes with \d{8} as well would demand 12 digits for them.)

Tested successfully.
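A minimal sketch of validating a whole string against this prefix list. It is an illustration under stated assumptions, not a complete rule set: the four-digit virtual-operator prefixes sit in their own branch followed by \d{7}, so every accepted number has 11 digits, and the pattern is anchored with \A(?:...)\z so that longer strings containing a number are rejected. kMobilePattern and IsMainlandMobileNumber are hypothetical names.

```objc
#import <Foundation/Foundation.h>

// Hypothetical constant: prefixes copied from the list in this section.
// Three-digit prefixes are followed by 8 digits, four-digit ones by 7, giving 11 digits in total.
static NSString *const kMobilePattern =
    @"(134|135|136|137|138|139|150|151|152|157|158|159|182|183|184|187|178|188|147"
    @"|130|131|132|145|155|156|176|185|186|133|153|177|180|181|189)\\d{8}"
    @"|(1700|1705|1709)\\d{7}";

// Hypothetical helper: YES only if the whole of `text` is one mobile number.
// \A(?:...)\z anchors both alternatives to the very start and end of the input.
static BOOL IsMainlandMobileNumber(NSString *text) {
    NSString *anchored = [NSString stringWithFormat:@"\\A(?:%@)\\z", kMobilePattern];
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:anchored
                                                                           options:0
                                                                             error:NULL];
    return [regex firstMatchInString:text options:0 range:NSMakeRange(0, text.length)] != nil;
}
```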

2. Chinese characters

Regular expression: [\u4E00-\u9FA5]
Analysis: Chinese characters, together with some Japanese kanji, are encoded in the range \u4E00 to \u9FA5.
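In Objective-C source the \u escapes of this pattern must reach ICU intact, so the backslash is doubled in the string literal. A minimal sketch that counts the characters in this range; CountChineseCharacters is a hypothetical helper. Full-width punctuation such as ， (U+FF0C) lies outside the range and is not counted.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts characters in the range U+4E00 to U+9FA5.
// The backslash is doubled so that ICU, not the compiler, sees the \u escapes.
static NSUInteger CountChineseCharacters(NSString *text) {
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"[\\u4E00-\\u9FA5]"
                                                                           options:0
                                                                             error:NULL];
    return [regex numberOfMatchesInString:text options:0 range:NSMakeRange(0, text.length)];
}
```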

3. Email address

A@B.C

The rules are as follows:
A: digits and letters, at least 1 character long.
B: digits and letters, at least 1 character long.
C: letters or Chinese characters, at least 2 characters long (I have never seen a one-character domain suffix).

Regular expression: \b(?i)\w+@\w+\.[A-Z\u4E00-\u9FA5]{2,}\b

Analysis:

"\b" is a word boundary, so the address must be delimited by spaces or other non-word characters; "(?i)" makes the letters that follow case-insensitive; "\w+" word characters, at least 1; "@" the at sign; "\w+" word characters, at least 1; "\." a period; "[A-Z\u4E00-\u9FA5]{2,}" letters or Chinese characters, at least 2; "\b" a closing word boundary.

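A minimal sketch that searches a longer text with the email pattern of this section, written as an Objective-C string literal (every backslash doubled). FindEmailAddresses is a hypothetical helper. Because [A-Z] comes after (?i), lowercase suffixes such as com are accepted too, and a Chinese suffix works as well.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns every email address found in `text`.
static NSArray<NSString *> *FindEmailAddresses(NSString *text) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"\\b(?i)\\w+@\\w+\\.[A-Z\\u4E00-\\u9FA5]{2,}\\b"
                                                  options:0
                                                    error:NULL];
    NSMutableArray<NSString *> *found = [NSMutableArray array];
    for (NSTextCheckingResult *result in [regex matchesInString:text
                                                        options:0
                                                          range:NSMakeRange(0, text.length)]) {
        [found addObject:[text substringWithRange:result.range]];
    }
    return found;
}
```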
4. Domestic landline numbers

The format of a domestic landline number is area code-number.
The area code has 3 to 4 digits and the number has 5 to 8 digits.

Regular expression: [\d]{3,4}-[\d]{5,8}
Analysis:

"[\d]{3,4}" the area code, 3 to 4 digits; "-" the separator; "[\d]{5,8}" the number, 5 to 8 digits.

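A minimal sketch checking that the entire input is one landline number. MatchesWholeString is a hypothetical helper that wraps the pattern in \A(?:...)\z; without that anchoring, the pattern would also find a number embedded in a longer string.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if `pattern` matches all of `text`. \A and \z anchor the very
// start and end of the input; unlike $, \z does not also match before a trailing newline.
static BOOL MatchesWholeString(NSString *pattern, NSString *text) {
    NSString *anchored = [NSString stringWithFormat:@"\\A(?:%@)\\z", pattern];
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:anchored
                                                                           options:0
                                                                             error:NULL];
    return [regex firstMatchInString:text options:0 range:NSMakeRange(0, text.length)] != nil;
}

// Hypothetical helper: area code (3 to 4 digits), "-", number (5 to 8 digits).
static BOOL IsLandlineNumber(NSString *text) {
    return MatchesWholeString(@"[\\d]{3,4}-[\\d]{5,8}", text);
}
```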
5. User name: starts with a letter, followed only by letters and digits

Regular expression: ^[A-Za-z][A-Za-z0-9]*$

Analysis:

"^[A-Za-z]" starts with a letter; "[A-Za-z0-9]*" any number of letters and digits; "$" the end of the text, so nothing else may follow.

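A minimal sketch of why a trailing $ matters when the pattern is used for validation. FirstMatch and IsValidUserName are hypothetical helpers. With only ^, the search still succeeds on abc_def by matching its valid prefix abc; with both ^ and $ the whole input has to conform.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: the first substring of `text` matched by `pattern`, or nil.
static NSString *FirstMatch(NSString *pattern, NSString *text) {
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                           options:0
                                                                             error:NULL];
    NSTextCheckingResult *result = [regex firstMatchInString:text
                                                     options:0
                                                       range:NSMakeRange(0, text.length)];
    return result ? [text substringWithRange:result.range] : nil;
}

// Hypothetical helper: the user-name rule anchored at both ends.
static BOOL IsValidUserName(NSString *text) {
    return FirstMatch(@"^[A-Za-z][A-Za-z0-9]*$", text) != nil;
}
```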
6. QQ number

A QQ number has at least 5 digits.
Regular expression: [1-9]\d{4,}

Analysis:

"[1-9]" a first digit that is not 0; "\d{4,}" at least 4 more digits.

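Another common way to validate input on iOS is NSPredicate: its MATCHES operator uses ICU regular expressions and compares against the entire string, so no ^ or $ is needed. A minimal sketch; IsQQNumber is a hypothetical name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: MATCHES compares the regular expression against the whole string.
static BOOL IsQQNumber(NSString *text) {
    NSPredicate *predicate = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", @"[1-9]\\d{4,}"];
    return [predicate evaluateWithObject:text];
}
```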
7. ID card number

Second-generation ID card numbers are coded as follows:
(1) digits 1 and 2: the code of the province (municipality, autonomous region);

(2) digits 3 and 4: the code of the prefecture-level city (autonomous prefecture);

(3) digits 5 and 6: the code of the district (county, autonomous county, county-level city);

(4) digits 7 to 14: the year, month and day of birth;

(5) digits 15 and 16: the code of the local police station;

(6) digit 17: the gender; an odd digit (1, 3, 5, 7, 9) means male and an even digit (0, 2, 4, 6, 8) means female;

(7) digit 18: the check code. Some say it is a personal information code; it is not generated randomly by a computer but is used to verify that the ID number is correct. The check code is a digit from 0 to 9 or the letter X. It is computed from the preceding digits with a unified formula: if the result is 0 to 9, that digit is used and there is no X; if the result is 10, X is used instead, because writing 10 would make the ID number 19 characters long. X is the Roman numeral for 10, so using it keeps the ID number in line with the national standard.

Note: the latest birth year accepted here is 2019.

Regular expression: [1-6][0-7][\d]{4}((19[\d]{2})|(20[0-1][\d]))((0[1-9])|(1[0-2]))((0[1-9])|([1-2]\d)|(3[0-1]))[\d]{3}[\dXx]

Analysis:

"[1-6][0-7]" the province (municipality, autonomous region) code, whose first digit is 1 to 6 and second digit 0 to 7; "[\d]{4}" the city and district codes, all digits; "((19[\d]{2})|(20[0-1][\d]))((0[1-9])|(1[0-2]))((0[1-9])|([1-2]\d)|(3[0-1]))" the date of birth, from 1900/01/01 at the earliest to 2019/12/31 at the latest; "[\d]{3}[\dXx]" three digits followed by a digit or X (either case).

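A minimal sketch that checks the format of an 18-character ID number with this section's pattern, written as an Objective-C string literal with [\dXx] at the end so that both X and x are accepted. It checks the format only: the check digit is not verified, and impossible dates such as February 31 pass. MatchesWholeString and IsIDCardNumberFormat are hypothetical helpers.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if `pattern` matches all of `text` (anchored with \A and \z).
static BOOL MatchesWholeString(NSString *pattern, NSString *text) {
    NSString *anchored = [NSString stringWithFormat:@"\\A(?:%@)\\z", pattern];
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:anchored
                                                                           options:0
                                                                             error:NULL];
    return [regex firstMatchInString:text options:0 range:NSMakeRange(0, text.length)] != nil;
}

// Hypothetical helper: region code, birth date (1900 to 2019), 3 digits, check character.
static BOOL IsIDCardNumberFormat(NSString *text) {
    return MatchesWholeString(@"[1-6][0-7][\\d]{4}((19[\\d]{2})|(20[0-1][\\d]))"
                              @"((0[1-9])|(1[0-2]))((0[1-9])|([1-2]\\d)|(3[0-1]))[\\d]{3}[\\dXx]",
                              text);
}
```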
8. Postal code

Domestic postal codes are 6 digits.

Regular expression: \d{6}

Analysis: exactly six digits.

9. URL

Regular expression: (?i)\b(http://|https://)?(www\.)?[\w\.\-]+\.[A-Z\u4E00-\u9FA5]{2,}(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\;\?\'\\\+&%\$#\=~_\-]+))*

Matches:

http://jianshu.com:8080/
http://jianshu.com/?a=asdsa&b=asdas
http://book.com/

Does not match:

ftp://jianshu.com
ftps://jianshu.com

Analysis:

"(?i)" case-insensitive; "\b" word boundary; "(http://|https://)?" the URL may start with http:// or https://, or with neither; "(www\.)?" an optional "www."; "[\w\.\-]+" letters, digits, Chinese characters, periods or hyphens, at least one; "\." a period; "[A-Z\u4E00-\u9FA5]{2,}" letters or Chinese characters, at least two; "(\:[0-9]+)*" an optional colon followed by a port number; "(/($|[a-zA-Z0-9\.\,\;\?\'\\\+&%\$#\=~_\-]+))*" optional slashes, each followed either by the end of the text or by the listed characters.

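The "does not match" examples fail only when the whole string has to match; because the scheme is optional, a plain search still finds jianshu.com inside ftp://jianshu.com. A minimal sketch that anchors the URL pattern (written as an Objective-C string literal, with (www\.)? for the optional "www.") to the whole input; kURLPattern, MatchesWholeString and IsURL are hypothetical names.

```objc
#import <Foundation/Foundation.h>

// Hypothetical constant: the URL pattern of this section with every backslash doubled.
static NSString *const kURLPattern =
    @"(?i)\\b(http://|https://)?(www\\.)?[\\w\\.\\-]+\\.[A-Z\\u4E00-\\u9FA5]{2,}(\\:[0-9]+)*"
    @"(/($|[a-zA-Z0-9\\.\\,\\;\\?\\'\\\\\\+&%\\$#\\=~_\\-]+))*";

// Hypothetical helper: YES if `pattern` matches all of `text` (anchored with \A and \z).
static BOOL MatchesWholeString(NSString *pattern, NSString *text) {
    NSString *anchored = [NSString stringWithFormat:@"\\A(?:%@)\\z", pattern];
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:anchored
                                                                           options:0
                                                                             error:NULL];
    return [regex firstMatchInString:text options:0 range:NSMakeRange(0, text.length)] != nil;
}

static BOOL IsURL(NSString *text) {
    return MatchesWholeString(kURLPattern, text);
}
```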
10. Date of birth

Format: yyyy/MM/dd

Note: the latest year accepted here is 2019.

Regular expression: ((19[\d]{2})|(20[0-1][\d]))/((0[1-9])|(1[0-2]))/((0[1-9])|([1-2]\d)|(3[0-1]))

Analysis:

"((19[\d]{2})|(20[0-1][\d]))" the year of birth, 1900 to 2019; "/" the separator; "((0[1-9])|(1[0-2]))" the month, 01 to 12; "/" the separator; "((0[1-9])|([1-2]\d)|(3[0-1]))" the day, 01 to 31.

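A minimal sketch with the date pattern anchored to the whole input; MatchesWholeString and IsBirthDate are hypothetical helpers. Like the ID card pattern, it validates the format only, so 1990/02/31 is accepted.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if `pattern` matches all of `text` (anchored with \A and \z).
static BOOL MatchesWholeString(NSString *pattern, NSString *text) {
    NSString *anchored = [NSString stringWithFormat:@"\\A(?:%@)\\z", pattern];
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:anchored
                                                                           options:0
                                                                             error:NULL];
    return [regex firstMatchInString:text options:0 range:NSMakeRange(0, text.length)] != nil;
}

// Hypothetical helper: yyyy/MM/dd with a year from 1900 to 2019.
static BOOL IsBirthDate(NSString *text) {
    return MatchesWholeString(@"((19[\\d]{2})|(20[0-1][\\d]))/((0[1-9])|(1[0-2]))"
                              @"/((0[1-9])|([1-2]\\d)|(3[0-1]))",
                              text);
}
```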
11. Time

24-hour clock, format hh:mm:ss

Regular expression: (([0-1]\d)|(2[0-3])):[0-5]\d:[0-5]\d

Analysis:

"(([0-1]\d)|(2[0-3]))" the hour, 00 to 23; ":" the separator; "[0-5]\d" the minutes, 00 to 59; ":" the separator; "[0-5]\d" the seconds, 00 to 59.

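A minimal sketch with the time pattern anchored to the whole input; MatchesWholeString and IsTimeOfDay are hypothetical helpers. The parentheses around the hour alternatives matter: they keep the | from splitting the rest of the pattern.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if `pattern` matches all of `text` (anchored with \A and \z).
static BOOL MatchesWholeString(NSString *pattern, NSString *text) {
    NSString *anchored = [NSString stringWithFormat:@"\\A(?:%@)\\z", pattern];
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:anchored
                                                                           options:0
                                                                             error:NULL];
    return [regex firstMatchInString:text options:0 range:NSMakeRange(0, text.length)] != nil;
}

// Hypothetical helper: hh:mm:ss on a 24-hour clock.
static BOOL IsTimeOfDay(NSString *text) {
    return MatchesWholeString(@"(([0-1]\\d)|(2[0-3])):[0-5]\\d:[0-5]\\d", text);
}
```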
12. Image file name

Regular expression: \w+\.(?i:png|jpg|jpeg|gif)

Analysis:

"\w+" the file name, at least one word character; "\." the period; "(?i:png|jpg|jpeg|gif)" the suffix png, jpg, jpeg or gif, case-insensitive.
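A minimal sketch with the image file name pattern anchored to the whole input; MatchesWholeString and IsImageFileName are hypothetical helpers. Only the suffix is case-insensitive here, because (?i:...) applies its flag to the enclosed group alone.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if `pattern` matches all of `text` (anchored with \A and \z).
static BOOL MatchesWholeString(NSString *pattern, NSString *text) {
    NSString *anchored = [NSString stringWithFormat:@"\\A(?:%@)\\z", pattern];
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:anchored
                                                                           options:0
                                                                             error:NULL];
    return [regex firstMatchInString:text options:0 range:NSMakeRange(0, text.length)] != nil;
}

// Hypothetical helper: word characters, a period, then png/jpg/jpeg/gif in any case.
static BOOL IsImageFileName(NSString *text) {
    return MatchesWholeString(@"\\w+\\.(?i:png|jpg|jpeg|gif)", text);
}
```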

If anything here is wrong, please correct me. Thank you.
Many thanks to Ever_Blacks.