1 ---
2 type: none
3 identifier: python-regex-homoglyphs
4 title: Patching Python's regex AST for confusable homoglyphs
5 description: Exploiting the re module's parsing and compiling methods to create a better automoderator.
6 datestring: 2022-11-06
7 banner_image: /static/images/futaba-filter.jpg
8 links:
9     futaba: https://github.com/strinking/futaba
10     confusable_homoglyphs: https://pypi.org/project/confusable_homoglyphs/
11     Python re module: https://docs.python.org/3/library/re.html
12     Scunthorpe problem: https://www.wikiwand.com/en/Scunthorpe_problem
13     Abstract Syntax Trees: https://www.wikiwand.com/en/Abstract_syntax_tree
14 ---
15
16 A couple years ago I contributed to an open source Discord
17 [bot](https://github.com/strinking/futaba) designed by a public computer
18 programming community I’m part of. As with most popular Discord bots, it
19 incorporates its own filter to prohibit unwanted language.  However, to ensure
20 coverage of messages like `as𝕕f` (notice non-ASCII `𝕕`) in case it’s told to
21 filter `asdf` (an ASCII-only string), the bot makes use of the
22 [confusable_homoglyphs](https://pypi.org/project/confusable_homoglyphs/) Python
23 package to automatically expand an inputted filter string to support these
24 non-ASCII edge cases.
25
26 Originally, the bot used base Python string manipulation to convert the
27 inputted filter string into a Python [regular
28 expression](https://docs.python.org/3/library/re.html) string that matches for
29 each original character as well as each character's confusable homoglyphs
30 (similar-looking non-ASCII characters). For example, the inputted filter string
31 `asdf` was converted by the bot into the following regular expression:
32
33 ```bash
34 [a⍺ａ𝐚𝑎𝒂𝒶𝓪𝔞𝕒𝖆𝖺𝗮𝘢𝙖𝚊ɑα𝛂𝛼𝜶𝝰𝞪а][sｓ𝐬𝑠𝒔𝓈𝓼𝔰𝕤𝖘𝗌𝘀𝘴𝙨𝚜ꜱƽѕꮪ𑣁𐑈][dⅾⅆ𝐝𝑑𝒅𝒹𝓭𝔡𝕕𝖉𝖽𝗱𝘥𝙙𝚍ԁᏧᑯꓒ][f𝐟𝑓𝒇𝒻𝓯𝔣𝕗𝖋𝖿𝗳𝘧𝙛𝚏ꬵꞙſẝք]
35 ```
36
37 This regular expression uses character sets to match one instance each of the
38 original character or its confusable homoglyphs. In other words, it has the
39 ability to catch any recognizable rendition of a word using uncommon non-ASCII
40 characters. This was really nice, because it meant the bot could catch most
41 edge cases automatically. Unfortunately, however...
42
43 ## The problem
44
45 Using the method of regex generation described above precludes the use of
46 arbitrary regex patterns, which would help in looking at context to prevent
47 [false positives](https://www.wikiwand.com/en/Scunthorpe_problem). When
48 expanding filter strings, the bot could not differentiate between literal
49 characters and special regex tokens. So, instead of the valid regular
50 expression string `^asdf(.*)$` converting into this, preserving special regex
51 tokens as desired:
52
53 ```bash
54 ^[a⍺ａ𝐚𝑎𝒂𝒶𝓪𝔞𝕒𝖆𝖺𝗮𝘢𝙖𝚊ɑα𝛂𝛼𝜶𝝰𝞪а][sｓ𝐬𝑠𝒔𝓈𝓼𝔰𝕤𝖘𝗌𝘀𝘴𝙨𝚜ꜱƽѕꮪ𑣁𐑈][dⅾⅆ𝐝𝑑𝒅𝒹𝓭𝔡𝕕𝖉𝖽𝗱𝘥𝙙𝚍ԁᏧᑯꓒ][f𝐟𝑓𝒇𝒻𝓯𝔣𝕗𝖋𝖿𝗳𝘧𝙛𝚏ꬵꞙſẝք](.*)$
55 ```
56
57 It would convert into this, mangling the original intended regex string, making
58 it useless for arbitrary matching:
59
60 ```bash
61 [\^˄ˆ][a⍺ａ𝐚𝑎𝒂𝒶𝓪𝔞𝕒𝖆𝖺𝗮𝘢𝙖𝚊ɑα𝛂𝛼𝜶𝝰𝞪а][sｓ𝐬𝑠𝒔𝓈𝓼𝔰𝕤𝖘𝗌𝘀𝘴𝙨𝚜ꜱƽѕꮪ𑣁𐑈][dⅾⅆ𝐝𝑑𝒅𝒹𝓭𝔡𝕕𝖉𝖽𝗱𝘥𝙙𝚍ԁᏧᑯꓒ][f𝐟𝑓𝒇𝒻𝓯𝔣𝕗𝖋𝖿𝗳𝘧𝙛𝚏ꬵꞙſẝք][\(［❨❲〔﴾][\.𝅭․‎܁‎‎܂‎꘎‎𐩐‎‎٠‎۰ꓸ][\*⁎‎٭‎∗𐌟][\)］❩❳〕﴿]$
62 ```
63 <figcaption>Interestingly, confusable_homoglyphs doesn't list any special
64 characters that look similar to $.</figcaption>
65
66 ## The solution
67
68 To support expansion for confusables while preserving arbitrary regex, an
69 **[abstract syntax tree](https://www.wikiwand.com/en/Abstract_syntax_tree)
70 (AST)** for the regular expression would need to be generated from the filter
71 string, then modified to replace predefined character literals (e.g. `a`) with
72 sets of their confusable homoglyphs (e.g. `[a⍺ａ𝐚𝑎𝒂𝒶𝓪𝔞𝕒𝖆𝖺𝗮𝘢𝙖𝚊ɑα𝛂𝛼𝜶𝝰𝞪а]`) before
73 being compiled into a usable pattern object. While this process appears similar
74 to the string manipulation method described previously, parsing first for an
75 AST provides a mutable structure that allows us to distinguish character
76 literals from special regex lexemes/grammar. Fortunately, Python’s `re` module
77 already contains (and exposes!!) the internal tools for parsing and
78 compiling—we just need to modify the process.
79
80 *The AST is generated from the inputted filter string, and the usable pattern
81 matching object is compiled from the AST. Since these steps are separate in
82 Python's `re` pipeline and since the AST is a mutable object, the AST object
83 can be separately modified before being compiled.*
84
85 # Manipulating Python’s regex AST
86
87 This required a bit of reverse engineering on my part as I couldn't find any
88 adequate documentation on the internals of Python’s `re` module. After some
89 brief digging in the CPython source repository, I found **two submodules of
90 `re` that handle regex string parsing and compilation: `re.sre_parse` and
91 `re.sre_compile`**, respectively.
92
93 ## Reverse engineering
94
95 The `re.compile()` function uses the `sre_parse` and `sre_compile` submodules
96 by (effectively) doing the following to return a `re.Pattern` object:
97
98 ```python
99 ast = re.sre_parse.parse( input_regex_string ) # -> re.sre_parse.SubPattern
100 pattern_object = re.sre_compile.compile( ast ) # -> re.Pattern
101 return pattern_object
102 ```
103
104 Knowing this, we can experiment with the output of `sre_parse.parse()` function
105 to determine `re`'s AST structure and figure out how we need to modify it.
106
107 ```python
108 >>> import re
109
110 >>> re.sre_parse.parse("asdf")
111 [(LITERAL, 97), (LITERAL, 115), (LITERAL, 100), (LITERAL, 102)]
112
113 >>> type(re.sre_parse.parse("asdf"))
114 sre_parse.SubPattern
115
116 >>> re.sre_parse.parse("[asdf]")
117 [(IN, [(LITERAL, 97), (LITERAL, 115), (LITERAL, 100), (LITERAL, 102)])]
118
119 >>> re.sre_parse.parse("(asdf)") # To see how `re` handles nested lexemes
120 [(SUBPATTERN, (1, 0, 0, [(LITERAL, 97), (LITERAL, 115), (LITERAL, 100), (LITERAL, 102)]))]
121 ```
122
123 From this, we know the `sre_parse.parse()` function returns a `SubPattern`
124 list-like object containing *tuples of lexemes in the format `(LEXEME_NAME,
125 value)`.*
126
127 ## Modifying the AST (implementing the solution)
128
129 For our case, we’re looking to replace lexemes identified as `LITERAL`s:
130
131 > `(LITERAL, ord)`, representing literal characters like `a`
132
133 with character match sets—`IN` lexemes—wrapping more `LITERAL` lexemes:
134
135 > `(IN, [ (LITERAL, ord) ... ])`, representing sets like `[a⍺ａ𝐚𝑎𝒂𝒶𝓪𝔞𝕒𝖆𝖺𝗮𝘢𝙖𝚊ɑα𝛂𝛼𝜶𝝰𝞪а]`
136
137 Because abstract syntax trees are, well, trees, this needs to be done
138 recursively to account for nested lexemes, such as those within matching
139 groups.
140
141 This is the solution I came up with:
142
143 ```python
144 from collections.abc import Iterable
145 import re
146
147 from confusable_homoglyphs import confusables
148
149 def patched_regex(regex_string: str) -> re.Pattern:
150     """
151     Generate a regex pattern object after replacing literals with sets of
152     confusable homoglyphs
153     """
154
155     # Generate AST from base input string
156     ast_root = re.sre_parse.parse(regex_string)
157
158     # Iterative function to patch AST
159     def modify(ast_local: Iterable):
160
161         # Step through this level of the AST
162         for index, item in enumerate(ast_local):
163
164             # Lexeme represented by item = (LEXEME_TYPE, VALUE)
165             if isinstance(item, tuple):
166
167                 if item[0] == re.sre_parse.LITERAL:
168                     # LITERAL type found item = (sre_parse.LITERAL, ord)
169
170                     # Generate list of confusable homoglyphs...
171                     groups = confusables.is_confusable(chr(item[1]),
172                                                        greedy=True)
173                     if not groups:
174                         # Homoglyph is not confusable, doesn't need to be
175                         # patched, move to next item in subpattern
176                         continue
177
178                     confusable_literals = [item] # Begin the homoglyph set with
179                                                  # the original character lexeme
180                     for homoglyph in groups[0]["homoglyphs"]:
181                         confusable_literals += [ # Append confusable homoglyph
182                                                  # lexemes to set
183                             (re.sre_parse.LITERAL, ord(char))
184                             for char in homoglyph["c"]
185                         ]
186
187                     # Overwrite the original LITERAL lexeme in the AST with the
188                     # new set of confusable homoglyphs
189                     ast_local[index] = (re.sre_parse.IN, confusable_literals)
190                 else:
191                     # Not LITERAL; more possible lexemes nested in this one
192                     # Convert to list, recurse, output back to tuple, then
193                     # overwrite in AST
194                     ast_local[index] = tuple(modify(list(item)))
195
196             elif isinstance(item, re.sre_parse.SubPattern):
197                 # More possible lexemes, recurse and overwrite in AST
198                 ast_local[index] = modify(item)
199
200         return ast_local
201
202     # Patch generated base AST
203     ast_root = modify(ast_root)
204
205     # Compile AST to case-insensitive re.Pattern and return
206     return re.sre_compile.compile(ast_root, re.IGNORECASE)
207 ```
208
209 Testing the generated regular expression pattern with the example input string
210 from earlier, we can see it now works as expected.
211
212 ```python
213 >>> pattern = patched_regex("^asdf(.*)$")
214
215 >>> pattern.match("Not a match")
216 None
217
218 >>> pattern.match("Not a match even though asdf is in it because it doesn't follow the regex pattern")
219 None
220
221 >>> pattern.match("asdf")
222 <re.Match object; span=(0, 4), match='asdf'>
223
224 >>> pattern.match("asdf match")
225 <re.Match object; span=(0, 10), match='asdf match'>
226
227 >>> pattern.match("𝜶ꮪ𝚍𝖿") # String containing confusable homoglyphs
228 <re.Match object; span=(0, 4), match='𝜶ꮪ𝚍𝖿'>
229
230 >>> pattern2 = patch_regex("^asdf\\$(.*)$") # Also works when escaping special characters
231
232 >>> pattern2.match("asdf match?")
233 None
234
235 >>> pattern2.match("asdf$ match?")
236 <re.Match object; span=(0, 11), match='asdf$ match'>
237 ```
238 <figcaption>This works likewise with re.Pattern.search() and other re.Pattern
239 functions.</figcaption>
240
241 I submitted a [pull request](https://github.com/strinking/futaba/pull/368) to
242 the bot which would make any filter string prefixed by `regex:` use a custom
243 regex compilation process similar to the one above. This allows the Discord bot
244 to employ arbitrary regular expressions as filter items, making use of
245 supported regex features such as lookarounds, while still preserving expansion
246 for non-ASCII confusable homoglyphs.
247
1	---
2	type: none
3	identifier: python-regex-homoglyphs
4	title: Patching Python's regex AST for confusable homoglyphs
5	description: Exploiting the re module's parsing and compiling methods to create a better automoderator.
6	datestring: 2022-11-06
7	banner_image: /static/images/futaba-filter.jpg
8	links:
9	futaba: https://github.com/strinking/futaba
10	confusable_homoglyphs: https://pypi.org/project/confusable_homoglyphs/
11	Python re module: https://docs.python.org/3/library/re.html
12	Scunthorpe problem: https://www.wikiwand.com/en/Scunthorpe_problem
13	Abstract Syntax Trees: https://www.wikiwand.com/en/Abstract_syntax_tree
14	---
15
16	A couple years ago I contributed to an open source Discord
17	[bot](https://github.com/strinking/futaba) designed by a public computer
18	programming community I’m part of. As with most popular Discord bots, it
19	incorporates its own filter to prohibit unwanted language. However, to ensure
20	coverage of messages like `as𝕕f` (notice non-ASCII `𝕕`) in case it’s told to
21	filter `asdf` (an ASCII-only string), the bot makes use of the
22	[confusable_homoglyphs](https://pypi.org/project/confusable_homoglyphs/) Python
23	package to automatically expand an inputted filter string to support these
24	non-ASCII edge cases.
25
26	Originally, the bot used base Python string manipulation to convert the
27	inputted filter string into a Python [regular
28	expression](https://docs.python.org/3/library/re.html) string that matches for
29	each original character as well as each character's confusable homoglyphs
30	(similar-looking non-ASCII characters). For example, the inputted filter string
31	`asdf` was converted by the bot into the following regular expression:
32
33	```bash
34	[a⍺ａ𝐚𝑎𝒂𝒶𝓪𝔞𝕒𝖆𝖺𝗮𝘢𝙖𝚊ɑα𝛂𝛼𝜶𝝰𝞪а][sｓ𝐬𝑠𝒔𝓈𝓼𝔰𝕤𝖘𝗌𝘀𝘴𝙨𝚜ꜱƽѕꮪ𑣁𐑈][dⅾⅆ𝐝𝑑𝒅𝒹𝓭𝔡𝕕𝖉𝖽𝗱𝘥𝙙𝚍ԁᏧᑯꓒ][f𝐟𝑓𝒇𝒻𝓯𝔣𝕗𝖋𝖿𝗳𝘧𝙛𝚏ꬵꞙſẝք]
35	```
36
37	This regular expression uses character sets to match one instance each of the
38	original character or its confusable homoglyphs. In other words, it has the
39	ability to catch any recognizable rendition of a word using uncommon non-ASCII
40	characters. This was really nice, because it meant the bot could catch most
41	edge cases automatically. Unfortunately, however...
42
43	## The problem
44
45	Using the method of regex generation described above precludes the use of
46	arbitrary regex patterns, which would help in looking at context to prevent
47	[false positives](https://www.wikiwand.com/en/Scunthorpe_problem). When
48	expanding filter strings, the bot could not differentiate between literal
49	characters and special regex tokens. So, instead of the valid regular
50	expression string `^asdf(.*)$` converting into this, preserving special regex
51	tokens as desired:
52
53	```bash
54	^[a⍺ａ𝐚𝑎𝒂𝒶𝓪𝔞𝕒𝖆𝖺𝗮𝘢𝙖𝚊ɑα𝛂𝛼𝜶𝝰𝞪а][sｓ𝐬𝑠𝒔𝓈𝓼𝔰𝕤𝖘𝗌𝘀𝘴𝙨𝚜ꜱƽѕꮪ𑣁𐑈][dⅾⅆ𝐝𝑑𝒅𝒹𝓭𝔡𝕕𝖉𝖽𝗱𝘥𝙙𝚍ԁᏧᑯꓒ][f𝐟𝑓𝒇𝒻𝓯𝔣𝕗𝖋𝖿𝗳𝘧𝙛𝚏ꬵꞙſẝք](.*)$
55	```
56
57	It would convert into this, mangling the original intended regex string, making
58	it useless for arbitrary matching:
59
60	```bash
61	[\^˄ˆ][a⍺ａ𝐚𝑎𝒂𝒶𝓪𝔞𝕒𝖆𝖺𝗮𝘢𝙖𝚊ɑα𝛂𝛼𝜶𝝰𝞪а][sｓ𝐬𝑠𝒔𝓈𝓼𝔰𝕤𝖘𝗌𝘀𝘴𝙨𝚜ꜱƽѕꮪ𑣁𐑈][dⅾⅆ𝐝𝑑𝒅𝒹𝓭𝔡𝕕𝖉𝖽𝗱𝘥𝙙𝚍ԁᏧᑯꓒ][f𝐟𝑓𝒇𝒻𝓯𝔣𝕗𝖋𝖿𝗳𝘧𝙛𝚏ꬵꞙſẝք][\(［❨❲〔﴾][\.𝅭․‎܁‎‎܂‎꘎‎𐩐‎‎٠‎۰ꓸ][\*⁎‎٭‎∗𐌟][\)］❩❳〕﴿]$
62	```
63	<figcaption>Interestingly, confusable_homoglyphs doesn't list any special
64	characters that look similar to $.</figcaption>
65
66	## The solution
67
68	To support expansion for confusables while preserving arbitrary regex, an
69	**[abstract syntax tree](https://www.wikiwand.com/en/Abstract_syntax_tree)
70	(AST)** for the regular expression would need to be generated from the filter
71	string, then modified to replace predefined character literals (e.g. `a`) with
72	sets of their confusable homoglyphs (e.g. `[a⍺ａ𝐚𝑎𝒂𝒶𝓪𝔞𝕒𝖆𝖺𝗮𝘢𝙖𝚊ɑα𝛂𝛼𝜶𝝰𝞪а]`) before
73	being compiled into a usable pattern object. While this process appears similar
74	to the string manipulation method described previously, parsing first for an
75	AST provides a mutable structure that allows us to distinguish character
76	literals from special regex lexemes/grammar. Fortunately, Python’s `re` module
77	already contains (and exposes!!) the internal tools for parsing and
78	compiling—we just need to modify the process.
79
80	*The AST is generated from the inputted filter string, and the usable pattern
81	matching object is compiled from the AST. Since these steps are separate in
82	Python's `re` pipeline and since the AST is a mutable object, the AST object
83	can be separately modified before being compiled.*
84
85	# Manipulating Python’s regex AST
86
87	This required a bit of reverse engineering on my part as I couldn't find any
88	adequate documentation on the internals of Python’s `re` module. After some
89	brief digging in the CPython source repository, I found **two submodules of
90	`re` that handle regex string parsing and compilation: `re.sre_parse` and
91	`re.sre_compile`**, respectively.
92
93	## Reverse engineering
94
95	The `re.compile()` function uses the `sre_parse` and `sre_compile` submodules
96	by (effectively) doing the following to return a `re.Pattern` object:
97
98	```python
99	ast = re.sre_parse.parse( input_regex_string ) # -> re.sre_parse.SubPattern
100	pattern_object = re.sre_compile.compile( ast ) # -> re.Pattern
101	return pattern_object
102	```
103
104	Knowing this, we can experiment with the output of `sre_parse.parse()` function
105	to determine `re`'s AST structure and figure out how we need to modify it.
106
107	```python
108	>>> import re
109
110	>>> re.sre_parse.parse("asdf")
111	[(LITERAL, 97), (LITERAL, 115), (LITERAL, 100), (LITERAL, 102)]
112
113	>>> type(re.sre_parse.parse("asdf"))
114	sre_parse.SubPattern
115
116	>>> re.sre_parse.parse("[asdf]")
117	[(IN, [(LITERAL, 97), (LITERAL, 115), (LITERAL, 100), (LITERAL, 102)])]
118
119	>>> re.sre_parse.parse("(asdf)") # To see how `re` handles nested lexemes
120	[(SUBPATTERN, (1, 0, 0, [(LITERAL, 97), (LITERAL, 115), (LITERAL, 100), (LITERAL, 102)]))]
121	```
122
123	From this, we know the `sre_parse.parse()` function returns a `SubPattern`
124	list-like object containing *tuples of lexemes in the format `(LEXEME_NAME,
125	value)`.*
126
127	## Modifying the AST (implementing the solution)
128
129	For our case, we’re looking to replace lexemes identified as `LITERAL`s:
130
131	> `(LITERAL, ord)`, representing literal characters like `a`
132
133	with character match sets—`IN` lexemes—wrapping more `LITERAL` lexemes:
134
135	> `(IN, [ (LITERAL, ord) ... ])`, representing sets like `[a⍺ａ𝐚𝑎𝒂𝒶𝓪𝔞𝕒𝖆𝖺𝗮𝘢𝙖𝚊ɑα𝛂𝛼𝜶𝝰𝞪а]`
136
137	Because abstract syntax trees are, well, trees, this needs to be done
138	recursively to account for nested lexemes, such as those within matching
139	groups.
140
141	This is the solution I came up with:
142
143	```python
144	from collections.abc import Iterable
145	import re
146
147	from confusable_homoglyphs import confusables
148
149	def patched_regex(regex_string: str) -> re.Pattern:
150	"""
151	Generate a regex pattern object after replacing literals with sets of
152	confusable homoglyphs
153	"""
154
155	# Generate AST from base input string
156	ast_root = re.sre_parse.parse(regex_string)
157
158	# Iterative function to patch AST
159	def modify(ast_local: Iterable):
160
161	# Step through this level of the AST
162	for index, item in enumerate(ast_local):
163
164	# Lexeme represented by item = (LEXEME_TYPE, VALUE)
165	if isinstance(item, tuple):
166
167	if item[0] == re.sre_parse.LITERAL:
168	# LITERAL type found item = (sre_parse.LITERAL, ord)
169
170	# Generate list of confusable homoglyphs...
171	groups = confusables.is_confusable(chr(item[1]),
172	greedy=True)
173	if not groups:
174	# Homoglyph is not confusable, doesn't need to be
175	# patched, move to next item in subpattern
176	continue
177
178	confusable_literals = [item] # Begin the homoglyph set with
179	# the original character lexeme
180	for homoglyph in groups[0]["homoglyphs"]:
181	confusable_literals += [ # Append confusable homoglyph
182	# lexemes to set
183	(re.sre_parse.LITERAL, ord(char))
184	for char in homoglyph["c"]
185	]
186
187	# Overwrite the original LITERAL lexeme in the AST with the
188	# new set of confusable homoglyphs
189	ast_local[index] = (re.sre_parse.IN, confusable_literals)
190	else:
191	# Not LITERAL; more possible lexemes nested in this one
192	# Convert to list, recurse, output back to tuple, then
193	# overwrite in AST
194	ast_local[index] = tuple(modify(list(item)))
195
196	elif isinstance(item, re.sre_parse.SubPattern):
197	# More possible lexemes, recurse and overwrite in AST
198	ast_local[index] = modify(item)
199
200	return ast_local
201
202	# Patch generated base AST
203	ast_root = modify(ast_root)
204
205	# Compile AST to case-insensitive re.Pattern and return
206	return re.sre_compile.compile(ast_root, re.IGNORECASE)
207	```
208
209	Testing the generated regular expression pattern with the example input string
210	from earlier, we can see it now works as expected.
211
212	```python
213	>>> pattern = patched_regex("^asdf(.*)$")
214
215	>>> pattern.match("Not a match")
216	None
217
218	>>> pattern.match("Not a match even though asdf is in it because it doesn't follow the regex pattern")
219	None
220
221	>>> pattern.match("asdf")
222	<re.Match object; span=(0, 4), match='asdf'>
223
224	>>> pattern.match("asdf match")
225	<re.Match object; span=(0, 10), match='asdf match'>
226
227	>>> pattern.match("𝜶ꮪ𝚍𝖿") # String containing confusable homoglyphs
228	<re.Match object; span=(0, 4), match='𝜶ꮪ𝚍𝖿'>
229
230	>>> pattern2 = patch_regex("^asdf\\$(.*)$") # Also works when escaping special characters
231
232	>>> pattern2.match("asdf match?")
233	None
234
235	>>> pattern2.match("asdf$ match?")
236	<re.Match object; span=(0, 11), match='asdf$ match'>
237	```
238	<figcaption>This works likewise with re.Pattern.search() and other re.Pattern
239	functions.</figcaption>
240
241	I submitted a [pull request](https://github.com/strinking/futaba/pull/368) to
242	the bot which would make any filter string prefixed by `regex:` use a custom
243	regex compilation process similar to the one above. This allows the Discord bot
244	to employ arbitrary regular expressions as filter items, making use of
245	supported regex features such as lookarounds, while still preserving expansion
246	for non-ASCII confusable homoglyphs.
247