discourse/lib/emoji
Régis Hanol 97c257e00c
DEV: Auto-generate emojiReplacementRegex from emoji database (#38491)
The `emojiReplacementRegex` in `pretty-text/emoji.js` was a manually
maintained regex string copied from an external source
(mathiasbynens/emoji-test-regex-pattern). This created a maintenance
gap: when the `discourse-emojis` gem was updated with new Unicode emoji
(e.g. Unicode 17.0), the replacements map would include them but the
regex would not match their raw Unicode characters. As a result, a pasted
newer emoji such as 🫪 (distorted face) passed through unreplaced.

This commit eliminates the manual step by generating the regex
automatically from `Emoji.unicode_replacements` during
`rake javascript:update_constants`, the same task that already generates
the emoji names, aliases, and replacements map.

A new `Emoji::RegexGenerator` module builds a trie from all emoji
Unicode sequences (converted to UTF-16 code units for JS compatibility),
then emits an optimized regex pattern with character class ranges and
shared-prefix grouping. The generated regex is exported from
`pretty-text/emoji/data.js` alongside the other emoji constants, and
`emoji.js` now imports it instead of hardcoding it.
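
The trie-to-regex idea described above can be sketched roughly as follows.
This is an illustrative approximation only, not the actual
`Emoji::RegexGenerator` code: the module name `TrieRegexSketch` and all of
its methods are hypothetical, and the real generator may group and order
alternatives differently.

```ruby
# Hypothetical sketch; names and structure are assumptions, not the real module.
module TrieRegexSketch
  module_function

  # Convert a string to UTF-16 code units (astral characters become surrogate pairs).
  def utf16_units(str)
    str.encode("UTF-16BE").unpack("n*")
  end

  # Build a nested-hash trie; the :end key marks that a sequence terminates at that node.
  def build_trie(sequences)
    root = {}
    sequences.each do |seq|
      node = root
      utf16_units(seq).each { |unit| node = (node[unit] ||= {}) }
      node[:end] = true
    end
    root
  end

  # Emit a JS-style \uXXXX escape for one code unit.
  def escape(unit)
    format("\\u%04X", unit)
  end

  # Collapse code units into a character class, merging consecutive runs into ranges.
  def char_class(units)
    return escape(units.first) if units.size == 1

    body = units.sort.slice_when { |a, b| b != a + 1 }.map do |run|
      run.size > 2 ? "#{escape(run.first)}-#{escape(run.last)}" : run.map { |u| escape(u) }.join
    end.join
    "[#{body}]"
  end

  # Turn a trie node into a pattern for everything that can follow it.
  def to_pattern(node)
    children = node.reject { |key, _| key == :end }
    return "" if children.empty?

    # Children with identical suffix patterns share one character class.
    grouped = children.group_by { |_, child| to_pattern(child) }
    alternatives = grouped.map { |suffix, pairs| char_class(pairs.map(&:first)) + suffix }
    body = alternatives.join("|")

    if node[:end]
      "(?:#{body})?" # a shorter sequence also ends here, so the continuation is optional
    elsif alternatives.size > 1
      "(?:#{body})"
    else
      body
    end
  end

  def generate(sequences)
    to_pattern(build_trie(sequences))
  end
end

sample = ["\u{1F600}", "\u{1F601}", "\u{1F602}", "\u2764", "\u2764\uFE0F"]
puts TrieRegexSketch.generate(sample)
# prints: (?:\uD83D[\uDE00-\uDE02]|\u2764(?:\uFE0F)?)
```

In this sketch the three faces share the `\uD83D` high surrogate and
collapse into one range, while the optional group is greedy, so the longer
`\u2764\uFE0F` sequence is preferred over the bare `\u2764`.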

The generated regex matches all 3,418 emoji keys (including the 43
Unicode 17.0 emoji the old regex missed), is ~20% faster in benchmarks,
and stays in sync with the emoji database because it is regenerated
together with the other emoji constants.

Closes #38416
2026-03-16 14:32:57 +01:00
regex_generator.rb DEV: Auto-generate emojiReplacementRegex from emoji database (#38491) 2026-03-16 14:32:57 +01:00