3 frak transforms collections of strings into regular expressions for
4 matching those strings. The primary goal of this library is to
5 generate regular expressions from a known set of inputs which avoid
6 backtracking as much as possible.
10 Add frak as a dependency to your `project.clj` file.
21 user> (frak/pattern ["foo" "bar" "baz" "quux"])
22 #"(?:ba(?:r|z)|foo|quux)"
23 user> (frak/pattern ["Clojure" "Clojars" "ClojureScript"])
24 #"Cloj(?:ure(?:Script)?|ars)"
29 A frak pattern is constructed from a trie of characters. As characters
30 are added to it, meta data is stored in it's branches containing
31 information such as which branches are terminal and a record of
32 characters which have "visited" the branch.
34 During the rendering process frak will prefer branch characters that
35 have "visited" the most. In the example above, you will notice the
36 `ba(?:r|z)` branch takes precedence over `foo` even though `"foo"` was
37 the first to enter the trie. This is because the character `\b` has
38 frequented the branch more than `\f` and `\q`. The example below
39 illustrates this behavior on the second character of each input.
42 user> (frak/pattern ["bit" "bat" "ban" "bot" "bar" "box"])
43 #"b(?:a(?:t|n|r)|o(?:t|x)|it)"
48 [Here's](https://github.com/guns/vim-clojure-static/blob/249328ee659190babe2b14cd119f972b21b80538/syntax/clojure.vim#L91-L92)
53 While the patterns currently generated by frak are correct, there is
54 potential for improvement; the word trie could be converted to a
55 directed acyclic word graph (DAWG).
57 By using a DAWG it might be possible to produce more efficient
58 patterns which consider common prefixes and suffixes. This could
59 reduce backtracking and overall pattern size.
61 ## And now for something completely different
63 Let's build a regular expression for matching any word in
64 `/usr/share/dict/words`.
67 user> (require '[clojure.java.io :as io])
70 (-> (io/file "/usr/share/dict/words")
74 user> (def word-re (frak/pattern words))
76 user> (every? #(re-matches word-re %) words)
80 You can view the full expression
81 [here](https://gist.github.com/noprompt/6106573/raw/fcb683834bb2e171618ca91bf0b234014b5b957d/word-re.clj)
82 (it's approximately `1.5M`!).