3 frak transforms collections of strings into regular expressions for
4 matching those strings. The primary goal of this library is to
5 generate regular expressions from a known set of inputs which avoid
6 backtracking as much as possible.
10 Add frak as a dependency to your `project.clj` file.
21 user> (frak/pattern ["foo" "bar" "baz" "quux"])
22 #"(?:ba(?:r|z)|foo|quux)"
23 user> (frak/pattern ["Clojure" "Clojars" "ClojureScript"])
24 #"Cloj(?:ure(?:Script)?|ars)"
29 A frak pattern is constructed from a very stupid trie of characters.
30 As characters are added to it, meta data is stored in it's branches.
31 The meta data contains information such as which branches are terminal
32 and a record of characters which have "visited" the branch.
34 During the rendering process frak will prefer branch characters that
35 have "visited" the most. In the example above, you will notice the
36 `ba(?:r|z)` branch takes precedence over `foo` even though `"foo"` was
37 the first to enter the trie. This is because the character `\b` has
38 frequented the branch more than `\f` and `\q`. The example below
39 illustrates this behavior on the second character of each input.
42 user> (frak/pattern ["bit" "bat" "ban" "bot" "bar" "box"])
43 #"b(?:a(?:t|n|r)|o(?:t|x)|it)"
48 [Here's](https://github.com/guns/vim-clojure-static/blob/249328ee659190babe2b14cd119f972b21b80538/syntax/clojure.vim#L91-L92)
51 ## And now for something completely different
53 Let's build a regular expression for matching any word in
54 `/usr/share/dict/words`.
57 user> (require '[clojure.java.io :as io])
60 (-> (io/file "/usr/share/dict/words")
64 user> (def word-re (frak/pattern words))
66 user> (every? #(re-matches word-re %) words)
70 You can view the full expression
71 [here](https://gist.github.com/noprompt/6106573/raw/fcb683834bb2e171618ca91bf0b234014b5b957d/word-re.clj)
72 (it's approximately `1.5M`!).
80 (-> (io/file "/usr/share/dict/words")
85 "Create a naive regular expression pattern for matching every string
89 (clojure.string/join "|")
92 ;; Shuffle 10000 words and build a naive and frak pattern from them.
93 (def ws (shuffle (take 10000 words)))
94 (def n-pat (naive-pattern ws))
95 (def f-pat (frak/pattern ws))
97 ;; Verify the naive pattern matches everything it was constructed from.
98 (every? #(re-matches n-pat %) ws)
101 ;; Shuffle the words again since the naive pattern is built in the
102 ;; same order as it's inputs.
103 (def ws' (shuffle ws))
109 (bench (doseq [w ws'] (re-matches n-pat w)))
111 ;; Execution time mean : 1.499489 sec
112 ;; Execution time std-deviation : 181.365166 ms
113 ;; Execution time lower quantile : 1.337817 sec ( 2.5%)
114 ;; Execution time upper quantile : 1.828733 sec (97.5%)
118 (bench (doseq [w ws'] (re-matches f-pat w)))
119 ;; Execution time mean : 155.515855 ms
120 ;; Execution time std-deviation : 5.663346 ms
121 ;; Execution time lower quantile : 148.168855 ms ( 2.5%)
122 ;; Execution time upper quantile : 164.164294 ms (97.5%)