Set development version
[frak.git] / README.md
1 # frak
2
3 frak transforms collections of strings into regular expressions for
4 matching those strings. The primary goal of this library is to
5 generate regular expressions from a known set of inputs which avoid
6 backtracking as much as possible. It is available as a [command line
7 utility](#command-line-usage) and for the [browser](#browser-usage)
8 as a JavaScript library.
9
10 ## "Installation"
11
12 Add frak as a dependency to your `project.clj` file.
13
14 ```clojure
15 [frak "0.1.4"]
16 ```
17
18 ## Clojure(Script) usage
19
20 ```clojure
21 user> (require 'frak)
22 nil
23 user> (frak/pattern ["foo" "bar" "baz" "quux"])
24 #"(?:ba[rz]|foo|quux)"
25 user> (frak/pattern ["Clojure" "Clojars" "ClojureScript"])
26 #"Cloj(?:ure(?:Script)?|ars)"
27 user> (frak/pattern ["skill" "skills" "skull" "skulls"])
28 #"sk(?:[ui]lls?)"
29 ```
30
31 ## Command line usage
32
33 frak can be used from the command line with either Leiningen or NodeJS.
34
35 ### With Leiningen
36
37 Use the `lein run` command:
38
39 ```shell
40 $ lein run -e foo bar baz quux
41 ^(?:ba[rz]|foo|quux)$
42 ```
43
44 ### With NodeJS
45
46 Compile the NodeJS version
47
48 ```shell
49 $ lein do cljx once, cljsbuild once node
50 $ chmod +x bin/frak
51 $ bin/frak -e foo bar baz quux
52 ^(?:ba[rz]|foo|quux)$
53 ```
54
55 ## Browser usage
56
57 To use frak as a standalone library in the browser with JavaScript
58 compile the browser version:
59
60 ```shell
61 $ lein do cljx once, cljsbuild once browser
62 $ mv ./target/js/frak.min.js <destination>
63 ```
64
65 Try it using this HTML:
66
67 ```html
68 <!DOCTYPE html>
69 <html>
70 <head>
71 </head>
72 <body>
73 <pre>Input: <span id="input"></span></pre>
74 <pre>Output: <span id="output"></span></pre>
75 <script src="http://code.jquery.com/jquery-2.0.3.min.js"></script>
76 <script src="frak.min.js"></script>
77 <script>
78 var strings = ["foo", "bar", "baz", "quux"];
79 // It's a good idea to use the `"exact?"` option.
80 var pattern = frak.pattern(strings, {"exact?": true})
81 jQuery("#input").text(strings.join(" "));
82 jQuery("#output").text(pattern);
83 </script>
84 </body>
85 </html>
86 ```
87
88 For even more fun try it with [AngularJS](http://angularjs.org/)!
89
90 ## How?
91
92 A frak pattern is constructed from a trie of characters and a
93 renderer which processes it. As characters are added to the trie, data
94 such as such as which characters are terminal are stored in it's
95 branches.
96
97 During the rendering process frak analyzes each branch and attempts to
98 emit the most concise regular expression possible. Additional post
99 operations are applied after rendering to improve the expression where
100 possible.
101
102 ## Why?
103
104 [Here's](https://github.com/guns/vim-clojure-static/blob/249328ee659190babe2b14cd119f972b21b80538/syntax/clojure.vim#L91-L92)
105 why. Also because.
106
107 ## And now for something completely different
108
109 Let's build a regular expression for matching any word in
110 `/usr/share/dict/words`.
111
112 ```clojure
113 user> (require '[clojure.java.io :as io])
114 nil
115 user> (def words
116 (-> (io/file "/usr/share/dict/words")
117 io/reader
118 line-seq))
119 #'user/words
120 user> (def word-re (frak/pattern words))
121 #'user/word-re
122 user> (every? #(re-matches word-re %) words)
123 true
124 ```
125
126 The last two operations will take a moment since there are over
127 235,000 words to consider.
128
129 You can view the full expression
130 [here](https://gist.github.com/noprompt/6106573/raw/fcb683834bb2e171618ca91bf0b234014b5b957d/word-re.clj)
131 (it's approximately `1.5M`!).
132
133 ## Benchmarks
134
135 ```clojure
136 (use 'criterium.core)
137
138 (def words
139 (-> (io/file "/usr/share/dict/words")
140 io/reader
141 line-seq))
142
143 (defn naive-pattern
144 "Create a naive regular expression pattern for matching every string
145 in strs."
146 [strs]
147 (->> strs
148 (clojure.string/join "|")
149 (format "(?:%s)")
150 re-pattern))
151
152 ;; Shuffle 10000 words and build a naive and frak pattern from them.
153 (def ws (shuffle (take 10000 words)))
154
155 (def n-pat (naive-pattern ws))
156 (def f-pat (frak/pattern ws))
157
158 ;; Verify the naive pattern matches everything it was constructed from.
159 (every? #(re-matches n-pat %) ws)
160 ;; => true
161
162 ;; Shuffle the words again since the naive pattern is built in the
163 ;; same order as it's inputs.
164 (def ws' (shuffle ws))
165
166 ;;;; Benchmarks
167
168 ;; Naive pattern
169
170 (bench (doseq [w ws'] (re-matches n-pat w)))
171 ;; Execution time mean : 1.499489 sec
172 ;; Execution time std-deviation : 181.365166 ms
173 ;; Execution time lower quantile : 1.337817 sec ( 2.5%)
174 ;; Execution time upper quantile : 1.828733 sec (97.5%)
175
176 ;; frak pattern
177
178 (bench (doseq [w ws'] (re-matches f-pat w)))
179 ;; Execution time mean : 155.515855 ms
180 ;; Execution time std-deviation : 5.663346 ms
181 ;; Execution time lower quantile : 148.168855 ms ( 2.5%)
182 ;; Execution time upper quantile : 164.164294 ms (97.5%)
183 ```