Allow user to specify own escape-chars
[frak.git] / README.md
... / ...
CommitLineData
1# frak
2
3frak transforms collections of strings into regular expressions for
4matching those strings. The primary goal of this library is to
5generate regular expressions from a known set of inputs which avoid
6backtracking as much as possible. It is available as a [command line
7utility](#command-line-usage) and for the [browser](#browser-usage)
8as a JavaScript library.
9
10## "Installation"
11
12Add frak as a dependency to your `project.clj` file.
13
14```clojure
15[frak "0.1.5"]
16```
17
18## Clojure(Script) usage
19
20```clojure
21user> (require 'frak)
22nil
23user> (frak/pattern ["foo" "bar" "baz" "quux"])
24#"(?:ba[rz]|foo|quux)"
25user> (frak/pattern ["Clojure" "Clojars" "ClojureScript"])
26#"Cloj(?:ure(?:Script)?|ars)"
27user> (frak/pattern ["skill" "skills" "skull" "skulls"])
28#"sk(?:[ui]lls?)"
29```
30
31## Command line usage
32
33frak can be used from the command line with either Leiningen or NodeJS.
34
35### With Leiningen
36
37Use the `lein run` command:
38
39```shell
40$ lein run -e foo bar baz quux
41^(?:ba[rz]|foo|quux)$
42```
43
44### With NodeJS
45
46Compile the NodeJS version
47
48```shell
49$ lein do cljx once, cljsbuild once node
50$ chmod +x bin/frak
51$ bin/frak -e foo bar baz quux
52^(?:ba[rz]|foo|quux)$
53```
54
55## Browser usage
56
57To use frak as a standalone library in the browser with JavaScript
58compile the browser version:
59
60```shell
61$ lein do cljx once, cljsbuild once browser
62$ mv ./target/js/frak.min.js <destination>
63```
64
65Try it using this HTML:
66
67```html
68<!DOCTYPE html>
69<html>
70<head>
71</head>
72<body>
73 <pre>Input: <span id="input"></span></pre>
74 <pre>Output: <span id="output"></span></pre>
75 <script src="http://code.jquery.com/jquery-2.0.3.min.js"></script>
76 <script src="frak.min.js"></script>
77 <script>
78 var strings = ["foo", "bar", "baz", "quux"];
79 // It's a good idea to use the `"exact?"` option.
80 var pattern = frak.pattern(strings, {"exact?": true})
81 jQuery("#input").text(strings.join(" "));
82 jQuery("#output").text(pattern);
83 </script>
84</body>
85</html>
86```
87
88For even more fun try it with [AngularJS](http://angularjs.org/)!
89
90## How?
91
92A frak pattern is constructed from a trie of characters and a
93renderer which processes it. As characters are added to the trie, data
94such as such as which characters are terminal are stored in it's
95branches.
96
97During the rendering process frak analyzes each branch and attempts to
98emit the most concise regular expression possible. Additional post
99operations are applied after rendering to improve the expression where
100possible.
101
102## Why?
103
104[Here's](https://github.com/guns/vim-clojure-static/blob/249328ee659190babe2b14cd119f972b21b80538/syntax/clojure.vim#L91-L92)
105why. Also because.
106
107## And now for something completely different
108
109Let's build a regular expression for matching any word in
110`/usr/share/dict/words`.
111
112```clojure
113user> (require '[clojure.java.io :as io])
114nil
115user> (def words
116 (-> (io/file "/usr/share/dict/words")
117 io/reader
118 line-seq))
119#'user/words
120user> (def word-re (frak/pattern words))
121#'user/word-re
122user> (every? #(re-matches word-re %) words)
123true
124```
125
126The last two operations will take a moment since there are over
127235,000 words to consider.
128
129You can view the full expression
130[here](https://gist.github.com/noprompt/6106573/raw/fcb683834bb2e171618ca91bf0b234014b5b957d/word-re.clj)
131(it's approximately `1.5M`!).
132
133## Benchmarks
134
135```clojure
136(use 'criterium.core)
137
138(def words
139 (-> (io/file "/usr/share/dict/words")
140 io/reader
141 line-seq))
142
143(defn naive-pattern
144 "Create a naive regular expression pattern for matching every string
145 in strs."
146 [strs]
147 (->> strs
148 (clojure.string/join "|")
149 (format "(?:%s)")
150 re-pattern))
151
152;; Shuffle 10000 words and build a naive and frak pattern from them.
153(def ws (shuffle (take 10000 words)))
154
155(def n-pat (naive-pattern ws))
156(def f-pat (frak/pattern ws))
157
158;; Verify the naive pattern matches everything it was constructed from.
159(every? #(re-matches n-pat %) ws)
160;; => true
161
162;; Shuffle the words again since the naive pattern is built in the
163;; same order as it's inputs.
164(def ws' (shuffle ws))
165
166;;;; Benchmarks
167
168;; Naive pattern
169
170(bench (doseq [w ws'] (re-matches n-pat w)))
171;; Execution time mean : 1.499489 sec
172;; Execution time std-deviation : 181.365166 ms
173;; Execution time lower quantile : 1.337817 sec ( 2.5%)
174;; Execution time upper quantile : 1.828733 sec (97.5%)
175
176;; frak pattern
177
178(bench (doseq [w ws'] (re-matches f-pat w)))
179;; Execution time mean : 155.515855 ms
180;; Execution time std-deviation : 5.663346 ms
181;; Execution time lower quantile : 148.168855 ms ( 2.5%)
182;; Execution time upper quantile : 164.164294 ms (97.5%)
183```