Commit | Line | Data |
---|---|---|
fef2a3db JH |
1 | # frak |
2 | ||
3 | frak transforms collections of strings into regular expressions for | |
adbcb1dc JH |
4 | matching those strings. The primary goal of this library is to |
5 | generate regular expressions from a known set of inputs which avoid | |
6 | backtracking as much as possible. | |
fef2a3db | 7 | |
84e74fb3 JH |
8 | ## "Installation" |
9 | ||
10 | Add frak as a dependency to your `project.clj` file. | |
11 | ||
12 | ```clojure | |
ddfeefe3 | 13 | [frak "0.1.1"] |
84e74fb3 JH |
14 | ``` |
15 | ||
fef2a3db JH |
16 | ## Usage |
17 | ||
18 | ```clojure | |
19 | user> (require 'frak) | |
20 | nil | |
21 | user> (frak/pattern ["foo" "bar" "baz" "quux"]) | |
22 | #"(?:ba(?:r|z)|foo|quux)" | |
aa0a57a0 JH |
23 | user> (frak/pattern ["Clojure" "Clojars" "ClojureScript"]) |
24 | #"Cloj(?:ure(?:Script)?|ars)" | |
fef2a3db | 25 | ``` |
adbcb1dc JH |
26 | |
27 | ## How? | |
28 | ||
29 | A frak pattern is constructed from a very stupid trie of characters. | |
3b2ac58d JH |
30 | As characters are added to it, meta data is stored in it's branches. |
31 | The meta data contains information such as which branches are terminal | |
32 | and a record of characters which have "visited" the branch. | |
adbcb1dc JH |
33 | |
34 | During the rendering process frak will prefer branch characters that | |
35 | have "visited" the most. In the example above, you will notice the | |
36 | `ba(?:r|z)` branch takes precedence over `foo` even though `"foo"` was | |
37 | the first to enter the trie. This is because the character `\b` has | |
38 | frequented the branch more than `\f` and `\q`. The example below | |
39 | illustrates this behavior on the second character of each input. | |
40 | ||
41 | ```clojure | |
42 | user> (frak/pattern ["bit" "bat" "ban" "bot" "bar" "box"]) | |
43 | #"b(?:a(?:t|n|r)|o(?:t|x)|it)" | |
44 | ``` | |
45 | ||
46 | ## Why? | |
47 | ||
48 | [Here's](https://github.com/guns/vim-clojure-static/blob/249328ee659190babe2b14cd119f972b21b80538/syntax/clojure.vim#L91-L92) | |
49 | why. Also because. | |
50 | ||
a7041588 JH |
51 | ## And now for something completely different |
52 | ||
adbcb1dc | 53 | Let's build a regular expression for matching any word in |
a7041588 | 54 | `/usr/share/dict/words`. |
adbcb1dc JH |
55 | |
56 | ```clojure | |
57 | user> (require '[clojure.java.io :as io]) | |
58 | nil | |
59 | user> (def words | |
60 | (-> (io/file "/usr/share/dict/words") | |
61 | io/reader | |
62 | line-seq)) | |
63 | #'user/words | |
64 | user> (def word-re (frak/pattern words)) | |
65 | #'user/word-re | |
66 | user> (every? #(re-matches word-re %) words) | |
67 | true | |
68 | ``` | |
69 | ||
70 | You can view the full expression | |
71 | [here](https://gist.github.com/noprompt/6106573/raw/fcb683834bb2e171618ca91bf0b234014b5b957d/word-re.clj) | |
72 | (it's approximately `1.5M`!). |