$Cambridge: exim/doc/doc-txt/README.SIEVE,v 1.4 2005/05/03 10:02:27 ph10 Exp $

Notes on the Sieve implementation for Exim

Exim Filter Versus Sieve Filter

Exim supports two incompatible filters: the traditional Exim filter and
the Sieve filter. Since Sieve is an extensible language, it is important
to understand "Sieve" in this context as "the specific implementation
of Sieve for Exim".

The Exim filter contains more features, such as variable expansion, and
better integration with the host environment, like external processes
and pipes.

Sieve is a standard for interoperable filters, defined in RFC 3028,
with multiple implementations around. If interoperability is important,
then there is no way around Sieve.


Exim Implementation

The Exim Sieve implementation offers the core as defined by RFC 3028, the
"envelope" (RFC 3028), the "fileinto" (RFC 3028), the "copy" (RFC 3894)
and the "vacation" (draft-ietf-sieve-vacation-01.txt) extensions, and
the "i;ascii-numeric" comparator, but not the "reject" extension.
Exim does not support MDNs (message disposition notifications), so adding
"reject" just to the Sieve filter makes little sense.

The Sieve filter is integrated in Exim and works very similarly to the
Exim filter: Sieve scripts are recognized by the first line containing
"# sieve filter". When using "keep" or "fileinto" to save a mail into a
folder, the resulting string is available as the variable $address_file
in the transport that stores it. A suitable transport could be:

  localuser:
    driver = appendfile
    file = ${if eq{$address_file}{inbox} \
                {/var/mail/$local_part} \
                {${if eq{${substr_0_1:$address_file}}{/} \
                      {$address_file} \
                      {$home/$address_file} \
                 }} \
           }
    delivery_date_add
    envelope_to_add
    return_path_add
    mode = 0600

Absolute files are stored where specified, relative files are stored
relative to $home and "inbox" goes to the standard mailbox location.

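The nested ${if ...} expansion above makes a three-way decision. A minimal C sketch of that same decision follows, for readers who find the expansion hard to read. The function resolve_folder and its parameters are hypothetical; Exim evaluates the expansion string itself and has no such helper.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical C mirror of the "file" expansion in the example
   transport; Exim itself only evaluates the expansion string.
   Writes the resolved path into out and returns snprintf's result. */
int resolve_folder(const char *address_file, const char *local_part,
                   const char *home, char *out, size_t outlen)
{
    /* "inbox" (exact, case-sensitive like eq{}) -> standard mailbox */
    if (strcmp(address_file, "inbox") == 0)
        return snprintf(out, outlen, "/var/mail/%s", local_part);

    /* Leading "/" -> absolute path, used as given */
    if (address_file[0] == '/')
        return snprintf(out, outlen, "%s", address_file);

    /* Anything else is relative to the user's home directory */
    return snprintf(out, outlen, "%s/%s", home, address_file);
}
```

For example, with local part "joe" and home "/home/joe", the folder "mail/lists" resolves to "/home/joe/mail/lists".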
To enable "vacation", set sieve_vacation_directory for the router to
the directory where vacation databases are held (don't put anything
else in that directory) and point reply_transport to an autoreply
transport.


RFC Compliance

Exim requires the first line to be "# sieve filter". Of course the RFC
does not enforce that line. Don't expect examples to work without adding
it, though.

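The first-line check can be sketched in C as a prefix comparison. This assumes that leading spaces or tabs are skipped and that letters are compared ASCII case-insensitively; both are assumptions, and is_sieve_script is a hypothetical name, not an Exim function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: does the script start with "# sieve filter"?
   Leading blanks are skipped and letters are compared ASCII
   case-insensitively; both choices are assumptions of this sketch. */
bool is_sieve_script(const char *text)
{
    static const char magic[] = "# sieve filter";

    while (*text == ' ' || *text == '\t')
        text++;
    for (size_t i = 0; i < sizeof magic - 1; i++) {
        char c = text[i];
        if (c >= 'A' && c <= 'Z')
            c = (char)(c - 'A' + 'a');   /* ASCII-only lower-casing */
        if (c != magic[i])
            return false;                /* also stops at the string's NUL */
    }
    return true;
}
```

A script copied from the RFC, starting directly with require, fails this check, which is why the examples need the extra line.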
RFC 3028 requires CRLF as the line terminator. The rationale was that
CRLF is universally used in network protocols to mark the end of a line.
This implementation does not embed Sieve in a network protocol, but uses
Sieve scripts as part of the Exim MTA. Since all parts of Exim use \n as
the newline character, this implementation does, too. You can change
this by defining the macro RFC_EOL at compile time to enforce CRLF.

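The RFC_EOL switch can be pictured as a compile-time choice inside the script scanner. In this C sketch only the macro name RFC_EOL comes from the text; the helper eol_length is hypothetical.

```c
#include <stddef.h>

/* Hypothetical scanner helper: length of the line terminator at p,
   or 0 if p is not at the end of a line. p must point into a
   NUL-terminated string. */
size_t eol_length(const char *p)
{
#ifdef RFC_EOL
    /* RFC 3028 mode: only CRLF ends a line */
    return (p[0] == '\r' && p[1] == '\n') ? 2 : 0;
#else
    /* Default: a bare LF ends a line, as everywhere else in Exim */
    return (p[0] == '\n') ? 1 : 0;
#endif
}
```

Built without -DRFC_EOL, a CRLF sequence is not a line end at the CR; with -DRFC_EOL, a bare LF is not a line end.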
Exim violates RFC 2822, section 3.6.8, by accepting 8-bit header names, so
this implementation repeats this violation to stay consistent with Exim.
This is in preparation for UTF-8 data.

Sieve scripts cannot contain NUL characters in strings, but mail
headers could contain MIME encoded NUL characters, which could never
be matched by Sieve scripts using exact comparisons. For that reason,
this implementation extends the Sieve quoted string syntax with \0
to describe a NUL character, violating \0 being the same as 0 in
RFC 3028. Even without using \0, the following tests are all true in
this implementation. Implementations that use C-style strings will only
evaluate the first test as true.

  Subject: =?iso-8859-1?q?abc=00def?=

  header :contains "Subject" ["abc"]
  header :contains "Subject" ["def"]
  header :matches "Subject" ["abc?def"]

Note that if Sieve is considered an MUA, RFC 2047 can be interpreted as
allowing Sieve implementations to truncate strings at NUL characters,
although this is not recommended. Encoded NUL characters in headers are
allowed as well, but are not recommended either. The above example shows
why. Good code should still be able to deal with them.

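The difference between the two kinds of implementation comes down to how the decoded subject "abc", NUL, "def" is stored. This C sketch contrasts a length-counted search with strstr on a C string; the type blob and both functions are hypothetical illustrations, not Exim's internal string type.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A length-counted string: data may contain NUL bytes. */
typedef struct {
    const char *data;
    size_t len;
} blob;

/* :contains on counted strings; a NUL byte is ordinary data. */
bool blob_contains(blob hay, blob needle)
{
    for (size_t i = 0; i + needle.len <= hay.len; i++)
        if (memcmp(hay.data + i, needle.data, needle.len) == 0)
            return true;
    return false;
}

/* The same test on C strings: strstr stops at the first NUL byte. */
bool cstr_contains(const char *hay, const char *needle)
{
    return strstr(hay, needle) != NULL;
}
```

The counted version finds both "abc" and "def"; the C-string version only ever sees "abc".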
RFC 3028 states that if an implementation fails to convert a character
set to UTF-8, two strings cannot be equal if one contains octets greater
than 127. Assuming that all unknown character sets are one-byte character
sets with the lower 128 octets being US-ASCII is not sound, so this
implementation violates RFC 3028 and treats such MIME words literally.
That way at least something could be matched.

The folder specified by "fileinto" must not contain the character
sequence ".." to avoid security problems. RFC 3028 does not specify the
syntax of folders apart from keep being equivalent to fileinto "INBOX".
This implementation uses "inbox" instead.

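The ".." rule is a plain substring test on the folder name. A minimal C sketch, with the hypothetical name fileinto_folder_ok:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical check: a fileinto folder is rejected if it contains
   the two-character sequence "..", which could otherwise be used to
   climb out of the user's mail directory. */
bool fileinto_folder_ok(const char *folder)
{
    return strstr(folder, "..") == NULL;
}
```

Note that the test is deliberately blunt: "a..b" is rejected too, even though it is not a path component "..".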
Sieve script errors currently cause messages to be silently filed into
"inbox". RFC 3028 requires that the user is notified of that condition.
This may be implemented in future by adding a header line to mails that
are filed into "inbox" due to an error in the filter.


Strings Containing Header Names Or Envelope Elements

RFC 3028 does not specify what happens if a string denoting a header
field or envelope element does not contain a valid name, e.g. it
contains a colon for a header or it is not "from" or "to" for envelopes.
This implementation generates an error instead of ignoring the header
field in order to ease script debugging, which fits the general design
of Sieve.

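A validity check for such strings might look like the following C sketch. It assumes the RFC 2822 field-name rule (printable US-ASCII except ":") extended to accept 8-bit octets, in line with Exim's 8-bit tolerance noted earlier, and compares envelope parts exactly. Both function names are hypothetical, and the exact rules Exim applies may differ.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical header-name check: non-empty, octets 33..126 except
   ':' (RFC 2822 ftext), plus octets >= 128 as Exim tolerates them.
   Space, control characters, DEL and ':' make the name invalid. */
bool header_name_ok(const char *name)
{
    if (*name == '\0')
        return false;
    for (const unsigned char *p = (const unsigned char *)name; *p; p++) {
        if (*p >= 128)
            continue;                   /* 8-bit: tolerated like Exim */
        if (*p < 33 || *p == 127 || *p == ':')
            return false;
    }
    return true;
}

/* Envelope parts known to the base "envelope" extension. */
bool envelope_part_ok(const char *part)
{
    return strcmp(part, "from") == 0 || strcmp(part, "to") == 0;
}
```

A false result here would be reported as a script error rather than silently ignored.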

Header Test With Invalid MIME Encoding In Header

Some MUAs process invalid base64 encoded data, generating junk.
Others ignore junk after seeing an equal sign in base64 encoded data.
RFC 2047 does not specify how to react in this case, other than stating
that a client must not refuse to process a message for that reason.
RFC 2045 specifies that invalid data should be ignored (apparently
looking at end of line characters). It also specifies that invalid data
may lead to rejecting messages containing them (and there it appears to
talk about true encoding violations), which is a clear contradiction to
ignoring them.

RFC 3028 does not specify how to process incorrect MIME words.
This implementation treats them literally, as it does if the word is
correct but its character set cannot be converted to UTF-8.


Address Test For Multiple Addresses Per Header

A header may contain multiple addresses. RFC 3028 does not explicitly
specify how to deal with them, but since the "address" test checks if
anything matches anything else, matching one address suffices to
satisfy the condition. That makes it impossible to test if a header
contains a certain set of addresses and no more, but it is more logical
than letting the test fail if the header contains an additional address
besides the one the test checks for.

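The "anything matches anything" rule is a nested loop that stops at the first hit. In this C sketch, plain strcmp stands in for the real comparator (i;ascii-casemap by default), an assumption made to keep it short; address_test_is is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Sketch of the address test with :is over a header holding several
   addresses: true as soon as any address equals any key. strcmp is a
   simplified stand-in for the configured comparator. */
bool address_test_is(const char *const *addrs, size_t naddrs,
                     const char *const *keys, size_t nkeys)
{
    for (size_t i = 0; i < naddrs; i++)
        for (size_t k = 0; k < nkeys; k++)
            if (strcmp(addrs[i], keys[k]) == 0)
                return true;
    return false;
}
```

Because one hit suffices, extra addresses in the header never make the test fail, which is exactly why "these addresses and no more" cannot be expressed.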

Semantics Of Keep

The keep command is equivalent to fileinto "inbox": it saves the
message and resets the implicit keep flag. It does not set the
implicit keep flag; there is no command to set it once it has
been reset.

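The keep semantics can be modelled as two flags. In this hypothetical C sketch, keep files into "inbox" and clears the implicit keep flag, discard only clears it, and, as the text says, nothing sets it again.

```c
#include <stdbool.h>

/* Hypothetical model of the implicit keep flag. */
typedef struct {
    bool implicit_keep;   /* true until an action cancels it */
    bool filed_in_inbox;  /* set by keep, i.e. fileinto "inbox" */
} sieve_state;

sieve_state state_start(void)
{
    return (sieve_state){ .implicit_keep = true, .filed_in_inbox = false };
}

/* keep == fileinto "inbox": save the message, cancel implicit keep. */
void cmd_keep(sieve_state *s)
{
    s->filed_in_inbox = true;
    s->implicit_keep = false;
}

/* discard only cancels implicit keep. */
void cmd_discard(sieve_state *s)
{
    s->implicit_keep = false;
}

/* After the script ends: does the message land in "inbox"? */
bool ends_in_inbox(const sieve_state *s)
{
    return s->filed_in_inbox || s->implicit_keep;
}
```

There is intentionally no function that sets implicit_keep back to true.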

Semantics Of Fileinto

RFC 3028 does not specify if "fileinto" tries to create a mail folder
in case it does not exist. This implementation lets you configure that
aspect using the appendfile transport options "create_directory",
"create_file" and "file_must_exist". See the appendfile transport in
the Exim specification for details.


Semantics Of Redirect

Sieve scripts are supposed to be interoperable between servers, so this
implementation does not allow redirecting mail to unqualified addresses,
because the domain would depend on the system in use, and on systems
with virtual mail domains it is probably not what the user expects.

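A rough qualification check for a redirect target is sketched below in C. It only requires a non-empty local part, an "@" and a non-empty domain; quoted local parts, comments and other RFC 2822 address syntax are deliberately ignored, and redirect_target_qualified is a hypothetical name, not Exim's parser.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical check that a redirect target is qualified: a non-empty
   local part, "@", and a non-empty domain after the last "@". */
bool redirect_target_qualified(const char *addr)
{
    const char *at = strrchr(addr, '@');
    return at != NULL && at != addr && at[1] != '\0';
}
```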

String Arguments

There has been confusion about whether the string arguments to "require"
are to be matched case-sensitively or not. This implementation matches
them with the match type ":is" (the default, see RFC 3028 section 2.7.1)
and the comparator "i;ascii-casemap" (the default, see section 2.7.3).
The RFC defines the command defaults clearly, so any implementation that
differs violates RFC 3028. The same applies to comparator names, which
are also specified as strings.

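Matching with ":is" and "i;ascii-casemap" means equality after mapping ASCII letters to one case, leaving all other octets untouched. A C sketch follows; casemap_is is a hypothetical name, and it avoids toupper() so the result does not depend on the C locale.

```c
#include <stdbool.h>

/* Map ASCII a-z to A-Z; every other octet is left as is. */
static unsigned char ascii_upper(unsigned char c)
{
    return (c >= 'a' && c <= 'z') ? (unsigned char)(c - 'a' + 'A') : c;
}

/* ":is" with "i;ascii-casemap": equal after ASCII case mapping. */
bool casemap_is(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    while (*p && ascii_upper(*p) == ascii_upper(*q)) {
        p++;
        q++;
    }
    return ascii_upper(*p) == ascii_upper(*q);   /* both at NUL => equal */
}
```

So require "FileInto" enables the same extension as require "fileinto".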

Number Units

There is a mistake in RFC 3028: it says the suffix G denotes "tebi-",
but G actually denotes gibi-. The mistake is obvious, because RFC 3028
specifies G to denote 2^30 (which is gibi, not tebi), and that's what
this implementation uses as scaling factor for the suffix G.

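Parsing a Sieve number with its quantifier is a multiply followed by a shift of 10, 20 or 30 bits. This hypothetical C sketch uses uint64_t and rejects overflow; the integer width is an assumption, and lower-case suffixes are accepted because ABNF string literals are case-insensitive.

```c
#include <stdbool.h>
#include <stdint.h>

/* Parse "1*DIGIT [K / M / G]" with K = 2^10, M = 2^20, G = 2^30.
   Returns false on a syntax error or on uint64_t overflow. */
bool parse_sieve_number(const char *s, uint64_t *out)
{
    uint64_t value = 0;
    unsigned shift = 0;

    if (*s < '0' || *s > '9')
        return false;
    for (; *s >= '0' && *s <= '9'; s++) {
        unsigned digit = (unsigned)(*s - '0');
        if (value > (UINT64_MAX - digit) / 10)
            return false;                        /* would overflow */
        value = value * 10 + digit;
    }
    switch (*s) {                                /* optional quantifier */
    case 'K': case 'k': shift = 10; s++; break;
    case 'M': case 'm': shift = 20; s++; break;
    case 'G': case 'g': shift = 30; s++; break;  /* gibi, 2^30 */
    default: break;
    }
    if (*s != '\0')
        return false;                            /* trailing garbage */
    if (shift && value > (UINT64_MAX >> shift))
        return false;
    *out = value << shift;
    return true;
}
```

For example, "2G" yields 2147483648, not the 2^41 that "tebi" would imply.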

Sieve Syntax and Semantics

RFC 3028 sometimes confuses syntax and semantics. It uses a generic
grammar as syntax for actions and tests and performs many checks during
semantic analysis. Syntax is specified as grammar rules, semantics
in natural language, despite the latter often talking about syntax.
The intention was to provide a framework for the syntax that describes
current commands as well as future extensions, and to describe commands
by their semantics. Since the semantic analysis is not specified by formal
rules, it is easy to get that phase wrong, as demonstrated by the mistake
in RFC 3028 that forbids "elsif" being followed by "elsif" (which is
allowed in Sieve, it's just not specified correctly).

RFC 3028 does not define whether semantic checks are strict (always treat
unknown extensions as errors) or lazy (treat unknown extensions as errors
only if they are executed), and since it employs a very generic grammar,
it is not unreasonable for an implementation using a parser for the
generic grammar to indeed process scripts that contain unknown commands
in dead code. It is just required to treat disabled but known extensions
the same as unknown extensions.

The following suggestion for section 8.2 gives two grammars, one for
the framework and one for specific commands, thus removing most of the
semantic analysis. Since the parser cannot parse unsupported extensions,
the result is strict error checking. As required in section 2.10.5, known
but not enabled extensions must behave the same as unknown extensions,
so those also result strictly in errors (though at the thin semantic
layer), even if they can be parsed fine.

8.2. Grammar

The atoms of the grammar are lexical tokens. White space or comments may
appear anywhere between lexical tokens; they are not part of the grammar.
The grammar is specified in ABNF with two extensions, one to describe
tagged arguments that can be reordered and one to describe grammar
extensions. { } denotes a sequence of symbols that may appear in any
order. Example:

  start = { a b c }

is equivalent to:

  start = ( a b c ) / ( a c b ) / ( b a c ) / ( b c a ) / ( c a b ) / ( c b a )

The symbol =) is used to append to a rule:

  start = a
  start =) b

is equivalent to:

  start = a b

All Sieve commands, including extensions, MUST be words of the following
generic grammar with the start symbol "start". They SHOULD be specified
using a specific grammar, though.

  argument    = string-list / number / tag
  arguments   = *argument [test / test-list]
  block       = "{" commands "}"
  commands    = *command
  string      = quoted-string / multi-line
  string-list = "[" string *("," string) "]" / string
  test        = identifier arguments
  test-list   = "(" test *("," test) ")"
  command     = identifier arguments ( ";" / block )
  start       = command

The basic Sieve commands are specified using the following grammar, whose
language is a subset of the generic grammar above. The start symbol is
"start".

  address-part     = ":localpart" / ":domain" / ":all"
  comparator       = ":comparator" string
  match-type       = ":is" / ":contains" / ":matches"
  string           = quoted-string / multi-line
  string-list      = "[" string *("," string) "]" / string
  address-test     = "address" { [address-part] [comparator] [match-type] }
                     string-list string-list
  test-list        = "(" test *("," test) ")"
  allof-test       = "allof" test-list
  anyof-test       = "anyof" test-list
  exists-test      = "exists" string-list
  false-test       = "false"
  true-test        = "true"
  header-test      = "header" { [comparator] [match-type] }
                     string-list string-list
  not-test         = "not" test
  relop            = ":over" / ":under"
  size-test        = "size" relop number
  block            = "{" commands "}"
  if-command       = "if" test block *( "elsif" test block ) [ "else" block ]
  stop-command     = "stop" { stop-options } ";"
  stop-options     =
  keep-command     = "keep" { keep-options } ";"
  keep-options     =
  discard-command  = "discard" { discard-options } ";"
  discard-options  =
  redirect-command = "redirect" { redirect-options } string ";"
  redirect-options =
  require-command  = "require" { require-options } string-list ";"
  require-options  =
  test             = address-test / allof-test / anyof-test / exists-test
                     / false-test / true-test / header-test / not-test
                     / size-test
  command          = if-command / stop-command / keep-command
                     / discard-command / redirect-command
  commands         = *command
  start            = *require-command commands

The extensions "envelope" and "fileinto" are specified using the following
grammar extension.

  envelope-test    = "envelope" { [comparator] [address-part] [match-type] }
                     string-list string-list
  test             =/ envelope-test

  fileinto-command = "fileinto" { fileinto-options } string ";"
  fileinto-options =
  command          =/ fileinto-command

The extension "copy" is specified as:

  fileinto-options =) ":copy"
  redirect-options =) ":copy"

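A parser does not have to expand { } into all n! orderings. For the address-test rule above it can accept tagged arguments in any order and reject a group that appears twice. This C sketch does that for tag names only (the string following ":comparator" is left out); the enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Groups of tagged arguments accepted by the address test. */
enum tag_group { GROUP_NONE = -1, GROUP_ADDRESS_PART, GROUP_COMPARATOR,
                 GROUP_MATCH_TYPE, GROUP_COUNT };

static enum tag_group classify(const char *tag)
{
    if (!strcmp(tag, ":localpart") || !strcmp(tag, ":domain") ||
        !strcmp(tag, ":all"))
        return GROUP_ADDRESS_PART;
    if (!strcmp(tag, ":comparator"))
        return GROUP_COMPARATOR;
    if (!strcmp(tag, ":is") || !strcmp(tag, ":contains") ||
        !strcmp(tag, ":matches"))
        return GROUP_MATCH_TYPE;
    return GROUP_NONE;
}

/* { [address-part] [comparator] [match-type] }: each group at most
   once, in any order; an unknown tag is an error (strict checking). */
bool address_tags_ok(const char *const *tags, size_t ntags)
{
    bool seen[GROUP_COUNT] = { false };

    for (size_t i = 0; i < ntags; i++) {
        enum tag_group g = classify(tags[i]);
        if (g == GROUP_NONE || seen[g])
            return false;
        seen[g] = true;
    }
    return true;
}
```

An extension such as "copy" would add its tag to the relevant command's group table, which is the code-level counterpart of the =) rule.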

The i;ascii-numeric Comparator

RFC 2244 describes this comparator and specifies that non-numeric strings
are considered equal with an ordinal value higher than any numeric string.
Although not stated explicitly, this includes the empty string. A range
of at least 2^31 is required. This implementation does not limit the
range, because it does not convert numbers to binary representation
before comparing them.

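Comparing without a binary conversion works by stripping leading zeros and then comparing digit-string lengths before digits. This C sketch follows the RFC 2244 rule that only the leading run of digits counts; ascii_numeric_cmp is a hypothetical name, not Exim's function.

```c
#include <stddef.h>
#include <string.h>

/* i;ascii-numeric ordering on digit strings, range unlimited.
   A string not starting with a digit (including "") is non-numeric;
   non-numeric strings are equal and sort after all numeric ones.
   Returns a negative, zero or positive value like strcmp. */
int ascii_numeric_cmp(const char *a, const char *b)
{
    int a_num = (*a >= '0' && *a <= '9');
    int b_num = (*b >= '0' && *b <= '9');

    if (!a_num || !b_num)
        return b_num - a_num;    /* both non-numeric: 0; else numeric first */

    while (*a == '0')            /* leading zeros carry no value */
        a++;
    while (*b == '0')
        b++;

    size_t alen = strspn(a, "0123456789");
    size_t blen = strspn(b, "0123456789");
    if (alen != blen)            /* more significant digits: larger number */
        return alen < blen ? -1 : 1;

    int c = memcmp(a, b, alen);  /* equal length: compare digit by digit */
    return (c > 0) - (c < 0);
}
```

A 20-digit value compares correctly against a 21-digit one, which a 32-bit conversion could not handle.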

The vacation extension

The extension "vacation" is specified using the following grammar
extension.

  vacation-command = "vacation" { vacation-options } <reason: string> ";"
  vacation-options = [":days" number]
                     [":subject" string]
                     [":from" string]
                     [":addresses" string-list]
                     [":mime"]
                     [":handle" string]
  command          =/ vacation-command


Semantics Of ":mime"

The draft does not specify how strings using MIME entities are used
to compose messages. As a result, different implementations generate
different mails. The Exim Sieve implementation splits the reason into
header and body. It adds the header to the mail header and uses the body
as mail body. Be aware that other implementations compose a multipart
structure with the reason as the only part. Both conform to the
specification (or lack thereof).

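Splitting the reason into header and body can be sketched as a search for the first empty line, using "\n" line ends as elsewhere in Exim. This is a hypothetical C illustration; in particular, returning false when no empty line exists is an assumption, since the text does not say how Exim handles that case.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical split of a ":mime" reason at the first empty line.
   *hdr_len covers the header lines including the "\n" of the last
   one; *body points just past the empty line. */
bool split_mime_reason(const char *reason, size_t *hdr_len,
                       const char **body)
{
    if (reason[0] == '\n') {              /* empty header part */
        *hdr_len = 0;
        *body = reason + 1;
        return true;
    }
    const char *sep = strstr(reason, "\n\n");
    if (sep == NULL)
        return false;                     /* no empty line found */
    *hdr_len = (size_t)(sep - reason) + 1;
    *body = sep + 2;
    return true;
}
```

The header part would then be appended to the reply's own header, and the body used as the message body.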

Semantics Of Not Using ":mime"

Sieve scripts are written in UTF-8, so the reason string is UTF-8 in
this case, too. This implementation adds MIME headers to indicate that.
This is not required by the vacation draft, which does not specify how
the UTF-8 reason is processed to compose the resulting message.


Default Subject

The draft specifies that the default message subject is "Re: "
plus the old subject, stripped of any leading "Re: " strings.
This string is to be taken literally, unlike some software which
matches a regular expression like "[rR][eE]: *". Using this
subject is dangerous, because many mailing lists verify addresses
by sending a secret key in the subject of a message, asking to
reply to the message for confirmation. Using the default vacation
subject confirms any subscription request of this kind, allowing
a third party to subscribe the user to any mailing list, either to
annoy the user or to pass off spam as legitimate mail by "proving"
that the user opted in. The draft specifies to use "Re: " in front of
the subject, but this implementation uses "Auto: ", as suggested in
RFC 3834, section 3.1.5.

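Building the default subject can be sketched in C as stripping literal leading "Re: " strings and then prefixing "Auto: ". This assumes the draft's stripping rule is kept and only the prefix changes; default_vacation_subject is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical default-subject builder: strip every leading "Re: "
   (literal, case-sensitive), then prefix "Auto: " (RFC 3834 3.1.5).
   Returns snprintf's result. */
int default_vacation_subject(const char *orig, char *out, size_t outlen)
{
    static const char re[] = "Re: ";
    const size_t re_len = sizeof re - 1;

    while (strncmp(orig, re, re_len) == 0)
        orig += re_len;
    return snprintf(out, outlen, "Auto: %s", orig);
}
```

A confirmation request with subject "confirm 12345" gets the reply subject "Auto: confirm 12345", which a list server expecting "Re: confirm 12345" should not accept. "RE: " is left alone because the match is literal.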

Rate Limiting Responses

In the absence of a ":handle", this implementation hashes the reason,
the ":subject" option, the ":mime" option and the ":from" option and
uses the hex string representation as a filename within the
"sieve_vacation_directory" to store the recipient addresses for this
vacation parameter set.

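Deriving the filename can be sketched as hashing the four parameters in order and printing the result in hex. The text does not name the hash function, so this C sketch uses 64-bit FNV-1a purely as a stand-in; vacation_db_name and its helpers are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 64-bit FNV-1a, a stand-in: the text does not say which hash is used. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;              /* FNV 64-bit prime */
    }
    return h;
}

/* Hash one optional field. A presence byte separates "absent" from "",
   and hashing the terminating NUL separates ("ab","c") from ("a","bc"). */
static uint64_t hash_field(uint64_t h, const char *s)
{
    unsigned char present = (s != NULL);
    h = fnv1a(h, &present, 1);
    if (s != NULL)
        h = fnv1a(h, s, strlen(s) + 1);
    return h;
}

/* Hypothetical: derive a 16-digit hex file name from the parameter set.
   subject and from may be NULL when the option was not given. */
void vacation_db_name(const char *reason, const char *subject, int mime,
                      const char *from, char out[17])
{
    uint64_t h = 14695981039346656037ULL;   /* FNV 64-bit offset basis */
    unsigned char m = (mime != 0);

    h = hash_field(h, reason);
    h = hash_field(h, subject);
    h = fnv1a(h, &m, 1);
    h = hash_field(h, from);
    snprintf(out, 17, "%016llx", (unsigned long long)h);
}
```

Changing any of the four parameters yields a different file, so an edited vacation message starts with a fresh recipient list.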
The draft specifies that sites may define a minimum ":days" value
greater than 1. This implementation uses 1. The maximum value MUST be
greater than 7, and SHOULD be greater than 30. This implementation uses
a maximum of 31.

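The limits above can be applied by clamping the requested value. Whether an out-of-range ":days" is clamped or rejected is an assumption of this C sketch; vacation_days and the constant names are hypothetical.

```c
/* Limits stated in the text above. */
enum { VACATION_MIN_DAYS = 1, VACATION_MAX_DAYS = 31 };

/* Clamp a requested ":days" value into [1, 31]. */
int vacation_days(long requested)
{
    if (requested < VACATION_MIN_DAYS)
        return VACATION_MIN_DAYS;
    if (requested > VACATION_MAX_DAYS)
        return VACATION_MAX_DAYS;
    return (int)requested;
}
```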
Vacation recipient address databases older than 31 days are automatically
removed. Users do not have to remove them manually when modifying their
scripts. Don't put anything but vacation databases in that directory,
or you risk having it removed, too!


Global Reply Address Blacklist

The draft requires that each implementation offer a global blacklist
of addresses that will never be replied to. Exim offers this as the
option "never_mail" in the autoreply transport.


Interaction With Other Sieve Elements

The draft describes the interaction of vacation with discard, keep,
fileinto and redirect. It MUST describe compatibility with other
actions, but doesn't. In this implementation, vacation is compatible
with any other action.