$Cambridge: exim/doc/doc-txt/README.SIEVE,v 1.1 2004/10/07 15:04:35 ph10 Exp $

              Notes on the Sieve implementation for Exim

Exim Filter Versus Sieve Filter

Exim supports two incompatible filters: the traditional Exim filter and
the Sieve filter. Since Sieve is an extensible language, it is important
to understand "Sieve" in this context as "the specific implementation
of Sieve for Exim".

The Exim filter contains more features, such as variable expansion, and
better integration with the host environment, like external processes
and pipes.

Sieve is a standard for interoperable filters, defined in RFC 3028,
with multiple implementations around. If interoperability is important,
then there is no way around Sieve.


Exim Implementation

The Exim Sieve implementation offers the core as defined by RFC 3028,
the "envelope" (RFC 3028), "fileinto" (RFC 3028), "copy" (RFC 3894)
and "vacation" (draft-showalter-sieve-vacation-05.txt) extensions, and
the "i;ascii-numeric" comparator, but not the "reject" extension.
Exim does not support MDNs, so adding "reject" just to the Sieve filter
makes little sense.

The Sieve filter is integrated in Exim and works very similarly to the
Exim filter: Sieve scripts are recognized by the first line containing
"# sieve filter". When using "keep" or "fileinto" to save a mail into a
folder, the resulting string is available as the variable $address_file
in the transport that stores it. A suitable transport could be:

localuser:
  driver = appendfile
  file = ${if eq{$address_file}{inbox} \
              {/var/mail/$local_part} \
              {${if eq{${substr_0_1:$address_file}}{/} \
                    {$address_file} \
                    {$home/$address_file} \
              }} \
          }
  delivery_date_add
  envelope_to_add
  return_path_add
  mode = 0600

Absolute files are stored where specified, relative files are stored
relative to $home and "inbox" goes to the standard mailbox location.
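
The three-way choice made by the "file" option above can be modelled as
a small Python function. This is only a sketch of the string expansion,
not Exim code; delivery_path and its parameters are hypothetical, and
/var/mail is taken from the example configuration.

```python
def delivery_path(address_file: str, local_part: str, home: str) -> str:
    """Model of the example transport's 'file' expansion (hypothetical helper)."""
    if address_file == "inbox":          # Exim's eq{} is case-sensitive
        return f"/var/mail/{local_part}"
    if address_file[:1] == "/":          # same test as ${substr_0_1:$address_file}
        return address_file              # absolute: stored where specified
    return f"{home}/{address_file}"      # relative: stored below $home
```

Because the comparison is case-sensitive, a script saving to "INBOX"
would end up in $home/INBOX rather than in the standard mailbox.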

To enable "vacation", set sieve_vacation_directory for the router to
the directory where vacation databases are held (don't put anything
else in that directory) and point reply_transport to an autoreply
transport.


RFC Compliance

Exim requires the first line to be "# sieve filter". Of course the RFC
does not enforce that line. Don't expect examples to work without adding
it, though.
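
A minimal sketch of how a filter file might be classified by its first
line. The case-insensitive comparison and the "# Exim filter" branch are
assumptions made for illustration; filter_type is a hypothetical name,
and Exim's real check may accept more variations.

```python
from typing import Optional

def filter_type(script: str) -> Optional[str]:
    """Classify a filter file by its first line (simplified sketch)."""
    # Assumption: the marker is compared case-insensitively at line start.
    first_line = script.split("\n", 1)[0].rstrip("\r").lower()
    if first_line.startswith("# sieve filter"):
        return "sieve"
    if first_line.startswith("# exim filter"):
        return "exim"
    return None  # no marker: not recognized as a filter by this sketch
```

A script copied unchanged from RFC 3028, such as one starting with
require "fileinto";, gets None here. That is why the marker line has to
be added.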

RFC 3028 requires CRLF to terminate lines. The rationale was that CRLF
is universally used in network protocols to mark the end of a line.
This implementation does not embed Sieve in a network protocol, but uses
Sieve scripts as part of the Exim MTA. Since all parts of Exim use \n as
the newline character, this implementation does, too. You can change
this by defining the macro RFC_EOL at compile time to enforce CRLF.

Exim violates RFC 2822, section 3.6.8, by accepting 8-bit header names, so
this implementation repeats this violation to stay consistent with Exim.
This is in preparation for UTF-8 data.

Sieve scripts cannot contain NUL characters in strings, but mail
headers could contain MIME-encoded NUL characters, which could never
be matched by Sieve scripts using exact comparisons. For that reason,
this implementation extends the Sieve quoted string syntax with \0
to describe a NUL character, violating RFC 3028, where \0 is the same
as 0. Even without using \0, the following tests are all true in this
implementation. Implementations that use C-style strings will only
evaluate the first test as true.

  Subject: =?iso-8859-1?q?abc=00def?=

  header :contains "Subject" ["abc"]
  header :contains "Subject" ["def"]
  header :matches "Subject" ["abc?def"]
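
The difference between the two kinds of implementation can be modelled
in Python, whose strings may contain NUL. The helpers are hypothetical
and deliberately simplified. They start from the already decoded Subject
value, ignore comparators (the default i;ascii-casemap makes no
difference for these inputs) and do not handle backslash escapes in
:matches patterns.

```python
import re

def sieve_matches(value: str, pattern: str) -> bool:
    """Simplified ':matches': '*' is any sequence, '?' exactly one character."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, re.DOTALL) is not None

def sieve_contains(value: str, key: str) -> bool:
    return key in value

def as_c_string(value: str) -> str:
    """What a C-string implementation sees: everything after NUL is lost."""
    return value.split("\0", 1)[0]

# Subject after decoding =?iso-8859-1?q?abc=00def?=
subject = "abc\0def"

TESTS = [
    lambda v: sieve_contains(v, "abc"),      # header :contains ["abc"]
    lambda v: sieve_contains(v, "def"),      # header :contains ["def"]
    lambda v: sieve_matches(v, "abc?def"),   # header :matches ["abc?def"]
]

full_results = [t(subject) for t in TESTS]           # NUL-safe strings
c_results = [t(as_c_string(subject)) for t in TESTS]  # truncated at NUL
```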

Note that by considering Sieve to be an MUA, RFC 2047 can be interpreted
in a way that allows Sieve implementations to truncate strings at NUL
characters, although that is not recommended. It is further allowed to
use encoded NUL characters in headers, but that's not recommended either.
The above example shows why. Good code should still be able to deal
with it.

RFC 3028 states that if an implementation fails to convert a character
set to UTF-8, two strings cannot be equal if one contains octets greater
than 127. Assuming that all unknown character sets are one-byte character
sets with the lower 128 octets being US-ASCII is not sound, so this
implementation violates RFC 3028 and treats such MIME words literally.
That way at least something could be matched.

The folder specified by "fileinto" must not contain the character
sequence ".." to avoid security problems. RFC 3028 does not specify the
syntax of folders apart from keep being equivalent to fileinto "INBOX".
This implementation uses "inbox" instead.
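
The ".." rule amounts to a plain substring test. Here is a minimal
sketch, where check_fileinto_folder is a hypothetical name:

```python
def check_fileinto_folder(folder: str) -> str:
    """Reject any folder name containing the sequence '..' (sketch)."""
    if ".." in folder:
        raise ValueError(f'fileinto: folder {folder!r} must not contain ".."')
    return folder
```

Testing the raw substring instead of path components is stricter than
necessary, because it also rejects names like "a..b". In exchange, it
leaves no room for tricks with "../" in any position.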

Sieve script errors currently cause messages to be silently filed into
"inbox". RFC 3028 requires that the user is notified of that condition.
This may be implemented in the future by adding a header line to mails
that are filed into "inbox" due to an error in the filter.


Strings Containing Header Names

RFC 3028 does not specify what happens if a string denoting a header
field does not contain a valid header name, e.g. it contains a colon.
This implementation generates an error instead of ignoring the header
field in order to ease script debugging, which fits in with the common
picture of Sieve.
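
A sketch of such a check, assuming RFC 2822's field-name rule (printable
US-ASCII octets 33 to 126 except the colon), relaxed to let octets above
127 through in line with Exim's acceptance of 8-bit header names. The
function name is hypothetical, and the exact set of characters Exim
rejects is not stated in this document.

```python
def check_header_name(name: str) -> str:
    """Raise an error for strings that cannot be header field names."""
    if not name:
        raise ValueError("header name is empty")
    for ch in name:
        code = ord(ch)
        # Controls, space, DEL and ':' are never allowed; 8-bit passes.
        if ch == ":" or code < 33 or code == 127:
            raise ValueError(f"invalid header name {name!r}")
    return name
```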


Header Test With Invalid MIME Encoding In Header

Some MUAs process invalid base64-encoded data, generating junk.
Others ignore junk after seeing an equal sign in base64-encoded data.
RFC 2047 does not specify how to react in this case, other than stating
that a client must not refuse to process a message for that reason.
RFC 2045 specifies that invalid data should be ignored (apparently
looking at end-of-line characters). It also specifies that invalid data
may lead to rejecting messages containing them (and there it appears to
talk about true encoding violations), which clearly contradicts ignoring
them.

RFC 3028 does not specify how to process incorrect MIME words.
This implementation treats them literally, as it does if the word is
correct, but its character set cannot be converted to UTF-8.
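
The literal fallback covers both failure cases, an invalid encoding and
an unconvertible character set. A minimal Python sketch for a single
encoded word follows. It is not Exim's decoder. decode_word and decode_q
are hypothetical names, and Python's codec registry stands in for
whatever character set conversion the real implementation uses.

```python
import base64
import codecs
import re

# One RFC 2047 encoded word: =?charset?B-or-Q?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([BbQq])\?([^?]*)\?=")

def decode_q(text: str) -> bytes:
    """Decode RFC 2047 "Q" text; raise ValueError on malformed input."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "_":
            out.append(0x20)                  # "_" always means space
            i += 1
        elif ch == "=":
            pair = text[i + 1:i + 3]
            if not re.fullmatch(r"[0-9A-Fa-f]{2}", pair):
                raise ValueError(f"malformed escape ={pair}")
            out.append(int(pair, 16))
            i += 3
        else:
            out.append(ord(ch))               # ValueError if above 255
            i += 1
    return bytes(out)

def decode_word(word: str) -> str:
    """Decode one encoded word, or return it literally if that fails."""
    m = ENCODED_WORD.fullmatch(word)
    if m is None:
        return word
    charset, encoding, text = m.groups()
    try:
        codecs.lookup(charset)                # unknown charset: LookupError
        if encoding in "Bb":
            raw = base64.b64decode(text, validate=True)  # junk: binascii.Error
        else:
            raw = decode_q(text)
        return raw.decode(charset)            # bad octets: UnicodeDecodeError
    except (LookupError, ValueError):         # binascii.Error is a ValueError
        return word
```

Returning the word unchanged keeps the header testable. For example,
:contains "x-unknown" still matches an encoded word in an unknown
character set.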


Address Test For Multiple Addresses Per Header

A header may contain multiple addresses. RFC 3028 does not explicitly
specify how to deal with them, but since the "address" test checks if
anything matches anything else, matching one address suffices to
satisfy the condition. That makes it impossible to test if a header
contains a certain set of addresses and no more, but it is more logical
than letting the test fail if the header contains an additional address
besides the one the test checks for.
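
The "anything matches anything" rule is an existential test over both
lists. A sketch of address :all :is with the default i;ascii-casemap
comparator follows. Python's email.utils.getaddresses stands in for
Exim's own address parser, and address_is is a hypothetical name.

```python
import string
from email.utils import getaddresses

_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

def ascii_casemap(s: str) -> str:
    """i;ascii-casemap folds only the ASCII letters A-Z."""
    return s.translate(_FOLD)

def address_is(header_value: str, keys: list) -> bool:
    """True if ANY address in the header equals ANY key."""
    addresses = [addr for _, addr in getaddresses([header_value]) if addr]
    return any(ascii_casemap(a) == ascii_casemap(k)
               for a in addresses for k in keys)
```

With To: alice@example.org, Bob <bob@example.net>, a test for
bob@example.net succeeds even though Alice is also listed. This is the
"no more" case that cannot be expressed.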


Semantics Of Keep

The keep command is equivalent to fileinto "inbox": it saves the
message and resets the implicit keep flag. It does not set the
implicit keep flag; there is no command to set it once it has
been reset.
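
The flag handling can be modelled as a small state machine. This
hypothetical class covers only what is described above, plus the :copy
tag from RFC 3894, which by definition leaves the implicit keep alone.
It is not how Exim represents actions internally.

```python
from typing import List, Tuple

class ActionList:
    """Hypothetical model of the actions a script run collects."""

    def __init__(self) -> None:
        self.actions: List[Tuple[str, str]] = []
        self.implicit_keep = True    # set at start; no command sets it again

    def fileinto(self, folder: str, copy: bool = False) -> None:
        self.actions.append(("fileinto", folder))
        if not copy:                 # ":copy" keeps the implicit keep
            self.implicit_keep = False

    def keep(self) -> None:
        self.fileinto("inbox")       # keep == fileinto "inbox"

    def discard(self) -> None:
        self.implicit_keep = False

    def finish(self) -> List[Tuple[str, str]]:
        """After the script ends, perform the implicit keep if still set."""
        if self.implicit_keep:
            self.actions.append(("fileinto", "inbox"))
        return self.actions
```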


Semantics of Fileinto

RFC 3028 does not specify if "fileinto" tries to create a mail folder
in case it does not exist. This implementation allows that aspect to be
configured using the appendfile transport options "create_directory",
"create_file" and "file_must_exist". See the appendfile transport in
the Exim specification for details.


Semantics of Redirect

Sieve scripts are supposed to be interoperable between servers, so this
implementation does not allow redirecting mail to unqualified addresses,
because the domain would depend on the system in use, and on systems with
virtual mail domains it is probably not what the user expects it to be.
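
A sketch of the qualification check, assuming "qualified" simply means
a non-empty local part, an "@" and a non-empty domain. The function name
is hypothetical, and Exim's own address parsing is considerably more
thorough.

```python
def check_redirect_target(address: str) -> str:
    """Refuse unqualified addresses such as "postmaster" (no "@domain")."""
    local, at, domain = address.rpartition("@")
    if not at or not local or not domain:
        raise ValueError(f"redirect: unqualified address {address!r}")
    return address
```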


String Arguments

There has been confusion about whether the string arguments to "require"
are to be matched case-sensitively or not. This implementation matches
them with the match type ":is" (the default, see RFC 3028 section 2.7.1)
and the comparator "i;ascii-casemap" (the default, see RFC 3028 section
2.7.3). The RFC defines the command defaults clearly, so any
implementation that does otherwise violates RFC 3028. The same is valid
for comparator names, also specified as strings.
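
Applying those defaults to "require" means that the capability names
are compared after ASCII case folding. In this sketch, the capability
set is taken from the extensions listed in this document (with the
"comparator-" prefix RFC 3028 uses for requiring comparators). The
function name is hypothetical.

```python
import string

_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

# Capabilities named in this document, already in lower case.
SUPPORTED = {"envelope", "fileinto", "copy", "vacation",
             "comparator-i;ascii-numeric"}

def require(names: list) -> None:
    """Check a 'require' string list using :is with i;ascii-casemap."""
    unknown = [n for n in names if n.translate(_FOLD) not in SUPPORTED]
    if unknown:
        raise ValueError(f"require: unsupported extension(s) {unknown}")
```

Under this rule, require ["FileInto", "ENVELOPE"] is accepted, while
require "reject" is an error.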


Number Units

There is a mistake in RFC 3028: it says that the suffix G denotes tebi-,
but G actually denotes gibi-. The mistake is obvious, because RFC 3028
specifies G to denote 2^30 (which is gibi, not tebi), and that's what
this implementation uses as scaling factor for the suffix G.
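
A sketch of number parsing with the binary multipliers, K = 2^10,
M = 2^20 and G = 2^30. Accepting lower-case suffixes is this sketch's
reading of the ABNF, where string literals are case-insensitive.
parse_number is a hypothetical name.

```python
import re

# K = kibi (2^10), M = mebi (2^20), G = gibi (2^30, not tebi).
MULTIPLIERS = {"": 1, "K": 2**10, "M": 2**20, "G": 2**30}
NUMBER = re.compile(r"([0-9]+)([KMG]?)", re.IGNORECASE)

def parse_number(token: str) -> int:
    """Parse a Sieve number such as "100K" or "1G" into an integer."""
    m = NUMBER.fullmatch(token)
    if m is None:
        raise ValueError(f"not a Sieve number: {token!r}")
    digits, suffix = m.groups()
    return int(digits) * MULTIPLIERS[suffix.upper()]
```

With this parser, size :over 1G compares against 1073741824 octets.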


Sieve Syntax and Semantics

RFC 3028 sometimes confuses syntax and semantics. It uses a generic
grammar as syntax for actions and tests and performs many checks during
semantic analysis. Syntax is specified as grammar rules, semantics
with natural language, despite the latter often talking about syntax.
The intention was to provide a framework for the syntax that describes
current commands as well as future extensions, and to describe commands
by their semantics. Since the semantic analysis is not specified by
formal rules, it is easy to get that phase wrong, as demonstrated by the
mistake in RFC 3028 of forbidding "elsif" to be followed by "elsif"
(which is allowed in Sieve, it's just not specified correctly).

RFC 3028 does not define if semantic checks are strict (always treat
unknown extensions as errors) or lazy (treat unknown extensions as
errors only if they are executed), and since it employs a very generic
grammar, it is not unreasonable for an implementation using a parser
for the generic grammar to indeed process scripts that contain unknown
commands in dead code. It is only required to treat disabled but known
extensions the same as unknown extensions.

The following suggestion for section 8.2 of RFC 3028 gives two grammars,
one for the framework and one for specific commands, thus removing most
of the semantic analysis. Since the parser cannot parse unsupported
extensions, the result is strict error checking. As required in section
2.10.5, known but not enabled extensions must behave the same as unknown
extensions, so those also result strictly in errors (though at the thin
semantic layer), even if they can be parsed fine.

8.2. Grammar

The atoms of the grammar are lexical tokens. White space or comments may
appear anywhere between lexical tokens; they are not part of the grammar.
The grammar is specified in ABNF with two extensions, one to describe
tagged arguments that can be reordered and one to describe grammar
extensions. { } denotes a sequence of symbols that may appear in any
order. Example:

  start = { a b c }

is equivalent to:

  start = ( a b c ) / ( a c b ) / ( b a c ) / ( b c a ) / ( c a b ) / ( c b a )

The symbol =) is used to append to a rule:

  start = a
  start =) b

is equivalent to

  start = a b
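
The meaning of the two notations can be made concrete by rewriting them
into plain ABNF. This is purely illustrative, and both helper names are
hypothetical.

```python
from itertools import permutations
from typing import Dict, List

def expand_any_order(symbols: List[str]) -> str:
    """Rewrite '{ s1 s2 ... }' as the ABNF alternation of all orders."""
    return " / ".join("( " + " ".join(p) + " )" for p in permutations(symbols))

def append_to_rule(rules: Dict[str, List[str]], name: str, *symbols: str) -> None:
    """Apply 'name =) symbols': concatenate symbols to the existing rule."""
    rules.setdefault(name, []).extend(symbols)
```

The expansion has n! alternatives, which is why { } is worth having as
notation. A parser would not expand it but would instead accept each
tagged argument at most once, in any order.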

All Sieve commands, including extensions, MUST be words of the following
generic grammar with the start symbol "start". They SHOULD be specified
using a specific grammar, though.

  argument     = string-list / number / tag
  arguments    = *argument [test / test-list]
  block        = "{" commands "}"
  commands     = *command
  string       = quoted-string / multi-line
  string-list  = "[" string *("," string) "]" / string
  test         = identifier arguments
  test-list    = "(" test *("," test) ")"
  command      = identifier arguments ( ";" / block )
  start        = command

The basic Sieve commands are specified using the following grammar, whose
language is a subset of the generic grammar above. The start symbol is
"start".

  address-part     = ":localpart" / ":domain" / ":all"
  comparator       = ":comparator" string
  match-type       = ":is" / ":contains" / ":matches"
  string           = quoted-string / multi-line
  string-list      = "[" string *("," string) "]" / string
  address-test     = "address" { [address-part] [comparator] [match-type] }
                     string-list string-list
  test-list        = "(" test *("," test) ")"
  allof-test       = "allof" test-list
  anyof-test       = "anyof" test-list
  exists-test      = "exists" string-list
  false-test       = "false"
  true-test        = "true"
  header-test      = "header" { [comparator] [match-type] }
                     string-list string-list
  not-test         = "not" test
  relop            = ":over" / ":under"
  size-test        = "size" relop number
  block            = "{" commands "}"
  if-command       = "if" test block *( "elsif" test block ) [ "else" block ]
  stop-command     = "stop" { stop-options } ";"
  stop-options     =
  keep-command     = "keep" { keep-options } ";"
  keep-options     =
  discard-command  = "discard" { discard-options } ";"
  discard-options  =
  redirect-command = "redirect" { redirect-options } string ";"
  redirect-options =
  require-command  = "require" { require-options } string-list ";"
  require-options  =
  test             = address-test / allof-test / anyof-test / exists-test
                     / false-test / true-test / header-test / not-test
                     / size-test
  command          = if-command / stop-command / keep-command
                     / discard-command / redirect-command
  commands         = *command
  start            = *require-command commands

The extensions "envelope" and "fileinto" are specified using the following
grammar extension.

  envelope-test    = "envelope" { [comparator] [address-part] [match-type] }
                     string-list string-list
  test             =/ envelope-test

  fileinto-command = "fileinto" { fileinto-options } string ";"
  fileinto-options =
  command          =/ fileinto-command

The extension "copy" is specified as:

  fileinto-options =) ":copy"
  redirect-options =) ":copy"


The i;ascii-numeric Comparator

RFC 2244 describes this comparator and specifies that non-numeric strings
are considered equal with an ordinal value higher than any numeric string.
Although not stated explicitly, this includes the empty string. A range
of at least 2^31 is required. This implementation does not limit the
range, because it does not convert numbers to binary representation
before comparing them.
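
Comparing without a binary conversion can be done on the digit strings
themselves: strip leading zeros, then a shorter string is a smaller
number, and strings of equal length compare lexicographically. The sketch
below assumes RFC 2244's reading that a string's value comes from its
leading ASCII digits and that a string without leading digits (including
"") is "non-numeric". numeric_key is a hypothetical name.

```python
import re

LEADING_DIGITS = re.compile(r"[0-9]*")

def numeric_key(s: str) -> tuple:
    """Sort/equality key for i;ascii-numeric with unlimited range."""
    digits = LEADING_DIGITS.match(s).group()
    if not digits:
        return (1, 0, "")              # all non-numeric strings: equal, largest
    digits = digits.lstrip("0") or "0"
    return (0, len(digits), digits)    # length first, then digit by digit
```

Two strings are equal under the comparator when their keys are equal,
and the ordering of keys gives the ordering of values.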


The vacation extension

The extension "vacation" is specified using the following grammar
extension.

  vacation-command = "vacation" { vacation-options } <reason: string>
  vacation-options = [":days" number]
                     [":addresses" string-list]
                     [":subject" string]
                     [":mime"]
  command          =/ vacation-command


Semantics Of ":mime"

RFC 3028 does not specify how strings using MIME parts are used to compose
messages. The vacation draft refers to RFC 3028 and does not specify it
either. As a result, different implementations generate different mails.
The Exim Sieve implementation splits the reason into header and body.
It adds the header to the mail header and uses the body as mail body.
Be aware that other implementations compose a multipart structure with
the reason as the only part. Both conform to the specification (or lack
thereof).
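
A sketch of the header/body split, assuming that the header ends at the
first empty line (folded header lines are not handled). Both function
names are hypothetical, and base_headers stands for whatever header lines
the reply already has.

```python
from typing import List, Tuple

def split_mime_reason(reason: str) -> Tuple[List[str], str]:
    """Split a ':mime' reason into header lines and body."""
    text = reason.replace("\r\n", "\n")
    if text.startswith("\n"):               # no header fields, body only
        return [], text[1:]
    head, _, body = text.partition("\n\n")  # no empty line: header only
    return head.split("\n"), body

def compose_reply(base_headers: List[str], reason: str) -> str:
    """Merge the reason's header into the reply header; no multipart wrapper."""
    mime_headers, body = split_mime_reason(reason)
    return "\n".join(base_headers + mime_headers) + "\n\n" + body
```

An implementation of the other kind would instead wrap the whole reason
as the single part of a multipart message.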


Semantics Of Not Using ":mime"

Sieve scripts are written in UTF-8, and so is the reason string in this
case. This implementation adds MIME headers to indicate that. This
is not required by the vacation draft, which does not specify how
the UTF-8 reason is processed to compose the resulting message.


Envelope Sender

The vacation draft does not specify the envelope sender. This
implementation uses the empty envelope sender to prevent mail loops.


Default Subject

The draft specifies that the default message subject is "Re: "
plus the old subject, stripped of any leading "Re: " strings.
This string is to be taken literally, unlike some software which
matches a regular expression like "[rR][eE]: *". Using this
subject is dangerous, because many mailing lists verify addresses
by sending a secret key in the subject of a message, asking to
reply to the message for confirmation. Using the default vacation
subject confirms any subscription request of this kind, allowing
anyone to subscribe a third party to any mailing list, either to annoy
the user or to make spam look legitimate by proving that opt-in was
used. The draft specifies prefixing the subject with "Re: ", but this
implementation uses "Auto: ", as suggested in the current draft
concerning automatic mail responses.
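
A sketch of the default subject, assuming that the draft's stripping of
leading "Re: " strings is kept and only the prefix changes. The document
does not state this explicitly. The function name is hypothetical.

```python
def default_vacation_subject(original: str) -> str:
    """Strip leading literal "Re: " prefixes, then prefix "Auto: "."""
    subject = original
    while subject.startswith("Re: "):   # literal and case-sensitive
        subject = subject[len("Re: "):]
    return "Auto: " + subject
```

Because the match is literal, "RE: Lunch" keeps its prefix. This is the
difference from software that uses "[rR][eE]: *".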


Rate Limiting Responses

The draft says:

  Vacation responses are not just per address, but are per address
  per vacation command.

This is badly worded, because commands are not enumerated. What was
meant is:

  Vacation responses are not just per address, but are per address
  per reason string and per specified subject and ":mime" option.

Existing implementations work that way and it makes more sense, too.
Including the ":mime" option is mostly for correctness, as the reason
strings with and without this option are rarely equal.

This implementation hashes the reason, specified subject and ":mime"
option and uses the hex string representation as filename within the
"sieve_vacation_directory" to store the recipient addresses for this
vacation parameter set.
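
A sketch of deriving that file name. The digest (MD5) and the way the
three fields are serialized before hashing are assumptions. The point is
only that each distinct (reason, subject, :mime) set maps to its own hex
file name. The function name and the directory in the usage note are
hypothetical.

```python
import hashlib
import os
from typing import Optional

def vacation_db_path(directory: str, reason: str,
                     subject: Optional[str], mime: bool) -> str:
    """File holding already-answered recipients for one parameter set."""
    # repr() keeps fields unambiguous, e.g. no subject vs. empty subject.
    key = repr((reason, subject, mime)).encode("utf-8")
    return os.path.join(directory, hashlib.md5(key).hexdigest())
```

Changing any of the three parameters in a script yields a new file name,
so a recipient is answered again under the new wording.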

The draft specifies that sites may define a minimum ":days" value other
than 1. This implementation uses 1. A maximum value, if defined, MUST be
greater than 7, and SHOULD be greater than 30. This implementation uses
a maximum of 31.

Vacation recipient address databases older than 31 days are automatically
removed. Users do not have to remove them manually when modifying their
scripts. Don't put anything but vacation databases in that directory,
or it may be removed, too!
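
A sketch of both limits. It assumes that out-of-range ":days" values are
clamped (rather than rejected) and that a database's age is judged by its
modification time. Neither detail is stated above, and both function
names are hypothetical. The expiry sketch also shows why foreign files
in the directory are at risk: every regular file is a candidate.

```python
import os
import time
from typing import List, Optional

MIN_DAYS, MAX_DAYS = 1, 31
MAX_AGE_SECONDS = 31 * 24 * 60 * 60

def effective_days(requested: int) -> int:
    """Clamp a ':days' argument to 1..31 (clamping is an assumption)."""
    return max(MIN_DAYS, min(MAX_DAYS, requested))

def expire_vacation_databases(directory: str,
                              now: Optional[float] = None) -> List[str]:
    """Delete every regular file not modified for more than 31 days."""
    now = time.time() if now is None else now
    with os.scandir(directory) as it:
        entries = [e for e in it if e.is_file()]
    removed = []
    for entry in entries:
        if now - entry.stat().st_mtime > MAX_AGE_SECONDS:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```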


Global Reply Address Blacklist

The draft requires that each implementation offers a global blacklist
of addresses that will never be replied to. Exim offers this as the
"never_mail" option of the autoreply transport.


Interaction With Other Sieve Elements

The draft describes the interaction with vacation, discard, keep,
fileinto and redirect. It MUST describe compatibility with other
actions, but doesn't. In this implementation, vacation is compatible
with any other action.