$Cambridge: exim/doc/doc-txt/README.SIEVE,v 1.6 2005/07/01 10:21:45 ph10 Exp $

Notes on the Sieve implementation for Exim

Exim Filter Versus Sieve Filter

Exim supports two incompatible filters: the traditional Exim filter and
the Sieve filter. Since Sieve is an extensible language, it is important
to understand "Sieve" in this context as "the specific implementation
of Sieve for Exim".

The Exim filter contains more features, such as variable expansion, and
better integration with the host environment, like external processes
and pipes.

Sieve is a standard for interoperable filters, defined in RFC 3028,
with multiple implementations available. If interoperability is important,
then there is no way around it.


Exim Implementation

The Exim Sieve implementation offers the core as defined by RFC 3028bis,
the "envelope" (RFC 3028), "fileinto" (RFC 3028), "copy" (RFC 3894) and
"vacation" (draft-ietf-sieve-vacation-02.txt) extensions, and the
"i;ascii-numeric" comparator, but not the "reject" extension.
Exim does not support MDNs (Message Disposition Notifications), so adding
"reject" just to the Sieve filter would make little sense.

The Sieve filter is integrated in Exim and works very similarly to the
Exim filter: Sieve scripts are recognized by the first line containing
"# sieve filter". When "keep" or "fileinto" is used to save a mail into a
folder, the resulting folder name is available as the variable
$address_file in the transport that stores it. A suitable transport
could be:

localuser:
  driver = appendfile
  file = ${if eq{$address_file}{inbox} \
              {/var/mail/$local_part} \
              {${if eq{${substr_0_1:$address_file}}{/} \
                    {$address_file} \
                    {$home/$address_file} \
              }} \
          }
  delivery_date_add
  envelope_to_add
  return_path_add
  mode = 0600

Absolute paths are stored where specified, relative paths are stored
relative to $home, and "inbox" goes to the standard mailbox location.
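
As a sketch of the decision the transport above makes, the same mapping
can be written in C. The function name resolve_folder and its signature
are hypothetical and not part of Exim; in practice the work is done by
the string expansion in the transport's "file" option.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper mirroring the "file" expansion of the localuser
   transport: "inbox" maps to /var/mail/<local_part>, absolute paths are
   used as given, anything else is taken relative to $home.
   Returns 0 on success, -1 if the result does not fit into buf. */
static int resolve_folder(const char *address_file, const char *local_part,
                          const char *home, char *buf, size_t size)
{
    int n;

    if (strcmp(address_file, "inbox") == 0)
        n = snprintf(buf, size, "/var/mail/%s", local_part);
    else if (address_file[0] == '/')        /* ${substr_0_1:...} is "/" */
        n = snprintf(buf, size, "%s", address_file);
    else
        n = snprintf(buf, size, "%s/%s", home, address_file);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For example, fileinto "lists/exim" for local part "jdoe" with $home set
to /home/jdoe ends up as /home/jdoe/lists/exim.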

To enable "vacation", set sieve_vacation_directory for the router to
the directory where vacation databases are held (don't put anything
else in that directory) and point reply_transport to an autoreply
transport.


RFC Compliance

Exim requires the first line to be "# sieve filter". Of course, the RFC
does not require that line. Don't expect examples to work without adding
it, though.

RFC 3028 requires CRLF to be used to terminate lines. The rationale was
that CRLF is universally used in network protocols to mark the end of a
line. This implementation does not embed Sieve in a network protocol,
but uses Sieve scripts as part of the Exim MTA. Since all parts of Exim
use \n as the newline character, this implementation does, too. You can
change this by defining the macro RFC_EOL at compile time to enforce the
use of CRLF.
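
The choice between the two line terminators can be pictured as a
compile-time switch like the one below. This is a hypothetical sketch,
not Exim's parser code; only the macro name RFC_EOL comes from the text
above, and the function name eol_length is made up.

```c
#include <stddef.h>

/* Hypothetical sketch: return the length of the line terminator at p,
   or 0 if p does not point at one.  By default a bare "\n" ends a line,
   as everywhere else in Exim; compiling with -DRFC_EOL demands the CRLF
   sequence required by RFC 3028 instead.  p must be NUL-terminated. */
static size_t eol_length(const char *p)
{
#ifdef RFC_EOL
    return (p[0] == '\r' && p[1] == '\n') ? 2 : 0;
#else
    return p[0] == '\n' ? 1 : 0;
#endif
}
```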

Exim violates RFC 2822, section 3.6.8, by accepting 8-bit header names, so
this implementation repeats this violation to stay consistent with Exim.
This is in preparation for UTF-8 data.

Sieve scripts cannot contain NUL characters in strings, but mail
headers could contain MIME-encoded NUL characters, which could never
be matched by Sieve scripts using exact comparisons. For that reason,
this implementation extends the Sieve quoted-string syntax with \0
to describe a NUL character, violating RFC 3028, in which \0 is the same
as 0. Even without using \0, the following tests are all true in
this implementation. Implementations that use C-style strings will only
evaluate the first test as true.

Subject: =?iso-8859-1?q?abc=00def?=

header :contains "Subject" ["abc"]
header :contains "Subject" ["def"]
header :matches "Subject" ["abc?def"]
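
To see why the three tests above behave differently with C-style
strings, here is a minimal sketch that keeps the decoded Subject as a
byte buffer with an explicit length. The helpers bytes_contains and
glob_match are hypothetical. They compare octets exactly (like the
"i;octet" comparator rather than the default "i;ascii-casemap"), treat
"?" as matching exactly one octet, and ignore the backslash escapes that
:matches also supports.

```c
#include <stddef.h>
#include <string.h>

/* Exact (octet-wise) substring search on a counted buffer, so that NUL
   octets in the haystack are just ordinary data. */
static int bytes_contains(const char *hay, size_t hlen,
                          const char *needle, size_t nlen)
{
    if (nlen == 0)
        return 1;
    for (size_t i = 0; i + nlen <= hlen; i++)
        if (memcmp(hay + i, needle, nlen) == 0)
            return 1;
    return 0;
}

/* Wildcard match on counted buffers: '?' matches one octet, '*' any
   run of octets.  Backslash escapes are not supported in this sketch. */
static int glob_match(const char *pat, size_t plen,
                      const char *s, size_t slen)
{
    size_t p = 0, i = 0;
    size_t star = (size_t)-1, mark = 0;  /* last '*' and its position in s */

    while (i < slen) {
        if (p < plen && pat[p] == '*') {
            star = p++;
            mark = i;
        } else if (p < plen && (pat[p] == '?' || pat[p] == s[i])) {
            p++;
            i++;
        } else if (star != (size_t)-1) {
            p = star + 1;               /* let the last '*' absorb one more octet */
            i = ++mark;
        } else {
            return 0;
        }
    }
    while (p < plen && pat[p] == '*')
        p++;
    return p == plen;
}
```

By contrast, strstr() on the same buffer finds "abc" but not "def",
because it stops at the first NUL; that is the C-string behaviour the
text warns about.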

Note that by considering Sieve to be a MUA, RFC 2047 can be interpreted
to allow Sieve implementations to truncate strings at NUL characters,
although this is not recommended. Encoded NUL characters in headers are
allowed as well, but are not recommended either. The above example shows
why. Good code should still be able to deal with them.

RFC 3028 states that if an implementation fails to convert a character
set to UTF-8, two strings cannot be equal if one contains octets greater
than 127. Assuming that all unknown character sets are one-byte character
sets with the lower 128 octets being US-ASCII is not sound, so this
implementation violates RFC 3028 and treats such MIME words literally.
That way, at least something can be matched.

The folder specified by "fileinto" must not contain the character
sequence ".." to avoid security problems. RFC 3028 does not specify the
syntax of folders apart from keep being equivalent to fileinto "INBOX".
This implementation uses "inbox" instead.
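
A minimal sketch of the ".." restriction, assuming the check is applied
to the folder name exactly as the script gives it. The function name
folder_name_ok is hypothetical.

```c
#include <string.h>

/* Hypothetical check applied to a "fileinto" folder name before it is
   handed to the transport: reject any name containing "..", so that a
   relative folder name cannot climb out of the user's directory. */
static int folder_name_ok(const char *folder)
{
    return strstr(folder, "..") == NULL;
}
```

This rejects every occurrence of the sequence, so even a harmless name
such as "a..b" is refused, which matches the wording above.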

Sieve script errors currently cause messages to be silently filed into
"inbox". RFC 3028 requires that the user is notified of that condition.
This may be implemented in the future by adding a header line to mails
that are filed into "inbox" due to an error in the filter.


Strings Containing Header Names Or Envelope Elements

RFC 3028 does not specify what happens if a string denoting a header
field or envelope element does not contain a valid name, e.g. it
contains a colon for a header or it is not "from" or "to" for envelopes.
This implementation generates an error instead of ignoring the header
field in order to ease script debugging, which is in keeping with the
general spirit of Sieve.
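
The header name check could look like the following sketch. It is an
assumption based on the RFC 2822 field-name syntax (printable US-ASCII
except the colon), relaxed to accept 8-bit octets as described in the
RFC Compliance section. The function name is hypothetical, and the
envelope check ("from" or "to") is not shown.

```c
#include <stddef.h>

/* Hypothetical validity check for a header field name used in a Sieve
   test.  RFC 2822 allows printable US-ASCII except the colon; following
   Exim's 8-bit tolerance, octets >= 128 are accepted too.
   Returns 1 if valid, 0 if the script should fail with an error. */
static int header_name_valid(const unsigned char *name, size_t len)
{
    if (len == 0)
        return 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = name[i];
        /* reject the colon, controls, space and DEL */
        if (c == ':' || (c < 128 && (c < 33 || c > 126)))
            return 0;
    }
    return 1;
}
```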


Header Test With Invalid MIME Encoding In Header

Some MUAs process invalid base64-encoded data, generating junk.
Others ignore junk after seeing an equals sign in base64-encoded data.
RFC 2047 does not specify how to react in this case, other than stating
that a client must not refuse to process a message for that reason.
RFC 2045 specifies that invalid data should be ignored (apparently
referring to end-of-line characters). It also specifies that invalid
data may lead to rejecting messages containing them (and there it
appears to talk about true encoding violations), which clearly
contradicts ignoring them.

RFC 3028 does not specify how to process incorrect MIME words.
This implementation treats them literally, as it does if the word is
correct but its character set cannot be converted to UTF-8.
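
The "treat it literally" rule can be sketched for the "Q" encoding of
RFC 2047: decode the encoded text if it is well-formed, and otherwise
keep the original octets unchanged. The helper names are hypothetical;
the sketch ignores "B" (base64) words and character set conversion
entirely, and it accepts lower-case as well as upper-case hex digits.

```c
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode the encoded text of an RFC 2047 "Q" word into out, which must
   have room for len octets.  Returns the decoded length, or -1 if an
   "=XX" escape is truncated or not hexadecimal. */
static long q_decode(const char *in, size_t len, char *out)
{
    size_t o = 0;

    for (size_t i = 0; i < len; i++) {
        if (in[i] == '_') {
            out[o++] = ' ';             /* "_" stands for a space */
        } else if (in[i] == '=') {
            if (i + 2 >= len)
                return -1;              /* escape cut short */
            int hi = hexval(in[i + 1]), lo = hexval(in[i + 2]);
            if (hi < 0 || lo < 0)
                return -1;              /* not a hex escape */
            out[o++] = (char)(hi * 16 + lo);
            i += 2;
        } else {
            out[o++] = in[i];
        }
    }
    return (long)o;
}

/* Decode a well-formed word; keep a malformed one literally. */
static size_t decode_or_literal(const char *word, size_t len, char *out)
{
    long n = q_decode(word, len, out);

    if (n < 0) {
        memcpy(out, word, len);
        return len;
    }
    return (size_t)n;
}
```

With this, "abc=00def" decodes to the seven octets a, b, c, NUL, d, e, f,
while a malformed word such as "bad=ZZ" is kept as written and can still
be matched literally.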


Semantics Of Keep

The keep command is equivalent to fileinto "inbox": it saves the
message and resets the implicit keep flag. It does not set the
implicit keep flag; there is no command to set it once it has
been reset.
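
The flag handling can be modelled with a small state record. This is a
hypothetical sketch (the struct and function names are made up) that
only covers "keep" and the implicit keep at the end of the script.

```c
#include <stdbool.h>

/* Hypothetical per-message state for the sketch. */
struct sieve_state {
    bool implicit_keep;     /* set at start; only keep clears it here */
    bool filed_to_inbox;    /* message will be saved to "inbox" */
};

static void sieve_init(struct sieve_state *st)
{
    st->implicit_keep = true;
    st->filed_to_inbox = false;
}

/* "keep" is fileinto "inbox": save the message and reset the implicit
   keep flag.  No command sets the flag again afterwards. */
static void sieve_keep(struct sieve_state *st)
{
    st->filed_to_inbox = true;
    st->implicit_keep = false;
}

/* At the end of the script the implicit keep applies only if no action
   has reset the flag. */
static void sieve_finish(struct sieve_state *st)
{
    if (st->implicit_keep)
        sieve_keep(st);
}
```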


Semantics Of Fileinto

RFC 3028 does not specify whether "fileinto" tries to create a mail
folder in case it does not exist. This implementation allows that aspect
to be configured using the appendfile transport options
"create_directory", "create_file" and "file_must_exist". See the
appendfile transport in the Exim specification for details.


String Arguments

There has been confusion about whether the string arguments to "require"
are to be matched case-sensitively or not. The comparator default is
case-insensitive comparison, but "require" does not allow a comparator to
be specified, so this default does not apply. Lacking a clear
specification, matching the strings exactly makes the most sense. The
same applies to comparator names, which are also specified as strings.


Sieve Syntax and Semantics

RFC 3028 sometimes confuses syntax and semantics. It uses a generic
grammar as the syntax for actions and tests and performs many checks
during semantic analysis. Syntax is specified as grammar rules and
semantics in natural language, although the natural-language text often
talks about syntax. The intention was to provide a framework for the
syntax that describes current commands as well as future extensions,
and to describe commands by their semantics.

RFC 3028 does not define whether semantic checks are strict (always treat
unknown extensions as errors) or lazy (treat unknown extensions as errors
only if they are executed), and since it employs a very generic grammar,
it is not unreasonable for an implementation using a parser for the
generic grammar to indeed process scripts that contain unknown commands
in dead code. It is only required to treat known but disabled extensions
the same as unknown extensions.

The following suggestion for section 8.2 gives two grammars, one for
the framework and one for specific commands, thus removing most of the
semantic analysis. Since the parser cannot parse unsupported extensions,
the result is strict error checking. As required in section 2.10.5, known
but not enabled extensions must behave the same as unknown extensions,
so those also strictly result in errors (though at the thin semantic
layer), even if they can be parsed fine.

8.2. Grammar

The atoms of the grammar are lexical tokens. White space or comments may
appear anywhere between lexical tokens; they are not part of the grammar.
The grammar is specified in ABNF with two extensions, one to describe
tagged arguments that can be reordered and one to describe grammar
extensions: { } denotes a sequence of symbols that may appear in any
order. Example:

  start = { a b c }

is equivalent to:

  start = ( a b c ) / ( a c b ) / ( b a c ) / ( b c a ) / ( c a b ) / ( c b a )
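
To make the { } notation concrete, here is a sketch of a parser for the
tagged part of "header" from the grammar below, where the comparator and
the match type may each appear at most once, in either order. It is
hypothetical: tokens are assumed to be already lexed into plain strings,
and a token is taken to be a tag if it starts with ":", which a real
lexer would decide from the quoting instead. Unknown tags are reported
as errors, in line with the strict checking argued for above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical parser for the tagged part of
     header-test = "header" { [comparator] [match-type] }
                   string-list string-list
   Each optional element may appear at most once, in any order.
   Returns the number of tokens consumed, or -1 on a duplicate,
   unknown or incomplete tag. */
static int parse_header_tags(const char *const *tok, int ntok,
                             const char **comparator,
                             const char **match_type)
{
    int i = 0;

    *comparator = NULL;
    *match_type = NULL;
    while (i < ntok && tok[i][0] == ':') {
        if (strcmp(tok[i], ":comparator") == 0) {
            if (*comparator != NULL || i + 1 >= ntok)
                return -1;              /* duplicate or missing name */
            *comparator = tok[i + 1];
            i += 2;
        } else if (strcmp(tok[i], ":is") == 0 ||
                   strcmp(tok[i], ":contains") == 0 ||
                   strcmp(tok[i], ":matches") == 0) {
            if (*match_type != NULL)
                return -1;              /* second match type */
            *match_type = tok[i];
            i += 1;
        } else {
            return -1;                  /* unknown tag: strict error */
        }
    }
    return i;
}
```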

The symbol =) is used to append to a rule:

  start = a
  start =) b

is equivalent to:

  start = a b

All Sieve commands, including extensions, MUST be words of the following
generic grammar with the start symbol "start". They SHOULD be specified
using a specific grammar, though.

  argument = string-list / number / tag
  arguments = *argument [test / test-list]
  block = "{" commands "}"
  commands = *command
  string = quoted-string / multi-line
  string-list = "[" string *("," string) "]" / string
  test = identifier arguments
  test-list = "(" test *("," test) ")"
  command = identifier arguments ( ";" / block )
  start = command

The basic Sieve commands are specified using the following grammar,
whose language is a subset of the generic grammar above. The start
symbol is "start".

  address-part = ":localpart" / ":domain" / ":all"
  comparator = ":comparator" string
  match-type = ":is" / ":contains" / ":matches"
  string = quoted-string / multi-line
  string-list = "[" string *("," string) "]" / string
  address-test = "address" { [address-part] [comparator] [match-type] }
                 string-list string-list
  test-list = "(" test *("," test) ")"
  allof-test = "allof" test-list
  anyof-test = "anyof" test-list
  exists-test = "exists" string-list
  false-test = "false"
  true-test = "true"
  header-test = "header" { [comparator] [match-type] }
                string-list string-list
  not-test = "not" test
  relop = ":over" / ":under"
  size-test = "size" relop number
  block = "{" commands "}"
  if-command = "if" test block *( "elsif" test block ) [ "else" block ]
  stop-command = "stop" { stop-options } ";"
  stop-options =
  keep-command = "keep" { keep-options } ";"
  keep-options =
  discard-command = "discard" { discard-options } ";"
  discard-options =
  redirect-command = "redirect" { redirect-options } string ";"
  redirect-options =
  require-command = "require" { require-options } string-list ";"
  require-options =
  test = address-test / allof-test / anyof-test / exists-test
         / false-test / true-test / header-test / not-test
         / size-test
  command = if-command / stop-command / keep-command
            / discard-command / redirect-command
  commands = *command
  start = *require-command commands

The extensions "envelope" and "fileinto" are specified using the following
grammar extension.

  envelope-test = "envelope" { [comparator] [address-part] [match-type] }
                  string-list string-list
  test =/ envelope-test

  fileinto-command = "fileinto" { fileinto-options } string ";"
  fileinto-options =
  command =/ fileinto-command

The extension "copy" is specified as:

  fileinto-options =) ":copy"
  redirect-options =) ":copy"


The i;ascii-numeric Comparator

RFC 2244 describes this comparator and specifies that non-numeric strings
are considered equal, with an ordinal value higher than that of any
numeric string. Although not stated explicitly, this includes the empty
string. A range of at least 2^31 is required. This implementation does
not limit the range, because it does not convert numbers to a binary
representation before comparing them.
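
A sketch of such a comparison in C, assuming NUL-terminated input: the
value of a string is the run of ASCII digits at its start, leading zeros
are skipped, the lengths of the digit runs decide first, and runs of
equal length are compared digit by digit, so no conversion to a binary
number (and no overflow) is involved. The function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical i;ascii-numeric ordering.  A string that does not start
   with a digit (including "") counts as positive infinity; all such
   strings are equal.  Returns <0, 0 or >0 like strcmp. */
static int ascii_numeric_cmp(const char *a, const char *b)
{
    int a_num = *a >= '0' && *a <= '9';
    int b_num = *b >= '0' && *b <= '9';
    size_t la = 0, lb = 0;

    if (!a_num || !b_num)
        return b_num - a_num;   /* numeric sorts before non-numeric */

    while (*a == '0') a++;      /* leading zeros do not change the value */
    while (*b == '0') b++;
    while (a[la] >= '0' && a[la] <= '9') la++;
    while (b[lb] >= '0' && b[lb] <= '9') lb++;

    if (la != lb)               /* more significant digits: larger value */
        return la < lb ? -1 : 1;
    for (size_t i = 0; i < la; i++)
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    return 0;
}
```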


The Vacation Extension

The extension "vacation" is specified using the following grammar
extension.

  vacation-command = "vacation" { vacation-options } <reason: string> ";"
  vacation-options = [":days" number]
                     [":subject" string]
                     [":from" string]
                     [":addresses" string-list]
                     [":mime"]
                     [":handle" string]
  command =/ vacation-command


Semantics Of ":mime"

The draft does not specify how strings using MIME entities are used
to compose messages. As a result, different implementations generate
different mails. The Exim Sieve implementation splits the reason into
header and body. It adds the header to the mail header and uses the body
as the mail body. Be aware that other implementations compose a multipart
structure with the reason as the only part. Both conform to the
specification (or lack thereof).
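
The header/body split can be sketched as follows, assuming the reason
uses \n line ends (as Exim does internally) and that the first empty line
separates the MIME header from the body, as in any MIME entity. The
function name is hypothetical, and treating a reason without an empty
line as header only is a guess, not documented behaviour.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical split of a ":mime" reason.  Stores the length of the
   header part (including its final newline) in *hdr_len and returns the
   offset at which the body starts. */
static size_t split_mime_reason(const char *reason, size_t *hdr_len)
{
    const char *sep;

    if (reason[0] == '\n') {            /* empty line first: no header */
        *hdr_len = 0;
        return 1;
    }
    sep = strstr(reason, "\n\n");
    if (sep == NULL) {                  /* no empty line: header only */
        *hdr_len = strlen(reason);
        return *hdr_len;
    }
    *hdr_len = (size_t)(sep - reason) + 1;  /* keep the header's newline */
    return *hdr_len + 1;                    /* skip the empty line */
}
```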


Semantics Of Not Using ":mime"

Sieve scripts are written in UTF-8, and so is the reason string in this
case. This implementation adds MIME headers to indicate that. This
is not required by the vacation draft, which does not specify how
the UTF-8 reason is processed to compose the resulting message.


Default Subject

The draft specifies that the default message subject is "Auto: " plus
the old subject. Using this subject is dangerous, because many mailing
lists verify addresses by sending a secret key in the subject of a
message, asking the recipient to reply to the message for confirmation.
Using the default vacation subject confirms any subscription request of
this kind, allowing anyone to subscribe the user to any mailing list,
either to annoy the user or to make spam look legitimate by "proving"
that the user opted in.


Rate Limiting Responses

In the absence of a handle, this implementation hashes the reason, the
":subject" option, the ":mime" option and the ":from" option, and uses
the hex string representation of the hash as the filename within the
"sieve_vacation_directory" to store the recipient addresses for this
vacation parameter set.
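
The file name derivation might look like the sketch below. The text above
does not name the hash function; this sketch uses 64-bit FNV-1a purely as
a self-contained stand-in, and the field separators, the treatment of
absent options as empty strings and the function names are all
assumptions.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 64-bit FNV-1a, used here only as a stand-in hash function. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;          /* FNV 64-bit prime */
    }
    return h;
}

/* Hypothetical: derive the database file name for a vacation action
   without ":handle" from the reason, ":subject", ":mime" and ":from".
   Absent string options are hashed as empty strings; each field is
   hashed with its terminating NUL so ("ab","c") and ("a","bc") differ.
   out receives 16 hex digits plus NUL. */
static void vacation_db_name(const char *reason, const char *subject,
                             int mime, const char *from, char out[17])
{
    const char *fields[3] = { reason, subject ? subject : "",
                              from ? from : "" };
    uint64_t h = 14695981039346656037ULL;  /* FNV 64-bit offset basis */
    unsigned char m = mime ? 1 : 0;

    for (int i = 0; i < 3; i++)
        h = fnv1a(h, fields[i], strlen(fields[i]) + 1);
    h = fnv1a(h, &m, 1);
    snprintf(out, 17, "%016llx", (unsigned long long)h);
}
```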

The draft specifies that sites may define a minimum ":days" value
greater than 1. This implementation uses 1. The maximum value MUST be
greater than 7, and SHOULD be greater than 30. This implementation uses
a maximum of 31.

Vacation recipient address databases older than 31 days are automatically
removed. Users do not have to remove them manually when modifying their
scripts. Don't put anything but vacation databases in that directory,
or it risks being removed, too!
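
The limits and the expiry rule from the two paragraphs above, as a
hypothetical sketch. Clamping out-of-range ":days" values (rather than
rejecting them) and basing expiry on the database file's modification
time are assumptions; the constants 1 and 31 come from the text.

```c
#include <time.h>

#define VACATION_MIN_DAYS 1     /* minimum used by this implementation */
#define VACATION_MAX_DAYS 31    /* maximum used by this implementation */

/* Clamp a ":days" argument into the supported range (assumed policy). */
static int vacation_days(long requested)
{
    if (requested < VACATION_MIN_DAYS) return VACATION_MIN_DAYS;
    if (requested > VACATION_MAX_DAYS) return VACATION_MAX_DAYS;
    return (int)requested;
}

/* Hypothetical expiry test for a database last modified at mtime:
   anything older than 31 days may be removed. */
static int vacation_db_expired(time_t mtime, time_t now)
{
    return difftime(now, mtime) > 31.0 * 24 * 60 * 60;
}
```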


Global Reply Address Blacklist

The draft requires that each implementation offer a global blacklist
of addresses that will never be replied to. Exim offers this as the
option "never_mail" in the autoreply transport.