| 1 | $Cambridge: exim/doc/doc-txt/README.SIEVE,v 1.4 2005/05/03 10:02:27 ph10 Exp $ |
| 2 | |
| 3 | Notes on the Sieve implementation for Exim |
| 4 | |
| 5 | Exim Filter Versus Sieve Filter |
| 6 | |
| 7 | Exim supports two incompatible filters: the traditional Exim filter and |
| 8 | the Sieve filter. Since Sieve is an extensible language, it is important |
| 9 | to understand "Sieve" in this context as "the specific implementation |
| 10 | of Sieve for Exim". |
| 11 | |
| 12 | The Exim filter contains more features, such as variable expansion, and |
| 13 | better integration with the host environment, like external processes |
| 14 | and pipes. |
| 15 | |
| 16 | Sieve is a standard for interoperable filters, defined in RFC 3028, |
| 17 | with multiple implementations available. If interoperability is |
| 18 | important, Sieve is the only choice. |
| 19 | |
| 20 | |
| 21 | Exim Implementation |
| 22 | |
| 23 | The Exim Sieve implementation offers the core as defined by RFC 3028, the |
| 24 | "envelope" (RFC 3028), the "fileinto" (RFC 3028), the "copy" (RFC 3894) |
| 25 | and the "vacation" (draft-ietf-sieve-vacation-01.txt) extensions, and |
| 26 | the "i;ascii-numeric" comparator, but not the "reject" extension. |
| 27 | Exim does not support MDNs, so adding them just to the Sieve filter makes |
| 28 | little sense. |
| 29 | |
| 30 | The Sieve filter is integrated in Exim and works very similarly to the |
| 31 | Exim filter: Sieve scripts are recognized by the first line containing |
| 32 | "# sieve filter". When using "keep" or "fileinto" to save a mail into a |
| 33 | folder, the resulting string is available as the variable $address_file |
| 34 | in the transport that stores it. A suitable transport could be: |
| 35 | |
| 36 | localuser: |
| 37 | driver = appendfile |
| 38 | file = ${if eq{$address_file}{inbox} \ |
| 39 | {/var/mail/$local_part} \ |
| 40 | {${if eq{${substr_0_1:$address_file}}{/} \ |
| 41 | {$address_file} \ |
| 42 | {$home/$address_file} \ |
| 43 | }} \ |
| 44 | } |
| 45 | delivery_date_add |
| 46 | envelope_to_add |
| 47 | return_path_add |
| 48 | mode = 0600 |
| 49 | |
| 50 | Absolute paths are stored where specified, relative paths are stored |
| 51 | relative to $home, and "inbox" goes to the standard mailbox location. |
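The three cases handled by the "file" option of the example transport can
be restated as a small Python sketch. The function name and its parameters
are made up for illustration; Exim evaluates the string expansion shown
above, not this code.

```python
def resolve_mailbox(address_file: str, local_part: str, home: str) -> str:
    """Mirror the three cases of the example transport's "file" option."""
    if address_file == "inbox":
        # "keep" and fileinto "inbox" go to the standard mailbox location.
        return f"/var/mail/{local_part}"
    if address_file.startswith("/"):
        # Absolute folder names are used as given.
        return address_file
    # Everything else is taken relative to the user's home directory.
    return f"{home}/{address_file}"


print(resolve_mailbox("inbox", "alice", "/home/alice"))
print(resolve_mailbox("/srv/mail/lists", "alice", "/home/alice"))
print(resolve_mailbox("Mail/work", "alice", "/home/alice"))
```

The order of the checks follows the nesting of the expansion: "inbox" is
tested first, then the leading slash.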
| 52 | |
| 53 | To enable "vacation", set sieve_vacation_directory for the router to |
| 54 | the directory where vacation databases are held (don't put anything |
| 55 | else in that directory) and point reply_transport to an autoreply |
| 56 | transport. |
| 57 | |
| 58 | |
| 59 | RFC Compliance |
| 60 | |
| 61 | Exim requires the first line to be "# sieve filter". Of course the RFC |
| 62 | does not enforce that line. Don't expect examples to work without adding |
| 63 | it, though. |
| 64 | |
| 65 | RFC 3028 requires using CRLF to terminate a line. |
| 66 | The rationale was that CRLF is universally used in network protocols |
| 67 | to mark the end of the line. This implementation does not embed Sieve |
| 68 | in a network protocol, but uses Sieve scripts as part of the Exim MTA. |
| 69 | Since all parts of Exim use \n as the newline character, this |
| 70 | implementation does, too. To enforce CRLF instead, define the macro |
| 71 | RFC_EOL at compile time. |
| 72 | |
| 73 | Exim violates RFC 2822, section 3.6.8, by accepting 8-bit header names, so |
| 74 | this implementation repeats this violation to stay consistent with Exim. |
| 75 | This is in preparation for UTF-8 data. |
| 76 | |
| 77 | Sieve scripts cannot contain NUL characters in strings, but mail |
| 78 | headers could contain MIME encoded NUL characters, which could never |
| 79 | be matched by Sieve scripts using exact comparisons. For that reason, |
| 80 | this implementation extends the Sieve quoted string syntax with \0 |
| 81 | to describe a NUL character, which violates RFC 3028, where \0 is the |
| 82 | same as 0. Even without using \0, the following tests are all true in |
| 83 | this implementation. Implementations that use C-style strings will only |
| 84 | evaluate the first test as true. |
| 85 | |
| 86 | Subject: =?iso-8859-1?q?abc=00def?= |
| 87 | |
| 88 | header :contains "Subject" ["abc"] |
| 89 | header :contains "Subject" ["def"] |
| 90 | header :matches "Subject" ["abc?def"] |
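The difference between the two kinds of implementation can be reproduced
with a short Python sketch. It starts from the already decoded subject
value and uses simplified, case-sensitive stand-ins for ":contains" and
":matches"; the helper names are hypothetical and this is not how Exim
implements the tests.

```python
from fnmatch import fnmatchcase

# Decoded value of the Subject header from the example: "abc", NUL, "def".
subject = "abc\x00def"


def contains(value: str, needle: str) -> bool:
    """Substring test standing in for Sieve's ":contains"."""
    return needle in value


def matches(value: str, pattern: str) -> bool:
    """Glob test standing in for Sieve's ":matches" ("?" is one character)."""
    return fnmatchcase(value, pattern)


def c_string(value: str) -> str:
    """What a NUL-terminated C string would retain of the value."""
    return value.split("\x00", 1)[0]


def run_tests(value: str) -> list:
    return [contains(value, "abc"), contains(value, "def"),
            matches(value, "abc?def")]


print(run_tests(subject))            # [True, True, True]
print(run_tests(c_string(subject)))  # [True, False, False]
```

With the full value all three tests succeed; after truncation at the NUL
character only the first one does.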
| 91 | |
| 92 | Note that by considering Sieve to be a MUA, RFC 2047 can be interpreted |
| 93 | in a way that allows Sieve implementations to let NUL characters |
| 94 | truncate strings, although this is not recommended. Encoded NUL |
| 95 | characters are also allowed in headers, but not recommended either. |
| 96 | The above example shows why. Good code should still be able to deal |
| 97 | with it. |
| 98 | |
| 99 | RFC 3028 states that if an implementation fails to convert a character |
| 100 | set to UTF-8, two strings cannot be equal if one contains octets greater |
| 101 | than 127. Assuming that all unknown character sets are one-byte character |
| 102 | sets with the lower 128 octets being US-ASCII is not sound, so this |
| 103 | implementation violates RFC 3028 and treats such MIME words literally. |
| 104 | That way at least something could be matched. |
| 105 | |
| 106 | The folder specified by "fileinto" must not contain the character |
| 107 | sequence ".." to avoid security problems. RFC 3028 does not specify the |
| 108 | syntax of folders apart from keep being equivalent to fileinto "INBOX". |
| 109 | This implementation uses "inbox" instead. |
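Why ".." is a problem can be shown with a minimal Python sketch. The
function names and the paths are hypothetical, and the check is only a
restatement of the rule above, not Exim's code.

```python
import posixpath


def folder_is_acceptable(folder: str) -> bool:
    """Refuse any folder name containing the character sequence ".."."""
    return ".." not in folder


def resolve(home: str, folder: str) -> str:
    """Show where a relative folder would end up below (or outside) home."""
    return posixpath.normpath(posixpath.join(home, folder))


print(resolve("/home/user", "../other/mbox"))   # /home/other/mbox
print(folder_is_acceptable("../other/mbox"))    # False
print(folder_is_acceptable("Mail/lists"))       # True
```

A plain substring check also refuses harmless names such as "a..b". That
is stricter than necessary, but it matches the rule as worded.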
| 110 | |
| 111 | Sieve script errors currently cause messages to be silently filed into |
| 112 | "inbox". RFC 3028 requires that the user be notified of that condition. |
| 113 | This may be implemented in the future by adding a header line to mails |
| 114 | that are filed into "inbox" due to an error in the filter. |
| 115 | |
| 116 | |
| 117 | Strings Containing Header Names Or Envelope Elements |
| 118 | |
| 119 | RFC 3028 does not specify what happens if a string denoting a header |
| 120 | field or envelope element does not contain a valid name, e.g. it |
| 121 | contains a colon for a header or it is not "from" or "to" for envelopes. |
| 122 | This implementation generates an error instead of ignoring the header |
| 123 | field in order to ease script debugging, which fits the general spirit |
| 124 | of Sieve. |
| 125 | |
| 126 | |
| 127 | Header Test With Invalid MIME Encoding In Header |
| 128 | |
| 129 | Some MUAs process invalid base64 encoded data, generating junk. |
| 130 | Others ignore junk after seeing an equal sign in base64 encoded data. |
| 131 | RFC 2047 does not specify how to react in this case, other than stating |
| 132 | that a client must not refuse to process a message for that reason. |
| 133 | RFC 2045 specifies that invalid data should be ignored (apparently |
| 134 | looking at end of line characters). It also specifies that invalid data |
| 135 | may lead to rejecting messages containing them (and there it appears to |
| 136 | talk about true encoding violations), which clearly contradicts |
| 137 | ignoring them. |
| 138 | |
| 139 | RFC 3028 does not specify how to process incorrect MIME words. |
| 140 | This implementation treats them literally, as it does if the word is |
| 141 | correct, but its character set cannot be converted to UTF-8. |
| 142 | |
| 143 | |
| 144 | Address Test For Multiple Addresses Per Header |
| 145 | |
| 146 | A header may contain multiple addresses. RFC 3028 does not explicitly |
| 147 | specify how to deal with them, but since the "address" test checks if |
| 148 | anything matches anything else, matching one address suffices to |
| 149 | satisfy the condition. That makes it impossible to test if a header |
| 150 | contains a certain set of addresses and no more, but it is more logical |
| 151 | than letting the test fail if the header contains an additional address |
| 152 | besides the one the test checks for. |
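The "one match suffices" rule can be sketched in Python using the standard
library address parser. The function name is hypothetical, and lower-casing
both sides is a simplified stand-in for the default "i;ascii-casemap"
comparator; this is not Exim's parser.

```python
from email.utils import getaddresses


def address_is(header_value: str, wanted: str) -> bool:
    """True if any address in the header equals `wanted`, ignoring case."""
    addresses = [addr for _name, addr in getaddresses([header_value])]
    return any(addr.lower() == wanted.lower() for addr in addresses)


to_header = "Alice <alice@example.org>, bob@example.net"
print(address_is(to_header, "bob@example.net"))    # True
print(address_is(to_header, "carol@example.com"))  # False
```

The test succeeds for "bob@example.net" although the header also holds a
second address, which is the behaviour described above.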
| 153 | |
| 154 | |
| 155 | Semantics Of Keep |
| 156 | |
| 157 | The keep command is equivalent to fileinto "inbox": It saves the |
| 158 | message and resets the implicit keep flag. It does not set the |
| 159 | implicit keep flag; there is no command to set it once it has |
| 160 | been reset. |
| 161 | |
| 162 | |
| 163 | Semantics of Fileinto |
| 164 | |
| 165 | RFC 3028 does not specify whether "fileinto" tries to create a mail |
| 166 | folder if it does not exist. This implementation lets you configure |
| 167 | that aspect using the appendfile transport options "create_directory", |
| 168 | "create_file" and "file_must_exist". See the appendfile transport in |
| 169 | the Exim specification for details. |
| 170 | |
| 171 | |
| 172 | Semantics of Redirect |
| 173 | |
| 174 | Sieve scripts are supposed to be interoperable between servers, so this |
| 175 | implementation does not allow redirecting mail to unqualified addresses, |
| 176 | because the domain would depend on the system in use, and on systems with |
| 177 | virtual mail domains it is probably not what the user expects it to be. |
| 178 | |
| 179 | |
| 180 | String Arguments |
| 181 | |
| 182 | There has been confusion about whether the string arguments to "require" |
| 183 | are matched case-sensitively. This implementation matches them with the |
| 184 | match type ":is" (default, see section 2.7.1) and the comparator |
| 185 | "i;ascii-casemap" (default, see section 2.7.3). The RFC defines the command |
| 186 | defaults clearly, so any implementation that behaves differently violates |
| 187 | RFC 3028. The same applies to comparator names, also specified as strings. |
| 188 | |
| 189 | |
| 190 | Number Units |
| 191 | |
| 192 | There is a mistake in RFC 3028: the suffix G denotes gibibytes, not |
| 193 | tebibytes. The mistake is obvious, because RFC 3028 specifies G as 2^30 |
| 194 | (which is gibi, not tebi), and that's what this implementation uses as |
| 195 | scaling factor for the suffix G. |
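A minimal Python sketch of the scaling, with K, M and G as powers of two.
The function name is hypothetical, and accepting lower-case suffixes is an
assumption of this sketch.

```python
# Scaling factors for the number suffixes; G is 2^30 as stated above.
SCALE = {"K": 2**10, "M": 2**20, "G": 2**30}


def parse_number(token: str) -> int:
    """Convert a Sieve number token such as "100K" to its integer value."""
    suffix = token[-1:].upper()
    if suffix in SCALE:
        digits, factor = token[:-1], SCALE[suffix]
    else:
        digits, factor = token, 1
    # Only plain ASCII digits are accepted in front of the suffix.
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a Sieve number: {token!r}")
    return int(digits) * factor


print(parse_number("1G"))    # 1073741824
print(parse_number("100K"))  # 102400
```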
| 196 | |
| 197 | |
| 198 | Sieve Syntax and Semantics |
| 199 | |
| 200 | RFC 3028 confuses syntax and semantics sometimes. It uses a generic |
| 201 | grammar as syntax for actions and tests and performs many checks during |
| 202 | semantic analysis. Syntax is specified by grammar rules, semantics |
| 203 | in natural language, despite the latter often talking about syntax. |
| 204 | The intention was to provide a framework for the syntax that describes |
| 205 | current commands as well as future extensions, and to describe commands |
| 206 | by semantics. Since the semantic analysis is not specified by formal |
| 207 | rules, it is easy to get that phase wrong, as demonstrated by the mistake |
| 208 | in RFC 3028 to forbid "elsif" being followed by "elsif" (which is allowed |
| 209 | in Sieve, it's just not specified correctly). |
| 210 | |
| 211 | RFC 3028 does not define if semantic checks are strict (always treat |
| 212 | unknown extensions as errors) or lazy (treat unknown extensions as errors |
| 213 | only if they are executed), and since it employs a very generic grammar, |
| 214 | it is not unreasonable for an implementation using a parser for the |
| 215 | generic grammar to indeed process scripts that contain unknown commands |
| 216 | in dead code. It is just required to treat disabled but known extensions |
| 217 | the same as unknown extensions. |
| 218 | |
| 219 | The following suggestion for section 8.2 gives two grammars, one for |
| 220 | the framework, and one for specific commands, thus removing most of the |
| 221 | semantic analysis. Since the parser cannot parse unsupported extensions, |
| 222 | the result is strict error checking. As required in section 2.10.5, known |
| 223 | but not enabled extensions must behave the same as unknown extensions, |
| 224 | so those also result strictly in errors (though at the thin semantic |
| 225 | layer), even if they can be parsed fine. |
| 226 | |
| 227 | 8.2. Grammar |
| 228 | |
| 229 | The atoms of the grammar are lexical tokens. White space or comments may |
| 230 | appear anywhere between lexical tokens; they are not part of the grammar. |
| 231 | The grammar is specified in ABNF with two extensions to describe tagged |
| 232 | arguments that can be reordered and grammar extensions: { } denotes a |
| 233 | sequence of symbols that may appear in any order. Example: |
| 234 | |
| 235 | start = { a b c } |
| 236 | |
| 237 | is equivalent to: |
| 238 | |
| 239 | start = ( a b c ) / ( a c b ) / ( b a c ) / ( b c a ) / ( c a b ) / ( c b a ) |
| 240 | |
| 241 | The symbol =) is used to append to a rule: |
| 242 | |
| 243 | start = a |
| 244 | start =) b |
| 245 | |
| 246 | is equivalent to: |
| 247 | |
| 248 | start = a b |
| 249 | |
| 250 | All Sieve commands, including extensions, MUST be words of the following |
| 251 | generic grammar with the start symbol "start". They SHOULD be specified |
| 252 | using a specific grammar, though. |
| 253 | |
| 254 | argument = string-list / number / tag |
| 255 | arguments = *argument [test / test-list] |
| 256 | block = "{" commands "}" |
| 257 | commands = *command |
| 258 | string = quoted-string / multi-line |
| 259 | string-list = "[" string *("," string) "]" / string |
| 260 | test = identifier arguments |
| 261 | test-list = "(" test *("," test) ")" |
| 262 | command = identifier arguments ( ";" / block ) |
| 263 | start = command |
| 264 | |
| 265 | The basic Sieve commands are specified using the following grammar, whose |
| 266 | language is a subset of the generic grammar above. The start symbol is |
| 267 | "start". |
| 268 | |
| 269 | address-part = ":localpart" / ":domain" / ":all" |
| 270 | comparator = ":comparator" string |
| 271 | match-type = ":is" / ":contains" / ":matches" |
| 272 | string = quoted-string / multi-line |
| 273 | string-list = "[" string *("," string) "]" / string |
| 274 | address-test = "address" { [address-part] [comparator] [match-type] } |
| 275 | string-list string-list |
| 276 | test-list = "(" test *("," test) ")" |
| 277 | allof-test = "allof" test-list |
| 278 | anyof-test = "anyof" test-list |
| 279 | exists-test = "exists" string-list |
| 280 | false-test = "false" |
| 281 | true-test = "true" |
| 282 | header-test = "header" { [comparator] [match-type] } |
| 283 | string-list string-list |
| 284 | not-test = "not" test |
| 285 | relop = ":over" / ":under" |
| 286 | size-test = "size" relop number |
| 287 | block = "{" commands "}" |
| 288 | if-command = "if" test block *( "elsif" test block ) [ "else" block ] |
| 289 | stop-command = "stop" { stop-options } ";" |
| 290 | stop-options = |
| 291 | keep-command = "keep" { keep-options } ";" |
| 292 | keep-options = |
| 293 | discard-command = "discard" { discard-options } ";" |
| 294 | discard-options = |
| 295 | redirect-command = "redirect" { redirect-options } string ";" |
| 296 | redirect-options = |
| 297 | require-command = "require" { require-options } string-list ";" |
| 298 | require-options = |
| 299 | test = address-test / allof-test / anyof-test / exists-test |
| 300 | / false-test / true-test / header-test / not-test |
| 301 | / size-test |
| 302 | command = if-command / stop-command / keep-command |
| 303 | / discard-command / redirect-command |
| 304 | commands = *command |
| 305 | start = *require-command commands |
| 306 | |
| 307 | The extensions "envelope" and "fileinto" are specified using the following |
| 308 | grammar extension. |
| 309 | |
| 310 | envelope-test = "envelope" { [comparator] [address-part] [match-type] } |
| 311 | string-list string-list |
| 312 | test =/ envelope-test |
| 313 | |
| 314 | fileinto-command = "fileinto" { fileinto-options } string ";" |
| 315 | fileinto-options = |
| 316 | command =/ fileinto-command |
| 317 | |
| 318 | The extension "copy" is specified as: |
| 319 | |
| 320 | fileinto-options =) ":copy" |
| 321 | redirect-options =) ":copy" |
| 322 | |
| 323 | |
| 324 | The i;ascii-numeric Comparator |
| 325 | |
| 326 | RFC 2244 describes this comparator and specifies that non-numeric strings |
| 327 | are considered equal with an ordinal value higher than any numeric string. |
| 328 | Although not stated explicitly, this includes the empty string. A range |
| 329 | of at least 2^31 is required. This implementation does not limit the |
| 330 | range, because it does not convert numbers to binary representation |
| 331 | before comparing them. |
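Comparing numbers without converting them to binary can be sketched in
Python as below. The sketch assumes that only the leading run of ASCII
digits counts as the number and that a string without one is non-numeric;
the function names are hypothetical and this is not Exim's code.

```python
def _numeric_prefix(value: str):
    """Return the leading ASCII digits without leading zeros, or None."""
    end = 0
    while end < len(value) and value[end] in "0123456789":
        end += 1
    if end == 0:
        return None  # non-numeric; this includes the empty string
    return value[:end].lstrip("0") or "0"


def ascii_numeric_cmp(a: str, b: str) -> int:
    """Three-way comparison: negative, zero or positive."""
    na, nb = _numeric_prefix(a), _numeric_prefix(b)
    if na is None or nb is None:
        # Non-numeric strings equal each other and sort above any number.
        return (na is None) - (nb is None)
    # Without leading zeros, the longer digit string is the larger number.
    if len(na) != len(nb):
        return len(na) - len(nb)
    # Same length: digit-by-digit order is plain string order.
    return (na > nb) - (na < nb)


print(ascii_numeric_cmp("10", "9") > 0)     # True
print(ascii_numeric_cmp("007", "7") == 0)   # True
print(ascii_numeric_cmp("", "abc") == 0)    # True
```

Because only digit strings are compared, the range of values is limited
by string length alone, not by an integer type.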
| 332 | |
| 333 | |
| 334 | The Vacation Extension |
| 335 | |
| 336 | The extension "vacation" is specified using the following grammar |
| 337 | extension. |
| 338 | |
| 339 | vacation-command = "vacation" { vacation-options } <reason: string> ";" |
| 340 | vacation-options = [":days" number] |
| 341 | [":subject" string] |
| 342 | [":from" string] |
| 343 | [":addresses" string-list] |
| 344 | [":mime"] |
| 345 | [":handle" string] |
| 346 | command =/ vacation-command |
| 347 | |
| 348 | |
| 349 | Semantics Of ":mime" |
| 350 | |
| 351 | The draft does not specify how strings using MIME entities are used |
| 352 | to compose messages. As a result, different implementations generate |
| 353 | different mails. The Exim Sieve implementation splits the reason into |
| 354 | header and body. It adds the header to the mail header and uses the body |
| 355 | as mail body. Be aware that other implementations compose a multipart |
| 356 | structure with the reason as only part. Both conform to the specification |
| 357 | (or lack thereof). |
| 358 | |
| 359 | |
| 360 | Semantics Of Not Using ":mime" |
| 361 | |
| 362 | Sieve scripts are written in UTF-8, and so is the reason string in this |
| 363 | case. This implementation adds MIME headers to indicate that. This |
| 364 | is not required by the vacation draft, which does not specify how |
| 365 | the UTF-8 reason is processed to compose the resulting message. |
| 366 | |
| 367 | |
| 368 | Default Subject |
| 369 | |
| 370 | The draft specifies that the default message subject is "Re: " |
| 371 | plus the old subject, stripped of any leading "Re: " strings. |
| 372 | This string is to be taken literally, unlike some software which |
| 373 | matches a regular expression like "[rR][eE]: *". Using this |
| 374 | subject is dangerous, because many mailing lists verify addresses |
| 375 | by sending a secret key in the subject of a message and asking the |
| 376 | recipient to reply for confirmation. Using the default vacation |
| 377 | subject confirms any subscription request of this kind, allowing |
| 378 | anyone to subscribe a third party to any mailing list, either to |
| 379 | annoy the user or to declare spam as legitimate mail by claiming |
| 380 | proof of opt-in. The draft specifies to use "Re: " in front of the |
| 381 | subject, but this implementation uses "Auto: ", as suggested in |
| 382 | RFC 3834, section 3.1.5. |
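The literal reading of "Re: " can be sketched in Python as follows. The
function names are hypothetical. Whether this implementation strips
leading "Re: " strings before adding "Auto: " is not stated above, so
that step is an assumption of the sketch.

```python
def strip_re(subject: str) -> str:
    """Remove every leading literal "Re: " (exact case, one space)."""
    while subject.startswith("Re: "):
        subject = subject[len("Re: "):]
    return subject


def default_subject(original: str, prefix: str = "Auto: ") -> str:
    """Build a default vacation subject from the original subject."""
    return prefix + strip_re(original)


print(default_subject("Re: Re: meeting"))          # Auto: meeting
print(default_subject("RE: meeting"))              # Auto: RE: meeting
print(default_subject("Re: key 1234", "Re: "))     # Re: key 1234
```

The last call uses the draft's prefix and returns the incoming subject
unchanged, which is what makes the confirmation scenario described above
possible.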
| 383 | |
| 384 | |
| 385 | Rate Limiting Responses |
| 386 | |
| 387 | In absence of a handle, this implementation hashes the reason, |
| 388 | ":subject" option, ":mime" option and ":from" option and uses the hex |
| 389 | string representation as filename within the "sieve_vacation_directory" |
| 390 | to store the recipient addresses for this vacation parameter set. |
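Deriving a file name from the vacation parameters can be sketched in
Python as below. The hash function (SHA-256), the field order and the
length prefix are assumptions made for illustration, as is the function
name; the text above does not say which hash Exim uses.

```python
import hashlib


def vacation_db_name(reason: str, subject: str = "", mime: bool = False,
                     from_addr: str = "") -> str:
    """Hex file name identifying one vacation parameter set."""
    digest = hashlib.sha256()
    for field in (reason, subject, "1" if mime else "0", from_addr):
        data = field.encode("utf-8")
        # Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
        digest.update(len(data).to_bytes(4, "big"))
        digest.update(data)
    return digest.hexdigest()


first = vacation_db_name("I am away until Monday.", subject="Away")
second = vacation_db_name("I am away until Monday.", subject="Away")
other = vacation_db_name("I am away until Monday.", subject="Holiday")
print(first == second, first == other)  # True False
```

Identical parameter sets map to the same file, so they share one list of
recipients; changing any parameter starts a new one.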
| 391 | |
| 392 | The draft specifies that sites may define a minimum ":days" value other than |
| 393 | 1. This implementation uses 1. The maximum value MUST be greater than 7, and |
| 394 | SHOULD be greater than 30. This implementation uses a maximum of 31. |
| 395 | |
| 396 | Vacation recipient address databases older than 31 days are automatically |
| 397 | removed. Users do not have to remove them manually when modifying their |
| 398 | scripts. Don't put anything but vacation databases in that directory |
| 399 | or you risk it being removed, too! |
| 400 | |
| 401 | |
| 402 | Global Reply Address Blacklist |
| 403 | |
| 404 | The draft requires that each implementation offer a global blacklist |
| 405 | of addresses that will never be replied to. Exim offers this as the option |
| 406 | "never_mail" in the autoreply transport. |
| 407 | |
| 408 | |
| 409 | Interaction With Other Sieve Elements |
| 410 | |
| 411 | The draft describes the interaction with vacation, discard, keep, |
| 412 | fileinto and redirect. It MUST describe compatibility with other |
| 413 | actions, but doesn't. In this implementation, vacation is compatible |
| 414 | with any other action. |