Network Working Group                                         M. Crispin
Request for Comments: 5256                             Panda Programming
Category: Standards Track                                   K. Murchison
                                              Carnegie Mellon University
                                                               June 2008


      Internet Message Access Protocol - SORT and THREAD Extensions

Status of This Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements. Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol. Distribution of this memo is unlimited.

Abstract

   This document describes the base-level server-based sorting and
   threading extensions to the IMAP protocol. These extensions provide
   substantial performance improvements for IMAP clients that offer
   sorted and threaded views.

1. Introduction

   The SORT and THREAD extensions to the [IMAP] protocol provide a means
   of server-based sorting and threading of messages, without requiring
   that the client download the necessary data to do so itself. This is
   particularly useful for online clients as described in [IMAP-MODELS].

   A server that supports the base-level SORT extension indicates this
   with a capability name which starts with "SORT". Future,
   upwards-compatible extensions to the SORT extension will all start
   with "SORT", indicating support for this base level.

   A server that supports the THREAD extension indicates this with one
   or more capability names consisting of "THREAD=" followed by a
   supported threading algorithm name as described in this document.
   This provides for future upwards-compatible extensions.

   A server that implements the SORT and/or THREAD extensions MUST
   collate strings in accordance with the requirements of I18NLEVEL=1,
   as described in [IMAP-I18N], and SHOULD implement and advertise the
   I18NLEVEL=1 extension.
   Alternatively, a server MAY implement I18NLEVEL=2 (or higher) and
   comply with the rules of that level.

Crispin & Murchison         Standards Track                     [Page 1]

RFC 5256                       IMAP Sort                       June 2008

      Discussion: The SORT and THREAD extensions predate [IMAP-I18N] by
      several years. At the time of this writing, all known server
      implementations of SORT and THREAD comply with the rules of
      I18NLEVEL=1, but do not necessarily advertise it. As discussed in
      [IMAP-I18N] section 4.5, all server implementations should
      eventually be updated to comply with the I18NLEVEL=2 extension.

   Historical note: The REFERENCES threading algorithm is based on the
   [THREADING] algorithm written and used in "Netscape Mail and News"
   versions 2.0 through 3.0.

2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [KEYWORDS].

   The word "can" (not "may") is used to refer to a possible
   circumstance or situation, as opposed to an optional facility of the
   protocol.

   "User" is used to refer to a human user, whereas "client" refers to
   the software being run by the user.

   In examples, "C:" and "S:" indicate lines sent by the client and
   server, respectively.

2.1. Base Subject

   Subject sorting and threading use the "base subject", which has
   specific subject artifacts removed. Due to the complexity of these
   artifacts, the formal syntax for the subject extraction rules is
   ambiguous. The following procedure is followed to determine the
   "base subject", using the [ABNF] formal syntax rules described in
   section 5:

      (1) Convert any RFC 2047 encoded-words in the subject to [UTF-8]
          as described in "Internationalization Considerations".
          Convert all tabs and continuations to space. Convert all
          multiple spaces to a single space.

      (2) Remove all trailing text of the subject that matches the
          subj-trailer ABNF; repeat until no more matches are possible.

      (3) Remove all prefix text of the subject that matches the
          subj-leader ABNF.

      (4) If there is prefix text of the subject that matches the
          subj-blob ABNF, and removing that prefix leaves a non-empty
          subj-base, then remove the prefix text.

      (5) Repeat (3) and (4) until no matches remain.

      Note: It is possible to defer step (2) until step (6), but this
      requires checking for subj-trailer in step (4).

      (6) If the resulting text begins with the subj-fwd-hdr ABNF and
          ends with the subj-fwd-trl ABNF, remove the subj-fwd-hdr and
          subj-fwd-trl and repeat from step (2).

      (7) The resulting text is the "base subject" used in the SORT.

   All servers and disconnected (as described in [IMAP-MODELS]) clients
   MUST use exactly this algorithm to determine the "base subject".
   Otherwise, there is potential for a user to get inconsistent results
   based on whether they are running in connected or disconnected mode.

2.2. Sent Date

   As used in this document, the term "sent date" refers to the date and
   time from the Date: header, adjusted by time zone to normalize to
   UTC. For example, "31 Dec 2000 16:01:33 -0800" is equivalent to the
   UTC date and time of "1 Jan 2001 00:01:33 +0000".

   If the time zone is invalid, the date and time SHOULD be treated as
   UTC. If the time is also invalid, the time SHOULD be treated as
   00:00:00. If there is no valid date or time, the date and time
   SHOULD be treated as 00:00:00 on the earliest possible date.
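
   The time zone normalization above can be illustrated with a minimal
   Python sketch. It assumes that the standard library function
   email.utils.parsedate_to_datetime is an acceptable Date: parser; the
   names sent_date and EARLIEST are illustrative only and are not part
   of this specification. The sketch covers the time zone rule and the
   "no valid date" rule; it does not attempt to salvage a valid date
   that carries an invalid time.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Stand-in for "00:00:00 on the earliest possible date" (an assumption:
# the earliest date Python's datetime type can represent).
EARLIEST = datetime.min.replace(tzinfo=timezone.utc)


def sent_date(date_header: str) -> datetime:
    """Return the Date: header value normalized to UTC."""
    try:
        parsed = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Nothing usable in the header: fall back to the earliest date.
        return EARLIEST
    if parsed.tzinfo is None:
        # The parser found no usable zone: treat the clock time as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


# The example from the text: 16:01:33 at -0800 is 00:01:33 UTC next day.
print(sent_date("31 Dec 2000 16:01:33 -0800").isoformat())
```

   Returning timezone-aware values keeps later comparisons between two
   sent dates well defined regardless of the zone each message used.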

   The sent date differs from the date-related criteria in the SEARCH
   command (described in [IMAP] section 6.4.4), which use just the date
   and not the time, and are not adjusted by time zone.

   If the sent date cannot be determined (a Date: header is missing or
   cannot be parsed), the INTERNALDATE for that message is used as the
   sent date.

   When comparing two sent dates that match exactly, the order in which
   the two messages appear in the mailbox (that is, by sequence number)
   is used as a tie-breaker to determine the order.

3. Additional Commands

   These commands are extensions to the [IMAP] base protocol.

   The section headings are intended to correspond with where they would
   be located in the main document if they were part of the base
   specification.

BASE.6.4.SORT. SORT Command

   Arguments:  sort program
               charset specification
               searching criteria (one or more)

   Data:       untagged responses: SORT

   Result:     OK - sort completed
               NO - sort error: can't sort that charset or
                    criteria
               BAD - command unknown or arguments invalid

      The SORT command is a variant of SEARCH with sorting semantics for
      the results. There are two arguments before the searching
      criteria argument: a parenthesized list of sort criteria, and the
      searching charset.

      The charset argument is mandatory (unlike SEARCH) and indicates
      the [CHARSET] of the strings that appear in the searching
      criteria. The US-ASCII and [UTF-8] charsets MUST be implemented.
      All other charsets are optional.

      There is also a UID SORT command that returns unique identifiers
      instead of message sequence numbers.
      Note that there are separate searching criteria for message
      sequence numbers and UIDs; thus, the arguments to UID SORT are
      interpreted the same as in SORT. This is analogous to the
      behavior of UID SEARCH, as opposed to UID COPY, UID FETCH, or UID
      STORE.

      The SORT command first searches the mailbox for messages that
      match the given searching criteria using the charset argument for
      the interpretation of strings in the searching criteria. It then
      returns the matching messages in an untagged SORT response, sorted
      according to one or more sort criteria.

      Sorting is in ascending order. Earlier dates sort before later
      dates; smaller sizes sort before larger sizes; and strings are
      sorted according to ascending values established by their
      collation algorithm (see "Internationalization Considerations").

      If two or more messages exactly match according to the sorting
      criteria, these messages are sorted according to the order in
      which they appear in the mailbox. In other words, there is an
      implicit sort criterion of "sequence number".

      When multiple sort criteria are specified, the result is sorted in
      the priority order in which the criteria appear. For example,
      (SUBJECT DATE) will sort messages in order by their base subject
      text; and for messages with the same base subject text, it will
      sort by their sent date.

      Untagged EXPUNGE responses are not permitted while the server is
      responding to a SORT command, but are permitted during a UID SORT
      command.

      The defined sort criteria are as follows. Refer to the Formal
      Syntax section for the precise syntactic definitions of the
      arguments. If the associated RFC-822 header for a particular
      criterion is absent, it is treated as the empty string. The empty
      string always collates before non-empty strings.

      ARRIVAL
         Internal date and time of the message. This differs from the
         ON criterion in SEARCH, which uses just the internal date.

      CC
         [IMAP] addr-mailbox of the first "cc" address.

      DATE
         Sent date and time, as described in section 2.2.

      FROM
         [IMAP] addr-mailbox of the first "From" address.

      REVERSE
         Followed by another sort criterion, has the effect of that
         criterion but in reverse (descending) order.
            Note: REVERSE only reverses a single criterion, and does not
            affect the implicit "sequence number" sort criterion if all
            other criteria are identical. Consequently, a sort of
            REVERSE SUBJECT is not the same as a reverse ordering of a
            SUBJECT sort. This can be avoided by use of additional
            criteria, e.g., SUBJECT DATE vs. REVERSE SUBJECT REVERSE
            DATE. In general, however, it's better (and faster, if the
            client has a "reverse current ordering" command) to reverse
            the results in the client instead of issuing a new SORT.

      SIZE
         Size of the message in octets.

      SUBJECT
         Base subject text.

      TO
         [IMAP] addr-mailbox of the first "To" address.

   Example:    C: A282 SORT (SUBJECT) UTF-8 SINCE 1-Feb-1994
               S: * SORT 2 84 882
               S: A282 OK SORT completed
               C: A283 SORT (SUBJECT REVERSE DATE) UTF-8 ALL
               S: * SORT 5 3 4 1 2
               S: A283 OK SORT completed
               C: A284 SORT (SUBJECT) US-ASCII TEXT "not in mailbox"
               S: * SORT
               S: A284 OK SORT completed

BASE.6.4.THREAD. THREAD Command

   Arguments:  threading algorithm
               charset specification
               searching criteria (one or more)

   Data:       untagged responses: THREAD

   Result:     OK - thread completed
               NO - thread error: can't thread that charset or
                    criteria
               BAD - command unknown or arguments invalid

      The THREAD command is a variant of SEARCH with threading semantics
      for the results. THREAD has two arguments before the searching
      criteria argument: a threading algorithm and the searching
      charset.

      The charset argument is mandatory (unlike SEARCH) and indicates
      the [CHARSET] of the strings that appear in the searching
      criteria. The US-ASCII and [UTF-8] charsets MUST be implemented.
      All other charsets are optional.

      There is also a UID THREAD command that returns unique identifiers
      instead of message sequence numbers. Note that there are separate
      searching criteria for message sequence numbers and UIDs; thus the
      arguments to UID THREAD are interpreted the same as in THREAD.
      This is analogous to the behavior of UID SEARCH, as opposed to UID
      COPY, UID FETCH, or UID STORE.

      The THREAD command first searches the mailbox for messages that
      match the given searching criteria using the charset argument for
      the interpretation of strings in the searching criteria. It then
      returns the matching messages in an untagged THREAD response,
      threaded according to the specified threading algorithm.

      All collation is in ascending order. Earlier dates collate before
      later dates and strings are collated according to ascending values
      established by their collation algorithm (see
      "Internationalization Considerations").

      Untagged EXPUNGE responses are not permitted while the server is
      responding to a THREAD command, but are permitted during a UID
      THREAD command.

      The defined threading algorithms are as follows:

      ORDEREDSUBJECT

         The ORDEREDSUBJECT threading algorithm is also referred to as
         "poor man's threading". The searched messages are sorted by
         base subject and then by the sent date. The messages are then
         split into separate threads, with each thread containing
         messages with the same base subject text. Finally, the threads
         are sorted by the sent date of the first message in the thread.

         The top level or "root" in ORDEREDSUBJECT threading contains
         the first message of every thread. All messages in the root
         are siblings of each other. The second message of a thread is
         the child of the first message, and subsequent messages of the
         thread are siblings of the second message and hence children of
         the message at the root. Hence, there are no grandchildren in
         ORDEREDSUBJECT threading.

         Children in ORDEREDSUBJECT threading do not have descendants.
         Client implementations SHOULD treat descendants of a child in a
         server response as being siblings of that child.

      REFERENCES

         The REFERENCES threading algorithm threads the searched
         messages by grouping them together in parent/child
         relationships based on which messages are replies to others.
         The parent/child relationships are built using two methods:
         reconstructing a message's ancestry using the references
         contained within it; and checking the original (not base)
         subject of a message to see if it is a reply to (or forward of)
         another message.

            Note: "Message ID" in the following description refers to a
            normalized form of the msg-id in [RFC2822]. The actual text
            in RFC 2822 may use quoting, resulting in multiple ways of
            expressing the same Message ID.
            Implementations of the REFERENCES threading algorithm MUST
            normalize any msg-id in order to avoid false non-matches due
            to differences in quoting.

            For example, the msg-id
               <"01KF8JCEOCBS0045PS"@xxx.yyy.com>
            and the msg-id
               <01KF8JCEOCBS0045PS@xxx.yyy.com>
            MUST be interpreted as being the same Message ID.

         The references used for reconstructing a message's ancestry are
         found using the following rules:

            If a message contains a References header line, then use the
            Message IDs in the References header line as the references.

            If a message does not contain a References header line, or
            the References header line does not contain any valid
            Message IDs, then use the first (if any) valid Message ID
            found in the In-Reply-To header line as the only reference
            (parent) for this message.

               Note: Although [RFC2822] permits multiple Message IDs in
               the In-Reply-To header, in actual practice this
               discipline has not been followed. For example,
               In-Reply-To headers have been observed with message
               addresses after the Message ID, and there are no good
               heuristics for software to determine the difference.
               This is not a problem with the References header,
               however.

            If a message does not contain an In-Reply-To header line, or
            the In-Reply-To header line does not contain a valid Message
            ID, then the message does not have any references (NIL).

         A message is considered to be a reply or forward if the base
         subject extraction rules, applied to the original subject,
         remove any of the following: a subj-refwd, a "(fwd)"
         subj-trailer, or a subj-fwd-hdr and subj-fwd-trl.

         The REFERENCES algorithm is significantly more complex than
         ORDEREDSUBJECT and consists of six main steps. These steps are
         outlined in detail below.
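
         Before the steps themselves, the Message ID normalization and
         the reference-selection rules described above can be sketched
         in Python. This is a minimal illustration, not a conforming
         parser: the regular expression is a simplified stand-in for the
         msg-id syntax of [RFC2822], and the names extract_ids and
         references_for are hypothetical.

```python
import re
from typing import List, Optional

# Simplified msg-id matcher (an assumption, not the full RFC 2822
# grammar): "<" local-part "@" domain ">", where the local part may be
# a quoted string.
_MSG_ID = re.compile(r'<\s*("(?:[^"\\]|\\.)*"|[^\s<>@"]+)@([^\s<>@]+)\s*>')


def _normalize(left: str, right: str) -> str:
    """Drop quoting from the local part so equivalent IDs compare equal."""
    if left.startswith('"'):
        left = re.sub(r"\\(.)", r"\1", left[1:-1])
    # Case is preserved: Message ID comparisons are case-sensitive.
    return f"<{left}@{right}>"


def extract_ids(header: Optional[str]) -> List[str]:
    """Return the normalized Message IDs found in a header value."""
    if not header:
        return []
    return [_normalize(m.group(1), m.group(2)) for m in _MSG_ID.finditer(header)]


def references_for(references: Optional[str], in_reply_to: Optional[str]) -> List[str]:
    """Pick the references of one message following the rules above."""
    ids = extract_ids(references)
    if ids:
        return ids
    # No usable References header: only the first valid ID in
    # In-Reply-To counts. An empty list stands for NIL.
    return extract_ids(in_reply_to)[:1]


print(extract_ids('<"01KF8JCEOCBS0045PS"@xxx.yyy.com>'))
print(references_for(None, "<c@example.org> <d@example.org>"))
```

         Taking only the first In-Reply-To match mirrors the note above:
         anything after the first Message ID may be an address rather
         than a second ID.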

         (1) For each searched message:

             (A) Using the Message IDs in the message's references, link
                 the corresponding messages (those whose Message-ID
                 header line contains the given reference Message ID)
                 together as parent/child. Make the first reference the
                 parent of the second (and the second a child of the
                 first), the second the parent of the third (and the
                 third a child of the second), etc. The following rules
                 govern the creation of these links:

                    If a message does not contain a Message-ID header
                    line, or the Message-ID header line does not
                    contain a valid Message ID, then assign a unique
                    Message ID to this message.

                    If two or more messages have the same Message ID,
                    then only use that Message ID in the first (lowest
                    sequence number) message, and assign a unique
                    Message ID to each of the subsequent messages with
                    a duplicate of that Message ID.

                    If no message can be found with a given Message ID,
                    create a dummy message with this ID. Use this
                    dummy message for all subsequent references to this
                    ID.

                    If a message already has a parent, don't change the
                    existing link. This is done because the References
                    header line may have been truncated by a Mail User
                    Agent (MUA). As a result, there is no guarantee
                    that the messages corresponding to adjacent Message
                    IDs in the References header line are parent and
                    child.

                    Do not create a parent/child link if creating that
                    link would introduce a loop. For example, before
                    making message A the parent of B, make sure that A
                    is not a descendant of B.

                    Note: Message ID comparisons are case-sensitive.

             (B) Create a parent/child link between the last reference
                 (or NIL if there are no references) and the current
                 message.
                 If the current message already has a parent, it is
                 probably the result of a truncated References header
                 line, so break the current parent/child link before
                 creating the new correct one. As in step 1.A, do not
                 create the parent/child link if creating that link
                 would introduce a loop. Note that if this message has
                 no references, it will now have no parent.

                    Note: The parent/child links created in steps 1.A
                    and 1.B MUST be kept consistent with one another at
                    ALL times.

         (2) Gather together all of the messages that have no parents
             and make them all children (siblings of one another) of a
             dummy parent (the "root"). These messages constitute the
             first (head) message of the threads created thus far.

         (3) Prune dummy messages from the thread tree. Traverse each
             thread under the root, and for each message:

                If it is a dummy message with NO children, delete it.

                If it is a dummy message with children, delete it, but
                promote its children to the current level. In other
                words, splice them in with the dummy's siblings.

                Do not promote the children if doing so would make them
                children of the root, unless there is only one child.

         (4) Sort the messages under the root (top-level siblings only)
             by sent date as described in section 2.2. In the case of a
             dummy message, sort its children by sent date and then use
             the first child for the top-level sort.

         (5) Gather together messages under the root that have the same
             base subject text.

             (A) Create a table for associating base subjects with
                 messages, called the subject table.

             (B) Populate the subject table with one message per base
                 subject.
                 For each child of the root:

                 (i) Find the subject of this thread, by using the
                     base subject from either the current message or
                     its first child if the current message is a
                     dummy. This is the thread subject.

                 (ii) If the thread subject is empty, skip this
                     message.

                 (iii) Look up the message associated with the thread
                     subject in the subject table.

                 (iv) If there is no message in the subject table with
                     the thread subject, add the current message and
                     the thread subject to the subject table.

                     Otherwise, if the message in the subject table is
                     not a dummy, AND either of the following criteria
                     is true:

                        The current message is a dummy, OR

                        The message in the subject table is a reply
                        or forward and the current message is not.

                     then replace the message in the subject table
                     with the current message.

             (C) Merge threads with the same thread subject. For each
                 child of the root:

                 (i) Find the message's thread subject as in step
                     5.B.i above.

                 (ii) If the thread subject is empty, skip this
                     message.

                 (iii) Look up the message associated with this thread
                     subject in the subject table.

                 (iv) If the message in the subject table is the
                     current message, skip this message.

                 Otherwise, merge the current message with the one in
                 the subject table using the following rules:

                    If both messages are dummies, append the current
                    message's children to the children of the message
                    in the subject table (the children of both messages
                    become siblings), and then delete the current
                    message.

                    If the message in the subject table is a dummy and
                    the current message is not, make the current
                    message a child of the message in the subject table
                    (a sibling of its children).

                    If the current message is a reply or forward and
                    the message in the subject table is not, make the
                    current message a child of the message in the
                    subject table (a sibling of its children).

                    Otherwise, create a new dummy message and make both
                    the current message and the message in the subject
                    table children of the dummy. Then replace the
                    message in the subject table with the dummy
                    message.

                       Note: Subject comparisons are case-insensitive,
                       as described under "Internationalization
                       Considerations".

         (6) Traverse the messages under the root and sort each set of
             siblings by sent date as described in section 2.2.
             Traverse the messages in such a way that the "youngest" set
             of siblings are sorted first, and the "oldest" set of
             siblings are sorted last (grandchildren are sorted before
             children, etc.). In the case of a dummy message (which can
             only occur with top-level siblings), use its first child
             for sorting.
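
         Steps (1) and (2) above can be sketched in Python as follows.
         This is an illustration under stated assumptions, not a
         complete REFERENCES implementation: pruning, subject merging,
         and sorting (steps 3 through 6) are omitted, each message is
         assumed to arrive as a tuple of sequence number, normalized
         Message ID (or None), and its list of references, and the
         names Node, link, and build_threads as well as the form of the
         generated unique IDs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(eq=False, repr=False)
class Node:
    msg_id: str
    seq: Optional[int] = None              # None marks a dummy message
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def would_loop(parent: Node, child: Node) -> bool:
    """True if `parent` is `child` itself or one of its descendants."""
    node: Optional[Node] = parent
    while node is not None:
        if node is child:
            return True
        node = node.parent
    return False


def link(parent: Node, child: Node) -> None:
    """Attach a parentless `child` under `parent` unless that loops."""
    if not would_loop(parent, child):
        child.parent = parent
        parent.children.append(child)


def build_threads(
    messages: Sequence[Tuple[int, Optional[str], List[str]]]
) -> List[Node]:
    table: Dict[str, Node] = {}

    def node_for(msg_id: str) -> Node:
        return table.setdefault(msg_id, Node(msg_id))

    for seq, msg_id, refs in sorted(messages, key=lambda m: m[0]):
        # Missing or duplicate Message ID: assign a unique one.
        if msg_id is None or (msg_id in table and table[msg_id].seq is not None):
            msg_id = f"<generated-{seq}@invalid>"
        current = node_for(msg_id)
        current.seq = seq

        # Step 1.A: chain the references, keeping existing parents.
        previous: Optional[Node] = None
        for ref in refs:
            ref_node = node_for(ref)     # creates a dummy if unknown
            if previous is not None and ref_node.parent is None:
                link(previous, ref_node)
            previous = ref_node

        # Step 1.B: the last reference becomes the parent.
        if current.parent is not None:
            current.parent.children.remove(current)
            current.parent = None
        if previous is not None:
            link(previous, current)

    # Step 2: parentless messages become the children of the root.
    return [n for n in table.values() if n.parent is None]


roots = build_threads([
    (1, "<a>", []),
    (2, "<b>", ["<a>"]),
    (3, "<c>", ["<a>", "<b>"]),
    (4, "<e>", ["<d>"]),                 # "<d>" is not in the mailbox
])
print([(r.msg_id, r.seq) for r in roots])
```

         Identity-based equality (eq=False) keeps list removal and the
         loop check tied to the node objects themselves rather than to
         their field values.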

   Example:    C: A283 THREAD ORDEREDSUBJECT UTF-8 SINCE 5-MAR-2000
               S: * THREAD (166)(167)(168)(169)(172)(170)(171)
                  (173)(174 (175)(176)(178)(181)(180))(179)(177
                  (183)(182)(188)(184)(185)(186)(187)(189))(190)
                  (191)(192)(193)(194 195)(196 (197)(198))(199)
                  (200 202)(201)(203)(204)(205)(206 207)(208)
               S: A283 OK THREAD completed
               C: A284 THREAD ORDEREDSUBJECT US-ASCII TEXT "gewp"
               S: * THREAD
               S: A284 OK THREAD completed
               C: A285 THREAD REFERENCES UTF-8 SINCE 5-MAR-2000
               S: * THREAD (166)(167)(168)(169)(172)((170)(179))
                  (171)(173)((174)(175)(176)(178)(181)(180))
                  ((177)(183)(182)(188 (184)(189))(185 186)(187))
                  (190)(191)(192)(193)((194)(195 196))(197 198)
                  (199)(200 202)(201)(203)(204)(205 206 207)(208)
               S: A285 OK THREAD completed

        Note: The line breaks in the first and third server
        responses are for editorial clarity and do not appear in
        real THREAD responses.

4. Additional Responses

   These responses are extensions to the [IMAP] base protocol.

   The section headings of these responses are intended to correspond
   with where they would be located in the main document.

BASE.7.2.SORT. SORT Response

   Data:       zero or more numbers

      The SORT response occurs as a result of a SORT or UID SORT
      command. The number(s) refer to those messages that match the
      search criteria. For SORT, these are message sequence numbers;
      for UID SORT, these are unique identifiers. Each number is
      delimited by a space.

   Example:    S: * SORT 2 3 6

BASE.7.2.THREAD. THREAD Response

   Data:       zero or more threads

      The THREAD response occurs as a result of a THREAD or UID THREAD
      command. It contains zero or more threads. A thread consists of
      a parenthesized list of thread members.
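
      A client has to turn this parenthesized form into a tree. The
      Python sketch below shows one way to do so, relying on the
      nesting rules given in the remainder of this section. It assumes
      a single, well-formed untagged response line; the names
      parse_thread and parent_map are illustrative and not part of
      this specification.

```python
import re
from typing import Dict, List, Optional


def parse_thread(line: str) -> List[list]:
    """Parse '* THREAD ...' into nested lists of message numbers."""
    tokens = re.findall(r"\(|\)|\d+", line.partition("THREAD")[2])
    pos = 0

    def parse_list() -> list:
        nonlocal pos
        pos += 1                          # consume "("
        items: list = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                items.append(parse_list())   # a nested sub-thread
            else:
                items.append(int(tokens[pos]))
                pos += 1
        pos += 1                          # consume ")"
        return items

    threads = []
    while pos < len(tokens):
        threads.append(parse_list())
    return threads


def parent_map(threads: List[list]) -> Dict[int, Optional[int]]:
    """Map each message number to its parent (None at the top level)."""
    parents: Dict[int, Optional[int]] = {}

    def walk(items: list, parent: Optional[int]) -> None:
        for item in items:
            if isinstance(item, list):
                walk(item, parent)        # sub-threads share one parent
            else:
                parents[item] = parent
                parent = item             # next number is its child

    for thread in threads:
        walk(thread, None)
    return parents


threads = parse_thread("* THREAD (2)(3 6 (4 23)(44 7 96))")
print(threads)
print(parent_map(threads))
```

      Messages whose common parent is absent from the response, as in
      a thread of the form ((3)(5)), map to None in this sketch, so a
      caller that needs to keep them grouped would work from the nested
      lists rather than from the parent map.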

      Thread members consist of zero or more message numbers, delimited
      by spaces, indicating successive parent and child. This continues
      until the thread splits into multiple sub-threads, at which point,
      the thread nests into multiple sub-threads with the first member
      of each sub-thread being siblings at this level. There is no
      limit to the nesting of threads.

      The message numbers refer to those messages that match the search
      criteria. For THREAD, these are message sequence numbers; for UID
      THREAD, these are unique identifiers.

   Example:    S: * THREAD (2)(3 6 (4 23)(44 7 96))

      The first thread consists only of message 2. The second thread
      consists of the messages 3 (parent) and 6 (child), after which it
      splits into two sub-threads; the first of which contains messages
      4 (child of 6, sibling of 44) and 23 (child of 4), and the second
      of which contains messages 44 (child of 6, sibling of 4), 7 (child
      of 44), and 96 (child of 7). Since some later messages are
      parents of earlier messages, the messages were probably moved from
      some other mailbox at different times.

         -- 2

         -- 3
            \-- 6
                |-- 4
                |   \-- 23
                |
                \-- 44
                     \-- 7
                          \-- 96

   Example:    S: * THREAD ((3)(5))

      In this example, 3 and 5 are siblings of a parent that does not
      match the search criteria (and/or does not exist in the mailbox);
      however, they are members of the same thread.

5. Formal Syntax of SORT and THREAD Commands and Responses

   The following syntax specification uses the Augmented Backus-Naur
   Form (ABNF) notation as specified in [ABNF]. It also uses [ABNF]
   rules defined in [IMAP].

   sort            = ["UID" SP] "SORT" SP sort-criteria SP search-criteria

   sort-criteria   = "(" sort-criterion *(SP sort-criterion) ")"

   sort-criterion  = ["REVERSE" SP] sort-key

   sort-key        = "ARRIVAL" / "CC" / "DATE" / "FROM" / "SIZE" /
                     "SUBJECT" / "TO"

   thread          = ["UID" SP] "THREAD" SP thread-alg SP search-criteria

   thread-alg      = "ORDEREDSUBJECT" / "REFERENCES" / thread-alg-ext

   thread-alg-ext  = atom
                     ; New algorithms MUST be registered with IANA

   search-criteria = charset 1*(SP search-key)

   charset         = atom / quoted
                     ; CHARSET values MUST be registered with IANA

   sort-data       = "SORT" *(SP nz-number)

   thread-data     = "THREAD" [SP 1*thread-list]

   thread-list     = "(" (thread-members / thread-nested) ")"

   thread-members  = nz-number *(SP nz-number) [SP thread-nested]

   thread-nested   = 2*thread-list

   The following syntax describes base subject extraction rules (2)-(6):

   subject         = *subj-leader [subj-middle] *subj-trailer

   subj-refwd      = ("re" / ("fw" ["d"])) *WSP [subj-blob] ":"

   subj-blob       = "[" *BLOBCHAR "]" *WSP

   subj-fwd        = subj-fwd-hdr subject subj-fwd-trl

   subj-fwd-hdr    = "[fwd:"

   subj-fwd-trl    = "]"

   subj-leader     = (*subj-blob subj-refwd) / WSP

   subj-middle     = *subj-blob (subj-base / subj-fwd)
                     ; last subj-blob is subj-base if subj-base would
                     ; otherwise be empty

   subj-trailer    = "(fwd)" / WSP

   subj-base       = NONWSP *(*WSP NONWSP)
                     ; can be a subj-blob

   BLOBCHAR        = %x01-5a / %x5c / %x5e-ff
                     ; any CHAR8 except '[' and ']'.
                     ; SHOULD comply with [UTF-8]

   NONWSP          = %x01-08 / %x0a-1f / %x21-ff
                     ; any CHAR8 other than WSP.
                     ; SHOULD comply with [UTF-8]

6. Security Considerations

   The SORT and THREAD extensions do not raise any security
   considerations that are not present in the base [IMAP] protocol, and
   these issues are discussed in [IMAP]. Nevertheless, it is important
   to remember that [IMAP] protocol transactions, including message
   data, are sent in the clear over the network unless protection from
   snooping is negotiated, either by the use of STARTTLS, privacy
   protection in AUTHENTICATE, or some other protection mechanism.

   Although not a security consideration, it is important to recognize
   that sorting by REFERENCES can lead to misleading threading trees.
   For example, a message with false References: header data will cause
   a thread to be incorporated into another thread.

   The process of extracting the base subject may lead to incorrect
   collation if the extracted data was significant text as opposed to a
   subject artifact.

7. Internationalization Considerations

   As stated in the introduction, the rules of I18NLEVEL=1 as described
   in [IMAP-I18N] MUST be followed; that is, the SORT and THREAD
   extensions MUST collate strings according to the i;unicode-casemap
   collation described in [UNICASEMAP]. Servers SHOULD also advertise
   the I18NLEVEL=1 extension. Alternatively, a server MAY implement
   I18NLEVEL=2 (or higher) and comply with the rules of that level.

   As discussed in [IMAP-I18N] section 4.5, all server implementations
   should eventually be updated to support the [IMAP-I18N] I18NLEVEL=2
   extension.

   Translations of the "re" or "fw"/"fwd" tokens are not specified for
   removal in the base subject extraction process. An attempt to add
   such translated tokens would result in a geometrically complex, and
   ultimately unimplementable, task.

   Instead, note that [RFC2822] section 3.6.5 recommends that "re:"
   (from the Latin "res", meaning "in the matter of") be used to
   identify a reply. Although it is evident that, from the multiple
   forms of token to identify a forwarded message, there is considerable
   variation found in the wild, the variations are (still) manageable.
   Consequently, it is suggested that "re:" and one of the variations of
   the tokens for a forward supported by the base subject extraction
   rules be adopted for Internet mail messages, since doing so makes it
   a simple display-time task to localize the token language for the
   user.

8. IANA Considerations

   [IMAP] capabilities are registered by publishing a standards track or
   IESG-approved experimental RFC. This document constitutes
   registration of the SORT and THREAD capabilities in the [IMAP]
   capabilities registry.

   This document creates a new [IMAP] threading algorithms registry,
   which registers threading algorithms by publishing a standards track
   or IESG-approved experimental RFC. This document constitutes
   registration of the ORDEREDSUBJECT and REFERENCES algorithms in that
   registry.

9. Normative References

   [ABNF]        Crocker, D., Ed., and P. Overell, "Augmented BNF for
                 Syntax Specifications: ABNF", STD 68, RFC 5234, January
                 2008.

   [CHARSET]     Freed, N. and J. Postel, "IANA Charset Registration
                 Procedures", BCP 19, RFC 2978, October 2000.

   [IMAP]        Crispin, M., "INTERNET MESSAGE ACCESS PROTOCOL -
                 VERSION 4rev1", RFC 3501, March 2003.

   [IMAP-I18N]   Newman, C., Gulbrandsen, A., and A. Melnikov, "Internet
                 Message Access Protocol Internationalization", RFC
                 5255, June 2008.

   [KEYWORDS]    Bradner, S., "Key words for use in RFCs to Indicate
                 Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2822]     Resnick, P., Ed., "Internet Message Format", RFC 2822,
                 April 2001.

   [UNICASEMAP]  Crispin, M., "i;unicode-casemap - Simple Unicode
                 Collation Algorithm", RFC 5051, October 2007.

   [UTF-8]       Yergeau, F., "UTF-8, a transformation format of ISO
                 10646", STD 63, RFC 3629, November 2003.

10. Informative References

   [IMAP-MODELS] Crispin, M., "Distributed Electronic Mail Models in
                 IMAP4", RFC 1733, December 1994.

   [THREADING]   Zawinski, J., "Message Threading",
                 http://www.jwz.org/doc/threading.html, 1997-2002.

Authors' Addresses

   Mark R. Crispin
   Panda Programming
   6158 NE Lariat Loop
   Bainbridge Island, WA 98110-2098

   Phone: +1 (206) 842-2385
   EMail: IMAP+SORT+THREAD@Lingling.Panda.COM


   Kenneth Murchison
   Carnegie Mellon University
   5000 Forbes Avenue
   Cyert Hall 285
   Pittsburgh, PA 15213

   Phone: +1 (412) 268-2638
   EMail: murch@andrew.cmu.edu

Full Copyright Statement

   Copyright (C) The IETF Trust (2008).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights. Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.