Network Working Group                                           N. Freed
Request for Comments: 2231                                      Innosoft
Updates: 2045, 2047, 2183                                       K. Moore
Obsoletes: 2184                                  University of Tennessee
Category: Standards Track                                  November 1997


         MIME Parameter Value and Encoded Word Extensions:
            Character Sets, Languages, and Continuations


Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (1997).  All Rights Reserved.

1.  Abstract

   This memo defines extensions to the RFC 2045 media type and RFC 2183
   disposition parameter value mechanisms to provide

    (1)   a means to specify parameter values in character sets
          other than US-ASCII,

    (2)   a means to specify the language to be used should the
          value be displayed, and

    (3)   a continuation mechanism for long parameter values to
          avoid problems with header line wrapping.

   This memo also defines an extension to the encoded words defined in
   RFC 2047 to allow the specification of the language to be used for
   display as well as the character set.

2.  Introduction

   The Multipurpose Internet Mail Extensions, or MIME [RFC-2045,
   RFC-2046, RFC-2047, RFC-2048, RFC-2049], define a message format
   that allows for:

    (1)   textual message bodies in character sets other than
          US-ASCII,

    (2)   non-textual message bodies,

    (3)   multi-part message bodies, and

    (4)   textual header information in character sets other than
          US-ASCII.

   MIME is now widely deployed and is used by a variety of Internet
   protocols, including, of course, Internet email.  However, MIME's
   success has resulted in the need for additional mechanisms that were
   not provided in the original protocol specification.

   In particular, existing MIME mechanisms provide for named media type
   (content-type field) parameters as well as named disposition
   (content-disposition field) parameters.  A MIME media type may
   specify any number of parameters associated with all of its
   subtypes, and any specific subtype may specify additional parameters
   for its own use.  A MIME disposition value may specify any number of
   associated parameters, the most important of which is probably the
   attachment disposition's filename parameter.

   These parameter names and values end up appearing in the content-type
   and content-disposition header fields in Internet email.  This
   inherently imposes three crucial limitations:

    (1)   Lines in Internet email header fields are folded
          according to RFC 822 folding rules.  This makes long
          parameter values problematic.

    (2)   MIME headers, like the RFC 822 headers they often
          appear in, are limited to 7bit US-ASCII, and the
          encoded-word mechanisms of RFC 2047 are not available
          to parameter values.  This makes it impossible to have
          parameter values in character sets other than US-ASCII
          without specifying some sort of private per-parameter
          encoding.

    (3)   It has recently become clear that character set
          information is not sufficient to properly display some
          sorts of information -- language information is also
          needed [RFC-2130].  For example, support for
          handicapped users may require reading text strings
          aloud.  The language the text is written in is needed
          for this to be done correctly.
          Some parameter values may need to be displayed, hence
          there is a need to allow for the inclusion of language
          information.

   The last problem on this list is also an issue for the encoded words
   defined by RFC 2047, as encoded words are intended primarily for
   display purposes.

   This document defines extensions that address all of these
   limitations.  All of these extensions are implemented in a fashion
   that is completely compatible at a syntactic level with existing MIME
   implementations.  In addition, the extensions are designed to have as
   little impact as possible on existing uses of MIME.

   IMPORTANT NOTE: These mechanisms end up being somewhat gibbous when
   they actually are used.  As such, these mechanisms should not be used
   lightly; they should be reserved for situations where a real need for
   them exists.

2.1.  Requirements notation

   This document occasionally uses terms that appear in capital letters.
   When the terms "MUST", "SHOULD", "MUST NOT", "SHOULD NOT", and "MAY"
   appear capitalized, they are being used to indicate particular
   requirements of this specification.  A discussion of the meanings of
   these terms appears in [RFC-2119].

3.  Parameter Value Continuations

   Long MIME media type or disposition parameter values do not interact
   well with header line wrapping conventions.  In particular, proper
   header line wrapping depends on there being places where linear
   whitespace (LWSP) is allowed, which may or may not be present in a
   parameter value, and even if present may not be recognizable as such
   since specific knowledge of parameter value syntax may not be
   available to the agent doing the line wrapping.  The result is that
   long parameter values may end up getting truncated or otherwise
   damaged by incorrect line wrapping implementations.

   A mechanism is therefore needed to break up parameter values into
   smaller units that are amenable to line wrapping.  Any such mechanism
   MUST be compatible with existing MIME processors.  This means that

    (1)   the mechanism MUST NOT change the syntax of MIME media
          type and disposition lines, and

    (2)   the mechanism MUST NOT depend on parameter ordering
          since MIME states that parameters are not order
          sensitive.  Note that while MIME does prohibit
          modification of MIME headers during transport, it is
          still possible that parameters will be reordered when
          user agent level processing is done.

   The obvious solution, then, is to use multiple parameters to contain
   a single parameter value and to use some kind of distinguished name
   to indicate when this is being done.  And this obvious solution is
   exactly what is specified here: The asterisk character ("*") followed
   by a decimal count is employed to indicate that multiple parameters
   are being used to encapsulate a single parameter value.  The count
   starts at 0 and increments by 1 for each subsequent section of the
   parameter value.  Decimal values are used and neither leading zeroes
   nor gaps in the sequence are allowed.

   The original parameter value is recovered by concatenating the
   various sections of the parameter, in order.  For example, the
   content-type field

     Content-Type: message/external-body; access-type=URL;
      URL*0="ftp://";
      URL*1="cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar"

   is semantically identical to

     Content-Type: message/external-body; access-type=URL;
      URL="ftp://cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar"

   Note that quotes around parameter values are part of the value
   syntax; they are NOT part of the value itself.
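
   The reassembly rule described above can be illustrated with a short
   Python sketch.  It is not part of this specification.  It assumes
   that a header parser has already split the field into a mapping of
   parameter names to values with any surrounding quotes removed; the
   function name join_continuations and that input shape are
   hypothetical.  Only the unencoded sections defined in this section
   are handled.

```python
import re

# Matches "name*N" where N is a decimal section number without leading zeroes.
_SECTION = re.compile(r"([^*]+)\*(0|[1-9][0-9]*)")


def join_continuations(params):
    """Merge name*0, name*1, ... sections into single parameter values.

    `params` maps parameter names to values whose quotes were already
    stripped (an assumed, hypothetical input shape).
    """
    merged = {}
    sections = {}
    for name, value in params.items():
        match = _SECTION.fullmatch(name)
        if match:
            attribute = match.group(1).lower()  # names are case-insensitive
            sections.setdefault(attribute, {})[int(match.group(2))] = value
        else:
            merged[name.lower()] = value
    for attribute, parts in sections.items():
        # Sections must be numbered 0, 1, 2, ... with no gaps.
        if sorted(parts) != list(range(len(parts))):
            raise ValueError("gap in sections of parameter %r" % attribute)
        # Concatenate by section number, not by order of appearance.
        merged[attribute] = "".join(parts[i] for i in range(len(parts)))
    return merged


fields = {
    "access-type": "URL",
    "URL*1": "cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar",
    "URL*0": "ftp://",
}
print(join_continuations(fields)["url"])
# ftp://cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar
```

   Because the sketch joins sections by their number rather than by the
   order in which they arrive, it still recovers the value when an agent
   has reordered the parameters.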
   Furthermore, it is explicitly permitted to have a mixture of quoted
   and unquoted continuation fields.

4.  Parameter Value Character Set and Language Information

   Some parameter values may need to be qualified with character set or
   language information.  It is clear that a distinguished parameter
   name is needed to identify when this information is present along
   with a specific syntax for the information in the value itself.  In
   addition, a lightweight encoding mechanism is needed to accommodate
   8 bit information in parameter values.

   Asterisks ("*") are reused to provide the indicator that language and
   character set information is present and encoding is being used.  A
   single quote ("'") is used to delimit the character set and language
   information at the beginning of the parameter value.  Percent signs
   ("%") are used as the encoding flag, which agrees with RFC 2047.

   Specifically, an asterisk at the end of a parameter name acts as an
   indicator that character set and language information may appear at
   the beginning of the parameter value.  A single quote is used to
   separate the character set, language, and actual value information in
   the parameter value string, and a percent sign is used to flag
   octets encoded in hexadecimal.  For example:

     Content-Type: application/x-stuff;
      title*=us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A

   Note that it is perfectly permissible to leave either the character
   set or language field blank.  Note also that the single quote
   delimiters MUST be present even when one of the field values is
   omitted.  A field is left blank when the character set, the language,
   or both are not relevant to the parameter value at hand.
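
   The value syntax described above can be illustrated with a short
   Python sketch.  It is not part of this specification.  The function
   name parse_extended_value is hypothetical, and the input is assumed
   to be the raw text of a parameter whose name ends in an asterisk.
   The sketch returns the decoded octets instead of a string, so that
   no character set is assumed when the field is blank.

```python
from urllib.parse import unquote_to_bytes


def parse_extended_value(raw):
    """Split charset'language'value and decode the %XX octets.

    Returns (charset, language, octets); a blank charset or language
    field is reported as None.
    """
    parts = raw.split("'", 2)
    if len(parts) != 3:
        raise ValueError("both single quote delimiters are required")
    charset, language, encoded = parts
    return charset or None, language or None, unquote_to_bytes(encoded)


charset, language, octets = parse_extended_value(
    "us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A"
)
print(octets.decode(charset))
# This is ***fun***
```

   Note that unquote_to_bytes is lenient: a malformed sequence such as
   "%ZZ" is passed through unchanged, so a stricter implementation
   would validate the value against the grammar first.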
   A field MUST NOT be left blank in order to indicate a default
   character set or language -- parameter field definitions MUST NOT
   assign a default character set or language.

4.1.  Combining Character Set, Language, and Parameter Continuations

   Character set and language information may be combined with the
   parameter continuation mechanism.  For example:

     Content-Type: application/x-stuff;
      title*0*=us-ascii'en'This%20is%20even%20more%20;
      title*1*=%2A%2A%2Afun%2A%2A%2A%20;
      title*2="isn't it!"

   Note that:

    (1)   Language and character set information only appear at
          the beginning of a given parameter value.

    (2)   Continuations do not provide a facility for using more
          than one character set or language in the same
          parameter value.

    (3)   A value presented using multiple continuations may
          contain a mixture of encoded and unencoded segments.

    (4)   The first segment of a continuation MUST be encoded if
          language and character set information are given.

    (5)   If the first segment of a continued parameter value is
          encoded the language and character set field delimiters
          MUST be present even when the fields are left blank.

5.  Language specification in Encoded Words

   RFC 2047 provides support for non-US-ASCII character sets in RFC 822
   message header comments, phrases, and any unstructured text field.
   This is done by defining an encoded word construct which can appear
   in any of these places.  Given that these are fields intended for
   display, it is sometimes necessary to associate language information
   with encoded words as well as just the character set.  This
   specification extends the definition of an encoded word to allow the
   inclusion of such information.
   This is simply done by suffixing the character set specification
   with an asterisk followed by the language tag.  For example:

     From: =?US-ASCII*EN?Q?Keith_Moore?= <moore@cs.utk.edu>

6.  IMAP4 Handling of Parameter Values

   IMAP4 [RFC-2060] servers SHOULD decode parameter value continuations
   when generating the BODY and BODYSTRUCTURE fetch attributes.

7.  Modifications to MIME ABNF

   The ABNF for MIME parameter values given in RFC 2045 is:

     parameter := attribute "=" value

     attribute := token
                  ; Matching of attributes
                  ; is ALWAYS case-insensitive.

   This specification changes this ABNF to:

     parameter := regular-parameter / extended-parameter

     regular-parameter := regular-parameter-name "=" value

     regular-parameter-name := attribute [section]

     attribute := 1*attribute-char

     attribute-char := <any (US-ASCII) CHAR except SPACE, CTLs,
                       "*", "'", "%", or tspecials>

     section := initial-section / other-sections

     initial-section := "*0"

     other-sections := "*" ("1" / "2" / "3" / "4" / "5" /
                            "6" / "7" / "8" / "9") *DIGIT

     extended-parameter := (extended-initial-name "="
                            extended-initial-value) /
                           (extended-other-names "="
                            extended-other-values)

     extended-initial-name := attribute [initial-section] "*"

     extended-other-names := attribute other-sections "*"

     extended-initial-value := [charset] "'" [language] "'"
                               extended-other-values

     extended-other-values := *(ext-octet / attribute-char)

     ext-octet := "%" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")

     charset := <registered character set name>

     language := <registered language tag [RFC-1766]>

   The ABNF given in RFC 2047 for encoded-words is:

     encoded-word := "=?" charset "?"
                     encoding "?" encoded-text "?="

   This specification changes this ABNF to:

     encoded-word := "=?" charset ["*" language] "?"
                     encoding "?" encoded-text "?="

8.  Character sets which allow specification of language

   In the future it is likely that some character sets will provide
   facilities for inline language labeling.  Such facilities are
   inherently more flexible than those defined here as they allow for
   language switching in the middle of a string.

   If and when such facilities are developed they SHOULD be used in
   preference to the language labeling facilities specified here.  Note
   that all the mechanisms defined here allow for the omission of
   language labels so as to be able to accommodate this possible future
   usage.

9.  Security Considerations

   This RFC does not discuss security issues and is not believed to
   raise any security issues not already endemic in electronic mail and
   present in fully conforming implementations of MIME.

10.  References

   [RFC-822]
        Crocker, D., "Standard for the Format of ARPA Internet
        Text Messages", STD 11, RFC 822, August 1982.

   [RFC-1766]
        Alvestrand, H., "Tags for the Identification of
        Languages", RFC 1766, March 1995.

   [RFC-2045]
        Freed, N. and N. Borenstein, "Multipurpose Internet Mail
        Extensions (MIME) Part One: Format of Internet Message
        Bodies", RFC 2045, December 1996.

   [RFC-2046]
        Freed, N. and N. Borenstein, "Multipurpose Internet Mail
        Extensions (MIME) Part Two: Media Types", RFC 2046,
        December 1996.

   [RFC-2047]
        Moore, K., "Multipurpose Internet Mail Extensions (MIME)
        Part Three: Representation of Non-ASCII Text in Internet
        Message Headers", RFC 2047, December 1996.

   [RFC-2048]
        Freed, N., Klensin, J. and J.
        Postel, "Multipurpose Internet Mail Extensions (MIME) Part
        Four: MIME Registration Procedures", RFC 2048, December 1996.

   [RFC-2049]
        Freed, N. and N. Borenstein, "Multipurpose Internet Mail
        Extensions (MIME) Part Five: Conformance Criteria and
        Examples", RFC 2049, December 1996.

   [RFC-2060]
        Crispin, M., "Internet Message Access Protocol - Version
        4rev1", RFC 2060, December 1996.

   [RFC-2119]
        Bradner, S., "Key words for use in RFCs to Indicate
        Requirement Levels", RFC 2119, March 1997.

   [RFC-2130]
        Weider, C., Preston, C., Simonsen, K., Alvestrand, H.,
        Atkinson, R., Crispin, M., and P. Svanberg, "Report from the
        IAB Character Set Workshop", RFC 2130, April 1997.

   [RFC-2183]
        Troost, R., Dorner, S. and K. Moore, "Communicating
        Presentation Information in Internet Messages: The
        Content-Disposition Header", RFC 2183, August 1997.

11.  Authors' Addresses

   Ned Freed
   Innosoft International, Inc.
   1050 Lakes Drive
   West Covina, CA 91790
   USA

   Phone: +1 626 919 3600
   Fax:   +1 626 919 3614
   EMail: ned.freed@innosoft.com


   Keith Moore
   Computer Science Dept.
   University of Tennessee
   107 Ayres Hall
   Knoxville, TN 37996-1301
   USA

   EMail: moore@cs.utk.edu

12.  Full Copyright Statement

   Copyright (C) The Internet Society (1997).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.