Network Working Group                                           N. Freed
Request for Comments: 2045                                      Innosoft
Obsoletes: 1521, 1522, 1590                                N. Borenstein
Category: Standards Track                                  First Virtual
                                                           November 1996


                 Multipurpose Internet Mail Extensions
                            (MIME) Part One:
                   Format of Internet Message Bodies

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements. Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol. Distribution of this memo is unlimited.

Abstract

   STD 11, RFC 822, defines a message representation protocol specifying
   considerable detail about US-ASCII message headers, and leaves the
   message content, or message body, as flat US-ASCII text. This set of
   documents, collectively called the Multipurpose Internet Mail
   Extensions, or MIME, redefines the format of messages to allow for

    (1)   textual message bodies in character sets other than
          US-ASCII,

    (2)   an extensible set of different formats for non-textual
          message bodies,

    (3)   multi-part message bodies, and

    (4)   textual header information in character sets other than
          US-ASCII.

   These documents are based on earlier work documented in RFC 934, STD
   11, and RFC 1049, but extend and revise them. Because RFC 822 said
   so little about message bodies, these documents are largely
   orthogonal to (rather than a revision of) RFC 822.

   This initial document specifies the various headers used to describe
   the structure of MIME messages. The second document, RFC 2046,
   defines the general structure of the MIME media typing system and
   defines an initial set of media types.
   The third document, RFC 2047,
   describes extensions to RFC 822 to allow non-US-ASCII text data in



Freed & Borenstein          Standards Track                     [Page 1]

RFC 2045                Internet Message Bodies            November 1996


   Internet mail header fields. The fourth document, RFC 2048, specifies
   various IANA registration procedures for MIME-related facilities. The
   fifth and final document, RFC 2049, describes MIME conformance
   criteria as well as providing some illustrative examples of MIME
   message formats, acknowledgements, and the bibliography.

   These documents are revisions of RFCs 1521, 1522, and 1590, which
   themselves were revisions of RFCs 1341 and 1342. An appendix in RFC
   2049 describes differences and changes from previous versions.

Table of Contents

   1. Introduction
   2. Definitions, Conventions, and Generic BNF Grammar
   2.1 CRLF
   2.2 Character Set
   2.3 Message
   2.4 Entity
   2.5 Body Part
   2.6 Body
   2.7 7bit Data
   2.8 8bit Data
   2.9 Binary Data
   2.10 Lines
   3. MIME Header Fields
   4. MIME-Version Header Field
   5. Content-Type Header Field
   5.1 Syntax of the Content-Type Header Field
   5.2 Content-Type Defaults
   6. Content-Transfer-Encoding Header Field
   6.1 Content-Transfer-Encoding Syntax
   6.2 Content-Transfer-Encodings Semantics
   6.3 New Content-Transfer-Encodings
   6.4 Interpretation and Use
   6.5 Translating Encodings
   6.6 Canonical Encoding Model
   6.7 Quoted-Printable Content-Transfer-Encoding
   6.8 Base64 Content-Transfer-Encoding
   7. Content-ID Header Field
   8. Content-Description Header Field
   9. Additional MIME Header Fields
   10. Summary
   11. Security Considerations
   12. Authors' Addresses
   A. Collected Grammar

1. Introduction

   Since its publication in 1982, RFC 822 has defined the standard
   format of textual mail messages on the Internet. Its success has
   been such that the RFC 822 format has been adopted, wholly or
   partially, well beyond the confines of the Internet and the Internet
   SMTP transport defined by RFC 821. As the format has seen wider use,
   a number of limitations have proven increasingly restrictive for the
   user community.

   RFC 822 was intended to specify a format for text messages. As such,
   non-text messages, such as multimedia messages that might include
   audio or images, are simply not mentioned. Even in the case of text,
   however, RFC 822 is inadequate for the needs of mail users whose
   languages require the use of character sets richer than US-ASCII.
   Since RFC 822 does not specify mechanisms for mail containing audio,
   video, Asian language text, or even text in most European languages,
   additional specifications are needed.

   One of the notable limitations of RFC 821/822 based mail systems is
   the fact that they limit the contents of electronic mail messages to
   relatively short lines (e.g. 1000 characters or less [RFC-821]) of
   7bit US-ASCII. This forces users to convert any non-textual data
   that they may wish to send into seven-bit bytes representable as
   printable US-ASCII characters before invoking a local mail UA (User
   Agent, a program with which human users send and receive mail).
   Examples of such encodings currently used in the Internet include
   pure hexadecimal, uuencode, the 3-in-4 base 64 scheme specified in
   RFC 1421, the Andrew Toolkit Representation [ATK], and many others.

   The limitations of RFC 822 mail become even more apparent as gateways
   are designed to allow for the exchange of mail messages between RFC
   822 hosts and X.400 hosts. X.400 [X400] specifies mechanisms for the
   inclusion of non-textual material within electronic mail messages.
   The current standards for the mapping of X.400 messages to RFC 822
   messages specify either that X.400 non-textual material must be
   converted to (not encoded in) IA5Text format, or that they must be
   discarded, notifying the RFC 822 user that discarding has occurred.
   This is clearly undesirable, as information that a user may wish to
   receive is lost. Even though a user agent may not have the
   capability of dealing with the non-textual material, the user might
   have some mechanism external to the UA that can extract useful
   information from the material.
   Moreover, it does not allow for the
   fact that the message may eventually be gatewayed back into an X.400
   message handling system (i.e., the X.400 message is "tunneled"
   through Internet mail), where the non-textual information would
   definitely become useful again.

   This document describes several mechanisms that combine to solve most
   of these problems without introducing any serious incompatibilities
   with the existing world of RFC 822 mail. In particular, it
   describes:

    (1)   A MIME-Version header field, which uses a version
          number to declare a message to be conformant with MIME
          and allows mail processing agents to distinguish
          between such messages and those generated by older or
          non-conformant software, which are presumed to lack
          such a field.

    (2)   A Content-Type header field, generalized from RFC 1049,
          which can be used to specify the media type and subtype
          of data in the body of a message and to fully specify
          the native representation (canonical form) of such
          data.

    (3)   A Content-Transfer-Encoding header field, which can be
          used to specify both the encoding transformation that
          was applied to the body and the domain of the result.
          Encoding transformations other than the identity
          transformation are usually applied to data in order to
          allow it to pass through mail transport mechanisms
          which may have data or character set limitations.

    (4)   Two additional header fields that can be used to
          further describe the data in a body, the Content-ID and
          Content-Description header fields.

   All of the header fields defined in this document are subject to the
   general syntactic rules for header fields specified in RFC 822.
   In particular, all of these header fields except for
   Content-Disposition can include RFC 822 comments, which have no
   semantic content and should be ignored during MIME processing.

   Finally, to specify and promote interoperability, RFC 2049 provides a
   basic applicability statement for a subset of the above mechanisms
   that defines a minimal level of "conformance" with this document.

   HISTORICAL NOTE: Several of the mechanisms described in this set of
   documents may seem somewhat strange or even baroque at first reading.
   It is important to note that compatibility with existing standards
   AND robustness across existing practice were two of the highest
   priorities of the working group that developed this set of documents.
   In particular, compatibility was always favored over elegance.

   Please refer to the current edition of the "Internet Official
   Protocol Standards" for the standardization state and status of this
   protocol. RFC 822 and STD 3, RFC 1123 also provide essential
   background for MIME since no conforming implementation of MIME can
   violate them. In addition, several other informational RFC documents
   will be of interest to the MIME implementor, in particular RFC 1344,
   RFC 1345, and RFC 1524.

2. Definitions, Conventions, and Generic BNF Grammar

   Although the mechanisms specified in this set of documents are all
   described in prose, most are also described formally in the augmented
   BNF notation of RFC 822. Implementors will need to be familiar with
   this notation in order to understand this set of documents, and are
   referred to RFC 822 for a complete explanation of the augmented BNF
   notation.

   Some of the augmented BNF in this set of documents makes named
   references to syntax rules defined in RFC 822.
   A complete formal grammar, then, is obtained by combining the
   collected grammar appendices in each document in this set with the
   BNF of RFC 822 plus the modifications to RFC 822 defined in RFC 1123
   (which specifically changes the syntax for `return', `date' and
   `mailbox').

   All numeric and octet values are given in decimal notation in this
   set of documents. All media type values, subtype values, and
   parameter names as defined are case-insensitive. However, parameter
   values are case-sensitive unless otherwise specified for the specific
   parameter.

   FORMATTING NOTE: Notes, such as this one, provide additional
   nonessential information which may be skipped by the reader without
   missing anything essential. The primary purpose of these
   non-essential notes is to convey information about the rationale of
   this set of documents, or to place these documents in the proper
   historical or evolutionary context. Such information may in
   particular be skipped by those who are focused entirely on building a
   conformant implementation, but may be of use to those who wish to
   understand why certain design choices were made.

2.1. CRLF

   The term CRLF, in this set of documents, refers to the sequence of
   octets corresponding to the two US-ASCII characters CR (decimal value
   13) and LF (decimal value 10) which, taken together, in this order,
   denote a line break in RFC 822 mail.

2.2. Character Set

   The term "character set" is used in MIME to refer to a method of
   converting a sequence of octets into a sequence of characters.
   Note that unconditional and unambiguous conversion in the other
   direction is not required, in that not all characters may be
   representable by a given character set and a character set may
   provide more than one sequence of octets to represent a particular
   sequence of characters.

   This definition is intended to allow various kinds of character
   encodings, from simple single-table mappings such as US-ASCII to
   complex table switching methods such as those that use ISO 2022's
   techniques, to be used as character sets. However, the definition
   associated with a MIME character set name must fully specify the
   mapping to be performed. In particular, use of external profiling
   information to determine the exact mapping is not permitted.

   NOTE: The term "character set" was originally used to describe such
   straightforward schemes as US-ASCII and ISO-8859-1 which have a
   simple one-to-one mapping from single octets to single characters.
   Multi-octet coded character sets and switching techniques make the
   situation more complex. For example, some communities use the term
   "character encoding" for what MIME calls a "character set", while
   using the phrase "coded character set" to denote an abstract mapping
   from integers (not octets) to characters.

2.3. Message

   The term "message", when not further qualified, means either a
   (complete or "top-level") RFC 822 message being transferred on a
   network, or a message encapsulated in a body of type "message/rfc822"
   or "message/partial".

2.4. Entity

   The term "entity" refers specifically to the MIME-defined header
   fields and contents of either a message or one of the parts in the
   body of a multipart entity. The specification of such entities is
   the essence of MIME. Since the contents of an entity are often
   called the "body", it makes sense to speak about the body of an
   entity.
   Any sort of field may be present in the header of an entity,
   but only those fields whose names begin with "content-" actually have
   any MIME-related meaning. Note that this does NOT imply that they
   have no meaning at all -- an entity that is also a message has
   non-MIME header fields whose meanings are defined by RFC 822.

2.5. Body Part

   The term "body part" refers to an entity inside of a multipart
   entity.

2.6. Body

   The term "body", when not further qualified, means the body of an
   entity, that is, the body of either a message or of a body part.

   NOTE: The previous four definitions are clearly circular. This is
   unavoidable, since the overall structure of a MIME message is indeed
   recursive.

2.7. 7bit Data

   "7bit data" refers to data that is all represented as relatively
   short lines with 998 octets or less between CRLF line separation
   sequences [RFC-821]. No octets with decimal values greater than 127
   are allowed and neither are NULs (octets with decimal value 0). CR
   (decimal value 13) and LF (decimal value 10) octets only occur as
   part of CRLF line separation sequences.

2.8. 8bit Data

   "8bit data" refers to data that is all represented as relatively
   short lines with 998 octets or less between CRLF line separation
   sequences [RFC-821], but octets with decimal values greater than 127
   may be used. As with "7bit data" CR and LF octets only occur as part
   of CRLF line separation sequences and no NULs are allowed.

2.9. Binary Data

   "Binary data" refers to data where any sequence of octets whatsoever
   is allowed.

2.10. Lines

   "Lines" are defined as sequences of octets separated by CRLF
   sequences. This is consistent with both RFC 821 and RFC 822.
   "Lines" only refers to a unit of data in a message, which may or may
   not correspond to something that is actually displayed by a user
   agent.

3. MIME Header Fields

   MIME defines a number of new RFC 822 header fields that are used to
   describe the content of a MIME entity. These header fields occur in
   at least two contexts:

    (1)   As part of a regular RFC 822 message header.

    (2)   In a MIME body part header within a multipart
          construct.

   The formal definition of these header fields is as follows:

     entity-headers := [ content CRLF ]
                       [ encoding CRLF ]
                       [ id CRLF ]
                       [ description CRLF ]
                       *( MIME-extension-field CRLF )

     MIME-message-headers := entity-headers
                             fields
                             version CRLF
                             ; The ordering of the header
                             ; fields implied by this BNF
                             ; definition should be ignored.

     MIME-part-headers := entity-headers
                          [ fields ]
                          ; Any field not beginning with
                          ; "content-" can have no defined
                          ; meaning and may be ignored.
                          ; The ordering of the header
                          ; fields implied by this BNF
                          ; definition should be ignored.

   The syntax of the various specific MIME header fields will be
   described in the following sections.

4. MIME-Version Header Field

   Since RFC 822 was published in 1982, there has really been only one
   format standard for Internet messages, and there has been little
   perceived need to declare the format standard in use. This document
   is an independent specification that complements RFC 822. Although
   the extensions in this document have been defined in such a way as to
   be compatible with RFC 822, there are still circumstances in which it
   might be desirable for a mail-processing agent to know whether a
   message was composed with the new standard in mind.

   Therefore, this document defines a new header field, "MIME-Version",
   which is to be used to declare the version of the Internet message
   body format standard in use.

   Messages composed in accordance with this document MUST include such
   a header field, with the following verbatim text:

     MIME-Version: 1.0

   The presence of this header field is an assertion that the message
   has been composed in compliance with this document.

   Since it is possible that a future document might extend the message
   format standard again, a formal BNF is given for the content of the
   MIME-Version field:

     version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT

   Thus, future format specifiers, which might replace or extend "1.0",
   are constrained to be two integer fields, separated by a period. If
   a message is received with a MIME-version value other than "1.0", it
   cannot be assumed to conform with this document.

   Note that the MIME-Version header field is required at the top level
   of a message. It is not required for each body part of a multipart
   entity. It is required for the embedded headers of a body of type
   "message/rfc822" or "message/partial" if and only if the embedded
   message is itself claimed to be MIME-conformant.

   It is not possible to fully specify how a mail reader that conforms
   with MIME as defined in this document should treat a message that
   might arrive in the future with some value of MIME-Version other than
   "1.0".

   It is also worth noting that version control for specific media types
   is not accomplished using the MIME-Version mechanism. In particular,
   some formats (such as application/postscript) have version numbering
   conventions that are internal to the media format.
   Where such conventions exist, MIME does nothing to supersede them.
   Where no such conventions exist, a MIME media type might use a
   "version" parameter in the content-type field if necessary.

   NOTE TO IMPLEMENTORS: When checking MIME-Version values any RFC 822
   comment strings that are present must be ignored. In particular, the
   following four MIME-Version fields are equivalent:

     MIME-Version: 1.0

     MIME-Version: 1.0 (produced by MetaSend Vx.x)

     MIME-Version: (produced by MetaSend Vx.x) 1.0

     MIME-Version: 1.(produced by MetaSend Vx.x)0

   In the absence of a MIME-Version field, a receiving mail user agent
   (whether conforming to MIME requirements or not) may optionally
   choose to interpret the body of the message according to local
   conventions. Many such conventions are currently in use and it
   should be noted that in practice non-MIME messages can contain just
   about anything.

   It is impossible to be certain that a non-MIME mail message is
   actually plain text in the US-ASCII character set since it might well
   be a message that, using some set of nonstandard local conventions
   that predate MIME, includes text in another character set or
   non-textual data presented in a manner that cannot be automatically
   recognized (e.g., a uuencoded compressed UNIX tar file).

5. Content-Type Header Field

   The purpose of the Content-Type field is to describe the data
   contained in the body fully enough that the receiving user agent can
   pick an appropriate agent or mechanism to present the data to the
   user, or otherwise deal with the data in an appropriate manner. The
   value in this field is called a media type.

   HISTORICAL NOTE: The Content-Type header field was first defined in
   RFC 1049.
   RFC 1049 used a simpler and less powerful syntax, but one
   that is largely compatible with the mechanism given here.

   The Content-Type header field specifies the nature of the data in the
   body of an entity by giving media type and subtype identifiers, and
   by providing auxiliary information that may be required for certain
   media types. After the media type and subtype names, the remainder
   of the header field is simply a set of parameters, specified in an
   attribute=value notation. The ordering of parameters is not
   significant.

   In general, the top-level media type is used to declare the general
   type of data, while the subtype specifies a specific format for that
   type of data. Thus, a media type of "image/xyz" is enough to tell a
   user agent that the data is an image, even if the user agent has no
   knowledge of the specific image format "xyz". Such information can
   be used, for example, to decide whether or not to show a user the raw
   data from an unrecognized subtype -- such an action might be
   reasonable for unrecognized subtypes of text, but not for
   unrecognized subtypes of image or audio. For this reason, registered
   subtypes of text, image, audio, and video should not contain embedded
   information that is really of a different type. Such compound
   formats should be represented using the "multipart" or "application"
   types.

   Parameters are modifiers of the media subtype, and as such do not
   fundamentally affect the nature of the content. The set of
   meaningful parameters depends on the media type and subtype. Most
   parameters are associated with a single specific subtype. However, a
   given top-level media type may define parameters which are applicable
   to any subtype of that type.
   Parameters may be required by their
   defining content type or subtype or they may be optional. MIME
   implementations must ignore any parameters whose names they do not
   recognize.

   For example, the "charset" parameter is applicable to any subtype of
   "text", while the "boundary" parameter is required for any subtype of
   the "multipart" media type.

   There are NO globally-meaningful parameters that apply to all media
   types. Truly global mechanisms are best addressed, in the MIME
   model, by the definition of additional Content-* header fields.

   An initial set of seven top-level media types is defined in RFC 2046.
   Five of these are discrete types whose content is essentially opaque
   as far as MIME processing is concerned. The remaining two are
   composite types whose contents require additional handling by MIME
   processors.

   This set of top-level media types is intended to be substantially
   complete. It is expected that additions to the larger set of
   supported types can generally be accomplished by the creation of new
   subtypes of these initial types. In the future, more top-level types
   may be defined only by a standards-track extension to this standard.
   If another top-level type is to be used for any reason, it must be
   given a name starting with "X-" to indicate its non-standard status
   and to avoid a potential conflict with a future official name.

5.1. Syntax of the Content-Type Header Field

   In the Augmented BNF notation of RFC 822, a Content-Type header field
   value is defined as follows:

     content := "Content-Type" ":" type "/" subtype
                *(";" parameter)
                ; Matching of media type and subtype
                ; is ALWAYS case-insensitive.

     type := discrete-type / composite-type

     discrete-type := "text" / "image" / "audio" / "video" /
                      "application" / extension-token

     composite-type := "message" / "multipart" / extension-token

     extension-token := ietf-token / x-token

     ietf-token := <An extension token defined by a
                    standards-track RFC and registered
                    with IANA.>

     x-token := <The two characters "X-" or "x-" followed, with
                 no intervening white space, by any token>

     subtype := extension-token / iana-token

     iana-token := <A publicly-defined extension token. Tokens
                    of this form must be registered with IANA
                    as specified in RFC 2048.>

     parameter := attribute "=" value

     attribute := token
                  ; Matching of attributes
                  ; is ALWAYS case-insensitive.

     value := token / quoted-string

     token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
                 or tspecials>

     tspecials :=  "(" / ")" / "<" / ">" / "@" /
                   "," / ";" / ":" / "\" / <"> /
                   "/" / "[" / "]" / "?" / "="
                   ; Must be in quoted-string,
                   ; to use within parameter values

   Note that the definition of "tspecials" is the same as the RFC 822
   definition of "specials" with the addition of the three characters
   "/", "?", and "=", and the removal of ".".

   Note also that a subtype specification is MANDATORY -- it may not be
   omitted from a Content-Type header field. As such, there are no
   default subtypes.

   The type, subtype, and parameter names are not case sensitive. For
   example, TEXT, Text, and TeXt are all equivalent top-level media
   types. Parameter values are normally case sensitive, but sometimes
   are interpreted in a case-insensitive fashion, depending on the
   intended use.
   (For example, multipart boundaries are case-sensitive,
   but the "access-type" parameter for message/External-body is not
   case-sensitive.)

   Note that the value of a quoted string parameter does not include the
   quotes. That is, the quotation marks in a quoted-string are not a
   part of the value of the parameter, but are merely used to delimit
   that parameter value. In addition, comments are allowed in
   accordance with RFC 822 rules for structured header fields. Thus the
   following two forms

     Content-type: text/plain; charset=us-ascii (Plain text)

     Content-type: text/plain; charset="us-ascii"

   are completely equivalent.

   Beyond this syntax, the only syntactic constraint on the definition
   of subtype names is the desire that their uses must not conflict.
   That is, it would be undesirable to have two different communities
   using "Content-Type: application/foobar" to mean two different
   things. The process of defining new media subtypes, then, is not
   intended to be a mechanism for imposing restrictions, but simply a
   mechanism for publicizing their definition and usage. There are,
   therefore, two acceptable mechanisms for defining new media subtypes:

    (1)   Private values (starting with "X-") may be defined
          bilaterally between two cooperating agents without
          outside registration or standardization. Such values
          cannot be registered or standardized.

    (2)   New standard values should be registered with IANA as
          described in RFC 2048.

   The second document in this set, RFC 2046, defines the initial set of
   media types for MIME.

5.2. Content-Type Defaults

   Default RFC 822 messages without a MIME Content-Type header are taken
   by this protocol to be plain text in the US-ASCII character set,
   which can be explicitly specified as:

     Content-type: text/plain; charset=us-ascii

   This default is assumed if no Content-Type header field is specified.
   It is also recommended that this default be assumed when a
   syntactically invalid Content-Type header field is encountered. In
   the presence of a MIME-Version header field and the absence of any
   Content-Type header field, a receiving User Agent can also assume
   that plain US-ASCII text was the sender's intent. Plain US-ASCII
   text may still be assumed in the absence of a MIME-Version or the
   presence of a syntactically invalid Content-Type header field, but
   the sender's intent might have been otherwise.

6. Content-Transfer-Encoding Header Field

   Many media types which could be usefully transported via email are
   represented, in their "natural" format, as 8bit character or binary
   data. Such data cannot be transmitted over some transfer protocols.
   For example, RFC 821 (SMTP) restricts mail messages to 7bit US-ASCII
   data with lines no longer than 1000 characters including any trailing
   CRLF line separator.

   It is necessary, therefore, to define a standard mechanism for
   encoding such data into a 7bit short line format. Proper labelling
   of unencoded material in less restrictive formats for direct use over
   less restrictive transports is also desirable. This document
   specifies that such encodings will be indicated by a new
   "Content-Transfer-Encoding" header field. This field has not been
   defined by any previous standard.

6.1. Content-Transfer-Encoding Syntax

   The Content-Transfer-Encoding field's value is a single token
   specifying the type of encoding, as enumerated below.
   Formally:

     encoding := "Content-Transfer-Encoding" ":" mechanism

     mechanism := "7bit" / "8bit" / "binary" /
                  "quoted-printable" / "base64" /
                  ietf-token / x-token

   These values are not case sensitive -- Base64 and BASE64 and bAsE64
   are all equivalent. An encoding type of 7BIT requires that the body
   is already in a 7bit mail-ready representation. This is the default
   value -- that is, "Content-Transfer-Encoding: 7BIT" is assumed if the
   Content-Transfer-Encoding header field is not present.

6.2. Content-Transfer-Encodings Semantics

   This single Content-Transfer-Encoding token actually provides two
   pieces of information. It specifies what sort of encoding
   transformation the body was subjected to and hence what decoding
   operation must be used to restore it to its original form, and it
   specifies what the domain of the result is.

   The transformation part of any Content-Transfer-Encodings specifies,
   either explicitly or implicitly, a single, well-defined decoding
   algorithm, which for any sequence of encoded octets either transforms
   it to the original sequence of octets which was encoded, or shows
   that it is illegal as an encoded sequence. Content-Transfer-
   Encodings transformations never depend on any additional external
   profile information for proper operation. Note that while decoders
   must produce a single, well-defined output for a valid encoding no
   such restrictions exist for encoders: Encoding a given sequence of
   octets to different, equivalent encoded sequences is perfectly legal.

   Three transformations are currently defined: identity, the "quoted-
   printable" encoding, and the "base64" encoding. The domains are
   "binary", "8bit" and "7bit".
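
   The two pieces of information carried by the token can be made
   explicit in code. The following is a minimal sketch, not part of
   this specification: the names KNOWN_MECHANISMS and classify_cte are
   hypothetical, the input is assumed to be the field value with any
   RFC 822 comments already removed, and the "7bit" domain listed for
   quoted-printable and base64 is the domain of their encoded output.

```python
# Hypothetical lookup table: mechanism token -> (transformation, domain).
# For quoted-printable and base64 the domain is that of the encoded body.
KNOWN_MECHANISMS = {
    "7bit": ("identity", "7bit"),
    "8bit": ("identity", "8bit"),
    "binary": ("identity", "binary"),
    "quoted-printable": ("quoted-printable", "7bit"),
    "base64": ("base64", "7bit"),
}


def classify_cte(field_value=None):
    """Return (token, transformation, domain) for a field value.

    A missing header field defaults to "7bit"; matching ignores case.
    Unknown tokens (ietf-token / x-token values this sketch does not
    know) are reported with "unknown" for both properties.
    """
    token = "7bit" if field_value is None else field_value.strip().lower()
    transformation, domain = KNOWN_MECHANISMS.get(token, ("unknown", "unknown"))
    return token, transformation, domain
```

   For instance, classify_cte("bAsE64") and classify_cte("BASE64")
   give the same result, and classify_cte() with no argument reports
   the identity transformation in the "7bit" domain.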

   The Content-Transfer-Encoding values "7bit", "8bit", and "binary" all
   mean that the identity (i.e. NO) encoding transformation has been
   performed. As such, they serve simply as indicators of the domain of
   the body data, and provide useful information about the sort of
   encoding that might be needed for transmission in a given transport
   system. The terms "7bit data", "8bit data", and "binary data" are
   all defined in Section 2.

   The quoted-printable and base64 encodings transform their input from
   an arbitrary domain into material in the "7bit" range, thus making it
   safe to carry over restricted transports. The specific definitions of
   the transformations are given below.

   The proper Content-Transfer-Encoding label must always be used.
   Labelling unencoded data containing 8bit characters as "7bit" is not
   allowed, nor is labelling unencoded non-line-oriented data as
   anything other than "binary" allowed.

   Unlike media subtypes, a proliferation of Content-Transfer-Encoding
   values is both undesirable and unnecessary. However, establishing
   only a single transformation into the "7bit" domain does not seem
   possible. There is a tradeoff between the desire for a compact and
   efficient encoding of largely-binary data and the desire for a
   somewhat readable encoding of data that is mostly, but not entirely,
   7bit. For this reason, at least two encoding mechanisms are
   necessary: a more or less readable encoding (quoted-printable) and a
   "dense" or "uniform" encoding (base64).

   Mail transport for unencoded 8bit data is defined in RFC 1652. As of
   the initial publication of this document, there are no standardized
   Internet mail transports for which it is legitimate to include
   unencoded binary data in mail bodies.
   Thus there are no circumstances in which the "binary"
   Content-Transfer-Encoding is actually valid in Internet mail.
   However, in the event that binary mail transport becomes a reality
   in Internet mail, or when MIME is used in conjunction with any other
   binary-capable mail transport mechanism, binary bodies must be
   labelled as such using this mechanism.

   NOTE: The five values defined for the Content-Transfer-Encoding field
   imply nothing about the media type other than the algorithm by which
   it was encoded or the transport system requirements if unencoded.

6.3. New Content-Transfer-Encodings

   Implementors may, if necessary, define private Content-Transfer-
   Encoding values, but must use an x-token, which is a name prefixed by
   "X-", to indicate its non-standard status, e.g., "Content-Transfer-
   Encoding: x-my-new-encoding". Additional standardized Content-
   Transfer-Encoding values must be specified by a standards-track RFC.
   The requirements such specifications must meet are given in RFC 2048.
   As such, all of the content-transfer-encoding namespace except that
   beginning with "X-" is explicitly reserved to the IETF for future
   use.

   Unlike media types and subtypes, the creation of new Content-
   Transfer-Encoding values is STRONGLY discouraged, as it seems likely
   to hinder interoperability with little potential benefit.

6.4. Interpretation and Use

   If a Content-Transfer-Encoding header field appears as part of a
   message header, it applies to the entire body of that message. If a
   Content-Transfer-Encoding header field appears as part of an entity's
   headers, it applies only to the body of that entity. If an entity is
   of type "multipart" the Content-Transfer-Encoding is not permitted to
   have any value other than "7bit", "8bit" or "binary". Even more
   severe restrictions apply to some subtypes of the "message" type.

   It should be noted that most media types are defined in terms of
   octets rather than bits, so that the mechanisms described here are
   mechanisms for encoding arbitrary octet streams, not bit streams. If
   a bit stream is to be encoded via one of these mechanisms, it must
   first be converted to an 8bit byte stream using the network standard
   bit order ("big-endian"), in which the earlier bits in a stream
   become the higher-order bits in an 8bit byte. A bit stream not ending
   at an 8bit boundary must be padded with zeroes. RFC 2046 provides a
   mechanism for noting the addition of such padding in the case of the
   application/octet-stream media type, which has a "padding" parameter.

   The encoding mechanisms defined here explicitly encode all data in
   US-ASCII. Thus, for example, suppose an entity has header fields
   such as:

     Content-Type: text/plain; charset=ISO-8859-1
     Content-transfer-encoding: base64

   This must be interpreted to mean that the body is a base64 US-ASCII
   encoding of data that was originally in ISO-8859-1, and will be in
   that character set again after decoding.

   Certain Content-Transfer-Encoding values may only be used on certain
   media types. In particular, it is EXPRESSLY FORBIDDEN to use any
   encodings other than "7bit", "8bit", or "binary" with any composite
   media type, i.e. one that recursively includes other Content-Type
   fields. Currently the only composite media types are "multipart" and
   "message". All encodings that are desired for bodies of type
   multipart or message must be done at the innermost level, by encoding
   the actual body that needs to be encoded.
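
   The restriction on composite media types lends itself to a simple
   check. The following is a minimal sketch under stated assumptions:
   the names IDENTITY_ENCODINGS, COMPOSITE_TYPES and encoding_permitted
   are hypothetical, and the check covers only the general rule above,
   not the more severe restrictions that apply to some subtypes of
   "message".

```python
# Hypothetical names; only the general composite-type rule is checked.
IDENTITY_ENCODINGS = {"7bit", "8bit", "binary"}
COMPOSITE_TYPES = {"multipart", "message"}


def encoding_permitted(media_type, cte="7bit"):
    """Return False if cte is forbidden for media_type by the general rule.

    media_type is a "type/subtype" string; both arguments are matched
    case-insensitively. Non-composite types are not restricted here.
    """
    top_level = media_type.split("/", 1)[0].strip().lower()
    mechanism = cte.strip().lower()
    if top_level in COMPOSITE_TYPES:
        return mechanism in IDENTITY_ENCODINGS
    return True
```

   A composer could call such a check before labelling a multipart
   entity, and apply quoted-printable or base64 to the enclosed leaf
   bodies instead.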

   It should also be noted that, by definition, if a composite entity
   has a transfer-encoding value such as "7bit", but one of the enclosed
   entities has a less restrictive value such as "8bit", then either the
   outer "7bit" labelling is in error, because 8bit data are included,
   or the inner "8bit" labelling placed an unnecessarily high demand on
   the transport system because the actual included data were actually
   7bit-safe.

   NOTE ON ENCODING RESTRICTIONS: Though the prohibition against using
   content-transfer-encodings on composite body data may seem overly
   restrictive, it is necessary to prevent nested encodings, in which
   data are passed through an encoding algorithm multiple times, and
   must be decoded multiple times in order to be properly viewed.
   Nested encodings add considerable complexity to user agents: Aside
   from the obvious efficiency problems with such multiple encodings,
   they can obscure the basic structure of a message. In particular,
   they can imply that several decoding operations are necessary simply
   to find out what types of bodies a message contains. Banning nested
   encodings may complicate the job of certain mail gateways, but this
   seems less of a problem than the effect of nested encodings on user
   agents.

   Any entity with an unrecognized Content-Transfer-Encoding must be
   treated as if it has a Content-Type of "application/octet-stream",
   regardless of what the Content-Type header field actually says.
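
   The fallback for unrecognized encodings can be sketched as follows.
   This is an illustration only: RECOGNIZED_ENCODINGS and
   effective_media_type are hypothetical names, and an implementation
   that understands additional x-token encodings would presumably
   extend the recognized set.

```python
# Hypothetical set of mechanisms this receiver knows how to decode.
RECOGNIZED_ENCODINGS = {"7bit", "8bit", "binary", "quoted-printable", "base64"}


def effective_media_type(declared_type, cte="7bit"):
    """Return the media type a receiver should act on.

    If the transfer encoding is not recognized, the declared type is
    overridden with application/octet-stream, since the body cannot
    be decoded into the form the declared type describes.
    """
    if cte.strip().lower() not in RECOGNIZED_ENCODINGS:
        return "application/octet-stream"
    return declared_type
```

   The design point is that the override depends only on the transfer
   encoding; the declared Content-Type is never consulted when the
   encoding is unknown.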

   NOTE ON THE RELATIONSHIP BETWEEN CONTENT-TYPE AND CONTENT-TRANSFER-
   ENCODING: It may seem that the Content-Transfer-Encoding could be
   inferred from the characteristics of the media that is to be encoded,
   or, at the very least, that certain Content-Transfer-Encodings could
   be mandated for use with specific media types. There are several
   reasons why this is not the case. First, given the varying types of
   transports used for mail, some encodings may be appropriate for some
   combinations of media types and transports but not for others. (For
   example, in an 8bit transport, no encoding would be required for text
   in certain character sets, while such encodings are clearly required
   for 7bit SMTP.)

   Second, certain media types may require different types of transfer
   encoding under different circumstances. For example, many PostScript
   bodies might consist entirely of short lines of 7bit data and hence
   require no encoding at all. Other PostScript bodies (especially
   those using Level 2 PostScript's binary encoding mechanism) may only
   be reasonably represented using a binary transport encoding.
   Finally, since the Content-Type field is intended to be an open-ended
   specification mechanism, strict specification of an association
   between media types and encodings effectively couples the
   specification of an application protocol with a specific lower-level
   transport. This is not desirable since the developers of a media
   type should not have to be aware of all the transports in use and
   what their limitations are.

6.5. Translating Encodings

   The quoted-printable and base64 encodings are designed so that
   conversion between them is possible. The only issue that arises in
   such a conversion is the handling of hard line breaks in quoted-
   printable encoding output.
   When converting from quoted-printable to
   base64 a hard line break in the quoted-printable form represents a
   CRLF sequence in the canonical form of the data. It must therefore be
   converted to a corresponding encoded CRLF in the base64 form of the
   data. Similarly, a CRLF sequence in the canonical form of the data
   obtained after base64 decoding must be converted to a quoted-
   printable hard line break, but ONLY when converting text data.

6.6. Canonical Encoding Model

   There was some confusion, in the previous versions of this RFC,
   regarding the model for when email data was to be converted to
   canonical form and encoded, and in particular how this process would
   affect the treatment of CRLFs, given that the representation of
   newlines varies greatly from system to system, and the relationship
   between content-transfer-encodings and character sets. A canonical
   model for encoding is presented in RFC 2049 for this reason.

6.7. Quoted-Printable Content-Transfer-Encoding

   The Quoted-Printable encoding is intended to represent data that
   largely consists of octets that correspond to printable characters in
   the US-ASCII character set. It encodes the data in such a way that
   the resulting octets are unlikely to be modified by mail transport.
   If the data being encoded are mostly US-ASCII text, the encoded form
   of the data remains largely recognizable by humans. A body which is
   entirely US-ASCII may also be encoded in Quoted-Printable to ensure
   the integrity of the data should the message pass through a
   character-translating, and/or line-wrapping gateway.

   In this encoding, octets are to be represented as determined by the
   following rules:

    (1)   (General 8bit representation) Any octet, except a CR or
          LF that is part of a CRLF line break of the canonical
          (standard) form of the data being encoded, may be
          represented by an "=" followed by a two digit
          hexadecimal representation of the octet's value. The
          digits of the hexadecimal alphabet, for this purpose,
          are "0123456789ABCDEF". Uppercase letters must be
          used; lowercase letters are not allowed. Thus, for
          example, the decimal value 12 (US-ASCII form feed) can
          be represented by "=0C", and the decimal value 61 (US-
          ASCII EQUAL SIGN) can be represented by "=3D". This
          rule must be followed except when the following rules
          allow an alternative encoding.

    (2)   (Literal representation) Octets with decimal values of
          33 through 60 inclusive, and 62 through 126, inclusive,
          MAY be represented as the US-ASCII characters which
          correspond to those octets (EXCLAMATION POINT through
          LESS THAN, and GREATER THAN through TILDE,
          respectively).

    (3)   (White Space) Octets with values of 9 and 32 MAY be
          represented as US-ASCII TAB (HT) and SPACE characters,
          respectively, but MUST NOT be so represented at the end
          of an encoded line. Any TAB (HT) or SPACE characters
          on an encoded line MUST thus be followed on that line
          by a printable character. In particular, an "=" at the
          end of an encoded line, indicating a soft line break
          (see rule #5) may follow one or more TAB (HT) or SPACE
          characters. It follows that an octet with decimal
          value 9 or 32 appearing at the end of an encoded line
          must be represented according to Rule #1.
          This rule is necessary because some MTAs (Message
          Transport Agents, programs which transport messages
          from one user to another, or perform a portion of such
          transfers) are known to pad lines of text with SPACEs,
          and others are known to remove "white space" characters
          from the end of a line. Therefore, when decoding a
          Quoted-Printable body, any trailing white space on a
          line must be deleted, as it will necessarily have been
          added by intermediate transport agents.

    (4)   (Line Breaks) A line break in a text body, represented
          as a CRLF sequence in the text canonical form, must be
          represented by a (RFC 822) line break, which is also a
          CRLF sequence, in the Quoted-Printable encoding. Since
          the canonical representation of media types other than
          text does not generally include the representation of
          line breaks as CRLF sequences, no hard line breaks
          (i.e. line breaks that are intended to be meaningful
          and to be displayed to the user) can occur in the
          quoted-printable encoding of such types. Sequences
          like "=0D", "=0A", "=0A=0D" and "=0D=0A" will routinely
          appear in non-text data represented in quoted-
          printable, of course.

          Note that many implementations may elect to encode the
          local representation of various content types directly
          rather than converting to canonical form first,
          encoding, and then converting back to local
          representation. In particular, this may apply to plain
          text material on systems that use newline conventions
          other than a CRLF terminator sequence. Such an
          implementation optimization is permissible, but only
          when the combined canonicalization-encoding step is
          equivalent to performing the three steps separately.

    (5)   (Soft Line Breaks) The Quoted-Printable encoding
          REQUIRES that encoded lines be no more than 76
          characters long.
          If longer lines are to be encoded
          with the Quoted-Printable encoding, "soft" line breaks
          must be used. An equal sign as the last character on an
          encoded line indicates such a non-significant ("soft")
          line break in the encoded text.

   Thus if the "raw" form of the line is a single unencoded line that
   says:

     Now's the time for all folk to come to the aid of their country.

   This can be represented, in the Quoted-Printable encoding, as:

     Now's the time =
     for all folk to come=
      to the aid of their country.

   This provides a mechanism with which long lines are encoded in such a
   way as to be restored by the user agent. The 76 character limit does
   not count the trailing CRLF, but counts all other characters,
   including any equal signs.

   Since the hyphen character ("-") may be represented as itself in the
   Quoted-Printable encoding, care must be taken, when encapsulating a
   quoted-printable encoded body inside one or more multipart entities,
   to ensure that the boundary delimiter does not appear anywhere in the
   encoded body. (A good strategy is to choose a boundary that includes
   a character sequence such as "=_" which can never appear in a
   quoted-printable body. See the definition of multipart messages in
   RFC 2046.)

   NOTE: The quoted-printable encoding represents something of a
   compromise between readability and reliability in transport. Bodies
   encoded with the quoted-printable encoding will work reliably over
   most mail gateways, but may not work perfectly over a few gateways,
   notably those involving translation into EBCDIC. A higher level of
   confidence is offered by the base64 Content-Transfer-Encoding.
   A way to get reasonably reliable transport through EBCDIC gateways
   is to also quote the US-ASCII characters

     !"#$@[\]^`{|}~

   according to rule #1.

   Because quoted-printable data is generally assumed to be line-
   oriented, it is to be expected that the representation of the breaks
   between the lines of quoted-printable data may be altered in
   transport, in the same manner that plain text mail has always been
   altered in Internet mail when passing between systems with differing
   newline conventions. If such alterations are likely to constitute a
   corruption of the data, it is probably more sensible to use the
   base64 encoding rather than the quoted-printable encoding.

   NOTE: Several kinds of substrings cannot be generated according to
   the encoding rules for the quoted-printable content-transfer-
   encoding, and hence are formally illegal if they appear in the output
   of a quoted-printable encoder. This note enumerates these cases and
   suggests ways to handle such illegal substrings if any are
   encountered in quoted-printable data that is to be decoded.

    (1)   An "=" followed by two hexadecimal digits, one or both
          of which are lowercase letters in "abcdef", is formally
          illegal. A robust implementation might choose to
          recognize them as the corresponding uppercase letters.

    (2)   An "=" followed by a character that is neither a
          hexadecimal digit (including "abcdef") nor the CR
          character of a CRLF pair is illegal. This case can be
          the result of US-ASCII text having been included in a
          quoted-printable part of a message without itself
          having been subjected to quoted-printable encoding.
          A reasonable approach by a robust implementation might be
          to include the "=" character and the following
          character in the decoded data without any
          transformation and, if possible, indicate to the user
          that proper decoding was not possible at this point in
          the data.

    (3)   An "=" cannot be the ultimate or penultimate character
          in an encoded object. This could be handled as in case
          (2) above.

    (4)   Control characters other than TAB, or CR and LF as
          parts of CRLF pairs, must not appear. The same is true
          for octets with decimal values greater than 126. If
          found in incoming quoted-printable data by a decoder, a
          robust implementation might exclude them from the
          decoded data and warn the user that illegal characters
          were discovered.

    (5)   Encoded lines must not be longer than 76 characters,
          not counting the trailing CRLF. If longer lines are
          found in incoming, encoded data, a robust
          implementation might nevertheless decode the lines, and
          might report the erroneous encoding to the user.

   WARNING TO IMPLEMENTORS: If binary data is encoded in quoted-
   printable, care must be taken to encode CR and LF characters as "=0D"
   and "=0A", respectively. In particular, a CRLF sequence in binary
   data should be encoded as "=0D=0A". Otherwise, if CRLF were
   represented as a hard line break, it might be incorrectly decoded on
   platforms with different line break conventions.
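
   The warning above, together with the general rule, the literal rule
   and the soft line break rule, can be illustrated with a small
   encoder for binary (non-text) data. This is a sketch under stated
   assumptions rather than a reference implementation: the name
   qp_encode_binary is hypothetical, and for simplicity SPACE and TAB
   are always hex-encoded, which the general rule permits and which
   avoids the trailing white space problem altogether.

```python
def qp_encode_binary(data, max_len=76):
    """Quoted-printable encode binary data (hypothetical helper).

    Octets 33-60 and 62-126 are written literally; every other octet,
    including CR, LF, SPACE, TAB and "=", becomes "=XX" with uppercase
    hexadecimal digits. Lines are joined with soft line breaks so that
    no encoded line exceeds max_len characters.
    """
    lines = []
    current = ""
    for octet in data:
        if 33 <= octet <= 126 and octet != 61:
            piece = chr(octet)
        else:
            piece = "=%02X" % octet
        # Keep one column free for the "=" that marks a soft line break.
        if len(current) + len(piece) > max_len - 1:
            lines.append(current + "=")
            current = ""
        current += piece
    lines.append(current)
    return "\r\n".join(lines)
```

   With this sketch, the six octets "a", "=", "b", CR, LF encode as
   "a=3Db=0D=0A": the CRLF in the data never becomes a hard line
   break, so only soft line breaks appear in the output.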

   For formalists, the syntax of quoted-printable data is described by
   the following grammar:

     quoted-printable := qp-line *(CRLF qp-line)

     qp-line := *(qp-segment transport-padding CRLF)
                qp-part transport-padding

     qp-part := qp-section
                ; Maximum length of 76 characters

     qp-segment := qp-section *(SPACE / TAB) "="
                   ; Maximum length of 76 characters

     qp-section := [*(ptext / SPACE / TAB) ptext]

     ptext := hex-octet / safe-char

     safe-char := <any octet with decimal value of 33 through
                  60 inclusive, and 62 through 126>
                  ; Characters not listed as "mail-safe" in
                  ; RFC 2049 are also not recommended.

     hex-octet := "=" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")
                  ; Octet must be used for characters > 127, =,
                  ; SPACEs or TABs at the ends of lines, and is
                  ; recommended for any character not listed in
                  ; RFC 2049 as "mail-safe".

     transport-padding := *LWSP-char
                          ; Composers MUST NOT generate
                          ; non-zero length transport
                          ; padding, but receivers MUST
                          ; be able to handle padding
                          ; added by message transports.

   IMPORTANT: The addition of LWSP between the elements shown in this
   BNF is NOT allowed since this BNF does not specify a structured
   header field.

6.8. Base64 Content-Transfer-Encoding

   The Base64 Content-Transfer-Encoding is designed to represent
   arbitrary sequences of octets in a form that need not be humanly
   readable. The encoding and decoding algorithms are simple, but the
   encoded data are consistently only about 33 percent larger than the
   unencoded data. This encoding is virtually identical to the one used
   in Privacy Enhanced Mail (PEM) applications, as defined in RFC 1421.
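
   The "about 33 percent" figure follows from the encoding emitting
   four characters for every three input octets. A small worked
   calculation may help; the name base64_body_size is hypothetical,
   and the sketch assumes that every encoded line, including the
   last, is terminated by a CRLF, which adds a little to the 4:3
   ratio.

```python
def base64_body_size(n_octets, line_len=76):
    """Estimate the encoded size in octets (hypothetical helper).

    Counts four output characters per group of up to three input
    octets, plus an assumed two-octet CRLF after each encoded line.
    """
    chars = 4 * ((n_octets + 2) // 3)
    lines = (chars + line_len - 1) // line_len
    return chars + 2 * lines
```

   Under these assumptions 57 input octets fill exactly one 76
   character line and occupy 78 octets on the wire, so the overhead
   including line breaks is somewhat above one third.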

   A 65-character subset of US-ASCII is used, enabling 6 bits to be
   represented per printable character. (The extra 65th character, "=",
   is used to signify a special processing function.)

   NOTE: This subset has the important property that it is represented
   identically in all versions of ISO 646, including US-ASCII, and all
   characters in the subset are also represented identically in all
   versions of EBCDIC. Other popular encodings, such as the encoding
   used by the uuencode utility, Macintosh binhex 4.0 [RFC-1741], and
   the base85 encoding specified as part of Level 2 PostScript, do not
   share these properties, and thus do not fulfill the portability
   requirements a binary transport encoding for mail must meet.

   The encoding process represents 24-bit groups of input bits as output
   strings of 4 encoded characters. Proceeding from left to right, a
   24-bit input group is formed by concatenating 3 8bit input groups.
   These 24 bits are then treated as 4 concatenated 6-bit groups, each
   of which is translated into a single digit in the base64 alphabet.
   When encoding a bit stream via the base64 encoding, the bit stream
   must be presumed to be ordered with the most-significant-bit first.
   That is, the first bit in the stream will be the high-order bit in
   the first 8bit byte, and the eighth bit will be the low-order bit in
   the first 8bit byte, and so on.

   Each 6-bit group is used as an index into an array of 64 printable
   characters. The character referenced by the index is placed in the
   output string. These characters, identified in Table 1, below, are
   selected so as to be universally representable, and the set excludes
   characters with particular significance to SMTP (e.g., ".", CR, LF)
   and to the multipart boundary delimiters defined in RFC 2046 (e.g.,
   "-").
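
   The grouping into 24-bit units and the table lookup can be sketched
   as follows. This is an illustration, not a normative algorithm:
   the names ALPHABET and base64_encode are hypothetical, the alphabet
   string is the one given in Table 1, and the handling of a short
   final group with "=" follows the padding rules given later in this
   section.

```python
# The 64 characters of Table 1, in index order (values 0 through 63).
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def base64_encode(data, line_len=76):
    """Encode octets as base64 text in lines of at most line_len chars."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Form a 24-bit group, first octet in the high-order bits;
        # a short final chunk is filled with zero bits on the right.
        group = int.from_bytes(chunk.ljust(3, b"\0"), "big")
        # Split into four 6-bit indexes, most significant first.
        chars = [ALPHABET[(group >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # One input octet yields 2 characters, two yield 3, three yield 4;
        # the remainder of the 4-character unit is "=" padding.
        keep = len(chunk) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    text = "".join(out)
    return "\r\n".join(text[i:i + line_len] for i in range(0, len(text), line_len))
```

   For example, the three octets of "Man" encode as "TWFu", while the
   shorter inputs "Ma" and "M" encode as "TWE=" and "TQ==".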

                    Table 1: The Base64 Alphabet

     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 A            17 R            34 i            51 z
         1 B            18 S            35 j            52 0
         2 C            19 T            36 k            53 1
         3 D            20 U            37 l            54 2
         4 E            21 V            38 m            55 3
         5 F            22 W            39 n            56 4
         6 G            23 X            40 o            57 5
         7 H            24 Y            41 p            58 6
         8 I            25 Z            42 q            59 7
         9 J            26 a            43 r            60 8
        10 K            27 b            44 s            61 9
        11 L            28 c            45 t            62 +
        12 M            29 d            46 u            63 /
        13 N            30 e            47 v
        14 O            31 f            48 w         (pad) =
        15 P            32 g            49 x
        16 Q            33 h            50 y

   The encoded output stream must be represented in lines of no more
   than 76 characters each. All line breaks or other characters not
   found in Table 1 must be ignored by decoding software. In base64
   data, characters other than those in Table 1, line breaks, and other
   white space probably indicate a transmission error, about which a
   warning message or even a message rejection might be appropriate
   under some circumstances.

   Special processing is performed if fewer than 24 bits are available
   at the end of the data being encoded. A full encoding quantum is
   always completed at the end of a body. When fewer than 24 input bits
   are available in an input group, zero bits are added (on the right)
   to form an integral number of 6-bit groups. Padding at the end of
   the data is performed using the "=" character.
   Since all base64 input is an integral number of octets, only the
   following cases can arise: (1) the final quantum of encoding input
   is an integral multiple of 24 bits; here, the final unit of encoded
   output will be an integral multiple of 4 characters with no "="
   padding, (2) the final quantum of encoding input is exactly 8 bits;
   here, the final unit of encoded output will be two characters
   followed by two "=" padding characters, or (3) the final quantum of
   encoding input is exactly 16 bits; here, the final unit of encoded
   output will be three characters followed by one "=" padding
   character.

   Because it is used only for padding at the end of the data, the
   occurrence of any "=" characters may be taken as evidence that the
   end of the data has been reached (without truncation in transit). No
   such assurance is possible, however, when the number of octets
   transmitted was a multiple of three and no "=" characters are
   present.

   Any characters outside of the base64 alphabet are to be ignored in
   base64-encoded data.

   Care must be taken to use the proper octets for line breaks if base64
   encoding is applied directly to text material that has not been
   converted to canonical form. In particular, text line breaks must be
   converted into CRLF sequences prior to base64 encoding. The
   important thing to note is that this may be done directly by the
   encoder rather than in a prior canonicalization step in some
   implementations.

   NOTE: There is no need to worry about quoting potential boundary
   delimiters within base64-encoded bodies within multipart entities
   because no hyphen characters are used in the base64 encoding.

7.
Content-ID Header Field

   In constructing a high-level user agent, it may be desirable to allow
   one body to make reference to another. Accordingly, bodies may be
   labelled using the "Content-ID" header field, which is syntactically
   identical to the "Message-ID" header field:

     id := "Content-ID" ":" msg-id

   Like the Message-ID values, Content-ID values must be generated to be
   world-unique.

   The Content-ID value may be used for uniquely identifying MIME
   entities in several contexts, particularly for caching data
   referenced by the message/external-body mechanism. Although the
   Content-ID header is generally optional, its use is MANDATORY in
   implementations which generate data of the optional MIME media type
   "message/external-body". That is, each message/external-body entity
   must have a Content-ID field to permit caching of such data.

   It is also worth noting that the Content-ID value has special
   semantics in the case of the multipart/alternative media type. This
   is explained in the section of RFC 2046 dealing with
   multipart/alternative.

8. Content-Description Header Field

   The ability to associate some descriptive information with a given
   body is often desirable. For example, it may be useful to mark an
   "image" body as "a picture of the Space Shuttle Endeavor." Such text
   may be placed in the Content-Description header field. This header
   field is always optional.

     description := "Content-Description" ":" *text

   The description is presumed to be given in the US-ASCII character
   set, although the mechanism specified in RFC 2047 may be used for
   non-US-ASCII Content-Description values.

9.
Additional MIME Header Fields

   Future documents may elect to define additional MIME header fields
   for various purposes. Any new header field that further describes
   the content of a message should begin with the string "Content-" to
   allow such fields which appear in a message header to be
   distinguished from ordinary RFC 822 message header fields.

     MIME-extension-field := <Any RFC 822 header field which
                              begins with the string
                              "Content-">

10. Summary

   Using the MIME-Version, Content-Type, and Content-Transfer-Encoding
   header fields, it is possible to include, in a standardized way,
   arbitrary types of data with RFC 822 conformant mail messages. No
   restrictions imposed by either RFC 821 or RFC 822 are violated, and
   care has been taken to avoid problems caused by additional
   restrictions imposed by the characteristics of some Internet mail
   transport mechanisms (see RFC 2049).

   The next document in this set, RFC 2046, specifies the initial set of
   media types that can be labelled and transported using these headers.

11. Security Considerations

   Security issues are discussed in the second document in this set, RFC
   2046.

12. Authors' Addresses

   For more information, the authors of this document are best contacted
   via Internet mail:

   Ned Freed
   Innosoft International, Inc.
   1050 East Garvey Avenue South
   West Covina, CA 91790
   USA

   Phone: +1 818 919 3600
   Fax: +1 818 919 3614
   EMail: ned@innosoft.com


   Nathaniel S.
Borenstein 1536 First Virtual Holdings 1537 25 Washington Avenue 1538 Morristown, NJ 07960 1539 USA 1540 1541 Phone: +1 201 540 8967 1542 Fax: +1 201 993 3032 1543 EMail: nsb@nsb.fv.com 1544 1545 1546 MIME is a result of the work of the Internet Engineering Task Force 1547 Working Group on RFC 822 Extensions. The chairman of that group, 1548 Greg Vaudreuil, may be reached at: 1549 1550 Gregory M. Vaudreuil 1551 Octel Network Services 1552 17080 Dallas Parkway 1553 Dallas, TX 75248-1905 1554 USA 1555 1556 EMail: Greg.Vaudreuil@Octel.Com 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 Freed & Borenstein Standards Track [Page 28] 1571 1572 RFC 2045 Internet Message Bodies November 1996 1573 1574 1575 Appendix A -- Collected Grammar 1576 1577 This appendix contains the complete BNF grammar for all the syntax 1578 specified by this document. 1579 1580 By itself, however, this grammar is incomplete. It refers by name to 1581 several syntax rules that are defined by RFC 822. Rather than 1582 reproduce those definitions here, and risk unintentional differences 1583 between the two, this document simply refers the reader to RFC 822 1584 for the remaining definitions. Wherever a term is undefined, it 1585 refers to the RFC 822 definition. 1586 1587 attribute := token 1588 ; Matching of attributes 1589 ; is ALWAYS case-insensitive. 1590 1591 composite-type := "message" / "multipart" / extension-token 1592 1593 content := "Content-Type" ":" type "/" subtype 1594 *(";" parameter) 1595 ; Matching of media type and subtype 1596 ; is ALWAYS case-insensitive. 

     description := "Content-Description" ":" *text

     discrete-type := "text" / "image" / "audio" / "video" /
                      "application" / extension-token

     encoding := "Content-Transfer-Encoding" ":" mechanism

     entity-headers := [ content CRLF ]
                       [ encoding CRLF ]
                       [ id CRLF ]
                       [ description CRLF ]
                       *( MIME-extension-field CRLF )

     extension-token := ietf-token / x-token

     hex-octet := "=" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")
                  ; Octet must be used for characters > 127, =,
                  ; SPACEs or TABs at the ends of lines, and is
                  ; recommended for any character not listed in
                  ; RFC 2049 as "mail-safe".

     iana-token := <A publicly-defined extension token. Tokens
                    of this form must be registered with IANA
                    as specified in RFC 2048.>

     ietf-token := <An extension token defined by a
                    standards-track RFC and registered
                    with IANA.>

     id := "Content-ID" ":" msg-id

     mechanism := "7bit" / "8bit" / "binary" /
                  "quoted-printable" / "base64" /
                  ietf-token / x-token

     MIME-extension-field := <Any RFC 822 header field which
                              begins with the string
                              "Content-">

     MIME-message-headers := entity-headers
                             fields
                             version CRLF
                             ; The ordering of the header
                             ; fields implied by this BNF
                             ; definition should be ignored.

     MIME-part-headers := entity-headers
                          [fields]
                          ; Any field not beginning with
                          ; "content-" can have no defined
                          ; meaning and may be ignored.
                          ; The ordering of the header
                          ; fields implied by this BNF
                          ; definition should be ignored.

     parameter := attribute "=" value

     ptext := hex-octet / safe-char

     qp-line := *(qp-segment transport-padding CRLF)
                qp-part transport-padding

     qp-part := qp-section
                ; Maximum length of 76 characters

     qp-section := [*(ptext / SPACE / TAB) ptext]

     qp-segment := qp-section *(SPACE / TAB) "="
                   ; Maximum length of 76 characters

     quoted-printable := qp-line *(CRLF qp-line)

     safe-char := <any octet with decimal value of 33 through
                  60 inclusive, and 62 through 126>
                  ; Characters not listed as "mail-safe" in
                  ; RFC 2049 are also not recommended.

     subtype := extension-token / iana-token

     token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
                 or tspecials>

     transport-padding := *LWSP-char
                          ; Composers MUST NOT generate
                          ; non-zero length transport
                          ; padding, but receivers MUST
                          ; be able to handle padding
                          ; added by message transports.

     tspecials := "(" / ")" / "<" / ">" / "@" /
                  "," / ";" / ":" / "\" / <"> /
                  "/" / "[" / "]" / "?" / "="
                  ; Must be in quoted-string,
                  ; to use within parameter values

     type := discrete-type / composite-type

     value := token / quoted-string

     version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT

     x-token := <The two characters "X-" or "x-" followed, with
                 no intervening white space, by any token>
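
   As an informal illustration of how the "content", "parameter",
   "token", "tspecials", and "value" rules fit together, the following
   Python sketch parses the value of a Content-Type field. It is not
   part of this specification. The names is_token and
   parse_content_type are hypothetical, and the sketch assumes an
   already-unfolded field value with no RFC 822 comments and no
   white space around "=".

```python
# Informal sketch only; helper names are hypothetical, not defined by MIME.
TSPECIALS = set('()<>@,;:\\"/[]?=')


def is_token(s):
    """True if s is 1*<US-ASCII CHAR except SPACE, CTLs, or tspecials>."""
    return bool(s) and all(32 < ord(c) < 127 and c not in TSPECIALS for c in s)


def parse_content_type(value):
    """Parse the text after 'Content-Type:' into (type, subtype, params).

    Assumes the field is already unfolded and carries no comments.
    """
    head, _, rest = value.partition(";")
    media_type, slash, subtype = head.strip().partition("/")
    if not (slash and is_token(media_type) and is_token(subtype)):
        raise ValueError("expected type/subtype, got %r" % head)

    params = {}
    i = 0
    while i < len(rest):
        if rest[i] in " \t\r\n;":          # skip separators between parameters
            i += 1
            continue
        eq = rest.find("=", i)
        if eq < 0:
            raise ValueError("parameter without '='")
        attribute = rest[i:eq].strip()
        if not is_token(attribute):
            raise ValueError("bad attribute %r" % attribute)
        i = eq + 1
        if i < len(rest) and rest[i] == '"':
            # value is a quoted-string: tspecials are allowed inside it
            i += 1
            chars = []
            while i < len(rest) and rest[i] != '"':
                if rest[i] == "\\" and i + 1 < len(rest):
                    i += 1                  # backslash quotes the next char
                chars.append(rest[i])
                i += 1
            if i >= len(rest):
                raise ValueError("unterminated quoted-string")
            i += 1                          # step over the closing quote
            val = "".join(chars)
        else:
            # value is a bare token: it runs to the next ";" or end of field
            end = rest.find(";", i)
            end = len(rest) if end < 0 else end
            val = rest[i:end].strip()
            if not is_token(val):
                raise ValueError("bad value %r for %s" % (val, attribute))
            i = end
        # attribute names match case-insensitively, so store them lowercased
        params[attribute.lower()] = val

    # media type and subtype also match case-insensitively
    return media_type.lower(), subtype.lower(), params
```

   The sketch lowercases the type, subtype, and attribute names because
   the grammar states that they match case-insensitively. Parameter
   values are returned unchanged, since their case sensitivity depends
   on the parameter. A value containing any tspecials character is
   accepted only inside a quoted-string, as the tspecials rule requires.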