Network Working Group                       N. Borenstein, Bellcore
Request for Comments: 1341                       N. Freed, Innosoft
                                                          June 1992


           MIME (Multipurpose Internet Mail Extensions):

             Mechanisms for Specifying and Describing
              the Format of Internet Message Bodies


Status of this Memo

   This RFC specifies an IAB standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements. Please refer to the current edition of the "IAB
   Official Protocol Standards" for the standardization state and
   status of this protocol. Distribution of this memo is unlimited.

Abstract

   RFC 822 defines a message representation protocol which specifies
   considerable detail about message headers, but which leaves the
   message content, or message body, as flat ASCII text. This
   document redefines the format of message bodies to allow
   multi-part textual and non-textual message bodies to be
   represented and exchanged without loss of information. This is
   based on earlier work documented in RFC 934 and RFC 1049, but
   extends and revises that work. Because RFC 822 said so little
   about message bodies, this document is largely orthogonal to
   (rather than a revision of) RFC 822.

   In particular, this document is designed to provide facilities to
   include multiple objects in a single message, to represent body
   text in character sets other than US-ASCII, to represent
   formatted multi-font text messages, to represent non-textual
   material such as images and audio fragments, and generally to
   facilitate later extensions defining new types of Internet mail
   for use by cooperating mail agents.

   This document does NOT extend Internet mail header fields to
   permit anything other than US-ASCII text data. It is recognized
   that such extensions are necessary, and they are the subject of a
   companion document [RFC-1342].

   A table of contents appears at the end of this document.


1  Introduction

   Since its publication in 1982, RFC 822 [RFC-822] has defined the
   standard format of textual mail messages on the Internet. Its
   success has been such that the RFC 822 format has been adopted,
   wholly or partially, well beyond the confines of the Internet and
   the Internet SMTP transport defined by RFC 821 [RFC-821]. As the
   format has seen wider use, a number of limitations have proven
   increasingly restrictive for the user community.

   RFC 822 was intended to specify a format for text messages. As
   such, non-text messages, such as multimedia messages that might
   include audio or images, are simply not mentioned. Even in the
   case of text, however, RFC 822 is inadequate for the needs of
   mail users whose languages require the use of character sets
   richer than US ASCII [US-ASCII]. Since RFC 822 does not specify
   mechanisms for mail containing audio, video, Asian language text,
   or even text in most European languages, additional
   specifications are needed.

   One of the notable limitations of RFC 821/822 based mail systems
   is the fact that they limit the contents of electronic mail
   messages to relatively short lines of seven-bit ASCII. This
   forces users to convert any non-textual data that they may wish
   to send into seven-bit bytes representable as printable ASCII
   characters before invoking a local mail UA (User Agent, a program
   with which human users send and receive mail). Examples of such
   encodings currently used in the Internet include pure
   hexadecimal, uuencode, the 3-in-4 base 64 scheme specified in RFC
   1113, the Andrew Toolkit Representation [ATK], and many others.

   The limitations of RFC 822 mail become even more apparent as
   gateways are designed to allow for the exchange of mail messages
   between RFC 822 hosts and X.400 hosts. X.400 [X400] specifies
   mechanisms for the inclusion of non-textual body parts within
   electronic mail messages. The current standards for the mapping
   of X.400 messages to RFC 822 messages specify that either X.400
   non-textual body parts should be converted to (not encoded in) an
   ASCII format, or that they should be discarded, notifying the RFC
   822 user that discarding has occurred. This is clearly
   undesirable, as information that a user may wish to receive is
   lost. Even though a user's UA may not have the capability of
   dealing with the non-textual body part, the user might have some
   mechanism external to the UA that can extract useful information
   from the body part. Moreover, it does not allow for the fact that
   the message may eventually be gatewayed back into an X.400
   message handling system (i.e., the X.400 message is "tunneled"
   through Internet mail), where the non-textual information would
   definitely become useful again.

   This document describes several mechanisms that combine to solve
   most of these problems without introducing any serious
   incompatibilities with the existing world of RFC 822 mail. In
   particular, it describes:

   1. A MIME-Version header field, which uses a version number to
      declare a message to be conformant with this specification and
      allows mail processing agents to distinguish between such
      messages and those generated by older or non-conformant
      software, which is presumed to lack such a field.

   2. A Content-Type header field, generalized from RFC 1049
      [RFC-1049], which can be used to specify the type and subtype
      of data in the body of a message and to fully specify the
      native representation (encoding) of such data.

      2.a. A "text" Content-Type value, which can be used to
           represent textual information in a number of character
           sets and formatted text description languages in a
           standardized manner.

      2.b. A "multipart" Content-Type value, which can be used to
           combine several body parts, possibly of differing types
           of data, into a single message.

      2.c. An "application" Content-Type value, which can be used to
           transmit application data or binary data, and hence,
           among other uses, to implement an electronic mail file
           transfer service.

      2.d. A "message" Content-Type value, for encapsulating a mail
           message.

      2.e. An "image" Content-Type value, for transmitting still
           image (picture) data.

      2.f. An "audio" Content-Type value, for transmitting audio or
           voice data.

      2.g. A "video" Content-Type value, for transmitting video or
           moving image data, possibly with audio as part of the
           composite video data format.

   3. A Content-Transfer-Encoding header field, which can be used to
      specify an auxiliary encoding that was applied to the data in
      order to allow it to pass through mail transport mechanisms
      which may have data or character set limitations.

   4. Two optional header fields that can be used to further
      describe the data in a message body, the Content-ID and
      Content-Description header fields.

   MIME has been carefully designed as an extensible mechanism, and
   it is expected that the set of content-type/subtype pairs and
   their associated parameters will grow significantly with time.
   Several other MIME fields, notably including character set names,
   are likely to have new values defined over time. In order to
   ensure that the set of such values is developed in an orderly,
   well-specified, and public manner, MIME defines a registration
   process which uses the Internet Assigned Numbers Authority (IANA)
   as a central registry for such values. Appendix F provides
   details about how IANA registration is accomplished.

   Finally, to specify and promote interoperability, Appendix A of
   this document provides a basic applicability statement for a
   subset of the above mechanisms that defines a minimal level of
   "conformance" with this document.

      HISTORICAL NOTE: Several of the mechanisms described in this
      document may seem somewhat strange or even baroque at first
      reading. It is important to note that compatibility with
      existing standards AND robustness across existing practice
      were two of the highest priorities of the working group that
      developed this document. In particular, compatibility was
      always favored over elegance.


2  Notations, Conventions, and Generic BNF Grammar

   This document is being published in two versions, one as plain
   ASCII text and one as PostScript. The latter is recommended,
   though the textual contents are identical. An Andrew-format copy
   of this document is also available from the first author
   (Borenstein).

   Although the mechanisms specified in this document are all
   described in prose, most are also described formally in the
   modified BNF notation of RFC 822. Implementors will need to be
   familiar with this notation in order to understand this
   specification, and are referred to RFC 822 for a complete
   explanation of the modified BNF notation.

   Some of the modified BNF in this document makes reference to
   syntactic entities that are defined in RFC 822 and not in this
   document. A complete formal grammar, then, is obtained by
   combining the collected grammar appendix of this document with
   that of RFC 822.

   The term CRLF, in this document, refers to the sequence of the
   two ASCII characters CR (13) and LF (10) which, taken together,
   in this order, denote a line break in RFC 822 mail.

   The term "character set", wherever it is used in this document,
   refers to a coded character set, in the sense of ISO character
   set standardization work, and must not be misinterpreted as
   meaning "a set of characters."

   The term "message", when not further qualified, means either the
   (complete or "top-level") message being transferred on a network,
   or a message encapsulated in a body of type "message".

   The term "body part", in this document, means one of the parts of
   the body of a multipart entity. A body part has a header and a
   body, so it makes sense to speak about the body of a body part.

   The term "entity", in this document, means either a message or a
   body part. All kinds of entities share the property that they
   have a header and a body.

   The term "body", when not further qualified, means the body of an
   entity, that is the body of either a message or of a body part.

      Note: the previous four definitions are clearly circular. This
      is unavoidable, since the overall structure of a MIME message
      is indeed recursive.

   In this document, all numeric and octet values are given in
   decimal notation.

   It must be noted that Content-Type values, subtypes, and
   parameter names as defined in this document are case-insensitive.
   However, parameter values are case-sensitive unless otherwise
   specified for the specific parameter.

      FORMATTING NOTE: This document has been carefully formatted
      for ease of reading. The PostScript version of this document,
      in particular, places notes like this one, which may be
      skipped by the reader, in a smaller, italicized, font, and
      indents it as well. In the text version, only the indentation
      is preserved, so if you are reading the text version of this
      you might consider using the PostScript version instead.
      However, all such notes will be indented and preceded by
      "NOTE:" or some similar introduction, even in the text
      version.

      The primary purpose of these non-essential notes is to convey
      information about the rationale of this document, or to place
      this document in the proper historical or evolutionary
      context. Such information may be skipped by those who are
      focused entirely on building a compliant implementation, but
      may be of use to those who wish to understand why this
      document is written as it is.

      For ease of recognition, all BNF definitions have been placed
      in a fixed-width font in the PostScript version of this
      document.


3  The MIME-Version Header Field

   Since RFC 822 was published in 1982, there has really been only
   one format standard for Internet messages, and there has been
   little perceived need to declare the format standard in use. This
   document is an independent document that complements RFC 822.
   Although the extensions in this document have been defined in
   such a way as to be compatible with RFC 822, there are still
   circumstances in which it might be desirable for a
   mail-processing agent to know whether a message was composed with
   the new standard in mind.

   Therefore, this document defines a new header field,
   "MIME-Version", which is to be used to declare the version of the
   Internet message body format standard in use.

   Messages composed in accordance with this document MUST include
   such a header field, with the following verbatim text:

      MIME-Version: 1.0

   The presence of this header field is an assertion that the
   message has been composed in compliance with this document.

   Since it is possible that a future document might extend the
   message format standard again, a formal BNF is given for the
   content of the MIME-Version field:

      MIME-Version := text

   Thus, future format specifiers, which might replace or extend
   "1.0", are (minimally) constrained by the definition of "text",
   which appears in RFC 822.

   Note that the MIME-Version header field is required at the top
   level of a message. It is not required for each body part of a
   multipart entity. It is required for the embedded headers of a
   body of type "message" if and only if the embedded message is
   itself claimed to be MIME-compliant.


4  The Content-Type Header Field

   The purpose of the Content-Type field is to describe the data
   contained in the body fully enough that the receiving user agent
   can pick an appropriate agent or mechanism to present the data to
   the user, or otherwise deal with the data in an appropriate
   manner.

      HISTORICAL NOTE: The Content-Type header field was first
      defined in RFC 1049. RFC 1049 Content-types used a simpler and
      less powerful syntax, but one that is largely compatible with
      the mechanism given here.

   The Content-Type header field is used to specify the nature of
   the data in the body of an entity, by giving type and subtype
   identifiers, and by providing auxiliary information that may be
   required for certain types. After the type and subtype names, the
   remainder of the header field is simply a set of parameters,
   specified in an attribute/value notation. The set of meaningful
   parameters differs for the different types. The ordering of
   parameters is not significant. Among the defined parameters is a
   "charset" parameter by which the character set used in the body
   may be declared. Comments are allowed in accordance with RFC 822
   rules for structured header fields.

   In general, the top-level Content-Type is used to declare the
   general type of data, while the subtype specifies a specific
   format for that type of data. Thus, a Content-Type of "image/xyz"
   is enough to tell a user agent that the data is an image, even if
   the user agent has no knowledge of the specific image format
   "xyz". Such information can be used, for example, to decide
   whether or not to show a user the raw data from an unrecognized
   subtype -- such an action might be reasonable for unrecognized
   subtypes of text, but not for unrecognized subtypes of image or
   audio. For this reason, registered subtypes of audio, image,
   text, and video, should not contain embedded information that is
   really of a different type. Such compound types should be
   represented using the "multipart" or "application" types.

   Parameters are modifiers of the content-subtype, and do not
   fundamentally affect the requirements of the host system.
   Although most parameters make sense only with certain
   content-types, others are "global" in the sense that they might
   apply to any subtype. For example, the "boundary" parameter makes
   sense only for the "multipart" content-type, but the "charset"
   parameter might make sense with several content-types.

   An initial set of seven Content-Types is defined by this
   document. This set of top-level names is intended to be
   substantially complete. It is expected that additions to the
   larger set of supported types can generally be accomplished by
   the creation of new subtypes of these initial types. In the
   future, more top-level types may be defined only by an extension
   to this standard. If another primary type is to be used for any
   reason, it must be given a name starting with "X-" to indicate
   its non-standard status and to avoid a potential conflict with a
   future official name.

   In the Extended BNF notation of RFC 822, a Content-Type header
   field value is defined as follows:

      Content-Type := type "/" subtype *[";" parameter]

      type :=   "application" / "audio"
              / "image"       / "message"
              / "multipart"   / "text"
              / "video"       / x-token

      x-token := <The two characters "X-" followed, with no
                  intervening white space, by any token>

      subtype := token

      parameter := attribute "=" value

      attribute := token

      value := token / quoted-string

      token := 1*<any CHAR except SPACE, CTLs, or tspecials>

      tspecials :=  "(" / ")" / "<" / ">" / "@"  ; Must be in
                 /  "," / ";" / ":" / "\" / <">  ; quoted-string,
                 /  "/" / "[" / "]" / "?" / "."  ; to use within
                 /  "="                          ; parameter values

   Note that the definition of "tspecials" is the same as the RFC
   822 definition of "specials" with the addition of the three
   characters "/", "?", and "=".

   Note also that a subtype specification is MANDATORY. There are no
   default subtypes.
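
   The grammar above can be exercised with a small parser. The
   following is a minimal Python sketch, not part of this
   specification: the function name parse_content_type and the
   regular expressions are illustrative assumptions, the
   quoted-string handling is simplified, and RFC 822 comments and
   header folding are not handled.

```python
import re

# tspecials exactly as listed in the grammar above (note that "." is included).
TSPECIALS = '()<>@,;:\\"/[]?.='

# token := 1*<any CHAR except SPACE, CTLs, or tspecials>
# i.e. printable US-ASCII (33..126) minus the tspecials.
TOKEN_CHARS = "".join(chr(c) for c in range(33, 127) if chr(c) not in TSPECIALS)
TOKEN = "[" + re.escape(TOKEN_CHARS) + "]+"

# Simplified quoted-string: double quotes with backslash escapes.
QUOTED = r'"(?:[^"\\]|\\.)*"'

HEAD = re.compile("(" + TOKEN + ")/(" + TOKEN + ")")
PARAM = re.compile(r"\s*;\s*(" + TOKEN + r")\s*=\s*(" + TOKEN + "|" + QUOTED + ")")

STANDARD_TYPES = {"application", "audio", "image", "message",
                  "multipart", "text", "video"}


def parse_content_type(value):
    """Return (type, subtype, params) for a Content-Type field value.

    Type, subtype and parameter names are lowercased because they are
    case-insensitive; parameter values are returned unchanged.
    """
    value = value.strip()
    head = HEAD.match(value)
    if head is None:
        # A subtype is mandatory; there are no default subtypes.
        raise ValueError("expected type/subtype")
    ctype, subtype = head.group(1).lower(), head.group(2).lower()
    is_x_token = ctype.startswith("x-") and len(ctype) > 2
    if ctype not in STANDARD_TYPES and not is_x_token:
        raise ValueError("unknown top-level type: " + ctype)

    params = {}
    pos = head.end()
    while True:
        match = PARAM.match(value, pos)
        if match is None:
            break
        name, raw = match.group(1).lower(), match.group(2)
        if raw.startswith('"'):
            # Strip the quotes and undo backslash escapes.
            raw = re.sub(r"\\(.)", r"\1", raw[1:-1])
        params[name] = raw
        pos = match.end()
    if value[pos:].strip():
        raise ValueError("unparsed text after parameters")
    return ctype, subtype, params


print(parse_content_type('TEXT/Plain; CHARSET="ISO-8859-1"'))
```

   Parameter order is discarded by storing parameters in a
   dictionary, which matches the statement that ordering is not
   significant.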

   The type, subtype, and parameter names are not case sensitive.
   For example, TEXT, Text, and TeXt are all equivalent. Parameter
   values are normally case sensitive, but certain parameters are
   interpreted to be case-insensitive, depending on the intended
   use. (For example, multipart boundaries are case-sensitive, but
   the "access-type" for message/External-body is not
   case-sensitive.)

   Beyond this syntax, the only constraint on the definition of
   subtype names is the desire that their uses must not conflict.
   That is, it would be undesirable to have two different
   communities using "Content-Type: application/foobar" to mean two
   different things. The process of defining new content-subtypes,
   then, is not intended to be a mechanism for imposing
   restrictions, but simply a mechanism for publicizing the usages.
   There are, therefore, two acceptable mechanisms for defining new
   Content-Type subtypes:

      1. Private values (starting with "X-") may be defined
         bilaterally between two cooperating agents without outside
         registration or standardization.

      2. New standard values must be documented, registered with,
         and approved by IANA, as described in Appendix F. Where
         intended for public use, the formats they refer to must
         also be defined by a published specification, and possibly
         offered for standardization.

   The seven standard initial predefined Content-Types are detailed
   in the bulk of this document. They are:

      text -- textual information. The primary subtype, "plain",
         indicates plain (unformatted) text. No special software is
         required to get the full meaning of the text, aside from
         support for the indicated character set. Subtypes are to be
         used for enriched text in forms where application software
         may enhance the appearance of the text, but such software
         must not be required in order to get the general idea of
         the content. Possible subtypes thus include any readable
         word processor format. A very simple and portable subtype,
         richtext, is defined in this document.

      multipart -- data consisting of multiple parts of independent
         data types. Four initial subtypes are defined, including
         the primary "mixed" subtype, "alternative" for representing
         the same data in multiple formats, "parallel" for parts
         intended to be viewed simultaneously, and "digest" for
         multipart entities in which each part is of type "message".

      message -- an encapsulated message. A body of Content-Type
         "message" is itself a fully formatted RFC 822 conformant
         message which may contain its own different Content-Type
         header field. The primary subtype is "rfc822". The
         "partial" subtype is defined for partial messages, to
         permit the fragmented transmission of bodies that are
         thought to be too large to be passed through mail transport
         facilities. Another subtype, "External-body", is defined
         for specifying large bodies by reference to an external
         data source.

      image -- image data. Image requires a display device (such as
         a graphical display, a printer, or a FAX machine) to view
         the information. Initial subtypes are defined for two
         widely-used image formats, jpeg and gif.

      audio -- audio data, with initial subtype "basic". Audio
         requires an audio output device (such as a speaker or a
         telephone) to "display" the contents.

      video -- video data. Video requires the capability to display
         moving images, typically including specialized hardware and
         software. The initial subtype is "mpeg".

      application -- some other kind of data, typically either
         uninterpreted binary data or information to be processed by
         a mail-based application. The primary subtype,
         "octet-stream", is to be used in the case of uninterpreted
         binary data, in which case the simplest recommended action
         is to offer to write the information into a file for the
         user. Two additional subtypes, "ODA" and "PostScript", are
         defined for transporting ODA and PostScript documents in
         bodies. Other expected uses for "application" include
         spreadsheets, data for mail-based scheduling systems, and
         languages for "active" (computational) email. (Note that
         active email entails several security considerations, which
         are discussed later in this memo, particularly in the
         context of application/PostScript.)

   Default RFC 822 messages are typed by this protocol as plain text
   in the US-ASCII character set, which can be explicitly specified
   as "Content-type: text/plain; charset=us-ascii". If no
   Content-Type is specified, either by error or by an older user
   agent, this default is assumed. In the presence of a MIME-Version
   header field, a receiving User Agent can also assume that plain
   US-ASCII text was the sender's intent. In the absence of a
   MIME-Version specification, plain US-ASCII text must still be
   assumed, but the sender's intent might have been otherwise.

      RATIONALE: In the absence of any Content-Type header field or
      MIME-Version header field, it is impossible to be certain that
      a message is actually text in the US-ASCII character set,
      since it might well be a message that, using the conventions
      that predate this document, includes text in another character
      set or non-textual data in a manner that cannot be
      automatically recognized (e.g., a uuencoded compressed UNIX
      tar file). Although there is no fully acceptable alternative
      to treating such untyped messages as "text/plain;
      charset=us-ascii", implementors should remain aware that if a
      message lacks both the MIME-Version and the Content-Type
      header fields, it may in practice contain almost anything.

   It should be noted that the list of Content-Type values given
   here may be augmented in time, via the mechanisms described
   above, and that the set of subtypes is expected to grow
   substantially.

   When a mail reader encounters mail with an unknown Content-type
   value, it should generally treat it as equivalent to
   "application/octet-stream", as described later in this document.


5  The Content-Transfer-Encoding Header Field

   Many Content-Types which could usefully be transported via email
   are represented, in their "natural" format, as 8-bit character or
   binary data. Such data cannot be transmitted over some transport
   protocols. For example, RFC 821 restricts mail messages to 7-bit
   US-ASCII data with 1000 character lines.

   It is necessary, therefore, to define a standard mechanism for
   re-encoding such data into a 7-bit short-line format. This
   document specifies that such encodings will be indicated by a new
   "Content-Transfer-Encoding" header field. The
   Content-Transfer-Encoding field is used to indicate the type of
   transformation that has been used in order to represent the body
   in an acceptable manner for transport.

   Unlike Content-Types, a proliferation of
   Content-Transfer-Encoding values is undesirable and unnecessary.
   However, establishing only a single Content-Transfer-Encoding
   mechanism does not seem possible. There is a tradeoff between the
   desire for a compact and efficient encoding of largely-binary
   data and the desire for a readable encoding of data that is
   mostly, but not entirely, 7-bit data. For this reason, at least
   two encoding mechanisms are necessary: a "readable" encoding and
   a "dense" encoding.

   The Content-Transfer-Encoding field is designed to specify an
   invertible mapping between the "native" representation of a type
   of data and a representation that can be readily exchanged using
   7 bit mail transport protocols, such as those defined by RFC 821
   (SMTP). This field has not been defined by any previous standard.
   The field's value is a single token specifying the type of
   encoding, as enumerated below. Formally:

      Content-Transfer-Encoding := "BASE64" / "QUOTED-PRINTABLE" /
                                   "8BIT"   / "7BIT" /
                                   "BINARY" / x-token

   These values are not case sensitive. That is, Base64 and BASE64
   and bAsE64 are all equivalent. An encoding type of 7BIT requires
   that the body is already in a seven-bit mail-ready
   representation. This is the default value -- that is,
   "Content-Transfer-Encoding: 7BIT" is assumed if the
   Content-Transfer-Encoding header field is not present.

   The values "8bit", "7bit", and "binary" all imply that NO
   encoding has been performed. However, they are potentially useful
   as indications of the kind of data contained in the object, and
   therefore of the kind of encoding that might need to be performed
   for transmission in a given transport system. "7bit" means that
   the data is all represented as short lines of US-ASCII data.
   "8bit" means that the lines are short, but there may be non-ASCII
   characters (octets with the high-order bit set). "Binary" means
   that not only may non-ASCII characters be present, but also that
   the lines are not necessarily short enough for SMTP transport.

   The difference between "8bit" (or any other conceivable bit-width
   token) and the "binary" token is that "binary" does not require
   adherence to any limits on line length or to the SMTP CRLF
   semantics, while the bit-width tokens do require such adherence.
   If the body contains data in any bit-width other than 7-bit, the
   appropriate bit-width Content-Transfer-Encoding token must be
   used (e.g., "8bit" for unencoded 8 bit wide data). If the body
   contains binary data, the "binary" Content-Transfer-Encoding
   token must be used.

      NOTE: The distinction between the Content-Transfer-Encoding
      values of "binary," "8bit," etc. may seem unimportant, in that
      all of them really mean "none" -- that is, there has been no
      encoding of the data for transport. However, clear labeling
      will be of enormous value to gateways between future mail
      transport systems with differing capabilities in transporting
      data that do not meet the restrictions of RFC 821 transport.

   As of the publication of this document, there are no standardized
   Internet transports for which it is legitimate to include
   unencoded 8-bit or binary data in mail bodies. Thus there are no
   circumstances in which the "8bit" or "binary"
   Content-Transfer-Encoding is actually legal on the Internet.
   However, in the event that 8-bit or binary mail transport becomes
   a reality in Internet mail, or when this document is used in
   conjunction with any other 8-bit or binary-capable transport
   mechanism, 8-bit or binary bodies should be labeled as such using
   this mechanism.
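
   The distinction among "7bit", "8bit", and "binary" described
   above can be sketched as a labeling function. This Python sketch
   is illustrative only: the name classify_body is hypothetical, and
   the line limit of 998 octets plus CRLF is an assumption derived
   from the "1000 character lines" figure quoted earlier in this
   section. Treating a bare CR or LF as a violation of the SMTP CRLF
   semantics is likewise an interpretation, not a rule stated here.

```python
# Assumption: a "short" line is at most 1000 octets including the CRLF.
MAX_LINE_OCTETS = 998


def classify_body(body: bytes) -> str:
    """Return the unencoded label ("7bit", "8bit" or "binary") for a body."""
    for line in body.split(b"\r\n"):
        # Over-long lines, or CR/LF outside a CRLF pair, do not adhere
        # to the line-length limit or CRLF semantics: label as binary.
        if len(line) > MAX_LINE_OCTETS or b"\r" in line or b"\n" in line:
            return "binary"
    # Short lines containing any octet with the high-order bit set.
    if any(octet > 127 for octet in body):
        return "8bit"
    # Short lines of US-ASCII data only.
    return "7bit"


print(classify_body(b"Hello, world\r\n"))
print(classify_body("caf\u00e9\r\n".encode("iso-8859-1")))
print(classify_body(b"x" * 2000))
```

   A sender might use such a label to decide whether base64 or
   quoted-printable must be applied before the body is handed to a
   7-bit transport.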
767 768 NOTE: The five values defined for the Content-Transfer- 769 Encoding field imply nothing about the Content-Type other 770 than the algorithm by which it was encoded or the transport 771 system requirements if unencoded. 772 773 Implementors may, if necessary, define new Content- 774 Transfer-Encoding values, but must use an x-token, which is 775 a name prefixed by "X-" to indicate its non-standard status, 776 777 778 779 Borenstein & Freed [Page 11] 780 781 782 783 784 RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 785 786 787 e.g., "Content-Transfer-Encoding: x-my-new-encoding". 788 However, unlike Content-Types and subtypes, the creation of 789 new Content-Transfer-Encoding values is explicitly and 790 strongly discouraged, as it seems likely to hinder 791 interoperability with little potential benefit. Their use 792 is allowed only as the result of an agreement between 793 cooperating user agents. 794 795 If a Content-Transfer-Encoding header field appears as part 796 of a message header, it applies to the entire body of that 797 message. If a Content-Transfer-Encoding header field 798 appears as part of a body part's headers, it applies only to 799 the body of that body part. If an entity is of type 800 "multipart" or "message", the Content-Transfer-Encoding is 801 not permitted to have any value other than a bit width 802 (e.g., "7bit", "8bit", etc.) or "binary". 803 804 It should be noted that email is character-oriented, so that 805 the mechanisms described here are mechanisms for encoding 806 arbitrary byte streams, not bit streams. If a bit stream is 807 to be encoded via one of these mechanisms, it must first be 808 converted to an 8-bit byte stream using the network standard 809 bit order ("big-endian"), in which the earlier bits in a 810 stream become the higher-order bits in a byte. A bit stream 811 not ending at an 8-bit boundary must be padded with zeroes. 
This document provides a mechanism for noting the addition of such
padding in the case of the application Content-Type, which has a
"padding" parameter.

The encoding mechanisms defined here explicitly encode all data in
ASCII. Thus, for example, suppose an entity has header fields such
as:

     Content-Type: text/plain; charset=ISO-8859-1
     Content-transfer-encoding: base64

This should be interpreted to mean that the body is a base64 ASCII
encoding of data that was originally in ISO-8859-1, and will be in
that character set again after decoding.

The following sections will define the two standard encoding
mechanisms. The definition of new content-transfer-encodings is
explicitly discouraged and should only occur when absolutely
necessary. The entire content-transfer-encoding namespace, except
for names beginning with "X-", is explicitly reserved to the IANA
for future use. Private agreements about content-transfer-encodings
are also explicitly discouraged.

Certain Content-Transfer-Encoding values may only be used on certain
Content-Types. In particular, it is expressly forbidden to use any
encodings other than "7bit", "8bit", or "binary" with any
Content-Type that recursively includes other Content-Type fields,
notably the "multipart" and "message" Content-Types. All encodings
that are desired for bodies of type multipart or message must be
done at the innermost level, by encoding the actual body that needs
to be encoded.
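
The restriction on "multipart" and "message" entities can be checked
mechanically. The following Python fragment is a minimal sketch, not
part of this specification: the function name cte_allowed is
hypothetical, and the case-insensitive comparison of the encoding
token is an assumption of the sketch.

```python
# Identity (non-transforming) encodings permitted on composite types.
IDENTITY_ENCODINGS = {"7bit", "8bit", "binary"}

# Content-Types that recursively include other Content-Type fields.
COMPOSITE_TYPES = {"multipart", "message"}


def cte_allowed(content_type: str, encoding: str) -> bool:
    """Return True if `encoding` may label an entity of `content_type`.

    Hypothetical helper: only the multipart/message restriction is
    checked; unknown encodings on other types are not rejected here.
    """
    # Keep only the top-level type, e.g. "multipart" from
    # "multipart/mixed; boundary=xyz".
    major = content_type.split(";", 1)[0].split("/", 1)[0].strip().lower()
    if major in COMPOSITE_TYPES:
        return encoding.strip().lower() in IDENTITY_ENCODINGS
    return True
```

A composing agent could call such a check before emitting headers,
and apply base64 or quoted-printable only to the innermost bodies.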

NOTE ON ENCODING RESTRICTIONS: Though the prohibition against using
content-transfer-encodings on data of type multipart or message may
seem overly restrictive, it is necessary to prevent nested
encodings, in which data are passed through an encoding algorithm
multiple times, and must be decoded multiple times in order to be
properly viewed. Nested encodings add considerable complexity to
user agents: aside from the obvious efficiency problems with such
multiple encodings, they can obscure the basic structure of a
message. In particular, they can imply that several decoding
operations are necessary simply to find out what types of objects a
message contains. Banning nested encodings may complicate the job
of certain mail gateways, but this seems less of a problem than the
effect of nested encodings on user agents.

NOTE ON THE RELATIONSHIP BETWEEN CONTENT-TYPE AND
CONTENT-TRANSFER-ENCODING: It may seem that the
Content-Transfer-Encoding could be inferred from the characteristics
of the Content-Type that is to be encoded, or, at the very least,
that certain Content-Transfer-Encodings could be mandated for use
with specific Content-Types. There are several reasons why this is
not the case. First, given the varying types of transports used for
mail, some encodings may be appropriate for some
Content-Type/transport combinations and not for others. (For
example, in an 8-bit transport, no encoding would be required for
text in certain character sets, while such encodings are clearly
required for 7-bit SMTP.) Second, certain Content-Types may require
different types of transfer encoding under different circumstances.
For example, many PostScript bodies might consist entirely of short
lines of 7-bit data and hence require little or no encoding.
Other PostScript bodies (especially those using Level 2 PostScript's
binary encoding mechanism) may only be reasonably represented using
a binary transport encoding. Finally, since Content-Type is intended
to be an open-ended specification mechanism, strict specification of
an association between Content-Types and encodings effectively
couples the specification of an application protocol with a specific
lower-level transport. This is not desirable since the developers
of a Content-Type should not have to be aware of all the transports
in use and what their limitations are.

NOTE ON TRANSLATING ENCODINGS: The quoted-printable and base64
encodings are designed so that conversion between them is possible.
The only issue that arises in such a conversion is the handling of
line breaks. When converting from quoted-printable to base64 a line
break must be converted into a CRLF sequence. Similarly, a CRLF
sequence in base64 data should be converted to a quoted-printable
line break, but ONLY when converting text data.

NOTE ON CANONICAL ENCODING MODEL: There was some confusion, in
earlier drafts of this memo, regarding the model for when email data
was to be converted to canonical form and encoded, and in particular
how this process would affect the treatment of CRLFs, given that the
representation of newlines varies greatly from system to system. For
this reason, a canonical model for encoding is presented as
Appendix H.

5.1 Quoted-Printable Content-Transfer-Encoding

The Quoted-Printable encoding is intended to represent data that
largely consists of octets that correspond to printable characters
in the ASCII character set.
It encodes the data in such a way that the resulting octets are
unlikely to be modified by mail transport. If the data being
encoded are mostly ASCII text, the encoded form of the data remains
largely recognizable by humans. A body which is entirely ASCII may
also be encoded in Quoted-Printable to ensure the integrity of the
data should the message pass through a character-translating, and/or
line-wrapping gateway.

In this encoding, octets are to be represented as determined by the
following rules:

     Rule #1: (General 8-bit representation) Any octet, except
     those indicating a line break according to the newline
     convention of the canonical form of the data being encoded,
     may be represented by an "=" followed by a two digit
     hexadecimal representation of the octet's value. The digits
     of the hexadecimal alphabet, for this purpose, are
     "0123456789ABCDEF". Uppercase letters must be used when
     sending hexadecimal data, though a robust implementation may
     choose to recognize lowercase letters on receipt. Thus, for
     example, the value 12 (ASCII form feed) can be represented by
     "=0C", and the value 61 (ASCII EQUAL SIGN) can be represented
     by "=3D". Except when the following rules allow an
     alternative encoding, this rule is mandatory.

     Rule #2: (Literal representation) Octets with decimal values
     of 33 through 60 inclusive, and 62 through 126, inclusive,
     MAY be represented as the ASCII characters which correspond
     to those octets (EXCLAMATION POINT through LESS THAN, and
     GREATER THAN through TILDE, respectively).

     Rule #3: (White Space): Octets with values of 9 and 32 MAY be
     represented as ASCII TAB (HT) and SPACE characters,
     respectively, but MUST NOT be so represented at the end of an
     encoded line.
     Any TAB (HT) or SPACE characters on an encoded line MUST thus
     be followed on that line by a printable character. In
     particular, an "=" at the end of an encoded line, indicating
     a soft line break (see rule #5) may follow one or more TAB
     (HT) or SPACE characters. It follows that an octet with
     value 9 or 32 appearing at the end of an encoded line must be
     represented according to Rule #1. This rule is necessary
     because some MTAs (Message Transport Agents, programs which
     transport messages from one user to another, or perform a
     part of such transfers) are known to pad lines of text with
     SPACEs, and others are known to remove "white space"
     characters from the end of a line. Therefore, when decoding
     a Quoted-Printable body, any trailing white space on a line
     must be deleted, as it will necessarily have been added by
     intermediate transport agents.

     Rule #4 (Line Breaks): A line break in a text body part,
     independent of what its representation is following the
     canonical representation of the data being encoded, must be
     represented by a (RFC 822) line break, which is a CRLF
     sequence, in the Quoted-Printable encoding. If isolated CRs
     and LFs, or LF CR and CR LF sequences are allowed to appear
     in binary data according to the canonical form, they must be
     represented using the "=0D", "=0A", "=0A=0D" and "=0D=0A"
     notations respectively.

     Note that many implementations may elect to encode the local
     representation of various content types directly. In
     particular, this may apply to plain text material on systems
     that use newline conventions other than CRLF delimiters.
     Such an implementation is permissible, but the generation of
     line breaks must be generalized to account for the case where
     alternate representations of newline sequences are used.

     Rule #5 (Soft Line Breaks): The Quoted-Printable encoding
     REQUIRES that encoded lines be no more than 76 characters
     long. If longer lines are to be encoded with the
     Quoted-Printable encoding, 'soft' line breaks must be used.
     An equal sign as the last character on an encoded line
     indicates such a non-significant ('soft') line break in the
     encoded text. Thus if the "raw" form of the line is a single
     unencoded line that says:

          Now's the time for all folk to come to the aid of
          their country.

     This can be represented, in the Quoted-Printable encoding, as

          Now's the time =
          for all folk to come=
           to the aid of their country.

     This provides a mechanism with which long lines are encoded
     in such a way as to be restored by the user agent. The 76
     character limit does not count the trailing CRLF, but counts
     all other characters, including any equal signs.

Since the hyphen character ("-") is represented as itself in the
Quoted-Printable encoding, care must be taken, when encapsulating a
quoted-printable encoded body in a multipart entity, to ensure that
the encapsulation boundary does not appear anywhere in the encoded
body. (A good strategy is to choose a boundary that includes a
character sequence such as "=_" which can never appear in a
quoted-printable body. See the definition of multipart messages
later in this document.)

NOTE: The quoted-printable encoding represents something of a
compromise between readability and reliability in transport.
Bodies encoded with the quoted-printable encoding will work reliably
over most mail gateways, but may not work perfectly over a few
gateways, notably those involving translation into EBCDIC. (In
theory, an EBCDIC gateway could decode a quoted-printable body and
re-encode it using base64, but such gateways do not yet exist.) A
higher level of confidence is offered by the base64
Content-Transfer-Encoding. A way to get reasonably reliable
transport through EBCDIC gateways is to also quote the ASCII
characters

     !"#$@[\]^`{|}~

according to rule #1. See Appendix B for more information.

Because quoted-printable data is generally assumed to be
line-oriented, it is to be expected that the breaks between the
lines of quoted printable data may be altered in transport, in the
same manner that plain text mail has always been altered in Internet
mail when passing between systems with differing newline
conventions. If such alterations are likely to constitute a
corruption of the data, it is probably more sensible to use the
base64 encoding rather than the quoted-printable encoding.

5.2 Base64 Content-Transfer-Encoding

The Base64 Content-Transfer-Encoding is designed to represent
arbitrary sequences of octets in a form that is not humanly
readable. The encoding and decoding algorithms are simple, but the
encoded data are consistently only about 33 percent larger than the
unencoded data. This encoding is based on the one used in Privacy
Enhanced Mail applications, as defined in RFC 1113. The base64
encoding is adapted from RFC 1113, with one change: base64
eliminates the "*" mechanism for embedded clear text.
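
The figure of about 33 percent follows from the fact that every
group of three input octets, including a partial final group,
becomes four output characters. The following Python fragment is a
minimal sketch of that arithmetic; the function names are
hypothetical, and the second function assumes that every encoded
line, including the last, is terminated by CRLF and that lines are
filled to the 76 character limit given later in this section.

```python
def base64_chars(n_octets: int) -> int:
    """Number of base64 characters (including "=" padding) produced
    for n_octets input octets, not counting line breaks."""
    return 4 * ((n_octets + 2) // 3)


def base64_wire_octets(n_octets: int, line_length: int = 76) -> int:
    """Encoded size including a CRLF after every line (an assumption
    of this sketch; lines are filled to line_length characters)."""
    chars = base64_chars(n_octets)
    lines = (chars + line_length - 1) // line_length
    return chars + 2 * lines


if __name__ == "__main__":
    for n in (1, 3, 57, 3000):
        print(n, base64_chars(n), base64_wire_octets(n))
```

Without line breaks the expansion is exactly 4/3 for inputs that are
a multiple of three octets; the CRLF sequences add a little more.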

A 65-character subset of US-ASCII is used, enabling 6 bits to be
represented per printable character. (The extra 65th character,
"=", is used to signify a special processing function.)

NOTE: This subset has the important property that it is represented
identically in all versions of ISO 646, including US ASCII, and all
characters in the subset are also represented identically in all
versions of EBCDIC. Other popular encodings, such as the encoding
used by the UUENCODE utility and the base85 encoding specified as
part of Level 2 PostScript, do not share these properties, and thus
do not fulfill the portability requirements a binary transport
encoding for mail must meet.

The encoding process represents 24-bit groups of input bits as
output strings of 4 encoded characters. Proceeding from left to
right, a 24-bit input group is formed by concatenating 3 8-bit input
groups. These 24 bits are then treated as 4 concatenated 6-bit
groups, each of which is translated into a single digit in the
base64 alphabet. When encoding a bit stream via the base64
encoding, the bit stream must be presumed to be ordered with the
most-significant-bit first. That is, the first bit in the stream
will be the high-order bit in the first byte, and the eighth bit
will be the low-order bit in the first byte, and so on.

Each 6-bit group is used as an index into an array of 64 printable
characters. The character referenced by the index is placed in the
output string. These characters, identified in Table 1, below, are
selected so as to be universally representable, and the set excludes
characters with particular significance to SMTP (e.g., ".", "CR",
"LF") and to the encapsulation boundaries defined in this document
(e.g., "-").
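
The grouping described above can be sketched in a few lines of
Python. This is an illustrative sketch rather than a reference
implementation: the names ALPHABET, encode_group and encode are
hypothetical, the output is not broken into lines, and the handling
of a short final group anticipates the "=" padding rules given later
in this section.

```python
import string

# The 64-character alphabet: A-Z, a-z, 0-9, "+", "/".
ALPHABET = (string.ascii_uppercase + string.ascii_lowercase
            + string.digits + "+/")


def encode_group(chunk: bytes) -> str:
    """Encode one input group of 1 to 3 octets as 4 characters."""
    n = len(chunk)
    # Concatenate the octets, most significant first, into 24 bits,
    # adding zero bits on the right when the group is short.
    value = int.from_bytes(chunk + b"\x00" * (3 - n), "big")
    # Split the 24 bits into four 6-bit indexes into the alphabet.
    chars = [ALPHABET[(value >> shift) & 0x3F]
             for shift in (18, 12, 6, 0)]
    # n octets carry data in n + 1 characters; the rest become "=".
    return "".join(chars[:n + 1]) + "=" * (3 - n)


def encode(data: bytes) -> str:
    """Encode a whole octet string, without inserting line breaks."""
    return "".join(encode_group(data[i:i + 3])
                   for i in range(0, len(data), 3))


if __name__ == "__main__":
    print(encode(b"Man"))   # TWFu
    print(encode(b"Ma"))    # TWE=
    print(encode(b"M"))     # TQ==
```

A real encoder would additionally break the output into lines of no
more than 76 characters.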

                  Table 1: The Base64 Alphabet

   Value Encoding  Value Encoding  Value Encoding  Value Encoding
       0 A            17 R            34 i            51 z
       1 B            18 S            35 j            52 0
       2 C            19 T            36 k            53 1
       3 D            20 U            37 l            54 2
       4 E            21 V            38 m            55 3
       5 F            22 W            39 n            56 4
       6 G            23 X            40 o            57 5
       7 H            24 Y            41 p            58 6
       8 I            25 Z            42 q            59 7
       9 J            26 a            43 r            60 8
      10 K            27 b            44 s            61 9
      11 L            28 c            45 t            62 +
      12 M            29 d            46 u            63 /
      13 N            30 e            47 v
      14 O            31 f            48 w         (pad) =
      15 P            32 g            49 x
      16 Q            33 h            50 y

The output stream (encoded bytes) must be represented in lines of no
more than 76 characters each. All line breaks or other characters
not found in Table 1 must be ignored by decoding software. In
base64 data, characters other than those in Table 1, line breaks,
and other white space probably indicate a transmission error, about
which a warning message or even a message rejection might be
appropriate under some circumstances.

Special processing is performed if fewer than 24 bits are available
at the end of the data being encoded. A full encoding quantum is
always completed at the end of a body. When fewer than 24 input
bits are available in an input group, zero bits are added (on the
right) to form an integral number of 6-bit groups. Output character
positions which are not required to represent actual input data are
set to the character "=".
Since all base64 input is an integral number of octets, only the
following cases can arise: (1) the final quantum of encoding input
is an integral multiple of 24 bits; here, the final unit of encoded
output will be an integral multiple of 4 characters with no "="
padding, (2) the final quantum of encoding input is exactly 8 bits;
here, the final unit of encoded output will be two characters
followed by two "=" padding characters, or (3) the final quantum of
encoding input is exactly 16 bits; here, the final unit of encoded
output will be three characters followed by one "=" padding
character.

Care must be taken to use the proper octets for line breaks if
base64 encoding is applied directly to text material that has not
been converted to canonical form. In particular, text line breaks
should be converted into CRLF sequences prior to base64 encoding.
The important thing to note is that this may be done directly by the
encoder rather than in a prior canonicalization step in some
implementations.

NOTE: There is no need to worry about quoting apparent encapsulation
boundaries within base64-encoded parts of multipart entities because
no hyphen characters are used in the base64 encoding.

6 Additional Optional Content- Header Fields

6.1 Optional Content-ID Header Field

In constructing a high-level user agent, it may be desirable to
allow one body to make reference to another. Accordingly, bodies
may be labeled using the "Content-ID" header field, which is
syntactically identical to the "Message-ID" header field:

     Content-ID := msg-id

Like the Message-ID values, Content-ID values must be generated to
be as unique as possible.
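
Because a Content-ID has the same syntax as a Message-ID, a
generator intended for Message-ID values can be reused for it. The
following Python fragment is a minimal sketch that relies on the
standard library function email.utils.make_msgid; the wrapper name
new_content_id and the domain "example.com" are placeholders chosen
for illustration only.

```python
from email.utils import make_msgid


def new_content_id(domain: str = "example.com") -> str:
    """Return a fresh msg-id string such as "<...@example.com>".

    make_msgid combines the current time, the process id and random
    bits, which makes collisions unlikely in practice.
    """
    return make_msgid(domain=domain)


if __name__ == "__main__":
    print("Content-ID:", new_content_id())
```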

6.2 Optional Content-Description Header Field

The ability to associate some descriptive information with a given
body is often desirable. For example, it may be useful to mark an
"image" body as "a picture of the Space Shuttle Endeavor." Such
text may be placed in the Content-Description header field.

     Content-Description := *text

The description is presumed to be given in the US-ASCII character
set, although the mechanism specified in [RFC-1342] may be used for
non-US-ASCII Content-Description values.

7 The Predefined Content-Type Values

This document defines seven initial Content-Type values and an
extension mechanism for private or experimental types. Further
standard types must be defined by new published specifications. It
is expected that most innovation in new types of mail will take
place as subtypes of the seven types defined here. The most
essential characteristics of the seven content-types are summarized
in Appendix G.

7.1 The Text Content-Type

The text Content-Type is intended for sending material which is
principally textual in form. It is the default Content-Type. A
"charset" parameter may be used to indicate the character set of the
body text. The primary subtype of text is "plain". This indicates
plain (unformatted) text. The default Content-Type for Internet
mail is "text/plain; charset=us-ascii".

Beyond plain text, there are many formats for representing what
might be known as "extended text" -- text with embedded formatting
and presentation information.
An interesting characteristic of many such representations is that
they are to some extent readable even without the software that
interprets them. It is useful, then, to distinguish them, at the
highest level, from such unreadable data as images, audio, or text
represented in an unreadable form. In the absence of appropriate
interpretation software, it is reasonable to show subtypes of text
to the user, while it is not reasonable to do so with most
nontextual data.

Such formatted textual data should be represented using subtypes of
text. Plausible subtypes of text are typically given by the common
name of the representation format, e.g., "text/richtext".

7.1.1 The charset parameter

A critical parameter that may be specified in the Content-Type field
for text data is the character set. This is specified with a
"charset" parameter, as in:

     Content-type: text/plain; charset=us-ascii

Unlike some other parameter values, the values of the charset
parameter are NOT case sensitive. The default character set, which
must be assumed in the absence of a charset parameter, is US-ASCII.

An initial list of predefined character set names can be found at
the end of this section. Additional character sets may be
registered with IANA as described in Appendix F, although the
standardization of their use requires the usual IAB review and
approval. Note that if the specified character set includes 8-bit
data, a Content-Transfer-Encoding header field and a corresponding
encoding on the data are required in order to transmit the body via
some mail transfer protocols, such as SMTP.
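
The two rules above, case-insensitive charset values and a default
of US-ASCII, can be illustrated with the Python standard library
"email" package. This is a minimal sketch: the function name
body_charset is hypothetical, and normalizing the value to lower
case is simply the behavior of Message.get_content_charset, not a
requirement of this document.

```python
from email import message_from_string


def body_charset(raw_message: str) -> str:
    """Return the charset of a message, lower-cased, defaulting to
    "us-ascii" when no charset parameter is present."""
    msg = message_from_string(raw_message)
    return msg.get_content_charset(failobj="us-ascii")


if __name__ == "__main__":
    labeled = "Content-type: text/plain; charset=ISO-8859-1\n\nbody\n"
    unlabeled = "Subject: no content-type here\n\nbody\n"
    print(body_charset(labeled))     # iso-8859-1
    print(body_charset(unlabeled))   # us-ascii
```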

The default character set, US-ASCII, has been the subject of some
confusion and ambiguity in the past. Not only were there some
ambiguities in the definition, there have been wide variations in
practice. In order to eliminate such ambiguity and variations in
the future, it is strongly recommended that new user agents
explicitly specify a character set via the Content-Type header
field. "US-ASCII" does not indicate an arbitrary seven-bit
character code, but specifies that the body uses character coding
that uses the exact correspondence of codes to characters specified
in ASCII. National use variations of ISO 646 [ISO-646] are NOT
ASCII and their use in Internet mail is explicitly discouraged. The
omission of the ISO 646 character set is deliberate in this regard.
The character set name of "US-ASCII" explicitly refers to ANSI
X3.4-1986 [US-ASCII] only. The character set name "ASCII" is
reserved and must not be used for any purpose.

NOTE: RFC 821 explicitly specifies "ASCII", and references an
earlier version of the American Standard. Insofar as one of the
purposes of specifying a Content-Type and character set is to permit
the receiver to unambiguously determine how the sender intended the
coded message to be interpreted, assuming anything other than
"strict ASCII" as the default would risk unintentional and
incompatible changes to the semantics of messages now being
transmitted. This also implies that messages containing characters
coded according to national variations on ISO 646, or using
code-switching procedures (e.g., those of ISO 2022), as well as
8-bit or multiple octet character encodings MUST use an appropriate
character set specification to be consistent with this
specification.

The complete US-ASCII character set is listed in [US-ASCII].
Note that the control characters including DEL (0-31, 127) have no
defined meaning apart from the combination CRLF (ASCII values 13 and
10) indicating a new line. Two of the characters have de facto
meanings in wide use: FF (12) often means "start subsequent text on
the beginning of a new page"; and TAB or HT (9) often (though not
always) means "move the cursor to the next available column after
the current position where the column number is a multiple of 8
(counting the first column as column 0)." Apart from this, any use
of the control characters or DEL in a body must be part of a private
agreement between the sender and recipient. Such private agreements
are discouraged and should be replaced by the other capabilities of
this document.

NOTE: Beyond US-ASCII, an enormous proliferation of character sets
is possible. It is the opinion of the IETF working group that a
large number of character sets is NOT a good thing. We would prefer
to specify a single character set that can be used universally for
representing all of the world's languages in electronic mail.
Unfortunately, existing practice in several communities seems to
point to the continued use of multiple character sets in the near
future. For this reason, we define names for a small number of
character sets for which a strong constituent base exists. It is
our hope that ISO 10646 or some other effort will eventually define
a single world character set which can then be specified for use in
Internet mail, but in advance of that definition we cannot specify
the use of ISO 10646, Unicode, or any other character set whose
definition is, as of this writing, incomplete.

The defined charset values are:

     US-ASCII -- as defined in [US-ASCII].

     ISO-8859-X -- where "X" is to be replaced, as necessary, for
          the parts of ISO-8859 [ISO-8859]. Note that the ISO 646
          character sets have deliberately been omitted in favor
          of their 8859 replacements, which are the designated
          character sets for Internet mail. As of the publication
          of this document, the legitimate values for "X" are the
          digits 1 through 9.

Note that the character set used, if anything other than US-ASCII,
must always be explicitly specified in the Content-Type field.

No other character set name may be used in Internet mail without the
publication of a formal specification and its registration with IANA
as described in Appendix F, or by private agreement, in which case
the character set name must begin with "X-".

Implementors are discouraged from defining new character sets for
mail use unless absolutely necessary.

The "charset" parameter has been defined primarily for the purpose
of textual data, and is described in this section for that reason.
However, it is conceivable that non-textual data might also wish to
specify a charset value for some purpose, in which case the same
syntax and values should be used.

In general, mail-sending software should always use the "lowest
common denominator" character set possible. For example, if a body
contains only US-ASCII characters, it should be marked as being in
the US-ASCII character set, not ISO-8859-1, which, like all the
ISO-8859 family of character sets, is a superset of US-ASCII.
More generally, if a widely-used character set is a subset of
another character set, and a body contains only characters in the
widely-used subset, it should be labeled as being in that subset.
This will increase the chances that the recipient will be able to
view the mail correctly.

7.1.2 The Text/plain subtype

The primary subtype of text is "plain". This indicates plain
(unformatted) text. The default Content-Type for Internet mail,
"text/plain; charset=us-ascii", describes existing Internet
practice, that is, it is the type of body defined by RFC 822.

7.1.3 The Text/richtext subtype

In order to promote the wider interoperability of simple formatted
text, this document defines an extremely simple subtype of "text",
the "richtext" subtype. This subtype was designed to meet the
following criteria:

     1. The syntax must be extremely simple to parse, so that even
     teletype-oriented mail systems can easily strip away the
     formatting information and leave only the readable text.

     2. The syntax must be extensible to allow for new formatting
     commands that are deemed essential.

     3. The capabilities must be extremely limited, to ensure that
     it can represent no more than is likely to be representable
     by the user's primary word processor. While this limits what
     can be sent, it increases the likelihood that what is sent
     can be properly displayed.

     4. The syntax must be compatible with SGML, so that, with an
     appropriate DTD (Document Type Definition, the standard
     mechanism for defining a document type using SGML), a general
     SGML parser could be made to parse richtext. However,
     despite this compatibility, the syntax should be far simpler
     than full SGML, so that no SGML knowledge is required in
     order to implement it.

The syntax of "richtext" is very simple.
It is assumed, at the top-level, to be in the US-ASCII character
set, unless of course a different charset parameter was specified in
the Content-type field. All characters represent themselves, with
the exception of the "<" character (ASCII 60), which is used to mark
the beginning of a formatting command. Formatting instructions
consist of formatting commands surrounded by angle brackets ("<>",
ASCII 60 and 62). Each formatting command may be no more than 40
characters in length, all in US-ASCII, restricted to the
alphanumeric and hyphen ("-") characters. Formatting commands may be
preceded by a forward slash or solidus ("/", ASCII 47), making them
negations, and such negations must always exist to balance the
initial opening commands, except as noted below. Thus, if the
formatting command "<bold>" appears at some point, there must later
be a "</bold>" to balance it. There are only three exceptions to
this "balancing" rule: First, the command "<lt>" is used to
represent a literal "<" character. Second, the command "<nl>" is
used to represent a required line break. (Otherwise, CRLFs in the
data are treated as equivalent to a single SPACE character.)
Finally, the command "<np>" is used to represent a page break.
(NOTE: The 40 character limit on formatting commands does not
include the "<", ">", or "/" characters that might be attached to
such commands.)

Initially defined formatting commands, not all of which will be
implemented by all richtext implementations, include:

     Bold -- causes the subsequent text to be in a bold font.
     Italic -- causes the subsequent text to be in an italic font.
     Fixed -- causes the subsequent text to be in a fixed width
          font.
     Smaller -- causes the subsequent text to be in a
          smaller font.
     Bigger -- causes the subsequent text to be in a bigger
          font.
     Underline -- causes the subsequent text to be
          underlined.
     Center -- causes the subsequent text to be centered.
     FlushLeft -- causes the subsequent text to be left
          justified.
     FlushRight -- causes the subsequent text to be right
          justified.
     Indent -- causes the subsequent text to be indented at
          the left margin.
     IndentRight -- causes the subsequent text to be
          indented at the right margin.
     Outdent -- causes the subsequent text to be outdented
          at the left margin.
     OutdentRight -- causes the subsequent text to be
          outdented at the right margin.
     SamePage -- causes the subsequent text to be grouped,
          if possible, on one page.
     Subscript -- causes the subsequent text to be
          interpreted as a subscript.
     Superscript -- causes the subsequent text to be
          interpreted as a superscript.
     Heading -- causes the subsequent text to be interpreted
          as a page heading.
     Footing -- causes the subsequent text to be interpreted
          as a page footing.
     ISO-8859-X (for any value of X that is legal as a
          "charset" parameter) -- causes the subsequent text
          to be interpreted as text in the appropriate
          character set.
     US-ASCII -- causes the subsequent text to be
          interpreted as text in the US-ASCII character set.
     Excerpt -- causes the subsequent text to be interpreted
          as a textual excerpt from another source.
          Typically this will be displayed using indentation
          and an alternate font, but such decisions are up
          to the viewer.
     Paragraph -- causes the subsequent text to be
          interpreted as a single paragraph, with
          appropriate paragraph breaks (typically blank
          space) before and after.
     Signature -- causes the subsequent text to be
          interpreted as a "signature". Some systems may
          wish to display signatures in a smaller font or
          otherwise set them apart from the main text of the
          message.
     Comment -- causes the subsequent text to be interpreted
          as a comment, and hence not shown to the reader.
     No-op -- has no effect on the subsequent text.
     lt -- <lt> is replaced by a literal "<" character. No
          balancing </lt> is allowed.
     nl -- <nl> causes a line break. No balancing </nl> is
          allowed.
     np -- <np> causes a page break. No balancing </np> is
          allowed.

Each positive formatting command affects all subsequent text
until the matching negative formatting command. Such pairs
of formatting commands must be properly balanced and nested.
Thus, a proper way to describe text in bold italics is:

     <bold><italic>the-text</italic></bold>

or, alternately,

     <italic><bold>the-text</bold></italic>

but, in particular, the following is illegal
richtext:

     <bold><italic>the-text</bold></italic>

NOTE: The nesting requirement for formatting commands
imposes a slightly higher burden upon the composers of
richtext bodies, but potentially simplifies richtext
displayers by allowing them to be stack-based. The main
goal of richtext is to be simple enough to make multifont,
formatted email widely readable, so that those with the
capability of sending it will be able to do so with
confidence.
Thus slightly increased complexity in the
composing software was deemed a reasonable tradeoff for
simplified reading software. Nonetheless, implementors of
richtext readers are encouraged to follow the general
Internet guidelines of being conservative in what you send
and liberal in what you accept. Those implementations that
can do so are encouraged to deal reasonably with improperly
nested richtext.

Implementations must regard any unrecognized formatting
command as equivalent to "No-op", thus facilitating future
extensions to "richtext". Private extensions may be defined
using formatting commands that begin with "X-", by analogy
to Internet mail header field names.

It is worth noting that no special behavior is required for
the TAB (HT) character. It is recommended, however, that, at
least when fixed-width fonts are in use, the common
semantics of the TAB (HT) character should be observed,
namely that it moves to the next column position that is a
multiple of 8. (In other words, if a TAB (HT) occurs in
column n, where the leftmost column is column 0, then that
TAB (HT) should be replaced by 8-(n mod 8) SPACE
characters.)

Richtext also differentiates between "hard" and "soft" line
breaks. A line break (CRLF) in the richtext data stream is
interpreted as a "soft" line break, one that is included
only for purposes of mail transport, and is to be treated as
white space by richtext interpreters. To include a "hard"
line break (one that must be displayed as such), the "<nl>"
or "<paragraph>" formatting constructs should be used. In
general, a soft line break should be treated as white space,
but when soft line breaks immediately follow a <nl> or a
</paragraph> tag they should be ignored rather than treated
as white space.
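
As an illustration only, the stack-based checking that the nesting
rule makes possible, and the TAB rule given above, might be sketched
in Python as follows. The function names check_nesting and
expand_tabs are hypothetical and are not defined by this document,
and the case-insensitive comparison of command names is an
assumption of the sketch.

```python
import re

# Commands that never take a balancing "</...>" form.
UNBALANCED = {"lt", "nl", "np"}

# A formatting command: optional "/", then 1 to 40 alphanumerics or hyphens.
COMMAND = re.compile(r"<(/?)([A-Za-z0-9-]{1,40})>")


def check_nesting(richtext):
    """Return True if all formatting commands are balanced and nested."""
    stack = []
    for match in COMMAND.finditer(richtext):
        negated = match.group(1) == "/"
        name = match.group(2).lower()  # assumption: names compare case-insensitively
        if name in UNBALANCED:
            if negated:
                return False  # "</lt>", "</nl>" and "</np>" are not allowed
            continue
        if not negated:
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False  # negation does not match the innermost open command
    return not stack  # anything left open is unbalanced


def expand_tabs(line):
    """Replace a TAB in column n (leftmost is 0) by 8-(n mod 8) SPACEs."""
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            width = 8 - column % 8
            out.append(" " * width)
            column += width
        else:
            out.append(ch)
            column += 1
    return "".join(out)
```

A tolerant reader would probably not reject a body for which
check_nesting returns False; the check is more naturally used by
composing software before a message is sent.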

Putting all this together, the following "text/richtext"
body fragment:

     <bold>Now</bold> is the time for
     <italic>all</italic> good men
     <smaller>(and <lt>women>)</smaller> to
     <ignoreme></ignoreme> come

     to the aid of their
     <nl>
     beloved <nl><nl>country. <comment> Stupid
     quote! </comment> -- the end

represents the following formatted text (which will, no
doubt, look cryptic in the text-only version of this
document):

     Now is the time for all good men (and <women>) to
     come to the aid of their
     beloved

     country. -- the end

Richtext conformance: A minimal richtext implementation is
one that simply converts "<lt>" to "<", converts CRLFs to
SPACE, converts <nl> to a newline according to local newline
convention, removes everything between a <comment> command
and the next balancing </comment> command, and removes all
other formatting commands (all text enclosed in angle
brackets).

NOTE ON THE RELATIONSHIP OF RICHTEXT TO SGML: Richtext is
decidedly not SGML, and must not be used to transport
arbitrary SGML documents. Those who wish to use SGML
document types as a mail transport format must define a new
text or application subtype, e.g., "text/sgml-dtd-whatever"
or "application/sgml-dtd-whatever", depending on the
perceived readability of the DTD in use. Richtext is
designed to be compatible with SGML, and specifically so
that it will be possible to define a richtext DTD if one is
needed.
However, this does not imply that arbitrary SGML
can be called richtext, nor that richtext implementors have
any need to understand SGML; the description in this
document is a complete definition of richtext, which is far
simpler than complete SGML.

NOTE ON THE INTENDED USE OF RICHTEXT: It is recognized that
implementors of future mail systems will want rich text
functionality far beyond that currently defined for
richtext. The intent of richtext is to provide a common
format for expressing that functionality in a form in which
much of it, at least, will be understood by interoperating
software. Thus, in particular, software with a richer
notion of formatted text than richtext can still use
richtext as its basic representation, but can extend it with
new formatting commands and by hiding information specific
to that software system in richtext comments. As such
systems evolve, it is expected that the definition of
richtext will be further refined by future published
specifications, but richtext as defined here provides a
platform on which evolutionary refinements can be based.

IMPLEMENTATION NOTE: In some environments, it might be
impossible to combine certain richtext formatting commands,
whereas in others they might be combined easily. For
example, the combination of <bold> and <italic> might
produce bold italics on systems that support such fonts, but
there exist systems that can make text bold or italicized,
but not both. In such cases, the most recently issued
recognized formatting command should be preferred.
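
The minimal conformance rules stated earlier in this section can
be sketched in a few lines of Python. This is an illustration
under stated assumptions, not a reference implementation: the name
richtext_to_plain is hypothetical, command names are assumed to
compare case-insensitively, and "\n" stands in for the local
newline convention. The refinement that a soft line break
immediately following <nl> is ignored is deliberately not
implemented.

```python
import re

# Either a formatting command (1 to 40 alphanumerics or hyphens, with an
# optional leading "/") or a line break in the data stream.
TOKEN = re.compile(r"<(/?[A-Za-z0-9-]{1,40})>|\r?\n")


def richtext_to_plain(data):
    """Apply only the minimal richtext conformance rules to data."""
    out = []
    comment_depth = 0  # greater than 0 while inside <comment> ... </comment>
    pos = 0
    for match in TOKEN.finditer(data):
        if not comment_depth:
            out.append(data[pos:match.start()])  # literal text before token
        pos = match.end()
        command = match.group(1)
        if command is None:  # a CRLF (soft line break) becomes a SPACE
            if not comment_depth:
                out.append(" ")
            continue
        command = command.lower()  # assumption: case-insensitive names
        if command == "comment":
            comment_depth += 1
        elif command == "/comment":
            comment_depth = max(0, comment_depth - 1)
        elif comment_depth:
            continue  # commands inside a comment are discarded with it
        elif command == "lt":
            out.append("<")
        elif command == "nl":
            out.append("\n")
        # every other formatting command is simply removed
    if not comment_depth:
        out.append(data[pos:])
    return "".join(out)
```

Tracking a depth rather than a flag is one way to honor the
phrase "the next balancing </comment>" when comments are nested.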

One of the major goals in the design of richtext was to make
it so simple that even text-only mailers will implement
richtext-to-plain-text translators, thus increasing the
likelihood that multifont text will become "safe" to use
very widely. To demonstrate this simplicity, an extremely
simple 35-line C program that converts richtext input into
plain text output is included in Appendix D.

7.2 The Multipart Content-Type

In the case of multiple part messages, in which one or more
different sets of data are combined in a single body, a
"multipart" Content-Type field must appear in the entity's
header. The body must then contain one or more "body parts,"
each preceded by an encapsulation boundary, and the last one
followed by a closing boundary. Each part starts with an
encapsulation boundary, and then contains a body part
consisting of a header area, a blank line, and a body area.
Thus a body part is similar to an RFC 822 message in syntax,
but different in meaning.

A body part is NOT to be interpreted as actually being an
RFC 822 message. To begin with, NO header fields are
actually required in body parts. A body part that starts
with a blank line, therefore, is allowed and is a body part
for which all default values are to be assumed. In such a
case, the absence of a Content-Type header field implies
that the encapsulation is plain US-ASCII text. The only
header fields that have defined meaning for body parts are
those the names of which begin with "Content-".
All other
header fields are generally to be ignored in body parts.
Although they should generally be retained in mail
processing, they may be discarded by gateways if necessary.
Such other fields are permitted to appear in body parts but
should not be depended on. "X-" fields may be created for
experimental or private purposes, with the recognition that
the information they contain may be lost at some gateways.

The distinction between an RFC 822 message and a body part
is subtle, but important. A gateway between Internet and
X.400 mail, for example, must be able to tell the difference
between a body part that contains an image and a body part
that contains an encapsulated message, the body of which is
an image. In order to represent the latter, the body part
must have "Content-Type: message", and its body (after the
blank line) must be the encapsulated message, with its own
"Content-Type: image" header field. The use of similar
syntax facilitates the conversion of messages to body parts,
and vice versa, but the distinction between the two must be
understood by implementors. (For the special case in which
all parts actually are messages, a "digest" subtype is also
defined.)

As stated previously, each body part is preceded by an
encapsulation boundary. The encapsulation boundary MUST NOT
appear inside any of the encapsulated parts. Thus, it is
crucial that the composing agent be able to choose and
specify the unique boundary that will separate the parts.

All present and future subtypes of the "multipart" type must
use an identical syntax.
Subtypes may differ in their
semantics, and may impose additional restrictions on syntax,
but must conform to the required syntax for the multipart
type. This requirement ensures that all conformant user
agents will at least be able to recognize and separate the
parts of any multipart entity, even of an unrecognized
subtype.

As stated in the definition of the Content-Transfer-Encoding
field, no encoding other than "7bit", "8bit", or "binary" is
permitted for entities of type "multipart". The multipart
delimiters and header fields are always 7-bit ASCII in any
case, and data within the body parts can be encoded on a
part-by-part basis, with Content-Transfer-Encoding fields
for each appropriate body part.

Mail gateways, relays, and other mail handling agents are
commonly known to alter the top-level header of an RFC 822
message. In particular, they frequently add, remove, or
reorder header fields. Such alterations are explicitly
forbidden for the body part headers embedded in the bodies
of messages of type "multipart."

7.2.1 Multipart: The common syntax

All subtypes of "multipart" share a common syntax, defined
in this section. A simple example of a multipart message
also appears in this section. An example of a more complex
multipart message is given in Appendix C.

The Content-Type field for multipart entities requires one
parameter, "boundary", which is used to specify the
encapsulation boundary. The encapsulation boundary is
defined as a line consisting entirely of two hyphen
characters ("-", decimal code 45) followed by the boundary
parameter value from the Content-Type header field.
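
As a small illustration, the encapsulation boundary line can be
derived from a Content-Type header field using the "email" package
of the Python standard library. The function name
encapsulation_line is hypothetical; message_from_string and
Message.get_boundary are existing library calls. The sketch
assumes the header text is available as a string.

```python
from email import message_from_string


def encapsulation_line(header_text):
    """Return the encapsulation boundary line for a multipart header."""
    message = message_from_string(header_text)
    boundary = message.get_boundary()  # None if there is no such parameter
    if boundary is None:
        raise ValueError("Content-Type has no boundary parameter")
    # Two hyphens followed by the boundary parameter value.
    return "--" + boundary


# Hypothetical sample header area, ending with the blank line.
SAMPLE_HEADER = (
    "MIME-Version: 1.0\r\n"
    "Content-Type: multipart/mixed;\r\n"
    " boundary=gc0p4Jq0M2Yt08jU534c0p\r\n"
    "\r\n"
)
print(encapsulation_line(SAMPLE_HEADER))
```

The library removes any double quotes around the parameter value,
so a quoted boundary yields the same kind of line.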

NOTE: The hyphens are for rough compatibility with the
earlier RFC 934 method of message encapsulation, and for
ease of searching for the boundaries in some
implementations. However, it should be noted that multipart
messages are NOT completely compatible with RFC 934
encapsulations; in particular, they do not obey RFC 934
quoting conventions for embedded lines that begin with
hyphens. This mechanism was chosen over the RFC 934
mechanism because the latter causes lines to grow with each
level of quoting. The combination of this growth with the
fact that SMTP implementations sometimes wrap long lines
made the RFC 934 mechanism unsuitable for use in the event
that deeply-nested multipart structuring is ever desired.

Thus, a typical multipart Content-Type header field might
look like this:

     Content-Type: multipart/mixed;
          boundary=gc0p4Jq0M2Yt08jU534c0p

This indicates that the entity consists of several parts,
each itself with a structure that is syntactically identical
to an RFC 822 message, except that the header area might be
completely empty, and that the parts are each preceded by
the line

     --gc0p4Jq0M2Yt08jU534c0p

Note that the encapsulation boundary must occur at the
beginning of a line, i.e., following a CRLF, and that that
initial CRLF is considered to be part of the encapsulation
boundary rather than part of the preceding part. The
boundary must be followed immediately either by another CRLF
and the header fields for the next part, or by two CRLFs, in
which case there are no header fields for the next part (and
it is therefore assumed to be of Content-Type text/plain).
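
The rules above can be sketched as a small Python function that
separates the body of a multipart entity into its parts. The name
split_parts is hypothetical. The sketch assumes that the body is a
string using CRLF line breaks, that the boundary does not occur
inside any part, and that a delimiter immediately followed by two
more hyphens is the closing boundary mentioned in section 7.2. It
does not strip white space that a gateway may have appended to a
boundary line.

```python
def split_parts(body, boundary):
    """Split a multipart body into (header_area, body_area) pairs."""
    # The CRLF that precedes "--boundary" belongs to the delimiter,
    # not to the preceding part.
    delimiter = "\r\n--" + boundary
    parts = []
    for chunk in body.split(delimiter)[1:]:  # element 0 is the preamble
        if chunk.startswith("--"):
            break  # closing delimiter: the rest is the epilogue
        if not chunk.startswith("\r\n"):
            raise ValueError("boundary not followed by CRLF")
        chunk = chunk[2:]
        if chunk.startswith("\r\n"):
            # Two CRLFs after the boundary: the part has no header fields.
            parts.append(("", chunk[2:]))
        else:
            header_area, _, body_area = chunk.partition("\r\n\r\n")
            parts.append((header_area, body_area))
    return parts
```

Because the leading CRLF is consumed with the delimiter, a part
that is meant to end with a line break keeps exactly one CRLF at
its end, and a part that is not meant to end with one keeps none.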

NOTE: The CRLF preceding the encapsulation line is
considered part of the boundary so that it is possible to
have a part that does not end with a CRLF (line break).
Body parts that must be considered to end with line breaks,
therefore, should have two CRLFs preceding the encapsulation
line, the first of which is part of the preceding body part,
and the second of which is part of the encapsulation
boundary.

The requirement that the encapsulation boundary begins with
a CRLF implies that the body of a multipart entity must
itself begin with a CRLF before the first encapsulation line
-- that is, if the "preamble" area is not used, the entity
headers must be followed by TWO CRLFs. This is indeed how
such entities should be composed. A tolerant mail reading
program, however, may interpret a body of type multipart
that begins with an encapsulation line NOT initiated by a
CRLF as also being an encapsulation boundary, but a
compliant mail sending program must not generate such
entities.

Encapsulation boundaries must not appear within the
encapsulations, and must be no longer than 70 characters,
not counting the two leading hyphens.

The encapsulation boundary following the last body part is a
distinguished delimiter that indicates that no further body
parts will follow. Such a delimiter is identical to the
previous delimiters, with the addition of two more hyphens
at the end of the line:

     --gc0p4Jq0M2Yt08jU534c0p--

There appears to be room for additional information prior to
the first encapsulation boundary and following the final
boundary.
These areas should generally be left blank, and
implementations should ignore anything that appears before
the first boundary or after the last one.

NOTE: These "preamble" and "epilogue" areas are not used
because of the lack of proper typing of these parts and the
lack of clear semantics for handling these areas at
gateways, particularly X.400 gateways.

NOTE: Because encapsulation boundaries must not appear in
the body parts being encapsulated, a user agent must
exercise care to choose a unique boundary. The boundary in
the example above could have been the result of an algorithm
designed to produce boundaries with a very low probability
of already existing in the data to be encapsulated without
having to prescan the data. Alternate algorithms might
result in more 'readable' boundaries for a recipient with an
old user agent, but would require more attention to the
possibility that the boundary might appear in the
encapsulated part. The simplest boundary possible is
something like "---", with a closing boundary of "-----".

As a very simple example, the following multipart message
has two parts, both of them plain text, one of them
explicitly typed and one of them implicitly typed:

     From: Nathaniel Borenstein <nsb@bellcore.com>
     To: Ned Freed <ned@innosoft.com>
     Subject: Sample message
     MIME-Version: 1.0
     Content-type: multipart/mixed; boundary="simple boundary"

     This is the preamble. It is to be ignored, though it
     is a handy place for mail composers to include an
     explanatory note to non-MIME compliant readers.
     --simple boundary

     This is implicitly typed plain ASCII text.
     It does NOT end with a linebreak.
     --simple boundary
     Content-type: text/plain; charset=us-ascii

     This is explicitly typed plain ASCII text.
     It DOES end with a linebreak.

     --simple boundary--
     This is the epilogue. It is also to be ignored.

The use of a Content-Type of multipart in a body part within
another multipart entity is explicitly allowed. In such
cases, for obvious reasons, care must be taken to ensure
that each nested multipart entity uses a different
boundary delimiter. See Appendix C for an example of nested
multipart entities.

The use of the multipart Content-Type with only a single
body part may be useful in certain contexts, and is
explicitly permitted.

The only mandatory parameter for the multipart Content-Type
is the boundary parameter, which consists of 1 to 70
characters from a set of characters known to be very robust
through email gateways, and NOT ending with white space.
(If a boundary appears to end with white space, the white
space must be presumed to have been added by a gateway, and
should be deleted.) It is formally specified by the
following BNF:

     boundary := 0*69<bchars> bcharsnospace

     bchars := bcharsnospace / " "

     bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" / "+" / "_"
                    / "," / "-" / "." / "/" / ":" / "=" / "?"

Overall, the body of a multipart entity may be specified as
follows:

     multipart-body := preamble 1*encapsulation
                       close-delimiter epilogue

     encapsulation := delimiter CRLF body-part

     delimiter := CRLF "--" boundary   ; taken from Content-Type field
                                       ; when content-type is multipart.
                                       ; There must be no space
                                       ; between "--" and boundary.

     close-delimiter := delimiter "--" ; Again, no space before "--"

     preamble := *text                 ; to be ignored upon receipt.

     epilogue := *text                 ; to be ignored upon receipt.

     body-part := <"message" as defined in RFC 822,
              with all header fields optional, and with the
              specified delimiter not occurring anywhere in
              the message body, either on a line by itself
              or as a substring anywhere. Note that the
              semantics of a part differ from the semantics
              of a message, as described in the text.>

NOTE: Conspicuously missing from the multipart type is a
notion of structured, related body parts. In general, it
seems premature to try to standardize interpart structure
yet. It is recommended that those wishing to provide a more
structured or integrated multipart messaging facility should
define a subtype of multipart that is syntactically
identical, but that always expects the inclusion of a
distinguished part that can be used to specify the structure
and integration of the other parts, probably referring to
them by their Content-ID field. If this approach is used,
other implementations will not recognize the new subtype,
but will treat it as the primary subtype (multipart/mixed)
and will thus be able to show the user the parts that are
recognized.

7.2.2 The Multipart/mixed (primary) subtype

The primary subtype for multipart, "mixed", is intended for
use when the body parts are independent and intended to be
displayed serially. Any multipart subtypes that an
implementation does not recognize should be treated as being
of subtype "mixed".

7.2.3 The Multipart/alternative subtype

The multipart/alternative type is syntactically identical to
multipart/mixed, but the semantics are different.
In
particular, each of the parts is an "alternative" version of
the same information. User agents should recognize that the
contents of the various parts are interchangeable. The user
agent should either choose the "best" type based on the
user's environment and preferences, or offer the user the
available alternatives. In general, choosing the best type
means displaying only the LAST part that can be displayed.
This may be used, for example, to send mail in a fancy text
format in such a way that it can easily be displayed
anywhere:

     From: Nathaniel Borenstein <nsb@bellcore.com>
     To: Ned Freed <ned@innosoft.com>
     Subject: Formatted text mail
     MIME-Version: 1.0
     Content-Type: multipart/alternative; boundary=boundary42


     --boundary42
     Content-Type: text/plain; charset=us-ascii

     ...plain text version of message goes here....

     --boundary42
     Content-Type: text/richtext

     .... richtext version of same message goes here ...
     --boundary42
     Content-Type: text/x-whatever

     .... fanciest formatted version of same message goes here ...
     --boundary42--

In this example, users whose mail system understood the
"text/x-whatever" format would see only the fancy version,
while other users would see only the richtext or plain text
version, depending on the capabilities of their system.

In general, user agents that compose multipart/alternative
entities should place the body parts in increasing order of
preference, that is, with the preferred format last. For
fancy text, the sending user agent should put the plainest
format first and the richest format last.
Receiving user
agents should pick and display the last format they are
capable of displaying. In the case where one of the
alternatives is itself of type "multipart" and contains
unrecognized sub-parts, the user agent may choose either to
show that alternative, an earlier alternative, or both.

NOTE: From an implementor's perspective, it might seem more
sensible to reverse this ordering, and have the plainest
alternative last. However, placing the plainest alternative
first is the friendliest possible option when
multipart/alternative entities are viewed using a
non-MIME-compliant mail reader. While this approach does impose
some burden on compliant mail readers, interoperability with
older mail readers was deemed to be more important in this
case.

It may be the case that some user agents, if they can
recognize more than one of the formats, will prefer to offer
the user the choice of which format to view. This makes
sense, for example, if mail includes both a nicely-formatted
image version and an easily-edited text version. What is
most critical, however, is that the user not automatically
be shown multiple versions of the same data. Either the
user should be shown the last recognized version or should
explicitly be given the choice.

7.2.4 The Multipart/digest subtype

This document defines a "digest" subtype of the multipart
Content-Type. This type is syntactically identical to
multipart/mixed, but the semantics are different. In
particular, in a digest, the default Content-Type value for
a body part is changed from "text/plain" to
"message/rfc822".
This is done to allow a more readable
digest format that is largely compatible (except for the
quoting convention) with RFC 934.

A digest in this format might, then, look something like
this:

     From: Moderator-Address
     MIME-Version: 1.0
     Subject: Internet Digest, volume 42
     Content-Type: multipart/digest;
          boundary="---- next message ----"


     ------ next message ----

     From: someone-else
     Subject: my opinion

     ...body goes here ...

     ------ next message ----

     From: someone-else-again
     Subject: my different opinion

     ... another body goes here...

     ------ next message ------

7.2.5 The Multipart/parallel subtype

This document defines a "parallel" subtype of the multipart
Content-Type. This type is syntactically identical to
multipart/mixed, but the semantics are different. In
particular, in a parallel entity, all of the parts are
intended to be presented in parallel, i.e., simultaneously,
on hardware and software that are capable of doing so.
Composing agents should be aware that many mail readers will
lack this capability and will show the parts serially in any
event.

7.3 The Message Content-Type

It is frequently desirable, in sending mail, to encapsulate
another mail message. For this common operation, a special
Content-Type, "message", is defined. The primary subtype,
message/rfc822, has no required parameters in the
Content-Type field. Additional subtypes, "partial" and
"External-body", do have required parameters. These subtypes
are explained below.

NOTE: It has been suggested that subtypes of message might
be defined for forwarded or rejected messages.
However,
forwarded and rejected messages can be handled as multipart
messages in which the first part contains any control or
descriptive information, and a second part, of type
message/rfc822, is the forwarded or rejected message.
Composing rejection and forwarding messages in this manner
will preserve the type information on the original message
and allow it to be correctly presented to the recipient, and
hence is strongly encouraged.

As stated in the definition of the Content-Transfer-Encoding
field, no encoding other than "7bit", "8bit", or "binary" is
permitted for messages or parts of type "message". The
message header fields are always US-ASCII in any case, and
data within the body can still be encoded, in which case the
Content-Transfer-Encoding header field in the encapsulated
message will reflect this. Non-ASCII text in the headers of
an encapsulated message can be specified using the
mechanisms described in [RFC-1342].

Mail gateways, relays, and other mail handling agents are
commonly known to alter the top-level header of an RFC 822
message. In particular, they frequently add, remove, or
reorder header fields. Such alterations are explicitly
forbidden for the encapsulated headers embedded in the
bodies of messages of type "message."

7.3.1 The Message/rfc822 (primary) subtype

A Content-Type of "message/rfc822" indicates that the body
contains an encapsulated message, with the syntax of an RFC
822 message.

7.3.2 The Message/Partial subtype

A subtype of message, "partial", is defined in order to
allow large objects to be delivered as several separate
pieces of mail and automatically reassembled by the
receiving user agent. (The concept is similar to IP
fragmentation/reassembly in the basic Internet Protocols.)
This mechanism can be used when intermediate transport agents limit
the size of individual messages that can be sent. Content-Type
"message/partial" thus indicates that the body contains a fragment
of a larger message.

Three parameters must be specified in the Content-Type field of type
message/partial: The first, "id", is a unique identifier, as close
to a world-unique identifier as possible, to be used to match the
parts together. (In general, the identifier is essentially a
message-id; if placed in double quotes, it can be any message-id, in
accordance with the BNF for "parameter" given earlier in this
specification.) The second, "number", an integer, is the part
number, which indicates where this part fits into the sequence of
fragments. The third, "total", another integer, is the total number
of parts. This third parameter is required on the final part, and is
optional on the earlier parts. Note also that these parameters may
be given in any order.

Thus, part 2 of a 3-part message may have either of the following
header fields:

     Content-Type: Message/Partial;
          number=2; total=3;
          id="oc=jpbe0M2Yt4s@thumper.bellcore.com"

     Content-Type: Message/Partial;
          id="oc=jpbe0M2Yt4s@thumper.bellcore.com";
          number=2

But part 3 MUST specify the total number of parts:

     Content-Type: Message/Partial;
          number=3; total=3;
          id="oc=jpbe0M2Yt4s@thumper.bellcore.com"

Note that part numbering begins with 1, not 0.

When the parts of a message broken up in this manner are put
together, the result is a complete RFC 822 format message, which may
have its own Content-Type header field, and thus may contain any
other data type.
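
As an illustration of the three parameters described above, the
following Python sketch builds the Content-Type field for each
fragment of a message. It is only a sketch under stated assumptions:
the function names are hypothetical, the message is cut at arbitrary
character offsets (a real agent would presumably cut at line
boundaries), and the id is assumed to contain no double quotes.

```python
from typing import List, Optional


def partial_content_type(part_id: str, number: int,
                         total: Optional[int] = None) -> str:
    """Build a message/partial Content-Type field (hypothetical helper)."""
    if number < 1:
        raise ValueError("part numbering begins with 1, not 0")
    if total is not None and number > total:
        raise ValueError("part number exceeds total")
    # "id" and "number" are always given; "total" is optional on all
    # parts except the final one.
    params = ['id="%s"' % part_id, "number=%d" % number]
    if total is not None:
        params.append("total=%d" % total)
    return "Content-Type: message/partial; " + "; ".join(params)


def fragment(message: str, part_id: str, chunk_size: int) -> List[str]:
    """Split a complete message into message/partial entities.

    Assumption: every part carries "total", which satisfies the rule
    that the final part must specify it.
    """
    chunks = [message[i:i + chunk_size]
              for i in range(0, len(message), chunk_size)] or [""]
    total = len(chunks)
    return [partial_content_type(part_id, n, total) + "\r\n\r\n" + chunk
            for n, chunk in enumerate(chunks, start=1)]
```

Calling fragment() with a ten-character message and a chunk size of
four would yield three entities numbered 1 through 3, each with
total=3.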

Message fragmentation and reassembly: The semantics of a reassembled
partial message must be those of the "inner" message, rather than of
a message containing the inner message. This makes it possible, for
example, to send a large audio message as several partial messages,
and still have it appear to the recipient as a simple audio message
rather than as an encapsulated message containing an audio message.
That is, the encapsulation of the message is considered to be
"transparent".

When generating and reassembling the parts of a message/partial
message, the headers of the encapsulated message must be merged with
the headers of the enclosing entities. In this process the following
rules must be observed:

     (1) All of the headers from the initial enclosing entity
     (part one), except "Message-ID" and those that start with
     "Content-", must be copied, in order, to the new message.

     (2) Only "Message-ID" and those headers in the enclosed
     message that start with "Content-" must be appended, in
     order, to the headers of the new message. All other headers
     in the enclosed message will be ignored.

     (3) All of the headers from the second and any subsequent
     messages will be ignored.
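
The three merging rules above can be sketched in a few lines of
Python. This is a minimal illustration, not a complete reassembler:
headers are assumed to be already parsed into (name, value) pairs,
and the function names are hypothetical.

```python
from typing import List, Tuple

Header = Tuple[str, str]


def _taken_from_inner(name: str) -> bool:
    """True for Message-ID and any header starting with "Content-"."""
    lower = name.lower()
    return lower.startswith("content-") or lower == "message-id"


def merge_partial_headers(outer: List[Header],
                          inner: List[Header]) -> List[Header]:
    """Merge the headers of part one with those of the enclosed message.

    Rule (1): copy the outer headers, in order, except Message-ID and
              Content-* headers.
    Rule (2): append only Message-ID and Content-* headers from the
              enclosed message, in order.
    Rule (3): headers of later parts are never passed in at all.
    """
    merged = [h for h in outer if not _taken_from_inner(h[0])]
    merged += [h for h in inner if _taken_from_inner(h[0])]
    return merged
```

Because the rules are applied strictly in order, this sketch places
the inner Message-ID and Content-* headers after all of the copied
outer headers.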

For example, if an audio message is broken into two parts, the first
part might look something like this:

     X-Weird-Header-1: Foo
     From: Bill@host.com
     To: joe@otherhost.com
     Subject: Audio mail
     Message-ID: <id1@host.com>
     MIME-Version: 1.0
     Content-type: message/partial;
          id="ABC@host.com";
          number=1; total=2

     X-Weird-Header-1: Bar
     X-Weird-Header-2: Hello
     Message-ID: <anotherid@foo.com>
     Content-type: audio/basic
     Content-transfer-encoding: base64

     ... first half of encoded audio data goes here...

and the second half might look something like this:

     From: Bill@host.com
     To: joe@otherhost.com
     Subject: Audio mail
     MIME-Version: 1.0
     Message-ID: <id2@host.com>
     Content-type: message/partial;
          id="ABC@host.com"; number=2; total=2

     ... second half of encoded audio data goes here...

Then, when the fragmented message is reassembled, the resulting
message to be displayed to the user should look something like this:

     X-Weird-Header-1: Foo
     From: Bill@host.com
     To: joe@otherhost.com
     Subject: Audio mail
     Message-ID: <anotherid@foo.com>
     MIME-Version: 1.0
     Content-type: audio/basic
     Content-transfer-encoding: base64

     ... first half of encoded audio data goes here...
     ... second half of encoded audio data goes here...

It should be noted that, because some message transfer agents may
choose to automatically fragment large messages, and because such
agents may use different fragmentation thresholds, it is possible
that the pieces of a partial message, upon reassembly, may
themselves turn out to comprise a partial message. This is
explicitly permitted.

It should also be noted that the inclusion of a "References" field
in the headers of the second and subsequent pieces of a fragmented
message that references the Message-Id on the previous piece may be
of benefit to mail readers that understand and track references.
However, the generation of such "References" fields is entirely
optional.

7.3.3 The Message/External-Body subtype

The external-body subtype indicates that the actual body data are
not included, but merely referenced. In this case, the parameters
describe a mechanism for accessing the external data.

When a message body or body part is of type "message/external-body",
it consists of a header, two consecutive CRLFs, and the message
header for the encapsulated message. If another pair of consecutive
CRLFs appears, this of course ends the message header for the
encapsulated message. However, since the encapsulated message's body
is itself external, it does NOT appear in the area that follows. For
example, consider the following message:

     Content-type: message/external-body;
          access-type=local-file;
          name="/u/nsb/Me.gif"

     Content-type: image/gif

     THIS IS NOT REALLY THE BODY!

The area at the end, which might be called the "phantom body", is
ignored for most external-body messages. However, it may be used to
contain auxiliary information for some such messages. Of the
access-types defined by this document, the phantom body is used only
when the access-type is "mail-server". In all other cases, the
phantom body is ignored.
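
The layout just described (encapsulated header, a pair of CRLFs,
then the phantom body) might be separated as in the following Python
sketch. The function names are hypothetical, and the input is
assumed to be the body of a message/external-body entity with CRLF
line endings.

```python
from typing import Dict, Tuple


def split_external_body(body: str) -> Tuple[Dict[str, str], str]:
    """Return (encapsulated header fields, phantom body).

    Field names are lowercased; folded continuation lines are joined
    to the field they continue.
    """
    head, _, phantom = body.partition("\r\n\r\n")
    fields: Dict[str, str] = {}
    last = None
    for line in head.split("\r\n"):
        if line[:1] in (" ", "\t") and last is not None:
            # Continuation of the previous header field.
            fields[last] += " " + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            last = name.strip().lower()
            fields[last] = value.strip()
    return fields, phantom


def uses_phantom_body(access_type: str) -> bool:
    """Of the access-types defined here, only mail-server uses it."""
    return access_type.lower() == "mail-server"
```

Applied to the example above, the encapsulated header yields a
content-type of image/gif, and the text "THIS IS NOT REALLY THE
BODY!" is returned as the phantom body, which a local-file reader
would then ignore.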

The only always-mandatory parameter for message/external-body is
"access-type"; all of the other parameters may be mandatory or
optional depending on the value of access-type.

     ACCESS-TYPE -- One or more case-insensitive words,
     comma-separated, indicating supported access mechanisms by
     which the file or data may be obtained. Values include, but
     are not limited to, "FTP", "ANON-FTP", "TFTP", "AFS",
     "LOCAL-FILE", and "MAIL-SERVER". Future values, except for
     experimental values beginning with "X-", must be registered
     with IANA, as described in Appendix F.

In addition, the following three parameters are optional for ALL
access-types:

     EXPIRATION -- The date (in the RFC 822 "date-time" syntax, as
     extended by RFC 1123 to permit 4 digits in the date field)
     after which the existence of the external data is not
     guaranteed.

     SIZE -- The size (in octets) of the data. The intent of this
     parameter is to help the recipient decide whether or not to
     expend the necessary resources to retrieve the external data.

     PERMISSION -- A field that indicates whether or not it is
     expected that clients might also attempt to overwrite the
     data. By default, or if permission is "read", the assumption
     is that they are not, and that if the data is retrieved once,
     it is never needed again. If PERMISSION is "read-write", this
     assumption is invalid, and any local copy must be considered
     no more than a cache. "Read" and "Read-write" are the only
     defined values of permission.

The precise semantics of the access-types defined here are described
in the sections that follow.

7.3.3.1 The "ftp" and "tftp" access-types

An access-type of FTP or TFTP indicates that the message body is
accessible as a file using the FTP [RFC-959] or TFTP [RFC-783]
protocols, respectively.
For these access-types, the following additional parameters are
mandatory:

     NAME -- The name of the file that contains the actual body
     data.

     SITE -- A machine from which the file may be obtained, using
     the given protocol.

Before the data is retrieved, using these protocols, the user will
generally need to be asked to provide a login id and a password for
the machine named by the site parameter.

In addition, the following optional parameters may also appear when
the access-type is FTP or ANON-FTP:

     DIRECTORY -- A directory from which the data named by NAME
     should be retrieved.

     MODE -- A transfer mode for retrieving the information, e.g.
     "image".

7.3.3.2 The "anon-ftp" access-type

The "anon-ftp" access-type is identical to the "ftp" access-type,
except that the user need not be asked to provide a name and
password for the specified site. Instead, the FTP protocol will be
used with login "anonymous" and a password that corresponds to the
user's email address.

7.3.3.3 The "local-file" and "afs" access-types

An access-type of "local-file" indicates that the actual body is
accessible as a file on the local machine. An access-type of "afs"
indicates that the file is accessible via the global AFS file
system. In both cases, only a single parameter is required:

     NAME -- The name of the file that contains the actual body
     data.

The following optional parameter may be used to describe the
locality of reference for the data, that is, the site or sites at
which the file is expected to be visible:

     SITE -- A domain specifier for a machine or set of machines
     that are known to have access to the data file.
     Asterisks may be used for wildcard matching to a part of a
     domain name, such as "*.bellcore.com", to indicate a set of
     machines on which the data should be directly visible, while
     a single asterisk may be used to indicate a file that is
     expected to be universally available, e.g., via a global file
     system.

7.3.3.4 The "mail-server" access-type

The "mail-server" access-type indicates that the actual body is
available from a mail server. The mandatory parameter for this
access-type is:

     SERVER -- The email address of the mail server from which the
     actual body data can be obtained.

Because mail servers accept a variety of syntax, some of which is
multiline, the full command to be sent to a mail server is not
included as a parameter on the content-type line. Instead, it may be
provided as the "phantom body" when the content-type is
message/external-body and the access-type is mail-server.

Note that MIME does not define a mail server syntax. Rather, it
allows the inclusion of arbitrary mail server commands in the
phantom body. Implementations should include the phantom body in the
body of the message they send to the mail server address to retrieve
the relevant data.
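
The mandatory parameters listed in sections 7.3.3.1 through 7.3.3.4
can be collected into a small table, as in the Python sketch below.
The names are hypothetical, and the sketch assumes that access-type
carries a single value rather than a comma-separated list and that
parameters have already been parsed into a dictionary.

```python
from typing import Dict, Set

# Mandatory parameters (besides access-type itself) for each
# access-type defined in this document.
REQUIRED_PARAMS: Dict[str, Set[str]] = {
    "ftp": {"name", "site"},
    "tftp": {"name", "site"},
    "anon-ftp": {"name", "site"},   # identical to ftp except for login
    "local-file": {"name"},
    "afs": {"name"},
    "mail-server": {"server"},
}


def missing_parameters(params: Dict[str, str]) -> Set[str]:
    """Return the mandatory parameters absent from an external-body field.

    Unregistered or experimental ("X-") access-types are assumed to
    have no requirements known to this sketch.
    """
    lowered = {name.lower(): value for name, value in params.items()}
    access_type = lowered.get("access-type")
    if access_type is None:
        return {"access-type"}
    required = REQUIRED_PARAMS.get(access_type.lower(), set())
    return required - set(lowered)
```

An empty result means that every parameter this document makes
mandatory for the given access-type is present.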

7.3.3.5 Examples and Further Explanations

With the emerging possibility of very wide-area file systems, it
becomes very hard to know in advance the set of machines where a
file will and will not be accessible directly from the file system.
Therefore it may make sense to provide both a file name, to be tried
directly, and the name of one or more sites from which the file is
known to be accessible. An implementation can try to retrieve remote
files using FTP or any other protocol, using anonymous file
retrieval or prompting the user for the necessary name and password.
If an external body is accessible via multiple mechanisms, the
sender may include multiple parts of type message/external-body
within an entity of type multipart/alternative.

However, the external-body mechanism is not intended to be limited
to file retrieval, as shown by the mail-server access-type. Beyond
this, one can imagine, for example, using a video server for
external references to video clips.

If an entity is of type "message/external-body", then the body of
the entity will contain the header fields of the encapsulated
message. The body itself is to be found in the external location.
This means that if the body of the "message/external-body" message
contains two consecutive CRLFs, everything after those pairs is NOT
part of the message itself. For most message/external-body messages,
this trailing area must simply be ignored.
However, it is a convenient place for additional data that cannot be
included in the content-type header field. In particular, if the
"access-type" value is "mail-server", then the trailing area must
contain commands to be sent to the mail server at the address given
by the value of the SERVER parameter.

The embedded message header fields which appear in the body of the
message/external-body data can be used to declare the Content-type
of the external body. Thus a complete message/external-body message,
referring to a document in PostScript format, might look like this:

     From: Whomever
     Subject: whatever
     MIME-Version: 1.0
     Message-ID: <id1@host.com>
     Content-Type: multipart/alternative; boundary=42


     --42
     Content-Type: message/external-body;
          name="BodyFormats.ps";
          site="thumper.bellcore.com";
          access-type=ANON-FTP;
          directory="pub";
          mode="image";
          expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

     Content-type: application/postscript

     --42
     Content-Type: message/external-body;
          name="/u/nsb/writing/rfcs/RFC-XXXX.ps";
          site="thumper.bellcore.com";
          access-type=AFS;
          expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

     Content-type: application/postscript

     --42
     Content-Type: message/external-body;
          access-type=mail-server;
          server="listserv@bogus.bitnet";
          expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

     Content-type: application/postscript

     get rfc-xxxx doc

     --42--

Like the message/partial type, the message/external-body type is
intended to be transparent, that is, to convey the data type in the
external body
rather than to convey a message with a body of that type. Thus the
headers on the outer and inner parts must be merged using the same
rules as for message/partial. In particular, this means that the
Content-type header is overridden, but the From and Subject headers
are preserved.

Note that since the external bodies are not transported as mail,
they need not conform to the 7-bit and line length requirements, but
might in fact be binary files. Thus a Content-Transfer-Encoding is
not generally necessary, though it is permitted.

Note that the body of a message of type "message/external-body" is
governed by the basic syntax for an RFC 822 message. In particular,
anything before the first consecutive pair of CRLFs is header
information, while anything after it is body information, which is
ignored for most access-types.

7.4 The Application Content-Type

The "application" Content-Type is to be used for data which do not
fit in any of the other categories, and particularly for data to be
processed by mail-based uses of application programs. This is
information which must be processed by an application before it is
viewable or usable to a user. Expected uses for Content-Type
application include mail-based file transfer, spreadsheets, data for
mail-based scheduling systems, and languages for "active"
(computational) email. (The latter, in particular, can pose security
problems which should be understood by implementors, and are
considered in detail in the discussion of the application/PostScript
content-type.)

For example, a meeting scheduler might define a standard
representation for information about proposed meeting dates.
An intelligent user agent would use this information to conduct a
dialog with the user, and might then send further mail based on that
dialog. More generally, there have been several "active" messaging
languages developed in which programs in a suitably specialized
language are sent through the mail and automatically run in the
recipient's environment.

Such applications may be defined as subtypes of the "application"
Content-Type. This document defines three subtypes: octet-stream,
ODA, and PostScript.

In general, the subtype of application will often be the name of the
application for which the data are intended. This does not mean,
however, that any application program name may be used freely as a
subtype of application. Such usages must be registered with IANA, as
described in Appendix F.

7.4.1 The Application/Octet-Stream (primary) subtype

The primary subtype of application, "octet-stream", may be used to
indicate that a body contains binary data. The set of possible
parameters includes, but is not limited to:

     NAME -- a suggested name for the binary data if stored as a
     file.

     TYPE -- the general type or category of binary data. This is
     intended as information for the human recipient rather than
     for any automatic processing.

     CONVERSIONS -- the set of operations that have been performed
     on the data before putting it in the mail (and before any
     Content-Transfer-Encoding that might have been applied).
     If multiple conversions have occurred, they must be separated
     by commas and specified in the order they were applied --
     that is, the leftmost conversion must have occurred first,
     and conversions are undone from right to left. Note that NO
     conversion values are defined by this document. Any
     conversion values that do not begin with "X-" must be
     preceded by a published specification and by registration
     with IANA, as described in Appendix F.

     PADDING -- the number of bits of padding that were appended
     to the bitstream comprising the actual contents to produce
     the enclosed byte-oriented data. This is useful for enclosing
     a bitstream in a body when the total number of bits is not a
     multiple of the byte size.

The values for these attributes are left undefined at present, but
may require specification in the future. An example of a common
(though UNIX-specific) usage might be:

     Content-Type: application/octet-stream;
          name=foo.tar.Z; type=tar;
          conversions="x-encrypt,x-compress"

However, it should be noted that the use of such conversions is
explicitly discouraged due to a lack of portability and
standardization. The use of uuencode is particularly discouraged, in
favor of the Content-Transfer-Encoding mechanism, which is both more
standardized and more portable across mail boundaries.

The recommended action for an implementation that receives
application/octet-stream mail is to simply offer to put the data in
a file, with any Content-Transfer-Encoding undone, or perhaps to use
it as input to a user-specified process.
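
The ordering rule for CONVERSIONS and the meaning of PADDING
described above can be made concrete with a short Python sketch. The
function names are hypothetical, and 8-bit bytes are assumed; no
actual conversion is performed, since this document defines none.

```python
from typing import List


def undo_order(conversions: str) -> List[str]:
    """List conversions in the order a receiver would undo them.

    The parameter lists conversions left to right in the order they
    were applied, so they are undone from right to left.
    """
    applied = [c.strip() for c in conversions.split(",") if c.strip()]
    return list(reversed(applied))


def payload_bit_length(data: bytes, padding: int) -> int:
    """Number of meaningful bits once PADDING bits are discounted."""
    if padding < 0 or padding > 8 * len(data):
        raise ValueError("padding out of range for this body")
    return 8 * len(data) - padding
```

For the example header above, undo_order() would report that
x-compress is undone first and x-encrypt second.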

To reduce the danger of transmitting rogue programs through the
mail, it is strongly recommended that implementations NOT implement
a path-search mechanism whereby an arbitrary program named in the
Content-Type parameter (e.g., an "interpreter=" parameter) is found
and executed using the mail body as input.

7.4.2 The Application/PostScript subtype

A Content-Type of "application/postscript" indicates a PostScript
program. The language is defined in [POSTSCRIPT]. It is recommended
that PostScript sent through email use the PostScript document
structuring conventions, correctly, if at all possible.

The execution of general-purpose PostScript interpreters entails
serious security risks, and implementors are discouraged from simply
sending PostScript email bodies to "off-the-shelf" interpreters.
While it is usually safe to send PostScript to a printer, where the
potential for harm is greatly constrained, implementors should
consider all of the following before they add interactive display of
PostScript bodies to their mail readers.

The remainder of this section outlines some, though probably not
all, of the possible problems with sending PostScript through the
mail.

Dangerous operations in the PostScript language include, but may not
be limited to, the PostScript operators deletefile, renamefile,
filenameforall, and file. File is only dangerous when applied to
something other than standard input or output. Implementations may
also define additional nonstandard file operators; these may also
pose a threat to security. Filenameforall, the wildcard file search
operator, may appear at first glance to be harmless.
Note, however, that this operator has the potential to reveal
information about what files the recipient has access to, and this
information may itself be sensitive. Message senders should avoid
the use of potentially dangerous file operators, since these
operators are quite likely to be unavailable in secure PostScript
implementations. Message-receiving and -displaying software should
either completely disable all potentially dangerous file operators
or take special care not to delegate any special authority to their
operation. These operators should be viewed as being done by an
outside agency when interpreting PostScript documents. Such
disabling and/or checking should be done completely outside of the
reach of the PostScript language itself; care should be taken to
ensure that no method exists for reenabling full-function versions
of these operators.

The PostScript language provides facilities for exiting the normal
interpreter, or server, loop. Changes made in this "outer"
environment are customarily retained across documents, and may in
some cases be retained semipermanently in nonvolatile memory. The
operators associated with exiting the interpreter loop have the
potential to interfere with subsequent document processing. As such,
their unrestrained use constitutes a threat of service denial.
PostScript operators that exit the interpreter loop include, but may
not be limited to, the exitserver and startjob operators.
Message-sending software should not generate PostScript that depends
on exiting the interpreter loop to operate. The ability to exit will
probably be unavailable in secure PostScript implementations.
Message-receiving and -displaying software should, if possible,
disable the ability to make retained changes to the PostScript
environment.
Eliminate the startjob and exitserver commands. If these commands
cannot be eliminated, at least set the password associated with them
to a hard-to-guess value.

PostScript provides operators for setting system-wide and
device-specific parameters. These parameter settings may be retained
across jobs and may potentially pose a threat to the correct
operation of the interpreter. The PostScript operators that set
system and device parameters include, but may not be limited to, the
setsystemparams and setdevparams operators. Message-sending software
should not generate PostScript that depends on the setting of system
or device parameters to operate correctly. The ability to set these
parameters will probably be unavailable in secure PostScript
implementations. Message-receiving and -displaying software should,
if possible, disable the ability to change system and device
parameters. If these operators cannot be disabled, at least set the
password associated with them to a hard-to-guess value.

Some PostScript implementations provide nonstandard facilities for
the direct loading and execution of machine code. Such facilities
are quite obviously open to substantial abuse. Message-sending
software should not make use of such features. Besides being totally
hardware-specific, they are also likely to be unavailable in secure
implementations of PostScript. Message-receiving and -displaying
software should not allow such operators to be used if they exist.

PostScript is an extensible language, and many, if not most,
implementations of it provide a number of their own extensions.
This document does not deal with such extensions explicitly since
they constitute an unknown factor. Message-sending software should
not make use of nonstandard extensions; they are likely to be
missing from some implementations. Message-receiving and -displaying
software should make sure that any nonstandard PostScript operators
are secure and don't present any kind of threat.

It is possible to write PostScript that consumes huge amounts of
various system resources. It is also possible to write PostScript
programs that loop infinitely. Both types of programs have the
potential to cause damage if sent to unsuspecting recipients.
Message-sending software should avoid the construction and
dissemination of such programs, which is antisocial.
Message-receiving and -displaying software should provide
appropriate mechanisms to abort processing of a document after a
reasonable amount of time has elapsed. In addition, PostScript
interpreters should be limited to the consumption of only a
reasonable amount of any given system resource.

Finally, bugs may exist in some PostScript interpreters which could
possibly be exploited to gain unauthorized access to a recipient's
system. Beyond noting this possibility, there is no specific action
to take to prevent it, other than the timely correction of such bugs
if any are found.

7.4.3 The Application/ODA subtype

The "ODA" subtype of application is used to indicate that a body
contains information encoded according to the Office Document
Architecture [ODA] standards, using the ODIF representation format.
For application/oda, the Content-Type line should also specify an
attribute/value pair that indicates the document application profile
(DAP), using the key word "profile". Thus an appropriate header
field might look like this:

     Content-Type: application/oda; profile=Q112

Consult the ODA standard [ODA] for further information.

7.5 The Image Content-Type

A Content-Type of "image" indicates that the body contains an image.
The subtype names the specific image format. These names are case
insensitive. Two initial subtypes are "jpeg" for the JPEG format,
JFIF encoding, and "gif" for GIF format [GIF].

The list of image subtypes given here is neither exclusive nor
exhaustive, and is expected to grow as more types are registered
with IANA, as described in Appendix F.

7.6 The Audio Content-Type

A Content-Type of "audio" indicates that the body contains audio
data. Although there is not yet a consensus on an "ideal" audio
format for use with computers, there is a pressing need for a format
capable of providing interoperable behavior.

The initial subtype of "basic" is specified to meet this requirement
by providing an absolutely minimal lowest common denominator audio
format. It is expected that richer formats for higher quality and/or
lower bandwidth audio will be defined by a later document.

The content of the "audio/basic" subtype is audio encoded using
8-bit ISDN u-law [PCM]. When this subtype is present, a sample rate
of 8000 Hz and a single channel are assumed.
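
Since audio/basic fixes the sample size, sample rate, and channel
count, the playing time of a decoded body follows directly from its
length. The Python sketch below shows this, together with the
commonly used expansion of one u-law byte to a 16-bit linear sample.
The expansion routine is not given in this document; it is included
here as an assumption about the encoding referenced by [PCM], and
the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 8000   # assumed by audio/basic
BYTES_PER_SAMPLE = 1    # 8-bit u-law, single channel


def duration_seconds(data: bytes) -> float:
    """Playing time of a decoded audio/basic body."""
    return len(data) / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


def ulaw_to_linear(byte: int) -> int:
    """Expand one u-law byte to a signed 16-bit linear sample.

    u-law bytes are stored bit-inverted: after inversion the top bit
    is the sign, the next three bits the segment (exponent), and the
    low four bits the mantissa.
    """
    u = ~byte & 0xFF
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude
```

At one byte per sample and 8000 samples per second, a 16000-byte
body corresponds to two seconds of audio.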

7.7 The Video Content-Type

A Content-Type of "video" indicates that the body contains a
time-varying-picture image, possibly with color and coordinated
sound. The term "video" is used extremely generically, rather than
with reference to any particular technology or format, and is not
meant to preclude subtypes such as animated drawings encoded
compactly. The subtype "mpeg" refers to video coded according to the
MPEG standard [MPEG].

Note that although in general this document strongly discourages the
mixing of multiple media in a single body, it is recognized that
many so-called "video" formats include a representation for
synchronized audio, and this is explicitly permitted for subtypes of
"video".

7.8 Experimental Content-Type Values

A Content-Type value beginning with the characters "X-" is a private
value, to be used by consenting mail systems by mutual agreement.
Any format without a rigorous and public definition must be named
with an "X-" prefix, and publicly specified values shall never begin
with "X-". (Older versions of the widely-used Andrew system use the
"X-BE2" name, so new systems should probably choose a different
name.)

In general, the use of "X-" top-level types is strongly discouraged.
Implementors should invent subtypes of the existing types whenever
possible. The invention of new types is intended to be restricted
primarily to the development of new media types for email, such as
digital odors or holography, and not for new data formats in
general. In many cases, a subtype of application will be more
appropriate than a new top-level type.
Summary

Using the MIME-Version, Content-Type, and Content-Transfer-Encoding
header fields, it is possible to include, in a standardized way,
arbitrary types of data objects with RFC 822 conformant mail
messages.  No restrictions imposed by either RFC 821 or RFC 822 are
violated, and care has been taken to avoid problems caused by
additional restrictions imposed by the characteristics of some
Internet mail transport mechanisms (see Appendix B).  The
"multipart" and "message" Content-Types allow mixing and
hierarchical structuring of objects of different types in a single
message.  Further Content-Types provide a standardized mechanism
for tagging messages or body parts as audio, image, or several
other kinds of data.  A distinguished parameter syntax allows
further specification of data format details, particularly the
specification of alternate character sets.  Additional optional
header fields provide mechanisms for certain extensions deemed
desirable by many implementors.  Finally, a number of useful
Content-Types are defined for general use by consenting user
agents, notably text/richtext, message/partial, and
message/external-body.
Acknowledgements

This document is the result of the collective effort of a large
number of people, at several IETF meetings, on the IETF-SMTP and
IETF-822 mailing lists, and elsewhere.  Although any enumeration
seems doomed to suffer from egregious omissions, the following are
among the many contributors to this effort:

    Harald Tveit Alvestrand      Timo Lehtinen
    Randall Atkinson             John R. MacMillan
    Philippe Brandon             Rick McGowan
    Kevin Carosso                Leo Mclaughlin
    Uhhyung Choi                 Goli Montaser-Kohsari
    Cristian Constantinof        Keith Moore
    Mark Crispin                 Tom Moore
    Dave Crocker                 Erik Naggum
    Terry Crowley                Mark Needleman
    Walt Daniels                 John Noerenberg
    Frank Dawson                 Mats Ohrman
    Hitoshi Doi                  Julian Onions
    Kevin Donnelly               Michael Patton
    Keith Edwards                David J. Pepper
    Chris Eich                   Blake C. Ramsdell
    Johnny Eriksson              Luc Rooijakkers
    Craig Everhart               Marshall T. Rose
    Patrik Faeltstroem           Jonathan Rosenberg
    Erik E. Fair                 Jan Rynning
    Roger Fajman                 Harri Salminen
    Alain Fontaine               Michael Sanderson
    James M. Galvin              Masahiro Sekiguchi
    Philip Gladstone             Mark Sherman
    Thomas Gordon                Keld Simonsen
    Phill Gross                  Bob Smart
    James Hamilton               Peter Speck
    Steve Hardcastle-Kille       Henry Spencer
    David Herron                 Einar Stefferud
    Bruce Howard                 Michael Stein
    Bill Janssen                 Klaus Steinberger
    Olle Jaernefors              Peter Svanberg
    Risto Kankkunen              James Thompson
    Phil Karn                    Steve Uhler
    Alan Katz                    Stuart Vance
    Tim Kehres                   Erik van der Poel
    Neil Katin                   Guido van Rossum
    Kyuho Kim                    Peter Vanderbilt
    Anders Klemets               Greg Vaudreuil
    John Klensin                 Ed Vielmetti
    Valdis Kletniek              Ryan Waldron
    Jim Knowles                  Wally Wedel
    Stev Knowles                 Sven-Ove Westberg
    Bob Kummerfeld               Brian Wideen
    Pekka Kytolaakso             John Wobus
    Stellan Lagerstr.m           Glenn Wright
    Vincent Lau                  Rayan Zachariassen
    Donald Lindsay               David Zimmerman

The authors apologize for any omissions from this list, which are
certainly unintentional.

Appendix A -- Minimal MIME-Conformance

The mechanisms described in this document are open-ended.  It is
definitely not expected that all implementations will support all
of the Content-Types described, nor that they will all share the
same extensions.
In order to promote interoperability, however, it is useful to
define the concept of "MIME-conformance": a certain level of
implementation that allows the useful interworking of messages with
content that differs from US ASCII text.  In this section, we
specify the requirements for such conformance.

A mail user agent that is MIME-conformant MUST:

1.  Always generate a "MIME-Version: 1.0" header field.

2.  Recognize the Content-Transfer-Encoding header field, and
    decode all received data encoded with either the
    quoted-printable or the base64 encoding.  Encode any data sent
    that is not in seven-bit mail-ready representation using one of
    these transformations and include the appropriate
    Content-Transfer-Encoding header field, unless the underlying
    transport mechanism supports non-seven-bit data, as SMTP does
    not.

3.  Recognize and interpret the Content-Type header field, and
    avoid showing users raw data with a Content-Type field other
    than text.  Be able to send at least text/plain messages, with
    the character set specified as a parameter if it is not
    US-ASCII.

4.  Explicitly handle the following Content-Type values, to at
    least the following extents:

    Text:
      -- Recognize and display "text" mail with the character set
         "US-ASCII."
      -- Recognize other character sets at least to the extent of
         being able to inform the user about what character set the
         message uses.
      -- Recognize the "ISO-8859-*" character sets to the extent of
         being able to display those characters that are common to
         ISO-8859-* and US-ASCII, namely all characters represented
         by octet values 0-127.
      -- For unrecognized subtypes, show or offer to show the user
         the "raw" version of the data.
         An ability at least to convert "text/richtext" to plain
         text, as shown in Appendix D, is encouraged, but not
         required for conformance.
    Message:
      -- Recognize and display at least the primary (822)
         encapsulation.
    Multipart:
      -- Recognize the primary (mixed) subtype.  Display all
         relevant information on the message level and the body
         part header level and then display or offer to display
         each of the body parts individually.
      -- Recognize the "alternative" subtype, and avoid showing the
         user redundant parts of multipart/alternative mail.
      -- Treat any unrecognized subtypes as if they were "mixed".
    Application:
      -- Offer the ability to remove either of the two types of
         Content-Transfer-Encoding defined in this document and put
         the resulting information in a user file.

5.  Upon encountering any unrecognized Content-Type, an
    implementation must treat it as if it had a Content-Type of
    "application/octet-stream" with no parameter sub-arguments.
    How such data are handled is up to an implementation, but
    likely options for handling such unrecognized data include
    offering to write it to a file for the user (decoded from its
    mail transport format) or offering to let the user name a
    program to which the decoded data should be passed as input.
    Unrecognized predefined types, which in a MIME-conformant
    mailer might still include audio, image, or video, should also
    be treated in this way.

A user agent that meets the above conditions is said to be
MIME-conformant.
The meaning of this phrase is that it is assumed to be "safe" to
send virtually any kind of properly-marked data to users of such
mail systems, because such systems will at least be able to treat
the data as undifferentiated binary, and will not simply splash it
onto the screen of unsuspecting users.  There is another sense in
which it is always "safe" to send data in a format that is
MIME-conformant, which is that such data will not break or be
broken by any known systems that are conformant with RFC 821 and
RFC 822.  User agents that are MIME-conformant have the additional
guarantee that the user will not be shown data that were never
intended to be viewed as text.

Appendix B -- General Guidelines For Sending Email Data

Internet email is not a perfect, homogeneous system.  Mail may
become corrupted at several stages in its travel to a final
destination.  Specifically, email sent throughout the Internet may
travel across many networking technologies.  Many networking and
mail technologies do not support the full functionality possible in
the SMTP transport environment.  Mail traversing these systems is
likely to be modified in such a way that it can be transported.

There exist many widely-deployed non-conformant MTAs in the
Internet.
These MTAs, speaking the SMTP protocol, alter messages on the fly
to take advantage of the internal data structure of the hosts they
are implemented on, or are just plain broken.

The following guidelines may be useful to anyone devising a data
format (Content-Type) that will survive the widest range of
networking technologies and known broken MTAs unscathed.  Note that
anything encoded in the base64 encoding will satisfy these rules,
but that some well-known mechanisms, notably the UNIX uuencode
facility, will not.  Note also that anything encoded in the
Quoted-Printable encoding will survive most gateways intact, but
possibly not some gateways to systems that use the EBCDIC character
set.

(1) Under some circumstances the encoding used for data may change
    as part of normal gateway or user agent operation.  In
    particular, conversion from base64 to quoted-printable and vice
    versa may be necessary.  This may result in the confusion of
    CRLF sequences with line breaks in text body parts.  As such,
    the persistence of CRLF as something other than a line break
    should not be relied on.

(2) Many systems may elect to represent and store text data using
    local newline conventions.  Local newline conventions may not
    match the RFC822 CRLF convention -- systems are known that use
    plain CR, plain LF, CRLF, or counted records.  The result is
    that isolated CR and LF characters are not well tolerated in
    general; they may be lost or converted to delimiters on some
    systems, and hence should not be relied on.

(3) TAB (HT) characters may be misinterpreted or may be
    automatically converted to variable numbers of spaces.  This is
    unavoidable in some environments, notably those not based on
    the ASCII character set.
    Such conversion is STRONGLY DISCOURAGED, but it may occur, and
    mail formats should not rely on the persistence of TAB (HT)
    characters.

(4) Lines longer than 76 characters may be wrapped or truncated in
    some environments.  Line wrapping and line truncation are
    STRONGLY DISCOURAGED, but unavoidable in some cases.
    Applications which require long lines should somehow
    differentiate between soft and hard line breaks.  (A simple way
    to do this is to use the quoted-printable encoding.)

(5) Trailing "white space" characters (SPACE, TAB (HT)) on a line
    may be discarded by some transport agents, while other
    transport agents may pad lines with these characters so that
    all lines in a mail file are of equal length.  The persistence
    of trailing white space, therefore, should not be relied on.

(6) Many mail domains use variations on the ASCII character set, or
    use character sets such as EBCDIC which contain most but not
    all of the US-ASCII characters.  The correct translation of
    characters not in the "invariant" set cannot be depended on
    across character converting gateways.  For example, this
    situation is a problem when sending uuencoded information
    across BITNET, an EBCDIC system.  Similar problems can occur
    without crossing a gateway, since many Internet hosts use
    character sets other than ASCII internally.  The definition of
    Printable Strings in X.400 adds further restrictions in certain
    special cases.
    In particular, the only characters that are known to be
    consistent across all gateways are the 73 characters that
    correspond to the upper and lower case letters A-Z and a-z, the
    10 digits 0-9, and the following eleven special characters:

        "'"  (ASCII code 39)
        "("  (ASCII code 40)
        ")"  (ASCII code 41)
        "+"  (ASCII code 43)
        ","  (ASCII code 44)
        "-"  (ASCII code 45)
        "."  (ASCII code 46)
        "/"  (ASCII code 47)
        ":"  (ASCII code 58)
        "="  (ASCII code 61)
        "?"  (ASCII code 63)

    A maximally portable mail representation, such as the base64
    encoding, will confine itself to relatively short lines of text
    in which the only meaningful characters are taken from this set
    of 73 characters.

Please note that the above list is NOT a list of recommended
practices for MTAs.  RFC 821 MTAs are prohibited from altering the
character of white space or wrapping long lines.  These BAD and
illegal practices are known to occur on established networks, and
implementations should be robust in dealing with the bad effects
they can cause.

Appendix C -- A Complex Multipart Example

What follows is the outline of a complex multipart message.
This message has five parts to be displayed serially: two
introductory plain text parts, an embedded multipart message, a
richtext part, and a closing encapsulated text message in a
non-ASCII character set.  The embedded multipart message has two
parts to be displayed in parallel, a picture and an audio fragment.

    MIME-Version: 1.0
    From: Nathaniel Borenstein <nsb@bellcore.com>
    Subject: A multipart example
    Content-Type: multipart/mixed;
         boundary=unique-boundary-1

    This is the preamble area of a multipart message.
    Mail readers that understand multipart format
    should ignore this preamble.
    If you are reading this text, you might want to
    consider changing to a mail reader that understands
    how to properly display multipart messages.
    --unique-boundary-1

    ...Some text appears here...
    [Note that the preceding blank line means
    no header fields were given and this is text,
    with charset US ASCII.  It could have been
    done with explicit typing as in the next part.]

    --unique-boundary-1
    Content-type: text/plain; charset=US-ASCII

    This could have been part of the previous part,
    but illustrates explicit versus implicit
    typing of body parts.

    --unique-boundary-1
    Content-Type: multipart/parallel;
         boundary=unique-boundary-2


    --unique-boundary-2
    Content-Type: audio/basic
    Content-Transfer-Encoding: base64

       ... base64-encoded 8000 Hz single-channel
           u-law-format audio data goes here....

    --unique-boundary-2
    Content-Type: image/gif
    Content-Transfer-Encoding: Base64

       ... base64-encoded image data goes here....

    --unique-boundary-2--

    --unique-boundary-1
    Content-type: text/richtext

    This is <bold><italic>richtext.</italic></bold>
    <nl><nl>Isn't it
    <bigger><bigger>cool?</bigger></bigger>

    --unique-boundary-1
    Content-Type: message/rfc822

    From: (name in US-ASCII)
    Subject: (subject in US-ASCII)
    Content-Type: Text/plain; charset=ISO-8859-1
    Content-Transfer-Encoding: Quoted-printable

       ... Additional text in ISO-8859-1 goes here ...

    --unique-boundary-1--

Appendix D -- A Simple Richtext-to-Text Translator in C

One of the major goals in the design of the richtext subtype of the
text Content-Type is to make formatted text so simple that even
text-only mailers will implement richtext-to-plain-text
translators, thus increasing the likelihood that multifont text
will become "safe" to use very widely.  To demonstrate this
simplicity, what follows is an extremely simple C program that
converts richtext input into plain text output:

    #include <stdio.h>
    #include <ctype.h>
    #include <string.h>

    int main(void) {
        int c, i;
        char token[50];

        while ((c = getc(stdin)) != EOF) {
            if (c == '<') {
                /* Read the command name, lower-cased, keeping
                   at most 49 characters. */
                for (i = 0; (i < 49 && (c = getc(stdin)) != '>'
                            && c != EOF); ++i) {
                    token[i] = isupper(c) ? tolower(c) : c;
                }
                if (c == EOF) break;
                /* Skip the rest of an over-long command. */
                if (c != '>') while ((c = getc(stdin)) != '>'
                                     && c != EOF) {;}
                if (c == EOF) break;
                token[i] = '\0';
                if (!strcmp(token, "lt")) {
                    putc('<', stdout);
                } else if (!strcmp(token, "nl")) {
                    putc('\n', stdout);
                } else if (!strcmp(token, "/paragraph")) {
                    fputs("\n\n", stdout);
                } else if (!strcmp(token, "comment")) {
                    /* Discard everything up to the matching
                       </comment>; comments may nest. */
                    int commct = 1;
                    while (commct > 0) {
                        while ((c = getc(stdin)) != '<'
                               && c != EOF) ;
                        if (c == EOF) break;
                        for (i = 0; (c = getc(stdin)) != '>'
                                    && c != EOF; ++i) {
                            /* Never write past the buffer. */
                            if (i < 49)
                                token[i] = isupper(c) ?
                                           tolower(c) : c;
                        }
                        if (c == EOF) break;
                        token[i < 49 ? i : 49] = '\0';
                        if (!strcmp(token, "/comment"))
                            --commct;
                        if (!strcmp(token, "comment"))
                            ++commct;
                    }
                } /* Ignore all other tokens */
            } else if (c != '\n') putc(c, stdout);
        }
        putc('\n', stdout); /* for good measure */
        return 0;
    }

It should be noted that one can do considerably better than this in
displaying richtext data on a dumb terminal.  In particular, one
can replace font information such as "bold" with textual emphasis
(like *this* or _T_H_I_S_).  One can also properly handle the
richtext formatting commands regarding indentation, justification,
and others.  However, the above program is all that is necessary in
order to present richtext on a dumb terminal.
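The remark about textual emphasis can be sketched as a
string-to-string variant of the translator.  This is an
illustration only, not a replacement for the program above: the
function names are hypothetical, rendering "bold" as "*" and
"italic" as "_" is merely one possible choice, and, like the
program above, it drops newlines in the input and ignores all other
commands.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy the lower-cased command name that starts at p (just after
 * '<') into tok, truncating to fit.  Returns a pointer just past
 * the closing '>', or NULL if the command is unterminated. */
static const char *read_token(const char *p, char *tok, size_t toksz)
{
    size_t i = 0;

    while (*p != '\0' && *p != '>') {
        if (i + 1 < toksz)
            tok[i++] = (char)tolower((unsigned char)*p);
        ++p;
    }
    tok[i] = '\0';
    return (*p == '>') ? p + 1 : NULL;
}

/* Convert richtext in `in` to plain text in `out`, writing at most
 * outsz - 1 characters plus a terminating NUL.  Returns the number
 * of characters written. */
size_t richtext_to_plain(const char *in, char *out, size_t outsz)
{
    char tok[50];
    size_t n = 0;
    int depth = 0;                  /* nesting depth of <comment> */
    const char *p = in;

    if (outsz == 0)
        return 0;
    while (*p != '\0') {
        const char *emit = NULL;
        char one[2] = { 0, 0 };

        if (*p == '<') {
            p = read_token(p + 1, tok, sizeof tok);
            if (p == NULL)
                break;              /* unterminated command */
            if (!strcmp(tok, "comment")) { ++depth; continue; }
            if (!strcmp(tok, "/comment")) {
                if (depth > 0) --depth;
                continue;
            }
            if (depth > 0) continue;
            if (!strcmp(tok, "lt")) emit = "<";
            else if (!strcmp(tok, "nl")) emit = "\n";
            else if (!strcmp(tok, "/paragraph")) emit = "\n\n";
            else if (!strcmp(tok, "bold") || !strcmp(tok, "/bold"))
                emit = "*";         /* assumed emphasis marker */
            else if (!strcmp(tok, "italic") || !strcmp(tok, "/italic"))
                emit = "_";         /* assumed emphasis marker */
            else continue;          /* ignore all other commands */
        } else {
            one[0] = *p++;
            if (depth > 0 || one[0] == '\n') continue;
            emit = one;
        }
        while (*emit != '\0' && n + 1 < outsz)
            out[n++] = *emit++;
    }
    out[n] = '\0';
    return n;
}
```

Working on strings rather than on stdin keeps the sketch easy to
test; a mail reader would more likely stream the body part as the
program above does.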
Appendix E -- Collected Grammar

This appendix contains the complete BNF grammar for all the syntax
specified by this document.

By itself, however, this grammar is incomplete.  It refers to
several entities that are defined by RFC 822.  Rather than
reproduce those definitions here, and risk unintentional
differences between the two, this document simply refers the reader
to RFC 822 for the remaining definitions.  Wherever a term is
undefined, it refers to the RFC 822 definition.

    attribute := token

    body-part := <"message" as defined in RFC 822,
                 with all header fields optional, and with the
                 specified delimiter not occurring anywhere in
                 the message body, either on a line by itself
                 or as a substring anywhere.>

    boundary := 0*69<bchars> bcharsnospace

    bchars := bcharsnospace / " "

    bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" / "+" / "_"
                   / "," / "-" / "." / "/" / ":" / "=" / "?"

    close-delimiter := delimiter "--"

    Content-Description := *text

    Content-ID := msg-id

    Content-Transfer-Encoding := "BASE64" / "QUOTED-PRINTABLE" /
                                 "8BIT"   / "7BIT" /
                                 "BINARY" / x-token

    Content-Type := type "/" subtype *[";" parameter]

    delimiter := CRLF "--" boundary  ; taken from Content-Type
                                     ; field when content-type
                                     ; is multipart.
                                     ; There should be no space
                                     ; between "--" and boundary.

    encapsulation := delimiter CRLF body-part

    epilogue := *text                ; to be ignored upon receipt.

    MIME-Version := 1*text

    multipart-body := preamble 1*encapsulation close-delimiter
                      epilogue

    parameter := attribute "=" value

    preamble := *text                ; to be ignored upon receipt.

    subtype := token

    token := 1*<any CHAR except SPACE, CTLs, or tspecials>

    tspecials := "(" / ")" / "<" / ">" / "@"   ; Must be in
               / "," / ";" / ":" / "\" / <">   ; quoted-string,
               / "/" / "[" / "]" / "?" / "."   ; to use within
               / "="                           ; parameter values

    type := "application" / "audio"            ; case-insensitive
          / "image"       / "message"
          / "multipart"   / "text"
          / "video"       / x-token

    value := token / quoted-string

    x-token := <The two characters "X-" followed, with no
               intervening white space, by any token>

Appendix F -- IANA Registration Procedures

MIME has been carefully designed to have extensible mechanisms, and
it is expected that the set of content-type/subtype pairs and their
associated parameters will grow significantly with time.  Several
other MIME fields, notably character set names, access-type
parameters for the message/external-body type, conversions
parameters for the application type, and possibly even
Content-Transfer-Encoding values, are likely to have new values
defined over time.
In order to ensure that the set of such values is developed in an
orderly, well-specified, and public manner, MIME defines a
registration process which uses the Internet Assigned Numbers
Authority (IANA) as a central registry for such values.

In general, parameters in the content-type header field are used to
convey supplemental information for various content types, and
their use is defined when the content-type and subtype are defined.
New parameters should not be defined as a way to introduce new
functionality.

In order to simplify and standardize the registration process, this
appendix gives templates for the registration of new values with
IANA.  Each of these is given in the form of an email message
template, to be filled in by the registering party.

F.1 Registration of New Content-type/subtype Values

Note that MIME is generally expected to be extended by subtypes.
If a new fundamental top-level type is needed, its specification
should be published as an RFC or submitted in a form suitable to
become an RFC, and be subject to the Internet standards process.

    To: IANA@isi.edu
    Subject: Registration of new MIME content-type/subtype

    MIME type name:

    (If the above is not an existing top-level MIME type,
    please explain why an existing type cannot be used.)

    MIME subtype name:

    Required parameters:

    Optional parameters:

    Encoding considerations:

    Security considerations:

    Published specification:

    (The published specification must be an Internet RFC or
    RFC-to-be if a new top-level type is being defined, and
    must be a publicly available specification in any
    case.)

    Person & email address to contact for further
    information:

F.2 Registration of New Character Set Values

    To: IANA@isi.edu
    Subject: Registration of new MIME character set value

    MIME character set name:

    Published specification:

    (The published specification must be an Internet RFC or
    RFC-to-be or an international standard.)

    Person & email address to contact for further
    information:

F.3 Registration of New Access-type Values for
Message/external-body

    To: IANA@isi.edu
    Subject: Registration of new MIME Access-type for
         Message/external-body content-type

    MIME access-type name:

    Required parameters:

    Optional parameters:

    Published specification:

    (The published specification must be an Internet RFC or
    RFC-to-be.)

    Person & email address to contact for further
    information:

F.4 Registration of New Conversions Values for Application

    To: IANA@isi.edu
    Subject: Registration of new MIME Conversions value
         for Application content-type

    MIME Conversions name:

    Published specification:

    (The published specification must be an Internet RFC or
    RFC-to-be.)

    Person & email address to contact for further
    information:

Appendix G -- Summary of the Seven Content-types

Content-type: text

Subtypes defined by this document: plain, richtext

Important Parameters: charset

Encoding notes: quoted-printable generally preferred if an encoding
    is needed and the character set is mostly an ASCII superset.

Security considerations: Rich text formats such as TeX and Troff
    often contain mechanisms for executing arbitrary commands or
    file system operations, and should not be used automatically
    unless these security problems have been addressed.  Even plain
    text may contain control characters that can be used to exploit
    the capabilities of "intelligent" terminals and cause security
    violations.  User interfaces designed to run on such terminals
    should be aware of and try to prevent such problems.
________________________________________________________________

Content-type: multipart

Subtypes defined by this document: mixed, alternative, digest,
    parallel.

Important Parameters: boundary

Encoding notes: No content-transfer-encoding is permitted.

________________________________________________________________

Content-type: message

Subtypes defined by this document: rfc822, partial, external-body

Important Parameters: id, number, total

Encoding notes: No content-transfer-encoding is permitted.

________________________________________________________________

Content-type: application

Subtypes defined by this document: octet-stream, postscript, oda

Important Parameters: profile

Encoding notes: base64 generally preferred for octet-stream or
    other unreadable subtypes.

Security considerations: This type is intended for the transmission
    of data to be interpreted by locally-installed programs.  If
    used, for example, to transmit executable binary programs or
    programs in general-purpose interpreted languages, such as LISP
    programs or shell scripts, severe security problems could
    result.  In general, authors of mail-reading agents are
    cautioned against giving their systems the power to execute
    mail-based application data without carefully considering the
    security implications.  While it is certainly possible to
    define safe application formats and even safe interpreters for
    unsafe formats, each interpreter should be evaluated separately
    for possible security problems.
________________________________________________________________

Content-type: image

Subtypes defined by this document: jpeg, gif

Important Parameters: none

Encoding notes: base64 generally preferred

________________________________________________________________

Content-type: audio

Subtypes defined by this document: basic

Important Parameters: none

Encoding notes: base64 generally preferred

________________________________________________________________

Content-type: video

Subtypes defined by this document: mpeg

Important Parameters: none

Encoding notes: base64 generally preferred

Appendix H -- Canonical Encoding Model

There was some confusion, in earlier drafts of this memo, regarding
the model for when email data was to be converted to canonical form
and encoded, and in particular how this process would affect the
treatment of CRLFs, given that the representation of newlines
varies greatly from system to system.  For this reason, a canonical
model for encoding is presented below.

The process of composing a MIME message part can be modelled as
being done in a number of steps.  Note that these steps are roughly
similar to those steps used in RFC1113:

Step 1.  Creation of local form.

The body part to be transmitted is created in the system's native
format.  The native character set is used, and where appropriate
local end of line conventions are used as well.
This may be a UNIX-style text file, or a Sun raster image, or
a VMS indexed file, or audio data in a system-dependent
format stored only in memory, or anything else that
corresponds to the local model for the representation of
some form of information.

Step 2. Conversion to canonical form.

The entire body part, including "out-of-band" information
such as record lengths and possibly file attribute
information, is converted to a universal canonical form.
The specific content type of the body part as well as its
associated attributes dictate the nature of the canonical
form that is used. Conversion to the proper canonical form
may involve character set conversion, transformation of
audio data, compression, or various other operations
specific to the various content types.

For example, in the case of text/plain data, the text must
be converted to a supported character set and lines must be
delimited with CRLF delimiters in accordance with RFC822.
Note that the restriction on line lengths implied by RFC822
is eliminated if the next step employs either quoted-
printable or base64 encoding.

Step 3. Apply transfer encoding.

A Content-Transfer-Encoding appropriate for this body part
is applied. Note that there is no fixed relationship
between the content type and the transfer encoding. In
particular, it may be appropriate to base the choice of
base64 or quoted-printable on character frequency counts
which are specific to a given instance of a body part.

Step 4. Insertion into message.

The encoded object is inserted into a MIME message with
appropriate body part headers and boundary markers.
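
The four steps above can be illustrated with a short sketch
for text/plain data. This is not part of the specification
and is not a blueprint for an implementation. It assumes
Python and its standard base64 and quopri modules. All of
the function names are hypothetical, as is the constant
MAX_7BIT_LINE. The one-sixth threshold is an assumed
heuristic standing in for the "character frequency counts"
mentioned in Step 3: quoted-printable costs roughly two
extra bytes per quoted byte, while base64 grows the whole
body by about one third.

```python
import base64
import quopri

# Assumption: a 1000-character transport line limit, minus the CRLF.
MAX_7BIT_LINE = 998


def to_canonical_text(local_text: str, charset: str = "us-ascii") -> bytes:
    """Step 2: map any local newline convention to CRLF, then encode."""
    unified = local_text.replace("\r\n", "\n").replace("\r", "\n")
    return unified.replace("\n", "\r\n").encode(charset)


def choose_encoding(canonical: bytes) -> str:
    """Step 3 (choice): pick an encoding from byte frequency counts."""
    if not canonical:
        return "7bit"
    # Bytes that cannot travel as plain 7-bit printable text.
    unsafe = sum(
        1 for b in canonical if b > 126 or (b < 32 and b not in (9, 10, 13))
    )
    longest = max(len(line) for line in canonical.split(b"\r\n"))
    if unsafe == 0 and longest <= MAX_7BIT_LINE:
        return "7bit"
    # Assumed heuristic: quoted-printable wins while fewer than
    # one byte in six needs quoting; otherwise use base64.
    if unsafe * 6 < len(canonical):
        return "quoted-printable"
    return "base64"


def apply_transfer_encoding(canonical: bytes, encoding: str) -> bytes:
    """Step 3 (application): encode the canonical form."""
    if encoding == "base64":
        # encodebytes() ends lines with LF; rewrite them as CRLF.
        return base64.encodebytes(canonical).replace(b"\n", b"\r\n")
    if encoding == "quoted-printable":
        return quopri.encodestring(canonical)
    return canonical


def build_body_part(content_type: str, encoding: str,
                    encoded: bytes, boundary: str) -> bytes:
    """Step 4: add the boundary marker and body part headers."""
    headers = (
        f"--{boundary}\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Transfer-Encoding: {encoding}\r\n"
        "\r\n"
    ).encode("ascii")
    return headers + encoded


def compose_part(local_text: str, charset: str, boundary: str) -> bytes:
    """Run Steps 2 to 4 on text already created in local form (Step 1)."""
    canonical = to_canonical_text(local_text, charset)
    encoding = choose_encoding(canonical)
    encoded = apply_transfer_encoding(canonical, encoding)
    content_type = f"text/plain; charset={charset}"
    return build_body_part(content_type, encoding, encoded, boundary)
```

A call such as compose_part("hello\n", "us-ascii", "xyz")
yields a single body part. The enclosing multipart headers
and the closing boundary marker are left to the caller.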

It is vital to note that these steps are only a model; they
are specifically NOT a blueprint for how an actual system
would be built. In particular, the model fails to account
for two common designs:

     1. In many cases the conversion to a canonical
     form prior to encoding will be subsumed into the
     encoder itself, which understands local formats
     directly. For example, the local newline
     convention for text bodyparts might be carried
     through to the encoder itself along with knowledge
     of what that format is.

     2. The output of the encoders may have to pass
     through one or more additional steps prior to
     being transmitted as a message. As such, the
     output of the encoder may not be compliant with
     the formats specified by RFC822. In particular,
     once again it may be appropriate for the
     converter's output to be expressed using local
     newline conventions rather than using the standard
     RFC822 CRLF delimiters.

Other implementation variations are conceivable as well.
The only important aspect of this discussion is that the
resulting messages are consistent with those produced by the
model described here.


References

[US-ASCII] Coded Character Set--7-Bit American Standard Code
for Information Interchange, ANSI X3.4-1986.

[ATK] Borenstein, Nathaniel S., Multimedia Applications
Development with the Andrew Toolkit, Prentice-Hall, 1990.

[GIF] Graphics Interchange Format (Version 89a), Compuserve,
Inc., Columbus, Ohio, 1990.

[ISO-2022] International Standard--Information Processing--
ISO 7-bit and 8-bit coded character sets--Code extension
techniques, ISO 2022:1986.

[ISO-8859] Information Processing -- 8-bit Single-Byte Coded
Graphic Character Sets -- Part 1: Latin Alphabet No. 1, ISO
8859-1:1987. Part 2: Latin alphabet No. 2, ISO 8859-2,
1987. Part 3: Latin alphabet No. 3, ISO 8859-3, 1988. Part
4: Latin alphabet No. 4, ISO 8859-4, 1988. Part 5:
Latin/Cyrillic alphabet, ISO 8859-5, 1988. Part 6:
Latin/Arabic alphabet, ISO 8859-6, 1987. Part 7:
Latin/Greek alphabet, ISO 8859-7, 1987. Part 8:
Latin/Hebrew alphabet, ISO 8859-8, 1988. Part 9: Latin
alphabet No. 5, ISO 8859-9, 1990.

[ISO-646] International Standard--Information Processing--
ISO 7-bit coded character set for information interchange,
ISO 646:1983.

[MPEG] Video Coding Draft Standard ISO 11172 CD,
ISO/IEC JTC1/SC2/WG11 (Motion Picture Experts Group), May, 1991.

[ODA] ISO 8613; Information Processing: Text and Office
System; Office Document Architecture (ODA) and Interchange
Format (ODIF), Part 1-8, 1989.

[PCM] CCITT, Fascicle III.4 - Recommendation G.711, Geneva,
1972, "Pulse Code Modulation (PCM) of Voice Frequencies".

[POSTSCRIPT] Adobe Systems, Inc., PostScript Language
Reference Manual, Addison-Wesley, 1985.

[X400] Schicker, Pietro, "Message Handling Systems, X.400",
Message Handling Systems and Distributed Applications, E.
Stefferud, O-j. Jacobsen, and P. Schicker, eds., North-
Holland, 1989, pp. 3-41.

[RFC-783] Sollins, K.R. TFTP Protocol (revision 2). June,
1981, MIT, RFC-783.

[RFC-821] Postel, J.B. Simple Mail Transfer Protocol.
August, 1982, USC/Information Sciences Institute, RFC-821.

[RFC-822] Crocker, D. Standard for the format of ARPA
Internet text messages. August, 1982, UDEL, RFC-822.

[RFC-934] Rose, M.T.; Stefferud, E.A. Proposed standard
for message encapsulation. January, 1985, Delaware
and NMA, RFC-934.

[RFC-959] Postel, J.B.; Reynolds, J.K. File Transfer
Protocol. October, 1985, USC/Information Sciences
Institute, RFC-959.

[RFC-1049] Sirbu, M.A. Content-Type header field for
Internet messages. March, 1988, CMU, RFC-1049.

[RFC-1113] Linn, J. Privacy enhancement for Internet
electronic mail: Part I - message encipherment and
authentication procedures. August, 1989, IAB Privacy Task
Force, RFC-1113.

[RFC-1154] Robinson, D.; Ullmann, R. Encoding header field
for Internet messages. April, 1990, Prime Computer,
Inc., RFC-1154.

[RFC-1342] Moore, Keith, Representation of Non-Ascii Text in
Internet Message Headers. June, 1992, University of
Tennessee, RFC-1342.


Security Considerations

Security issues are discussed in Section 7.4.2 and in
Appendix G. Implementors should pay special attention to
the security implications of any mail content-types that can
cause the remote execution of any actions in the recipient's
environment. In such cases, the discussion of the
application/postscript content-type in Section 7.4.2 may
serve as a model for considering other content-types with
remote execution capabilities.


Authors' Addresses

For more information, the authors of this document may be
contacted via Internet mail:

     Nathaniel S. Borenstein
     MRE 2D-296, Bellcore
     445 South St.
     Morristown, NJ 07962-1910

     Phone: +1 201 829 4270
     Fax:   +1 201 829 7019
     Email: nsb@bellcore.com


     Ned Freed
     Innosoft International, Inc.
     250 West First Street
     Suite 240
     Claremont, CA 91711

     Phone: +1 714 624 7907
     Fax:   +1 714 621 5319
     Email: ned@innosoft.com


Table of Contents

1       Introduction....................................... 1
2       Notations, Conventions, and Generic BNF Grammar.... 3
3       The MIME-Version Header Field...................... 5
4       The Content-Type Header Field...................... 6
5       The Content-Transfer-Encoding Header Field......... 10
5.1     Quoted-Printable Content-Transfer-Encoding......... 14
5.2     Base64 Content-Transfer-Encoding................... 17
6       Additional Optional Content- Header Fields......... 19
6.1     Optional Content-ID Header Field................... 19
6.2     Optional Content-Description Header Field.......... 19
7       The Predefined Content-Type Values................. 20
7.1     The Text Content-Type.............................. 20
7.1.1   The charset parameter.............................. 20
7.1.2   The Text/plain subtype............................. 23
7.1.3   The Text/richtext subtype.......................... 23
7.2     The Multipart Content-Type......................... 29
7.2.1   Multipart: The common syntax....................... 30
7.2.2   The Multipart/mixed (primary) subtype.............. 34
7.2.3   The Multipart/alternative subtype.................. 34
7.2.4   The Multipart/digest subtype....................... 36
7.2.5   The Multipart/parallel subtype..................... 36
7.3     The Message Content-Type........................... 37
7.3.1   The Message/rfc822 (primary) subtype............... 37
7.3.2   The Message/Partial subtype........................ 37
7.3.3   The Message/External-Body subtype.................. 40
7.4     The Application Content-Type....................... 46
7.4.1   The Application/Octet-Stream (primary) subtype..... 46
7.4.2   The Application/PostScript subtype................. 47
7.4.3   The Application/ODA subtype........................ 50
7.5     The Image Content-Type............................. 51
7.6     The Audio Content-Type............................. 51
7.7     The Video Content-Type............................. 51
7.8     Experimental Content-Type Values................... 51
        Summary............................................ 53
        Acknowledgements................................... 54
        Appendix A -- Minimal MIME-Conformance............. 56
        Appendix B -- General Guidelines For Sending Email Data 59
        Appendix C -- A Complex Multipart Example.......... 62
        Appendix D -- A Simple Richtext-to-Text Translator in C 64
        Appendix E -- Collected Grammar.................... 66
        Appendix F -- IANA Registration Procedures......... 68
          F.1 Registration of New Content-type/subtype Values 68
          F.2 Registration of New Character Set Values..... 69
          F.3 Registration of New Access-type Values for Message/external-body 69
          F.4 Registration of New Conversions Values for Application 69
        Appendix G -- Summary of the Seven Content-types... 71
        Appendix H -- Canonical Encoding Model............. 73
        References......................................... 75
        Security Considerations............................ 76
        Authors' Addresses................................. 77