File:    APPNOTE.TXT - .ZIP File Format Specification
Version: 6.3.10
Status:  FINAL - replaces version 6.3.9
Revised: Nov 01, 2022
Copyright (c) 1989 - 2014, 2018, 2019, 2020, 2022 PKWARE Inc., All Rights Reserved.

1.0 Introduction
---------------

1.1 Purpose
-----------

   1.1.1 This specification is intended to define a cross-platform,
   interoperable file storage and transfer format. Since its
   first publication in 1989, PKWARE, Inc. ("PKWARE") has remained
   committed to ensuring the interoperability of the .ZIP file
   format through periodic publication and maintenance of this
   specification. We trust that all .ZIP compatible vendors and
   application developers that use and benefit from this format
   will share and support this commitment to interoperability.

1.2 Scope
---------

   1.2.1 ZIP is one of the most widely used compressed file formats. It is
   universally used to aggregate, compress, and encrypt files into a single
   interoperable container. No specific use or application need is
   defined by this format and no specific implementation guidance is
   provided. This document provides details on the storage format for
   creating ZIP files. Information is provided on the records and
   fields that describe what a ZIP file is.

1.3 Trademarks
--------------

   1.3.1 PKWARE, PKZIP, Smartcrypt, SecureZIP, and PKSFX are registered
   trademarks of PKWARE, Inc. in the United States and elsewhere.
   PKPatchMaker, Deflate64, and ZIP64 are trademarks of PKWARE, Inc.
   Other marks referenced within this document appear for identification
   purposes only and are the property of their respective owners.


1.4 Permitted Use
-----------------

   1.4.1 This document, "APPNOTE.TXT - .ZIP File Format Specification" is the
   exclusive property of PKWARE.
   Use of the information contained in this
   document is permitted solely for the purpose of creating products,
   programs and processes that read and write files in the ZIP format
   subject to the terms and conditions herein.

   1.4.2 Use of the content of this document within other publications is
   permitted only through reference to this document. Any reproduction
   or distribution of this document in whole or in part without prior
   written permission from PKWARE is strictly prohibited.

   1.4.3 Certain technological components provided in this document are the
   patented proprietary technology of PKWARE and as such require a
   separate, executed license agreement from PKWARE. Applicable
   components are marked with the following, or similar, statement:
   'Refer to the section in this document entitled "Incorporating
   PKWARE Proprietary Technology into Your Product" for more information'.

1.5 Contacting PKWARE
---------------------

   1.5.1 If you have questions on this format, its use, or licensing, or if you
   wish to report defects, request changes or additions, please contact:

      PKWARE, Inc.
      201 E. Pittsburgh Avenue, Suite 400
      Milwaukee, WI 53204
      +1-414-289-9788
      +1-414-289-9789 FAX
      zipformat@pkware.com

   1.5.2 Information about this format and a reference copy of this document
   is publicly available at:

      http://www.pkware.com/appnote

1.6 Disclaimer
--------------

   1.6.1 Although PKWARE will attempt to supply current and accurate
   information relating to its file formats, algorithms, and the
   subject programs, the possibility of error or omission cannot
   be eliminated.
   PKWARE therefore expressly disclaims any warranty
   that the information contained in the associated materials relating
   to the subject programs and/or the format of the files created or
   accessed by the subject programs and/or the algorithms used by
   the subject programs, or any other matter, is current, correct or
   accurate as delivered. Any risk of damage due to any possible
   inaccurate information is assumed by the user of the information.
   Furthermore, the information relating to the subject programs
   and/or the file formats created or accessed by the subject
   programs and/or the algorithms used by the subject programs is
   subject to change without notice.

2.0 Revisions
--------------

2.1 Document Status
--------------------

   2.1.1 If the STATUS of this file is marked as DRAFT, the content
   defines proposed revisions to this specification which may consist
   of changes to the ZIP format itself, or that may consist of other
   content changes to this document. Versions of this document and
   the format in DRAFT form may be subject to modification prior to
   publication STATUS of FINAL. DRAFT versions are published periodically
   to provide notification to the ZIP community of pending changes and to
   provide opportunity for review and comment.

   2.1.2 Versions of this document having a STATUS of FINAL are
   considered to be in the final form for that version of the document
   and are not subject to further change until a new, higher version
   numbered document is published. Newer versions of this format
   specification are intended to remain interoperable with all prior
   versions whenever technically possible.

2.2 Change Log
--------------

   Version       Change Description                         Date
   -------       ------------------                         ----------
   5.2           -Single Password Symmetric Encryption      07/16/2003
                  storage

   6.1.0         -Smartcard compatibility                   01/20/2004
                 -Documentation on certificate storage

   6.2.0         -Introduction of Central Directory         04/26/2004
                  Encryption for encrypting metadata
                 -Added OS X to Version Made By values

   6.2.1         -Added Extra Field placeholder for         04/01/2005
                  POSZIP using ID 0x4690

                 -Clarified size field on
                  "zip64 end of central directory record"

   6.2.2         -Documented Final Feature Specification    01/06/2006
                  for Strong Encryption

                 -Clarifications and typographical
                  corrections

   6.3.0         -Added tape positioning storage            09/29/2006
                  parameters

                 -Expanded list of supported hash algorithms

                 -Expanded list of supported compression
                  algorithms

                 -Expanded list of supported encryption
                  algorithms

                 -Added option for Unicode filename
                  storage

                 -Clarifications for consistent use
                  of Data Descriptor records

                 -Added additional "Extra Field"
                  definitions

   6.3.1         -Corrected standard hash values for        04/11/2007
                  SHA-256/384/512

   6.3.2         -Added compression method 97               09/28/2007

                 -Documented InfoZIP "Extra Field"
                  values for UTF-8 file name and
                  file comment storage

   6.3.3         -Formatting changes to support             09/01/2012
                  easier referencing of this APPNOTE
                  from other documents and standards

   6.3.4         -Address change                            10/01/2014

   6.3.5         -Documented compression methods 16         11/31/2018
                  and 99 (4.4.5, 4.6.1, 5.11, 5.17,
                  APPENDIX E)

                 -Corrected several typographical
                  errors (2.1.2, 3.2, 4.1.1, 10.2)

                 -Marked legacy algorithms as no
                  longer suitable for use (4.4.5.1)

                 -Added clarity on MS DOS time format
                  (4.4.6)

                 -Assign extrafield ID for Timestamps
                  (4.5.2)

                 -Field code
                  description correction (A.2)

                 -More consistent use of MAY/SHOULD/MUST

                 -Expanded 0x0065 record attribute codes (B.2)

                 -Initial information on 0x0022 Extra Data

   6.3.6         -Corrected typographical error             04/26/2019
                  (4.4.1.3)

   6.3.7         -Added Zstandard compression method ID
                  (4.4.5)

                 -Corrected several reported typos

                 -Marked intended use for general purpose bit 14

                 -Added Data Stream Alignment Extra Data info
                  (4.6.11)

   6.3.8         -Resolved Zstandard compression method ID conflict
                  (4.4.5)

                 -Added additional compression method ID values in use

   6.3.9         -Corrected a typo in Data Stream Alignment description
                  (4.6.11)

   6.3.10        -Added several z/OS attribute values for APPENDIX B

                 -Added several additional 3rd party Extra Field mappings
                  (thanks to Armijn Hemel @tjaldur.nl for forwarding info
                  on several of the Header ID's)



3.0 Notations
-------------

   3.1 Use of the term MUST or SHALL indicates a required element.

   3.2 MUST NOT or SHALL NOT indicates an element is prohibited from use.

   3.3 SHOULD indicates a RECOMMENDED element.

   3.4 SHOULD NOT indicates an element NOT RECOMMENDED for use.

   3.5 MAY indicates an OPTIONAL element.


4.0 ZIP Files
-------------

4.1 What is a ZIP file
----------------------

   4.1.1 ZIP files MAY be identified by the standard .ZIP file extension
   although use of a file extension is not required. Use of the
   extension .ZIPX is also recognized and MAY be used for ZIP files.
   Other common file extensions using the ZIP format include .JAR, .WAR,
   .DOCX, .XLSX, .PPTX, .ODT, .ODS, .ODP and others. Programs reading or
   writing ZIP files SHOULD rely on internal record signatures described
   in this document to identify files in this format.

   4.1.2 ZIP files SHOULD contain at least one file and MAY contain
   multiple files.
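
As a non-normative illustration of 4.1.1, the sketch below identifies a ZIP file from its internal records instead of its extension. It relies on Python's standard `zipfile.is_zipfile`; the helper name `looks_like_zip` and the sample data are assumptions made for this example only.

```python
import io
import zipfile


def looks_like_zip(stream):
    """Identify a ZIP by its internal records, ignoring any file name."""
    stream.seek(0)
    return zipfile.is_zipfile(stream)


# Build a small archive in memory; no name or extension is involved.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("hello.txt", "hello")

print(looks_like_zip(archive))                      # True
print(looks_like_zip(io.BytesIO(b"plain text")))    # False
```

The same check applies unchanged to containers such as .JAR or .DOCX, since they share the record layout described in this document.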

   4.1.3 Data compression MAY be used to reduce the size of files
   placed into a ZIP file, but is not required. This format supports the
   use of multiple data compression algorithms. When compression is used,
   one of the documented compression algorithms MUST be used. Implementors
   are advised to experiment with their data to determine which of the
   available algorithms provides the best compression for their needs.
   Compression method 8 (Deflate) is the method used by default by most
   ZIP compatible application programs.


   4.1.4 Data encryption MAY be used to protect files within a ZIP file.
   Keying methods supported for encryption within this format include
   passwords and public/private keys. Either MAY be used individually
   or in combination. Encryption MAY be applied to individual files.
   Additional security MAY be used through the encryption of ZIP file
   metadata stored within the Central Directory. See the section on the
   Strong Encryption Specification for information. Refer to the section
   in this document entitled "Incorporating PKWARE Proprietary Technology
   into Your Product" for more information.

   4.1.5 Data integrity MUST be provided for each file using CRC32.

   4.1.6 Additional data integrity MAY be included through the use of
   digital signatures. Individual files MAY be signed with one or more
   digital signatures. The Central Directory, if signed, MUST use a
   single signature.

   4.1.7 Files MAY be placed within a ZIP file uncompressed or stored.
   The term "stored" as used in the context of this document means the file
   is copied into the ZIP file uncompressed.

   4.1.8 Each data file placed into a ZIP file MAY be compressed, stored,
   encrypted or digitally signed independent of how other data files in the
   same ZIP file are archived.

   4.1.9 ZIP files MAY be streamed, split into segments (on fixed or on
   removable media) or "self-extracting". Self-extracting ZIP
   files MUST include extraction code for a target platform within
   the ZIP file.

   4.1.10 Extensibility is provided for platform or application specific
   needs through extra data fields that MAY be defined for custom
   purposes. Extra data definitions MUST NOT conflict with existing
   documented record definitions.

   4.1.11 Common uses for ZIP MAY also include the use of manifest files.
   Manifest files store application specific information within a file stored
   within the ZIP file. This manifest file SHOULD be the first file in the
   ZIP file. This specification does not provide any information or guidance on
   the use of manifest files within ZIP files. Refer to the application developer
   for information on using manifest files and for any additional profile
   information on using ZIP within an application.

   4.1.12 ZIP files MAY be placed within other ZIP files.

4.2 ZIP Metadata
----------------

   4.2.1 ZIP files are identified by metadata consisting of defined record types
   containing the storage information necessary for maintaining the files
   placed into a ZIP file. Each record type MUST be identified using a header
   signature that identifies the record type. Signature values begin with the
   two byte constant marker of 0x4b50, representing the characters "PK".


4.3 General Format of a .ZIP file
---------------------------------

   4.3.1 A ZIP file MUST contain an "end of central directory record". A ZIP
   file containing only an "end of central directory record" is considered an
   empty ZIP file. Files MAY be added or replaced within a ZIP file, or deleted.
   A ZIP file MUST have only one "end of central directory record".
   Other
   records defined in this specification MAY be used as needed to support
   storage requirements for individual ZIP files.

   4.3.2 Each file placed into a ZIP file MUST be preceded by a "local
   file header" record for that file. Each "local file header" MUST be
   accompanied by a corresponding "central directory header" record within
   the central directory section of the ZIP file.

   4.3.3 Files MAY be stored in arbitrary order within a ZIP file. A ZIP
   file MAY span multiple volumes or it MAY be split into user-defined
   segment sizes. All values MUST be stored in little-endian byte order unless
   otherwise specified in this document for a specific data element.

   4.3.4 Compression MUST NOT be applied to a "local file header", an "encryption
   header", or an "end of central directory record". Individual "central
   directory records" MUST NOT be compressed, but the aggregate of all central
   directory records MAY be compressed.

   4.3.5 File data MAY be followed by a "data descriptor" for the file. Data
   descriptors are used to facilitate ZIP file streaming.


   4.3.6 Overall .ZIP file format:

      [local file header 1]
      [encryption header 1]
      [file data 1]
      [data descriptor 1]
      .
      .
      .
      [local file header n]
      [encryption header n]
      [file data n]
      [data descriptor n]
      [archive decryption header]
      [archive extra data record]
      [central directory header 1]
      .
      .
      .
      [central directory header n]
      [zip64 end of central directory record]
      [zip64 end of central directory locator]
      [end of central directory record]


   4.3.7 Local file header:

      local file header signature     4 bytes  (0x04034b50)
      version needed to extract       2 bytes
      general purpose bit flag        2 bytes
      compression method              2 bytes
      last mod file time              2 bytes
      last mod file date              2 bytes
      crc-32                          4 bytes
      compressed size                 4 bytes
      uncompressed size               4 bytes
      file name length                2 bytes
      extra field length              2 bytes

      file name (variable size)
      extra field (variable size)

   4.3.8 File data

      Immediately following the local header for a file
      SHOULD be placed the compressed or stored data for the file.
      If the file is encrypted, the encryption header for the file
      SHOULD be placed after the local header and before the file
      data. The series of [local file header][encryption header]
      [file data][data descriptor] repeats for each file in the
      .ZIP archive.

      Zero-byte files, directories, and other file types that
      contain no content MUST NOT include file data.

   4.3.9 Data descriptor:

      crc-32                          4 bytes
      compressed size                 4 bytes
      uncompressed size               4 bytes

      4.3.9.1 This descriptor MUST exist if bit 3 of the general
      purpose bit flag is set (see below). It is byte aligned
      and immediately follows the last byte of compressed data.
      This descriptor SHOULD be used only when it was not possible to
      seek in the output .ZIP file, e.g., when the output .ZIP file
      was standard output or a non-seekable device. For ZIP64(tm) format
      archives, the compressed and uncompressed sizes are 8 bytes each.

      4.3.9.2 When compressing files, compressed and uncompressed sizes
      SHOULD be stored in ZIP64 format (as 8 byte values) when a
      file's size exceeds 0xFFFFFFFF. However ZIP64 format MAY be
      used regardless of the size of a file.
      When extracting, if
      the zip64 extended information extra field is present for
      the file the compressed and uncompressed sizes will be 8
      byte values.

      4.3.9.3 Although not originally assigned a signature, the value
      0x08074b50 has commonly been adopted as a signature value
      for the data descriptor record. Implementers SHOULD be
      aware that ZIP files MAY be encountered with or without this
      signature marking data descriptors and SHOULD account for
      either case when reading ZIP files to ensure compatibility.

      4.3.9.4 When writing ZIP files, implementors SHOULD include the
      signature value marking the data descriptor record. When
      the signature is used, the fields currently defined for
      the data descriptor record will immediately follow the
      signature.

      4.3.9.5 An extensible data descriptor will be released in a
      future version of this APPNOTE. This new record is intended to
      resolve conflicts with the use of this record going forward,
      and to provide better support for streamed file processing.

      4.3.9.6 When the Central Directory Encryption method is used,
      the data descriptor record is not required, but MAY be used.
      If present, and bit 3 of the general purpose bit field is set to
      indicate its presence, the values in fields of the data descriptor
      record MUST be set to binary zeros. See the section on the Strong
      Encryption Specification for information. Refer to the section in
      this document entitled "Incorporating PKWARE Proprietary Technology
      into Your Product" for more information.


   4.3.10 Archive decryption header:

      4.3.10.1 The Archive Decryption Header is introduced in version 6.2
      of the ZIP format specification. This record exists in support
      of the Central Directory Encryption Feature implemented as part of
      the Strong Encryption Specification as described in this document.
      When the Central Directory Structure is encrypted, this decryption
      header MUST precede the encrypted data segment.

      4.3.10.2 The encrypted data segment SHALL consist of the Archive
      extra data record (if present) and the encrypted Central Directory
      Structure data. The format of this data record is identical to the
      Decryption header record preceding compressed file data. If the
      central directory structure is encrypted, the location of the start of
      this data record is determined using the Start of Central Directory
      field in the Zip64 End of Central Directory record. See the
      section on the Strong Encryption Specification for information
      on the fields used in the Archive Decryption Header record.
      Refer to the section in this document entitled "Incorporating
      PKWARE Proprietary Technology into Your Product" for more information.


   4.3.11 Archive extra data record:

        archive extra data signature    4 bytes  (0x08064b50)
        extra field length              4 bytes
        extra field data                (variable size)

      4.3.11.1 The Archive Extra Data Record is introduced in version 6.2
      of the ZIP format specification. This record MAY be used in support
      of the Central Directory Encryption Feature implemented as part of
      the Strong Encryption Specification as described in this document.
      When present, this record MUST immediately precede the central
      directory data structure.

      4.3.11.2 The size of this data record SHALL be included in the
      Size of the Central Directory field in the End of Central
      Directory record. If the central directory structure is compressed,
      but not encrypted, the location of the start of this data record is
      determined using the Start of Central Directory field in the Zip64
      End of Central Directory record. Refer to the section in this document
      entitled "Incorporating PKWARE Proprietary Technology into Your
      Product" for more information.

   4.3.12 Central directory structure:

      [central directory header 1]
      .
      .
      .
      [central directory header n]
      [digital signature]

      File header:

        central file header signature   4 bytes  (0x02014b50)
        version made by                 2 bytes
        version needed to extract       2 bytes
        general purpose bit flag        2 bytes
        compression method              2 bytes
        last mod file time              2 bytes
        last mod file date              2 bytes
        crc-32                          4 bytes
        compressed size                 4 bytes
        uncompressed size               4 bytes
        file name length                2 bytes
        extra field length              2 bytes
        file comment length             2 bytes
        disk number start               2 bytes
        internal file attributes        2 bytes
        external file attributes        4 bytes
        relative offset of local header 4 bytes

        file name (variable size)
        extra field (variable size)
        file comment (variable size)

   4.3.13 Digital signature:

        header signature                4 bytes  (0x05054b50)
        size of data                    2 bytes
        signature data (variable size)

      With the introduction of the Central Directory Encryption
      feature in version 6.2 of this specification, the Central
      Directory Structure MAY be stored both compressed and encrypted.
      Although not required, it is assumed when encrypting the
      Central Directory Structure, that it will be compressed
      for greater storage efficiency. Information on the
      Central Directory Encryption feature can be found in the section
      describing the Strong Encryption Specification. The Digital
      Signature record will be neither compressed nor encrypted.
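
As a non-normative illustration, the central file header layout in 4.3.12 maps directly onto a fixed 46-byte little-endian structure followed by three variable-size fields. The sketch below uses Python's `struct` module; the names `CENTRAL_HEADER` and `parse_central_header`, the dictionary keys, and the sample field values are assumptions made for this example and are not defined by this specification.

```python
import struct

# Fixed-size portion of the central file header (46 bytes).
# "<" selects little-endian with no padding; I = 4 bytes, H = 2 bytes.
CENTRAL_HEADER = struct.Struct("<I6H3I5H2I")
CENTRAL_SIGNATURE = 0x02014B50


def parse_central_header(buf, offset=0):
    """Parse one central file header; return (fields, offset of next record)."""
    (signature, made_by, needed, flags, method, mod_time, mod_date,
     crc32, comp_size, uncomp_size, name_len, extra_len, comment_len,
     disk_start, internal_attr, external_attr,
     local_offset) = CENTRAL_HEADER.unpack_from(buf, offset)
    if signature != CENTRAL_SIGNATURE:
        raise ValueError("central file header signature not found")

    # The variable-size fields follow the fixed part, in this order.
    pos = offset + CENTRAL_HEADER.size
    name = buf[pos:pos + name_len]
    pos += name_len
    extra = buf[pos:pos + extra_len]
    pos += extra_len
    comment = buf[pos:pos + comment_len]
    pos += comment_len

    fields = {
        "version_made_by": made_by,
        "version_needed": needed,
        "flags": flags,
        "compression_method": method,
        "crc32": crc32,
        "compressed_size": comp_size,
        "uncompressed_size": uncomp_size,
        "local_header_offset": local_offset,
        "file_name": name,        # kept as bytes; encoding depends on bit 11
        "extra_field": extra,
        "file_comment": comment,
    }
    return fields, pos


# Hand-built sample record: Deflate entry "a.txt" with comment "hi".
raw = CENTRAL_HEADER.pack(CENTRAL_SIGNATURE, 20, 20, 0, 8, 0, 0x21,
                          0x12345678, 5, 10, 5, 0, 2, 0, 0, 0, 0)
raw += b"a.txt" + b"hi"
fields, next_offset = parse_central_header(raw)
print(fields["file_name"], fields["compression_method"], next_offset)
```

Calling the function again at the returned offset walks the directory one header at a time until a different signature is reached.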

   4.3.14 Zip64 end of central directory record

        zip64 end of central dir
        signature                       4 bytes  (0x06064b50)
        size of zip64 end of central
        directory record                8 bytes
        version made by                 2 bytes
        version needed to extract       2 bytes
        number of this disk             4 bytes
        number of the disk with the
        start of the central directory  4 bytes
        total number of entries in the
        central directory on this disk  8 bytes
        total number of entries in the
        central directory               8 bytes
        size of the central directory   8 bytes
        offset of start of central
        directory with respect to
        the starting disk number        8 bytes
        zip64 extensible data sector    (variable size)

      4.3.14.1 The value stored into the "size of zip64 end of central
      directory record" SHOULD be the size of the remaining
      record and SHOULD NOT include the leading 12 bytes.

      Size = SizeOfFixedFields + SizeOfVariableData - 12.

      4.3.14.2 The above record structure defines Version 1 of the
      zip64 end of central directory record. Version 1 was
      implemented in versions of this specification preceding
      6.2 in support of the ZIP64 large file feature. The
      introduction of the Central Directory Encryption feature
      implemented in version 6.2 as part of the Strong Encryption
      Specification defines Version 2 of this record structure.
      Refer to the section describing the Strong Encryption
      Specification for details on the version 2 format for
      this record. Refer to the section in this document entitled
      "Incorporating PKWARE Proprietary Technology into Your Product"
      for more information applicable to use of Version 2 of this
      record.

      4.3.14.3 Special purpose data MAY reside in the zip64 extensible
      data sector field following either a V1 or V2 version of this
      record.
      To ensure identification of this special purpose data
      it MUST include an identifying header block consisting of the
      following:

         Header ID  -  2 bytes
         Data Size  -  4 bytes

      The Header ID field indicates the type of data that is in the
      data block that follows.

      Data Size identifies the number of bytes that follow for this
      data block type.

      4.3.14.4 Multiple special purpose data blocks MAY be present.
      Each MUST be preceded by a Header ID and Data Size field. Current
      mappings of Header ID values supported in this field are as
      defined in APPENDIX C.

   4.3.15 Zip64 end of central directory locator

      zip64 end of central dir locator
      signature                       4 bytes  (0x07064b50)
      number of the disk with the
      start of the zip64 end of
      central directory               4 bytes
      relative offset of the zip64
      end of central directory record 8 bytes
      total number of disks           4 bytes

   4.3.16 End of central directory record:

      end of central dir signature    4 bytes  (0x06054b50)
      number of this disk             2 bytes
      number of the disk with the
      start of the central directory  2 bytes
      total number of entries in the
      central directory on this disk  2 bytes
      total number of entries in
      the central directory           2 bytes
      size of the central directory   4 bytes
      offset of start of central
      directory with respect to
      the starting disk number        4 bytes
      .ZIP file comment length        2 bytes
      .ZIP file comment       (variable size)

4.4 Explanation of fields
--------------------------

   4.4.1 General notes on fields

      4.4.1.1 All fields unless otherwise noted are unsigned and stored
      in Intel low-byte:high-byte, low-word:high-word order.

      4.4.1.2 String fields are not null terminated, since the length
      is given explicitly.

      4.4.1.3 The entries in the central directory MAY NOT necessarily
      be in the same order that files appear in the .ZIP file.
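
As a non-normative illustration of 4.3.16 and 4.4.1.1, the sketch below locates the end of central directory record and unpacks its little-endian fields. Searching backward from the end of the file is a common reader strategy and an assumption of this example, not a requirement stated above. The names `EOCD` and `find_eocd` and the dictionary keys are hypothetical. A robust reader would also confirm that the comment length is consistent with the file size, since the signature bytes could occur inside the comment.

```python
import io
import struct
import zipfile

# End of central directory record: 22 fixed bytes plus the comment.
EOCD = struct.Struct("<I4H2IH")
EOCD_SIGNATURE = struct.pack("<I", 0x06054B50)   # b"PK\x05\x06"


def find_eocd(data):
    """Locate and unpack the end of central directory record in `data`."""
    # Only a comment of at most 0xFFFF bytes may follow the record, so
    # searching the tail of the file backward is sufficient.
    start = max(0, len(data) - (EOCD.size + 0xFFFF))
    end = len(data) - EOCD.size + len(EOCD_SIGNATURE)
    pos = data.rfind(EOCD_SIGNATURE, start, end)
    if pos < 0:
        raise ValueError("end of central directory record not found")
    (_, this_disk, cd_disk, entries_here, entries_total,
     cd_size, cd_offset, comment_len) = EOCD.unpack_from(data, pos)
    comment_start = pos + EOCD.size
    return {
        "this_disk": this_disk,
        "central_directory_disk": cd_disk,
        "entries_on_this_disk": entries_here,
        "total_entries": entries_total,
        "central_directory_size": cd_size,
        "central_directory_offset": cd_offset,
        "comment": data[comment_start:comment_start + comment_len],
    }


# Build a two-entry archive in memory and read its record back.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "alpha")
    zf.writestr("b.txt", "beta")
    zf.comment = b"demo"
data = buf.getvalue()
eocd = find_eocd(data)
print(eocd["total_entries"], eocd["comment"])
```

The returned offset points at the first central file header, which for a single-disk archive that is neither encrypted nor ZIP64 begins with the signature 0x02014b50.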

      4.4.1.4 If one of the fields in the end of central directory
      record is too small to hold required data, the field SHOULD be
      set to -1 (0xFFFF or 0xFFFFFFFF) and the ZIP64 format record
      SHOULD be created.

      4.4.1.5 The end of central directory record and the Zip64 end
      of central directory locator record MUST reside on the same
      disk when splitting or spanning an archive.

   4.4.2 version made by (2 bytes)

        4.4.2.1 The upper byte indicates the compatibility of the file
        attribute information. If the external file attributes
        are compatible with MS-DOS and can be read by PKZIP for
        DOS version 2.04g then this value will be zero. If these
        attributes are not compatible, then this value will
        identify the host system on which the attributes are
        compatible. Software can use this information to determine
        the line record format for text files etc.

        4.4.2.2 The current mappings are:

         0 - MS-DOS and OS/2 (FAT / VFAT / FAT32 file systems)
         1 - Amiga                     2 - OpenVMS
         3 - UNIX                      4 - VM/CMS
         5 - Atari ST                  6 - OS/2 H.P.F.S.
         7 - Macintosh                 8 - Z-System
         9 - CP/M                     10 - Windows NTFS
        11 - MVS (OS/390 - Z/OS)      12 - VSE
        13 - Acorn Risc               14 - VFAT
        15 - alternate MVS            16 - BeOS
        17 - Tandem                   18 - OS/400
        19 - OS X (Darwin)            20 thru 255 - unused

        4.4.2.3 The lower byte indicates the ZIP specification version
        (the version of this document) supported by the software
        used to encode the file. The value/10 indicates the major
        version number, and the value mod 10 is the minor version
        number.

   4.4.3 version needed to extract (2 bytes)

        4.4.3.1 The minimum supported ZIP specification version needed
        to extract the file, mapped as above. This value is based on
        the specific format features a ZIP program MUST support to
        be able to extract the file.
        If multiple features are
        applied to a file, the minimum version MUST be set to the
        feature having the highest value. New features or feature
        changes affecting the published format specification will be
        implemented using higher version numbers than the last
        published value to avoid conflict.

        4.4.3.2 Current minimum feature versions are as defined below:

         1.0 - Default value
         1.1 - File is a volume label
         2.0 - File is a folder (directory)
         2.0 - File is compressed using Deflate compression
         2.0 - File is encrypted using traditional PKWARE encryption
         2.1 - File is compressed using Deflate64(tm)
         2.5 - File is compressed using PKWARE DCL Implode
         2.7 - File is a patch data set
         4.5 - File uses ZIP64 format extensions
         4.6 - File is compressed using BZIP2 compression*
         5.0 - File is encrypted using DES
         5.0 - File is encrypted using 3DES
         5.0 - File is encrypted using original RC2 encryption
         5.0 - File is encrypted using RC4 encryption
         5.1 - File is encrypted using AES encryption
         5.1 - File is encrypted using corrected RC2 encryption**
         5.2 - File is encrypted using corrected RC2-64 encryption**
         6.1 - File is encrypted using non-OAEP key wrapping***
         6.2 - Central directory encryption
         6.3 - File is compressed using LZMA
         6.3 - File is compressed using PPMd+
         6.3 - File is encrypted using Blowfish
         6.3 - File is encrypted using Twofish

        4.4.3.3 Notes on version needed to extract

        * Early 7.x (pre-7.2) versions of PKZIP incorrectly set the
        version needed to extract for BZIP2 compression to be 50
        when it SHOULD have been 46.

        ** Refer to the section on Strong Encryption Specification
        for additional information regarding RC2 corrections.

        *** Certificate encryption using non-OAEP key wrapping is the
        intended mode of operation for all versions beginning with 6.1.
        Support for OAEP key wrapping MUST only be used for
        backward compatibility when sending ZIP files to be opened by
        versions of PKZIP older than 6.1 (5.0 or 6.0).

        + Files compressed using PPMd MUST set the version
        needed to extract field to 6.3, however, not all ZIP
        programs enforce this and MAY be unable to decompress
        data files compressed using PPMd if this value is set.

        When using ZIP64 extensions, the corresponding value in the
        zip64 end of central directory record MUST also be set.
        This field SHOULD be set appropriately to indicate whether
        Version 1 or Version 2 format is in use.


   4.4.4 general purpose bit flag: (2 bytes)

        Bit 0: If set, indicates that the file is encrypted.

        (For Method 6 - Imploding)
        Bit 1: If the compression method used was type 6,
               Imploding, then this bit, if set, indicates
               an 8K sliding dictionary was used. If clear,
               then a 4K sliding dictionary was used.

        Bit 2: If the compression method used was type 6,
               Imploding, then this bit, if set, indicates
               3 Shannon-Fano trees were used to encode the
               sliding dictionary output. If clear, then 2
               Shannon-Fano trees were used.

        (For Methods 8 and 9 - Deflating)
        Bit 2  Bit 1
          0      0    Normal (-en) compression option was used.
          0      1    Maximum (-exx/-ex) compression option was used.
          1      0    Fast (-ef) compression option was used.
          1      1    Super Fast (-es) compression option was used.

        (For Method 14 - LZMA)
        Bit 1: If the compression method used was type 14,
               LZMA, then this bit, if set, indicates
               an end-of-stream (EOS) marker is used to
               mark the end of the compressed data stream.
               If clear, then an EOS marker is not present
               and the compressed data size must be known
               to extract.

        Note:  Bits 1 and 2 are undefined if the compression
               method is any other.

        Bit 3: If this bit is set, the fields crc-32, compressed
               size and uncompressed size are set to zero in the
               local header. The correct values are put in the
               data descriptor immediately following the compressed
               data. (Note: PKZIP version 2.04g for DOS only
               recognizes this bit for method 8 compression, newer
               versions of PKZIP recognize this bit for any
               compression method.)

        Bit 4: Reserved for use with method 8, for enhanced
               deflating.

        Bit 5: If this bit is set, this indicates that the file is
               compressed patched data. (Note: Requires PKZIP
               version 2.70 or greater)

        Bit 6: Strong encryption. If this bit is set, you MUST
               set the version needed to extract value to at least
               50 and you MUST also set bit 0. If AES encryption
               is used, the version needed to extract value MUST
               be at least 51. See the section describing the Strong
               Encryption Specification for details. Refer to the
               section in this document entitled "Incorporating PKWARE
               Proprietary Technology into Your Product" for more
               information.

        Bit 7: Currently unused.

        Bit 8: Currently unused.

        Bit 9: Currently unused.

        Bit 10: Currently unused.

        Bit 11: Language encoding flag (EFS). If this bit is set,
                the filename and comment fields for this file
                MUST be encoded using UTF-8. (see APPENDIX D)

        Bit 12: Reserved by PKWARE for enhanced compression.

        Bit 13: Set when encrypting the Central Directory to indicate
                selected data values in the Local Header are masked to
                hide their actual values. See the section describing
                the Strong Encryption Specification for details. Refer
                to the section in this document entitled "Incorporating
                PKWARE Proprietary Technology into Your Product" for
                more information.

        Bit 14: Reserved by PKWARE for alternate streams.

        Bit 15: Reserved by PKWARE.
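
As a non-normative illustration of 4.4.4, the sketch below decodes the general purpose bit flag. Because bits 1 and 2 depend on the compression method, the method value is passed alongside the flag. The function name `decode_flags` and the dictionary keys are assumptions made for this example. Reserved and unused bits are ignored.

```python
def decode_flags(flags, method):
    """Decode the 2-byte general purpose bit flag for a given method."""
    info = {
        "encrypted": bool(flags & 0x0001),            # bit 0
        "data_descriptor": bool(flags & 0x0008),      # bit 3
        "patched_data": bool(flags & 0x0020),         # bit 5
        "strong_encryption": bool(flags & 0x0040),    # bit 6
        "utf8_names": bool(flags & 0x0800),           # bit 11
        "local_header_masked": bool(flags & 0x2000),  # bit 13
    }
    bit1 = (flags >> 1) & 1
    bit2 = (flags >> 2) & 1
    if method == 6:                                   # Imploding
        info["sliding_dictionary"] = "8K" if bit1 else "4K"
        info["shannon_fano_trees"] = 3 if bit2 else 2
    elif method in (8, 9):                            # Deflating
        options = {(0, 0): "normal", (0, 1): "maximum",
                   (1, 0): "fast", (1, 1): "super fast"}
        info["deflate_option"] = options[(bit2, bit1)]
    elif method == 14:                                # LZMA
        info["eos_marker"] = bool(bit1)
    # For any other method, bits 1 and 2 are undefined and left out.
    return info


# Deflate entry written to a stream with UTF-8 names: bits 3 and 11 set.
print(decode_flags(0x0808, 8))
```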

   4.4.5 compression method: (2 bytes)

         0 - The file is stored (no compression)
         1 - The file is Shrunk
         2 - The file is Reduced with compression factor 1
         3 - The file is Reduced with compression factor 2
         4 - The file is Reduced with compression factor 3
         5 - The file is Reduced with compression factor 4
         6 - The file is Imploded
         7 - Reserved for Tokenizing compression algorithm
         8 - The file is Deflated
         9 - Enhanced Deflating using Deflate64(tm)
        10 - PKWARE Data Compression Library Imploding (old IBM TERSE)
        11 - Reserved by PKWARE
        12 - File is compressed using BZIP2 algorithm
        13 - Reserved by PKWARE
        14 - LZMA
        15 - Reserved by PKWARE
        16 - IBM z/OS CMPSC Compression
        17 - Reserved by PKWARE
        18 - File is compressed using IBM TERSE (new)
        19 - IBM LZ77 z Architecture
        20 - deprecated (use method 93 for zstd)
        93 - Zstandard (zstd) Compression
        94 - MP3 Compression
        95 - XZ Compression
        96 - JPEG variant
        97 - WavPack compressed data
        98 - PPMd version I, Rev 1
        99 - AE-x encryption marker (see APPENDIX E)

        4.4.5.1 Methods 1-6 are legacy algorithms and are no longer
        recommended for use when compressing files.

   4.4.6 date and time fields: (2 bytes each)

        The date and time are encoded in standard MS-DOS format.
        If input came from standard input, the date and time are
        those at which compression was started for this data.
        If encrypting the central directory and general purpose bit
        flag 13 is set indicating masking, the value stored in the
        Local Header will be zero. MS-DOS time format is different
        from more commonly used computer time formats such as
        UTC. For example, MS-DOS uses year values relative to 1980
        and 2 second precision.
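
        The MS-DOS encoding mentioned in 4.4.6 can be sketched in
        Python as follows. The bit layout used here (day, month and
        year-since-1980 in the date word; seconds/2, minutes and
        hours in the time word) is the conventional MS-DOS layout and
        is an assumption, since this section does not spell it out.
        The function names are hypothetical.

```python
import datetime


def dos_to_datetime(dos_date, dos_time):
    """Decode 16-bit MS-DOS date and time words into a naive datetime.

    Returns None when the words do not form a valid calendar date,
    which includes the all-zero value stored when masking is in effect.
    """
    year = ((dos_date >> 9) & 0x7F) + 1980   # years are relative to 1980
    month = (dos_date >> 5) & 0x0F
    day = dos_date & 0x1F
    hour = (dos_time >> 11) & 0x1F
    minute = (dos_time >> 5) & 0x3F
    second = (dos_time & 0x1F) * 2           # stored with 2 second precision
    try:
        return datetime.datetime(year, month, day, hour, minute, second)
    except ValueError:
        return None


def datetime_to_dos(dt):
    """Encode a datetime (year 1980..2107) as MS-DOS date and time words."""
    dos_date = ((dt.year - 1980) << 9) | (dt.month << 5) | dt.day
    dos_time = (dt.hour << 11) | (dt.minute << 5) | (dt.second // 2)
    return dos_date, dos_time
```

        Because of the 2 second precision, an odd number of seconds
        is rounded down when a timestamp is encoded and decoded again.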

   4.4.7 CRC-32: (4 bytes)

        The CRC-32 algorithm was generously contributed by
        David Schwaderer and can be found in his excellent
        book "C Programmers Guide to NetBIOS" published by
        Howard W. Sams & Co. Inc. The 'magic number' for
        the CRC is 0xdebb20e3. The proper CRC pre and post
        conditioning is used, meaning that the CRC register
        is pre-conditioned with all ones (a starting value
        of 0xffffffff) and the value is post-conditioned by
        taking the one's complement of the CRC residual.
        If bit 3 of the general purpose flag is set, this
        field is set to zero in the local header and the correct
        value is put in the data descriptor and in the central
        directory. When encrypting the central directory, if the
        local header is not in ZIP64 format and general purpose
        bit flag 13 is set indicating masking, the value stored
        in the Local Header will be zero.

   4.4.8 compressed size: (4 bytes)
   4.4.9 uncompressed size: (4 bytes)

        The size of the file compressed (4.4.8) and uncompressed
        (4.4.9), respectively. When a decryption header is present it
        will be placed in front of the file data and the value of the
        compressed file size will include the bytes of the decryption
        header. If bit 3 of the general purpose bit flag is set,
        these fields are set to zero in the local header and the
        correct values are put in the data descriptor and
        in the central directory. If an archive is in ZIP64 format
        and the value in this field is 0xFFFFFFFF, the size will be
        in the corresponding 8 byte ZIP64 extended information
        extra field. When encrypting the central directory, if the
        local header is not in ZIP64 format and general purpose bit
        flag 13 is set indicating masking, the value stored for the
        uncompressed size in the Local Header will be zero.
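
        The pre and post conditioning described in 4.4.7 can be
        sketched in Python as follows. This is a minimal bit-at-a-time
        illustration, not the referenced implementation. It assumes
        the reflected polynomial constant 0xEDB88320, which is not
        stated in this section, and the names POLY, crc32_update and
        crc32 are hypothetical. The result is cross-checked against
        zlib.crc32 from the Python standard library.

```python
import struct
import zlib

POLY = 0xEDB88320  # assumed: reflected CRC-32 polynomial, as used by zlib


def crc32_update(register, data):
    """Feed bytes into a raw CRC register (no conditioning applied)."""
    for byte in data:
        register ^= byte
        for _ in range(8):
            register = (register >> 1) ^ (POLY if register & 1 else 0)
    return register


def crc32(data):
    """CRC-32 with the register pre-set to all ones and the residual
    complemented afterwards, as described in 4.4.7."""
    return crc32_update(0xFFFFFFFF, data) ^ 0xFFFFFFFF


if __name__ == "__main__":
    payload = b"example data"
    value = crc32(payload)
    print(hex(value), value == zlib.crc32(payload))
    # Running the register over the data followed by its own CRC
    # (low byte first) leaves the 'magic number' in the register.
    residual = crc32_update(0xFFFFFFFF, payload + struct.pack("<I", value))
    print(hex(residual))
```

        The residual check at the end is one way to see where the
        'magic number' 0xdebb20e3 comes from.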

   4.4.10 file name length: (2 bytes)
   4.4.11 extra field length: (2 bytes)
   4.4.12 file comment length: (2 bytes)

        The length of the file name, extra field, and comment
        fields respectively. The combined length of any
        directory record and these three fields SHOULD NOT
        generally exceed 65,535 bytes. If input came from standard
        input, the file name length is set to zero.


   4.4.13 disk number start: (2 bytes)

        The number of the disk on which this file begins. If an
        archive is in ZIP64 format and the value in this field is
        0xFFFF, the value will be in the corresponding 4 byte zip64
        extended information extra field.

   4.4.14 internal file attributes: (2 bytes)

        Bits 1 and 2 are reserved for use by PKWARE.

        4.4.14.1 The lowest bit of this field indicates, if set,
        that the file is apparently an ASCII or text file. If not
        set, that the file apparently contains binary data.
        The remaining bits are unused in version 1.0.

        4.4.14.2 The 0x0002 bit of this field indicates, if set, that
        a 4 byte variable record length control field precedes each
        logical record indicating the length of the record. The
        record length control field is stored in little-endian byte
        order. This flag is independent of text control characters,
        and if used in conjunction with text data, includes any
        control characters in the total length of the record. This
        value is provided for mainframe data transfer support.

   4.4.15 external file attributes: (4 bytes)

        The mapping of the external attributes is
        host-system dependent (see 'version made by'). For
        MS-DOS, the low order byte is the MS-DOS directory
        attribute byte. If input came from standard input, this
        field is set to zero.
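
        The record length control field described in 4.4.14.2 can
        be illustrated with the Python sketch below. It assumes that
        each 4 byte length counts only the bytes of the record that
        follows it and not the control field itself; this section does
        not state that explicitly. All names are hypothetical.

```python
import struct

ATTR_TEXT = 0x0001            # 4.4.14.1: file is apparently ASCII or text
ATTR_RECORD_LENGTHS = 0x0002  # 4.4.14.2: records carry a 4 byte length prefix


def split_records(data):
    """Split data into logical records, each preceded by a 4 byte
    little-endian length (assumed to exclude the length field itself)."""
    records = []
    pos = 0
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError("truncated record length control field")
        (length,) = struct.unpack_from("<I", data, pos)
        pos += 4
        if pos + length > len(data):
            raise ValueError("record extends past end of data")
        records.append(data[pos:pos + length])
        pos += length
    return records


def logical_records(internal_attributes, data):
    """Return the logical records of an extracted file, honoring the
    0x0002 bit of the internal file attributes."""
    if internal_attributes & ATTR_RECORD_LENGTHS:
        return split_records(data)
    return [data]
```

        When the 0x0002 bit is clear the data is returned as a single
        record, since no length prefixes are present.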

   4.4.16 relative offset of local header: (4 bytes)

        This is the offset from the start of the first disk on
        which this file appears, to where the local header SHOULD
        be found. If an archive is in ZIP64 format and the value
        in this field is 0xFFFFFFFF, the offset will be in the
        corresponding 8 byte zip64 extended information extra field.

   4.4.17 file name: (Variable)

        4.4.17.1 The name of the file, with optional relative path.
        The path stored MUST NOT contain a drive or
        device letter, or a leading slash. All slashes
        MUST be forward slashes '/' as opposed to
        backwards slashes '\' for compatibility with Amiga
        and UNIX file systems etc. If input came from standard
        input, there is no file name field.

        4.4.17.2 If using the Central Directory Encryption Feature and
        general purpose bit flag 13 is set indicating masking, the file
        name stored in the Local Header will not be the actual file name.
        A masking value consisting of a unique hexadecimal value will
        be stored. This value will be sequentially incremented for each
        file in the archive. See the section on the Strong Encryption
        Specification for details on retrieving the encrypted file name.
        Refer to the section in this document entitled "Incorporating PKWARE
        Proprietary Technology into Your Product" for more information.


   4.4.18 file comment: (Variable)

        The comment for this file.

   4.4.19 number of this disk: (2 bytes)

        The number of this disk, which contains the central
        directory end record. If an archive is in ZIP64 format
        and the value in this field is 0xFFFF, the value will
        be in the corresponding 4 byte zip64 end of central
        directory field.


   4.4.20 number of the disk with the start of the central
          directory: (2 bytes)

        The number of the disk on which the central
        directory starts.
        If an archive is in ZIP64 format
        and the value in this field is 0xFFFF, the value will
        be in the corresponding 4 byte zip64 end of central
        directory field.

   4.4.21 total number of entries in the central dir on
          this disk: (2 bytes)

        The number of central directory entries on this disk.
        If an archive is in ZIP64 format and the value in
        this field is 0xFFFF, the value will be in the
        corresponding 8 byte zip64 end of central
        directory field.

   4.4.22 total number of entries in the central dir: (2 bytes)

        The total number of files in the .ZIP file. If an
        archive is in ZIP64 format and the value in this field
        is 0xFFFF, the value will be in the corresponding 8 byte
        zip64 end of central directory field.

   4.4.23 size of the central directory: (4 bytes)

        The size (in bytes) of the entire central directory.
        If an archive is in ZIP64 format and the value in
        this field is 0xFFFFFFFF, the size will be in the
        corresponding 8 byte zip64 end of central
        directory field.

   4.4.24 offset of start of central directory with respect to
          the starting disk number: (4 bytes)

        Offset of the start of the central directory on the
        disk on which the central directory starts. If an
        archive is in ZIP64 format and the value in this
        field is 0xFFFFFFFF, the offset will be in the
        corresponding 8 byte zip64 end of central
        directory field.

   4.4.25 .ZIP file comment length: (2 bytes)

        The length of the comment for this .ZIP file.

   4.4.26 .ZIP file comment: (Variable)

        The comment for this .ZIP file. ZIP file comment data
        is stored unsecured. No encryption or data authentication
        is applied to this area at this time. Confidential information
        SHOULD NOT be stored in this section.
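
        Fields 4.4.19 through 4.4.26 make up the end of central
        directory record. The Python sketch below unpacks them in
        that order and reports whether any field holds the 0xFFFF or
        0xFFFFFFFF placeholder that redirects a reader to the zip64
        end of central directory record. The 4 byte signature
        0x06054b50 that precedes these fields is defined elsewhere in
        this specification and is assumed here; the function name and
        dictionary keys are hypothetical.

```python
import struct

EOCD_SIGNATURE = 0x06054B50   # assumed: defined outside this section
EOCD_FORMAT = "<IHHHHIIH"     # all fields little-endian
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)  # 22 bytes before the comment


def parse_eocd(record):
    """Parse an end of central directory record starting at its signature."""
    if len(record) < EOCD_SIZE:
        raise ValueError("record too short")
    (signature, this_disk, cd_start_disk, entries_this_disk,
     entries_total, cd_size, cd_offset, comment_length) = struct.unpack_from(
        EOCD_FORMAT, record)
    if signature != EOCD_SIGNATURE:
        raise ValueError("bad end of central directory signature")
    comment = record[EOCD_SIZE:EOCD_SIZE + comment_length]
    if len(comment) != comment_length:
        raise ValueError("truncated .ZIP file comment")
    # A placeholder value means the real value lives in the zip64 record.
    needs_zip64 = (
        0xFFFF in (this_disk, cd_start_disk, entries_this_disk, entries_total)
        or 0xFFFFFFFF in (cd_size, cd_offset)
    )
    return {
        "this_disk": this_disk,
        "cd_start_disk": cd_start_disk,
        "entries_this_disk": entries_this_disk,
        "entries_total": entries_total,
        "cd_size": cd_size,
        "cd_offset": cd_offset,
        "comment": comment,
        "needs_zip64": needs_zip64,
    }
```

        Locating the record inside an archive (for example by
        scanning backwards for the signature) is outside the scope of
        this sketch.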

   4.4.27 zip64 extensible data sector (variable size)

        (currently reserved for use by PKWARE)


   4.4.28 extra field: (Variable)

        This SHOULD be used for storage expansion. If additional
        information needs to be stored within a ZIP file for special
        application or platform needs, it SHOULD be stored here.
        Programs supporting earlier versions of this specification can
        then safely skip the file, and find the next file or header.
        This field will be 0 length in version 1.0.

        Existing extra fields are defined in the section
        Extensible data fields that follows.

4.5 Extensible data fields
--------------------------

   4.5.1 In order to allow different programs and different types
   of information to be stored in the 'extra' field in .ZIP
   files, the following structure MUST be used for all
   programs storing data in this field:

        header1+data1 + header2+data2 . . .

   Each header MUST consist of:

        Header ID - 2 bytes
        Data Size - 2 bytes

   Note: all fields stored in Intel low-byte/high-byte order.

   The Header ID field indicates the type of data that is in
   the following data block.

   Header IDs of 0 thru 31 are reserved for use by PKWARE.
   The remaining IDs can be used by third party vendors for
   proprietary usage.
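
   The header+data structure in 4.5.1 can be walked with a short
   loop. The Python sketch below is one possible reading of that
   layout; the function name is hypothetical, and raising an error on
   malformed input is a design choice of this sketch rather than a
   requirement of this specification.

```python
import struct


def iter_extra_fields(extra):
    """Yield (header_id, data) for each block in an 'extra' field.

    Each block is a 2 byte Header ID and a 2 byte Data Size, both in
    low-byte/high-byte order, followed by Data Size bytes of data.
    """
    pos = 0
    while pos + 4 <= len(extra):
        header_id, data_size = struct.unpack_from("<HH", extra, pos)
        pos += 4
        if pos + data_size > len(extra):
            raise ValueError("data block overruns the extra field")
        yield header_id, extra[pos:pos + data_size]
        pos += data_size
    if pos != len(extra):
        raise ValueError("trailing bytes after the last block")
```

   A reader that does not recognize a Header ID can simply ignore the
   yielded data and continue with the next block.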

   4.5.2 The current Header ID mappings defined by PKWARE are:

        0x0001        Zip64 extended information extra field
        0x0007        AV Info
        0x0008        Reserved for extended language encoding data (PFS)
                      (see APPENDIX D)
        0x0009        OS/2
        0x000a        NTFS
        0x000c        OpenVMS
        0x000d        UNIX
        0x000e        Reserved for file stream and fork descriptors
        0x000f        Patch Descriptor
        0x0014        PKCS#7 Store for X.509 Certificates
        0x0015        X.509 Certificate ID and Signature for
                      individual file
        0x0016        X.509 Certificate ID for Central Directory
        0x0017        Strong Encryption Header
        0x0018        Record Management Controls
        0x0019        PKCS#7 Encryption Recipient Certificate List
        0x0020        Reserved for Timestamp record
        0x0021        Policy Decryption Key Record
        0x0022        Smartcrypt Key Provider Record
        0x0023        Smartcrypt Policy Key Data Record
        0x0065        IBM S/390 (Z390), AS/400 (I400) attributes
                      - uncompressed
        0x0066        Reserved for IBM S/390 (Z390), AS/400 (I400)
                      attributes - compressed
        0x4690        POSZIP 4690 (reserved)


   4.5.3 -Zip64 Extended Information Extra Field (0x0001):

        The following is the layout of the zip64 extended
        information "extra" block. If one of the size or
        offset fields in the Local or Central directory
        record is too small to hold the required data,
        a Zip64 extended information record is created.
        The order of the fields in the zip64 extended
        information record is fixed, but the fields MUST
        only appear if the corresponding Local or Central
        directory record field is set to 0xFFFF or 0xFFFFFFFF.

        Note: all fields stored in Intel low-byte/high-byte order.

          Value      Size       Description
          -----      ----       -----------
  (ZIP64) 0x0001     2 bytes    Tag for this "extra" block type
          Size       2 bytes    Size of this "extra" block
          Original
          Size       8 bytes    Original uncompressed file size
          Compressed
          Size       8 bytes    Size of compressed data
          Relative Header
          Offset     8 bytes    Offset of local header record
          Disk Start
          Number     4 bytes    Number of the disk on which
                                this file starts

        This entry in the Local header MUST include BOTH original
        and compressed file size fields. If encrypting the
        central directory and bit 13 of the general purpose bit
        flag is set indicating masking, the value stored in the
        Local Header for the original file size will be zero.


   4.5.4 -OS/2 Extra Field (0x0009):

        The following is the layout of the OS/2 attributes "extra"
        block. (Last Revision 09/05/95)

        Note: all fields stored in Intel low-byte/high-byte order.

          Value      Size       Description
          -----      ----       -----------
  (OS/2)  0x0009     2 bytes    Tag for this "extra" block type
          TSize      2 bytes    Size for the following data block
          BSize      4 bytes    Uncompressed Block Size
          CType      2 bytes    Compression type
          EACRC      4 bytes    CRC value for uncompressed block
          (var)      variable   Compressed block

        The OS/2 extended attribute structure (FEA2LIST) is
        compressed and then stored in its entirety within this
        structure. There will only ever be one "block" of data in
        VarFields[].

   4.5.5 -NTFS Extra Field (0x000a):

        The following is the layout of the NTFS attributes
        "extra" block. (Note: At this time the Mtime, Atime
        and Ctime values MAY be used on any WIN32 system.)

        Note: all fields stored in Intel low-byte/high-byte order.

          Value      Size       Description
          -----      ----       -----------
  (NTFS)  0x000a     2 bytes    Tag for this "extra" block type
          TSize      2 bytes    Size of the total "extra" block
          Reserved   4 bytes    Reserved for future use
          Tag1       2 bytes    NTFS attribute tag value #1
          Size1      2 bytes    Size of attribute #1, in bytes
          (var)      Size1      Attribute #1 data
           .
           .
           .
          TagN       2 bytes    NTFS attribute tag value #N
          SizeN      2 bytes    Size of attribute #N, in bytes
          (var)      SizeN      Attribute #N data

        For NTFS, values for Tag1 through TagN are as follows:
        (currently only one set of attributes is defined for NTFS)

          Tag        Size       Description
          -----      ----       -----------
          0x0001     2 bytes    Tag for attribute #1
          Size1      2 bytes    Size of attribute #1, in bytes
          Mtime      8 bytes    File last modification time
          Atime      8 bytes    File last access time
          Ctime      8 bytes    File creation time

   4.5.6 -OpenVMS Extra Field (0x000c):

        The following is the layout of the OpenVMS attributes
        "extra" block.

        Note: all fields stored in Intel low-byte/high-byte order.

          Value      Size       Description
          -----      ----       -----------
  (VMS)   0x000c     2 bytes    Tag for this "extra" block type
          TSize      2 bytes    Size of the total "extra" block
          CRC        4 bytes    32-bit CRC for remainder of the block
          Tag1       2 bytes    OpenVMS attribute tag value #1
          Size1      2 bytes    Size of attribute #1, in bytes
          (var)      Size1      Attribute #1 data
           .
           .
           .
          TagN       2 bytes    OpenVMS attribute tag value #N
          SizeN      2 bytes    Size of attribute #N, in bytes
          (var)      SizeN      Attribute #N data

        OpenVMS Extra Field Rules:

        4.5.6.1. There will be one or more attributes present, which
        will each be preceded by the above TagX & SizeX values.
        These values are identical to the ATR$C_XXXX and ATR$S_XXXX
        constants which are defined in ATR.H under OpenVMS C. Neither
        of these values will ever be zero.

        4.5.6.2.
        No word alignment or padding is performed.

        4.5.6.3. A well-behaved PKZIP/OpenVMS program SHOULD NOT produce
        more than one sub-block with the same TagX value. Also, there MUST
        NOT be more than one "extra" block of type 0x000c in a particular
        directory record.

   4.5.7 -UNIX Extra Field (0x000d):

        The following is the layout of the UNIX "extra" block.
        Note: all fields are stored in Intel low-byte/high-byte
        order.

          Value      Size       Description
          -----      ----       -----------
  (UNIX)  0x000d     2 bytes    Tag for this "extra" block type
          TSize      2 bytes    Size for the following data block
          Atime      4 bytes    File last access time
          Mtime      4 bytes    File last modification time
          Uid        2 bytes    File user ID
          Gid        2 bytes    File group ID
          (var)      variable   Variable length data field

        The variable length data field will contain file type
        specific data. Currently the only values allowed are
        the original "linked to" file names for hard or symbolic
        links, and the major and minor device node numbers for
        character and block device nodes. Since device nodes
        cannot be either symbolic or hard links, only one set of
        variable length data is stored. Link files will have the
        name of the original file stored. This name is NOT NULL
        terminated. Its size can be determined by checking TSize -
        12. Device entries will have eight bytes stored as two 4
        byte entries (in little endian format). The first entry
        will be the major device number, and the second the minor
        device number.

   4.5.8 -PATCH Descriptor Extra Field (0x000f):

        4.5.8.1 The following is the layout of the Patch Descriptor
        "extra" block.

        Note: all fields stored in Intel low-byte/high-byte order.

          Value      Size       Description
          -----      ----       -----------
  (Patch) 0x000f     2 bytes    Tag for this "extra" block type
          TSize      2 bytes    Size of the total "extra" block
          Version    2 bytes    Version of the descriptor
          Flags      4 bytes    Actions and reactions (see below)
          OldSize    4 bytes    Size of the file about to be patched
          OldCRC     4 bytes    32-bit CRC of the file to be patched
          NewSize    4 bytes    Size of the resulting file
          NewCRC     4 bytes    32-bit CRC of the resulting file

        4.5.8.2 Actions and reactions

          Bits          Description
          ----          ----------------
          0             Use for auto detection
          1             Treat as a self-patch
          2-3           RESERVED
          4-5           Action (see below)
          6-7           RESERVED
          8-9           Reaction (see below) to absent file
          10-11         Reaction (see below) to newer file
          12-13         Reaction (see below) to unknown file
          14-15         RESERVED
          16-31         RESERVED

           4.5.8.2.1 Actions

            Action       Value
            ------       -----
            none         0
            add          1
            delete       2
            patch        3

           4.5.8.2.2 Reactions

            Reaction     Value
            --------     -----
            ask          0
            skip         1
            ignore       2
            fail         3

        4.5.8.3 Patch support is provided by PKPatchMaker(tm) technology
        and is covered under U.S. Patents and Patents Pending. The use or
        implementation in a product of certain technological aspects set
        forth in the current APPNOTE, including those with regard to
        strong encryption or patching requires a license from PKWARE.
        Refer to the section in this document entitled "Incorporating
        PKWARE Proprietary Technology into Your Product" for more
        information.

   4.5.9 -PKCS#7 Store for X.509 Certificates (0x0014):

        This field MUST contain information about each of the certificates
        files MAY be signed with.
        When the Central Directory Encryption
        feature is enabled for a ZIP file, this record will appear in
        the Archive Extra Data Record, otherwise it will appear in the
        first central directory record and will be ignored in any
        other record.


        Note: all fields stored in Intel low-byte/high-byte order.

          Value      Size       Description
          -----      ----       -----------
  (Store) 0x0014     2 bytes    Tag for this "extra" block type
          TSize      2 bytes    Size of the store data
          TData      TSize      Data about the store


   4.5.10 -X.509 Certificate ID and Signature for individual file (0x0015):

        This field contains the information about which certificate in
        the PKCS#7 store was used to sign a particular file. It also
        contains the signature data. This field can appear multiple
        times, but can only appear once per certificate.

        Note: all fields stored in Intel low-byte/high-byte order.

          Value      Size       Description
          -----      ----       -----------
  (CID)   0x0015     2 bytes    Tag for this "extra" block type
          TSize      2 bytes    Size of data that follows
          TData      TSize      Signature Data

   4.5.11 -X.509 Certificate ID and Signature for central directory (0x0016):

        This field contains the information about which certificate in
        the PKCS#7 store was used to sign the central directory structure.
        When the Central Directory Encryption feature is enabled for a
        ZIP file, this record will appear in the Archive Extra Data Record,
        otherwise it will appear in the first central directory record.

        Note: all fields stored in Intel low-byte/high-byte order.

          Value      Size       Description
          -----      ----       -----------
  (CDID)  0x0016     2 bytes    Tag for this "extra" block type
          TSize      2 bytes    Size of data that follows
          TData      TSize      Data

   4.5.12 -Strong Encryption Header (0x0017):

          Value      Size       Description
          -----      ----       -----------
          0x0017     2 bytes    Tag for this "extra" block type
          TSize      2 bytes    Size of data that follows
          Format     2 bytes    Format definition for this record
          AlgID      2 bytes    Encryption algorithm identifier
          Bitlen     2 bytes    Bit length of encryption key
          Flags      2 bytes    Processing flags
          CertData   TSize-8    Certificate decryption extra field data
                                (refer to the explanation for CertData
                                in the section describing the
                                Certificate Processing Method under
                                the Strong Encryption Specification)

        See the section describing the Strong Encryption Specification
        for details. Refer to the section in this document entitled
        "Incorporating PKWARE Proprietary Technology into Your Product"
        for more information.

   4.5.13 -Record Management Controls (0x0018):

          Value      Size       Description
          -----      ----       -----------
(Rec-CTL) 0x0018     2 bytes    Tag for this "extra" block type
          CSize      2 bytes    Size of total extra block data
          Tag1       2 bytes    Record control attribute 1
          Size1      2 bytes    Size of attribute 1, in bytes
          Data1      Size1      Attribute 1 data
           .
           .
           .
          TagN       2 bytes    Record control attribute N
          SizeN      2 bytes    Size of attribute N, in bytes
          DataN      SizeN      Attribute N data


   4.5.14 -PKCS#7 Encryption Recipient Certificate List (0x0019):

        This field MAY contain information about each of the certificates
        used in encryption processing and it can be used to identify who is
        allowed to decrypt encrypted files. This field SHOULD only appear
        in the archive extra data record.
        This field is not required and
        serves only to aid archive modifications by preserving public
        encryption key data. Individual security requirements may dictate
        that this data be omitted to deter information exposure.

        Note: all fields stored in Intel low-byte/high-byte order.

          Value      Size       Description
          -----      ----       -----------
 (CStore) 0x0019     2 bytes    Tag for this "extra" block type
          TSize      2 bytes    Size of the store data
          TData      TSize      Data about the store

        TData:

          Value      Size       Description
          -----      ----       -----------
          Version    2 bytes    Format version number - MUST be 0x0001 at this time
          CStore     (var)      PKCS#7 data blob

        See the section describing the Strong Encryption Specification
        for details. Refer to the section in this document entitled
        "Incorporating PKWARE Proprietary Technology into Your Product"
        for more information.

   4.5.15 -MVS Extra Field (0x0065):

        The following is the layout of the MVS "extra" block.
        Note: Some fields are stored in Big Endian format.
        All text is in EBCDIC format unless otherwise specified.

          Value      Size       Description
          -----      ----       -----------
  (MVS)   0x0065     2 bytes    Tag for this "extra" block type
          TSize      2 bytes    Size for the following data block
          ID         4 bytes    EBCDIC "Z390" 0xE9F3F9F0 or
                                "T4MV" for TargetFour
          (var)      TSize-4    Attribute data (see APPENDIX B)


   4.5.16 -OS/400 Extra Field (0x0065):

        The following is the layout of the OS/400 "extra" block.
        Note: Some fields are stored in Big Endian format.
        All text is in EBCDIC format unless otherwise specified.

          Value      Size       Description
          -----      ----       -----------
  (OS400) 0x0065     2 bytes    Tag for this "extra" block type
          TSize      2 bytes    Size for the following data block
          ID         4 bytes    EBCDIC "I400" 0xC9F4F0F0 or
                                "T4MV" for TargetFour
          (var)      TSize-4    Attribute data (see APPENDIX A)

   4.5.17 -Policy Decryption Key Record Extra Field (0x0021):

        The following is the layout of the Policy Decryption Key "extra" block.
        TData is a variable length, variable content field. It holds
        information about encryptions and/or encryption key sources.
        Contact PKWARE for information on current TData structures.
        Information in this "extra" block may alternatively be placed
        within comment fields. Refer to the section in this document
        entitled "Incorporating PKWARE Proprietary Technology into Your
        Product" for more information.

          Value      Size       Description
          -----      ----       -----------
          0x0021     2 bytes    Tag for this "extra" block type
          TSize      2 bytes    Size for the following data block
          TData      TSize      Data about the key

   4.5.18 -Key Provider Record Extra Field (0x0022):

        The following is the layout of the Key Provider "extra" block.
        TData is a variable length, variable content field. It holds
        information about encryptions and/or encryption key sources.
        Contact PKWARE for information on current TData structures.
        Information in this "extra" block may alternatively be placed
        within comment fields. Refer to the section in this document
        entitled "Incorporating PKWARE Proprietary Technology into Your
        Product" for more information.

          Value      Size       Description
          -----      ----       -----------
          0x0022     2 bytes    Tag for this "extra" block type
          TSize      2 bytes    Size for the following data block
          TData      TSize      Data about the key

   4.5.19 -Policy Key Data Record Extra Field (0x0023):

        The following is the layout of the Policy Key Data "extra" block.
        TData is a variable length, variable content field. It holds
        information about encryptions and/or encryption key sources.
        Contact PKWARE for information on current TData structures.
        Information in this "extra" block may alternatively be placed
        within comment fields. Refer to the section in this document
        entitled "Incorporating PKWARE Proprietary Technology into Your
        Product" for more information.

          Value      Size       Description
          -----      ----       -----------
          0x0023     2 bytes    Tag for this "extra" block type
          TSize      2 bytes    Size for the following data block
          TData      TSize      Data about the key

4.6 Third Party Mappings
------------------------

   4.6.1 Third party mappings commonly used are:

        0x07c8        Macintosh
        0x1986        Pixar USD header ID
        0x2605        ZipIt Macintosh
        0x2705        ZipIt Macintosh 1.3.5+
        0x2805        ZipIt Macintosh 1.3.5+
        0x334d        Info-ZIP Macintosh
        0x4154        Tandem
        0x4341        Acorn/SparkFS
        0x4453        Windows NT security descriptor (binary ACL)
        0x4704        VM/CMS
        0x470f        MVS
        0x4854        THEOS (old?)
        0x4b46        FWKCS MD5 (see below)
        0x4c41        OS/2 access control list (text ACL)
        0x4d49        Info-ZIP OpenVMS
        0x4d63        Macintosh Smartzip (??)
        0x4f4c        Xceed original location extra field
        0x5356        AOS/VS (ACL)
        0x5455        extended timestamp
        0x554e        Xceed unicode extra field
        0x5855        Info-ZIP UNIX (original, also OS/2, NT, etc)
        0x6375        Info-ZIP Unicode Comment Extra Field
        0x6542        BeOS/BeBox
        0x6854        THEOS
        0x7075        Info-ZIP Unicode Path Extra Field
        0x7441        AtheOS/Syllable
        0x756e        ASi UNIX
        0x7855        Info-ZIP UNIX (new)
        0x7875        Info-ZIP UNIX (newer UID/GID)
        0xa11e        Data Stream Alignment (Apache Commons-Compress)
        0xa220        Microsoft Open Packaging Growth Hint
        0xcafe        Java JAR file Extra Field Header ID
        0xd935        Android ZIP Alignment Extra Field
        0xe57a        Korean ZIP code page info
        0xfd4a        SMS/QDOS
        0x9901        AE-x encryption structure (see APPENDIX E)
        0x9902        unknown


   Detailed descriptions of Extra Fields defined by third
   party mappings will be documented as information on
   these data structures is made available to PKWARE.
   PKWARE does not guarantee the accuracy of any published
   third party data.

   4.6.2 Third-party Extra Fields MUST include a Header ID using
   the format defined in the section of this document
   titled Extensible Data Fields (section 4.5).

   The Data Size field indicates the size of the following
   data block. Programs can use this value to skip to the
   next header block, passing over any data blocks that are
   not of interest.

   Note: As stated above, the size of the entire .ZIP file
         header, including the file name, comment, and extra
         field SHOULD NOT exceed 64K in size.

   4.6.3 In case two different programs appropriate the same
   Header ID value, it is strongly recommended that each
   program SHOULD place a unique signature of at least two bytes in
   size (and preferably 4 bytes or bigger) at the start of
   each data area.
   Every program SHOULD verify that its
   unique signature is present, in addition to the Header ID
   value being correct, before assuming that it is a block of
   known type.

   Third-party Mappings:
   Not all third-party extra field mappings are documented here.

   4.6.4 -ZipIt Macintosh Extra Field (long) (0x2605):

        The following is the layout of the ZipIt extra block
        for Macintosh. The local-header and central-header versions
        are identical. This block MUST be present if the file is
        stored MacBinary-encoded and it SHOULD NOT be used if the file
        is not stored MacBinary-encoded.

          Value         Size        Description
          -----         ----        -----------
  (Mac2)  0x2605        Short       tag for this extra block type
          TSize         Short       total data size for this block
          "ZPIT"        beLong      extra-field signature
          FnLen         Byte        length of FileName
          FileName      variable    full Macintosh filename
          FileType      Byte[4]     four-byte Mac file type string
          Creator       Byte[4]     four-byte Mac creator string


   4.6.5 -ZipIt Macintosh Extra Field (short, for files) (0x2705):

        The following is the layout of a shortened variant of the
        ZipIt extra block for Macintosh (without "full name" entry).
        This variant is used by ZipIt 1.3.5 and newer for entries of
        files (not directories) that do not have a MacBinary encoded
        file. The local-header and central-header versions are identical.

          Value         Size        Description
          -----         ----        -----------
  (Mac2b) 0x2705        Short       tag for this extra block type
          TSize         Short       total data size for this block (12)
          "ZPIT"        beLong      extra-field signature
          FileType      Byte[4]     four-byte Mac file type string
          Creator       Byte[4]     four-byte Mac creator string
          fdFlags       beShort     attributes from FInfo.frFlags,
                                    MAY be omitted
          0x0000        beShort     reserved, MAY be omitted


   4.6.6 -ZipIt Macintosh Extra Field (short, for directories) (0x2805):

        The following is the layout of a shortened variant of the
        ZipIt extra block for Macintosh used only for directory
        entries. This variant is used by ZipIt 1.3.5 and newer to
        save some optional Mac-specific information about directories.
        The local-header and central-header versions are identical.

          Value         Size        Description
          -----         ----        -----------
  (Mac2c) 0x2805        Short       tag for this extra block type
          TSize         Short       total data size for this block (12)
          "ZPIT"        beLong      extra-field signature
          frFlags       beShort     attributes from DInfo.frFlags, MAY
                                    be omitted
          View          beShort     ZipIt view flag, MAY be omitted


        The View field specifies ZipIt-internal settings as follows:

        Bits of the Flags:
           bit 0           if set, the folder is shown expanded (open)
                           when the archive contents are viewed in ZipIt.
           bits 1-15       reserved, zero;


   4.6.7 -FWKCS MD5 Extra Field (0x4b46):

        The FWKCS Contents_Signature System, used in
        automatically identifying files independent of file name,
        optionally adds and uses an extra field to support the
        rapid creation of an enhanced contents_signature:

             Header ID = 0x4b46
             Data Size = 0x0013
             Preface   = 'M','D','5'
             followed by 16 bytes containing the uncompressed file's
             128_bit MD5 hash(1), low byte first.
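
        As a rough illustration of the layout just listed, the
        Python sketch below builds such a block for a given
        uncompressed file. It assumes that the byte order produced by
        hashlib.md5(...).digest() corresponds to "low byte first" as
        meant here, which follows the output order described in RFC
        1321 but has not been checked against FWKCS itself. The
        function name is hypothetical.

```python
import hashlib
import struct

FWKCS_HEADER_ID = 0x4B46
FWKCS_DATA_SIZE = 0x0013  # 3 byte preface + 16 byte MD5 hash = 19 bytes


def build_fwkcs_md5_field(uncompressed_data):
    """Return the 23 byte extra field block for the given file contents."""
    digest = hashlib.md5(uncompressed_data).digest()  # 16 bytes
    body = b"MD5" + digest
    assert len(body) == FWKCS_DATA_SIZE
    # Header ID and Data Size are stored low byte first, as in 4.5.1.
    return struct.pack("<HH", FWKCS_HEADER_ID, FWKCS_DATA_SIZE) + body
```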

      When FWKCS revises a .ZIP file central directory to add
      this extra field for a file, it also replaces the
      central directory entry for that file's uncompressed
      file length with a measured value.

      FWKCS provides an option to strip this extra field, if
      present, from a .ZIP file central directory. In adding
      this extra field, FWKCS preserves .ZIP file Authenticity
      Verification; if stripping this extra field, FWKCS
      preserves all versions of AV through PKZIP version 2.04g.

      FWKCS, and FWKCS Contents_Signature System, are
      trademarks of Frederick W. Kantor.

      (1) R. Rivest, RFC1321.TXT, MIT Laboratory for Computer
          Science and RSA Data Security, Inc., April 1992.
          ll.76-77: "The MD5 algorithm is being placed in the
          public domain for review and possible adoption as a
          standard."


   4.6.8 -Info-ZIP Unicode Comment Extra Field (0x6375):

      Stores the UTF-8 version of the file comment as stored in the
      central directory header. (Last Revision 20070912)

          Value         Size        Description
          -----         ----        -----------
  (UCom)  0x6375        Short       tag for this extra block type ("uc")
          TSize         Short       total data size for this block
          Version       1 byte      version of this extra field, currently 1
          ComCRC32      4 bytes     Comment Field CRC32 Checksum
          UnicodeCom    Variable    UTF-8 version of the entry comment

      Currently Version is set to the number 1. If there is a need
      to change this field, the version will be incremented. Changes
      MAY NOT be backward compatible so this extra field SHOULD NOT be
      used if the version is not recognized.

      The ComCRC32 is the standard zip CRC32 checksum of the File Comment
      field in the central directory header. This is used to verify that
      the comment field has not changed since the Unicode Comment extra field
      was created.
      This can happen if a utility changes the File Comment
      field but does not update the UTF-8 Comment extra field. If the CRC
      check fails, this Unicode Comment extra field SHOULD be ignored and
      the File Comment field in the header SHOULD be used instead.

      The UnicodeCom field is the UTF-8 version of the File Comment field
      in the header. As UnicodeCom is defined to be UTF-8, no UTF-8 byte
      order mark (BOM) is used. The length of this field is determined by
      subtracting the size of the previous fields from TSize. If both the
      File Name and Comment fields are UTF-8, the new General Purpose Bit
      Flag, bit 11 (Language encoding flag (EFS)), can be used to indicate
      both the header File Name and Comment fields are UTF-8 and, in this
      case, the Unicode Path and Unicode Comment extra fields are not
      needed and SHOULD NOT be created. Note that, for backward
      compatibility, bit 11 SHOULD only be used if the native character set
      of the paths and comments being zipped up are already in UTF-8. It is
      expected that the same file comment storage method, either general
      purpose bit 11 or extra fields, be used in both the Local and Central
      Directory Header for a file.


   4.6.9 -Info-ZIP Unicode Path Extra Field (0x7075):

      Stores the UTF-8 version of the file name field as stored in the
      local header and central directory header. (Last Revision 20070912)

          Value         Size        Description
          -----         ----        -----------
  (UPath) 0x7075        Short       tag for this extra block type ("up")
          TSize         Short       total data size for this block
          Version       1 byte      version of this extra field, currently 1
          NameCRC32     4 bytes     File Name Field CRC32 Checksum
          UnicodeName   Variable    UTF-8 version of the entry File Name

      Currently Version is set to the number 1. If there is a need
      to change this field, the version will be incremented.
      Changes
      MAY NOT be backward compatible so this extra field SHOULD NOT be
      used if the version is not recognized.

      The NameCRC32 is the standard zip CRC32 checksum of the File Name
      field in the header. This is used to verify that the header
      File Name field has not changed since the Unicode Path extra field
      was created. This can happen if a utility renames the File Name but
      does not update the UTF-8 path extra field. If the CRC check fails,
      this UTF-8 Path Extra Field SHOULD be ignored and the File Name field
      in the header SHOULD be used instead.

      The UnicodeName is the UTF-8 version of the contents of the File Name
      field in the header. As UnicodeName is defined to be UTF-8, no UTF-8
      byte order mark (BOM) is used. The length of this field is determined
      by subtracting the size of the previous fields from TSize. If both
      the File Name and Comment fields are UTF-8, the new General Purpose
      Bit Flag, bit 11 (Language encoding flag (EFS)), can be used to
      indicate that both the header File Name and Comment fields are UTF-8
      and, in this case, the Unicode Path and Unicode Comment extra fields
      are not needed and SHOULD NOT be created. Note that, for backward
      compatibility, bit 11 SHOULD only be used if the native character set
      of the paths and comments being zipped up are already in UTF-8. It is
      expected that the same file name storage method, either general
      purpose bit 11 or extra fields, be used in both the Local and Central
      Directory Header for a file.
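
      The version and CRC checks described for the Unicode Path extra
      field can be sketched as follows. This is a minimal illustration
      rather than a reference implementation. The function names are
      hypothetical, the tag, TSize and NameCRC32 are assumed to be stored
      little-endian like other ZIP header values, and zlib.crc32 is
      assumed to be the "standard zip CRC32". The Unicode Comment extra
      field (0x6375) has the same shape and could be handled the same
      way with the File Comment bytes.

```python
import struct
import zlib

UPATH_ID = 0x7075  # tag for the Unicode Path extra block ("up")


def build_unicode_path_field(header_name: bytes, unicode_name: str) -> bytes:
    """Build the extra block for a given raw header File Name (hypothetical helper)."""
    name_crc = zlib.crc32(header_name) & 0xFFFFFFFF
    data = struct.pack("<BI", 1, name_crc) + unicode_name.encode("utf-8")
    return struct.pack("<HH", UPATH_ID, len(data)) + data


def resolve_unicode_path(header_name: bytes, field: bytes):
    """Return the UTF-8 name as str, or None if the field SHOULD be ignored."""
    if len(field) < 4:
        return None
    tag, tsize = struct.unpack_from("<HH", field, 0)
    # Version (1 byte) + NameCRC32 (4 bytes) must be present.
    if tag != UPATH_ID or tsize < 5 or len(field) < 4 + tsize:
        return None
    version, name_crc = struct.unpack_from("<BI", field, 4)
    if version != 1:
        return None  # unrecognized version: do not use the field
    if name_crc != (zlib.crc32(header_name) & 0xFFFFFFFF):
        return None  # header File Name changed after the field was written
    # UnicodeName length = TSize minus the size of the previous fields.
    try:
        return field[9:4 + tsize].decode("utf-8")
    except UnicodeDecodeError:
        return None
```

      When the function returns None, the caller falls back to the File
      Name field in the header, as the text above requires.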


   4.6.10 -Microsoft Open Packaging Growth Hint (0xa220):

          Value         Size        Description
          -----         ----        -----------
          0xa220        Short       tag for this extra block type
          TSize         Short       size of Sig + PadVal + Padding
          Sig           Short       verification signature (A028)
          PadVal        Short       Initial padding value
          Padding       variable    filled with NULL characters

   4.6.11 -Data Stream Alignment (Apache Commons-Compress) (0xa11e):

      (per Zbynek Vyskovsky) Defines alignment of data stream of this
      entry within the zip archive. Additionally, indicates whether the
      compression method should be kept when re-compressing the zip file.

      The purpose of this extra field is to align specific resources to
      word or page boundaries so they can be easily mapped into memory.

          Value         Size        Description
          -----         ----        -----------
          0xa11e        Short       tag for this extra block type
          TSize         Short       total data size for this block (2+padding)
          alignment     Short       required alignment and indicator
          0x00          Variable    padding

      The alignment field (lower 15 bits) defines the minimal alignment
      required by the data stream. Bit 15 of alignment field indicates
      whether the compression method of this entry can be changed when
      recompressing the zip file. The value 0 means the compression method
      should not be changed. The value 1 indicates the compression method
      may be changed. The padding field contains padding to ensure the correct
      alignment. It can be changed at any time when the offset or required
      alignment changes. (see https://issues.apache.org/jira/browse/COMPRESS-391)


4.7 Manifest Files
------------------

   4.7.1 Applications using ZIP files MAY have a need for additional
   information that MUST be included with the files placed into
   a ZIP file.
   Application specific information that cannot be
   stored using the defined ZIP storage records SHOULD be stored
   using the extensible Extra Field convention defined in this
   document. However, some applications MAY use a manifest
   file as a means for storing additional information. One
   example is the META-INF/MANIFEST.MF file used in ZIP formatted
   files having the .JAR extension (JAR files).

   4.7.2 A manifest file is a file created for the application process
   that requires this information. A manifest file MAY be of any
   file type required by the defining application process. It is
   placed within the same ZIP file as files to which this information
   applies. By convention, this file is typically the first file placed
   into the ZIP file and it MAY include a defined directory path.

   4.7.3 Manifest files MAY be compressed or encrypted as needed for
   application processing of the files inside the ZIP files.

   Manifest files are outside of the scope of this specification.


5.0 Explanation of compression methods
--------------------------------------


5.1 UnShrinking - Method 1
--------------------------

   5.1.1 Shrinking is a Dynamic Ziv-Lempel-Welch compression algorithm
   with partial clearing. The initial code size is 9 bits, and the
   maximum code size is 13 bits. Shrinking differs from conventional
   Dynamic Ziv-Lempel-Welch implementations in several respects:

   5.1.2 The code size is controlled by the compressor, and is
   not automatically increased when codes larger than the current
   code size are created (but not necessarily used). When
   the decompressor encounters the code sequence 256
   (decimal) followed by 1, it SHOULD increase the code size
   read from the input stream to the next bit size.
   No
   blocking of the codes is performed, so the next code at
   the increased size SHOULD be read from the input stream
   immediately after where the previous code at the smaller
   bit size was read. Again, the decompressor SHOULD NOT
   increase the code size used until the sequence 256,1 is
   encountered.

   5.1.3 When the table becomes full, total clearing is not
   performed. Rather, when the compressor emits the code
   sequence 256,2 (decimal), the decompressor SHOULD clear
   all leaf nodes from the Ziv-Lempel tree, and continue to
   use the current code size. The nodes that are cleared
   from the Ziv-Lempel tree are then re-used, with the lowest
   code value re-used first, and the highest code value
   re-used last. The compressor can emit the sequence 256,2
   at any time.

5.2 Expanding - Methods 2-5
---------------------------

   5.2.1 The Reducing algorithm is actually a combination of two
   distinct algorithms. The first algorithm compresses repeated
   byte sequences, and the second algorithm takes the compressed
   stream from the first algorithm and applies a probabilistic
   compression method.

   5.2.2 The probabilistic compression stores an array of 'follower
   sets' S(j), for j=0 to 255, corresponding to each possible
   ASCII character. Each set contains between 0 and 32
   characters, to be denoted as S(j)[0],...,S(j)[m], where m<32.
   The sets are stored at the beginning of the data area for a
   Reduced file, in reverse order, with S(255) first, and S(0)
   last.

   5.2.3 The sets are encoded as { N(j), S(j)[0],...,S(j)[N(j)-1] },
   where N(j) is the size of set S(j). N(j) can be 0, in which
   case the follower set for S(j) is empty. Each N(j) value is
   encoded in 6 bits, followed by N(j) eight bit character values
   corresponding to S(j)[0] to S(j)[N(j)-1] respectively.
   If
   N(j) is 0, then no values for S(j) are stored, and the value
   for N(j-1) immediately follows.

   5.2.4 Immediately after the follower sets, is the compressed data
   stream. The compressed data stream can be interpreted for the
   probabilistic decompression as follows:

      let Last-Character <- 0.
      loop until done
          if the follower set S(Last-Character) is empty then
              read 8 bits from the input stream, and copy this
              value to the output stream.
          otherwise if the follower set S(Last-Character) is non-empty then
              read 1 bit from the input stream.
              if this bit is not zero then
                  read 8 bits from the input stream, and copy this
                  value to the output stream.
              otherwise if this bit is zero then
                  read B(N(Last-Character)) bits from the input
                  stream, and assign this value to I.
                  Copy the value of S(Last-Character)[I] to the
                  output stream.

          assign the last value placed on the output stream to
          Last-Character.
      end loop

   B(N(j)) is defined as the minimal number of bits required to
   encode the value N(j)-1.

   5.2.5 The decompressed stream from above can then be expanded to
   re-create the original file as follows:

      let State <- 0.

      loop until done
          read 8 bits from the input stream into C.
          case State of
              0:  if C is not equal to DLE (144 decimal) then
                      copy C to the output stream.
                  otherwise if C is equal to DLE then
                      let State <- 1.

              1:  if C is non-zero then
                      let V <- C.
                      let Len <- L(V)
                      let State <- F(Len).
                  otherwise if C is zero then
                      copy the value 144 (decimal) to the output stream.
                      let State <- 0

              2:  let Len <- Len + C
                  let State <- 3.

              3:  move backwards D(V,C) bytes in the output stream
                  (if this position is before the start of the output
                  stream, then assume that all the data before the
                  start of the output stream is filled with zeros).
                  copy Len+3 bytes from this position to the output stream.
                  let State <- 0.
          end case
      end loop

   The functions F,L, and D are dependent on the 'compression
   factor', 1 through 4, and are defined as follows:

      For compression factor 1:
          L(X) equals the lower 7 bits of X.
          F(X) equals 2 if X equals 127 otherwise F(X) equals 3.
          D(X,Y) equals the (upper 1 bit of X) * 256 + Y + 1.
      For compression factor 2:
          L(X) equals the lower 6 bits of X.
          F(X) equals 2 if X equals 63 otherwise F(X) equals 3.
          D(X,Y) equals the (upper 2 bits of X) * 256 + Y + 1.
      For compression factor 3:
          L(X) equals the lower 5 bits of X.
          F(X) equals 2 if X equals 31 otherwise F(X) equals 3.
          D(X,Y) equals the (upper 3 bits of X) * 256 + Y + 1.
      For compression factor 4:
          L(X) equals the lower 4 bits of X.
          F(X) equals 2 if X equals 15 otherwise F(X) equals 3.
          D(X,Y) equals the (upper 4 bits of X) * 256 + Y + 1.

5.3 Imploding - Method 6
------------------------

   5.3.1 The Imploding algorithm is actually a combination of two
   distinct algorithms. The first algorithm compresses repeated byte
   sequences using a sliding dictionary. The second algorithm is
   used to compress the encoding of the sliding dictionary output,
   using multiple Shannon-Fano trees.

   5.3.2 The Imploding algorithm can use a 4K or 8K sliding dictionary
   size. The dictionary size used can be determined by bit 1 in the
   general purpose flag word; a 0 bit indicates a 4K dictionary
   while a 1 bit indicates an 8K dictionary.

   5.3.3 The Shannon-Fano trees are stored at the start of the
   compressed file.
   The number of trees stored is defined by bit 2 in
   the general purpose flag word; a 0 bit indicates two trees stored,
   a 1 bit indicates three trees are stored. If 3 trees are stored,
   the first Shannon-Fano tree represents the encoding of the
   Literal characters, the second tree represents the encoding of
   the Length information, the third represents the encoding of the
   Distance information. When 2 Shannon-Fano trees are stored, the
   Length tree is stored first, followed by the Distance tree.

   5.3.4 The Literal Shannon-Fano tree, if present is used to represent
   the entire ASCII character set, and contains 256 values. This
   tree is used to compress any data not compressed by the sliding
   dictionary algorithm. When this tree is present, the Minimum
   Match Length for the sliding dictionary is 3. If this tree is
   not present, the Minimum Match Length is 2.

   5.3.5 The Length Shannon-Fano tree is used to compress the Length
   part of the (length,distance) pairs from the sliding dictionary
   output. The Length tree contains 64 values, ranging from the
   Minimum Match Length, to 63 plus the Minimum Match Length.

   5.3.6 The Distance Shannon-Fano tree is used to compress the Distance
   part of the (length,distance) pairs from the sliding dictionary
   output. The Distance tree contains 64 values, ranging from 0 to
   63, representing the upper 6 bits of the distance value. The
   distance values themselves will be between 0 and the sliding
   dictionary size, either 4K or 8K.

   5.3.7 The Shannon-Fano trees themselves are stored in a compressed
   format. The first byte of the tree data represents the number of
   bytes of data representing the (compressed) Shannon-Fano tree
   minus 1. The remaining bytes represent the Shannon-Fano tree
   data encoded as:

      High 4 bits: Number of values at this bit length + 1.
                   (1 - 16)
      Low  4 bits: Bit Length needed to represent value + 1.
                   (1 - 16)

   5.3.8 The Shannon-Fano codes can be constructed from the bit lengths
   using the following algorithm:

      1)  Sort the Bit Lengths in ascending order, while retaining the
          order of the original lengths stored in the file.

      2)  Generate the Shannon-Fano trees:

          Code <- 0
          CodeIncrement <- 0
          LastBitLength <- 0
          i <- number of Shannon-Fano codes - 1   (either 255 or 63)

          loop while i >= 0
              Code = Code + CodeIncrement
              if BitLength(i) <> LastBitLength then
                  LastBitLength=BitLength(i)
                  CodeIncrement = 1 shifted left (16 - LastBitLength)
              ShannonCode(i) = Code
              i <- i - 1
          end loop

      3)  Reverse the order of all the bits in the above ShannonCode()
          vector, so that the most significant bit becomes the least
          significant bit. For example, the value 0x1234 (hex) would
          become 0x2C48 (hex).

      4)  Restore the order of Shannon-Fano codes as originally stored
          within the file.

   Example:

      This example will show the encoding of a Shannon-Fano tree
      of size 8. Notice that the actual Shannon-Fano trees used
      for Imploding are either 64 or 256 entries in size.

      Example:   0x02, 0x42, 0x01, 0x13

      The first byte indicates 3 values in this table. Decoding the
      bytes:
              0x42 = 5 codes of 3 bits long
              0x01 = 1 code  of 2 bits long
              0x13 = 2 codes of 4 bits long

      This would generate the original bit length array of:
      (3, 3, 3, 3, 3, 2, 4, 4)

      There are 8 codes in this table for the values 0 thru 7.
      Using
      the algorithm to obtain the Shannon-Fano codes produces:

                                      Reversed     Order     Original
      Val  Sorted   Constructed Code    Value     Restored    Length
      ---  ------   -----------------  --------   --------    ------
      0:     2      1100000000000000         11        101       3
      1:     3      1010000000000000        101        001       3
      2:     3      1000000000000000        001        110       3
      3:     3      0110000000000000        110        010       3
      4:     3      0100000000000000        010        100       3
      5:     3      0010000000000000        100         11       2
      6:     4      0001000000000000       1000       1000       4
      7:     4      0000000000000000       0000       0000       4

      The values in the Val, Order Restored and Original Length columns
      now represent the Shannon-Fano encoding tree that can be used for
      decoding the Shannon-Fano encoded data. How to parse the
      variable length Shannon-Fano values from the data stream is beyond
      the scope of this document. (See the references listed at the end of
      this document for more information.) However, traditional decoding
      schemes used for Huffman variable length decoding, such as the
      Greenlaw algorithm, can be successfully applied.

   5.3.9 The compressed data stream begins immediately after the
   compressed Shannon-Fano data. The compressed data stream can be
   interpreted as follows:

      loop until done
          read 1 bit from input stream.

          if this bit is non-zero then       (encoded data is literal data)
              if Literal Shannon-Fano tree is present
                  read and decode character using Literal Shannon-Fano tree.
              otherwise
                  read 8 bits from input stream.
              copy character to the output stream.
          otherwise              (encoded data is sliding dictionary match)
              if 8K dictionary size
                  read 7 bits for offset Distance (lower 7 bits of offset).
              otherwise
                  read 6 bits for offset Distance (lower 6 bits of offset).

              using the Distance Shannon-Fano tree, read and decode the
                upper 6 bits of the Distance value.

              using the Length Shannon-Fano tree, read and decode
                the Length value.

              Length <- Length + Minimum Match Length

              if Length = 63 + Minimum Match Length
                  read 8 bits from the input stream,
                  add this value to Length.

              move backwards Distance+1 bytes in the output stream, and
              copy Length characters from this position to the output
              stream. (if this position is before the start of the output
              stream, then assume that all the data before the start of
              the output stream is filled with zeros).
      end loop

5.4 Tokenizing - Method 7
-------------------------

   5.4.1 This method is not used by PKZIP.

5.5 Deflating - Method 8
------------------------

   5.5.1 The Deflate algorithm is similar to the Implode algorithm using
   a sliding dictionary of up to 32K with secondary compression
   from Huffman/Shannon-Fano codes.

   5.5.2 The compressed data is stored in blocks with a header describing
   the block and the Huffman codes used in the data block. The header
   format is as follows:

      Bit 0: Last Block bit     This bit is set to 1 if this is the last
                                compressed block in the data.
      Bits 1-2: Block type
         00 (0) - Block is stored - All stored data is byte aligned.
                  Skip bits until next byte, then next word = block
                  length, followed by the one's complement of the block
                  length word. Remaining data in block is the stored
                  data.

         01 (1) - Use fixed Huffman codes for literal and distance codes.
                  Lit Code    Bits             Dist Code   Bits
                  ---------   ----             ---------   ----
                    0 - 143    8                 0 - 31      5
                  144 - 255    9
                  256 - 279    7
                  280 - 287    8

                  Literal codes 286-287 and distance codes 30-31 are
                  never used but participate in the Huffman construction.

         10 (2) - Dynamic Huffman codes. (See expanding Huffman codes)

         11 (3) - Reserved - Flag an "Error in compressed data" if seen.
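
   The fixed code lengths in the block type 01 table determine the
   actual bit patterns. The sketch below derives them, assuming the
   canonical assignment described in RFC 1951 (shorter codes first,
   and within one length in increasing symbol order). That rule and
   the function names are assumptions for illustration; they are not
   quoted from this document.

```python
def fixed_literal_lengths():
    """Bit lengths for literal/length codes 0-287, per the fixed-code table."""
    return [8] * 144 + [9] * 112 + [7] * 24 + [8] * 8


def canonical_codes(lengths):
    """Assign canonical Huffman codes (as MSB-first bit strings) from bit lengths.

    Symbols with length 0 do not participate and get None.
    """
    max_len = max(lengths)
    # Count how many symbols use each bit length.
    bl_count = [0] * (max_len + 1)
    for n in lengths:
        if n:
            bl_count[n] += 1
    # First code value for each bit length, shortest length first.
    next_code = [0] * (max_len + 1)
    code = 0
    for bits in range(1, max_len + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    # Hand out consecutive codes in symbol order within each length.
    codes = []
    for n in lengths:
        if n == 0:
            codes.append(None)
        else:
            codes.append(format(next_code[n], "0{}b".format(n)))
            next_code[n] += 1
    return codes


literal_codes = canonical_codes(fixed_literal_lengths())
distance_codes = canonical_codes([5] * 32)
```

   With these inputs, literal 0 maps to 00110000, literal 144 to
   110010000, code 256 (end of block) to 0000000, and code 280 to
   11000000. The 32 distance codes are simply the 5-bit values 0-31.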

   5.5.3 Expanding Huffman Codes

   If the data block is stored with dynamic Huffman codes, the Huffman
   codes are sent in the following compressed format:

      5 Bits: # of Literal codes sent - 257 (257 - 286)
              All other codes are never sent.
      5 Bits: # of Dist codes - 1           (1 - 32)
      4 Bits: # of Bit Length codes - 4     (4 - 19)

   The Huffman codes are sent as bit lengths and the codes are built as
   described in the implode algorithm. The bit lengths themselves are
   compressed with Huffman codes. There are 19 bit length codes:

      0 - 15: Represent bit lengths of 0 - 15
          16: Copy the previous bit length 3 - 6 times.
              The next 2 bits indicate repeat length (0 = 3, ... ,3 = 6)
                 Example:  Codes 8, 16 (+2 bits 11), 16 (+2 bits 10) will
                           expand to 12 bit lengths of 8 (1 + 6 + 5)
          17: Repeat a bit length of 0 for 3 - 10 times. (3 bits of length)
          18: Repeat a bit length of 0 for 11 - 138 times (7 bits of length)

   The lengths of the bit length codes are sent packed 3 bits per value
   (0 - 7) in the following order:

      16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15

   The Huffman codes SHOULD be built as described in the Implode algorithm
   except codes are assigned starting at the shortest bit length, i.e. the
   shortest code SHOULD be all 0's rather than all 1's. Also, codes with
   a bit length of zero do not participate in the tree construction. The
   codes are then used to decode the bit lengths for the literal and
   distance tables.

   The bit lengths for the literal tables are sent first with the number
   of entries sent described by the 5 bits sent earlier. There are up
   to 286 literal characters; the first 256 represent the respective 8
   bit character, code 256 represents the End-Of-Block code, the remaining
   29 codes represent copy lengths of 3 thru 258.
   There are up to 30
   distance codes representing distances from 1 thru 32k as described
   below.

      Length Codes
      ------------
            Extra             Extra              Extra              Extra
       Code Bits Length  Code Bits Lengths  Code Bits Lengths  Code Bits Length(s)
       ---- ---- ------  ---- ---- -------  ---- ---- -------  ---- ---- ---------
        257   0     3     265   1   11,12    273   3   35-42    281   5  131-162
        258   0     4     266   1   13,14    274   3   43-50    282   5  163-194
        259   0     5     267   1   15,16    275   3   51-58    283   5  195-226
        260   0     6     268   1   17,18    276   3   59-66    284   5  227-257
        261   0     7     269   2   19-22    277   4   67-82    285   0    258
        262   0     8     270   2   23-26    278   4   83-98
        263   0     9     271   2   27-30    279   4   99-114
        264   0    10     272   2   31-34    280   4  115-130

      Distance Codes
      --------------
            Extra           Extra             Extra               Extra
       Code Bits Dist  Code Bits  Dist   Code Bits Distance  Code Bits Distance
       ---- ---- ----  ---- ---- ------  ---- ---- --------  ---- ---- --------
         0   0    1      8   3   17-24    16    7  257-384    24   11  4097-6144
         1   0    2      9   3   25-32    17    7  385-512    25   11  6145-8192
         2   0    3     10   4   33-48    18    8  513-768    26   12  8193-12288
         3   0    4     11   4   49-64    19    8  769-1024   27   12  12289-16384
         4   1   5,6    12   5   65-96    20    9  1025-1536  28   13  16385-24576
         5   1   7,8    13   5   97-128   21    9  1537-2048  29   13  24577-32768
         6   2   9-12   14   6   129-192  22   10  2049-3072
         7   2  13-16   15   6   193-256  23   10  3073-4096

   5.5.4 The compressed data stream begins immediately after the
   compressed header data. The compressed data stream can be
   interpreted as follows:

      do
          read header from input stream.

          if stored block
              skip bits until byte aligned
              read count and 1's complement of count
              copy count bytes data block
          otherwise
              loop until end of block code sent
                  decode literal character from input stream
                  if literal < 256
                      copy character to the output stream
                  otherwise
                      if literal = end of block
                          break from loop
                      otherwise
                          decode distance from input stream

                          move backwards distance bytes in the output stream, and
                          copy length characters from this position to the output
                          stream.
              end loop
      while not last block

      if data descriptor exists
          skip bits until byte aligned
          read crc and sizes
      endif

5.6 Enhanced Deflating - Method 9
---------------------------------

   5.6.1 The Enhanced Deflating algorithm is similar to Deflate but uses
   a sliding dictionary of up to 64K. Deflate64(tm) is supported
   by the Deflate extractor.

5.7 BZIP2 - Method 12
---------------------

   5.7.1 BZIP2 is an open-source data compression algorithm developed by
   Julian Seward. Information and source code for this algorithm
   can be found on the internet.

5.8 LZMA - Method 14
---------------------

   5.8.1 LZMA is a block-oriented, general purpose data compression
   algorithm developed and maintained by Igor Pavlov. It is a derivative
   of LZ77 that utilizes Markov chains and a range coder. Information and
   source code for this algorithm can be found on the internet. Consult
   with the author of this algorithm for information on terms or
   restrictions on use.

   Support for LZMA within the ZIP format is defined as follows:

   5.8.2 The Compression method field within the ZIP Local and Central
   Header records will be set to the value 14 to indicate data was
   compressed using LZMA.

   5.8.3 The Version needed to extract field within the ZIP Local and
   Central Header records will be set to 6.3 to indicate the minimum
   ZIP format version supporting this feature.

   5.8.4 File data compressed using the LZMA algorithm MUST be placed
   immediately following the Local Header for the file. If a standard
   ZIP encryption header is required, it will follow the Local Header
   and will precede the LZMA compressed file data segment. The location
   of LZMA compressed data segment within the ZIP format will be as shown:

      [local header file 1]
      [encryption header file 1]
      [LZMA compressed data segment for file 1]
      [data descriptor 1]
      [local header file 2]

   5.8.5 The encryption header and data descriptor records MAY
   be conditionally present. The LZMA Compressed Data Segment
   will consist of an LZMA Properties Header followed by the
   LZMA Compressed Data as shown:

      [LZMA properties header for file 1]
      [LZMA compressed data for file 1]

   5.8.6 The LZMA Compressed Data will be stored as provided by the
   LZMA compression library. Compressed size, uncompressed size and
   other file characteristics about the file being compressed MUST be
   stored in standard ZIP storage format.

   5.8.7 The LZMA Properties Header will store specific data required
   to decompress the LZMA compressed Data. This data is set by the
   LZMA compression engine using the function WriteCoderProperties()
   as documented within the LZMA SDK.

   5.8.8 Storage fields for the property information within the LZMA
   Properties Header are as follows:

      LZMA Version Information 2 bytes
      LZMA Properties Size 2 bytes
      LZMA Properties Data variable, defined by "LZMA Properties Size"

      5.8.8.1 LZMA Version Information - this field identifies which version
      of the LZMA SDK was used to compress a file.
      The first byte will
      store the major version number of the LZMA SDK and the second
      byte will store the minor number.

      5.8.8.2 LZMA Properties Size - this field defines the size of the
      remaining property data. Typically this size SHOULD be determined by
      the version of the SDK. This size field is included as a convenience
      and to help avoid any ambiguity arising in the future due
      to changes in this compression algorithm.

      5.8.8.3 LZMA Property Data - this variable sized field records the
      required values for the decompressor as defined by the LZMA SDK.
      The data stored in this field SHOULD be obtained using the
      WriteCoderProperties() in the version of the SDK defined by
      the "LZMA Version Information" field.

      5.8.8.4 The layout of the "LZMA Properties Data" field is a function of
      the LZMA compression algorithm. It is possible that this layout MAY be
      changed by the author over time. The data layout in version 4.3 of the
      LZMA SDK defines a 5 byte array that uses 4 bytes to store the dictionary
      size in little-endian order. This is preceded by a single packed byte as
      the first element of the array that contains the following fields:

         PosStateBits
         LiteralPosStateBits
         LiteralContextBits

      Refer to the LZMA documentation for a more detailed explanation of
      these fields.

   5.8.9 Data compressed with method 14, LZMA, MAY include an end-of-stream
   (EOS) marker ending the compressed data stream. This marker is not
   required, but its use is highly recommended to facilitate processing
   and implementers SHOULD include the EOS marker whenever possible.
   When the EOS marker is used, general purpose bit 1 MUST be set. If
   general purpose bit 1 is not set, the EOS marker is not present.
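
   The LZMA Properties Header of 5.8.8 can be read with a short
   routine such as the sketch below. The function name and returned
   keys are hypothetical. The 2-byte size is assumed to be
   little-endian like other ZIP values, and the packed byte is
   assumed to follow the LZMA SDK convention
   (PosStateBits * 5 + LiteralPosStateBits) * 9 + LiteralContextBits,
   which this document does not spell out.

```python
import struct


def parse_lzma_properties_header(buf: bytes) -> dict:
    """Parse the header that precedes LZMA compressed data in a ZIP entry.

    Hypothetical helper. Decodes the 5-byte property layout only when
    the size field says 5; otherwise the raw bytes are returned as-is.
    """
    if len(buf) < 4:
        raise ValueError("truncated LZMA properties header")
    major, minor, props_size = struct.unpack_from("<BBH", buf, 0)
    props = buf[4:4 + props_size]
    if len(props) != props_size:
        raise ValueError("truncated LZMA properties data")
    info = {
        "sdk_version": (major, minor),
        "props": props,
        "header_len": 4 + props_size,  # compressed data starts here
    }
    if props_size == 5:
        packed = props[0]
        if packed >= 9 * 5 * 5:
            raise ValueError("invalid packed LZMA properties byte")
        info["lc"] = packed % 9          # LiteralContextBits
        info["lp"] = (packed // 9) % 5   # LiteralPosStateBits
        info["pb"] = packed // 45        # PosStateBits
        # Dictionary size: 4 bytes, little-endian, after the packed byte.
        info["dict_size"] = struct.unpack_from("<I", props, 1)[0]
    return info
```

   A caller would skip header_len bytes and pass the remainder, along
   with the decoded properties, to an LZMA decoder.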

5.9 WavPack - Method 97
-----------------------

   5.9.1 Information describing the use of compression method 97 is
   provided by WinZIP International, LLC. This method relies on the
   open source WavPack audio compression utility developed by David Bryant.
   Information on WavPack is available at www.wavpack.com. Please consult
   with the author of this algorithm for information on terms and
   restrictions on use.

   5.9.2 WavPack data for a file begins immediately after the end of the
   local header data. This data is the output from WavPack compression
   routines. Within the ZIP file, the use of WavPack compression is
   indicated by setting the compression method field to a value of 97
   in both the local header and the central directory header. The Version
   needed to extract and version made by fields use the same values as are
   used for data compressed using the Deflate algorithm.

   5.9.3 An implementation note for storing digital sample data when using
   WavPack compression within ZIP files is that all of the bytes of
   the sample data SHOULD be compressed. This includes any unused
   bits up to the byte boundary. An example is a 2 byte sample that
   uses only 12 bits for the sample data with 4 unused bits. If only
   12 bits are passed as the sample size to the WavPack routines, the 4
   unused bits will be set to 0 on extraction regardless of their original
   state. To avoid this, the full 16 bits of the sample data size
   SHOULD be provided.

5.10 PPMd - Method 98
---------------------

   5.10.1 PPMd is a data compression algorithm developed by Dmitry Shkarin
   which includes a carryless rangecoder developed by Dmitry Subbotin.
   This algorithm is based on prediction by partial matching on multiple
   order contexts. Information and source code for this algorithm
   can be found on the internet.
Consult with the author of this 2471 algorithm for information on terms or restrictions on use. 2472 2473 5.10.2 Support for PPMd within the ZIP format currently is provided only 2474 for version I, revision 1 of the algorithm. Storage requirements 2475 for using this algorithm are as follows: 2476 2477 5.10.3 Parameters needed to control the algorithm are stored in the two 2478 bytes immediately preceding the compressed data. These bytes are 2479 used to store the following fields: 2480 2481 Model order - sets the maximum model order, default is 8, possible 2482 values are from 2 to 16 inclusive 2483 2484 Sub-allocator size - sets the size of sub-allocator in MB, default is 50, 2485 possible values are from 1MB to 256MB inclusive 2486 2487 Model restoration method - sets the method used to restart context 2488 model at memory insufficiency, values are: 2489 2490 0 - restarts model from scratch - default 2491 1 - cut off model - decreases performance by as much as 2x 2492 2 - freeze context tree - not recommended 2493 2494 5.10.4 An example for packing these fields into the 2 byte storage field is 2495 illustrated below. These values are stored in Intel low-byte/high-byte 2496 order. 
2497 2498 wPPMd = (Model order - 1) + 2499 ((Sub-allocator size - 1) << 4) + 2500 (Model restoration method << 12) 2501 2502 2503 5.11 AE-x Encryption marker - Method 99 2504 ------------------------------------------- 2505 2506 5.12 JPEG variant - Method 96 2507 ------------------------------------------- 2508 2509 5.13 PKWARE Data Compression Library Imploding - Method 10 2510 ----------------------------------------------------------- 2511 2512 5.14 Reserved - Method 11 2513 ------------------------------------------- 2514 2515 5.15 Reserved - Method 13 2516 ------------------------------------------- 2517 2518 5.16 Reserved - Method 15 2519 ------------------------------------------- 2520 2521 5.17 IBM z/OS CMPSC Compression - Method 16 2522 ------------------------------------------- 2523 2524 Method 16 utilizes the IBM hardware compression facility available 2525 on most IBM mainframes. Hardware compression can significantly 2526 increase the speed of data compression. This method uses a variant 2527 of the LZ78 algorithm. CMPSC hardware compression is performed 2528 using the COMPRESSION CALL instruction. 2529 2530 ZIP archives can be created using this method only on mainframes 2531 supporting the CP instruction. Extraction MAY occur on any 2532 platform supporting this compression algorithm. Use of this 2533 algorithm requires creation of a compression dictionary and 2534 an expansion dictionary. The expansion dictionary MUST be 2535 placed into the ZIP archive for use on the system where 2536 extraction will occur. 2537 2538 Additional information on this compression algorithm and dictionaries 2539 can be found in the IBM provided document titled IBM ESA/390 Data 2540 Compression (SA22-7208-01). Storage requirements for using CMPSC 2541 compression are as follows. 
2542 2543 The format for the compressed data stream placed into the ZIP 2544 archive following the Local Header is: 2545 2546 [dictionary header] 2547 [expansion dictionary] 2548 [CMPSC compressed data] 2549 2550 If encryption is used to encrypt a file compressed with CMPSC, these 2551 sections MUST be encrypted as a single entity. 2552 2553 The format of the dictionary header is: 2554 2555 Value Size Description 2556 ----- ---- ----------- 2557 Version 1 byte 1 2558 Flags/Symsize 1 byte Processing flags and 2559 symbol size 2560 DictionaryLen 4 bytes Length of the 2561 expansion dictionary 2562 2563 Explanation of processing flags and symbol size: 2564 2565 The high 4 bits are used to store the processing flags. The low 2566 4 bits represent the size of a symbol, in bits (values range 2567 from 9-13). Flag values are defined below. 2568 2569 0x80 - expansion dictionary 2570 0x40 - expansion dictionary is compressed using Deflate 2571 0x20 - Reserved 2572 0x10 - Reserved 2573 2574 2575 5.18 Reserved - Method 17 2576 ------------------------------------------- 2577 2578 5.19 IBM TERSE - Method 18 2579 ------------------------------------------- 2580 2581 5.20 IBM LZ77 z Architecture - Method 19 2582 ----------------------------------------- 2583 2584 6.0 Traditional PKWARE Encryption 2585 ---------------------------------- 2586 2587 6.0.1 The following information discusses the decryption steps 2588 required to support traditional PKWARE encryption. This 2589 form of encryption is considered weak by today's standards 2590 and its use is recommended only for situations with 2591 low security needs or for compatibility with older .ZIP 2592 applications. 2593 2594 6.1 Traditional PKWARE Decryption 2595 --------------------------------- 2596 2597 6.1.1 PKWARE is grateful to Mr. Roger Schlafly for his expert 2598 contribution towards the development of PKWARE's traditional 2599 encryption. 2600 2601 6.1.2 PKZIP encrypts the compressed data stream. 
Encrypted files MUST be decrypted before they can be extracted to
their original form.

   6.1.3 Each encrypted file has an extra 12 bytes stored at the start
   of the data area defining the encryption header for that file. The
   encryption header is originally set to random values, and then
   itself encrypted, using three 32-bit keys. The key values are
   initialized using the supplied encryption password. After each byte
   is encrypted, the keys are then updated using pseudo-random number
   generation techniques in combination with the same CRC-32 algorithm
   used in PKZIP and described elsewhere in this document.

   6.1.4 The following are the basic steps required to decrypt a file:

      1) Initialize the three 32-bit keys with the password.
      2) Read and decrypt the 12-byte encryption header, further
         initializing the encryption keys.
      3) Read and decrypt the compressed data stream using the
         encryption keys.

   6.1.5 Initializing the encryption keys

      Key(0) <- 305419896
      Key(1) <- 591751049
      Key(2) <- 878082192

      loop for i <- 0 to length(password)-1
          update_keys(password(i))
      end loop

   Where update_keys() is defined as:

      update_keys(char):
        Key(0) <- crc32(Key(0),char)
        Key(1) <- Key(1) + (Key(0) & 000000ffH)
        Key(1) <- Key(1) * 134775813 + 1
        Key(2) <- crc32(Key(2),Key(1) >> 24)
      end update_keys

   Where crc32(old_crc,char) is a routine that given a CRC value and a
   character, returns an updated CRC value after applying the CRC-32
   algorithm described elsewhere in this document.

   6.1.6 Decrypting the encryption header

   The purpose of this step is to further initialize the encryption
   keys, based on random data, to render a plaintext attack on the
   data ineffective.

   Read the 12-byte encryption header into Buffer, in locations
   Buffer(0) through Buffer(11).

      loop for i <- 0 to 11
          C <- Buffer(i) ^ decrypt_byte()
          update_keys(C)
          Buffer(i) <- C
      end loop

   Where decrypt_byte() is defined as:

      unsigned char decrypt_byte()
          local unsigned short temp
          temp <- Key(2) | 2
          decrypt_byte <- (temp * (temp ^ 1)) >> 8
      end decrypt_byte

   After the header is decrypted, the last 1 or 2 bytes in Buffer
   SHOULD be the high-order word/byte of the CRC for the file being
   decrypted, stored in Intel low-byte/high-byte order. Versions of
   PKZIP prior to 2.0 used a 2 byte CRC check; a 1 byte CRC check is
   used on versions after 2.0. This can be used to test if the password
   supplied is correct or not.

   6.1.7 Decrypting the compressed data stream

   The compressed data stream can be decrypted as follows:

      loop until done
          read a character into C
          Temp <- C ^ decrypt_byte()
          update_keys(Temp)
          output Temp
      end loop


7.0 Strong Encryption Specification
-----------------------------------

   7.0.1 Portions of the Strong Encryption technology defined in this
   specification are covered under patents and pending patent applications.
   Refer to the section in this document entitled "Incorporating
   PKWARE Proprietary Technology into Your Product" for more information.

7.1 Strong Encryption Overview
------------------------------

   7.1.1 Version 5.x of this specification introduced support for strong
   encryption algorithms. These algorithms can be used with either
   a password or an X.509v3 digital certificate to encrypt each file.
   This format specification supports either password or certificate
   based encryption to meet the security needs of today, to enable
   interoperability between users within both PKI and non-PKI
   environments, and to ensure interoperability between different
   computing platforms that are running a ZIP program.
2706 2707 7.1.2 Password based encryption is the most common form of encryption 2708 people are familiar with. However, inherent weaknesses with 2709 passwords (e.g. susceptibility to dictionary/brute force attack) 2710 as well as password management and support issues make certificate 2711 based encryption a more secure and scalable option. Industry 2712 efforts and support are defining and moving towards more advanced 2713 security solutions built around X.509v3 digital certificates and 2714 Public Key Infrastructures(PKI) because of the greater scalability, 2715 administrative options, and more robust security over traditional 2716 password based encryption. 2717 2718 7.1.3 Most standard encryption algorithms are supported with this 2719 specification. Reference implementations for many of these 2720 algorithms are available from either commercial or open source 2721 distributors. Readily available cryptographic toolkits make 2722 implementation of the encryption features straight-forward. 2723 This document is not intended to provide a treatise on data 2724 encryption principles or theory. Its purpose is to document the 2725 data structures required for implementing interoperable data 2726 encryption within the .ZIP format. It is strongly recommended that 2727 you have a good understanding of data encryption before reading 2728 further. 2729 2730 7.1.4 The algorithms introduced in Version 5.0 of this specification 2731 include: 2732 2733 RC2 40 bit, 64 bit, and 128 bit 2734 RC4 40 bit, 64 bit, and 128 bit 2735 DES 2736 3DES 112 bit and 168 bit 2737 2738 Version 5.1 adds support for the following: 2739 2740 AES 128 bit, 192 bit, and 256 bit 2741 2742 2743 7.1.5 Version 6.1 introduces encryption data changes to support 2744 interoperability with Smartcard and USB Token certificate storage 2745 methods which do not support the OAEP strengthening standard. 
2746 2747 7.1.6 Version 6.2 introduces support for encrypting metadata by compressing 2748 and encrypting the central directory data structure to reduce information 2749 leakage. Information leakage can occur in legacy ZIP applications 2750 through exposure of information about a file even though that file is 2751 stored encrypted. The information exposed consists of file 2752 characteristics stored within the records and fields defined by this 2753 specification. This includes data such as a file's name, its original 2754 size, timestamp and CRC32 value. 2755 2756 7.1.7 Version 6.3 introduces support for encrypting data using the Blowfish 2757 and Twofish algorithms. These are symmetric block ciphers developed 2758 by Bruce Schneier. Blowfish supports using a variable length key from 2759 32 to 448 bits. Block size is 64 bits. Implementations SHOULD use 16 2760 rounds and the only mode supported within ZIP files is CBC. Twofish 2761 supports key sizes 128, 192 and 256 bits. Block size is 128 bits. 2762 Implementations SHOULD use 16 rounds and the only mode supported within 2763 ZIP files is CBC. Information and source code for both Blowfish and 2764 Twofish algorithms can be found on the internet. Consult with the author 2765 of these algorithms for information on terms or restrictions on use. 2766 2767 7.1.8 Central Directory Encryption provides greater protection against 2768 information leakage by encrypting the Central Directory structure and 2769 by masking key values that are replicated in the unencrypted Local 2770 Header. ZIP compatible programs that cannot interpret an encrypted 2771 Central Directory structure cannot rely on the data in the corresponding 2772 Local Header for decompression information. 2773 2774 7.1.9 Extra Field records that MAY contain information about a file that SHOULD 2775 not be exposed SHOULD NOT be stored in the Local Header and SHOULD only 2776 be written to the Central Directory where they can be encrypted. 
This 2777 design currently does not support streaming. Information in the End of 2778 Central Directory record, the Zip64 End of Central Directory Locator, 2779 and the Zip64 End of Central Directory records are not encrypted. Access 2780 to view data on files within a ZIP file with an encrypted Central Directory 2781 requires the appropriate password or private key for decryption prior to 2782 viewing any files, or any information about the files, in the archive. 2783 2784 7.1.10 Older ZIP compatible programs not familiar with the Central Directory 2785 Encryption feature will no longer be able to recognize the Central 2786 Directory and MAY assume the ZIP file is corrupt. Programs that 2787 attempt streaming access using Local Headers will see invalid 2788 information for each file. Central Directory Encryption need not be 2789 used for every ZIP file. Its use is recommended for greater security. 2790 ZIP files not using Central Directory Encryption SHOULD operate as 2791 in the past. 2792 2793 7.1.11 This strong encryption feature specification is intended to provide for 2794 scalable, cross-platform encryption needs ranging from simple password 2795 encryption to authenticated public/private key encryption. 2796 2797 7.1.12 Encryption provides data confidentiality and privacy. It is 2798 recommended that you combine X.509 digital signing with encryption 2799 to add authentication and non-repudiation. 2800 2801 2802 7.2 Single Password Symmetric Encryption Method 2803 ----------------------------------------------- 2804 2805 7.2.1 The Single Password Symmetric Encryption Method using strong 2806 encryption algorithms operates similarly to the traditional 2807 PKWARE encryption defined in this format. Additional data 2808 structures are added to support the processing needs of the 2809 strong algorithms. 
2810 2811 The Strong Encryption data structures are: 2812 2813 7.2.2 General Purpose Bits - Bits 0 and 6 of the General Purpose bit 2814 flag in both local and central header records. Both bits set 2815 indicates strong encryption. Bit 13, when set indicates the Central 2816 Directory is encrypted and that selected fields in the Local Header 2817 are masked to hide their actual value. 2818 2819 2820 7.2.3 Extra Field 0x0017 in central header only. 2821 2822 Fields to consider in this record are: 2823 2824 7.2.3.1 Format - the data format identifier for this record. The only 2825 value allowed at this time is the integer value 2. 2826 2827 7.2.3.2 AlgId - integer identifier of the encryption algorithm from the 2828 following range 2829 2830 0x6601 - DES 2831 0x6602 - RC2 (version needed to extract < 5.2) 2832 0x6603 - 3DES 168 2833 0x6609 - 3DES 112 2834 0x660E - AES 128 2835 0x660F - AES 192 2836 0x6610 - AES 256 2837 0x6702 - RC2 (version needed to extract >= 5.2) 2838 0x6720 - Blowfish 2839 0x6721 - Twofish 2840 0x6801 - RC4 2841 0xFFFF - Unknown algorithm 2842 2843 7.2.3.3 Bitlen - Explicit bit length of key 2844 2845 32 - 448 bits 2846 2847 7.2.3.4 Flags - Processing flags needed for decryption 2848 2849 0x0001 - Password is required to decrypt 2850 0x0002 - Certificates only 2851 0x0003 - Password or certificate required to decrypt 2852 2853 Values > 0x0003 reserved for certificate processing 2854 2855 2856 7.2.4 Decryption header record preceding compressed file data. 

      -Decryption Header:

      Value     Size     Description
      -----     ----     -----------
      IVSize    2 bytes  Size of initialization vector (IV)
      IVData    IVSize   Initialization vector for this file
      Size      4 bytes  Size of remaining decryption header data
      Format    2 bytes  Format definition for this record
      AlgID     2 bytes  Encryption algorithm identifier
      Bitlen    2 bytes  Bit length of encryption key
      Flags     2 bytes  Processing flags
      ErdSize   2 bytes  Size of Encrypted Random Data
      ErdData   ErdSize  Encrypted Random Data
      Reserved1 4 bytes  Reserved certificate processing data
      Reserved2 (var)    Reserved for certificate processing data
      VSize     2 bytes  Size of password validation data
      VData     VSize-4  Password validation data
      VCRC32    4 bytes  Standard ZIP CRC32 of password validation data

   7.2.4.1 IVData - The size of the IV SHOULD match the algorithm block size.
                    The IVData can be completely random data. If the size of
                    the randomly generated data does not match the block size
                    it SHOULD be padded with zeros or truncated as
                    necessary. If IVSize is 0, then IV = CRC32 + Uncompressed
                    File Size (as a 64 bit little-endian, unsigned integer value).

   7.2.4.2 Format - the data format identifier for this record. The only
                    value allowed at this time is the integer value 3.
2886 2887 7.2.4.3 AlgId - integer identifier of the encryption algorithm from the 2888 following range 2889 2890 0x6601 - DES 2891 0x6602 - RC2 (version needed to extract < 5.2) 2892 0x6603 - 3DES 168 2893 0x6609 - 3DES 112 2894 0x660E - AES 128 2895 0x660F - AES 192 2896 0x6610 - AES 256 2897 0x6702 - RC2 (version needed to extract >= 5.2) 2898 0x6720 - Blowfish 2899 0x6721 - Twofish 2900 0x6801 - RC4 2901 0xFFFF - Unknown algorithm 2902 2903 7.2.4.4 Bitlen - Explicit bit length of key 2904 2905 32 - 448 bits 2906 2907 7.2.4.5 Flags - Processing flags needed for decryption 2908 2909 0x0001 - Password is required to decrypt 2910 0x0002 - Certificates only 2911 0x0003 - Password or certificate required to decrypt 2912 2913 Values > 0x0003 reserved for certificate processing 2914 2915 7.2.4.6 ErdData - Encrypted random data is used to store random data that 2916 is used to generate a file session key for encrypting 2917 each file. SHA1 is used to calculate hash data used to 2918 derive keys. File session keys are derived from a master 2919 session key generated from the user-supplied password. 2920 If the Flags field in the decryption header contains 2921 the value 0x4000, then the ErdData field MUST be 2922 decrypted using 3DES. If the value 0x4000 is not set, 2923 then the ErdData field MUST be decrypted using AlgId. 2924 2925 2926 7.2.4.7 Reserved1 - Reserved for certificate processing, if value is 2927 zero, then Reserved2 data is absent. See the explanation 2928 under the Certificate Processing Method for details on 2929 this data structure. 2930 2931 7.2.4.8 Reserved2 - If present, the size of the Reserved2 data structure 2932 is located by skipping the first 4 bytes of this field 2933 and using the next 2 bytes as the remaining size. See 2934 the explanation under the Certificate Processing Method 2935 for details on this data structure. 
2936 2937 7.2.4.9 VSize - This size value will always include the 4 bytes of the 2938 VCRC32 data and will be greater than 4 bytes. 2939 2940 7.2.4.10 VData - Random data for password validation. This data is VSize 2941 in length and VSize MUST be a multiple of the encryption 2942 block size. VCRC32 is a checksum value of VData. 2943 VData and VCRC32 are stored encrypted and start the 2944 stream of encrypted data for a file. 2945 2946 2947 7.2.5 Useful Tips 2948 2949 7.2.5.1 Strong Encryption is always applied to a file after compression. The 2950 block oriented algorithms all operate in Cypher Block Chaining (CBC) 2951 mode. The block size used for AES encryption is 16. All other block 2952 algorithms use a block size of 8. Two IDs are defined for RC2 to 2953 account for a discrepancy found in the implementation of the RC2 2954 algorithm in the cryptographic library on Windows XP SP1 and all 2955 earlier versions of Windows. It is recommended that zero length files 2956 not be encrypted, however programs SHOULD be prepared to extract them 2957 if they are found within a ZIP file. 2958 2959 7.2.5.2 A pseudo-code representation of the encryption process is as follows: 2960 2961 Password = GetUserPassword() 2962 MasterSessionKey = DeriveKey(SHA1(Password)) 2963 RD = CryptographicStrengthRandomData() 2964 For Each File 2965 IV = CryptographicStrengthRandomData() 2966 VData = CryptographicStrengthRandomData() 2967 VCRC32 = CRC32(VData) 2968 FileSessionKey = DeriveKey(SHA1(IV + RD) 2969 ErdData = Encrypt(RD,MasterSessionKey,IV) 2970 Encrypt(VData + VCRC32 + FileData, FileSessionKey,IV) 2971 Done 2972 2973 7.2.5.3 The function names and parameter requirements will depend on 2974 the choice of the cryptographic toolkit selected. Almost any 2975 toolkit supporting the reference implementations for each 2976 algorithm can be used. The RSA BSAFE(r), OpenSSL, and Microsoft 2977 CryptoAPI libraries are all known to work well. 
2978 2979 2980 7.3 Single Password - Central Directory Encryption 2981 -------------------------------------------------- 2982 2983 7.3.1 Central Directory Encryption is achieved within the .ZIP format by 2984 encrypting the Central Directory structure. This encapsulates the metadata 2985 most often used for processing .ZIP files. Additional metadata is stored for 2986 redundancy in the Local Header for each file. The process of concealing 2987 metadata by encrypting the Central Directory does not protect the data within 2988 the Local Header. To avoid information leakage from the exposed metadata 2989 in the Local Header, the fields containing information about a file are masked. 2990 2991 7.3.2 Local Header 2992 2993 Masking replaces the true content of the fields for a file in the Local 2994 Header with false information. When masked, the Local Header is not 2995 suitable for streaming access and the options for data recovery of damaged 2996 archives is reduced. Extra Data fields that MAY contain confidential 2997 data SHOULD NOT be stored within the Local Header. The value set into 2998 the Version needed to extract field SHOULD be the correct value needed to 2999 extract the file without regard to Central Directory Encryption. The fields 3000 within the Local Header targeted for masking when the Central Directory is 3001 encrypted are: 3002 3003 Field Name Mask Value 3004 ------------------ --------------------------- 3005 compression method 0 3006 last mod file time 0 3007 last mod file date 0 3008 crc-32 0 3009 compressed size 0 3010 uncompressed size 0 3011 file name (variable size) Base 16 value from the 3012 range 1 - 0xFFFFFFFFFFFFFFFF 3013 represented as a string whose 3014 size will be set into the 3015 file name length field 3016 3017 The Base 16 value assigned as a masked file name is simply a sequentially 3018 incremented value for each file starting with 1 for the first file. 
3019 Modifications to a ZIP file MAY cause different values to be stored for 3020 each file. For compatibility, the file name field in the Local Header 3021 SHOULD NOT be left blank. As of Version 6.2 of this specification, 3022 the Compression Method and Compressed Size fields are not yet masked. 3023 Fields having a value of 0xFFFF or 0xFFFFFFFF for the ZIP64 format 3024 SHOULD NOT be masked. 3025 3026 7.3.3 Encrypting the Central Directory 3027 3028 Encryption of the Central Directory does not include encryption of the 3029 Central Directory Signature data, the Zip64 End of Central Directory 3030 record, the Zip64 End of Central Directory Locator, or the End 3031 of Central Directory record. The ZIP file comment data is never 3032 encrypted. 3033 3034 Before encrypting the Central Directory, it MAY optionally be compressed. 3035 Compression is not required, but for storage efficiency it is assumed 3036 this structure will be compressed before encrypting. Similarly, this 3037 specification supports compressing the Central Directory without 3038 requiring that it also be encrypted. Early implementations of this 3039 feature will assume the encryption method applied to files matches the 3040 encryption applied to the Central Directory. 3041 3042 Encryption of the Central Directory is done in a manner similar to 3043 that of file encryption. The encrypted data is preceded by a 3044 decryption header. The decryption header is known as the Archive 3045 Decryption Header. The fields of this record are identical to 3046 the decryption header preceding each encrypted file. The location 3047 of the Archive Decryption Header is determined by the value in the 3048 Start of the Central Directory field in the Zip64 End of Central 3049 Directory record. When the Central Directory is encrypted, the 3050 Zip64 End of Central Directory record will always be present. 
3051 3052 The layout of the Zip64 End of Central Directory record for all 3053 versions starting with 6.2 of this specification will follow the 3054 Version 2 format. The Version 2 format is as follows: 3055 3056 The leading fixed size fields within the Version 1 format for this 3057 record remain unchanged. The record signature for both Version 1 3058 and Version 2 will be 0x06064b50. Immediately following the last 3059 byte of the field known as the Offset of Start of Central 3060 Directory With Respect to the Starting Disk Number will begin the 3061 new fields defining Version 2 of this record. 3062 3063 7.3.4 New fields for Version 2 3064 3065 Note: all fields stored in Intel low-byte/high-byte order. 3066 3067 Value Size Description 3068 ----- ---- ----------- 3069 Compression Method 2 bytes Method used to compress the 3070 Central Directory 3071 Compressed Size 8 bytes Size of the compressed data 3072 Original Size 8 bytes Original uncompressed size 3073 AlgId 2 bytes Encryption algorithm ID 3074 BitLen 2 bytes Encryption key length 3075 Flags 2 bytes Encryption flags 3076 HashID 2 bytes Hash algorithm identifier 3077 Hash Length 2 bytes Length of hash data 3078 Hash Data (variable) Hash data 3079 3080 The Compression Method accepts the same range of values as the 3081 corresponding field in the Central Header. 3082 3083 The Compressed Size and Original Size values will not include the 3084 data of the Central Directory Signature which is compressed or 3085 encrypted. 3086 3087 The AlgId, BitLen, and Flags fields accept the same range of values 3088 the corresponding fields within the 0x0017 record. 3089 3090 Hash ID identifies the algorithm used to hash the Central Directory 3091 data. This data does not have to be hashed, in which case the 3092 values for both the HashID and Hash Length will be 0. 
Possible 3093 values for HashID are: 3094 3095 Value Algorithm 3096 ------ --------- 3097 0x0000 none 3098 0x0001 CRC32 3099 0x8003 MD5 3100 0x8004 SHA1 3101 0x8007 RIPEMD160 3102 0x800C SHA256 3103 0x800D SHA384 3104 0x800E SHA512 3105 3106 7.3.5 When the Central Directory data is signed, the same hash algorithm 3107 used to hash the Central Directory for signing SHOULD be used. 3108 This is recommended for processing efficiency, however, it is 3109 permissible for any of the above algorithms to be used independent 3110 of the signing process. 3111 3112 The Hash Data will contain the hash data for the Central Directory. 3113 The length of this data will vary depending on the algorithm used. 3114 3115 The Version Needed to Extract SHOULD be set to 62. 3116 3117 The value for the Total Number of Entries on the Current Disk will 3118 be 0. These records will no longer support random access when 3119 encrypting the Central Directory. 3120 3121 7.3.6 When the Central Directory is compressed and/or encrypted, the 3122 End of Central Directory record will store the value 0xFFFFFFFF 3123 as the value for the Total Number of Entries in the Central 3124 Directory. The value stored in the Total Number of Entries in 3125 the Central Directory on this Disk field will be 0. The actual 3126 values will be stored in the equivalent fields of the Zip64 3127 End of Central Directory record. 3128 3129 7.3.7 Decrypting and decompressing the Central Directory is accomplished 3130 in the same manner as decrypting and decompressing a file. 
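
   The Version 2 fields listed in 7.3.4 can be read as in the following
   minimal Python sketch. It is an illustration only. The function name
   parse_zip64_eocd_v2_fields is hypothetical, and the buffer passed in
   is assumed to begin at the first new field, that is, immediately
   after the Offset of Start of Central Directory field of the
   Version 1 layout.

```python
import struct

# Hash algorithm identifiers as listed in 7.3.4.
HASH_NAMES = {
    0x0000: "none", 0x0001: "CRC32", 0x8003: "MD5", 0x8004: "SHA1",
    0x8007: "RIPEMD160", 0x800C: "SHA256", 0x800D: "SHA384",
    0x800E: "SHA512",
}

# Method(2) CompSize(8) OrigSize(8) AlgId(2) BitLen(2) Flags(2)
# HashID(2) HashLength(2), all little-endian: 28 bytes in total.
_V2_FIXED = struct.Struct("<HQQHHHHH")


def parse_zip64_eocd_v2_fields(buf: bytes) -> dict:
    """Parse the Version 2 additions to the Zip64 EOCD record."""
    if len(buf) < _V2_FIXED.size:
        raise ValueError("truncated Version 2 fields")
    (method, comp_size, orig_size, alg_id, bitlen, flags,
     hash_id, hash_len) = _V2_FIXED.unpack_from(buf, 0)
    hash_data = buf[_V2_FIXED.size:_V2_FIXED.size + hash_len]
    if len(hash_data) != hash_len:
        raise ValueError("truncated hash data")
    return {
        "compression_method": method,
        "compressed_size": comp_size,
        "original_size": orig_size,
        "alg_id": alg_id,
        "bitlen": bitlen,
        "flags": flags,
        "hash_id": hash_id,
        "hash_name": HASH_NAMES.get(hash_id, "unlisted"),
        "hash_data": hash_data,
    }
```

   When the Central Directory is not hashed, both HashID and Hash Length
   are 0 and hash_data comes back empty.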

7.4 Certificate Processing Method
---------------------------------

   The Certificate Processing Method for ZIP file encryption
   defines the following additional data fields:

   7.4.1 Certificate Flag Values

   Additional processing flags that can be present in the Flags field of both
   the 0x0017 field of the central directory Extra Field and the Decryption
   header record preceding compressed file data are:

      0x0007 - reserved for future use
      0x000F - reserved for future use
      0x0100 - Indicates non-OAEP key wrapping was used. If this
               flag is set, the version needed to extract MUST
               be at least 61. This means OAEP key wrapping is not
               used when generating a Master Session Key using
               ErdData.
      0x4000 - ErdData MUST be decrypted using 3DES-168, otherwise use the
               same algorithm used for encrypting the file contents.
      0x8000 - reserved for future use


   7.4.2 CertData - Extra Field 0x0017 record certificate data structure

   The data structure used to store certificate data within the section
   of the Extra Field defined by the CertData field of the 0x0017
   record is as shown:

      Value     Size     Description
      -----     ----     -----------
      RCount    4 bytes  Number of recipients.
      HashAlg   2 bytes  Hash algorithm identifier
      HSize     2 bytes  Hash size
      SRList    (var)    Simple list of recipients' hashed public keys


      RCount    This defines the number of intended recipients whose
                public keys were used for encryption. This identifies
                the number of elements in the SRList.

      HashAlg   This defines the hash algorithm used to calculate
                the public key hash of each public key used
                for encryption. This field currently supports
                only the following value for SHA-1

                0x8004 - SHA1

      HSize     This defines the size of a hashed public key.
3182 3183 SRList This is a variable length list of the hashed 3184 public keys for each intended recipient. Each 3185 element in this list is HSize. The total size of 3186 SRList is determined using RCount * HSize. 3187 3188 3189 7.4.3 Reserved1 - Certificate Decryption Header Reserved1 Data 3190 3191 Value Size Description 3192 ----- ---- ----------- 3193 RCount 4 bytes Number of recipients. 3194 3195 RCount This defines the number intended recipients whose 3196 public keys were used for encryption. This defines 3197 the number of elements in the REList field defined below. 3198 3199 3200 7.4.4 Reserved2 - Certificate Decryption Header Reserved2 Data Structures 3201 3202 3203 Value Size Description 3204 ----- ---- ----------- 3205 HashAlg 2 bytes Hash algorithm identifier 3206 HSize 2 bytes Hash size 3207 REList (var) List of recipient data elements 3208 3209 3210 HashAlg This defines the hash algorithm used to calculate 3211 the public key hash of each public key used 3212 for encryption. This field currently supports 3213 only the following value for SHA-1 3214 3215 0x8004 - SHA1 3216 3217 HSize This defines the size of a hashed public key 3218 defined in REHData. 3219 3220 REList This is a variable length of list of recipient data. 3221 Each element in this list consists of a Recipient 3222 Element data structure as follows: 3223 3224 3225 Recipient Element (REList) Data Structure: 3226 3227 Value Size Description 3228 ----- ---- ----------- 3229 RESize 2 bytes Size of REHData + REKData 3230 REHData HSize Hash of recipients public key 3231 REKData (var) Simple key blob 3232 3233 3234 RESize This defines the size of an individual REList 3235 element. This value is the combined size of the 3236 REHData field + REKData field. REHData is defined by 3237 HSize. REKData is variable and can be calculated 3238 for each REList element using RESize and HSize. 3239 3240 REHData Hashed public key for this recipient. 3241 3242 REKData Simple Key Blob. 
The format of this data structure 3243 is identical to that defined in the Microsoft 3244 CryptoAPI and generated using the CryptExportKey() 3245 function. The version of the Simple Key Blob 3246 supported at this time is 0x02 as defined by 3247 Microsoft. 3248 3249 7.5 Certificate Processing - Central Directory Encryption 3250 --------------------------------------------------------- 3251 3252 7.5.1 Central Directory Encryption using Digital Certificates will 3253 operate in a manner similar to that of Single Password Central 3254 Directory Encryption. This record will only be present when there 3255 is data to place into it. Currently, data is placed into this 3256 record when digital certificates are used for either encrypting 3257 or signing the files within a ZIP file. When only password 3258 encryption is used with no certificate encryption or digital 3259 signing, this record is not currently needed. When present, this 3260 record will appear before the start of the actual Central Directory 3261 data structure and will be located immediately after the Archive 3262 Decryption Header if the Central Directory is encrypted. 3263 3264 7.5.2 The Archive Extra Data record will be used to store the following 3265 information. Additional data MAY be added in future versions. 3266 3267 Extra Data Fields: 3268 3269 0x0014 - PKCS#7 Store for X.509 Certificates 3270 0x0016 - X.509 Certificate ID and Signature for central directory 3271 0x0019 - PKCS#7 Encryption Recipient Certificate List 3272 3273 The 0x0014 and 0x0016 Extra Data records that otherwise would be 3274 located in the first record of the Central Directory for digital 3275 certificate processing. When encrypting or compressing the Central 3276 Directory, the 0x0014 and 0x0016 records MUST be located in the 3277 Archive Extra Data record and they SHOULD NOT remain in the first 3278 Central Directory record. The Archive Extra Data record will also 3279 be used to store the 0x0019 data. 
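
   The CertData layout in 7.4.2 and the Reserved2 layout in 7.4.4 can be
   read with a few fixed-size unpack operations. The following is a
   minimal sketch only, not a reference implementation. It assumes the
   fields are stored little-endian like other ZIP fields; the function
   names parse_cert_data and parse_reserved2 are hypothetical and do not
   appear in this specification.

```python
import struct

SHA1_ALG_ID = 0x8004  # the only HashAlg value listed in 7.4.2 and 7.4.4


def parse_cert_data(buf):
    """Parse a CertData block (7.4.2); returns (hash_alg, list of key hashes)."""
    # Assumption: RCount, HashAlg and HSize are little-endian.
    rcount, hash_alg, hsize = struct.unpack_from("<IHH", buf, 0)
    offset = 8
    # SRList occupies exactly RCount * HSize bytes.
    if len(buf) < offset + rcount * hsize:
        raise ValueError("CertData is shorter than RCount * HSize")
    hashes = [
        buf[offset + i * hsize: offset + (i + 1) * hsize]
        for i in range(rcount)
    ]
    return hash_alg, hashes


def parse_reserved2(buf, rcount):
    """Parse Reserved2 data (7.4.4); rcount is the RCount from Reserved1 (7.4.3)."""
    hash_alg, hsize = struct.unpack_from("<HH", buf, 0)
    offset = 4
    recipients = []
    for _ in range(rcount):
        # RESize is the combined size of REHData and REKData.
        (resize,) = struct.unpack_from("<H", buf, offset)
        offset += 2
        if resize < hsize or offset + resize > len(buf):
            raise ValueError("malformed Recipient Element")
        key_hash = buf[offset: offset + hsize]           # REHData
        key_blob = buf[offset + hsize: offset + resize]  # REKData
        recipients.append((key_hash, key_blob))
        offset += resize
    return hash_alg, recipients
```

   A reader would typically compare each returned key hash against the
   hash of the public keys it holds to find the matching recipient, then
   unwrap the corresponding REKData with the private key. That step
   depends on the key management logic and is not shown.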

   7.5.3 When present, the size of the Archive Extra Data record will be
   included in the size of the Central Directory. The data of the
   Archive Extra Data record will also be compressed and encrypted
   along with the Central Directory data structure.

7.6 Certificate Processing Differences
--------------------------------------

   7.6.1 The Certificate Processing Method of encryption differs from the
   Single Password Symmetric Encryption Method as follows. Instead
   of using a user-defined password to generate a master session key,
   cryptographically random data is used. The key material is then
   wrapped using standard key-wrapping techniques. This key material
   is wrapped using the public key of each recipient that will need
   to decrypt the file using their corresponding private key.

   7.6.2 This specification currently assumes digital certificates will follow
   the X.509 V3 format for 1024 bit and higher RSA format digital
   certificates. Implementation of this Certificate Processing Method
   requires supporting logic for key access and management. This logic
   is outside the scope of this specification.

7.7 OAEP Processing with Certificate-based Encryption
-----------------------------------------------------

   7.7.1 OAEP stands for Optimal Asymmetric Encryption Padding. It is a
   strengthening technique used for small encoded items such as decryption
   keys. This is commonly applied in cryptographic key-wrapping techniques
   and is supported by PKCS #1. Versions 5.0 and 6.0 of this specification
   were designed to support OAEP key-wrapping for certificate-based
   decryption keys for additional security.

   7.7.2 Support for private keys stored on Smartcards or Tokens introduced
   a conflict with this OAEP logic. Most card and token products do
   not support the additional strengthening applied to OAEP key-wrapped
   data. In order to resolve this conflict, versions 6.1 and above of this
   specification will no longer support OAEP when encrypting using
   digital certificates.

   7.7.3 Versions of PKZIP available during initial development of the
   certificate processing method set a value of 61 into the
   version needed to extract field for a file. This indicates that
   non-OAEP key wrapping is used. This affects certificate encryption
   only, and password encryption functions SHOULD NOT be affected by
   this value. This means values of 61 MAY be found on files encrypted
   with certificates only, or on files encrypted with both password
   encryption and certificate encryption. Files encrypted with both
   methods can safely be decrypted using the password methods documented.

7.8 Additional Encryption/Decryption Data Records
-------------------------------------------------

   7.8.1 Additional information MAY be stored within a ZIP file in support
   of the strong password and certificate encryption methods defined above.
   These include, but are not limited to, the following record types:

         0x0021 Policy Decryption Key Record
         0x0022 Smartcrypt Key Provider Record
         0x0023 Smartcrypt Policy Key Data Record

8.0 Splitting and Spanning ZIP files
------------------------------------

   8.1 Spanned ZIP files

      8.1.1 Spanning is the process of segmenting a ZIP file across
      multiple removable media. This support has typically only
      been provided for DOS formatted floppy diskettes.

   8.2 Split ZIP files

      8.2.1 File splitting is a newer derivation of spanning.
      Splitting follows the same segmentation process as
      spanning; however, it does not require writing each
      segment to a unique removable medium and instead supports
      placing all pieces onto local or non-removable locations
      such as file systems, local drives, folders, etc.
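
      Because all segments of a split archive share one location, each
      segment needs its own file name; the naming scheme is given in
      8.3.3. The following is a minimal sketch of that scheme. The
      function name split_segment_names is hypothetical, and the
      two-digit zero padding is an assumption inferred from the
      "filename.z01" example rather than something this specification
      states.

```python
def split_segment_names(base, segment_count):
    """Return the file names for a split archive with segment_count segments."""
    if segment_count < 1:
        raise ValueError("a split archive needs at least one segment")
    # Segments 1 .. n-1 use a .zNN extension with a decimal segment number.
    # Assumption: numbers are zero-padded to at least two digits.
    names = [f"{base}.z{i:02d}" for i in range(1, segment_count)]
    # The last segment holds the central directory and keeps the .zip extension.
    names.append(f"{base}.zip")
    return names
```

      For example, a three-segment archive named "backup" would be
      written as backup.z01, backup.z02, and backup.zip.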

   8.3 File Naming Differences

      8.3.1 A key difference between spanned and split ZIP files is
      that all pieces of a spanned ZIP file have the same name.
      Since each piece is written to a separate volume, no name
      collisions occur and each segment can reuse the original
      .ZIP file name given to the archive.

      8.3.2 Sequence ordering for DOS spanned archives uses the DOS
      volume label to determine segment numbers. Volume labels
      for each segment are written using the form PKBACK#xxx,
      where xxx is the segment number written as a decimal
      value from 001 - nnn.

      8.3.3 Split ZIP files are typically written to the same location
      and are subject to name collisions if the spanned name
      format is used since each segment will reside on the same
      drive. To avoid name collisions, split archives are named
      as follows.

         Segment 1   = filename.z01
         Segment n-1 = filename.z(n-1)
         Segment n   = filename.zip

      8.3.4 The .ZIP extension is used on the last segment to support
      quickly reading the central directory. The segment number
      n SHOULD be a decimal value.

   8.4 Spanned Self-extracting ZIP Files

      8.4.1 Spanned ZIP files MAY be PKSFX Self-extracting ZIP files.
      PKSFX files MAY also be split; however, in this case
      the first segment MUST be named filename.exe. The first
      segment of a split PKSFX archive MUST be large enough to
      include the entire executable program.

   8.5 Capacities and Markers

      8.5.1 Capacities for split archives are as follows:

         Maximum number of segments = 4,294,967,295 - 1
         Maximum .ZIP segment size  = 4,294,967,295 bytes
         Minimum segment size       = 64K
         Maximum PKSFX segment size = 2,147,483,647 bytes

      8.5.2 Segment sizes MAY be different; however, by convention, all
      segment sizes SHOULD be the same with the exception of the
      last, which MAY be smaller. Local and central directory
      header records MUST NOT be split across a segment boundary.
      When writing a header record, if the number of bytes remaining
      within a segment is less than the size of the header record,
      end the current segment and write the header at the start
      of the next segment. The central directory MAY span segment
      boundaries, but no single record in the central directory
      SHOULD be split across segments.

      8.5.3 Spanned/Split archives created using PKZIP for Windows
      (V2.50 or greater), PKZIP Command Line (V2.50 or greater),
      or PKZIP Explorer will include a special spanning
      signature as the first 4 bytes of the first segment of
      the archive. This signature (0x08074b50) will be
      followed immediately by the local header signature for
      the first file in the archive.

      8.5.4 A special spanning marker MAY also appear in spanned/split
      archives if the spanning or splitting process starts but
      only requires one segment. In this case the 0x08074b50
      signature will be replaced with the temporary spanning
      marker signature of 0x30304b50. Split archives can
      only be uncompressed by other versions of PKZIP that
      know how to create a split archive.

      8.5.5 The signature value 0x08074b50 is also used by some
      ZIP implementations as a marker for the Data Descriptor
      record. Conflict in this alternate assignment can be
      avoided by checking the position of the signature
      within the ZIP file to determine the use for which it
      is intended.

9.0 Change Process
------------------

   9.1 In order for the .ZIP file format to remain a viable technology, this
   specification SHOULD be considered as open for periodic review and
   revision. Although this format was originally designed with a
   certain level of extensibility, not all changes in technology
   (present or future) were or will be necessarily considered in its
   design.

   9.2 If your application requires new definitions to the
   extensible sections in this format, or if you would like to
   submit new data structures or new capabilities, please forward
   your request to zipformat@pkware.com. All submissions will be
   reviewed by the ZIP File Specification Committee for possible
   inclusion into future versions of this specification.

   9.3 Periodic revisions to this specification will be published as
   DRAFT or as FINAL status to ensure interoperability. We encourage
   comments and feedback that MAY help improve clarity or content.


10.0 Incorporating PKWARE Proprietary Technology into Your Product
------------------------------------------------------------------

   10.1 The Use or Implementation in a product of APPNOTE technological
   components pertaining to either strong encryption or patching requires
   a separate, executed license agreement from PKWARE. Please contact
   PKWARE at zipformat@pkware.com or +1-414-289-9788 with regard to
   acquiring such a license.

   10.2 Additional information regarding PKWARE proprietary technology is
   available at http://www.pkware.com/appnote.

11.0 Acknowledgements
---------------------

   In addition to the above mentioned contributors to PKZIP and PKUNZIP,
   PKWARE would like to extend special thanks to Robert Mahoney for
   suggesting the extension .ZIP for this software.

12.0 References
---------------

    Fiala, Edward R., and Greene, Daniel H., "Data compression with
       finite windows", Communications of the ACM, Volume 32, Number 4,
       April 1989, pages 490-505.

    Held, Gilbert, "Data Compression, Techniques and Applications,
       Hardware and Software Considerations", John Wiley & Sons, 1987.

    Huffman, D.A., "A method for the construction of minimum-redundancy
       codes", Proceedings of the IRE, Volume 40, Number 9, September 1952,
       pages 1098-1101.

    Nelson, Mark, "LZW Data Compression", Dr. Dobb's Journal, Volume 14,
       Number 10, October 1989, pages 29-37.

    Nelson, Mark, "The Data Compression Book", M&T Books, 1991.

    Storer, James A., "Data Compression, Methods and Theory",
       Computer Science Press, 1988.

    Welch, Terry, "A Technique for High-Performance Data Compression",
       IEEE Computer, Volume 17, Number 6, June 1984, pages 8-19.

    Ziv, J. and Lempel, A., "A universal algorithm for sequential data
       compression", Communications of the ACM, Volume 30, Number 6,
       June 1987, pages 520-540.

    Ziv, J. and Lempel, A., "Compression of individual sequences via
       variable-rate coding", IEEE Transactions on Information Theory,
       Volume 24, Number 5, September 1978, pages 530-536.


APPENDIX A - AS/400 Extra Field (0x0065) Attribute Definitions
--------------------------------------------------------------

   A.1 Field Definition Structure:

      a. field length including length     2 bytes Big Endian
      b. field code                        2 bytes
      c. data                              x bytes

   A.2 Field Code  Description

      4001  Source type i.e. CLP etc
      4002  The text description of the library
      4003  The text description of the file
      4004  The text description of the member
      4005  x'F0' or 0 is PF-DTA, x'F1' or 1 is PF_SRC
      4007  Database Type Code                   1 byte
      4008  Database file and fields definition
      4009  GZIP file type                       2 bytes
      400B  IFS code page                        2 bytes
      400C  IFS Time of last file status change  4 bytes
      400D  IFS Access Time                      4 bytes
      400E  IFS Modification time                4 bytes
      005C  Length of the records in the file    2 bytes
      0068  GZIP two words                       8 bytes

APPENDIX B - z/OS Extra Field (0x0065) Attribute Definitions
------------------------------------------------------------

   B.1 Field Definition Structure:

      a. field length including length     2 bytes Big Endian
      b. field code                        2 bytes
      c. data                              x bytes

   B.2 Field Code  Description

      0001  File Type                          2 bytes
      0002  NonVSAM Record Format              1 byte
      0003  Reserved
      0004  NonVSAM Block Size                 2 bytes Big Endian
      0005  Primary Space Allocation           3 bytes Big Endian
      0006  Secondary Space Allocation         3 bytes Big Endian
      0007  Space Allocation Type              1 byte flag
      0008  Modification Date                  Retired with PKZIP 5.0 +
      0009  Expiration Date                    Retired with PKZIP 5.0 +
      000A  PDS Directory Block Allocation     3 bytes Big Endian binary value
      000B  NonVSAM Volume List                variable
      000C  UNIT Reference                     Retired with PKZIP 5.0 +
      000D  DF/SMS Management Class            8 bytes EBCDIC Text Value
      000E  DF/SMS Storage Class               8 bytes EBCDIC Text Value
      000F  DF/SMS Data Class                  8 bytes EBCDIC Text Value
      0010  PDS/PDSE Member Info.              30 bytes
      0011  VSAM sub-filetype                  2 bytes
      0012  VSAM LRECL                         13 bytes EBCDIC "(num_avg num_max)"
      0013  VSAM Cluster Name                  Retired with PKZIP 5.0 +
      0014  VSAM KSDS Key Information          13 bytes EBCDIC "(num_length num_position)"
      0015  VSAM Average LRECL                 5 bytes EBCDIC num_value padded with blanks
      0016  VSAM Maximum LRECL                 5 bytes EBCDIC num_value padded with blanks
      0017  VSAM KSDS Key Length               5 bytes EBCDIC num_value padded with blanks
      0018  VSAM KSDS Key Position             5 bytes EBCDIC num_value padded with blanks
      0019  VSAM Data Name                     1-44 bytes EBCDIC text string
      001A  VSAM KSDS Index Name               1-44 bytes EBCDIC text string
      001B  VSAM Catalog Name                  1-44 bytes EBCDIC text string
      001C  VSAM Data Space Type               9 bytes EBCDIC text string
      001D  VSAM Data Space Primary            9 bytes EBCDIC num_value left-justified
      001E  VSAM Data Space Secondary          9 bytes EBCDIC num_value left-justified
      001F  VSAM Data Volume List              variable EBCDIC text list of 6-character Volume IDs
      0020  VSAM Data Buffer Space             8 bytes EBCDIC num_value left-justified
      0021  VSAM Data CISIZE                   5 bytes EBCDIC num_value left-justified
      0022  VSAM Erase Flag                    1 byte flag
      0023  VSAM Free CI %                     3 bytes EBCDIC num_value left-justified
      0024  VSAM Free CA %                     3 bytes EBCDIC num_value left-justified
      0025  VSAM Index Volume List             variable EBCDIC text list of 6-character Volume IDs
      0026  VSAM Ordered Flag                  1 byte flag
      0027  VSAM REUSE Flag                    1 byte flag
      0028  VSAM SPANNED Flag                  1 byte flag
      0029  VSAM Recovery Flag                 1 byte flag
      002A  VSAM WRITECHK Flag                 1 byte flag
      002B  VSAM Cluster/Data SHROPTS          3 bytes EBCDIC "n,y"
      002C  VSAM Index SHROPTS                 3 bytes EBCDIC "n,y"
      002D  VSAM Index Space Type              9 bytes EBCDIC text string
      002E  VSAM Index Space Primary           9 bytes EBCDIC num_value left-justified
      002F  VSAM Index Space Secondary         9 bytes EBCDIC num_value left-justified
      0030  VSAM Index CISIZE                  5 bytes EBCDIC num_value left-justified
      0031  VSAM Index IMBED                   1 byte flag
      0032  VSAM Index Ordered Flag            1 byte flag
      0033  VSAM REPLICATE Flag                1 byte flag
      0034  VSAM Index REUSE Flag              1 byte flag
      0035  VSAM Index WRITECHK Flag           1 byte flag Retired with PKZIP 5.0 +
      0036  VSAM Owner                         8 bytes EBCDIC text string
      0037  VSAM Index Owner                   8 bytes EBCDIC text string
      0038  Reserved
      0039  Reserved
      003A  Reserved
      003B  Reserved
      003C  Reserved
      003D  Reserved
      003E  Reserved
      003F  Reserved
      0040  Reserved
      0041  Reserved
      0042  Reserved
      0043  Reserved
      0044  Reserved
      0045  Reserved
      0046  Reserved
      0047  Reserved
      0048  Reserved
      0049  Reserved
      004A  Reserved
      004B  Reserved
      004C  Reserved
      004D  Reserved
      004E  Reserved
      004F  Reserved
      0050  Reserved
      0051  Reserved
      0052  Reserved
      0053  Reserved
      0054  Reserved
      0055  Reserved
      0056  Reserved
      0057  Reserved
      0058  PDS/PDSE Member TTR Info.          6 bytes Big Endian
      0059  PDS 1st LMOD Text TTR              3 bytes Big Endian
      005A  PDS LMOD EP Rec #                  4 bytes Big Endian
      005B  Reserved
      005C  Max Length of records              2 bytes Big Endian
      005D  PDSE Flag                          1 byte flag
      005E  Reserved
      005F  Reserved
      0060  Reserved
      0061  Reserved
      0062  Reserved
      0063  Reserved
      0064  Reserved
      0065  Last Date Referenced               4 bytes Packed Hex "yyyymmdd"
      0066  Date Created                       4 bytes Packed Hex "yyyymmdd"
      0068  GZIP two words                     8 bytes
      0071  Extended NOTE Location             12 bytes Big Endian
      0072  Archive device UNIT                6 bytes EBCDIC
      0073  Archive 1st Volume                 6 bytes EBCDIC
      0074  Archive 1st VOL File Seq#          2 bytes Binary
      0075  Native I/O Flags                   2 bytes
      0081  Unix File Type                     1 byte enumerated
      0082  Unix File Format                   1 byte enumerated
      0083  Unix File Character Set Tag Info   4 bytes
      0090  ZIP Environmental Processing Info  4 bytes
      0091  EAV EATTR Flags                    1 byte
      0092  DSNTYPE Flags                      1 byte
      0093  Total Space Allocation (Cyls)      4 bytes Big Endian
      009D  NONVSAM DSORG                      2 bytes
      009E  Program Virtual Object Info        3 bytes
      009F  Encapsulated file Info             9 bytes
      00A2  Cluster Log                        4 bytes Binary
      00A3  Cluster LSID Length                4 bytes Binary
      00A4  Cluster LSID                       26 bytes EBCDIC
      400C  Unix File Creation Time            4 bytes
      400D  Unix File Access Time              4 bytes
      400E  Unix File Modification time        4 bytes
      4101  IBMCMPSC Compression Info          variable
      4102  IBMCMPSC Compression Size          8 bytes Big Endian

APPENDIX C - Zip64 Extensible Data Sector Mappings
--------------------------------------------------

   -Z390 Extra Field:

      The following is the general layout of the attributes for the
      ZIP 64 "extra" block for extended tape operations.

      Note: some fields are stored in Big Endian format. All text is
      in EBCDIC format unless otherwise specified.

         Value          Size     Description
         -----          ----     -----------
         (Z390) 0x0065  2 bytes  Tag for this "extra" block type
         Size           4 bytes  Size for the following data block
         Tag            4 bytes  EBCDIC "Z390"
         Length71       2 bytes  Big Endian
         Subcode71      2 bytes  Enote type code
         FMEPos         1 byte
         Length72       2 bytes  Big Endian
         Subcode72      2 bytes  Unit type code
         Unit           1 byte   Unit
         Length73       2 bytes  Big Endian
         Subcode73      2 bytes  Volume1 type code
         FirstVol       1 byte   Volume
         Length74       2 bytes  Big Endian
         Subcode74      2 bytes  FirstVol file sequence
         FileSeq        2 bytes  Sequence

APPENDIX D - Language Encoding (EFS)
------------------------------------

D.1 The ZIP format has historically supported only the original IBM PC character
encoding set, commonly referred to as IBM Code Page 437. This limits storing
file name characters to only those within the original MS-DOS range of values
and does not properly support file names in other character encodings, or
languages. To address this limitation, this specification will support the
following change.

D.2 If general purpose bit 11 is unset, the file name and comment SHOULD conform
to the original ZIP character encoding. If general purpose bit 11 is set, the
filename and comment MUST support The Unicode Standard, Version 4.1.0 or
greater using the character encoding form defined by the UTF-8 storage
specification. The Unicode Standard is published by The Unicode
Consortium (www.unicode.org). UTF-8 encoded data stored within ZIP files
is expected not to include a byte order mark (BOM).

D.3 Applications MAY choose to supplement this file name storage through the use
of the 0x0008 Extra Field. Storage for this optional field is currently
undefined, however it will be used to allow storing extended information
on source or target encoding that MAY further assist applications with file
name, or file content encoding tasks. Please contact PKWARE with any
requirements on how this field SHOULD be used.

D.4 The 0x0008 Extra Field storage MAY be used with either setting for general
purpose bit 11. Examples of the intended usage for this field are to store
whether "modified-UTF-8" (JAVA) is used, or UTF-8-MAC. Similarly, other
commonly used character encoding (code page) designations can be indicated
through this field. Formalized values for use of the 0x0008 record remain
undefined at this time. The definition for the layout of the 0x0008 field
will be published when available. Use of the 0x0008 Extra Field provides
for storing data within a ZIP file in an encoding other than IBM Code
Page 437 or UTF-8.

D.5 General purpose bit 11 will not imply any encoding of file content or
password. Values defining character encoding for file content or
password MUST be stored within the 0x0008 Extended Language Encoding
Extra Field.
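
The rule in D.2 can be sketched as a single decoding step for the file name
and comment fields. This is an illustration only: the function name
decode_zip_text and the constant UTF8_FLAG are hypothetical, and the sketch
deliberately ignores the 0x0008, 0x6375, and 0x7075 Extra Fields.

```python
UTF8_FLAG = 1 << 11  # general purpose bit 11 (0x0800)


def decode_zip_text(raw, general_purpose_flags):
    """Decode a file name or comment according to general purpose bit 11."""
    if general_purpose_flags & UTF8_FLAG:
        # Bit 11 set: the field is UTF-8 (no byte order mark is expected).
        return raw.decode("utf-8")
    # Bit 11 unset: fall back to the original IBM Code Page 437 encoding.
    return raw.decode("cp437")
```

As stated in D.5, this applies to the file name and comment only; bit 11
says nothing about the encoding of file content or passwords.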

D.6 Ed Gordon of the Info-ZIP group has defined a pair of "extra field" records
that can be used to store UTF-8 file name and file comment fields. These
records can be used for cases when the general purpose bit 11 method
for storing UTF-8 data in the standard file name and comment fields is
not desirable. A common case for this alternate method is if backward
compatibility with older programs is required.

D.7 Definitions for the record structure of these fields are included above
in the section on 3rd party mappings for "extra field" records. These
records are identified by Header IDs 0x6375 (Info-ZIP Unicode Comment
Extra Field) and 0x7075 (Info-ZIP Unicode Path Extra Field).

D.8 The choice of which storage method to use when writing a ZIP file is left
to the implementation. Developers SHOULD expect that a ZIP file MAY
contain either method and SHOULD provide support for reading data in
either format. Use of general purpose bit 11 reduces storage requirements
for file name data by not requiring additional "extra field" data for
each file, but can result in older ZIP programs not being able to extract
files. Use of the 0x6375 and 0x7075 records will result in a ZIP file
that SHOULD always be readable by older ZIP programs, but requires more
storage per file to write file name and/or file comment fields.

APPENDIX E - AE-x encryption marker
-----------------------------------

E.1 AE-x defines an alternate password-based encryption method used
in ZIP files that is based on a file encryption utility developed by
Dr. Brian Gladman. Information on Dr. Gladman's method is available at:

   http://www.gladman.me.uk/cryptography_technology/fileencrypt/

E.2 AE-x uses AES with CTR (counter mode) and HMAC-SHA1. It defines
encryption using key sizes of 128 bits or 256 bits. It does not
restrict support for decrypting data encrypted with 192-bit keys.

E.3 This method uses the standard ZIP encryption bit (bit 0)
of the general purpose bit flag (section 4.4.4) to indicate a
file is encrypted.

E.4 The compression method field (section 4.4.5) is set to 99
to indicate a file has been encrypted using this method.

E.5 The actual compression method is stored in an extra field
structure identified by a Header ID of 0x9901. Information on this
record structure can be found at http://www.winzip.com/aes_info.htm.

E.6 Two versions are defined for the 0x9901 structure.

   E.6.1 Version 1 stores the file CRC value in the CRC-32 field
   (section 4.4.7).

   E.6.2 Version 2 stores a value of 0 in the CRC-32 field.
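
The markers described in E.3 through E.6 can be combined into a simple
detection routine. The following is a minimal sketch and not a definitive
implementation. The layout of the 0x9901 data (2-byte version, 2-byte vendor
ID "AE", 1-byte strength, 2-byte actual compression method) is not defined
in this appendix; it is an assumption taken from the WinZip documentation
referenced in E.5. The function names find_extra_field and parse_ae_x are
hypothetical.

```python
import struct

AES_EXTRA_ID = 0x9901  # Header ID named in E.5
AES_METHOD = 99        # compression method value named in E.4


def find_extra_field(extra, header_id):
    """Return the data of the first extra field record with header_id, or None."""
    offset = 0
    while offset + 4 <= len(extra):
        # Each record is a 2-byte Header ID, a 2-byte data size, then the data.
        rec_id, size = struct.unpack_from("<HH", extra, offset)
        offset += 4
        if rec_id == header_id:
            return extra[offset: offset + size]
        offset += size
    return None


def parse_ae_x(flags, method, extra):
    """Return AE-x details for an entry, or None if it is not AE-x encrypted."""
    # E.3: bit 0 marks the file as encrypted; E.4: method 99 marks AE-x.
    if not (flags & 0x0001) or method != AES_METHOD:
        return None
    data = find_extra_field(extra, AES_EXTRA_ID)
    if data is None or len(data) < 7:
        raise ValueError("method 99 without a usable 0x9901 record")
    # Assumed layout (from the WinZip documentation, not this appendix).
    version, vendor, strength, actual_method = struct.unpack_from("<H2sBH", data, 0)
    return {
        "version": version,                  # 1: CRC stored, 2: CRC is 0 (E.6)
        "vendor": vendor,
        "key_bits": {1: 128, 2: 192, 3: 256}.get(strength),
        "actual_method": actual_method,      # the real compression method (E.5)
    }
```

A reader that finds version 2 should not treat a zero CRC-32 field as an
error, since E.6.2 defines it that way.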